Abstract: YOLOv10, a state-of-the-art object detection model, has revolutionized the field with its exceptional accuracy and real-time performance. This tutorial presents a comprehensive guide to training YOLOv10 using PaddleYOLO, a powerful deep learning framework, and Docker, a popular containerization platform.

By leveraging the capabilities of PaddleYOLO and Docker, users can efficiently train and deploy YOLOv10 models for a wide range of applications, including real-time object detection. This guide provides step-by-step instructions, code examples, and best practices to help users achieve optimal results.

Keywords: YOLOv10, PaddleYOLO, Docker, Real-Time Object Detection, Deep Learning.

1. Introduction

YOLOv10 is a state-of-the-art object detection model that has significantly advanced the field. As part of the You Only Look Once (YOLO) family of algorithms (see our previous article, “Leveraging the Power of Docker for YOLOv8 with MMYOLO: A Step-by-Step Guide”), YOLOv10 builds upon the strengths of its predecessors while introducing innovative techniques to further enhance performance.

1.1 Key features of YOLOv10

1.1.1 Improved Backbone Architecture

YOLOv10 often employs a more efficient and powerful backbone network, such as a variant of ResNet or EfficientNet. This backbone extracts richer feature representations, leading to improved detection accuracy.

1.1.2 Enhanced Feature Pyramid Network (FPN)

YOLOv10 typically incorporates a refined FPN to effectively fuse features from different levels of the network. This allows the model to detect objects of various sizes more accurately.

1.1.3 Advanced Loss Function

YOLOv10 often utilizes a modified loss function that balances objectness, classification, and localization losses more effectively. This helps the model converge faster and achieve better performance.

1.1.4 Anchor Box Refinements

YOLOv10 might introduce adjustments to the anchor box sizes and aspect ratios to better match the characteristics of specific object classes.
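To make the anchor-refinement idea concrete: anchors are matched to ground-truth boxes by their overlap, usually measured as Intersection over Union (IoU). A minimal sketch of the IoU computation (illustrative only, not PaddleYOLO’s internal implementation):

```python
def iou(box_a, box_b):
    """Intersection over Union of two boxes given as (x1, y1, x2, y2)."""
    # Intersection rectangle corners
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0
```

Anchors whose IoU with a ground-truth box exceeds a threshold are treated as positive matches, which is why tuning anchor sizes and aspect ratios to your object classes can improve recall.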

2. PaddlePaddle and PaddleYOLO

2.1. PaddlePaddle

PaddlePaddle is a flexible and scalable deep learning platform developed by Baidu. It provides a comprehensive set of tools and APIs for building and deploying various deep learning models. PaddlePaddle is designed to be user-friendly and efficient, making it suitable for both researchers and developers.

Key Features of PaddlePaddle:

  • Flexibility: PaddlePaddle supports a wide range of deep learning algorithms and architectures, from traditional neural networks to state-of-the-art models like YOLO, ResNet, and Transformer.
  • Scalability: It can handle large-scale datasets and models, making it suitable for demanding tasks such as image recognition, natural language processing, and recommendation systems.
  • Efficiency: PaddlePaddle is optimized for performance, offering efficient training and inference times.
  • Ease of use: The platform provides a high-level API that simplifies the development process, making it accessible to users with varying levels of expertise.
  • Community support: PaddlePaddle has a growing community of developers and researchers who contribute to its development and provide support.

2.2. PaddleYOLO

PaddleYOLO is a specialized package within PaddlePaddle designed for object detection tasks based on the YOLO family of models. It provides pre-trained YOLO models, as well as tools for fine-tuning them on custom datasets. PaddleYOLO offers several advantages, including:

  • Pre-trained models: Users can start with a pre-trained YOLO model and fine-tune it on their specific task, saving time and effort.
  • Ease of use: PaddleYOLO provides a simplified interface for training and inference, making it accessible to users with limited deep learning experience.
  • Performance: PaddleYOLO is optimized for object detection tasks, offering competitive performance compared to other frameworks.
  • Integration with PaddlePaddle: PaddleYOLO seamlessly integrates with the rest of the PaddlePaddle ecosystem, allowing users to leverage other features and tools provided by the platform.

By combining the flexibility and scalability of PaddlePaddle with the specialized capabilities of PaddleYOLO, users can efficiently build and deploy object detection models for a wide range of applications.

3. Hands-on Step-by-Step Guide to Train a YOLOv10 with PaddleYOLO using Docker

3.1. Before You Start

Prerequisites:
  • Docker installed on your system.
  • Basic understanding of deep learning, computer vision concepts, and Docker.
  • Familiarity with Python programming.
Source code: 
  • The original code can be downloaded from GitHub.

3.2. Setting Up the Docker Environment and Building Your Own Image

  • Create a Dockerfile specifying the environment setup:
Dockerfile

FROM paddlepaddle/paddle:3.0.0b1-gpu-cuda11.8-cudnn8.9-trt8.5

RUN git clone https://github.com/PaddlePaddle/PaddleYOLO -b develop &&\
    cd PaddleYOLO &&\
    pip install -r requirements.txt &&\
    pip install jupyterlab ipykernel sahi

WORKDIR /home/PaddleYOLO

  • Build the Docker image:
Command
$ DOCKER_BUILDKIT=1 docker build -f docker/paddle.Dockerfile -t paddle .

3.3. Dataset Preparation

  1. Download the COCO-Stuff 10K dataset (v1.1) from the cocostuff10k GitHub repository.
  2. Unzip the dataset and place it in a suitable directory.
  3. Convert the dataset format to COCO format by following the steps described in Data_Preparation.ipynb Jupyter notebook.
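For reference, the conversion target is COCO’s detection annotation format: a single JSON file with three top-level lists, `images`, `annotations`, and `categories`. A minimal skeleton (field names follow the COCO specification; the entries shown are placeholders):

```python
import json

coco = {
    "images": [
        {"id": 1, "file_name": "COCO_train2014_000000000001.jpg",
         "width": 640, "height": 480},
    ],
    "annotations": [
        {"id": 1, "image_id": 1, "category_id": 1,
         # COCO boxes are [x, y, width, height] in absolute pixels
         "bbox": [100.0, 120.0, 50.0, 80.0],
         "area": 50.0 * 80.0, "iscrowd": 0},
    ],
    "categories": [
        {"id": 1, "name": "person"},
    ],
}

with open("train_coco.json", "w") as f:
    json.dump(coco, f)
```

The `num_classes` value used later in the training configuration must match the number of entries in `categories` (182 for COCO-Stuff 10K).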

3.4. Configuring the YOLOv10 Model

PaddleYOLO offers pre-trained YOLOv10 models that you can use as a starting point or customize to create your own configurations.

Select a Pre-trained Model:

  • PaddleYOLO provides a range of YOLOv10 models (e.g., YOLOv10n, YOLOv10s, YOLOv10m, YOLOv10b, YOLOv10l, YOLOv10x).
  • Choose the model that best fits your computational capacity and accuracy requirements.

Variables:

    • model_name, job_name, weights: Set the model’s details.
    • config: Path to the configuration file, relative to /home/PaddleYOLO.
    • log_dir: Path for saving logs within the container.
    • save_dir: Path for saving trained models within the container.
Bash Script

#!/bin/bash

# Define model and job details
model_name=yolov10
job_name=yolov10_n_500e_coco # select a model
weights=https://bj.bcebos.com/v1/paddledet/models/${job_name}.pdparams

# WORKDIR in the created container is /home/PaddleYOLO
config=configs/${model_name}/${job_name}.yml # Relative path to /home/PaddleYOLO
log_dir=/out/log_dir/${job_name} # Where to save logs on container, equivalent to $PWD/out/log_dir/${job_name} on the local system
save_dir=/out/trained_model # Where to save trained models on container, equivalent to $PWD/out/trained_model/ on the local system

# You can use the variables model_name, job_name, weights, config, log_dir, and save_dir in your script from this point onward.
  

3.5. Customize the configuration file

  • Modify the configuration file to match your dataset and training parameters.
  • Adjust hyperparameters such as learning rate, batch size, epochs, etc.

Variables:

    • epoch, warmup_epoch, batch_size, worker_num, mosaic_epoch, snapshot_epoch, num_classes, base_lr: These set various training parameters.
    • dataset_dir, image_dir, train_anno_path, eval_anno_path, test_anno_path: These are paths that are accessible within the container, corresponding to directories and files in your local system.
Bash Script Variables

epoch=200
warmup_epoch=3
batch_size=20
worker_num=4
mosaic_epoch=490
snapshot_epoch=10
num_classes=182
base_lr=0.01

# In the container, it is accessible as /data
# Data Structure on my local computer is as follows:
# /mnt/SSD2/coco_stuff10k/
# ├── images/
# ├── train_coco.json
# └── test_coco.json

# It will be mapped to the container, and it will look like:
# /data/
# ├── images/
# ├── train_coco.json
# └── test_coco.json

dataset_dir=/data
image_dir=images
train_anno_path=train_coco.json
eval_anno_path=test_coco.json
test_anno_path=test_coco.json
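With `warmup_epoch=3`, training typically ramps the learning rate up linearly before handing control to the main schedule, which stabilizes early optimization at this relatively high `base_lr`. A sketch of such a linear warmup (a common scheme; PaddleYOLO’s exact schedule is defined by the `LearningRate` section of the config file):

```python
def warmup_lr(step, warmup_steps, base_lr):
    """Linear warmup: ramp from 0 to base_lr over warmup_steps, then hold."""
    if step >= warmup_steps:
        return base_lr
    return base_lr * step / warmup_steps

# e.g. warmup_epoch=3 at a hypothetical 500 iterations/epoch -> 1500 warmup steps
```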
  

3.6. Configure Main Training

  • Python Command: This command uses PaddlePaddle’s distributed launcher to start training with the specified configuration (applicable to multi-node, multi-GPU machines).
  • Options and Variables:
    • --log_dir=${log_dir}, --gpus 0,1: Specify the logging directory and GPUs to use.
    • -c ${config}: Configuration file path.
    • --eval, --amp, --use_vdl=True: Enable evaluation, automatic mixed precision, and visualization logging.
    • -o ...: Pass multiple options such as snapshot_epoch, save_dir, pre-trained model weights, epoch, worker_num, num_classes, batch_size, mosaic_epoch, base_lr, and dataset paths.
Python Command

python -m paddle.distributed.launch --log_dir=${log_dir} --gpus 0,1 tools/train.py -c ${config} --eval --amp --use_vdl=True --vdl_log_dir=${log_dir} \
-o snapshot_epoch=${snapshot_epoch} save_dir=${save_dir} weights=${weights} epoch=${epoch} worker_num=${worker_num} \
num_classes=${num_classes} TrainReader.batch_size=${batch_size} TrainReader.mosaic_epoch=${mosaic_epoch} \
LearningRate.base_lr=${base_lr} \
TrainDataset.dataset_dir=${dataset_dir} TrainDataset.image_dir=${image_dir} TrainDataset.anno_path=${train_anno_path} \
EvalDataset.dataset_dir=${dataset_dir} EvalDataset.image_dir=${image_dir} EvalDataset.anno_path=${eval_anno_path} \
TestDataset.dataset_dir=${dataset_dir} TestDataset.anno_path=${eval_anno_path}
  

3.7. Start Training on Docker from BASH Terminal

To run the training process inside a Docker container from the BASH terminal, start by executing the main training script with the updated configurations within the Docker environment. This approach ensures the reproducibility and isolation of the training process.

To train using Docker from the BASH terminal, you need to specify the local paths to the configuration, data, and output folders. Assuming you’re running the BASH command from the project’s root directory and the dataset is located at `/mnt/SSD2/coco_stuff10k/` on your local machine, define the following variables in the BASH script to set up the environment.

Since the main script is used for tasks like training, evaluation, and inference, it’s more efficient to use a central script to manage these tasks within Docker. For instance, during training, you can pass the `train` script as an input parameter to the `main_run.sh` script, which then executes `train.sh` inside the container.

Docker Run Script

DATA_DIR="/mnt/SSD2/coco_stuff10k/" # In the container, it is accessible as /data
# Data Structure on my local computer is as follows:
# /mnt/SSD2/coco_stuff10k/
# ├── images/
# ├── train_coco.json
# └── test_coco.json

# It will be mapped to the container, and it will look like:
# /data/
# ├── images/
# ├── train_coco.json
# └── test_coco.json

OUT_DIR="$PWD/out" # In the container, it is accessible as /out
RUN_SCRIPT_DIR=$PWD/run_scripts/ # In the container, it is accessible as /run_scripts

docker run -it --rm \
    --gpus all \
    --mount type=bind,source=$RUN_SCRIPT_DIR,target=/run_scripts \
    --mount type=bind,source=$DATA_DIR,target=/data \
    --mount type=bind,source=$OUT_DIR,target=/out \
    --shm-size 8g \
    paddle:latest \
    bash /run_scripts/$1
  

For training, run:

Command
$ bash main_run.sh train

3.8. Model Evaluation

It is important to evaluate the trained model on the evaluation dataset. Since the program is designed to be modular, creating an evaluation script involves only minor modifications to the existing training script. The same is true for the model export and inference scripts.

3.8.1. Define Trained Model Weights Path
Bash Script Variables

weights=/out/trained_model/${job_name}/model_final.pdparams
  
3.8.2. Define Evaluation Command
Python Evaluation Command

CUDA_VISIBLE_DEVICES=0 python tools/eval.py -c ${config} --classwise -o weights=${weights} \
num_classes=${num_classes} TrainReader.batch_size=${batch_size} TrainReader.mosaic_epoch=${mosaic_epoch} \
LearningRate.base_lr=${base_lr} \
TrainDataset.dataset_dir=${dataset_dir} TrainDataset.image_dir=${image_dir} TrainDataset.anno_path=${train_anno_path} \
EvalDataset.dataset_dir=${dataset_dir} EvalDataset.image_dir=${image_dir} EvalDataset.anno_path=${eval_anno_path} \
TestDataset.dataset_dir=${dataset_dir} TestDataset.anno_path=${eval_anno_path}
  

Where:

  • CUDA_VISIBLE_DEVICES=0: Specifies which GPU(s) to use.
  • python tools/eval.py: Runs the evaluation script.
  • -c ${config}: Path to the configuration file.
  • --classwise: Option to evaluate class-wise metrics.

Run evaluation:

Command
$ bash main_run.sh eval

3.9. Exporting Trained Model

Exporting a PaddlePaddle model (here PaddleYOLO) is a crucial step in deploying it to various platforms or integrating it into other applications. This process involves converting the trained model into a format that can be easily loaded and used by different environments.

Why Export PaddlePaddle Models?
  • Deployment: Exported models can be deployed on various platforms, such as servers, mobile devices, and embedded systems.
  • Integration: Exported models can be integrated into other applications or frameworks.
  • Sharing: Exported models can be shared with others for collaboration or distribution.

Since the program is designed to be modular, creating a model export script involves only minor modifications to the existing training script (i.e., the export script).

3.9.1. Define Trained Model Weights Path
Bash Script Variables

weights=/out/trained_model/${job_name}/model_final.pdparams
output_dir=/out/inference_model/
  

Where:

  • weights: Path to the model weights file inside the container.
  • output_dir: Directory where the inference model will be saved.
3.9.2. Define Model Export Command
Python Export Model Command

CUDA_VISIBLE_DEVICES=0 python tools/export_model.py -c ${config} --output_dir=${output_dir} -o weights=${weights} \
num_classes=${num_classes} TrainReader.batch_size=${batch_size} TrainReader.mosaic_epoch=${mosaic_epoch} \
LearningRate.base_lr=${base_lr} \
TrainDataset.dataset_dir=${dataset_dir} TrainDataset.image_dir=${image_dir} TrainDataset.anno_path=${train_anno_path} \
EvalDataset.dataset_dir=${dataset_dir} EvalDataset.image_dir=${image_dir} EvalDataset.anno_path=${eval_anno_path} \
TestDataset.dataset_dir=${dataset_dir} TestDataset.anno_path=${eval_anno_path} \
trt=True
  

Where:

  • CUDA_VISIBLE_DEVICES=0: Specifies which GPU(s) to use.
  • python tools/export_model.py: Runs the model export script.
  • -c ${config}: Path to the configuration file.
  • --output_dir=${output_dir}: Directory where the exported model will be saved.
  • -o ...: Sets various options such as weights, number of classes, batch size, mosaic epoch, learning rate, and dataset paths.
  • trt=True: Option to use TensorRT for model export.

Run export:

Command
$ bash main_run.sh export

3.10. Inference

3.10.1. Define the Path to the Exported Model from the Previous Step
Bash Script Variables

model_dir=/out/inference_model/${job_name}
  
3.10.2. Define Input Image and Inference Result Paths
Bash Script Variables

inference_results_dir=/out/inference_results
image_path=/data/images/COCO_train2014_000000269903.jpg
  

Where:

  • image_path: Path to a specific image file for inference inside the container.
  • inference_results_dir: Directory where inference results will be saved inside the container.
3.10.3. Define Inference Command
Python Inference Command

CUDA_VISIBLE_DEVICES=0 python deploy/python/infer.py --model_dir=${model_dir} --image_file=${image_path} --output_dir=${inference_results_dir} --save_images=True --device=GPU
  

Where:

  • CUDA_VISIBLE_DEVICES=0: Specifies which GPU to use.
  • python deploy/python/infer.py: Runs the inference script.
  • --model_dir=${model_dir}: Directory where the exported model is located.
  • --image_file=${image_path}: Path to the image file for inference.
  • --output_dir=${inference_results_dir}: Directory where inference results will be saved.
  • --save_images=True: Option to save the images with inference results.
  • --device=GPU: Specifies that the inference should use the GPU.

Run inference:

Command
$ bash main_run.sh inference
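
After inference completes, the annotated images land in `inference_results_dir`. If you want to overlay detections yourself instead, a minimal sketch with Pillow is shown below; the `detections` list of `(bbox, label, score)` tuples is a hypothetical format for illustration, not the exact output of `infer.py`:

```python
from PIL import Image, ImageDraw

def draw_detections(image, detections, color=(255, 0, 0)):
    """Draw (x1, y1, x2, y2) boxes with a label/score caption on a PIL image."""
    draw = ImageDraw.Draw(image)
    for (x1, y1, x2, y2), label, score in detections:
        draw.rectangle([x1, y1, x2, y2], outline=color, width=2)
        draw.text((x1, max(0, y1 - 12)), f"{label} {score:.2f}", fill=color)
    return image

# Hypothetical example on a blank canvas
img = Image.new("RGB", (320, 240), "white")
draw_detections(img, [((40, 50, 160, 200), "person", 0.91)])
```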

4. Summary

This comprehensive guide has provided a step-by-step process for effectively training, evaluating, exporting, and running inference with YOLOv10 models using PaddleYOLO within a Docker containerized environment. By following the outlined steps and leveraging the power of PaddleYOLO and Docker, you can efficiently build and deploy custom object detection models tailored to your specific needs.

Key takeaways include the ease of use and efficiency offered by PaddleYOLO for training YOLOv10 models, the versatility and isolation provided by Docker for developing and deploying deep learning applications, the importance of data preparation, model configuration, and evaluation in the training process, and the ability to deploy exported YOLOv10 models on various platforms and integrate them into different applications. By mastering these concepts and techniques, you can unlock the full potential of YOLOv10 and its applications in various domains, such as autonomous driving, surveillance, and augmented reality.


Disclaimer

This tutorial is intended for educational purposes only. It does not constitute professional advice, including medical or legal advice. Any application of the techniques discussed in real-world scenarios should be done cautiously and with consultation from relevant experts.
