PP-OCRv3 is a further upgrade of PP-OCRv2. This section describes the training steps for the PP-OCRv3 detection model. Refer to the documentation for an introduction to PP-OCRv3.
The PP-OCRv3 detection model upgrades the CML (Collaborative Mutual Learning) text detection distillation strategy of PP-OCRv2. PP-OCRv3 further optimizes the teacher model and the student model separately. For the teacher model, it proposes LK-PAN, a PAN structure with a large receptive field, together with the DML (Deep Mutual Learning) distillation strategy. For the student model, it proposes RSE-FPN, an FPN structure with a residual attention mechanism.
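To make the mutual-learning idea behind DML more concrete, the sketch below computes a symmetric divergence between the outputs of two peer networks, so that each is penalized for disagreeing with the other. It is a minimal pure-Python illustration under the assumption that a symmetric KL divergence is used as the mutual term; it is not PaddleOCR's loss implementation, and the function names and probability values are hypothetical.

```python
import math

def kl_divergence(p, q, eps=1e-10):
    """KL(p || q) for two discrete distributions given as lists of probabilities."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

def mutual_learning_term(p1, p2):
    """Symmetric divergence between the outputs of two peer models."""
    return 0.5 * (kl_divergence(p1, p2) + kl_divergence(p2, p1))

# Hypothetical text / non-text probabilities predicted by two peer networks.
student_out = [0.9, 0.1]
student2_out = [0.7, 0.3]

print(mutual_learning_term(student_out, student2_out))
print(mutual_learning_term(student_out, student_out))  # identical outputs give 0
```

In a full training loop, a term like this would be added to each network's ordinary supervised loss, so both peers learn from the labels and from each other.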
PP-OCRv3 detection training consists of two steps: first train the teacher model with the DML distillation method, then use the trained teacher model to train a lightweight student model with the CML method.
Training uses the icdar2015 dataset; for the steps to prepare the training set, refer to ocr_dataset.
For runtime environment preparation, refer to the documentation.
The configuration file for teacher model training is ch_PP-OCRv3_det_dml.yml. The Backbone, Neck, and Head of the teacher model are ResNet50, LKPAN, and DBHead, respectively, and the model is trained with the DML distillation method. Refer to the documentation for a detailed introduction to configuration files.
Download ImageNet pretrained models:
# Download the pretrained model of ResNet50_vd
wget -P ./pretrain_models/ https://paddleocr.bj.bcebos.com/pretrained/ResNet50_vd_ssld_pretrained.pdparams
Start training
# Single GPU training
python3 tools/train.py -c configs/det/ch_PP-OCRv3/ch_PP-OCRv3_det_dml.yml \
-o Architecture.Models.Student.pretrained=./pretrain_models/ResNet50_vd_ssld_pretrained \
Architecture.Models.Student2.pretrained=./pretrain_models/ResNet50_vd_ssld_pretrained \
Global.save_model_dir=./output/
# If you want to use multi-GPU distributed training, use the following command:
python3 -m paddle.distributed.launch --gpus '0,1,2,3' tools/train.py -c configs/det/ch_PP-OCRv3/ch_PP-OCRv3_det_dml.yml \
-o Architecture.Models.Student.pretrained=./pretrain_models/ResNet50_vd_ssld_pretrained \
Architecture.Models.Student2.pretrained=./pretrain_models/ResNet50_vd_ssld_pretrained \
Global.save_model_dir=./output/
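The single-GPU and multi-GPU commands above differ only in the launcher prefix, so they can be generated from one set of `-o` overrides. The following is a small sketch of that idea; `build_train_command` is a hypothetical helper, not part of PaddleOCR, and it only assembles the flags already shown above.

```python
import shlex

def build_train_command(config, overrides, gpus=None):
    """Return the argument list for tools/train.py with -o key=value overrides."""
    cmd = ["python3"]
    if gpus:
        # Multi-GPU: go through paddle.distributed.launch, as in the command above.
        cmd += ["-m", "paddle.distributed.launch", "--gpus", ",".join(map(str, gpus))]
    cmd += ["tools/train.py", "-c", config, "-o"]
    cmd += [f"{key}={value}" for key, value in overrides.items()]
    return cmd

pretrained = "./pretrain_models/ResNet50_vd_ssld_pretrained"
overrides = {
    "Architecture.Models.Student.pretrained": pretrained,
    "Architecture.Models.Student2.pretrained": pretrained,
    "Global.save_model_dir": "./output/",
}
config = "configs/det/ch_PP-OCRv3/ch_PP-OCRv3_det_dml.yml"

single_gpu_cmd = build_train_command(config, overrides)
multi_gpu_cmd = build_train_command(config, overrides, gpus=[0, 1, 2, 3])
print(shlex.join(single_gpu_cmd))
print(shlex.join(multi_gpu_cmd))
```

The returned list can be passed to `subprocess.run` from the repository root if you prefer launching training from a script.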
The models saved during training are in the output directory, which contains the following files:
best_accuracy.states
best_accuracy.pdparams # The model parameters with the best accuracy are saved by default
best_accuracy.pdopt # Optimizer-related parameters of the best-accuracy model, saved by default
latest.states
latest.pdparams # The latest model parameters saved by default
latest.pdopt # Optimizer related parameters of the latest model saved by default
Here, best_accuracy is the saved checkpoint with the highest accuracy, and it can be evaluated directly.
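Before evaluating or resuming, it can be useful to confirm that all three files of a checkpoint are present. The sketch below checks for the `.pdparams`, `.pdopt`, and `.states` files listed above; `missing_checkpoint_files` is a hypothetical helper, and a temporary directory stands in for `./output/`.

```python
from pathlib import Path
import tempfile

CHECKPOINT_SUFFIXES = (".pdparams", ".pdopt", ".states")

def missing_checkpoint_files(save_dir, prefix="best_accuracy"):
    """Return the checkpoint files for `prefix` that are absent from save_dir."""
    save_dir = Path(save_dir)
    expected = [save_dir / f"{prefix}{suffix}" for suffix in CHECKPOINT_SUFFIXES]
    return [path.name for path in expected if not path.is_file()]

# Demonstration on a temporary directory standing in for ./output/.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "best_accuracy.pdparams").touch()
    (Path(tmp) / "best_accuracy.states").touch()
    demo_missing = missing_checkpoint_files(tmp)
    print(demo_missing)  # only the optimizer file is missing here
```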
The model evaluation command is as follows:
python3 tools/eval.py -c configs/det/ch_PP-OCRv3/ch_PP-OCRv3_det_dml.yml -o Global.checkpoints=./output/best_accuracy
The trained teacher model is larger and more accurate than the student model, and it is used to improve the student model's accuracy.
Extract teacher model parameters
best_accuracy contains the parameters of two models, corresponding to Student and Student2 in the configuration file. The parameters of Student can be extracted as follows:
import paddle
# load pretrained model
all_params = paddle.load("output/best_accuracy.pdparams")
# View the keys of the weight parameter
print(all_params.keys())
# model weight extraction
s_params = {key[len("Student."):]: all_params[key] for key in all_params if "Student." in key}
# View the keys of the model weight parameters
print(s_params.keys())
# save
paddle.save(s_params, "./pretrain_models/dml_teacher.pdparams")
The extracted model parameters can be used for further finetune training or distillation training of the model.
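The extraction step above is a prefix filter over the parameter keys. The sketch below shows the same logic on a plain dictionary so it runs without Paddle; the key names are made up, and `extract_submodel` is a hypothetical helper. It uses `str.startswith` instead of a substring test, a small variation that avoids matching a key that merely contains the prefix somewhere in the middle.

```python
def extract_submodel(all_params, name):
    """Return the entries under `name.` with that prefix stripped from each key."""
    prefix = name + "."
    return {
        key[len(prefix):]: value
        for key, value in all_params.items()
        if key.startswith(prefix)
    }

# Hypothetical keys; a real checkpoint maps such names to tensors.
all_params = {
    "Student.backbone.conv1.weight": 1,
    "Student.head.binarize.weight": 2,
    "Student2.backbone.conv1.weight": 3,
}

student = extract_submodel(all_params, "Student")
print(sorted(student))
```

Because the prefix includes the trailing dot, keys under `Student2.` are not picked up when extracting `Student`.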
The configuration file for training the student model is ch_PP-OCRv3_det_cml.yml. The teacher model trained in the previous section serves as supervision, and the lightweight student model is obtained through CML training.
Download the ImageNet pretrained model for the student model:
# Download the pre-trained model of MobileNetV3
wget -P ./pretrain_models/ https://paddleocr.bj.bcebos.com/pretrained/MobileNetV3_large_x0_5_pretrained.pdparams
Start training
# Single GPU training
python3 tools/train.py -c configs/det/ch_PP-OCRv3/ch_PP-OCRv3_det_cml.yml \
-o Architecture.Models.Student.pretrained=./pretrain_models/MobileNetV3_large_x0_5_pretrained \
Architecture.Models.Student2.pretrained=./pretrain_models/MobileNetV3_large_x0_5_pretrained \
Architecture.Models.Teacher.pretrained=./pretrain_models/dml_teacher \
Global.save_model_dir=./output/
# If you want to use multi-GPU distributed training, use the following command:
python3 -m paddle.distributed.launch --gpus '0,1,2,3' tools/train.py -c configs/det/ch_PP-OCRv3/ch_PP-OCRv3_det_cml.yml \
-o Architecture.Models.Student.pretrained=./pretrain_models/MobileNetV3_large_x0_5_pretrained \
Architecture.Models.Student2.pretrained=./pretrain_models/MobileNetV3_large_x0_5_pretrained \
Architecture.Models.Teacher.pretrained=./pretrain_models/dml_teacher \
Global.save_model_dir=./output/
The models saved during training are in the output directory. The model evaluation command is as follows:
python3 tools/eval.py -c configs/det/ch_PP-OCRv3/ch_PP-OCRv3_det_cml.yml -o Global.checkpoints=./output/best_accuracy
best_accuracy contains the parameters of three models, corresponding to Student, Student2, and Teacher in the configuration file. The Student parameters can be extracted as follows:
import paddle
# load pretrained model
all_params = paddle.load("output/best_accuracy.pdparams")
# View the keys of the weight parameter
print(all_params.keys())
# model weight extraction
s_params = {key[len("Student."):]: all_params[key] for key in all_params if "Student." in key}
# View the keys of the model weight parameters
print(s_params.keys())
# save
paddle.save(s_params, "./pretrain_models/cml_student.pdparams")
The extracted parameters of Student can be used for model deployment or further finetune training.
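Since a CML checkpoint bundles three sub-models, a quick way to see what it contains is to count keys by their top-level prefix. The sketch below does this on a plain dictionary with made-up key names; with a real checkpoint, the dictionary returned by `paddle.load` could be passed instead. `count_keys_per_submodel` is a hypothetical helper.

```python
from collections import Counter

def count_keys_per_submodel(all_params):
    """Count parameter keys by the text before the first dot."""
    return Counter(key.split(".", 1)[0] for key in all_params)

# Hypothetical key names standing in for a CML checkpoint.
cml_keys = {
    "Student.backbone.conv.weight": None,
    "Student.neck.ins_conv.weight": None,
    "Student2.backbone.conv.weight": None,
    "Teacher.backbone.conv.weight": None,
}

counts = count_keys_per_submodel(cml_keys)
print(dict(counts))
```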
This section describes how to finetune the PP-OCRv3 detection model for other scenarios.
Finetune training applies to three scenarios, described in turn below:
Finetune training based on the CML distillation method
Download the PP-OCRv3 training model:
wget https://paddleocr.bj.bcebos.com/PP-OCRv3/chinese/ch_PP-OCRv3_det_distill_train.tar
tar xf ch_PP-OCRv3_det_distill_train.tar
ch_PP-OCRv3_det_distill_train/best_accuracy.pdparams contains the parameters of the Student, Student2, and Teacher models in the CML configuration file.
Start training:
# Single GPU training
python3 tools/train.py -c configs/det/ch_PP-OCRv3/ch_PP-OCRv3_det_cml.yml \
-o Global.pretrained_model=./ch_PP-OCRv3_det_distill_train/best_accuracy \
Global.save_model_dir=./output/
# If you want to use multi-GPU distributed training, use the following command:
python3 -m paddle.distributed.launch --gpus '0,1,2,3' tools/train.py -c configs/det/ch_PP-OCRv3/ch_PP-OCRv3_det_cml.yml \
-o Global.pretrained_model=./ch_PP-OCRv3_det_distill_train/best_accuracy \
Global.save_model_dir=./output/
Finetune training based on the PP-OCRv3 lightweight detection model
Download the PP-OCRv3 training model and extract the model parameters of the Student structure:
wget https://paddleocr.bj.bcebos.com/PP-OCRv3/chinese/ch_PP-OCRv3_det_distill_train.tar
tar xf ch_PP-OCRv3_det_distill_train.tar
The method to extract the Student parameter is as follows:
import paddle
# load pretrained model
all_params = paddle.load("ch_PP-OCRv3_det_distill_train/best_accuracy.pdparams")
# View the keys of the weight parameter
print(all_params.keys())
# model weight extraction
s_params = {key[len("Student."):]: all_params[key] for key in all_params if "Student." in key}
# View the keys of the model weight parameters
print(s_params.keys())
# save
paddle.save(s_params, "./student.pdparams")
Train with the configuration file ch_PP-OCRv3_det_student.yml.
Start training
# Single GPU training
python3 tools/train.py -c configs/det/ch_PP-OCRv3/ch_PP-OCRv3_det_student.yml \
-o Global.pretrained_model=./student \
Global.save_model_dir=./output/
# If you want to use multi-GPU distributed training, use the following command:
python3 -m paddle.distributed.launch --gpus '0,1,2,3' tools/train.py -c configs/det/ch_PP-OCRv3/ch_PP-OCRv3_det_student.yml \
-o Global.pretrained_model=./student \
Global.save_model_dir=./output/
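Note that the commands in this document pass weight paths without the `.pdparams` extension: the file saved as `./student.pdparams` is referenced as `./student`. The sketch below derives that form from a saved file name; `weights_prefix` is a hypothetical helper, not a PaddleOCR function, and it normalizes away a leading `./`.

```python
from pathlib import Path

def weights_prefix(saved_file):
    """Drop a trailing .pdparams so the path matches the form used with -o."""
    path = Path(saved_file)
    if path.suffix != ".pdparams":
        raise ValueError(f"expected a .pdparams file, got {saved_file!r}")
    return path.with_suffix("").as_posix()

print(weights_prefix("./pretrain_models/dml_teacher.pdparams"))
print(weights_prefix("./student.pdparams"))
```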
Finetune training based on the DML distillation method
Taking the Teacher model in ch_PP-OCRv3_det_distill_train as an example, first extract the parameters of the Teacher structure as follows:
import paddle
# load pretrained model
all_params = paddle.load("ch_PP-OCRv3_det_distill_train/best_accuracy.pdparams")
# View the keys of the weight parameter
print(all_params.keys())
# model weight extraction
s_params = {key[len("Teacher."):]: all_params[key] for key in all_params if "Teacher." in key}
# View the keys of the model weight parameters
print(s_params.keys())
# save
paddle.save(s_params, "./teacher.pdparams")
Start training
# Single GPU training
python3 tools/train.py -c configs/det/ch_PP-OCRv3/ch_PP-OCRv3_det_dml.yml \
-o Architecture.Models.Student.pretrained=./teacher \
Architecture.Models.Student2.pretrained=./teacher \
Global.save_model_dir=./output/
# If you want to use multi-GPU distributed training, use the following command:
python3 -m paddle.distributed.launch --gpus '0,1,2,3' tools/train.py -c configs/det/ch_PP-OCRv3/ch_PP-OCRv3_det_dml.yml \
-o Architecture.Models.Student.pretrained=./teacher \
Architecture.Models.Student2.pretrained=./teacher \
Global.save_model_dir=./output/
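The three finetune scenarios differ only in the configuration file and in which override keys receive the starting weights. The lookup below summarizes the commands from this section in one place; the scenario labels (`cml`, `student`, `dml`) and the `finetune_overrides` helper are hypothetical names chosen for illustration.

```python
CONFIG_DIR = "configs/det/ch_PP-OCRv3"

# Scenario -> (config file, override keys that receive the starting weights).
FINETUNE_SCENARIOS = {
    "cml": ("ch_PP-OCRv3_det_cml.yml", ["Global.pretrained_model"]),
    "student": ("ch_PP-OCRv3_det_student.yml", ["Global.pretrained_model"]),
    "dml": (
        "ch_PP-OCRv3_det_dml.yml",
        [
            "Architecture.Models.Student.pretrained",
            "Architecture.Models.Student2.pretrained",
        ],
    ),
}

def finetune_overrides(scenario, weights, save_dir="./output/"):
    """Return the config path and the -o overrides for one finetune scenario."""
    config, keys = FINETUNE_SCENARIOS[scenario]
    overrides = {key: weights for key in keys}
    overrides["Global.save_model_dir"] = save_dir
    return f"{CONFIG_DIR}/{config}", overrides

dml_config, dml_overrides = finetune_overrides("dml", "./teacher")
print(dml_config)
for key, value in dml_overrides.items():
    print(f"{key}={value}")
```

In the DML case both Student and Student2 start from the same extracted teacher weights, which matches the command above.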