Configuration

1. Optional Parameter List
2. Introduction to Global Parameters of Configuration File
3. Multilingual Config File Generation

1. Optional Parameter List

The following parameters can be listed with --help.

FLAG	Supported script	Use	Defaults	Note
-c	ALL	Specify configuration file to use	None	Please refer to the parameter introduction for configuration file usage
-o	ALL	Set configuration options	None	Options set with -o take priority over the configuration file selected with -c, e.g. -o Global.use_gpu=false
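
The priority rule for -o can be pictured as a dictionary merge. The following is a minimal sketch, not PaddleOCR's own code: `parse_value` and `apply_overrides` are hypothetical helpers, and the nested dict stands in for a configuration file loaded with -c.

```python
import copy


def parse_value(text):
    """Turn a command-line string into a bool, int, or float; else keep the string."""
    lowered = text.lower()
    if lowered in ("true", "false"):
        return lowered == "true"
    for cast in (int, float):
        try:
            return cast(text)
        except ValueError:
            pass
    return text


def apply_overrides(config, overrides):
    """Return a copy of config with each 'Section.key=value' override applied."""
    merged = copy.deepcopy(config)
    for item in overrides:
        dotted_key, _, raw_value = item.partition("=")
        *parents, leaf = dotted_key.split(".")
        node = merged
        for name in parents:
            node = node.setdefault(name, {})
        node[leaf] = parse_value(raw_value)
    return merged


# Stand-in for the values read from the file passed with -c (assumed shape).
file_config = {"Global": {"use_gpu": True, "epoch_num": 500}}
final_config = apply_overrides(file_config, ["Global.use_gpu=false"])
print(final_config)  # {'Global': {'use_gpu': False, 'epoch_num': 500}}
```

Because the override is applied after the file is read, the value from -o is the one that survives.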

2. Introduction to Global Parameters of Configuration File

Take rec_chinese_lite_train_v2.0.yml as an example

Global

Parameter	Use	Defaults	Note
use_gpu	Set using GPU or not	true
epoch_num	Maximum training epoch number	500
log_smooth_window	Length of the log queue; each log line prints the median of the values in the queue	20
print_batch_step	Set print log interval	10
save_model_dir	Set model save path	output/{algorithm_name}
save_epoch_step	Set model save interval	3
eval_batch_step	Set the model evaluation interval	2000 or [1000, 2000]	2000 runs evaluation every 2000 iterations; [1000, 2000] runs evaluation every 2000 iterations after the 1000th iteration
cal_metric_during_train	Set whether to compute the metric during training; the metric is computed on the current training batch	true
load_static_weights	Set whether the pre-training model is saved in static graph mode (currently only required by the detection algorithm)	true
pretrained_model	Set the path of the pre-trained model	./pretrain_models/CRNN/best_accuracy
checkpoints	Set the model parameter path	None	Used to load parameters and resume training after an interruption
use_visualdl	Set whether to enable visualdl for visual log display	False	Tutorial
use_wandb	Set whether to enable W&B for visual log display	False	Documentation
infer_img	Set inference image path or folder path	./infer_img
character_dict_path	Set dictionary path	./ppocr/utils/ppocr_keys_v1.txt	If character_dict_path is None, the model can only recognize digits and lowercase letters
max_text_length	Set the maximum length of text	25
use_space_char	Set whether to recognize spaces	True
label_list	Set the angles supported by the direction classifier	['0','180']	Only valid in the angle classifier model
save_res_path	Set the save path of the test model results	./output/det_db/predicts_db.txt	Only valid in the text detection model
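
Two of the Global parameters describe small pieces of logic that are easier to see in code. The sketch below is an illustration under stated assumptions, not the training loop itself: `should_evaluate` and `SmoothedValue` are hypothetical names, and the exact step at which the first evaluation fires is an assumption based on the note for eval_batch_step.

```python
from collections import deque
from statistics import median


def should_evaluate(global_step, eval_batch_step):
    """Decide whether evaluation runs at this training iteration.

    eval_batch_step is either an interval (2000) or a
    [start_step, interval] pair ([1000, 2000]).
    """
    if isinstance(eval_batch_step, (list, tuple)):
        start_step, interval = eval_batch_step
    else:
        start_step, interval = 0, eval_batch_step
    return global_step > start_step and (global_step - start_step) % interval == 0


class SmoothedValue:
    """Keep the last `window` values and report their median (log_smooth_window)."""

    def __init__(self, window=20):
        self.values = deque(maxlen=window)

    def update(self, value):
        self.values.append(value)

    def current(self):
        return median(self.values)


loss_log = SmoothedValue(window=3)
for loss in [1.0, 2.0, 3.0, 100.0]:
    loss_log.update(loss)
print(loss_log.current())                    # 3.0 (median of the last 3 values)
print(should_evaluate(3000, [1000, 2000]))   # True
```

Reporting the median rather than the latest value keeps a single outlier batch, such as the 100.0 above, from dominating the printed log.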

Optimizer (ppocr/optimizer)

Parameter	Use	Defaults	Note
name	Optimizer class name	Adam	Currently supports `Momentum`, `Adam`, `RMSProp`; see ppocr/optimizer/optimizer.py
beta1	Set the exponential decay rate for the 1st moment estimates	0.9
beta2	Set the exponential decay rate for the 2nd moment estimates	0.999
clip_norm	The maximum norm value	-
lr	Set the learning rate decay method	-
name	Learning rate decay class name	Cosine	Currently supports `Linear`, `Cosine`, `Step`, `Piecewise`; see ppocr/optimizer/learning_rate.py
learning_rate	Set the base learning rate	0.001
regularizer	Set network regularization method	-
name	Regularizer class name	L2	Currently supports `L1`, `L2`; see ppocr/optimizer/regularizer.py
factor	Regularizer coefficient	0.00001
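
The table lists `name` three times because the section is nested. The sketch below shows one reading of that nesting as a Python dict, assuming the rows that follow `lr` and `regularizer` are their sub-keys. The values are the defaults from the table, `clip_norm` is left out because its default is not given, and `flatten` is a hypothetical helper for printing full key paths.

```python
# Assumed nesting of the Optimizer section, using the defaults listed above.
optimizer_config = {
    "name": "Adam",
    "beta1": 0.9,
    "beta2": 0.999,
    "lr": {"name": "Cosine", "learning_rate": 0.001},
    "regularizer": {"name": "L2", "factor": 0.00001},
}


def flatten(section, prefix=""):
    """Yield (dotted_path, value) pairs for every leaf in a nested dict."""
    for key, value in section.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            yield from flatten(value, path)
        else:
            yield path, value


for path, value in flatten(optimizer_config, "Optimizer"):
    print(path, "=", value)
```

Printed this way, the three `name` entries become `Optimizer.name`, `Optimizer.lr.name`, and `Optimizer.regularizer.name`, which removes the ambiguity of the flat table.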

Architecture (ppocr/modeling)

In PaddleOCR, the network is divided into four stages: Transform, Backbone, Neck and Head

Parameter	Use	Defaults	Note
model_type	Network Type	rec	Currently supports `rec`, `det`, `cls`
algorithm	Model name	CRNN	See algorithm_overview for the support list
Transform	Set the transformation method	-	Currently only supported for recognition algorithms; see ppocr/modeling/transform for details
name	Transformation class name	TPS	Currently supports `TPS`
num_fiducial	Number of TPS control points	20	Ten each on the top and bottom
loc_lr	Localization network learning rate	0.1
model_name	Localization network size	small	Currently supports `small`, `large`
Backbone	Set the network backbone class name	-	See ppocr/modeling/backbones
name	Backbone class name	ResNet	Currently supports `MobileNetV3`, `ResNet`
layers	ResNet layers	34	Currently supports 18, 34, 50, 101, 152, 200
model_name	MobileNetV3 network size	small	Currently supports `small`, `large`
Neck	Set network neck	-	See ppocr/modeling/necks
name	Neck class name	SequenceEncoder	Currently supports `SequenceEncoder`, `DBFPN`
encoder_type	SequenceEncoder encoder type	rnn	Currently supports `reshape`, `fc`, `rnn`
hidden_size	Number of RNN internal units	48
out_channels	Number of DBFPN output channels	256
Head	Set the network head	-	See ppocr/modeling/heads
name	Head class name	CTCHead	Currently supports `CTCHead`, `DBHead`, `ClsHead`
fc_decay	CTCHead regularization coefficient	0.0004
k	DBHead binarization coefficient	50
class_dim	ClsHead output category number	2

Loss (ppocr/losses)

Parameter	Use	Defaults	Note
name	Loss class name	CTCLoss	Currently supports `CTCLoss`, `DBLoss`, `ClsLoss`
balance_loss	Whether to balance the number of positive and negative samples in DBLoss (using OHEM)	True
ohem_ratio	The ratio of negative to positive samples for OHEM in DBLoss	3
main_loss_type	The loss used by shrink_map in DBLoss	DiceLoss	Currently supports `DiceLoss`, `BCELoss`
alpha	The coefficient of shrink_map_loss in DBLoss	5
beta	The coefficient of threshold_map_loss in DBLoss	10

PostProcess (ppocr/postprocess)

Parameter	Use	Defaults	Note
name	Post-processing class name	CTCLabelDecode	Currently supports `CTCLabelDecode`, `AttnLabelDecode`, `DBPostProcess`, `ClsPostProcess`
thresh	The threshold for binarization of the segmentation map in DBPostProcess	0.3
box_thresh	The threshold for filtering output boxes in DBPostProcess. Boxes below this threshold will not be output	0.7
max_candidates	The maximum number of text boxes output in DBPostProcess	1000
unclip_ratio	The unclip ratio of the text box in DBPostProcess	2.0
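
To make the roles of thresh, box_thresh, and max_candidates concrete, here is a simplified sketch. It is not the DBPostProcess implementation: contour extraction and unclipping are omitted, `binarize` and `filter_boxes` are hypothetical helpers, and the order in which the cap and the score filter are applied is an assumption.

```python
def binarize(prob_map, thresh=0.3):
    """Mark each pixel of the segmentation map as text (True) if it exceeds thresh."""
    return [[pixel > thresh for pixel in row] for row in prob_map]


def filter_boxes(candidates, box_thresh=0.7, max_candidates=1000):
    """Keep candidate boxes scoring at least box_thresh.

    candidates is a list of (box, score) pairs; only the first
    max_candidates entries are considered at all.
    """
    kept = []
    for box, score in candidates[:max_candidates]:
        if score < box_thresh:
            continue  # boxes below the threshold are not output
        kept.append((box, score))
    return kept


mask = binarize([[0.1, 0.5], [0.9, 0.2]])
boxes = filter_boxes([("box_a", 0.9), ("box_b", 0.5), ("box_c", 0.7)])
print(mask)   # [[False, True], [True, False]]
print(boxes)  # [('box_a', 0.9), ('box_c', 0.7)]
```

In this sketch, lowering box_thresh admits more, less confident boxes, while thresh controls how much of the probability map counts as text in the first place.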

Metric (ppocr/metrics)

Parameter	Use	Defaults	Note
name	Metric method name	RecMetric	Currently supports `DetMetric`, `RecMetric`, `ClsMetric`
main_indicator	Main indicator, used to select the best model	acc	hmean for detection; acc for recognition and classification
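
Selecting the best model by main_indicator amounts to comparing one key of each evaluation result. A minimal sketch follows, assuming higher values are better and that each evaluation yields a dict of metrics; `pick_best` is a hypothetical helper, and the accuracy values are made up for illustration.

```python
def pick_best(best_metrics, new_metrics, main_indicator="acc"):
    """Return whichever metrics dict scores higher on main_indicator."""
    if best_metrics is None or new_metrics[main_indicator] > best_metrics[main_indicator]:
        return new_metrics
    return best_metrics


best = None
# Illustrative evaluation results, one dict per evaluation run.
for eval_metrics in [{"acc": 0.71}, {"acc": 0.78}, {"acc": 0.75}]:
    best = pick_best(best, eval_metrics)
print(best)  # {'acc': 0.78}
```

For a detection model the same comparison would be made on `hmean` instead of `acc`.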

Dataset (ppocr/data)

Parameter	Use	Defaults	Note
dataset	Return one sample per iteration	-	-
name	Dataset class name	SimpleDataSet	Currently supports `SimpleDataSet`, `LMDBDataSet`
data_dir	Image folder path	./train_data
label_file_list	Ground-truth file path	["./train_data/train_list.txt"]	This parameter is not required when dataset is LMDBDataSet
ratio_list	Sampling ratio for each label file	[1.0]	If there are two train_lists in label_file_list and ratio_list is [0.4,0.6], 40% will be sampled from train_list1 and 60% from train_list2, and the samples are combined to form the entire dataset
transforms	List of methods to transform images and labels	[DecodeImage,CTCLabelEncode,RecResizeImg,KeepKeys]	See ppocr/data/imaug
loader	Dataloader settings	-
shuffle	Whether to shuffle the dataset at each epoch	True
batch_size_per_card	Single card batch size during training	256
drop_last	Whether to discard the last incomplete mini-batch when the number of samples in the dataset is not divisible by batch_size	True
num_workers	The number of sub-processes used to load data; if it is 0, no sub-process is started and the data is loaded in the main process	8
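
The effect of ratio_list and drop_last can be sketched in a few lines. This is an illustration only: `sample_labels` and `num_batches` are hypothetical helpers, the label lines are placeholders, and rounding the sample count to the nearest integer is an assumption.

```python
import random


def sample_labels(label_lists, ratio_list, seed=0):
    """Sample a fraction of each label list and combine the results."""
    rng = random.Random(seed)
    combined = []
    for lines, ratio in zip(label_lists, ratio_list):
        count = round(len(lines) * ratio)
        combined.extend(rng.sample(lines, count))
    return combined


def num_batches(num_samples, batch_size, drop_last):
    """Number of mini-batches per epoch, with or without the incomplete last one."""
    full, remainder = divmod(num_samples, batch_size)
    return full if drop_last or remainder == 0 else full + 1


# Placeholder label lines standing in for two files in label_file_list.
train_list1 = [f"list1_line_{i}" for i in range(10)]
train_list2 = [f"list2_line_{i}" for i in range(10)]
dataset = sample_labels([train_list1, train_list2], [0.4, 0.6])
print(len(dataset))                                # 10 (4 from list 1, 6 from list 2)
print(num_batches(1000, 256, drop_last=True))      # 3
print(num_batches(1000, 256, drop_last=False))     # 4
```

With 1000 samples and a batch size of 256, drop_last discards the final 232 samples of each epoch, which is why the two batch counts differ by one.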

Weights & Biases (W&B)

Parameter	Use	Defaults
project	Project to which the run is to be logged	uncategorized
name	Alias/Name of the run	Randomly generated by wandb
id	ID of the run	Randomly generated by wandb
entity	User or team to which the run is being logged	The logged in user
save_dir	Local directory in which all models and other data are saved	wandb
config	Model configuration	None

3. Multilingual Config File Generation

PaddleOCR currently supports recognition for 80 languages (besides Chinese). A multi-language configuration file template, rec_multi_language_lite_train.yml, is provided under configs/rec/multi_language.

There are two ways to create the required configuration file:

Automatically generated by script

The script generate_multi_language_configs.py generates configuration files for multi-language models.

Taking Italian as an example, suppose your data is prepared in the following format:

|-train_data
    |- it_train.txt # train_set label
    |- it_val.txt # val_set label
    |- data
        |- word_001.jpg
        |- word_002.jpg
        |- word_003.jpg
        | ...

You can use the default parameters to generate a configuration file:

# The code needs to be run in the specified directory
cd PaddleOCR/configs/rec/multi_language/
# Set the configuration file of the language to be generated through the -l or --language parameter.
# This command will write the default parameters into the configuration file
python3 generate_multi_language_configs.py -l it

If your data is placed in another location, or you want to use your own dictionary, you can generate the configuration file by specifying the relevant parameters:

# -l or --language field is required
# --train to modify the training set
# --val to modify the validation set
# --data_dir to modify the data set directory
# --dict to modify the dict path
# -o to modify the corresponding default parameters
cd PaddleOCR/configs/rec/multi_language/
python3 generate_multi_language_configs.py -l it \
--train {path/of/train_label.txt} \
--val {path/of/val_label.txt} \
--data_dir {train_data/path} \
--dict {path/of/dict} \
-o Global.use_gpu=False
...

Italian is written with Latin letters, so executing the command produces rec_latin_lite_train.yml.

Manually modify the configuration file

You can also manually modify the following fields in the template:

    Global:
      use_gpu: True
      epoch_num: 500
      ...
      character_dict_path:  {path/of/dict} # path of dict

    Train:
      dataset:
        name: SimpleDataSet
        data_dir: train_data/ # root directory of training data
        label_file_list: ["./train_data/train_list.txt"] # train label path
      ...

    Eval:
      dataset:
        name: SimpleDataSet
        data_dir: train_data/ # root directory of val data
        label_file_list: ["./train_data/val_list.txt"] # val label path
      ...

Currently, the multi-language algorithms supported by PaddleOCR are:

Configuration file	Algorithm name	backbone	trans	seq	pred	language
rec_chinese_cht_lite_train.yml	CRNN	Mobilenet_v3 small 0.5	None	BiLSTM	ctc	Traditional Chinese
rec_en_lite_train.yml	CRNN	Mobilenet_v3 small 0.5	None	BiLSTM	ctc	English (case sensitive)
rec_french_lite_train.yml	CRNN	Mobilenet_v3 small 0.5	None	BiLSTM	ctc	French
rec_ger_lite_train.yml	CRNN	Mobilenet_v3 small 0.5	None	BiLSTM	ctc	German
rec_japan_lite_train.yml	CRNN	Mobilenet_v3 small 0.5	None	BiLSTM	ctc	Japanese
rec_korean_lite_train.yml	CRNN	Mobilenet_v3 small 0.5	None	BiLSTM	ctc	Korean
rec_latin_lite_train.yml	CRNN	Mobilenet_v3 small 0.5	None	BiLSTM	ctc	Latin
rec_arabic_lite_train.yml	CRNN	Mobilenet_v3 small 0.5	None	BiLSTM	ctc	Arabic
rec_cyrillic_lite_train.yml	CRNN	Mobilenet_v3 small 0.5	None	BiLSTM	ctc	Cyrillic
rec_devanagari_lite_train.yml	CRNN	Mobilenet_v3 small 0.5	None	BiLSTM	ctc	Devanagari

For more supported languages, please refer to: Multi-language model

The multi-language models are trained in the same way as the Chinese model. The training dataset consists of 1 million synthetic images. A small number of fonts and test data can be downloaded using either of the following two methods.

Baidu Netdisk, extraction code: frgi.
Google drive
