
Table Recognition Datasets

This page lists commonly used table recognition datasets and is updated continuously. Contributions of additional datasets are welcome.

Dataset Summary

Dataset | Image download link | PPOCR format annotation download link
PubTabNet | | jsonl format, which can be loaded directly with
TAL Table Recognition Competition Dataset | | jsonl format, which can be loaded directly with
WTW Chinese scene table dataset | | Conversion is required to load with
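
The jsonl annotations listed in the summary store one JSON object per line. As a rough illustration of what reading such a file involves, here is a minimal sketch using only the Python standard library. It is not the PPOCR loader; the helper name `read_jsonl`, the file name `annotations.jsonl`, and the record fields `filename` and `split` are placeholders chosen for this example, and the real field names depend on the dataset.

```python
import json
import tempfile
from pathlib import Path


def read_jsonl(path):
    """Yield one parsed record per non-empty line of a JSON Lines file."""
    with Path(path).open(encoding="utf-8") as f:
        for line_no, line in enumerate(f, start=1):
            line = line.strip()
            if not line:
                continue  # tolerate blank lines between records
            try:
                yield json.loads(line)
            except json.JSONDecodeError as exc:
                raise ValueError(f"{path}:{line_no}: invalid JSON") from exc


# Demo with made-up records. "filename" and "split" are placeholder
# field names, not a statement about any dataset's actual schema.
sample = [
    {"filename": "table_0001.png", "split": "train"},
    {"filename": "table_0002.png", "split": "val"},
]

with tempfile.TemporaryDirectory() as tmp:
    demo_path = Path(tmp) / "annotations.jsonl"
    demo_path.write_text(
        "\n".join(json.dumps(r) for r in sample) + "\n", encoding="utf-8"
    )
    records = list(read_jsonl(demo_path))

print(len(records))
```

Reading line by line keeps memory use flat for large annotation files, and the line number in the error message makes a malformed record easy to locate.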

1. PubTabNet

  • Data Introduction: The training set of the PubTabNet dataset contains 500,000 images and the validation set contains 9,000 images. Some sample images are shown below.
  • Note: Use of this dataset must comply with the CDLA-Permissive license.

2. TAL Table Recognition Competition Dataset

  • Data Introduction: The training set of the TAL table recognition competition dataset contains 16,000 images. The validation set does not come with annotations usable for training.

3. WTW Chinese scene table dataset