This paper presents an objective comparative evaluation of page analysis and recognition methods for historical documents with text mainly in the Bengali language and script. It describes the competition rules, dataset, and evaluation methodology. Results are presented for five methods: three submitted, one re-run, and one open-source state-of-the-art system. The focus is on optical character recognition (OCR) performance. Several evaluation metrics were used to gain insight into the algorithms, including new character accuracy metrics that better reflect the difficult conditions presented by the documents. The results indicate that deep learning approaches are promising, but that significant challenges remain for historical material of this nature.