Index Catalog // British Library

2020

Dataset

Ground Truth transcriptions for training OCR of historical Bengali printed texts – Recognition of Early Indian Printed Documents competition - updated with improved XML coordinates

This dataset comprises 81 digitised images (TIFF files) drawn from a selection of early printed Bengali books (1713-1914) digitised through the Two Centuries of Indian Print project (https://www.bl.uk/projects/two-centuries-of-indian-print). Also contained are ground truth transcriptions (XML) for each page that can be used for training optical character recognition software on historical...

British Library ; Derrick, Tom

OCR, Indian, and transcription

2019

Conference paper (unpublished)

ICDAR2019 Competition on Recognition of Early Indian Printed Documents – REID2019

This paper presents an objective comparative evaluation of page analysis and recognition methods for historical documents with text mainly in Bengali language and script. It describes the competition rules, dataset, and evaluation methodology. Results are presented for five methods - three submitted, one re-run, and one open source state-of-the-art system....

Clausner, Christian ; Antonacopoulos, Apostolos ; Derrick, Tom ; Pletschacher, Stefan

2019

Conference paper (published)

Cross-disciplinary Collaborations to Enrich Access to Non-Western Language Material in the Cultural Heritage Sector

The British Library is home to millions of items representing every age of written civilisation, including books, manuscripts and newspapers in all written languages. Large digitisation programmes currently underway are opening up access to this rich and unique historical content on an ever increasing scale. However, particularly for historical material...

Derrick, Tom ; McGregor, Nora

HTR, page analysis, layout analysis, recognition, Bangla script, Arabic script, OCR, and datasets

2019

Dataset

Ground Truth transcriptions for training OCR of historical Bengali printed texts - Transkribus

This dataset comprises 74 digitised images (TIFF files) drawn from a selection of early printed Bengali books (1713-1914) digitised through the Two Centuries of Indian Print project (https://www.bl.uk/projects/two-centuries-of-indian-print). Also contained are ground truth transcriptions (XML) for each page that can be used for training optical character recognition software on historical...

British Library ; Derrick, Tom

OCR, transcription, and Indian

2019

Dataset

Ground Truth transcriptions for training OCR of historical Bengali printed texts - Recognition of Early Indian Printed Documents competition

This dataset comprises 81 digitised images (TIFF files) drawn from a selection of early printed Bengali books (1713-1914) digitised through the Two Centuries of Indian Print project (https://www.bl.uk/projects/two-centuries-of-indian-print). Also contained are ground truth transcriptions (XML) for each page that can be used for training optical character recognition software on historical...

British Library ; Derrick, Tom

Indian, transcription, and OCR

2018

Dataset

Digitised Quarterly Lists XML and Metadata

Two Centuries of Indian Print 1867-1947. The files in this dataset are derived from the British Library’s collection of bound volume Quarterly Lists: printed catalogue records of Indian books published quarterly and by province of British India between 1867 and 1947. The dataset comprises text from the collection of digitised...

Derrick, Tom

XML, books, Indian, ALTO, and metadata

2016

Dataset

Digitised Quarterly Lists PDFs and Metadata

The files in this dataset are derived from the British Library’s collection of bound volume Quarterly Lists: printed catalogue records of Indian books published quarterly and by province of British India between 1867 and 1947. The dataset comprises full-text searchable PDFs of 215 volumes as well as the associated metadata...

Derrick, Tom

books, Indian, and metadata

