Search Results
-
Dataset
DeezyMatch training set for OCR
Optical character recognition (OCR) is the process of automatically transcribing text from images. The presence of OCR-induced errors in digitised text is a common problem in the digital humanities. OCR errors are usually due to the misrecognition of characters, such as "h" recognised as "b", or "c" recognised as "o"...
-
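To make the character-misrecognition idea concrete, here is a minimal illustrative sketch that generates OCR-style noisy variants of a word. It assumes a hypothetical confusion table containing only the two substitutions named above ("h" to "b", "c" to "o"). It is not DeezyMatch's method or the procedure used to build this training set.

```python
# Hypothetical confusion table: only the two substitutions named in the description.
OCR_CONFUSIONS = {"h": "b", "c": "o"}


def ocr_variants(word: str) -> list[str]:
    """Return each variant of `word` with exactly one character misrecognised."""
    variants = []
    for i, ch in enumerate(word):
        if ch in OCR_CONFUSIONS:
            # Swap a single character, leaving the rest of the word untouched.
            variants.append(word[:i] + OCR_CONFUSIONS[ch] + word[i + 1:])
    return variants


print(ocr_variants("church"))
```

Pairing each clean word with its noisy variants gives the kind of positive string pairs a fuzzy-matching model can learn from. A realistic confusion table would be much larger and derived from real OCR output.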
Dataset
Digitised Books. c. 1510 - c. 1900. JSONL (OCR derived text + metadata)
The dataset comprises metadata and OCR-generated text from 49,455 digitised books published between c. 1510 and c. 1900. The books cover a wide range of subject areas, including philosophy, history, poetry and literature. The dataset is in JSON Lines (JSONL) text format.
British Library Labs; British Library
OCR and monographs
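JSON Lines stores one JSON object per line, so a file in this format can be streamed record by record instead of being loaded whole. The sketch below uses only the Python standard library. The field names in the sample (`id`, `text`) are hypothetical placeholders, not the dataset's actual schema.

```python
import io
import json
from typing import Iterator, TextIO


def iter_jsonl(stream: TextIO) -> Iterator[dict]:
    """Yield one parsed JSON object per non-blank line of a JSONL stream."""
    for line_no, line in enumerate(stream, start=1):
        line = line.strip()
        if not line:
            continue  # tolerate blank lines, e.g. a trailing newline
        try:
            yield json.loads(line)
        except json.JSONDecodeError as err:
            # Report which line failed so a corrupt record is easy to locate.
            raise ValueError(f"line {line_no}: {err}") from err


# Hypothetical two-record sample; field names are placeholders.
sample = io.StringIO('{"id": 1, "text": "first page"}\n\n{"id": 2, "text": "second page"}\n')
records = list(iter_jsonl(sample))
print(len(records), records[1]["text"])
```

For a real file, pass an opened handle instead, for example `with open(path, encoding="utf-8") as f: for record in iter_jsonl(f): ...`, which keeps memory use flat regardless of file size.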
-
Dataset
Ground Truth transcriptions for training OCR of historical Bengali printed texts – Recognition of Early Indian Printed Documents competition - updated with improved XML coordinates
This dataset comprises 81 digitised images (TIFF files) drawn from a selection of early printed Bengali books (1713-1914) digitised through the Two Centuries of Indian Print project (https://www.bl.uk/projects/two-centuries-of-indian-print). Also contained are ground truth transcriptions (XML) for each page that can be used for training optical character recognition software on historical...
British Library; Derrick, Tom
OCR, Indian, and transcription
-
Dataset
Ground Truth transcriptions for training OCR of historical Bengali printed texts - Transkribus
This dataset comprises 74 digitised images (TIFF files) drawn from a selection of early printed Bengali books (1713-1914) digitised through the Two Centuries of Indian Print project (https://www.bl.uk/projects/two-centuries-of-indian-print). Also contained are ground truth transcriptions (XML) for each page that can be used for training optical character recognition software on historical...
British Library; Derrick, Tom
OCR, transcription, and Indian
-
Dataset
Ground Truth transcriptions for training OCR of historical Bengali printed texts - Recognition of Early Indian Printed Documents competition
This dataset comprises 81 digitised images (TIFF files) drawn from a selection of early printed Bengali books (1713-1914) digitised through the Two Centuries of Indian Print project (https://www.bl.uk/projects/two-centuries-of-indian-print). Also contained are ground truth transcriptions (XML) for each page that can be used for training optical character recognition software on historical...
British Library; Derrick, Tom
Indian, transcription, and OCR
-
Dataset
Volumes of performances connecting Sir Henry Irving. 1879 - 1905.
Sir Henry Irving's American and Provincial Tours, 1883 - 1905; miscellaneous performances, including some given by Royal Command, 1883 - 1903; Lyceum Theatre, 1879 - 1902; and Drury Lane Theatre, 1903 and 1905. The collection was formed by Bram Stoker.
British Library
-
Dataset
Volumes of Lysons Collectanea (Trades), comprising advertisements, cuttings, and illustrations relating to trades, professions, medical cures. 1660-1825.
The dataset comprises the OCR text derived from four digitised volumes of a collection of advertisements, cuttings and illustrations relating to trades, professions and medical cures from 1660 to 1825.
British Library
text, newspapers, OCR, trades, and adverts
-
Dataset
Volumes of Lysons Collectanea (Amusements), comprising broadsides, cuttings, advertisements on amusements 1660-1840
The dataset comprises nine digitised volumes of a collection of broadsides, cuttings and advertisements relating to public exhibitions and places of amusement from 1660 to 1840 (with OCR-derived text). Part of the Lysons Collectanea collection.
British Library
amusements, text, newspapers, broadsides, OCR, and adverts
-
Dataset
Volumes of portraits and biographies of officers in the South African wars collected by John Malcolm Bulloch. 1900 - 1902.
The dataset comprises six digitised volumes (in PDF) of a collection of portraits and biographical details of some officers distinguished in the South African War (1900 - 1902) (with OCR-derived text). The collection was formed by John Malcolm Bulloch.
British Library
South Africa, text, portraits, war, army, OCR, biographies, and biography