Search results
-
Dataset
Digitised Books. c. 1510 - c. 1900. JSONL (OCR derived text + metadata)
The dataset comprises metadata and OCR-generated text from 49,455 digitised books published between c. 1510 and c. 1900. The books cover a wide range of subject areas, including philosophy, history, poetry and literature. The dataset is in JSON Lines (JSONL) text format.
British Library Labs; British Library
OCR and monographs
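JSON Lines stores one JSON object per line, so a dataset in this format can be read record by record without loading the whole file into memory. The following is a minimal sketch of that idea, not the publisher's own tooling: the function name `iter_jsonl_records` and the key `placeholder_key` are hypothetical, and the actual field names in the dataset are not given in this listing.

```python
import io
import json
from typing import Any, Dict, Iterable, Iterator


def iter_jsonl_records(lines: Iterable[str]) -> Iterator[Dict[str, Any]]:
    """Yield one parsed JSON object per non-empty line of JSONL text."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between records
        yield json.loads(line)


# Made-up two-record sample; "placeholder_key" is an assumption for
# illustration only and is NOT the dataset's real schema.
sample = io.StringIO(
    '{"placeholder_key": "first"}\n'
    "\n"
    '{"placeholder_key": "second"}\n'
)
records = list(iter_jsonl_records(sample))
print(len(records))
```

For a file on disk, the same function can be given an open file handle, for example `with open(path, encoding="utf-8") as fh: ...`, where `path` is wherever the downloaded file was saved.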
-
Dataset
Theatrical playbills from Britain and Ireland (OCR text only)
The dataset comprises 264 volumes of digitised theatrical playbills published between 1660 and 1902 (mostly 19th century) in England, Scotland, Wales and Ireland. They were digitised from the British Library's physical collection of over 500 volumes of playbills. The dataset contains plain-text files (.TXT) produced by Optical Character Recognition (OCR). The playbills...
British Library Labs
singlesheet, text, playbill, OCR, and playbills