OCR text derived from digitised books published 1820 - 1829 in ALTO XML - British Library Research Repository
Shared Research Repository
Dataset

OCR text derived from digitised books published 1820 - 1829 in ALTO XML

2014

Abstract

This set consists of 2,739 volumes published between 1820 and 1829. The dataset comprises text extracted with Optical Character Recognition (OCR) from a collection of digitised books. The books cover a wide range of subject areas, including philosophy, history, poetry and literature. The text is provided in Analysed Layout and Text Object (ALTO) Extensible Markup Language (XML) format.
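ALTO records OCR output as a layout hierarchy (Page, PrintSpace, TextBlock, TextLine). Each recognised word is a String element that carries its text in a CONTENT attribute. The following is a minimal sketch of recovering plain text from one ALTO file using only the Python standard library. It assumes the files follow that general structure. This page does not state which ALTO schema version or namespace the files use, so the sketch matches tag names without their namespace. The function names `local_name` and `alto_to_text` and the `SAMPLE` fragment are hypothetical illustrations, not part of the dataset.

```python
import io
import xml.etree.ElementTree as ET
from typing import List


def local_name(tag: str) -> str:
    """Drop the '{namespace}' prefix ElementTree puts on tag names.

    ALTO files declare different namespaces depending on schema version,
    so matching on the local name keeps this sketch version-agnostic.
    """
    return tag.rsplit("}", 1)[-1]


def alto_to_text(source) -> str:
    """Return the OCR text of one ALTO file, one output line per TextLine.

    `source` may be a filename or a binary file-like object.
    Hyphenation (HYP) elements are ignored in this sketch.
    """
    root = ET.parse(source).getroot()
    lines: List[str] = []
    for elem in root.iter():
        if local_name(elem.tag) != "TextLine":
            continue
        # Each recognised word is a String element whose text is in CONTENT.
        words = [
            child.get("CONTENT", "")
            for child in elem
            if local_name(child.tag) == "String"
        ]
        lines.append(" ".join(words))
    return "\n".join(lines)


# A tiny hand-written ALTO fragment for illustration (not from the dataset).
SAMPLE = b"""<alto xmlns="http://www.loc.gov/standards/alto/ns-v2#">
  <Layout><Page><PrintSpace><TextBlock>
    <TextLine><String CONTENT="CHAPTER"/><SP/><String CONTENT="I."/></TextLine>
    <TextLine><String CONTENT="It"/><SP/><String CONTENT="was"/></TextLine>
  </TextBlock></PrintSpace></Page></Layout>
</alto>"""

if __name__ == "__main__":
    print(alto_to_text(io.BytesIO(SAMPLE)))
```

Running the script prints the two sample lines, "CHAPTER I." and "It was". The sketch keeps only word text. For real files you may also want the coordinate attributes (HPOS, VPOS, WIDTH, HEIGHT) or word confidence that ALTO String elements can carry.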

Files

File name        Date uploaded    Visibility    File size
1820_1829.zip    18 Dec 2018      Public        10.7 GB
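Because the archive is 10.7 GB, it can help to inspect its contents before extracting anything. The following is a minimal sketch using Python's standard `zipfile` module. It lists the XML members and their uncompressed sizes by reading only the zip's central directory. This page does not document the archive's internal layout. The sketch therefore assumes that ALTO files are stored as members whose names end in ".xml". If the archive instead contains nested archives, this listing would return nothing for them. The function name `iter_xml_members` is hypothetical.

```python
import zipfile
from typing import Iterator, Tuple


def iter_xml_members(archive) -> Iterator[Tuple[str, int]]:
    """Yield (member name, uncompressed size in bytes) for each .xml member.

    `archive` may be a path or a binary file-like object. Only the zip's
    central directory is read, so nothing is extracted to disk.
    """
    with zipfile.ZipFile(archive) as zf:
        for info in zf.infolist():
            # Assumption: ALTO pages are stored as members ending in ".xml".
            if not info.is_dir() and info.filename.lower().endswith(".xml"):
                yield info.filename, info.file_size
```

For example, `sum(size for _, size in iter_xml_members("1820_1829.zip"))` gives the total uncompressed size of the XML content. An individual member can then be streamed with `zipfile.ZipFile.open()` and passed directly to an XML parser without unpacking the whole archive.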