Two Centuries of Indian Print 1867-1947
The files in this dataset are derived from the British Library's collection of bound-volume Quarterly Lists: printed catalogue records of Indian books, published quarterly and by province of British India between 1867 and 1947. The dataset comprises text created from the digitised catalogues using Optical Character Recognition (OCR) technology. It is a rich source for researchers interested in the publishing industry and book history in India. The dataset is in Analysed Layout and Text Object (ALTO) Extensible Markup Language (XML) format. The catalogues are predominantly in English, with some Bengali, and are mostly arranged in table format. They capture descriptive metadata about the books, including the names and addresses of printers and publishers, the number of copies printed and often the price, among other details. The catalogues have been made available through the British Library's Two Centuries of Indian Print project, which is also digitising rare Bengali books dating from 1713-1914; datasets from that work will also be made available through this website.
There is 1 file associated with this work, which is available for download.
- Alternate identifier: DAR00627
- type: Digital Asset Register ID
- Additional information
The 145 MB .ZIP file contains ALTO XML for 38 single-volume Quarterly Lists. Also included are two Microsoft Excel workbook files, in .XLSX and .XLS (2010 version), listing the 38 volumes of Quarterly Lists. The XML files can be used for text analysis of book publishing metadata. Because some of the zip archives are extremely large (often well in excess of 4 GB), we recommend using a dedicated zip archive application, such as 7-zip, to open and extract these datasets. The built-in zip archive handling in Microsoft Windows (i.e. right-click, then 'Extract Here'), for example, is not designed to handle these sizes and will throw errors, even suggesting (falsely) that the archive is corrupt.
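As a starting point for text analysis, the following is a minimal Python sketch of how OCR text might be read out of ALTO XML without unpacking the archive first. It assumes the usual ALTO layout, in which each TextLine element holds String elements whose CONTENT attribute carries one recognised word. The function names and the archive file name are hypothetical, and the internal folder layout of the .ZIP file has not been checked here. Element names are matched without their namespace, because the ALTO namespace differs between schema versions.

```python
import xml.etree.ElementTree as ET
import zipfile


def local_name(tag):
    """Strip any '{namespace}' prefix from an element tag."""
    return tag.rsplit("}", 1)[-1]


def alto_text_lines(xml_bytes):
    """Return the OCR text of one ALTO document as a list of lines.

    Assumes each TextLine contains String children with a CONTENT attribute.
    """
    root = ET.fromstring(xml_bytes)
    lines = []
    for element in root.iter():
        if local_name(element.tag) != "TextLine":
            continue
        words = [
            child.attrib.get("CONTENT", "")
            for child in element
            if local_name(child.tag) == "String"
        ]
        lines.append(" ".join(words))
    return lines


def iter_zip_alto(zip_path):
    """Yield (member name, text lines) for every .xml member of a zip archive."""
    with zipfile.ZipFile(zip_path) as archive:
        for name in archive.namelist():
            if name.lower().endswith(".xml"):
                yield name, alto_text_lines(archive.read(name))


if __name__ == "__main__":
    # "quarterly_lists.zip" is a placeholder; substitute the downloaded file.
    for member, text_lines in iter_zip_alto("quarterly_lists.zip"):
        print(member, len(text_lines))
```

This sketch returns plain lines only. Because the catalogues are mostly laid out as tables, reconstructing columns would additionally need the position attributes (such as HPOS and VPOS) that ALTO records on each element, which is not attempted here.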