In partnership with Microsoft, the British Library has digitised, and made freely available under the Public Domain Mark, over 60,000 volumes (around 25 million pages) of out-of-copyright 18th- and 19th-century texts. Items within this collection cover a wide range of subject areas, including geography, philosophy, history, poetry and literature, and are published in a variety of languages.
This collection, sometimes referred to as the Microsoft Books/BL 19th Century collection, and its derived datasets have been made available on various platforms under an open licence.
All volumes are available for view, download and full-text search from the British Library Catalogue - https://explore.bl.uk - via the Library’s IIIF standard enabled Universal Viewer. Use the search term “blmsd” in Explore to limit results to this specific collection.
Over 1 million images extracted programmatically from the book pages can be found on the British Library’s Flickr: https://www.flickr.com/photos/britishlibrary/
The Flickr API can be used to directly download large sets of these images, and other metadata such as user-generated tag information: https://www.flickr.com/services/developer/
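As a rough illustration of the kind of request involved, the sketch below builds a URL for the Flickr REST method flickr.people.getPublicPhotos, asking for tags and original-size image URLs alongside each photo. It only constructs the URL and does not send it. The API key and account identifier (NSID) are placeholders that you would need to supply yourself, and the function name is made up for this example.

```python
from urllib.parse import urlencode

# Flickr's REST endpoint; the method name is passed as a query parameter.
FLICKR_REST_ENDPOINT = "https://api.flickr.com/services/rest/"


def build_public_photos_url(api_key, user_id, page=1, per_page=100):
    """Return a URL requesting one page of an account's public photos.

    api_key and user_id are supplied by the caller; the values used
    below are placeholders, not real credentials or identifiers.
    """
    params = {
        "method": "flickr.people.getPublicPhotos",
        "api_key": api_key,
        "user_id": user_id,
        "extras": "tags,url_o",  # ask for tags and original image URLs
        "per_page": per_page,
        "page": page,
        "format": "json",
        "nojsoncallback": 1,     # plain JSON rather than a JSONP wrapper
    }
    return FLICKR_REST_ENDPOINT + "?" + urlencode(params)


# Placeholder values for illustration only.
url = build_public_photos_url("YOUR_API_KEY", "ACCOUNT_NSID", page=2)
print(url)
```

Incrementing page until the response reports no further pages is one way to walk through a large set; check the Flickr developer documentation for current rate limits and terms before downloading in bulk.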
Jisc Historical Texts holds a full copy of this collection: https://www.jisc.ac.uk/historical-texts
Wikimedia Commons offers a useful introduction to the collection, including a Synoptic Index, as well as projects to georeference maps found in the texts: https://commons.wikimedia.org/wiki/Commons:British_Library/Mechanical_Curator_collection
Title-level listings of news collections held by the British Library, comprising data extracted from the British Library catalogue, with some data cleaning and enhancements.
Full-text records of newspaper titles digitised from the British Library collection. Each file contains the newspaper’s output for one year, with OCR (Optical Character Recognition) text in XML format.
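The XML schema of these files is not described here, so the following is only a schema-agnostic sketch: it gathers every text node in an XML document into a single string, which can be a quick first look at the OCR text. The element names in the sample are invented for the example and are not taken from the actual files.

```python
import xml.etree.ElementTree as ET


def extract_text(xml_string):
    """Join all non-empty text nodes of an XML document with spaces."""
    root = ET.fromstring(xml_string)
    # itertext() walks the tree in document order, yielding text and tails.
    pieces = (chunk.strip() for chunk in root.itertext())
    return " ".join(piece for piece in pieces if piece)


# Hypothetical sample; real files will use their own element names.
sample = "<issue><page><line>Hello</line> <line>world</line></page></issue>"
print(extract_text(sample))
```

For files covering a full year of a newspaper, an incremental parser such as ET.iterparse may be a better fit than loading the whole document at once.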
The datasets in this collection comprise snapshots in time of metadata descriptions of hundreds of thousands of PhD theses awarded by UK Higher Education institutions aggregated by the British Library's EThOS service. The data is estimated to cover around 98% of all PhDs ever awarded by UK Higher Education institutions, dating back to 1787.
Previous versions of the datasets are restricted to ensure the most accurate version of metadata is available for download. Please contact openaccess@bl.uk if you require access to an older version.
Using Convolutional Neural Networks to explore and label 400 Years of Book Illustrations by the SherlockNet team, BL Labs competition winners for 2016.
The datasets in this collection contain a number of thematic digitised collections of single-sheet items or ephemera dating from 1628 to 1902. They include information about portraits of actors, views of theatres, Christmas ballads and broadsides, signs of taverns, newspaper cuttings, performances of Sir Henry Irving, pamphlets about the British Museum, and portraits and biographies of officers in the South African wars.
The files in these datasets are derived from the British Library’s collection of bound volume Quarterly Lists: printed catalogue records of Indian books published quarterly and by province of British India between 1867 and 1947. The catalogues are predominantly in English, with some Indian scripts, and are mostly arranged in table format, capturing descriptive metadata about the books, including the names and addresses of printers and publishers, the number of copies printed and often the price, among other details. The catalogues have been made available through the British Library's Two Centuries of Indian Print project, which is also digitising rare Bengali books dating from 1713-1914, the datasets of which will also be made available through this website.
Images, datasets and catalogue records of maps and topographical views. The collection currently comprises the archive of manuscript maps built up and maintained by the British War Office between ca 1890 and 1940. In a project funded by Indigo Trust, the British Library has catalogued, conserved and digitised a portion of the archive that relates to the former British East Africa (Kenya, Uganda and surrounding areas).
The India Office Medical Archives are drawn from the records of the East India Company and India Office, which contain a wealth of information relating to medicine and health in India, particularly for the period 1780-1920. The scanned images are available as JPEG files, which were converted from TIFF files using IrfanView. Read more here: https://www.bl.uk/collection-guides/india-office-medical-archive-collections
Full-text records of historic press directories, listing newspapers and other journals, digitised from the British Library collection. Each file contains the run of a press directory over selected years, with OCR (Optical Character Recognition) text in XML format.
The datasets in this collection are linked open data subsets of the British National Bibliography (BNB). The BNB records the publishing activity of the United Kingdom and the Republic of Ireland since the 1950s. It includes metadata for published and forthcoming books as well as serials – whether print or electronic. The datasets are also available at http://bnb.data.bl.uk as well as http://bnb.data.bl.uk/sparql
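A SPARQL endpoint is normally queried by sending the query text as a query parameter in an HTTP GET request. The sketch below builds such a URL for the endpoint address given above. It does not send the request, it assumes the endpoint follows the standard SPARQL protocol, and it deliberately uses a generic triple pattern rather than guessing at the vocabulary used in the BNB data. The function name is made up for this example.

```python
from urllib.parse import urlencode

# Endpoint address as listed in the collection description.
BNB_SPARQL_ENDPOINT = "http://bnb.data.bl.uk/sparql"


def build_sparql_url(query, endpoint=BNB_SPARQL_ENDPOINT):
    """Return a GET URL carrying the SPARQL query in the 'query' parameter."""
    return endpoint + "?" + urlencode({"query": query})


# Generic query: fetch any five triples, making no assumption about the schema.
SAMPLE_QUERY = "SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 5"
sample_url = build_sparql_url(SAMPLE_QUERY)
print(sample_url)
```

Inspecting a handful of triples in this way is a common first step for discovering which predicates and classes a dataset actually uses before writing more specific queries.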
The datasets in this collection relate to 587 Italian Academies (1525-1700), 7100 people and 911 works of the late Renaissance and early modern periods from Bologna, Naples, Padua, Siena, Venice, Verona, Mantua, Ferrara, Rome, Sicily and cities of southern Italy. The 'About' pages of the original project website include information on the design of the database and instructions for using it.
The datasets in this collection comprise keylogging data from the author C M Taylor, https://cmtaylorstory.com, captured between 17 October 2014 and 5 March 2018, during the writing of the novel Staying On (London: Duckworth Press, 2018). The data was captured using the keylogging software Spectre Pro (SpectreSoft) installed on a dedicated IBM Thinkpad laptop. The data consists of eight captured sessions, with captured data overlapping across sessions two to five. Each session, with the exception of session one (keystroke logs only), consists of screenshots and keystroke logs. Screenshots are saved individually as both .jpg and .bmp files. .avi files capturing the writing process are complete for all sessions except one and eight (partial). It is possible to recreate the screenshot animation using either the .jpg or .bmp screenshots. Keystrokes are saved as either .rtf or .txt files. There are eight datasets.
There are 27 datasets of digitised AAS card catalogues from microfilm. Some of the cards were transcribed and matched by volunteers as part of the British Library’s LibCrowds platform, in a project called ‘Convert-a-Card’. The platform is no longer operational, but a UK Web Archive snapshot of ‘Convert-a-Card’ is available here:
http://www.libcrowds.com/archived/20221208133408/https://www.libcrowds.com/collection/convertacard.
The Library holds a newspaper collection of over 34,000 titles dating from 1619 to the present day, and growing collections of radio, television and web news. The datasets here relate to these collections, whether in digital form or hard copy, and we hope they will be useful for exploring the collection further or analysing it through its metadata. More datasets will be added over time. Contact us at newspapers@bl.uk or on Twitter @BL_newsroom if you have questions, or to let us know how you used our data.
More information on our news collections can be found here: https://www.bl.uk/subjects/news-media
Digitised newspapers from the British Library are available on https://www.britishnewspaperarchive.co.uk, which is free to use on the British Library premises.
As part of our work to open our data to wider use, we make copies of some datasets available for research and creative purposes. We aim to describe collections in terms of their data format (images, full text, metadata, etc), licences, temporal and geographic scope, originating purpose (e.g. specific digitisation projects or exhibitions) and collection, and related subjects or themes.
We'd love to hear what you've done or made with the data.