Index Catalog // British Library

2017

Dataset

SherlockNet data

Using Convolutional Neural Networks to Explore Over 400 Years of Book Illustrations: Starting from February 2016, as part of the British Library Labs Competition, we embarked on a collaboration with the British Library Labs and the British Museum to tag and caption the entire British Library 1M Collection, a set...

Zhao, Luda ; Do, Brian ; Wang, Karen

tagging, Flickr, images, tags, digitised, Microsoft, sherlocknet, and books

2023

Dataset

UK Doctoral Thesis Metadata from EThOS

The data in this collection comprises the bibliographic metadata for all UK doctoral theses listed in EThOS, the UK's national thesis service. We estimate the data covers around 98% of all PhDs ever awarded by UK Higher Education institutions, dating back to 1787. Thesis metadata from every PhD-awarding university in...

British Library ; Rosie, Heather

higher education, student, UK, dissertations, PhD, theses, doctoral, ethos, thesis, and research

2014

Dataset

Digitised Books - Images of the bound covers of books. c. 1510 - c. 1900. JPG

The dataset comprises c. 61,561 images identified as 'Book Covers' from the British Library's Flickr Commons collections, dating between c. 1510 - c. 1900.

British Library ; British Library Labs

digitised, books, Microsoft, images, and bookcovers

2014

Dataset

Digitised Books - Images identified as Plates. c. 1528 - c. 1900. JPG

The dataset comprises c. 385,237 images identified as 'Plates' from the British Library's Flickr Commons collections, dating between c. 1528 – c. 1900. The images were algorithmically gathered from 49,455 digitised books, equating to 65,227 volumes (25+ million pages), published between c. 1510 - c. 1900; Plates have currently been...

British Library ; British Library Labs

digitised, books, Microsoft, images, and plates

2014

Dataset

Digitised Books - Images identified as Embellishments. c. 1510 - c. 1900. JPG

The images were algorithmically gathered from 49,455 digitised books, equating to 65,227 volumes (25+ million pages), published between c. 1510 - c. 1900. The books cover a wide range of subject areas including philosophy, history, poetry and literature. The images are in .JPEG format.

British Library ; British Library Labs

embellishments, digitised, books, Microsoft, and images

2014

Dataset

Digitised Books - Images identified as Medium Sized Images. c. 1567 - c. 1900. JPG

The dataset comprises c. 217,101 images identified as 'Medium Sized Images' from the British Library's Flickr Commons collections, dating between c. 1567 - c. 1900. The images were algorithmically gathered from 49,455 digitised books, equating to 65,227 volumes (25+ million pages), published between c. 1510 - c. 1900; Medium Sized...

British Library ; British Library Labs

digitised, books, Microsoft, images, and medium sized images

2014

Dataset

Digitised Books. c. 1510 - c. 1900. JSON (OCR derived text)

The dataset comprises text created by OCR from the 49,455 digitised books, equating to 65,227 volumes (25+ million pages), published between c. 1510 - c. 1900. The books cover a wide range of subject areas including philosophy, history, poetry and literature. The dataset is in JavaScript Object Notation (JSON) text...

British Library ; British Library Labs

digitised, books, Microsoft, and JSON

2014

Dataset

Digitised Books - Images identified as Medium Sized Images. c. 1567 - c. 1900. JPG

The dataset comprises c. 217,101 images identified as 'Medium Sized Images' from the British Library's Flickr Commons collections, dating between c. 1567 - c. 1900. The images were algorithmically gathered from 49,455 digitised books, equating to 65,227 volumes (25+ million pages), published between c. 1510 - c. 1900; Medium Sized...

British Library ; British Library Labs

medium sized images, books, images, digitised, and Microsoft

2016

Dataset

OCR text derived from digitised books published 1900 - c. 1946. ALTO XML.

Unfortunately we are unable to make this dataset available due to copyright reasons. This set consists of 1251 volumes, published between 1900-1946. The dataset comprises text from the collection of digitised books created using Optical Character Recognition (OCR) technology. The books cover a wide range of subject areas including philosophy, history,...

British Library ; British Library Labs

2014

Dataset

OCR text derived from digitised books published 1880 - 1889 in ALTO XML

This set consists of 10856 volumes, published between 1880-1889. The dataset comprises text from the collection of digitised books created using Optical Character Recognition (OCR) technology. The books cover a wide range of subject areas including philosophy, history, poetry and literature. The dataset is in Analysed Layout and Text Object (ALTO)...

British Library ; British Library Labs

XML, books, digitised, metadata, Microsoft, ALTO, and nineteenth century

2014

Dataset

OCR text derived from digitised books published c. 1510 - 1699 in ALTO XML

This set consists of 693 volumes, published between c. 1510 - 1699. The dataset comprises text from the collection of digitised books created using Optical Character Recognition (OCR) technology. The books cover a wide range of subject areas including philosophy, history, poetry and literature. The dataset is in Analysed Layout and...

British Library ; British Library Labs

XML, sixteenth century, books, digitised, seventeenth century, Microsoft, ALTO, and metadata

2014

Dataset

OCR text derived from digitised books published 1840 - 1849 in ALTO XML

This set consists of 4070 volumes, published between 1840-1849. The dataset comprises text from the collection of digitised books created using Optical Character Recognition (OCR) technology. The books cover a wide range of subject areas including philosophy, history, poetry and literature. The dataset is in Analysed Layout and Text Object (ALTO)...

British Library ; British Library Labs

XML, books, digitised, metadata, Microsoft, nineteenth century, and ALTO

2014

Dataset

OCR text derived from digitised books published 1700 - 1799 in ALTO XML.

The dataset comprises text from the collection of digitised books created using Optical Character Recognition (OCR) technology. The books cover a wide range of subject areas including philosophy, history, poetry and literature. The dataset is in Analysed Layout and Text Object (ALTO) Extensible Markup Language (XML) format.

British Library ; British Library Labs

XML, books, digitised, metadata, Microsoft, eighteenth century, and ALTO

2014

Dataset

OCR text derived from digitised books published 1800 - 1809 in ALTO XML

This set consists of 1502 volumes, published between 1800-1809. The dataset comprises text from the collection of digitised books created using Optical Character Recognition (OCR) technology. The books cover a wide range of subject areas including philosophy, history, poetry and literature. The dataset is in Analysed Layout and Text Object (ALTO)...

British Library ; British Library Labs

XML, books, digitised, metadata, Microsoft, nineteenth century, and ALTO

2014

Dataset

OCR text derived from digitised books published 1810 - 1819 in ALTO XML

This set consists of 2338 volumes, published between 1810-1819. The dataset comprises text from the collection of digitised books created using Optical Character Recognition (OCR) technology. The books cover a wide range of subject areas including philosophy, history, poetry and literature. The dataset is in Analysed Layout and Text Object (ALTO)...

British Library ; British Library Labs

XML, books, digitised, metadata, Microsoft, nineteenth century, and ALTO

2014

Dataset

OCR text derived from digitised books published 1820 - 1829 in ALTO XML

This set consists of 2739 volumes, published between 1820-1829. The dataset comprises text from the collection of digitised books created using Optical Character Recognition (OCR) technology. The books cover a wide range of subject areas including philosophy, history, poetry and literature. The dataset is in Analysed Layout and Text Object (ALTO)...

British Library ; British Library Labs

XML, books, digitised, metadata, Microsoft, nineteenth century, and ALTO

2014

Dataset

OCR text derived from digitised books published 1830 - 1839 in ALTO XML

This set consists of 2639 volumes, published between 1830-1839. The dataset comprises text from the collection of digitised books created using Optical Character Recognition (OCR) technology. The books cover a wide range of subject areas including philosophy, history, poetry and literature. The dataset is in Analysed Layout and Text Object (ALTO)...

British Library ; British Library Labs

XML, books, digitised, metadata, Microsoft, nineteenth century, and ALTO

2014

Dataset

OCR text derived from digitised books published 1870 - 1879 in ALTO XML

This set consists of 8630 volumes, published between 1870-1879. The dataset comprises text from the collection of digitised books created using Optical Character Recognition (OCR) technology. The books cover a wide range of subject areas including philosophy, history, poetry and literature. The dataset is in Analysed Layout and Text Object (ALTO)...

British Library ; British Library Labs

XML, books, digitised, metadata, Microsoft, nineteenth century, and ALTO

2014

Dataset

OCR text derived from digitised books published 1850 - 1859 in ALTO XML

This set consists of 5818 volumes, published between 1850-1859. The dataset comprises text from the collection of digitised books created using Optical Character Recognition (OCR) technology. The books cover a wide range of subject areas including philosophy, history, poetry and literature. The dataset is in Analysed Layout and Text Object (ALTO)...

British Library ; British Library Labs

XML, books, digitised, metadata, Microsoft, nineteenth century, and ALTO

2014

Dataset

OCR text derived from digitised books published 1860 - 1869 in ALTO XML

This set consists of 7498 volumes, published between 1860-1869. The dataset comprises text from the collection of digitised books created using Optical Character Recognition (OCR) technology. The books cover a wide range of subject areas including philosophy, history, poetry and literature. The dataset is in Analysed Layout and Text Object (ALTO)...

British Library ; British Library Labs

XML, books, digitised, metadata, Microsoft, nineteenth century, and ALTO

2014

Dataset

OCR text derived from digitised books published 1890 - 1899 in ALTO XML

This set consists of 14847 volumes, published between 1890-1899. The dataset comprises text from the collection of digitised books created using Optical Character Recognition (OCR) technology. The books cover a wide range of subject areas including philosophy, history, poetry and literature. The dataset is in Analysed Layout and Text Object (ALTO)...

British Library ; British Library Labs

XML, books, digitised, metadata, Microsoft, nineteenth century, and ALTO

2018

3D image

Jane Austen's Desk (closed view), Add 86841

A wooden writing desk used by Jane Austen which was given to her by her father in 1794. This portable ‘writing-box’ opens to provide a slope on which to write. It has various compartments, including a space for an ink pot and a lockable drawer for paper and valuables. When...

British Library

wooden, desk, Jane Austen, 3D model, and writing

2018

3D image

Pen Box, Foster 926

Pen box - Foster 926 Persian pen box, hand-painted lacquered, 1800-1900. http://searcharchives.bl.uk/primo_library/libweb/action/search.do?dscnt=0&frbg=&scp.scps=scope%3A%28BL%29&tab=local&dstmp=1381498036968&srt=rank&ct=search&mode=Basic&dum=true&indx=1&vl(freeText0)=032-003264750&fn=search&vid=IAMS_VU2

British Library

Persian, box, pen, painted, 3D model, and lacquer

2017

Dataset

Digitised maps of the former British East Africa

This dataset comprises 581 images of maps of the former British East Africa created between 1890 and 1940 and a spreadsheet of related catalogue records. All Open Government Licence v1.0 (OGL). A user-friendly geographical search index of the maps is available on Google Maps. These JPEG files were converted from...

Dykes, Nick

War Office Archive, documents, Intelligence, Uganda, East Africa, British East Africa, Maps, Military maps, and Kenya

2018

Dataset

UK Doctoral Thesis Metadata from EThOS

This dataset has been superseded by a more recent version: https://doi.org/10.23636/1137 If you require access to an earlier version, please email openaccess@bl.uk, including the dataset title, date, and DOI in your request. The data in this collection comprises the bibliographic metadata for all UK doctoral theses listed in EThOS, the UK's national thesis service. We...

British Library ; Rosie, Heather

Higher education, dissertations, HE, research, doctoral, student, UK, theses, ethos, and thesis

2021

Dataset

UK Doctoral Thesis Metadata from EThOS

This dataset has been superseded by a more recent version: https://doi.org/10.23636/ybpt-nh33 If you require access to an earlier version, please email openaccess@bl.uk, including the dataset title, date, and DOI in your request. The data in this collection comprises the bibliographic metadata for all UK doctoral theses listed in EThOS, the...

British Library ; Rosie, Heather

higher education, ethos, dissertations, thesis, research, PhD, doctoral, student, UK, and theses

2018

3D image

Qur'an Case, Or 13706 B

This West African carrying case is probably from Nigeria, and was made for a richly illuminated copy of the Qur’an (OR 13706A). It is made of leather, fabric and pulp board, and probably dates from the nineteenth century. In West Africa, manuscripts were not usually bound, and cases such as...

British Library

Nigeria, Quran, case, West Africa, and 3D model

2018

3D image

The Constance Graduale, IB.15154

Printed in Southern Germany c. 1473, the ‘Constance’ Graduale (IB. 15154) is the earliest extant book of printed music using moveable type. The copy in the British Library’s music collection is the only known surviving copy that is complete.

British Library

Graduale, Constance Graduale, music, printed, Medieval, and 3D model

2018

3D image

Jane Austen's Desk (open view 2), Add 86841

A wooden writing desk used by Jane Austen which was given to her by her father in 1794. This portable ‘writing-box’ opens to provide a slope on which to write. It has various compartments, including a space for an ink pot and a lockable drawer for paper and valuables. When...

British Library

wooden, desk, Jane Austen, 3D model, and writing

2018

3D image

Jane Austen's Desk (open view 1), Add 86841

A wooden writing desk used by Jane Austen which was given to her by her father in 1794. This portable ‘writing-box’ opens to provide a slope on which to write. It has various compartments, including a space for an ink pot and a lockable drawer for paper and valuables. When...

British Library

wooden, desk, Jane Austen, 3D model, and writing

2016

3D image

Oracle Bone, Or 7694/1655+1672

An inscribed oracle bone (jia gu 甲骨) from the Couling-Chalfant collection at the British Library. Oracle bones were animal bones, usually ox shoulder bones or the underside of turtle shells, used for divination rituals in ancient China. Dating to the Shang dynasty (c. 1600 – 1050 BC), they bear the...

British Library

oracle bone, Chinese, 3D model, and Shang Dynasty

2016

3D image

Silk Mantle, Or 13027

Silk mantle (textile cover) for a Torah scroll. Modelled for the British Library’s Hebrew Manuscript Digitisation Project, funded by The Polonsky Foundation. Modelling: Adi Keinan-Schoonbaert. Special thanks to Tony Grant (photography) and Liz Rose (conservation). To view the mantle and its scroll, visit: http://www.bl.uk/manuscripts/FullDisplay.aspx?ref=Or_13027

British Library

scroll, manuscript, silk, mantle, 3D model, and Hebrew

2018

3D image

Brass Block

A ‘block’ is a cast in brass. One of a multitude of blocks used to impress gold decoration in patterns and designs on book covers. Some block designs signify collections (such as the Cottons or Harleys) much like individual coats of arms relate to hereditary lines. Other blocks are purely...

British Library

design, block, gold, 3D model, conservation, and brass

2018

3D image

Lacquer Box, Or 6682

Lacquer box containing a manuscript with calligraphy by the Qianlong Emperor (Qing Dynasty). China. (1711-99)

British Library

Qing Dynasty, lacquer, box, 3D model, and China

2015

3D image

Book of Esther, Or 1087

The Book of Esther. Unknown origin, 15th century. Modelled for the British Library’s Hebrew Manuscripts Digitisation Project, funded by The Polonsky Foundation. Modelling: Adi Keinan-Schoonbaert. Special thanks to Kristin Phelps. To view the entire manuscript, visit: http://www.bl.uk/manuscripts/FullDisplay.aspx?ref=Or_1087

British Library

Megila, manuscript, Esther, Scroll, 3D model, and Hebrew

2016

3D image

Oracle Bone, Or 7694/1988 (two parts)

An inscribed oracle bone (jia gu 甲骨) in two parts from the Couling-Chalfant collection at the British Library. Oracle bones were animal bones, usually ox shoulder bones or the underside of turtle shells, used for divination rituals in ancient China. Dating to the Shang dynasty (c. 1600 – 1050 BC),...

British Library

oracle bone, Chinese, 3D model, and Shang Dynasty

2016

3D image

Oracle Bone, Or 7694/1988 Part 2

An inscribed oracle bone (jia gu 甲骨) from the Couling-Chalfant collection at the British Library. Oracle bones were animal bones, usually ox shoulder bones or the underside of turtle shells, used for divination rituals in ancient China. Dating to the Shang dynasty (c. 1600 – 1050 BC), they bear the...

British Library

oracle bone, Chinese, 3D model, and Shang Dynasty

2018

3D image

Soldier Model, Foster 979

Ami Chand (‘Ummeechund’), a trooper in Skinner’s Horse who saved the life of William Fraser (1784-1835) in 1819. Terracotta model painted in polychrome, with some evidence in wires and an armature; 28.5 cm high. The model is based on a portrait originally featured in the Fraser Albums of Company drawings...

British Library

soldier, army, terracotta, 3D model, trooper, and India

2016

3D image

Oracle Bone, Or 7694/1988 Part 1

An inscribed oracle bone (jia gu 甲骨) from the Couling-Chalfant collection at the British Library. Oracle bones were animal bones, usually ox shoulder bones or the underside of turtle shells, used for divination rituals in ancient China. Dating to the Shang dynasty (c. 1600 – 1050 BC), they bear the...

British Library

oracle bone, Chinese, 3D model, and Shang Dynasty

2018

3D image

Menak, Add MS 12309

Menak, Javanese manuscript containing stories of Amir Hamza, uncle of the Prophet Muhammad, written in Javanese in Arabic script, written between 1792 and 1812. 1,450 folios of Javanese paper. http://searcharchives.bl.uk/primo_library/libweb/action/display.do?tabs=detailsTab&ct=display&fn=search&doc=IAMS040-002042067&indx=1&recIds=IAMS040-002042067&recIdxs=0&elementId=0&renderMode=poppedOut&displayMode=full&frbrVersion=&dscnt=0&frbg=&scp.scps=scope%3A%28BL%29&tab=local&dstmp=1526286592148&srt=rank&mode=Basic&&dum=true&vl(freeText0)=menak&vid=IAMS_VU2

British Library

Javanese, manuscript, Arabic, 3D model, Muhammad, and Menak

2016

3D image

Oracle Bone, Or 7694/1580

An inscribed oracle bone (jia gu 甲骨) from the Couling-Chalfant collection at the British Library. Oracle bones were animal bones, usually ox shoulder bones or the underside of turtle shells, used for divination rituals in ancient China. Dating to the Shang dynasty (c. 1600 – 1050 BC), they bear the...

British Library

oracle bone, Chinese, 3D model, and Shang Dynasty

2015

3D image

Pentateuch, Add MS 4709

Pentateuch with the Five Scrolls, Psalms, Job and the Haftarot. Italy, 1486. Modelled for the British Library’s Hebrew Manuscript Digitisation Project, funded by The Polonsky Foundation. Modelling: Adi Keinan-Schoonbaert. Special thanks to Kristin Phelps. For the entire manuscript, visit: http://www.bl.uk/manuscripts/FullDisplay.aspx?index=0&ref=Add_MS_4709

British Library

manuscript, Bible, 3D model, Hebrew, Pentateuch, and Torah

2016

3D image

Oracle Bone, Or 7694/1595

An inscribed oracle bone (jia gu 甲骨) from the Couling-Chalfant collection at the British Library. Oracle bones were animal bones, usually ox shoulder bones or the underside of turtle shells, used for divination rituals in ancient China. Dating to the Shang dynasty (c. 1600 – 1050 BC), they bear the...

British Library

oracle bone, Chinese, 3D model, and Shang Dynasty

2018

3D image

Lion Model, Foster 872

Lion - Foster 872 A model of a lion. By Gangaram, 1790. Wax, possibly dhuna, the aromatic gum of the shal tree (Shorea robusta), painted; size of wooden base: 20.5 x 9.75 x 2cm; animal 12.5cm at highest point of mane. F872 Our lion also featured on a blog: http://britishlibrary.typepad.co.uk/asian-and-african/2014/08/the-maratha-artist-gangaram-cintaman-tambat.html

British Library

India, animal, 3D model, wax, model, and lion

2016

3D image

Esther Scroll, Add MS 11831

Esther scroll (Megilat Ester) in an ivory case. Unknown origin, 17th century. Modelled for the British Library’s Hebrew Manuscript Digitisation Project, funded by The Polonsky Foundation. Modelling: Adi Keinan-Schoonbaert. Special thanks to Kristin Phelps, Thomas Flynn and Ilana Tahan. Model is an edited approximation of the actual object. For the...

British Library

Megila, manuscript, Esther, Scroll, 3D model, and Hebrew

2015

Dataset

UK Doctoral Theses (EThOS) Abstracts and Metadata - 01/03/2015. XLS.

This dataset has been superseded by a more recent version: https://doi.org/10.22021/ETHOSCSV201810 If you require access to an earlier version, please email openaccess@bl.uk, including the dataset title, date, and DOI in your request. The data in this collection comprises the bibliographic metadata for all UK doctoral theses listed in EThOS, the UK's national thesis service. We...

British Library ; Rosie, Heather

dissertations, Higher Education, doctoral, and student

2020

Dataset

al-Durr al-naqī fī fann al-mūsīqī (Add MS 23494)

This dataset is a PDF file containing the images and transcription of the manuscript titled al-Durr al-naqī fī fann al-mūsīqī الدرّ النقيّ في فنّ الموسيقي by Aḥmad ibn 'Abd al-Raḥmān al-Mawṣilī أحمد بن عبد الرحمن الموصلي. The manuscript was digitised through the British Library Qatar Foundation Partnership, and made available through...

British Library ; Keinan-Schoonbaert, Adi

transcription, Arabic, and OCR

2020

Dataset

UK Doctoral Thesis Metadata from EThOS

This dataset has been superseded by a more recent version (5): https://doi.org/10.23636/1344 If you require access to an earlier version, please email openaccess@bl.uk, including the dataset title, date, and DOI in your request. The data in this collection comprises the bibliographic metadata for all UK doctoral theses listed in EThOS,...

British Library ; Rosie, Heather

higher education, ethos, dissertations, HE, research, PhD, doctoral, student, UK, theses, and thesis

2020

Dataset

Books related to theatre derived from the Digitised 19th Century Books dataset

A dataset derived from the Digitised 19th Century Books dataset which contains books pertaining to theatre written in English. The dataset of 841 items was created by filtering by keywords which are related to different genres of play including Drama, Act, Scene, Play, Comedy, Farce, Pantomime, Tragedy and Shakespeare and...

British Library ; British Library Labs

act, genre, books, metadata, bibliographic, theatre, and play

2020

Dataset

Books related to India from the Digitised 19th Century Books dataset

A dataset which is derived from the Digitised 19th Century books dataset focusing on books related to India. The dataset was created by refining the book title field using keywords related to names used for India during the period, places within India, cultural terms such as 'Hindu' and another term...

British Library ; British Library Labs

books, metadata, bibliographic, and India

2020

Dataset

Books related to 19th Century British Colonies derived from the Digitised 19th Century books dataset

A dataset derived from the Digitised 19th Century Books dataset which contains books related to 19th Century British Colonies. The dataset of 1288 items was created using filtering by keywords of locations and then manually checked for accuracy. The data was augmented with additional columns including 'City', 'Colony Name' and...

British Library ; British Library Labs

Africa, colonialism, Canada, Ceylon, metadata, bibliographic, India, Australia, books, British Colony, and British Colonies

2020

Dataset

Books related to War derived from the Digitised 19th Century Books Dataset

A dataset which is derived from the Digitised 19th Century Books dataset comprising all non-fiction English language books related to armed conflicts. The dataset of 1127 items was developed by refining based on keywords such as 'war', 'battle', 'uprising', 'revolt', 'rebellion', 'invasion' and 'mutiny'. This dataset was curated by students...

British Library ; British Library Labs

non-fiction, War, books, metadata, and bibliographic

2020

Dataset

Books related to the Industrial Revolution derived from the Digitised 19th Century books dataset

A dataset which is a subset of the Digitised 19th Century Books dataset comprising books related to the Industrial Revolution in Britain. The subset of 354 items was refined by using keywords associated with placenames and the topic of industrialism. This dataset was curated by the Aepyi student group at...

British Library ; British Library Labs

books, industrialism, metadata, bibliographic, and Industrial Revolution

2019

Dataset

Latin American books in Digitised 19th century books

A dataset which is derived from the 19th Century Books dataset comprising c.1,100 books which are related to Latin America, written in Spanish, English, German, French, Italian, Swedish and Dutch.

British Library ; British Library Labs

books, Latin America, metadata, and bibliographic

2019

Dataset

Books divided by Genre from the Digitised 19th century books dataset

A dataset derived from the Digitised 19th Century Books dataset which classifies the books by genre (Drama, Poetry, Prose, Music and unidentified). For Drama, Music and Prose several types were identified. For Drama: comedy, play, recitation and tragedy. For Prose: novel, parody, romance, satire, story, history subset of story and...

British Library ; British Library Labs

Music, Genre, Prose, books, Poetry, metadata, bibliographic, and Drama

2019

Dataset

Books containing images about Finland

A dataset derived from the Digitised 19th Century books dataset comprising books with images about Finland, approximately 40 titles. This dataset was compiled by Ruby Dixon, a student at Graveney School who completed work experience at British Library Labs in 2016.

British Library ; British Library Labs

books, Finland, metadata, and bibliographic

2019

Dataset

Russian language books in the Digitised 19th century books dataset

A dataset which is a subset of the Digitised 19th Century books dataset comprising Russian Language books. The spreadsheet contains metadata of 585 books in Russian. This dataset was compiled by Nadya Miryanova, a student at Lady Eleanor Holles who completed work experience at British Library Labs in 2017.

British Library ; British Library Labs

books, metadata, bibliographic, and Russia

2019

Dataset

UK Doctoral Thesis Metadata from EThOS

This dataset has been superseded by a more recent version: https://doi.org/10.23636/1188 If you require access to an earlier version, please email openaccess@bl.uk, including the dataset title, date, and DOI in your request. The data in this collection comprises the bibliographic metadata for all UK doctoral theses listed in EThOS, the...

British Library ; Rosie, Heather

dissertations, Higher Education, doctoral, and student

2019

Dataset

Ground Truth transcriptions for training OCR of historical Arabic handwritten texts

This dataset comprises 120 digitised images (TIFF files) drawn from a selection of historical Arabic scientific manuscripts (10th-19th century) digitised through the British Library Qatar Foundation Partnership. Also contained are ground truth transcriptions (XML) for each page that can be used for training optical character recognition (OCR) or handwritten text...

British Library ; Keinan-Schoonbaert, Adi

Arabic, transcription, and OCR

2018

Dataset

Judicial Committee of the Privy Council: Linked Appeals Data

The dataset in this collection contains Linked Data about appeal cases heard by the Judicial Committee of the Privy Council between 1860 and 1998. The Judicial Committee of the Privy Council (JCPC) is the final court of appeal for British overseas territories and Crown dependencies, as well as ecclesiastical and...

Middle, Sarah

2019

Dataset

Ground Truth transcriptions for training OCR of historical Bengali printed texts - Transkribus

This dataset comprises 74 digitised images (TIFF files) drawn from a selection of early printed Bengali books (1713-1914) digitised through the Two Centuries of Indian Print project (https://www.bl.uk/projects/two-centuries-of-indian-print). Also contained are ground truth transcriptions (XML) for each page that can be used for training optical character recognition software on historical...

British Library ; Derrick, Tom

OCR, transcription, and Indian

2019

Dataset

Ground Truth transcriptions for training OCR of historical Bengali printed texts - Recognition of Early Indian Printed Documents competition

This dataset comprises 81 digitised images (TIFF files) drawn from a selection of early printed Bengali books (1713-1914) digitised through the Two Centuries of Indian Print project (https://www.bl.uk/projects/two-centuries-of-indian-print). Also contained are ground truth transcriptions (XML) for each page that can be used for training optical character recognition software on historical...

British Library ; Derrick, Tom

Indian, transcription, and OCR

2018

Dataset

Digitised Hebrew Manuscripts: Or 74 to Stowe Ch 297

This dataset comprises 32 digitised Hebrew manuscripts (1000 - 1903), with their shelfmarks in alphabetical order (Or 74 to Stowe Ch 297). These manuscripts are out of copyright. These JPEG files were converted from TIFF files using IrfanView, and then further compressed using JPEGMini. They can be used for image...

King, Ellie

manuscripts, Jewish, and Hebrew

2018

Dataset

Digitised Hebrew Manuscripts: Or 1389 to Sloane MS 3173

This dataset comprises 32 digitised Hebrew manuscripts (1200 - 1899), with their shelfmarks in alphabetical order (Or 1389 to Sloane MS 3173). These manuscripts are out of copyright. These JPEG files were converted from TIFF files using IrfanView, and then further compressed using JPEGMini. They can be used for image...

King, Ellie

manuscripts, Jewish, and Hebrew

2018

Dataset

Digitised Hebrew Manuscripts: Or 2518 to Or 5834

This dataset comprises 33 digitised Hebrew manuscripts (900 - 1899), with their shelfmarks in alphabetical order (Or 2518 to Or 5834). These manuscripts are out of copyright. These JPEG files were converted from TIFF files using IrfanView, and then further compressed using JPEGMini. They can be used for image and...

King, Ellie

manuscripts, Jewish, and Hebrew

2018

Dataset

Digitised Hebrew Manuscripts: Add MS 27169 to Or 12983

This dataset comprises 21 digitised Hebrew manuscripts (1100 - 1899), with their shelfmarks in alphabetical order (Add MS 27169 to Or 12983). These manuscripts are out of copyright. These JPEG files were converted from TIFF files using IrfanView, and then further compressed using JPEGMini. They can be used for image...

King, Ellie

manuscripts, Jewish, and Hebrew

2018

Dataset

Italian Academies Project Database Images

Italian Academies 1525-1700 images. This dataset comprises 1084 image files in JPEG format. The images include emblems, portraits and title pages for academies, people and works. The XML metadata associated with these images contains references to their file names: https://doi.org/10.21250/iad1

Gianfrancesco, Lorenza ; Testa, Simone ; Everson, Jane ; Reidy, Dennis ; Sampson, Lisa …

Italian and Academies

2017

Dataset

Italian Academies Project Database XML Records

Italian Academies 1525-1700 metadata. This dataset comprises 8598 XML records representing 587 Academies, 7100 people and 911 works. Some of the XML files in this dataset contain links to images, which are available in the Italian Academies Project Database Images dataset: https://doi.org/10.21250/iad2 XML nodes with the 'ImageId' attribute contain the filename...

Gianfrancesco, Lorenza ; Testa, Simone ; Everson, Jane ; Reidy, Dennis ; Sampson, Lisa …

Italian, XML, and Academies

2018

Dataset

Linked Open British National Bibliography - Forthcoming Books N-Triples and RDF/XML

This dataset includes metadata for forthcoming books to be published or distributed in the UK.

Deliot, Corine

British National Bibliography, forthcoming, linked open data, BNB, N-Triples, RDF/XML, NT, CIP, and metadata

Dataset

UK Selective Web Archive Classification Dataset. 1996 - 2010. TSV.

The dataset comprises a manually curated selective archive produced by UKWA which includes the classification of sites into a two-tiered subject hierarchy. In partnership with the Internet Archive and JISC, UKWA had obtained access to the subset of the Internet Archive’s web collection that relates to the UK. The JISC...

UK Web Archive

archive, web domain dataset, JISC UK, classification dataset, UKWA Open Data, and 1996-2014

2018

Dataset

Linked Open British National Bibliography - Serials. 1950- N-Triples and RDF/XML

This dataset includes metadata for serials published or distributed in the UK since 1950.

Deliot, Corine

British National Bibliography, serials, linked open data, BNB, N-Triples, RDF/XML, NT, and metadata

Dataset

JISC UK Web Domain Dataset Host Link Graph. 1996 - 2010. TSV.

The dataset comprises ~2.5 billion 200 OK responses from the 1996 - 2010 tranche of the JISC UK Web Domain Dataset which have been scanned for hyperlinks. For each link, UKWA extracts the host that the link targets, and uses this to build up a picture of which hosts have...

UKWA Open Data

archive, 1996-2012, web domain dataset, JISC UK, host link graph, and UKWA Open Data

Dataset

JISC UK Web Domain Dataset Format Profile. 1996 - 2010.

The dataset is a format profile, summarising media type (MIME type) data formats contained within all of the HTTP 200 OK responses in the 1996 - 2010 tranche of the JISC UK Web Domain Dataset. In partnership with the Internet Archive and JISC, UKWA had obtained access to the subset...

UK Web Archive

archive, 1996-2010, web domain dataset, JISC UK, UKWA Open Data, and format profile

2013

Dataset

JISC UK Web Domain Dataset Geoindex. 1996 - 2010. TSV.

The dataset comprises ~2.5 billion 200 OK responses in the 1996 - 2010 tranche of the JISC UK Web Domain Dataset which have been scanned for geographic references - specifically postcodes. This set of postcode citations, found at particular URLs and crawled at particular times, forms an historical geoindex...

UK Web Archive

archive, 1996-2011, JISC UK, geoindex, UKWA Open Data, and web domain dataset

Dataset

JISC UK Web Domain Dataset Crawled URL Index. 1996 - 2013. CDX.

The dataset comprises original compound index (CDX) files that have been re-assembled into 18 separate CDX files, one for each year of crawling activity represented (1996 - 2013). Please note that the individual CDX files are not sorted. In order to enable access to web archives, UKWA uses CDX files to...

UKWA Open Data

archive, 1996-2013, crawled URL index, web domain dataset, JISC UK, and UKWA Open Data

2017

Dataset

AAS Card Catalogues: Chinese (Wade Giles)

This dataset contains digitised cards from the Wade-Giles card catalogue.

British Library

AAS, catalogue, Chinese, and card

2015

Dataset

Theatrical playbills from Britain and Ireland

The dataset comprises 264 volumes of digitised theatrical playbills published between 1660 – 1902 (mostly 19th century) from England, Scotland, Wales and Ireland, digitised from the British Library's physical collection of over 500 volumes of playbills. The dataset is in Portable Document Format (PDF). The playbills cover theatres in Bath (Royal),...

British Library Labs ; Kirk, Tanya

singlesheet, playbill, and playbills

2017

Dataset

Digitised Hebrew Manuscripts: Add MS 10456 - Add MS 17058

This dataset comprises 25 digitised Hebrew manuscripts (1200 - 1599), with their shelfmarks in alphabetical order (Add MS 10456 - Add MS 17058). These manuscripts are out of copyright. These JPEG files were converted from TIFF files using IrfanView, and then further compressed using JPEGMini. They can be used for...

Keinan-Schoonbaert, Adi

manuscripts, Jewish, and Hebrew

2017

Dataset

Digitised Hebrew Manuscripts: Or 2510 - Or 2588

This dataset comprises 40 digitised Hebrew manuscripts (900 - 1747; unknown date), with their shelfmarks in alphabetical order (Or 2510 - Or 2588). These manuscripts are out of copyright. These JPEG files were converted from TIFF files using IrfanView, and then further compressed using JPEGMini. They can be used for...

Keinan-Schoonbaert, Adi

manuscripts, Jewish, and Hebrew

2017

Dataset

Digitised Hebrew Manuscripts: Add MS 18229 - Add MS 26897

This dataset comprises 27 digitised Hebrew manuscripts (1100 - 1799; unknown date), with their shelfmarks in alphabetical order (Add MS 18229 - Add MS 26897). These manuscripts are out of copyright. These JPEG files were converted from TIFF files using IrfanView, and then further compressed using JPEGMini. They can be...

Keinan-Schoonbaert, Adi

manuscripts, Jewish, and Hebrew

2017

Dataset

Digitised Hebrew Manuscripts: Or 2626 - Or 6425

This dataset comprises 43 digitised Hebrew manuscripts (920 - 1845; unknown date), with their shelfmarks in alphabetical order (Or 2626 - Or 6425). These manuscripts are out of copyright. These JPEG files were converted from TIFF files using IrfanView, and then further compressed using JPEGMini. They can be used for...

Keinan-Schoonbaert, Adi

manuscripts, Jewish, and Hebrew

2017

Dataset

Digitised Hebrew Manuscripts: Harley MS 5709 - Or 11016

This dataset comprises 25 digitised Hebrew manuscripts (1100 - 1699; unknown date), with their shelfmarks in alphabetical order (Harley MS 5709 - Or 11016). These manuscripts are out of copyright. These JPEG files were converted from TIFF files using IrfanView, and then further compressed using JPEGMini. They can be used...

Keinan-Schoonbaert, Adi

manuscripts, Jewish, and Hebrew

2017

Dataset

Digitised Hebrew Manuscripts: Add MS 9399 - Harley MS 5708

This dataset comprises 27 digitised Hebrew manuscripts (1200 - 1499; unknown date), with their shelfmarks in alphabetical order (Add MS 9399 - Harley MS 5708). These manuscripts are out of copyright. These JPEG files were converted from TIFF files using IrfanView, and then further compressed using JPEGMini. They can be...

Keinan-Schoonbaert, Adi

manuscripts, Jewish, and Hebrew

2017

Dataset

Digitised Hebrew Manuscripts: Add MS 26938 - Add MS 9398

This dataset comprises 35 digitised Hebrew manuscripts (1100 - 1816; unknown date), with their shelfmarks in alphabetical order (Add MS 26938 - Add MS 9398). These manuscripts are out of copyright. We would appreciate it if users could read our Ethical terms of use guide before reusing our Hebrew manuscripts...

Keinan-Schoonbaert, Adi

manuscripts, Jewish, and Hebrew

2017

Dataset

Digitised Hebrew Manuscripts: Or 2210 - Or 2364

This dataset comprises 21 digitised Hebrew manuscripts (1000 - 1655; unknown date), with their shelfmarks in alphabetical order (Or 2210 - Or 2364). These manuscripts are out of copyright. These JPEG files were converted from TIFF files using IrfanView, and then further compressed using JPEGMini. They can be used for...

Keinan-Schoonbaert, Adi

manuscripts, Jewish, and Hebrew

2017

Dataset

Digitised Hebrew Manuscripts: Or 2365 - Or 2405

This dataset comprises 27 digitised Hebrew manuscripts (1200 - 1867; unknown date), with their shelfmarks in alphabetical order (Or 2365 - Or 2405). These manuscripts are out of copyright. These JPEG files were converted from TIFF files using IrfanView, and then further compressed using JPEGMini. They can be used for...

Keinan-Schoonbaert, Adi

manuscripts, Jewish, and Hebrew

2017

Dataset

Digitised Hebrew Manuscripts: Or 1103 - Or 2201

This dataset comprises 46 digitised Hebrew manuscripts (1250 - 1699; unknown date), with their shelfmarks in alphabetical order (Or 1103 - Or 2201). These manuscripts are out of copyright. These JPEG files were converted from TIFF files using IrfanView, and then further compressed using JPEGMini. They can be used for...

Keinan-Schoonbaert, Adi

manuscripts, Jewish, and Hebrew

2017

Dataset

Digitised Hebrew Manuscripts: Or 2406 - Or 2509

This dataset comprises 41 digitised Hebrew manuscripts (1300 - 1799; unknown date), with their shelfmarks in alphabetical order (Or 2406 - Or 2509). These manuscripts are out of copyright. These JPEG files were converted from TIFF files using IrfanView, and then further compressed using JPEGMini. They can be used for...

Keinan-Schoonbaert, Adi

manuscripts, Jewish, and Hebrew

2017

Dataset

Digitised Hebrew Manuscripts: Or 63 - Or 9882

This dataset comprises 35 digitised Hebrew manuscripts (900 - 1899), with their shelfmarks in alphabetical order (Or 63 - Or 9882). These manuscripts are out of copyright. These JPEG files were converted from TIFF files using IrfanView, and then further compressed using JPEGMini. They can be used for image and...

Keinan-Schoonbaert, Adi

manuscripts, Jewish, and Hebrew

2017

Dataset

Digitised Hebrew Manuscripts: Add 10455 to Or 1099

This dataset comprises 39 digitised Hebrew manuscripts (1200 - 1799), with their shelfmarks in alphabetical order (Add 10455 - Or 1099). These manuscripts are out of copyright. These JPEG files were converted from TIFF files using IrfanView, and then further compressed using JPEGMini. They can be used for image and...

Cronin, Catherine

manuscripts, Jewish, and Hebrew

2017

Dataset

Digitised Hebrew Manuscripts: Harley 5772 to Or 14580

This dataset comprises 42 digitised Hebrew manuscripts (1200 - 1871), with their shelfmarks in alphabetical order (Harley 5772 - Or 14580). These manuscripts are out of copyright. These JPEG files were converted from TIFF files using IrfanView, and then further compressed using JPEGMini. They can be used for image and...

Cronin, Catherine

manuscripts, Jewish, and Hebrew

2017

Dataset

Digitised Hebrew Manuscripts: Add Ch 1250 to Or 11625

This dataset comprises 78 digitised Hebrew manuscripts (1182 - 1895), with their shelfmarks in alphabetical order (Add Ch 1250 to Or 11625). These manuscripts are out of copyright. These JPEG files were converted from TIFF files using IrfanView, and then further compressed using JPEGMini. They can be used for image...

Cronin, Catherine

manuscripts, Jewish, and Hebrew

2017

Dataset

Digitised Hebrew Manuscripts: Or 46 to Sloane 2642

This dataset comprises 35 digitised Hebrew manuscripts (1000 - 1899), with their shelfmarks in alphabetical order (Or 46 - Sloane 2642). These manuscripts are out of copyright. These JPEG files were converted from TIFF files using IrfanView, and then further compressed using JPEGMini. They can be used for image and...

Cronin, Catherine

manuscripts, Jewish, and Hebrew

2017

Dataset

Digitised Hebrew Manuscripts: Add MS 5242 to Arundel Or 50

This dataset comprises 22 digitised Hebrew manuscripts (1100 - 1799), with their shelfmarks in alphabetical order (Add MS 5242 to Arundel Or 50). These manuscripts are out of copyright. These JPEG files were converted from TIFF files using IrfanView, and then further compressed using JPEGMini. They can be used for...

King, Ellie

manuscripts, Jewish, and Hebrew

2017

Dataset

Digitised Hebrew Manuscripts: Metadata

This dataset contains metadata (TEI XML catalogue records) of all British Library digitised Hebrew manuscripts. These TEI XML files can be opened and manipulated using an XML editor.

Keinan-Schoonbaert, Adi

manuscripts, Jewish, and Hebrew

2018

Dataset

Digitised Quarterly Lists XML and Metadata

Two Centuries of Indian Print 1867-1947. The files in this dataset are derived from the British Library’s collection of bound volume Quarterly Lists: printed catalogue records of Indian books published quarterly and by province of British India between 1867 and 1947. The dataset comprises text from the collection of digitised...

Derrick, Tom

XML, books, Indian, ALTO, and metadata

2016

Dataset

Digitised Quarterly Lists PDFs and Metadata

The files in this dataset are derived from the British Library’s collection of bound volume Quarterly Lists: printed catalogue records of Indian books published quarterly and by province of British India between 1867 and 1947. The dataset comprises full-text searchable PDFs of 215 volumes as well as the associated metadata...

Derrick, Tom

books, Indian, and metadata

2015

Dataset

Volumes of Lysons Collectanea (Amusements), comprising broadsides, cuttings, advertisements on amusements 1660-1840

The dataset comprises nine digitised volumes of a collection of broadsides, cuttings and advertisements, relating to public exhibitions and places of amusement from 1660 - 1840 (with OCR-derived text). Part of the Lysons Collectanea collection.

British Library

amusements, text, newspapers, broadsides, OCR, and adverts

2015

Dataset

Volumes of Lysons Collectanea (Trades), comprising advertisements, cuttings, and illustrations relating to trades, professions, medical cures. 1660-1825.

The dataset comprises the OCR text derived from four digitised volumes of a collection of advertisements, cuttings and illustrations relating to trades, professions and medical cures from 1660 - 1825.

British Library

text, newspapers, OCR, trades, and adverts

2015

Dataset

Volumes of Madden's cuttings, views, and pamphlets about the British Museum. 1755-1870.

The dataset comprises four digitised volumes of a collection of cuttings, views and pamphlets made by Sir Frederic Madden about the British Museum, dating 1755 - 1870 (with OCR-derived text).

British Library

British Museum, text, and OCR
