Search Results
-
Dataset
SherlockNet data
Using Convolutional Neural Networks to Explore Over 400 Years of Book Illustrations: Starting from February 2016, as part of the British Library Labs Competition, we embarked on a collaboration with the British Library Labs and the British Museum to tag and caption the entire British Library 1M Collection, a set...Zhao, Luda ; Do, Brian ; Wang, Karen
tagging, Flickr, images, tags, digitised, Microsoft, sherlocknet, and books
-
Dataset
UK Doctoral Thesis Metadata from EThOS
The data in this collection comprises the bibliographic metadata for all UK doctoral theses listed in EThOS, the UK's national thesis service. We estimate the data covers around 98% of all PhDs ever awarded by UK Higher Education institutions, dating back to 1787. Thesis metadata from every PhD-awarding university in...British Library ; Rosie, Heather
higher education, student, UK, dissertations, PhD, theses, doctoral, ethos, thesis, and research
-
Dataset
Digitised Books - Images of the bound covers of books. c. 1510 - c. 1900. JPG
The dataset comprises c. 61,561 images identified as 'Book Covers' from the British Library's Flickr Commons collections, dating between c. 1510 - c. 1900. British Library ; British Library Labs
digitised, books, Microsoft, images, and bookcovers
-
Dataset
Digitised Books - Images identified as Plates. c. 1528 - c. 1900. JPG
The dataset comprises c. 385,237 images identified as 'Plates' from the British Library's Flickr Commons collections, dating between c. 1528 – c. 1900. The images were algorithmically gathered from 49,455 digitised books, equating to 65,227 volumes (25+ million pages), published between c. 1510 - c. 1900; Plates have currently been...British Library ; British Library Labs
-
Dataset
Digitised Books - Images identified as Embellishments. c. 1510 - c. 1900. JPG
The images were algorithmically gathered from 49,455 digitised books, equating to 65,227 volumes (25+ million pages), published between c. 1510 - c. 1900. The books cover a wide range of subject areas including philosophy, history, poetry and literature. The images are in .JPEG format. British Library ; British Library Labs
embellishments, digitised, books, Microsoft, and images
-
Dataset
Digitised Books - Images identified as Medium Sized Images. c. 1567 - c. 1900. JPG
The dataset comprises c. 217,101 images identified as 'Medium Sized Images' from the British Library's Flickr Commons collections, dating between c. 1567 - c. 1900. The images were algorithmically gathered from 49,455 digitised books, equating to 65,227 volumes (25+ million pages), published between c. 1510 - c. 1900; Medium Sized...British Library ; British Library Labs
digitised, books, Microsoft, images, and medium sized images
-
Dataset
Digitised Books. c. 1510 - c. 1900. JSON (OCR derived text)
The dataset comprises text created by OCR from the 49,455 digitised books, equating to 65,227 volumes (25+ million pages), published between c. 1510 - c. 1900. The books cover a wide range of subject areas including philosophy, history, poetry and literature. The dataset is in JavaScript Object Notation (JSON) text...British Library ; British Library Labs
-
Dataset
Digitised Books - Images identified as Medium Sized Images. c. 1567 - c. 1900. JPG
The dataset comprises c. 217,101 images identified as 'Medium Sized Images' from the British Library's Flickr Commons collections, dating between c. 1567 - c. 1900. The images were algorithmically gathered from 49,455 digitised books, equating to 65,227 volumes (25+ million pages), published between c. 1510 - c. 1900; Medium Sized...British Library ; British Library Labs
medium sized images, books, images, digitised, and Microsoft
-
Dataset
OCR text derived from digitised books published 1900 - c. 1946. ALTO XML.
Unfortunately we are unable to make this dataset available due to copyright reasons. This set consists of 1251 volumes, published between 1900-1946. The dataset comprises text from the collection of digitised books created using Optical Character Recognition (OCR) technology. The books cover a wide range of subject areas including philosophy, history,...British Library ; British Library Labs
-
Dataset
OCR text derived from digitised books published 1880 - 1889 in ALTO XML
This set consists of 10856 volumes, published between 1880-1889. The dataset comprises text from the collection of digitised books created using Optical Character Recognition (OCR) technology. The books cover a wide range of subject areas including philosophy, history, poetry and literature. The dataset is in Analysed Layout and Text Object (ALTO)...British Library ; British Library Labs
XML, books, digitised, metadata, Microsoft, ALTO, and nineteenth century
-
Dataset
OCR text derived from digitised books published c. 1510 - 1699 in ALTO XML
This set consists of 693 volumes, published between c. 1510 - 1699. The dataset comprises text from the collection of digitised books created using Optical Character Recognition (OCR) technology. The books cover a wide range of subject areas including philosophy, history, poetry and literature. The dataset is in Analysed Layout and...British Library ; British Library Labs
XML, sixteenth century, books, digitised, seventeenth century, Microsoft, ALTO, and metadata
-
Dataset
OCR text derived from digitised books published 1840 - 1849 in ALTO XML
This set consists of 4070 volumes, published between 1840-1849. The dataset comprises text from the collection of digitised books created using Optical Character Recognition (OCR) technology. The books cover a wide range of subject areas including philosophy, history, poetry and literature. The dataset is in Analysed Layout and Text Object (ALTO)...British Library ; British Library Labs
XML, books, digitised, metadata, Microsoft, nineteenth century, and ALTO
-
Dataset
OCR text derived from digitised books published 1700 - 1799 in ALTO XML.
The dataset comprises text from the collection of digitised books created using Optical Character Recognition (OCR) technology. The books cover a wide range of subject areas including philosophy, history, poetry and literature. The dataset is in Analysed Layout and Text Object (ALTO) Extensible Markup Language (XML) format. British Library ; British Library Labs
XML, books, digitised, metadata, Microsoft, eighteenth century, and ALTO
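As a rough illustration of working with OCR text in ALTO XML, the sketch below pulls plain text out of a single ALTO page using only the Python standard library. It relies on the general ALTO convention that words are stored in the `CONTENT` attribute of `String` elements nested inside `TextLine` elements. The function name `alto_page_text` and the `SAMPLE` document are hypothetical, and the actual file layout of this dataset is not described in the record, so treat this as a starting point rather than a tested recipe for these files.

```python
import xml.etree.ElementTree as ET


def local_name(tag: str) -> str:
    """Strip any '{namespace}' prefix so the code works across ALTO versions."""
    return tag.rsplit('}', 1)[-1]


def alto_page_text(xml_source: str) -> str:
    """Return the text of one ALTO page, one output line per TextLine element."""
    root = ET.fromstring(xml_source)
    lines = []
    for elem in root.iter():
        if local_name(elem.tag) == 'TextLine':
            # Each recognised word is held in the CONTENT attribute of a String element.
            words = [child.get('CONTENT', '')
                     for child in elem.iter()
                     if local_name(child.tag) == 'String']
            lines.append(' '.join(words))
    return '\n'.join(lines)


# Hypothetical two-line page, used only to exercise the function.
SAMPLE = """<alto xmlns="http://www.loc.gov/standards/alto/ns-v2#">
  <Layout><Page><PrintSpace><TextBlock>
    <TextLine><String CONTENT="Hello"/><SP/><String CONTENT="world"/></TextLine>
    <TextLine><String CONTENT="second"/><SP/><String CONTENT="line"/></TextLine>
  </TextBlock></PrintSpace></Page></Layout>
</alto>"""

if __name__ == '__main__':
    print(alto_page_text(SAMPLE))
```

The sketch ignores hyphenation markers and word-confidence attributes, which a fuller reader of ALTO files would probably want to handle.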
-
Dataset
OCR text derived from digitised books published 1800 - 1809 in ALTO XML
This set consists of 1502 volumes, published between 1800-1809. The dataset comprises text from the collection of digitised books created using Optical Character Recognition (OCR) technology. The books cover a wide range of subject areas including philosophy, history, poetry and literature. The dataset is in Analysed Layout and Text Object (ALTO)...British Library ; British Library Labs
XML, books, digitised, metadata, Microsoft, nineteenth century, and ALTO
-
Dataset
OCR text derived from digitised books published 1810 - 1819 in ALTO XML
This set consists of 2338 volumes, published between 1810-1819. The dataset comprises text from the collection of digitised books created using Optical Character Recognition (OCR) technology. The books cover a wide range of subject areas including philosophy, history, poetry and literature. The dataset is in Analysed Layout and Text Object (ALTO)...British Library ; British Library Labs
XML, books, digitised, metadata, Microsoft, nineteenth century, and ALTO
-
Dataset
OCR text derived from digitised books published 1820 - 1829 in ALTO XML
This set consists of 2739 volumes, published between 1820-1829. The dataset comprises text from the collection of digitised books created using Optical Character Recognition (OCR) technology. The books cover a wide range of subject areas including philosophy, history, poetry and literature. The dataset is in Analysed Layout and Text Object (ALTO)...British Library ; British Library Labs
XML, books, digitised, metadata, Microsoft, nineteenth century, and ALTO
-
Dataset
OCR text derived from digitised books published 1830 - 1839 in ALTO XML
This set consists of 2639 volumes, published between 1830-1839. The dataset comprises text from the collection of digitised books created using Optical Character Recognition (OCR) technology. The books cover a wide range of subject areas including philosophy, history, poetry and literature. The dataset is in Analysed Layout and Text Object (ALTO)...British Library ; British Library Labs
XML, books, digitised, metadata, Microsoft, nineteenth century, and ALTO
-
Dataset
OCR text derived from digitised books published 1870 - 1879 in ALTO XML
This set consists of 8630 volumes, published between 1870-1879. The dataset comprises text from the collection of digitised books created using Optical Character Recognition (OCR) technology. The books cover a wide range of subject areas including philosophy, history, poetry and literature. The dataset is in Analysed Layout and Text Object (ALTO)...British Library ; British Library Labs
XML, books, digitised, metadata, Microsoft, nineteenth century, and ALTO
-
Dataset
OCR text derived from digitised books published 1850 - 1859 in ALTO XML
This set consists of 5818 volumes, published between 1850-1859. The dataset comprises text from the collection of digitised books created using Optical Character Recognition (OCR) technology. The books cover a wide range of subject areas including philosophy, history, poetry and literature. The dataset is in Analysed Layout and Text Object (ALTO)...British Library ; British Library Labs
XML, books, digitised, metadata, Microsoft, nineteenth century, and ALTO
-
Dataset
OCR text derived from digitised books published 1860 - 1869 in ALTO XML
This set consists of 7498 volumes, published between 1860-1869. The dataset comprises text from the collection of digitised books created using Optical Character Recognition (OCR) technology. The books cover a wide range of subject areas including philosophy, history, poetry and literature. The dataset is in Analysed Layout and Text Object (ALTO)...British Library ; British Library Labs
XML, books, digitised, metadata, Microsoft, nineteenth century, and ALTO
-
Dataset
OCR text derived from digitised books published 1890 - 1899 in ALTO XML
This set consists of 14847 volumes, published between 1890-1899. The dataset comprises text from the collection of digitised books created using Optical Character Recognition (OCR) technology. The books cover a wide range of subject areas including philosophy, history, poetry and literature. The dataset is in Analysed Layout and Text Object (ALTO)...British Library ; British Library Labs
XML, books, digitised, metadata, Microsoft, nineteenth century, and ALTO
-
3D image
Jane Austen's Desk (closed view), Add 86841
A wooden writing desk used by Jane Austen which was given to her by her father in 1794. This portable ‘writing-box’ opens to provide a slope on which to write. It has various compartments, including a space for an ink pot and a lockable drawer for paper and valuables. When...British Library
wooden, desk, Jane Austen, 3D model, and writing
-
3D image
Pen Box, Foster 926
Pen box - Foster 926 Persian pen box, hand-painted lacquered, 1800-1900. http://searcharchives.bl.uk/primo_library/libweb/action/search.do?dscnt=0&frbg=&scp.scps=scope%3A%28BL%29&tab=local&dstmp=1381498036968&srt=rank&ct=search&mode=Basic&dum=true&indx=1&vl(freeText0)=032-003264750&fn=search&vid=IAMS_VU2 British Library
-
Dataset
Digitised maps of the former British East Africa
This dataset comprises 581 images of maps of the former British East Africa created between 1890 and 1940 and a spreadsheet of related catalogue records. All Open Government Licence v1.0 (OGL). A user-friendly geographical search index of the maps is available on Google Maps. These JPEG files were converted from...Dykes, Nick
War Office Archive, documents, Intelligence, Uganda, East Africa, British East Africa, Maps, Military maps, and Kenya
-
Dataset
UK Doctoral Thesis Metadata from EThOS
This dataset has been superseded by a more recent version: https://doi.org/10.23636/1137 If you require access to an earlier version, please email openaccess@bl.uk, including the dataset title, date, and DOI in your request. The data in this collection comprises the bibliographic metadata for all UK doctoral theses listed in EThOS, the UK's national thesis service. We...British Library ; Rosie, Heather
Higher education, dissertations, HE, research, doctoral, student, UK, theses, ethos, and thesis
-
Dataset
UK Doctoral Thesis Metadata from EThOS
This dataset has been superseded by a more recent version: https://doi.org/10.23636/ybpt-nh33 If you require access to an earlier version, please email openaccess@bl.uk, including the dataset title, date, and DOI in your request. The data in this collection comprises the bibliographic metadata for all UK doctoral theses listed in EThOS, the...British Library ; Rosie, Heather
higher education, ethos, dissertations, thesis, research, PhD, doctoral, student, UK, and theses
-
3D image
Qur'an Case, Or 13706 B
This West African carrying case is probably from Nigeria, and was made for a richly illuminated copy of the Qur’an (OR 13706A). It is made of leather, fabric and pulp board, and probably dates from the nineteenth century. In West Africa, manuscripts were not usually bound, and cases such as...British Library
Nigeria, Quran, case, West Africa, and 3D model
-
3D image
The Constance Graduale, IB.15154
Printed in Southern Germany c. 1473, the ‘Constance’ Graduale (IB. 15154) is the earliest extant book of printed music using moveable type. The copy in the British Library’s music collection is the only known surviving copy that is complete. British Library
Graduale, Constance Graduale, music, printed, Medieval, and 3D model
-
3D image
Jane Austen's Desk (open view 2), Add 86841
A wooden writing desk used by Jane Austen which was given to her by her father in 1794. This portable ‘writing-box’ opens to provide a slope on which to write. It has various compartments, including a space for an ink pot and a lockable drawer for paper and valuables. When...British Library
wooden, desk, Jane Austen, 3D model, and writing
-
3D image
Jane Austen's Desk (open view 1), Add 86841
A wooden writing desk used by Jane Austen which was given to her by her father in 1794. This portable ‘writing-box’ opens to provide a slope on which to write. It has various compartments, including a space for an ink pot and a lockable drawer for paper and valuables. When...British Library
wooden, desk, Jane Austen, 3D model, and writing
-
3D image
Oracle Bone, Or 7694/1655+1672
An inscribed oracle bone (jia gu 甲骨) from the Couling-Chalfant collection at the British Library. Oracle bones were animal bones, usually ox shoulder bones or the underside of turtle shells, used for divination rituals in ancient China. Dating to the Shang dynasty (c. 1600 – 1050 BC), they bear the...British Library
oracle bone, Chinese, 3D model, and Shang Dynasty
-
3D image
Silk Mantle, Or 13027
Silk mantle (textile cover) for a Torah scroll. Modelled for the British Library’s Hebrew Manuscript Digitisation Project, funded by The Polonsky Foundation. Modelling: Adi Keinan-Schoonbaert. Special thanks to Tony Grant (photography) and Liz Rose (conservation). To view the mantle and its scroll, visit: http://www.bl.uk/manuscripts/FullDisplay.aspx?ref=Or_13027 British Library
-
3D image
Brass Block
A ‘block’ is a cast in brass. One of a multitude of blocks used to impress gold decoration in patterns and designs on book covers. Some block designs signify collections (such as the Cottons or Harleys) much like individual coats of arms relate to hereditary lines. Other blocks are purely...British Library
-
3D image
Lacquer Box, Or 6682
Lacquer box containing a manuscript with calligraphy by the Qianlong Emperor (Qing Dynasty). China. (1711-99) British Library
Qing Dynasty, lacquer, box, 3D model, and China
-
3D image
Book of Esther, Or 1087
The Book of Esther. Unknown origin, 15th century. Modelled for the British Library’s Hebrew Manuscripts Digitisation Project, funded by The Polonsky Foundation. Modelling: Adi Keinan-Schoonbaert. Special thanks to Kristin Phelps. To view the entire manuscript, visit: http://www.bl.uk/manuscripts/FullDisplay.aspx?ref=Or_1087 British Library
-
3D image
Oracle Bone, Or 7694/1988 (two parts)
An inscribed oracle bone (jia gu 甲骨) in two parts from the Couling-Chalfant collection at the British Library. Oracle bones were animal bones, usually ox shoulder bones or the underside of turtle shells, used for divination rituals in ancient China. Dating to the Shang dynasty (c. 1600 – 1050 BC),...British Library
oracle bone, Chinese, 3D model, and Shang Dynasty
-
3D image
Oracle Bone, Or 7694/1988 Part 2
An inscribed oracle bone (jia gu 甲骨) from the Couling-Chalfant collection at the British Library. Oracle bones were animal bones, usually ox shoulder bones or the underside of turtle shells, used for divination rituals in ancient China. Dating to the Shang dynasty (c. 1600 – 1050 BC), they bear the...British Library
oracle bone, Chinese, 3D model, and Shang Dynasty
-
3D image
Soldier Model, Foster 979
Ami Chand (‘Ummeechund’), a trooper in Skinner’s Horse who saved the life of William Fraser (1784-1835) in 1819. Terracotta model painted in polychrome, with some evidence of wires and an armature; 28.5 cm high. The model is based on a portrait originally featured in the Fraser Albums of Company drawings...British Library
-
3D image
Oracle Bone, Or 7694/1988 Part 1
An inscribed oracle bone (jia gu 甲骨) from the Couling-Chalfant collection at the British Library. Oracle bones were animal bones, usually ox shoulder bones or the underside of turtle shells, used for divination rituals in ancient China. Dating to the Shang dynasty (c. 1600 – 1050 BC), they bear the...British Library
oracle bone, Chinese, 3D model, and Shang Dynasty
-
3D image
Menak, Add MS 12309
Menak, Javanese manuscript containing stories of Amir Hamza, uncle of the Prophet Muhammad, written in Javanese in Arabic script, written between 1792 and 1812. 1,450 folios of Javanese paper. http://searcharchives.bl.uk/primo_library/libweb/action/display.do?tabs=detailsTab&ct=display&fn=search&doc=IAMS040-002042067&indx=1&recIds=IAMS040-002042067&recIdxs=0&elementId=0&renderMode=poppedOut&displayMode=full&frbrVersion=&dscnt=0&frbg=&scp.scps=scope%3A%28BL%29&tab=local&dstmp=1526286592148&srt=rank&mode=Basic&&dum=true&vl(freeText0)=menak&vid=IAMS_VU2 British Library
-
3D image
Oracle Bone, Or 7694/1580
An inscribed oracle bone (jia gu 甲骨) from the Couling-Chalfant collection at the British Library. Oracle bones were animal bones, usually ox shoulder bones or the underside of turtle shells, used for divination rituals in ancient China. Dating to the Shang dynasty (c. 1600 – 1050 BC), they bear the...British Library
oracle bone, Chinese, 3D model, and Shang Dynasty
-
3D image
Pentateuch, Add MS 4709
Pentateuch with the Five Scrolls, Psalms, Job and the Haftarot. Italy, 1486. Modelled for the British Library’s Hebrew Manuscript Digitisation Project, funded by The Polonsky Foundation. Modelling: Adi Keinan-Schoonbaert. Special thanks to Kristin Phelps. For the entire manuscript, visit: http://www.bl.uk/manuscripts/FullDisplay.aspx?index=0&ref=Add_MS_4709 British Library
manuscript, Bible, 3D model, Hebrew, Pentateuch, and Torah
-
3D image
Oracle Bone, Or 7694/1595
An inscribed oracle bone (jia gu 甲骨) from the Couling-Chalfant collection at the British Library. Oracle bones were animal bones, usually ox shoulder bones or the underside of turtle shells, used for divination rituals in ancient China. Dating to the Shang dynasty (c. 1600 – 1050 BC), they bear the...British Library
oracle bone, Chinese, 3D model, and Shang Dynasty
-
3D image
Lion Model, Foster 872
Lion - Foster 872 A model of a lion. By Gangaram, 1790. Wax, possibly dhuna, the aromatic gum of the shal tree (Shorea robusta), painted; size of wooden base: 20.5 x 9.75 x 2cm; animal 12.5cm at highest point of mane. F872 Our lion also featured on a blog: http://britishlibrary.typepad.co.uk/asian-and-african/2014/08/the-maratha-artist-gangaram-cintaman-tambat.html British Library
-
3D image
Esther Scroll, Add MS 11831
Esther scroll (Megilat Ester) in an ivory case. Unknown origin, 17th century. Modelled for the British Library’s Hebrew Manuscript Digitisation Project, funded by The Polonsky Foundation. Modelling: Adi Keinan-Schoonbaert. Special thanks to Kristin Phelps, Thomas Flynn and Ilana Tahan. Model is an edited approximation of the actual object. For the...British Library
-
Dataset
UK Doctoral Theses (EThOS) Abstracts and Metadata - 01/03/2015. XLS.
This dataset has been superseded by a more recent version: https://doi.org/10.22021/ETHOSCSV201810 If you require access to an earlier version, please email openaccess@bl.uk, including the dataset title, date, and DOI in your request. The data in this collection comprises the bibliographic metadata for all UK doctoral theses listed in EThOS, the UK's national thesis service. We...British Library ; Rosie, Heather
-
Dataset
al-Durr al-naqī fī fann al-mūsīqī (Add MS 23494)
This dataset is a PDF file containing the images and transcription of the manuscript titled al-Durr al-naqī fī fann al-mūsīqī الدرّ النقيّ في فنّ الموسيقي by Aḥmad ibn 'Abd al-Raḥmān al-Mawṣilī أحمد بن عبد الرحمن الموصلي. The manuscript was digitised through the British Library Qatar Foundation Partnership, and made available through...British Library ; Keinan-Schoonbaert, Adi
transcription, Arabic, and OCR
-
Dataset
UK Doctoral Thesis Metadata from EThOS
This dataset has been superseded by a more recent version (5): https://doi.org/10.23636/1344 If you require access to an earlier version, please email openaccess@bl.uk, including the dataset title, date, and DOI in your request. The data in this collection comprises the bibliographic metadata for all UK doctoral theses listed in EThOS,...British Library ; Rosie, Heather
higher education, ethos, dissertations, HE, research, PhD, doctoral, student, UK, theses, and thesis
-
Dataset
Books related to theatre derived from the Digitised 19th Century Books dataset
A dataset derived from the Digitised 19th Century Books dataset which contains books pertaining to theatre written in English. The dataset of 841 items was created by filtering by keywords which are related to different genres of play including Drama, Act, Scene, Play, Comedy, Farce, Pantomime, Tragedy and Shakespeare and...British Library ; British Library Labs
act, genre, books, metadata, bibliographic, theatre, and play
-
Dataset
Books related to India from the Digitised 19th Century Books dataset
A dataset which is derived from the Digitised 19th Century books dataset focusing on books related to India. The dataset was created by refining the book title field using keywords related to names used for India during the period, places within India, cultural terms such as 'Hindu' and another term...British Library ; British Library Labs
books, metadata, bibliographic, and India
-
Dataset
Books related to 19th Century British Colonies derived from the Digitised 19th Century books dataset
A dataset derived from the Digitised 19th Century Books dataset which contains books related to 19th Century British Colonies. The dataset of 1288 items was created using filtering by keywords of locations and then manually checked for accuracy. The data was augmented with additional columns including 'City', 'Colony Name' and...British Library ; British Library Labs
Africa, colonialism, Canada, Ceylon, metadata, bibliographic, India, Australia, books, British Colony, and British Colonies
-
Dataset
Books related to War derived from the Digitised 19th Century Books Dataset
A dataset which is derived from the Digitised 19th Century Books dataset comprising all non-fiction English language books related to armed conflicts. The dataset of 1127 items was developed by refining based on keywords such as 'war', 'battle', 'uprising', 'revolt', 'rebellion', 'invasion' and 'mutiny'. This dataset was curated by students...British Library ; British Library Labs
non-fiction, War, books, metadata, and bibliographic
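The keyword refinement described in this record can be approximated with a whole-word, case-insensitive match against book titles. The sketch below is only an illustration: the keyword list comes from the description above, but the sample titles, the function names and the choice to match on titles alone are assumptions, and the record does not say how the curators actually applied the keywords.

```python
import re

# Keywords quoted in the dataset description.
KEYWORDS = ['war', 'battle', 'uprising', 'revolt', 'rebellion', 'invasion', 'mutiny']

# \b anchors restrict matches to whole words, so 'Warwickshire' does not match 'war'.
PATTERN = re.compile(r'\b(' + '|'.join(KEYWORDS) + r')\b', re.IGNORECASE)


def matches_keywords(title: str) -> bool:
    """Return True if the title contains any keyword as a whole word."""
    return PATTERN.search(title) is not None


def filter_titles(titles):
    """Keep only the titles that mention at least one keyword."""
    return [title for title in titles if matches_keywords(title)]


# Hypothetical titles, used only to exercise the filter.
SAMPLE_TITLES = [
    'A History of the Peninsular War',
    'Flowers of the Field',
    'The Indian Mutiny of 1857',
    'Warwickshire Walks',
]

if __name__ == '__main__':
    for title in filter_titles(SAMPLE_TITLES):
        print(title)
```

A filter of this kind over-selects (for example, metaphorical uses of 'battle'), so a manual check of the matches would still be needed.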
-
Dataset
Books related to the Industrial Revolution derived from the Digitised 19th Century books dataset
A dataset which is a subset of the Digitised 19th Century Books dataset comprising books related to the Industrial Revolution in Britain. The subset of 354 items was refined by using keywords associated with placenames and the topic of industrialism. This dataset was curated by the Aepyi student group at...British Library ; British Library Labs
books, industrialism, metadata, bibliographic, and Industrial Revolution
-
Dataset
Latin American books in Digitised 19th century books
A dataset which is derived from the 19th Century Books dataset comprising c.1,100 books which are related to Latin America, written in Spanish, English, German, French, Italian, Swedish and Dutch. British Library ; British Library Labs
books, Latin America, metadata, and bibliographic
-
Dataset
Books divided by Genre from the Digitised 19th century books dataset
A dataset derived from the Digitised 19th Century Books dataset which classifies the books by genre (Drama, Poetry, Prose, Music and unidentified). For Drama, Music and Prose several types were identified. For Drama: comedy, play, recitation and tragedy. For Prose: novel, parody, romance, satire, story, history subset of story and...British Library ; British Library Labs
Music, Genre, Prose, books, Poetry, metadata, bibliographic, and Drama
-
Dataset
Books containing images about Finland
A dataset derived from the Digitised 19th Century books dataset comprising books with images about Finland, approximately 40 titles. This dataset was compiled by Ruby Dixon, a student at Graveney School, who completed work experience at British Library Labs in 2016. British Library ; British Library Labs
books, Finland, metadata, and bibliographic
-
Dataset
Russian language books in the Digitised 19th century books dataset
A dataset which is a subset of the Digitised 19th Century books dataset comprising Russian Language books. The spreadsheet contains metadata of 585 books in Russian. This dataset was compiled by Nadya Miryanova, a student at Lady Eleanor Holles, who completed work experience at British Library Labs in 2017. British Library ; British Library Labs
books, metadata, bibliographic, and Russia
-
Dataset
UK Doctoral Thesis Metadata from EThOS
This dataset has been superseded by a more recent version: https://doi.org/10.23636/1188 If you require access to an earlier version, please email openaccess@bl.uk, including the dataset title, date, and DOI in your request. The data in this collection comprises the bibliographic metadata for all UK doctoral theses listed in EThOS, the...British Library ; Rosie, Heather
-
Dataset
Ground Truth transcriptions for training OCR of historical Arabic handwritten texts
This dataset comprises 120 digitised images (TIFF files) drawn from a selection of historical Arabic scientific manuscripts (10th-19th century) digitised through the British Library Qatar Foundation Partnership. Also contained are ground truth transcriptions (XML) for each page that can be used for training optical character recognition (OCR) or handwritten text...British Library ; Keinan-Schoonbaert, Adi
Arabic, transcription, and OCR
-
Dataset
Judicial Committee of the Privy Council: Linked Appeals Data
The dataset in this collection contains Linked Data about appeal cases heard by the Judicial Committee of the Privy Council between 1860 and 1998. The Judicial Committee of the Privy Council (JCPC) is the final court of appeal for British overseas territories and Crown dependencies, as well as ecclesiastical and...Middle, Sarah
-
Dataset
Ground Truth transcriptions for training OCR of historical Bengali printed texts - Transkribus
This dataset comprises 74 digitised images (TIFF files) drawn from a selection of early printed Bengali books (1713-1914) digitised through the Two Centuries of Indian Print project (https://www.bl.uk/projects/two-centuries-of-indian-print). Also contained are ground truth transcriptions (XML) for each page that can be used for training optical character recognition software on historical...British Library ; Derrick, Tom
OCR, transcription, and Indian
-
Dataset
Ground Truth transcriptions for training OCR of historical Bengali printed texts - Recognition of Early Indian Printed Documents competition
This dataset comprises 81 digitised images (TIFF files) drawn from a selection of early printed Bengali books (1713-1914) digitised through the Two Centuries of Indian Print project (https://www.bl.uk/projects/two-centuries-of-indian-print). Also contained are ground truth transcriptions (XML) for each page that can be used for training optical character recognition software on historical...British Library ; Derrick, Tom
Indian, transcription, and OCR
-
Dataset
Digitised Hebrew Manuscripts: Or 74 to Stowe Ch 297
This dataset comprises 32 digitised Hebrew manuscripts (1000 - 1903), with their shelfmarks in alphabetical order (Or 74 to Stowe Ch 297). These manuscripts are out of copyright. These JPEG files were converted from TIFF files using IrfanView, and then further compressed using JPEGMini. They can be used for image...King, Ellie
manuscripts, Jewish, and Hebrew
-
Dataset
Digitised Hebrew Manuscripts: Or 1389 to Sloane MS 3173
This dataset comprises 32 digitised Hebrew manuscripts (1200 - 1899), with their shelfmarks in alphabetical order (Or 1389 to Sloane MS 3173). These manuscripts are out of copyright. These JPEG files were converted from TIFF files using IrfanView, and then further compressed using JPEGMini. They can be used for image...King, Ellie
manuscripts, Jewish, and Hebrew
-
Dataset
Digitised Hebrew Manuscripts: Or 2518 to Or 5834
This dataset comprises 33 digitised Hebrew manuscripts (900 - 1899), with their shelfmarks in alphabetical order (Or 2518 to Or 5834). These manuscripts are out of copyright. These JPEG files were converted from TIFF files using IrfanView, and then further compressed using JPEGMini. They can be used for image and...King, Ellie
manuscripts, Jewish, and Hebrew
-
Dataset
Digitised Hebrew Manuscripts: Add MS 27169 to Or 12983
This dataset comprises 21 digitised Hebrew manuscripts (1100 - 1899), with their shelfmarks in alphabetical order (Add MS 27169 to Or 12983). These manuscripts are out of copyright. These JPEG files were converted from TIFF files using IrfanView, and then further compressed using JPEGMini. They can be used for image...King, Ellie
manuscripts, Jewish, and Hebrew
-
Dataset
Italian Academies Project Database Images
Italian Academies 1525-1700 images. This dataset comprises 1084 image files in JPEG format. The images include emblems, portraits and title pages for academies, people and works. The XML metadata associated with these images contains references to their file names: https://doi.org/10.21250/iad1 Gianfrancesco, Lorenza ; Testa, Simone ; Everson, Jane ; Reidy, Dennis ; Sampson, Lisa …
-
Dataset
Italian Academies Project Database XML Records
Italian Academies 1525-1700 metadata. This dataset comprises 8598 XML records representing 587 Academies, 7100 people and 911 works. Some of the XML files in this dataset contain links to images, which are available in the Italian Academies Project Database Images dataset: https://doi.org/10.21250/iad2 XML nodes with the 'ImageId' attribute contain the filename...Gianfrancesco, Lorenza ; Testa, Simone ; Everson, Jane ; Reidy, Dennis ; Sampson, Lisa …
-
Dataset
Linked Open British National Bibliography - Forthcoming Books N-Triples and RDF/XML
This dataset includes metadata for forthcoming books to be published or distributed in the UK. Deliot, Corine
British National Bibliography, forthcoming, linked open data, BNB, N-Triples, RDF/XML, NT, CIP, and metadata
-
Dataset
UK Selective Web Archive Classification Dataset. 1996 - 2010. TSV.
The dataset comprises a manually curated selective archive produced by UKWA which includes the classification of sites into a two-tiered subject hierarchy. In partnership with the Internet Archive and JISC, UKWA had obtained access to the subset of the Internet Archive’s web collection that relates to the UK. The JISC...UK Web Archive
archive, web domain dataset, JISC UK, classification dataset, UKWA Open Data, and 1996-2014
-
Dataset
Linked Open British National Bibliography - Serials. 1950- N-Triples and RDF/XML
This dataset includes metadata for serials published or distributed in the UK since 1950.Deliot, Corine
British National Bibliography, serials, linked open data, BNB, N-Triples, RDF/XML, NT, and metadata
-
Dataset
JISC UK Web Domain Dataset Host Link Graph. 1996 - 2010. TSV.
The dataset comprises ~2.5 billion 200 OK responses from the 1996 - 2010 tranche of the JISC UK Web Domain Dataset which have been scanned for hyperlinks. For each link, UKWA extracts the host that the link targets, and uses this to build up a picture of which hosts have...UKWA Open Data
archive, 1996-2012, web domain dataset, JISC UK, host link graph, and UKWA Open Data
-
Dataset
JISC UK Web Domain Dataset Format Profile. 1996 - 2010.
The dataset is a format profile, summarising media type (MIME type) data formats contained within all of the HTTP 200 OK responses in the 1996 - 2010 tranche of the JISC UK Web Domain Dataset. In partnership with the Internet Archive and JISC, UKWA had obtained access to the subset...UK Web Archive
archive, 1996-2010, web domain dataset, JISC UK, UKWA Open Data, and format profile
-
Dataset
JISC UK Web Domain Dataset Geoindex. 1996 - 2010. TSV.
The dataset comprises ~2.5 billion 200 OK responses in the 1996 - 2010 tranche of the JISC UK Web Domain Dataset which have been scanned for geographic references - specifically postcodes. This set of postcode citations, found at particular URLs and crawled at particular times, forms an historical geoindex...UK Web Archive
archive, 1996-2011, JISC UK, geoindex, UKWA Open Data, and web domain dataset
-
Dataset
JISC UK Web Domain Dataset Crawled URL Index. 1996 - 2013. CDX.
The dataset comprises original compound index (CDX) files that have been re-assembled into 18 separate CDX files, one for each year of crawling activity represented (1996 - 2013). Please note that the individual CDX files are not sorted. In order to enable access to web archives, UKWA uses CDX files to...UKWA Open Data
archive, 1996-2013, crawled URL index, web domain dataset, JISC UK, and UKWA Open Data
-
Dataset
AAS Card Catalogues: Chinese (Wade Giles)
This dataset contains digitised cards from the Wade-Giles card catalogue.British Library
-
Dataset
Theatrical playbills from Britain and Ireland
The dataset comprises 264 volumes of digitised theatrical playbills published between 1660 – 1902 (mostly 19th century) from England, Scotland, Wales and Ireland. The playbills were digitised from the British Library's physical collection of over 500 volumes of playbills. The dataset is in Portable Document Format (PDF). The playbills cover theatres in Bath (Royal),...British Library Labs ; Kirk, Tanya
singlesheet, playbill, and playbills
-
Dataset
Digitised Hebrew Manuscripts: Add MS 10456 - Add MS 17058
This dataset comprises 25 digitised Hebrew manuscripts (1200 - 1599), with their shelfmarks in alphabetical order (Add MS 10456 - Add MS 17058). These manuscripts are out of copyright. These JPEG files were converted from TIFF files using IrfanView, and then further compressed using JPEGMini. They can be used for...Keinan-Schoonbaert, Adi
manuscripts, Jewish, and Hebrew
-
Dataset
Digitised Hebrew Manuscripts: Or 2510 - Or 2588
This dataset comprises 40 digitised Hebrew manuscripts (900 - 1747; unknown date), with their shelfmarks in alphabetical order (Or 2510 - Or 2588). These manuscripts are out of copyright. These JPEG files were converted from TIFF files using IrfanView, and then further compressed using JPEGMini. They can be used for...Keinan-Schoonbaert, Adi
manuscripts, Jewish, and Hebrew
-
Dataset
Digitised Hebrew Manuscripts: Add MS 18229 - Add MS 26897
This dataset comprises 27 digitised Hebrew manuscripts (1100 - 1799; unknown date), with their shelfmarks in alphabetical order (Add MS 18229 - Add MS 26897). These manuscripts are out of copyright. These JPEG files were converted from TIFF files using IrfanView, and then further compressed using JPEGMini. They can be...Keinan-Schoonbaert, Adi
manuscripts, Jewish, and Hebrew
-
Dataset
Digitised Hebrew Manuscripts: Or 2626 - Or 6425
This dataset comprises 43 digitised Hebrew manuscripts (920 - 1845; unknown date), with their shelfmarks in alphabetical order (Or 2626 - Or 6425). These manuscripts are out of copyright. These JPEG files were converted from TIFF files using IrfanView, and then further compressed using JPEGMini. They can be used for...Keinan-Schoonbaert, Adi
manuscripts, Jewish, and Hebrew
-
Dataset
Digitised Hebrew Manuscripts: Harley MS 5709 - Or 11016
This dataset comprises 25 digitised Hebrew manuscripts (1100 - 1699; unknown date), with their shelfmarks in alphabetical order (Harley MS 5709 - Or 11016). These manuscripts are out of copyright. These JPEG files were converted from TIFF files using IrfanView, and then further compressed using JPEGMini. They can be used...Keinan-Schoonbaert, Adi
manuscripts, Jewish, and Hebrew
-
Dataset
Digitised Hebrew Manuscripts: Add MS 9399 - Harley MS 5708
This dataset comprises 27 digitised Hebrew manuscripts (1200 - 1499; unknown date), with their shelfmarks in alphabetical order (Add MS 9399 - Harley MS 5708). These manuscripts are out of copyright. These JPEG files were converted from TIFF files using IrfanView, and then further compressed using JPEGMini. They can be...Keinan-Schoonbaert, Adi
manuscripts, Jewish, and Hebrew
-
Dataset
Digitised Hebrew Manuscripts: Add MS 26938 - Add MS 9398
This dataset comprises 35 digitised Hebrew manuscripts (1100 - 1816; unknown date), with their shelfmarks in alphabetical order (Add MS 26938 - Add MS 9398). These manuscripts are out of copyright. We would appreciate it if users could read our Ethical terms of use guide before reusing our Hebrew manuscripts...Keinan-Schoonbaert, Adi
manuscripts, Jewish, and Hebrew
-
Dataset
Digitised Hebrew Manuscripts: Or 2210 - Or 2364
This dataset comprises 21 digitised Hebrew manuscripts (1000 - 1655; unknown date), with their shelfmarks in alphabetical order (Or 2210 - Or 2364). These manuscripts are out of copyright. These JPEG files were converted from TIFF files using IrfanView, and then further compressed using JPEGMini. They can be used for...Keinan-Schoonbaert, Adi
manuscripts, Jewish, and Hebrew
-
Dataset
Digitised Hebrew Manuscripts: Or 2365 - Or 2405
This dataset comprises 27 digitised Hebrew manuscripts (1200 - 1867; unknown date), with their shelfmarks in alphabetical order (Or 2365 - Or 2405). These manuscripts are out of copyright. These JPEG files were converted from TIFF files using IrfanView, and then further compressed using JPEGMini. They can be used for...Keinan-Schoonbaert, Adi
manuscripts, Jewish, and Hebrew
-
Dataset
Digitised Hebrew Manuscripts: Or 1103 - Or 2201
This dataset comprises 46 digitised Hebrew manuscripts (1250 - 1699; unknown date), with their shelfmarks in alphabetical order (Or 1103 - Or 2201). These manuscripts are out of copyright. These JPEG files were converted from TIFF files using IrfanView, and then further compressed using JPEGMini. They can be used for...Keinan-Schoonbaert, Adi
manuscripts, Jewish, and Hebrew
-
Dataset
Digitised Hebrew Manuscripts: Or 2406 - Or 2509
This dataset comprises 41 digitised Hebrew manuscripts (1300 - 1799; unknown date), with their shelfmarks in alphabetical order (Or 2406 - Or 2509). These manuscripts are out of copyright. These JPEG files were converted from TIFF files using IrfanView, and then further compressed using JPEGMini. They can be used for...Keinan-Schoonbaert, Adi
manuscripts, Jewish, and Hebrew
-
Dataset
Digitised Hebrew Manuscripts: Or 63 - Or 9882
This dataset comprises 35 digitised Hebrew manuscripts (900 - 1899), with their shelfmarks in alphabetical order (Or 63 - Or 9882). These manuscripts are out of copyright. These JPEG files were converted from TIFF files using IrfanView, and then further compressed using JPEGMini. They can be used for image and...Keinan-Schoonbaert, Adi
manuscripts, Jewish, and Hebrew
-
Dataset
Digitised Hebrew Manuscripts: Add 10455 to Or 1099
This dataset comprises 39 digitised Hebrew manuscripts (1200 - 1799), with their shelfmarks in alphabetical order (Add 10455 - Or 1099). These manuscripts are out of copyright. These JPEG files were converted from TIFF files using IrfanView, and then further compressed using JPEGMini. They can be used for image and...Cronin, Catherine
manuscripts, Jewish, and Hebrew
-
Dataset
Digitised Hebrew Manuscripts: Harley 5772 to Or 14580
This dataset comprises 42 digitised Hebrew manuscripts (1200 - 1871), with their shelfmarks in alphabetical order (Harley 5772 - Or 14580). These manuscripts are out of copyright. These JPEG files were converted from TIFF files using IrfanView, and then further compressed using JPEGMini. They can be used for image and...Cronin, Catherine
manuscripts, Jewish, and Hebrew
-
Dataset
Digitised Hebrew Manuscripts: Add Ch 1250 to Or 11625
This dataset comprises 78 digitised Hebrew manuscripts (1182 - 1895), with their shelfmarks in alphabetical order (Add Ch 1250 to Or 11625). These manuscripts are out of copyright. These JPEG files were converted from TIFF files using IrfanView, and then further compressed using JPEGMini. They can be used for image...Cronin, Catherine
manuscripts, Jewish, and Hebrew
-
Dataset
Digitised Hebrew Manuscripts: Or 46 to Sloane 2642
This dataset comprises 35 digitised Hebrew manuscripts (1000 - 1899), with their shelfmarks in alphabetical order (Or 46 - Sloane 2642). These manuscripts are out of copyright. These JPEG files were converted from TIFF files using IrfanView, and then further compressed using JPEGMini. They can be used for image and...Cronin, Catherine
manuscripts, Jewish, and Hebrew
-
Dataset
Digitised Hebrew Manuscripts: Add MS 5242 to Arundel Or 50
This dataset comprises 22 digitised Hebrew manuscripts (1100 - 1799), with their shelfmarks in alphabetical order (Add MS 5242 to Arundel Or 50). These manuscripts are out of copyright. These JPEG files were converted from TIFF files using IrfanView, and then further compressed using JPEGMini. They can be used for...King, Ellie
manuscripts, Jewish, and Hebrew
-
Dataset
Digitised Hebrew Manuscripts: Metadata
This dataset contains metadata (TEI XML catalogue records) of all British Library digitised Hebrew manuscripts. These TEI XML files can be opened and manipulated using an XML editor.Keinan-Schoonbaert, Adi
manuscripts, Jewish, and Hebrew
-
Dataset
Digitised Quarterly Lists XML and Metadata
Two Centuries of Indian Print 1867-1947. The files in this dataset are derived from the British Library’s collection of bound volume Quarterly Lists: printed catalogue records of Indian books published quarterly and by province of British India between 1867 and 1947. The dataset comprises text from the collection of digitised...Derrick, Tom
-
Dataset
Digitised Quarterly Lists PDFs and Metadata
The files in this dataset are derived from the British Library’s collection of bound volume Quarterly Lists: printed catalogue records of Indian books published quarterly and by province of British India between 1867 and 1947. The dataset comprises full-text searchable PDFs of 215 volumes as well as the associated metadata...Derrick, Tom
-
Dataset
Volumes of Lysons Collectanea (Amusements), comprising broadsides, cuttings, advertisements on amusements 1660-1840
The dataset comprises nine digitised volumes of a collection of broadsides, cuttings and advertisements, relating to public exhibitions and places of amusement from 1660 - 1840 (with OCR-derived text). Part of the Lysons Collectanea collection.British Library
amusements, text, newspapers, broadsides, OCR, and adverts
-
Dataset
Volumes of Lysons Collectanea (Trades), comprising advertisements, cuttings, and illustrations relating to trades, professions, medical cures. 1660-1825.
The dataset comprises the OCR text derived from four digitised volumes of a collection of advertisements, cuttings and illustrations relating to trades, professions and medical cures from 1660 - 1825.British Library
text, newspapers, OCR, trades, and adverts
-
Dataset
Volumes of Madden's cuttings, views, and pamphlets about the British Museum. 1755-1870.
The dataset comprises four digitised volumes of a collection of cuttings, views and pamphlets made by Sir Frederic Madden about the British Museum, dating 1755 - 1870 (with OCR-derived text).British Library
British Museum, text, and OCR