Search Results
-
Dataset
IMPACT Digitisation Centre of Competence Dataset
The IMPACT Centre of Competence dataset contains more than half a million representative text-based images compiled by a number of major European libraries. Covering texts from as early as 1500, and containing material from newspapers, books, pamphlets and typewritten notes, the dataset is an invaluable resource for future research into...
Universitat d’Alacant ; Instituut voor de Nederlandse Taal ; Koninklijke Bibliotheek ; Bibliothèque Nationale de France ; British Library …
-
Dataset
UK Selective Web Archive Classification Dataset. 1996 - 2010. TSV.
The dataset comprises a manually curated selective archive produced by UKWA, which includes the classification of sites into a two-tiered subject hierarchy. In partnership with the Internet Archive and JISC, UKWA had obtained access to the subset of the Internet Archive’s web collection that relates to the UK. The JISC...
UK Web Archive
archive, web domain dataset, JISC UK, classification dataset, UKWA Open Data, and 1996-2014
-
Dataset
JISC UK Web Domain Dataset Crawled URL Index. 1996 - 2013. CDX.
The dataset comprises original compound index (CDX) files that have been reassembled into 18 separate CDX files, one for each year of crawling activity represented (1996 - 2013). Please note that the individual CDX files are not sorted. In order to enable access to web archives, UKWA uses CDX files to...
UKWA Open Data
archive, 1996-2013, crawled URL index, web domain dataset, JISC UK, and UKWA Open Data
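Because the CDX files in this dataset are not sorted, they would normally need sorting before being used for URL lookups. The sketch below is a minimal illustration, assuming each file may start with a header line such as ` CDX N b a m s k r M S V g` (a common 11-field layout, assumed rather than confirmed for this dataset). It sorts the remaining lines bytewise, which mirrors running `LC_ALL=C sort` over the file. The helper names and sample records are hypothetical.

```python
from typing import Dict, Iterable, List


def sort_cdx_lines(lines: Iterable[str]) -> List[str]:
    """Return CDX lines with any header first and records sorted bytewise.

    Assumption: a header line, if present, is the first non-empty line and
    starts with " CDX". Sorting on UTF-8 bytes mimics `LC_ALL=C sort`, which
    groups records by their URL key and then by timestamp.
    """
    header: List[str] = []
    records: List[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line:
            continue  # skip blank lines
        if not header and not records and line.startswith(" CDX"):
            header.append(line)
        else:
            records.append(line)
    records.sort(key=lambda rec: rec.encode("utf-8"))
    return header + records


def parse_cdx_record(header: str, record: str) -> Dict[str, str]:
    """Map each space-separated field of a record to its header letter."""
    letters = header.split()[1:]  # drop the literal "CDX" token
    return dict(zip(letters, record.split(" ")))


# Hypothetical records for illustration only.
sample = [
    " CDX N b a m s k r M S V g\n",
    "uk,co,zeta)/ 20050101000000 http://zeta.co.uk/ text/html 200 AAA - - 100 0 a.arc.gz\n",
    "uk,co,alpha)/ 20070101000000 http://alpha.co.uk/ text/html 200 BBB - - 120 0 b.arc.gz\n",
    "uk,co,alpha)/ 20030101000000 http://alpha.co.uk/ text/html 404 CCC - - 90 0 c.arc.gz\n",
]
sorted_lines = sort_cdx_lines(sample)
first = parse_cdx_record(sorted_lines[0], sorted_lines[1])
print(first["b"], first["s"])  # 20030101000000 404
```

Reading the field letters from the header, rather than hard-coding positions, keeps the sketch usable if a file turns out to use a different CDX layout.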
-
Dataset
JISC UK Web Domain Dataset Format Profile. 1996 - 2010.
The dataset is a format profile, summarising media type (MIME type) data formats contained within all of the HTTP 200 OK responses in the 1996 - 2010 tranche of the JISC UK Web Domain Dataset. In partnership with the Internet Archive and JISC, UKWA had obtained access to the subset...
UK Web Archive
archive, 1996-2010, web domain dataset, JISC UK, UKWA Open Data, and format profile
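A format profile is essentially a frequency count of media types across the 200 OK responses. A minimal sketch, assuming the input is a list of (status code, server-reported Content-Type) pairs; the published profile may instead rely on format identification tools, which this does not attempt. `format_profile` is a hypothetical helper.

```python
from collections import Counter
from typing import Iterable, Tuple


def format_profile(responses: Iterable[Tuple[int, str]]) -> Counter:
    """Count media types across responses, keeping only HTTP 200 OK.

    Each response is a (status_code, content_type) pair. Parameters such as
    "; charset=utf-8" are dropped and the type is lower-cased, so
    "Text/HTML" and "text/html; charset=utf-8" are counted together.
    """
    profile: Counter = Counter()
    for status, content_type in responses:
        if status != 200:
            continue  # the profile only covers 200 OK responses
        media_type = content_type.split(";", 1)[0].strip().lower()
        profile[media_type or "(none)"] += 1
    return profile


responses = [
    (200, "text/html; charset=utf-8"),
    (200, "Text/HTML"),
    (404, "text/html"),
    (200, "image/jpeg"),
]
for media_type, count in format_profile(responses).most_common():
    print(media_type, count)
# text/html 2
# image/jpeg 1
```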
-
Dataset
JISC UK Web Domain Dataset Host Link Graph. 1996 - 2010. TSV.
The dataset comprises ~2.5 billion 200 OK responses from the 1996 - 2010 tranche of the JISC UK Web Domain Dataset which have been scanned for hyperlinks. For each link, UKWA extracts the host that the link targets, and uses this to build up a picture of which hosts have...
UKWA Open Data
archive, 1996-2012, web domain dataset, JISC UK, host link graph, and UKWA Open Data
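The host link graph collapses page-to-page links into host-to-host links. A minimal sketch of that aggregation step, assuming the input is (page URL, href) pairs; it counts links per (source host, target host) pair and does not reproduce any per-year breakdown the real dataset may carry. `host_link_graph` is a hypothetical helper.

```python
from collections import Counter
from typing import Iterable, Tuple
from urllib.parse import urljoin, urlsplit


def host_link_graph(links: Iterable[Tuple[str, str]]) -> Counter:
    """Aggregate page-level links into (source host, target host) counts.

    Relative hrefs are resolved against the page URL, and urlsplit's
    .hostname lower-cases host names. Links without a host (mailto:,
    javascript:, etc.) are skipped.
    """
    graph: Counter = Counter()
    for page_url, href in links:
        source = urlsplit(page_url).hostname
        target = urlsplit(urljoin(page_url, href)).hostname
        if source and target:
            graph[(source, target)] += 1
    return graph


links = [
    ("http://www.bl.uk/a", "http://www.gov.uk/"),
    ("http://www.bl.uk/b", "HTTP://WWW.GOV.UK/x"),
    ("http://www.bl.uk/c", "/about"),
    ("http://www.bl.uk/d", "mailto:someone@example.org"),
]
graph = host_link_graph(links)
print(graph[("www.bl.uk", "www.gov.uk")])  # 2
```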
-
Dataset
JISC UK Web Domain Dataset Geoindex. 1996 - 2010. TSV.
The dataset comprises ~2.5 billion 200 OK responses in the 1996 - 2010 tranche of the JISC UK Web Domain Dataset which have been scanned for geographic references, specifically postcodes. This set of postcode citations, found at particular URLs and crawled at particular times, forms an historical geoindex...
UK Web Archive
archive, 1996-2011, JISC UK, geoindex, UKWA Open Data, and web domain dataset
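Each geoindex entry ties a postcode to the URL and crawl time where it was found. A minimal sketch, assuming a simplified UK postcode regular expression; it accepts some strings that are not valid postcodes and is not the extraction method UKWA used. The function name, URL and timestamp are hypothetical.

```python
import re
from typing import List, Tuple

# Simplified UK postcode pattern: outward code, optional space, inward code.
# An assumption for illustration, not a validating postcode parser.
POSTCODE = re.compile(r"\b([A-Z]{1,2}[0-9][A-Z0-9]?) ?([0-9][A-Z]{2})\b")


def geoindex_rows(url: str, timestamp: str, text: str) -> List[Tuple[str, str, str]]:
    """Return (postcode, url, timestamp) rows for each postcode found in text."""
    rows = []
    for match in POSTCODE.finditer(text):
        postcode = f"{match.group(1)} {match.group(2)}"  # normalise spacing
        rows.append((postcode, url, timestamp))
    return rows


text = "Offices in London NW1 2DB and Boston Spa LS237BQ."
for row in geoindex_rows("http://www.example.co.uk/contact", "20080514120000", text):
    print(row)
# ('NW1 2DB', 'http://www.example.co.uk/contact', '20080514120000')
# ('LS23 7BQ', 'http://www.example.co.uk/contact', '20080514120000')
```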
-
Dataset
Digitised 19th Century Books - Metadata - 01/09/2013
The dataset holds metadata on the books digitised within this collection, providing a quick means to connect a book identifier with some of the key bibliographic metadata about it. The metadata is held in JSON format for ease of reuse.
British Library ; British Library Labs
JSON, microsoft, books, metadata, and bibliographic
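Connecting a book identifier with its metadata amounts to building a lookup table from the JSON. A minimal sketch, assuming the file is a JSON array of objects with an `identifier` field; that field name and the sample records are hypothetical and should be checked against the real file.

```python
import json
from typing import Dict, List


def index_by_identifier(records: List[dict], key: str = "identifier") -> Dict[str, dict]:
    """Build a lookup table from book identifier to its metadata record.

    `key` is a hypothetical field name. Records without it are skipped.
    """
    index: Dict[str, dict] = {}
    for record in records:
        ident = record.get(key)
        if ident is not None:
            index[str(ident)] = record
    return index


# Hypothetical sample in the assumed shape.
raw = (
    '[{"identifier": "000000001", "title": "An example title", "date": "1850"},'
    ' {"title": "A record with no identifier"}]'
)
books = index_by_identifier(json.loads(raw))
print(books["000000001"]["title"])  # An example title
```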
-
Dataset
Digitised Books - Flickr Tag History - Dec 2013 to March 2016. TSV
The dataset comprises user-submitted tags (with dates) added to the British Library Flickr Commons collections up to March 2016. The images were algorithmically gathered from 49,455 digitised books, equating to 65,227 volumes (25+ million pages), published between c. 1510 - c. 1900. The books cover a wide range of...
British Library ; British Library Labs
tag history, microsoft, books, digitised, tags, Flickr, tagging, and TSV
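A quick way to explore a tag history is to count tags by the year they were added. A minimal sketch, assuming the TSV has a header row and a `date` column whose values begin with a four-digit year; the column names in the sample (`flickr_id`, `tag`, `date`) are hypothetical.

```python
import csv
import io
from collections import Counter


def tags_per_year(tsv_text: str, date_column: str = "date") -> Counter:
    """Count tags by the year they were added.

    Assumes a header row and dates such as "2014-01-02". QUOTE_NONE keeps
    any quote characters inside tags as literal text.
    """
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t", quoting=csv.QUOTE_NONE)
    counts: Counter = Counter()
    for row in reader:
        value = (row.get(date_column) or "").strip()
        if len(value) >= 4 and value[:4].isdigit():
            counts[int(value[:4])] += 1
    return counts


sample = (
    "flickr_id\ttag\tdate\n"
    "111\tmap\t2013-12-14\n"
    "222\tship\t2014-01-02\n"
    "333\tportrait\t2014-05-30\n"
)
counts = tags_per_year(sample)
print(sorted(counts.items()))  # [(2013, 1), (2014, 2)]
```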
-
Dataset
OCR text derived from digitised books (unknown precise publication dates) in ALTO XML
This set consists of 284 volumes. The dataset comprises text from the collection of digitised books created using Optical Character Recognition (OCR) technology. The books cover a wide range of subject areas including philosophy, history, poetry and literature. The dataset is in Analysed Layout and Text Object (ALTO) Extensible Markup Language...
British Library ; British Library Labs
XML, microsoft, books, digitised, metadata, nineteenth century, and ALTO
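ALTO XML stores OCR output as nested layout elements, with each recognised word held in the `CONTENT` attribute of a `String` element inside a `TextLine`. A minimal sketch that rebuilds plain-text lines using only the standard library; it ignores namespaces because ALTO versions differ (the sample uses the ALTO v2 namespace, which may not match these files), and the fragment is hand-written, not taken from the dataset.

```python
import xml.etree.ElementTree as ET
from typing import List


def local_name(tag: str) -> str:
    """Strip any '{namespace}' prefix from an element tag."""
    return tag.rsplit("}", 1)[-1]


def alto_text_lines(xml_text: str) -> List[str]:
    """Rebuild plain-text lines from ALTO TextLine/String elements.

    Words are joined with single spaces rather than following SP elements.
    """
    root = ET.fromstring(xml_text)
    lines: List[str] = []
    for element in root.iter():
        if local_name(element.tag) == "TextLine":
            words = [
                child.get("CONTENT", "")
                for child in element
                if local_name(child.tag) == "String"
            ]
            lines.append(" ".join(words))
    return lines


sample = """<alto xmlns="http://www.loc.gov/standards/alto/ns-v2#">
  <Layout><Page><PrintSpace><TextBlock>
    <TextLine><String CONTENT="THE"/><SP/><String CONTENT="HISTORY"/></TextLine>
    <TextLine><String CONTENT="OF"/><SP/><String CONTENT="ENGLAND"/></TextLine>
  </TextBlock></PrintSpace></Page></Layout>
</alto>"""
print(alto_text_lines(sample))  # ['THE HISTORY', 'OF ENGLAND']
```

The same approach applies to every decade's ALTO set listed here, since they share the format.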
-
Dataset
OCR text derived from digitised books published 1890 - 1899 in ALTO XML
This set consists of 14,847 volumes, published between 1890-1899. The dataset comprises text from the collection of digitised books created using Optical Character Recognition (OCR) technology. The books cover a wide range of subject areas including philosophy, history, poetry and literature. The dataset is in Analysed Layout and Text Object (ALTO)...
British Library ; British Library Labs
XML, books, digitised, metadata, Microsoft, nineteenth century, and ALTO
-
Dataset
OCR text derived from digitised books published 1870 - 1879 in ALTO XML
This set consists of 8,630 volumes, published between 1870-1879. The dataset comprises text from the collection of digitised books created using Optical Character Recognition (OCR) technology. The books cover a wide range of subject areas including philosophy, history, poetry and literature. The dataset is in Analysed Layout and Text Object (ALTO)...
British Library ; British Library Labs
XML, books, digitised, metadata, Microsoft, nineteenth century, and ALTO
-
Dataset
OCR text derived from digitised books published 1850 - 1859 in ALTO XML
This set consists of 5,818 volumes, published between 1850-1859. The dataset comprises text from the collection of digitised books created using Optical Character Recognition (OCR) technology. The books cover a wide range of subject areas including philosophy, history, poetry and literature. The dataset is in Analysed Layout and Text Object (ALTO)...
British Library ; British Library Labs
XML, books, digitised, metadata, Microsoft, nineteenth century, and ALTO
-
Dataset
OCR text derived from digitised books published 1860 - 1869 in ALTO XML
This set consists of 7,498 volumes, published between 1860-1869. The dataset comprises text from the collection of digitised books created using Optical Character Recognition (OCR) technology. The books cover a wide range of subject areas including philosophy, history, poetry and literature. The dataset is in Analysed Layout and Text Object (ALTO)...
British Library ; British Library Labs
XML, books, digitised, metadata, Microsoft, nineteenth century, and ALTO
-
Dataset
OCR text derived from digitised books published 1880 - 1889 in ALTO XML
This set consists of 10,856 volumes, published between 1880-1889. The dataset comprises text from the collection of digitised books created using Optical Character Recognition (OCR) technology. The books cover a wide range of subject areas including philosophy, history, poetry and literature. The dataset is in Analysed Layout and Text Object (ALTO)...
British Library ; British Library Labs
XML, books, digitised, metadata, Microsoft, ALTO, and nineteenth century
-
Dataset
OCR text derived from digitised books published 1830 - 1839 in ALTO XML
This set consists of 2,639 volumes, published between 1830-1839. The dataset comprises text from the collection of digitised books created using Optical Character Recognition (OCR) technology. The books cover a wide range of subject areas including philosophy, history, poetry and literature. The dataset is in Analysed Layout and Text Object (ALTO)...
British Library ; British Library Labs
XML, books, digitised, metadata, Microsoft, nineteenth century, and ALTO
-
Dataset
OCR text derived from digitised books published 1810 - 1819 in ALTO XML
This set consists of 2,338 volumes, published between 1810-1819. The dataset comprises text from the collection of digitised books created using Optical Character Recognition (OCR) technology. The books cover a wide range of subject areas including philosophy, history, poetry and literature. The dataset is in Analysed Layout and Text Object (ALTO)...
British Library ; British Library Labs
XML, books, digitised, metadata, Microsoft, nineteenth century, and ALTO
-
Dataset
OCR text derived from digitised books published 1820 - 1829 in ALTO XML
This set consists of 2,739 volumes, published between 1820-1829. The dataset comprises text from the collection of digitised books created using Optical Character Recognition (OCR) technology. The books cover a wide range of subject areas including philosophy, history, poetry and literature. The dataset is in Analysed Layout and Text Object (ALTO)...
British Library ; British Library Labs
XML, books, digitised, metadata, Microsoft, nineteenth century, and ALTO
-
Dataset
OCR text derived from digitised books published 1700 - 1799 in ALTO XML
The dataset comprises text from the collection of digitised books created using Optical Character Recognition (OCR) technology. The books cover a wide range of subject areas including philosophy, history, poetry and literature. The dataset is in Analysed Layout and Text Object (ALTO) Extensible Markup Language (XML) format.
British Library ; British Library Labs
XML, books, digitised, metadata, Microsoft, eighteenth century, and ALTO
-
Dataset
OCR text derived from digitised books published 1800 - 1809 in ALTO XML
This set consists of 1,502 volumes, published between 1800-1809. The dataset comprises text from the collection of digitised books created using Optical Character Recognition (OCR) technology. The books cover a wide range of subject areas including philosophy, history, poetry and literature. The dataset is in Analysed Layout and Text Object (ALTO)...
British Library ; British Library Labs
XML, books, digitised, metadata, Microsoft, nineteenth century, and ALTO
-
Dataset
OCR text derived from digitised books published c. 1510 - 1699 in ALTO XML
This set consists of 693 volumes, published between c. 1510 - 1699. The dataset comprises text from the collection of digitised books created using Optical Character Recognition (OCR) technology. The books cover a wide range of subject areas including philosophy, history, poetry and literature. The dataset is in Analysed Layout and...
British Library ; British Library Labs
XML, sixteenth century, books, digitised, seventeenth century, Microsoft, ALTO, and metadata
-
Dataset
OCR text derived from digitised books published 1840 - 1849 in ALTO XML
This set consists of 4,070 volumes, published between 1840-1849. The dataset comprises text from the collection of digitised books created using Optical Character Recognition (OCR) technology. The books cover a wide range of subject areas including philosophy, history, poetry and literature. The dataset is in Analysed Layout and Text Object (ALTO)...
British Library ; British Library Labs
XML, books, digitised, metadata, Microsoft, nineteenth century, and ALTO
-
Dataset
Digitised Books - Images identified as Medium Sized Images. c. 1567 - c. 1900. JPG
The dataset comprises c. 217,101 images identified as 'Medium Sized Images' from the British Library's Flickr Commons collections, dating between c. 1567 - c. 1900. The images were algorithmically gathered from 49,455 digitised books, equating to 65,227 volumes (25+ million pages), published between c. 1510 - c. 1900; Medium Sized...
British Library ; British Library Labs
medium sized images, books, images, digitised, and Microsoft
-
Dataset
Digitised Books. c. 1510 - c. 1900. JSON (OCR derived text)
The dataset comprises text created by OCR from the 49,455 digitised books, equating to 65,227 volumes (25+ million pages), published between c. 1510 - c. 1900. The books cover a wide range of subject areas including philosophy, history, poetry and literature. The dataset is in JavaScript Object Notation (JSON) text...
British Library ; British Library Labs
-
Dataset
Digitised Books - Images identified as Embellishments. c. 1510 - c. 1900. JPG
The images were algorithmically gathered from 49,455 digitised books, equating to 65,227 volumes (25+ million pages), published between c. 1510 - c. 1900. The books cover a wide range of subject areas including philosophy, history, poetry and literature. The images are in .JPEG format.
British Library ; British Library Labs
embellishments, digitised, books, Microsoft, and images
-
Dataset
Digitised Books - Images of the bound covers of books. c. 1510 - c. 1900. JPG
The dataset comprises c. 61,561 images identified as 'Book Covers' from the British Library's Flickr Commons collections, dating between c. 1510 - c. 1900.
British Library ; British Library Labs
digitised, books, Microsoft, images, and bookcovers
-
Dataset
Digitised Books - Images identified as Plates. c. 1528 - c. 1900. JPG
The dataset comprises c. 385,237 images identified as 'Plates' from the British Library's Flickr Commons collections, dating between c. 1528 – c. 1900. The images were algorithmically gathered from 49,455 digitised books, equating to 65,227 volumes (25+ million pages), published between c. 1510 - c. 1900; Plates have currently been...
British Library ; British Library Labs
-
Dataset
Volumes of performances connected with Sir Henry Irving. 1879 - 1905.
Sir Henry Irving's American and Provincial Tours 1883 - 1905; miscellaneous performances, including some given by Royal Command, 1883 - 1903; Lyceum Theatre 1879 – 1902; and Drury Lane Theatre, 1903 and 1905. The collection was formed by Bram Stoker.
British Library
-
Dataset
Theatrical playbills from Britain and Ireland (OCR text only)
The dataset comprises 264 volumes of digitised theatrical playbills published between 1660 – 1902 (mostly 19th century) from England, Scotland, Wales and Ireland. The volumes were digitised from the British Library's physical collection of over 500 volumes of playbills. The dataset contains text files (.TXT) produced by Optical Character Recognition (OCR). The playbills...
British Library Labs
singlesheet, text, playbill, OCR, and playbills
-
Dataset
Portraits of actors, views of theatres and playbills (covering 1750 - 1821 in a single volume)
166-page PDF of collated portraits and views (with OCR-derived text). The dataset comprises one digitised volume (166 pages) of a collection of portraits of celebrated actors and actresses, views of theatres and playbills, dating 1750 - 1821. The dataset is in Portable Document Format (PDF).
British Library
text, theatres, views, portraits, actors, OCR, and playbills
-
Dataset
Volumes of Lysons Collectanea (Trades), comprising advertisements, cuttings, and illustrations relating to trades, professions, medical cures. 1660-1825.
The dataset comprises the OCR text derived from four digitised volumes of a collection of advertisements, cuttings and illustrations relating to trades, professions and medical cures from 1660 - 1825.
British Library
text, newspapers, OCR, trades, and adverts
-
Dataset
Volumes of Lysons Collectanea (Amusements), comprising broadsides, cuttings, advertisements on amusements 1660-1840
The dataset comprises nine digitised volumes of a collection of broadsides, cuttings and advertisements relating to public exhibitions and places of amusement from 1660 - 1840 (with OCR-derived text). The volumes are part of the Lysons Collectanea collection.
British Library
amusements, text, newspapers, broadsides, OCR, and adverts
-
Dataset
Volumes of portraits and biographies of officers in the South African wars collected by John Malcolm Bulloch. 1900 - 1902.
The dataset comprises six digitised volumes (in PDF) of a collection of portraits and biographical details of some officers distinguished in the South African War (1900 - 1902) (with OCR-derived text). The collection was formed by John Malcolm Bulloch.
British Library
South Africa, text, portraits, war, army, OCR, biographies, and biography
-
Dataset
Volumes of Madden's cuttings, views, and pamphlets about the British Museum. 1755-1870.
The dataset comprises four digitised volumes of a collection of cuttings, views and pamphlets made by Sir Frederic Madden about the British Museum, dating 1755 - 1870 (with OCR-derived text).
British Library
British Museum, text, and OCR
-
Dataset
Theatrical playbills from Britain and Ireland
The dataset comprises 264 volumes of digitised theatrical playbills published between 1660 – 1902 (mostly 19th century) from England, Scotland, Wales and Ireland. The volumes were digitised from the British Library's physical collection of over 500 volumes of playbills. The dataset is in Portable Document Format (PDF). The playbills cover theatres in Bath (Royal),...
British Library Labs ; Kirk, Tanya
singlesheet, playbill, and playbills
-
Dataset
Volume of Christmas ballads and broadsides. 1750 - 1840
110-page PDF of miscellaneous Christmas ballads and prose broadsides (with OCR-derived text). The dataset comprises one digitised volume (110 pages) of a collection of Christmas ballads and prose broadsides chiefly printed in London by J. Pitts between 1750 - 1840. The dataset is in Portable Document Format (PDF).
British Library
-
Dataset
Volumes of signs of taverns in England and Wales. 1628 - 1858
The dataset comprises 14 digitised volumes (as PDFs) of a collection of tavern signs in England and Wales dating 1628 – 1858 (with OCR-derived text).
British Library
-
Dataset
Results From A 2015 Survey On Git/Distributed Version Control At Imperial College London
These are the anonymised results from a survey run at Imperial College London in November-December 2015. The survey was aimed at users of distributed version control systems, in particular Git. Before publishing the results I deleted all comments to avoid individuals being identified. The survey was designed to...
Reimer, Torsten ; Boakye, Gifty
higher education, survey, Git, distributed version control, and software development
-
Dataset
Digitised Quarterly Lists PDFs and Metadata
The files in this dataset are derived from the British Library’s collection of bound volume Quarterly Lists: printed catalogue records of Indian books published quarterly and by province of British India between 1867 and 1947. The dataset comprises full-text searchable PDFs of 215 volumes as well as the associated metadata...
Derrick, Tom
-
Dataset
SherlockNet data
Using Convolutional Neural Networks to Explore Over 400 Years of Book Illustrations: Starting from February 2016, as part of the British Library Labs Competition, we embarked on a collaboration with the British Library Labs and the British Museum to tag and caption the entire British Library 1M Collection, a set...
Zhao, Luda ; Do, Brian ; Wang, Karen
tagging, Flickr, images, tags, digitised, Microsoft, sherlocknet, and books
-
Dataset
India Office Medical Archives samples (printed)
This dataset comprises 15 samples of digitised India Office Medical Archives on cholera and medical topography. All are licensed under the Open Government Licence. These PDF files relate to cholera and medical topography in British India and can be used for teaching, study, etc.
Moon, Antonia
-
Dataset
Digitised maps of the former British East Africa
This dataset comprises 581 images of maps of the former British East Africa created between 1890 and 1940 and a spreadsheet of related catalogue records. All are licensed under the Open Government Licence v1.0 (OGL). A user-friendly geographical search index of the maps is available on Google Maps. These JPEG files were converted from...
Dykes, Nick
War Office Archive, documents, Intelligence, Uganda, East Africa, British East Africa, Maps, Military maps, and Kenya
-
Dataset
India Office Medical Archives samples (small)
This dataset comprises 2 samples of digitised India Office Medical Archives on cholera and medical topography (one manuscript and one printed item). All are licensed under the Open Government Licence. These files relate to cholera and medical topography in British India and can be used for teaching, study, etc.
Moon, Antonia
-
Dataset
India Office Medical Archives samples (manuscripts)
This dataset comprises 13 samples of digitised India Office Medical Archives on cholera and medical topography. All are licensed under the Open Government Licence.
Moon, Antonia
-
Dataset
Linked Open British National Bibliography - Books. 1950- N-Triples and RDF/XML.
This dataset includes metadata for books published or distributed in the UK since 1950.
Deliot, Corine
British National Bibliography, BNB, NT, linked open data, RDF/XML, N-Triples, books, and metadata
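N-Triples, one of the two serialisations offered here, holds one `subject predicate object .` statement per line, which makes it easy to stream. For quick exploration a line can be split with a regular expression; the pattern below is a simplified sketch covering IRIs, blank nodes and literals with a language tag or datatype, not a conforming N-Triples parser (a dedicated RDF library such as rdflib is the safer choice for the full files). The example.org IRIs are hypothetical; only the Dublin Core and XML Schema namespaces are real.

```python
import re
from typing import Optional, Tuple

# Subject: IRI or blank node; predicate: IRI; object: IRI, blank node, or a
# quoted literal (escapes allowed) with an optional @lang or ^^<datatype>.
TRIPLE = re.compile(
    r'^(<[^>]*>|_:\S+)\s+(<[^>]*>)\s+'
    r'(<[^>]*>|_:\S+|"(?:[^"\\]|\\.)*"(?:@[A-Za-z0-9-]+|\^\^<[^>]*>)?)'
    r'\s*\.\s*$'
)


def parse_ntriple(line: str) -> Optional[Tuple[str, str, str]]:
    """Return (subject, predicate, object) for one N-Triples line, else None."""
    line = line.strip()
    if not line or line.startswith("#"):
        return None  # blank lines and comments carry no triple
    match = TRIPLE.match(line)
    return match.groups() if match else None


lines = [
    "# hypothetical sample",
    '<http://example.org/book/1> <http://purl.org/dc/terms/title> "A \\"quoted\\" title"@en .',
    '<http://example.org/book/1> <http://purl.org/dc/terms/issued> "1951"^^<http://www.w3.org/2001/XMLSchema#gYear> .',
    "<http://example.org/book/1> <http://purl.org/dc/terms/creator> _:author1 .",
]
triples = [t for t in map(parse_ntriple, lines) if t]
print(len(triples))  # 3
```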
-
Dataset
Digitised Quarterly Lists XML and Metadata
Two Centuries of Indian Print 1867-1947. The files in this dataset are derived from the British Library’s collection of bound volume Quarterly Lists: printed catalogue records of Indian books published quarterly and by province of British India between 1867 and 1947. The dataset comprises text from the collection of digitised...
Derrick, Tom
-
Dataset
UK Doctoral Thesis Metadata from EThOS
This dataset has been superseded by a more recent version: https://doi.org/10.23636/1137 If you require access to an earlier version, please email openaccess@bl.uk, including the dataset title, date, and DOI in your request. The data in this collection comprises the bibliographic metadata for all UK doctoral theses listed in EThOS, the UK's national thesis service. We...
British Library ; Rosie, Heather
ethos, dissertations, thesis, research, PhD, doctoral, student, UK, theses, Higher education, and HE
-
Dataset
Linked Open British National Bibliography - Forthcoming Books N-Triples and RDF/XML
This dataset includes metadata for forthcoming books to be published or distributed in the UK.
Deliot, Corine
British National Bibliography, forthcoming, linked open data, BNB, N-Triples, RDF/XML, NT, CIP, and metadata
-
Dataset
Linked Open British National Bibliography - Serials. 1950- N-Triples and RDF/XML
This dataset includes metadata for serials published or distributed in the UK since 1950.
Deliot, Corine
British National Bibliography, serials, linked open data, BNB, N-Triples, RDF/XML, NT, and metadata
-
Dataset
Books divided by Genre from the Digitised 19th century books dataset
A dataset derived from the Digitised 19th Century Books dataset which classifies the books by genre (Drama, Poetry, Prose, Music and unidentified). For Drama, Music and Prose several types were identified. For Drama: comedy, play, recitation and tragedy. For Prose: novel, parody, romance, satire, story, history subset of story and...
British Library ; British Library Labs
Music, Genre, Prose, books, Poetry, metadata, bibliographic, and Drama