Search
Search results
-
Dataset
OCR text derived from digitised books published 1700 - 1799 in ALTO XML.
The dataset comprises text from the collection of digitised books created using Optical Character Recognition (OCR) technology. The books cover a wide range of subject areas including philosophy, history, poetry and literature. The dataset is in Analysed Layout and Text Object (ALTO) Extensible Markup Language (XML) format.
British Library ; British Library Labs
XML, books, digitised, metadata, Microsoft, eighteenth century, and ALTO
-
Dataset
OCR text derived from digitised books published 1840 - 1849 in ALTO XML
This set consists of 4070 volumes, published between 1840 and 1849. The dataset comprises text from the collection of digitised books created using Optical Character Recognition (OCR) technology. The books cover a wide range of subject areas including philosophy, history, poetry and literature. The dataset is in Analysed Layout and Text Object (ALTO)...
British Library ; British Library Labs
XML, books, digitised, metadata, Microsoft, nineteenth century, and ALTO
-
Dataset
OCR text derived from digitised books published c. 1510 - 1699 in ALTO XML
This set consists of 693 volumes, published between c. 1510 and 1699. The dataset comprises text from the collection of digitised books created using Optical Character Recognition (OCR) technology. The books cover a wide range of subject areas including philosophy, history, poetry and literature. The dataset is in Analysed Layout and...
British Library ; British Library Labs
XML, sixteenth century, books, digitised, seventeenth century, Microsoft, ALTO, and metadata
-
Dataset
OCR text derived from digitised books published 1880 - 1889 in ALTO XML
This set consists of 10856 volumes, published between 1880 and 1889. The dataset comprises text from the collection of digitised books created using Optical Character Recognition (OCR) technology. The books cover a wide range of subject areas including philosophy, history, poetry and literature. The dataset is in Analysed Layout and Text Object (ALTO)...
British Library ; British Library Labs
XML, books, digitised, metadata, Microsoft, ALTO, and nineteenth century
-
Dataset
OCR text derived from digitised books published 1900 - c. 1946. ALTO XML.
Unfortunately, we are unable to make this dataset available for copyright reasons. This set consists of 1251 volumes, published between 1900 and 1946. The dataset comprises text from the collection of digitised books created using Optical Character Recognition (OCR) technology. The books cover a wide range of subject areas including philosophy, history,...
British Library ; British Library Labs
-
Dataset
Digitised Books - Images identified as Medium Sized Images. c. 1567 - c. 1900. JPG
The dataset comprises c. 217,101 images identified as 'Medium Sized Images' from the British Library's Flickr Commons collections, dating between c. 1567 - c. 1900. The images were algorithmically gathered from 49,455 digitised books, equating to 65,227 volumes (25+ million pages), published between c. 1510 - c. 1900; Medium Sized...
British Library ; British Library Labs
medium sized images, books, images, digitised, and Microsoft
-
Dataset
Digitised Books. c. 1510 - c. 1900. JSON (OCR derived text)
The dataset comprises text created by OCR from the 49,455 digitised books, equating to 65,227 volumes (25+ million pages), published between c. 1510 - c. 1900. The books cover a wide range of subject areas including philosophy, history, poetry and literature. The dataset is in JavaScript Object Notation (JSON) text...
British Library ; British Library Labs
-
Dataset
Digitised Books - Images identified as Medium Sized Images. c. 1567 - c. 1900. JPG
The dataset comprises c. 217,101 images identified as 'Medium Sized Images' from the British Library's Flickr Commons collections, dating between c. 1567 - c. 1900. The images were algorithmically gathered from 49,455 digitised books, equating to 65,227 volumes (25+ million pages), published between c. 1510 - c. 1900; Medium Sized...
British Library ; British Library Labs
digitised, books, Microsoft, images, and medium sized images
-
Dataset
Digitised Books - Images identified as Embellishments. c. 1510 - c. 1900. JPG
The images were algorithmically gathered from 49,455 digitised books, equating to 65,227 volumes (25+ million pages), published between c. 1510 - c. 1900. The books cover a wide range of subject areas including philosophy, history, poetry and literature. The images are in .JPEG format.
British Library ; British Library Labs
embellishments, digitised, books, Microsoft, and images
-
Dataset
Digitised Books - Images identified as Plates. c. 1528 - c. 1900. JPG
The dataset comprises c. 385,237 images identified as 'Plates' from the British Library's Flickr Commons collections, dating between c. 1528 – c. 1900. The images were algorithmically gathered from 49,455 digitised books, equating to 65,227 volumes (25+ million pages), published between c. 1510 - c. 1900; Plates have currently been...
British Library ; British Library Labs
-
Dataset
Digitised Books - Images of the bound covers of books. c. 1510 - c. 1900. JPG
The dataset comprises c. 61,561 images identified as 'Book Covers' from the British Library's Flickr Commons collections, dating between c. 1510 - c. 1900.
British Library ; British Library Labs
digitised, books, Microsoft, images, and bookcovers
-
Dataset
John Jaffray dataset; a hand list of printed books and scrap books compiled by Jaffray relating to bookbinding and trade unionism in (mainly) 19th century Victorian London
This set comprises 169 records on 222 pages of a PDF listing the contents of the Jaffray Collection (shelf mark Jaff 1 to Jaff 169) composed using free text. John Jaffray (1811-1869) was a bookbinder in Victorian London, interested in bookbinding, trade unionism and Chartism. This is a restricted collection...
Marks, P. J. M.
John Jaffray bookbinder, 19th century, trade unionism, and bookbinding
-
Dataset
StopsGB: Structured Timeline of Passenger Stations in Great Britain
Michael Quick's book _Railway Passenger Stations in Great Britain: a Chronology_ offers a uniquely rich and detailed account of Britain's changing railway infrastructure. Its listing of over 12,000 stations allows us to reconstruct the coming of rail at both micro- and macro-scales. However, being published originally as a book (and...
-
Dataset
British Library Television News Programme-Level List
This list provides a programme-level record of all television news and current affairs programmes recorded by the British Library’s Broadcast News service between March 2010 and May 2022. All of the channels featured were receivable free-to-air in the UK and licensed by Ofcom. All of the programmes listed can be...
British Library
television, news, and current affairs
-
Dataset
Selected edge painting on British Library printed books: A work in progress
Bookbindings were (and are) sometimes decorated by painting the edges of the leaves, usually but not exclusively the fore edges of text blocks. The painting can be visible when the book is closed, or hidden beneath a layer of gold when the edges have been gilt. This dataset covers examples...
Marks, P.J.M.
bindings, painting under gilt, foreedge, fanned out leaves, fore-edge painting, foreedge paintings, fore-edge paintings, hidden fore edge paintings, fore-edge, and bookbindings
-
Dataset
Faber Music and Music Sales Publications 2013 to 2018
The ‘Faber Music and Music Sales Publications 2013 to 2018’ dataset is an .xlsx (Excel Workbook) file containing metadata describing 57,202 digital and printed music publications published by Faber Music and Music Sales between 2013 and 2018 and deposited at the British Library under legal deposit legislation. The data was...
Roper, Amelie ; British Library
music sales, legal deposit, digital music publications, and Faber Music
-
Dataset
Publishers’ Plate Numbers 1850-
Publishers’ plate numbers are a crucial element in dating 18th and 19th century music, which very rarely carries a publication date. MacLachlan's list supplements the publication “English music publishers' plate numbers in the first half of the nineteenth century” (London, Faber, 1965) by O.W. Neighbour and A. Tyson. He continues...
MacLachlan, David
nineteenth century, music publishers, music, and plate numbers
-
Dataset
The Liverpool Standard etc
The Liverpool Standard and General Commercial Advertiser (1832-1856, with two changes of title) was a Conservative newspaper established by local politicians to counter the rise of Radicalism and promote “Church and State” ideology.
British Library
-
Dataset
The Northern Daily Times etc
The Liverpool-based Northern Daily Times (1853-1861, with two changes of title) was the first provincial daily newspaper in England to enjoy a sustained run. It was also one of the very first one penny dailies.
British Library
-
Dataset
The Sun
The Sun was a daily evening newspaper founded in 1792 with the support of the then Prime Minister, William Pitt, and his Tory government. By the mid-1830s the politics of the newspaper had shifted, and it was advocating liberal and free trade principles. The newspaper ran from 1792 to 1871; the dataset covers 1801-1871.
British Library
-
Dataset
The Express
The Express (1846-1869) was an evening newspaper companion to the Daily News (1846-1912), published by Bradbury & Evans, and advocating reformist principles.
British Library
-
Dataset
The Press.
The Press (1853-1866) was a weekly conservative newspaper, to which Benjamin Disraeli regularly contributed.
British Library
-
Dataset
The Star
The Star (1788-1831, dataset 1801-1831) was the first daily London evening newspaper. Its circulation was facilitated by the success of the mail-coach service.
British Library
-
Dataset
National Register.
The National Register (1808-1823) was a Conservative Sunday newspaper, owned by John Browne Bell, which was hostile to parliamentary reform.
British Library
-
Dataset
The British Press; or, Morning Literary Advertiser
The British Press (1803-1826) was a daily newspaper founded in January 1803 in opposition to The Morning Post, with a conservative orientation. It printed the latest news, from home and abroad, for a London readership, and provided early journalistic employment for Charles Dickens.
British Library
-
Dataset
Colored News
Colored News (1855) was an illustrated general interest weekly newspaper. It was the first British newspaper to publish illustrations in colour.
British Library
-
Dataset
Halifax Local Opinion
The Halifax Local Opinion was a weekly newspaper which has been digitised by the British Library for the Living with Machines project. This dataset (BLNewspapers_HalifaxLocalOpinion0003063_1892.zip) is currently unavailable due to a technical glitch when uploading larger files into the repository. Hopefully this will be resolved and the dataset will...
British Library
-
Dataset
The Blackpool Gazette & Herald
The Blackpool Gazette & Herald (1874 - 1919) was a weekly newspaper which has been digitised by the British Library for the Living with Machines project. All but one of these datasets are currently unavailable due to a technical glitch when uploading larger files into the repository. Hopefully this will...
British Library
-
Dataset
Supporting documentation for A Literature Review of Palm Leaf Manuscript Conservation: Parts 1 and 2
Part 1: a historic overview, leaf preparation, materials and media, palm leaf manuscripts at the British Library and the common types of damage. Part 2: historic and current conservation treatments, boxing and storage, religious and ethical issues, recommendations. The closure of the British Library during the 2020-2021 Covid-19 pandemic allowed...
-
Dataset
Ground Truth transcriptions for training OCR of historical Bengali printed texts – Recognition of Early Indian Printed Documents competition - updated with improved XML coordinates
This dataset comprises 81 digitised images (TIFF files) drawn from a selection of early printed Bengali books (1713-1914) digitised through the Two Centuries of Indian Print project (https://www.bl.uk/projects/two-centuries-of-indian-print). Also contained are ground truth transcriptions (XML) for each page that can be used for training optical character recognition software on historical...
British Library ; Derrick, Tom
OCR, Indian, and transcription
-
Dataset
Dataset mapping the movement of Salkey's correspondents across the globe
Microsoft CSV file dataset created for Kepler to map the movement of Salkey's correspondents across the globe.
British Library
-
Dataset
Gephi Dataset for "Mapping Caribbean Diasporic Networks through Correspondence"
Microsoft CSV file dataset created in Gephi that can be loaded into Gephi to create the visualisation of the network.
British Library
-
Dataset
Spatial network dataset for "Mapping the Caribbean Diaspora through Andrew Salkey"
Microsoft CSV file dataset created for Kepler, mapping the geographical movement of correspondents.
British Library
-
Dataset
Kepler Dataset for "Mapping the Caribbean Diaspora through Andrew Salkey's Correspondence"
Dataset created in Kepler to map the movement of the Caribbean diasporic network present in Andrew Salkey's correspondence files.
British Library
-
Dataset
All Data for "Mapping the Caribbean Diaspora through Andrew Salkey's Correspondence"
Microsoft Excel file of all the metadata created by the project.
British Library
-
Dataset
UK Doctoral Thesis Metadata from EThOS
This dataset has been superseded by a more recent version: https://doi.org/10.23636/vtpx-we51. If you require access to an earlier version, please email openaccess@bl.uk, including the dataset title, date, and DOI in your request. The data in this collection comprises the bibliographic metadata for all UK doctoral theses listed in EThOS, the...
British Library ; Rosie, Heather
thesis, student, UK, dissertations, PhD, research, doctoral, EThOS, higher education, and theses
-
Dataset
UK Doctoral Thesis Metadata from EThOS
This dataset has been superseded by a more recent version: https://doi.org/10.23636/kvwc-ty06. If you require access to an earlier version, please email openaccess@bl.uk, including the dataset title, date, and DOI in your request. The data in this collection comprises the bibliographic metadata for all UK doctoral theses listed in EThOS, the...
British Library ; Rosie, Heather
thesis, student, UK, dissertations, PhD, theses, doctoral, EThOS, higher education, and research
-
Dataset
Incunabula Printed Catalogue Dataset Metadata: Volumes 1-10
This dataset includes the combined catalogue entries derived from volumes 1-10 of the "Catalogue of books printed in the 15th century now at the British Museum" (known as BMC). The BMC was published between 1908 and 2007 and comprises detailed descriptions of the incunabula collection at the British Library. The dataset was...
British Library
datasets, catalogues, early printing, incunabula, early printed books, metadata, and book history
-
Dataset
Incunabula Printed Catalogue Dataset: Volumes 1-10
This dataset includes the catalogue entries derived from volumes 1-10 of the "Catalogue of books printed in the 15th century now at the British Museum" (known as BMC). The BMC was published between 1908 and 2007 and comprises detailed descriptions of the incunabula collection at the British Library. The dataset was created...
British Library
datasets, catalogues, early printing, book history, early printed books, metadata, and incunabula
-
Dataset
Text extracted from digitised maps of eastern Africa circa 1880-1940
This dataset comprises an Excel spreadsheet of text extracted from almost 2,000 digital images of maps and documents held in the War Office Archive, covering a large part of eastern Africa between c.1880 and 1940. The items were catalogued and digitised with generous funding from Indigo Trust. The harvested text...
Dykes, Nick
War Office Archive, place names, text extraction, military maps, East Africa, computer vision, land use, colonial history, and ethnography
-
Dataset
DeezyMatch training set for OCR
Optical character recognition (OCR) is the process of automatically transcribing text from images. The presence of OCR-induced errors in digitised text is a common problem in the digital humanities. OCR errors are usually due to the misrecognition of characters, such as "h" recognised as "b", or "c" recognised as "o"....
-
Dataset
Datasets for toponym recognition and disambiguation for nineteenth-century English newspapers
We present two datasets, one for the task of toponym recognition and one for the task of toponym disambiguation. The datasets are derived from the "Dataset for Toponym Resolution in Nineteenth-Century English Newspapers" (DOI: https://doi.org/10.23636/r7d4-kw08). The toponym recognition dataset consists of two JSON files (ner_fine_train.json and ner_fine_dev.json), whereas the toponym...
Coll Ardanuy, Mariona ; Nanni, Federico
toponym disambiguation, nineteenth-century newspapers, named entity recognition, entity linking, toponym resolution, toponym recognition, and dataset
-
Dataset
EAP031 Catalogue Metadata
This Excel spreadsheet contains the metadata that describes the archival collection digitised in Bulgaria by the EAP031 "The Treasures of Danzan Ravjaa" project team. The metadata was originally created by the EAP031 project team that digitised the archive in 2005. The project team was led by Professor Caroline Humphrey. This...
EAP031 Project Team
metadata, manuscripts, and Tibetan
-
Dataset
EAP696 Catalogue Metadata
This Excel spreadsheet contains the metadata that describes the archival collection digitised in Bulgaria by the EAP696 "Minority press in Ottoman Turkish in Bulgaria" project team. The metadata was originally created by the EAP696 project team that digitised the archive in 2014. The project team was led by Mr Stoyan...
EAP696 Project Team
-
Dataset
UK Doctoral Thesis Metadata from EThOS
The data in this collection comprises the bibliographic metadata for all UK doctoral theses listed in EThOS, the UK's national thesis service. We estimate the data covers around 98% of all PhDs ever awarded by UK Higher Education institutions, dating back to 1787. Thesis metadata from every PhD-awarding university in...
British Library ; Rosie, Heather
higher education, student, UK, dissertations, PhD, theses, doctoral, ethos, thesis, and research
-
Dataset
UK Doctoral Thesis Metadata from EThOS
This dataset has been superseded by a more recent version: https://doi.org/10.23636/rcm4-zk44. If you require access to an earlier version, please email openaccess@bl.uk, including the dataset title, date, and DOI in your request. The data in this collection comprises the bibliographic metadata for all UK doctoral theses listed in EThOS, the...
British Library ; Rosie, Heather
higher education, dissertations, PhD, doctoral, and EThOS
-
Dataset
SherlockNet data
Using Convolutional Neural Networks to Explore Over 400 Years of Book Illustrations: Starting from February 2016, as part of the British Library Labs Competition, we embarked on a collaboration with the British Library Labs and the British Museum to tag and caption the entire British Library 1M Collection, a set...
Zhao, Luda ; Do, Brian ; Wang, Karen
tagging, Flickr, images, tags, digitised, Microsoft, sherlocknet, and books
-
Dataset
Incunabula Printed Catalogue Dataset: Volumes 1-10 copy of GitHub repository
This dataset includes the GitHub repository used to derive catalogue entries from volumes 1-10 of the "Catalogue of books printed in the 15th century now at the British Museum" (known as BMC). The BMC was published between 1908 and 2007 and comprises detailed descriptions of the incunabula collection at the British Library....
British Library
book history, metadata, catalogues, datasets, incunabula, early printed books, and early printing
-
Dataset
Glasgow Courier
The Glasgow Courier was a thrice-weekly, later bi-weekly, newspaper which has been digitised by the British Library for the Living with Machines project.
British Library