Index Catalog // British Library

2023

Dataset

OCR and crowdsourced annotations, Language of Mechanisation, JSON files

Datasets created through crowdsourcing tasks run on the Zooniverse platform by the Living with Machines ‘language of mechanisation’ project team. Building on earlier work classifying machines by function, we asked volunteers on Zooniverse 'how did the word x change over time and place?' and presented them with options for...

British Library ; Vieira, Miguel ; Ong, Tiffany ; Ciula, Arianna

mechanisation, newspapers, Industrial Revolution, 19th century British English, historical newspapers, 19th century, analytics, data visualisation, crowdsourcing, transport history, and historical semantics

2023

Dataset

Language of Mechanisation: annotated historical newspaper articles

Datasets created through crowdsourcing tasks run on the Zooniverse platform by the Living with Machines ‘language of mechanisation’ project team. Building on earlier work classifying machines by function, we asked volunteers on Zooniverse 'how did the word x change over time and place?' and presented them with options for...

British Library ; Ridge, Mia ; Pedrazzini, Nilo ; McGillivray, Barbara

crowdsourcing, 19th century British English, annotation, historical newspapers, mechanisation, data visualisation, historical semantics, and transport history

2023

Dataset

UK Doctoral Thesis Metadata from EThOS

The data in this collection comprises the bibliographic metadata for all UK doctoral theses listed in EThOS, the UK's national thesis service. We estimate the data covers around 98% of all PhDs ever awarded by UK Higher Education institutions, dating back to 1787. Thesis metadata from every PhD-awarding university in...

British Library ; Rosie, Heather

higher education, student, UK, dissertations, PhD, theses, doctoral, EThOS, thesis, and research

2023

Dataset

EAP031 Catalogue Metadata

This Excel spreadsheet contains the metadata that describes the archival collection digitised in Mongolia by the EAP031 "The Treasures of Danzan Ravjaa" project team. The metadata was originally created by the EAP031 project team that digitised the archive in 2005. The project team was led by Professor Caroline Humphrey. This...

EAP031 Project Team

metadata, manuscripts, and Tibetan

2021

Dataset

EAP696 Catalogue Metadata

This Excel spreadsheet contains the metadata that describes the archival collection digitised in Bulgaria by the EAP696 "Minority press in Ottoman Turkish in Bulgaria" project team. The metadata was originally created by the EAP696 project team that digitised the archive in 2014. The project team was led by Mr Stoyan...

EAP696 Project Team

Turkish minority press, Bulgaria, and Ottoman Empire

2022

Dataset

MapReader_Data_SIGSPATIAL_2022

Hosseini, Kasra ; Wilson, Daniel C.S. ; Beelen, Kaspar ; McDonough, Katherine

Deep learning, Supervised learning, Computer vision, Historical maps, Digital libraries and archives, and Classification

2023

Dataset

Datasets for toponym recognition and disambiguation for nineteenth-century English newspapers

We present two datasets, one for the task of toponym recognition and one for the task of toponym disambiguation. The datasets are derived from the "Dataset for Toponym Resolution in Nineteenth-Century English Newspapers" (DOI: https://doi.org/10.23636/r7d4-kw08). The toponym recognition dataset consists of two JSON files (ner_fine_train.json and ner_fine_dev.json), whereas the toponym...

Coll Ardanuy, Mariona ; Nanni, Federico

toponym disambiguation, nineteenth-century newspapers, named entity recognition, entity linking, toponym resolution, toponym recognition, and dataset

2023

Dataset

DeezyMatch training set for OCR

Optical character recognition (OCR) is the process of automatically transcribing text from images. The presence of OCR-induced errors in digitised text is a common problem in the digital humanities. OCR errors are usually due to the misrecognition of characters, such as "h" recognised as "b", or "c" recognised as "o"....

Coll Ardanuy, Mariona ; Nanni, Federico ; Pedrazzini, Nilo

OCR, fuzzy string matching, string variation, newspapers, digital humanities, natural language processing, DeezyMatch, and Living with Machines

2022

Dataset

Diachronic word embeddings from 19th-century newspapers digitised by the British Library (1800-1919)

Word vectors related to the paper "Machines in the media: semantic change in the lexicon of mechanization in 19th-century British newspapers" by Nilo Pedrazzini and Barbara McGillivray (2022). The embeddings were trained on a 4.2-billion-word corpus of 19th-century British newspapers using Word2Vec and specific parameters. The embeddings are divided into...

Pedrazzini, Nilo ; McGillivray, Barbara

historical semantics, word-vectors, late-modern-english, newspapers, diachronic-embeddings, and word2vec

2023

Dataset

Decade-level Word2Vec models from automatically transcribed 19th-century newspapers digitised by the British Library (1800-1919)

Word embeddings trained on a 4.2-billion-word corpus of 19th-century British newspapers using Word2Vec and specific parameters. The embeddings are divided into ten-year periods. Unlike the embeddings in the related repository, these were not aligned, and OCR errors were not removed from the vocabulary. See related GitHub repository for the full documentation:...

Pedrazzini, Nilo

historical semantics, British newspapers, word embeddings, word vectors, word2vec, and Late Modern English

2023

Dataset

Incunabula Printed Catalogue Dataset: Volumes 1-10, copy of GitHub repository

This dataset includes the GitHub repository used to derive catalogue entries from volumes 1-10 of the "Catalogue of books printed in the 15th century now at the British Museum" (known as BMC). The BMC was published between 1908 and 2007 and comprises detailed descriptions of the incunabula collection at the British Library....

British Library

book history, metadata, catalogues, datasets, incunabula, early printed books, and early printing

2023

Dataset

Incunabula Printed Catalogue Dataset: Volumes 1-10

This dataset includes the catalogue entries derived from volumes 1-10 of the "Catalogue of books printed in the 15th century now at the British Museum" (known as BMC). The BMC was published between 1908 and 2007 and comprises detailed descriptions of the incunabula collection at the British Library. The dataset was created...

British Library

datasets, catalogues, early printing, book history, early printed books, metadata, and incunabula

2023

Dataset

Incunabula Printed Catalogue Dataset Metadata: Volumes 1-10

This dataset includes the combined catalogue entries derived from volumes 1-10 of the "Catalogue of books printed in the 15th century now at the British Museum" (known as BMC). The BMC was published between 1908 and 2007 and comprises detailed descriptions of the incunabula collection at the British Library. The dataset was...

British Library

datasets, catalogues, early printing, incunabula, early printed books, metadata, and book history

2023

Dataset

Diachronic and diatopic word embeddings from newspapers digitised by the British Library (1830-1889): North and South England

Diachronic word embeddings (decade-level) trained with Word2Vec (via Gensim) on different geographic subcorpora of the Heritage Made Digital British and the Living with Machines historical newspaper collections: North England (north.zip) and South England (south.zip). At the moment, for each subcorpus, Word2Vec models are available for each decade in the...

Pedrazzini, Nilo ; McGillivray, Barbara

historical semantics, diachronic embeddings, late modern English, word embeddings, word vectors, word2vec, and diatopic embeddings

2022

Dataset

Living with Machines Zooniverse Participant Survey

Summary results from a survey of contributors to Living with Machines Zooniverse crowdsourcing projects. Responses were received between 24 May and 13 June 2022. We designed the survey so that we could align our reporting with two other audience / participant research groups. Firstly, we used the demographic categories that...

British Library

online volunteering, digital participation, citizen science, citizen history, questionnaire, crowdsourcing, survey, and audience research

2023

Dataset

UK Doctoral Thesis Metadata from EThOS

This dataset has been superseded by a more recent version: https://doi.org/10.23636/rcm4-zk44. If you require access to an earlier version, please email openaccess@bl.uk, including the dataset title, date, and DOI in your request. The data in this collection comprises the bibliographic metadata for all UK doctoral theses listed in EThOS, the...

British Library ; Rosie, Heather

higher education, dissertations, PhD, doctoral, and EThOS

2023

Dataset

Review of Information Studies Courses in Higher Education for Web Archive Provision

This project reviewed the curricula of information studies postgraduate courses in a number of countries across Europe. The curricula for Library Studies, Archival Studies, Records Management, Digital Curation/Preservation, and Digital Humanities were reviewed to see whether they referred to web archiving. As these web pages were reviewed...

Boté-Vericad, Juan-José ; Byrne, Helena ; Healy, Sharon ; García, Mel ; Francis, Joan

training, third-level information management course provision, Britain, information management, Web Archiving, Ireland, and Spain

2023

Dataset

Dataset mapping the movement of Salkey's correspondents across the globe

CSV dataset created for Kepler to map the movement of Salkey's correspondents across the globe.

British Library

Caribbean diaspora, data visualisation, and networks

2023

Dataset

Gephi Dataset for "Mapping Caribbean Diasporic Networks through Correspondence"

CSV dataset created in Gephi, which can be loaded back into Gephi to recreate the network visualisation.

British Library

Caribbean diaspora, data visualisation, and networks

2023

Dataset

Spatial network dataset for "Mapping the Caribbean Diaspora through Andrew Salkey"

CSV dataset created for Kepler to map the geographical movement of correspondents.

British Library

Caribbean diaspora, networks, and data visualisations
