Index Catalog // British Library

2023

Dataset

OCR and crowdsourced annotations, Language of Mechanisation, JSON files

Datasets created through crowdsourcing tasks created on the Zooniverse crowdsourcing platform by the Living with Machines ‘language of mechanisation’ project team. Building on earlier work classifying machines by function, we asked volunteers on Zooniverse 'how did the word x change over time and place?' and presented them with options for...

British Library ; Vieira, Miguel ; Ong, Tiffany ; Ciula, Arianna

mechanisation, newspapers, Industrial Revolution, 19th century British English, historical newspapers, 19th century, analytics, data visualisation, crowdsourcing, transport history, and historical semantics

2023

Dataset

Language of Mechanisation: annotated historical newspaper articles

Datasets created through crowdsourcing tasks created on the Zooniverse crowdsourcing platform by the Living with Machines ‘language of mechanisation’ project team. Building on earlier work classifying machines by function, we asked volunteers on Zooniverse 'how did the word x change over time and place?' and presented them with options for...

British Library ; Ridge, Mia ; Pedrazzini, Nilo ; McGillivray, Barbara

crowdsourcing, 19th century British English, annotation, historical newspapers, mechanisation, data visualisation, historical semantics, and transport history

2023

Dataset

UK Doctoral Thesis Metadata from EThOS

The data in this collection comprises the bibliographic metadata for all UK doctoral theses listed in EThOS, the UK's national thesis service. We estimate the data covers around 98% of all PhDs ever awarded by UK Higher Education institutions, dating back to 1787. Thesis metadata from every PhD-awarding university in...

British Library ; Rosie, Heather

higher education, student, UK, dissertations, PhD, theses, doctoral, ethos, thesis, and research

2023

Software

Hybrid Correspondence Network Processing Script

The Python code was developed to interrogate the ways in which digital and analogue correspondence files (letters and e-mails) function within the Archive of Harold Pinter, reflecting upon what these patterns might mean for archivists, curators and researchers working with hybrid correspondence collections. This code is collection agnostic and...

Mckean, Callum

Harold Pinter, data science, hybrid archives, and visualisations

2023

Dataset

EAP031 Catalogue Metadata

This Excel spreadsheet contains the metadata that describes the archival collection digitised in Mongolia by the EAP031 "The Treasures of Danzan Ravjaa" project team. The metadata was originally created by the EAP031 project team that digitised the archive in 2005. The project team was led by Professor Caroline Humphrey. This...

EAP031 Project Team

metadata, manuscripts, and Tibetan

2021

Dataset

EAP696 Catalogue Metadata

This Excel spreadsheet contains the metadata that describes the archival collection digitised in Bulgaria by the EAP696 "Minority press in Ottoman Turkish in Bulgaria" project team. The metadata was originally created by the EAP696 project team that digitised the archive in 2014. The project team was led by Mr Stoyan...

EAP696 Project Team

Turkish minority press, Bulgaria, and Ottoman Empire

2023

Geographical dataset

Sarah FitzGerald's PhD placement project folder

This dataset is a zip file that contains the complete folder structure that Sarah used to manage this project. The content includes her planning, work, and outcomes, in the form of reports, presentations and blog posts. In addition to the data visualisations on the projects relating to Africa, Sarah also...

FitzGerald, Sarah

West Africa, research collaboration, projects, Africa, humanities, digital scholarship, and data visualisation

2023

Dataset

Datasets for toponym recognition and disambiguation for nineteenth-century English newspapers

We present two datasets, one for the task of toponym recognition and one for the task of toponym disambiguation. The datasets are derived from the "Dataset for Toponym Resolution in Nineteenth-Century English Newspapers" (DOI: https://doi.org/10.23636/r7d4-kw08). The toponym recognition dataset consists of two JSON files (ner_fine_train.json and ner_fine_dev.json), whereas the toponym...

Coll Ardanuy, Mariona ; Nanni, Federico

toponym disambiguation, nineteenth-century newspapers, named entity recognition, entity linking, toponym resolution, toponym recognition, and dataset

2023

Dataset

DeezyMatch training set for OCR

Optical character recognition (OCR) is the process of automatically transcribing text from images. The presence of OCR-induced errors in digitised text is a common problem in the digital humanities. OCR errors are usually due to the misrecognition of characters, such as "h" recognised as "b", or "c" recognised as "o"....

Coll Ardanuy, Mariona ; Nanni, Federico ; Pedrazzini, Nilo

OCR, fuzzy string matching, string variation, newspapers, digital humanities, natural language processing, DeezyMatch, and Living with Machines

2023

Dataset

Incunabula Printed Catalogue Dataset: Volumes 1-10 copy of github repository

This dataset includes the GitHub repository used to derive catalogue entries from volumes 1-10 of the "Catalogue of books printed in the 15th century now at the British Museum" (known as BMC). The BMC was published between 1908 and 2007 and comprises detailed descriptions of the incunabula collection at the British Library....

British Library

book history, metadata, catalogues, datasets, incunabula, early printed books, and early printing
