Index Catalog // British Library

2023

Dataset

DeezyMatch training set for OCR

Optical character recognition (OCR) is the process of automatically transcribing text from images. The presence of OCR-induced errors in digitised text is a common problem in the digital humanities. OCR errors are usually due to the misrecognition of characters, such as "h" recognised as "b", or "c" recognised as "o"....

Coll Ardanuy, Mariona ; Nanni, Federico ; Pedrazzini, Nilo

OCR, fuzzy string matching, string variation, newspapers, digital humanities, natural language processing, DeezyMatch, and Living with Machines

2020

Research report

Data Study Group Final Report: Smart monitoring for conservation areas

WWF (World Wide Fund for Nature) monitors over 250,000 protected areas (e.g. national parks and nature reserves) and thousands of other sites and critical habitats. These sites are the foundation of global natural assets and are central to the preservation of biodiversity and human well-being. Unfortunately, they face increasing pressures...

Hosseini, Kasra ; Coll Ardanuy, Mariona ; Patterson, David ; Garcia-Velez, Laura ; Castro-Gonzalez, Leonardo …

neural networks, supervised learning, natural language processing, WWF, conservation, habitats, Data Study Groups, and Alan Turing Institute

2020

Dataset

Living Machines atypical animacy dataset

Atypical animacy detection dataset, based on nineteenth-century sentences in English extracted from an open dataset of nineteenth-century books digitized by the British Library (available via https://doi.org/10.21250/db14, British Library Labs, 2014). This dataset contains 598 sentences containing mentions of machines. Each sentence has been annotated according to the animacy and humanness...

Tolfo, Giorgia ; Ahnert, Ruth ; Beelen, Kaspar ; Coll Ardanuy, Mariona ; Lawrence, Jon …

digital history, natural language processing, computational linguistics, Living with Machines, atypical animacy, and digital humanities

2019

Journal article

Appraising, processing, and providing access to email in contemporary literary archives

The email of contemporary literary figures is ripe for research by scholars, and of broad interest to the general public, but can also present many challenges to cultural memory institutions that seek to appraise, process and provide access to this rich archival material. This article explores how five institutions across...

Schneider, J. ; Adams, C. ; DeBauche, S. ; Echols, R. ; McKean, C. …

contemporary literary archives, machine learning, archival processing, natural language processing, and email preservation
