Index Catalog // British Library

2023

Dataset

DeezyMatch training set for OCR

Optical character recognition (OCR) is the process of automatically transcribing text from images. The presence of OCR-induced errors in digitised text is a common problem in the digital humanities. OCR errors are usually due to the misrecognition of characters, such as "h" recognised as "b", or "c" recognised as "o"....

Coll Ardanuy, Mariona ; Nanni, Federico ; Pedrazzini, Nilo

OCR, fuzzy string matching, string variation, newspapers, digital humanities, natural language processing, DeezyMatch, and Living with Machines

2021

Dataset

StopsGB: Structured Timeline of Passenger Stations in Great Britain

Michael Quick's book _Railway Passenger Stations in Great Britain: a Chronology_ offers a uniquely rich and detailed account of Britain's changing railway infrastructure. Its listing of over 12,000 stations allows us to reconstruct the coming of rail at both micro- and macro-scales. However, being published originally as a book (and...

Coll Ardanuy, Mariona ; Beelen, Kaspar ; Lawrence, Jon ; McDonough, Katherine ; Nanni, Federico …

toponym resolution, open science, digital humanities, Living with Machines, railway stations, and entity linking

2020

Dataset

Living Machines atypical animacy dataset

Atypical animacy detection dataset, based on nineteenth-century sentences in English extracted from an open dataset of nineteenth-century books digitized by the British Library (available via https://doi.org/10.21250/db14, British Library Labs, 2014). This dataset contains 598 sentences containing mentions of machines. Each sentence has been annotated according to the animacy and humanness...

Tolfo, Giorgia ; Ahnert, Ruth ; Beelen, Kaspar ; Coll Ardanuy, Mariona ; Lawrence, Jon …

digital history, natural language processing, computational linguistics, Living with Machines, atypical animacy, and digital humanities

2019

Poster (published)

Living with Machines - Computer-detected text in historical maps

Ahnert, Ruth ; Beavan, David ; Beelen, Kaspar ; Coll Ardanuy, Mariona ; Griffin, Emma …

maps and Living with Machines

2019

Poster (published)

Living with Machines - Agency of Machines

Ahnert, Ruth ; Beavan, David ; Beelen, Kaspar ; Colavizza, Giovanni ; Coll Ardanuy, Mariona …

lexicon expansion and Living with Machines

2019

Conference paper (published)

Resolving places, past and present: toponym resolution in historical British newspapers using multiple resources

Newspapers and their metadata are richly geographical, not only in their distribution but also their content. Attending to these spatial features is a prerequisite in newspaper research. Following other projects to have geoparsed place names in newspapers, we describe our approach to linking historical geospatial information in text to real-world...

Coll Ardanuy, Mariona ; McDonough, Katherine ; Krause, Amrey ; Wilson, Daniel C.S. ; Hosseini, Kasra …

Living with Machines

Research Repository
