Index Catalog // British Library

2023

Dataset

Datasets for toponym recognition and disambiguation for nineteenth-century English newspapers

We present two datasets, one for the task of toponym recognition and one for the task of toponym disambiguation. The datasets are derived from the "Dataset for Toponym Resolution in Nineteenth-Century English Newspapers" (DOI: https://doi.org/10.23636/r7d4-kw08). The toponym recognition dataset consists of two JSON files (ner_fine_train.json and ner_fine_dev.json), whereas the toponym...

Coll Ardanuy, Mariona ; Nanni, Federico

toponym disambiguation, nineteenth-century newspapers, named entity recognition, entity linking, toponym resolution, toponym recognition, and dataset

2023

Dataset

DeezyMatch training set for OCR

Optical character recognition (OCR) is the process of automatically transcribing text from images. The presence of OCR-induced errors in digitised text is a common problem in the digital humanities. OCR errors are usually due to the misrecognition of characters, such as "h" recognised as "b", or "c" recognised as "o"....

Coll Ardanuy, Mariona ; Nanni, Federico ; Pedrazzini, Nilo

OCR, fuzzy string matching, string variation, newspapers, digital humanities, natural language processing, DeezyMatch, and Living with Machines

2022

Journal article

A Dataset for Toponym Resolution in Nineteenth-Century English Newspapers

We present a new dataset for the task of toponym resolution in digitized historical newspapers in English. It consists of 343 annotated articles from newspapers based in four different locations in England (Manchester, Ashton-under-Lyne, Poole and Dorchester), published between 1780 and 1870. The articles have been manually annotated with mentions...

Coll Ardanuy, Mariona ; Beavan, David ; Beelen, Kaspar ; Hosseini, Kasra ; Lawrence, Jon …

nineteenth-century English, geographic information retrieval, benchmark, newspapers, toponym resolution, and dataset

2021

Conference paper (published)

When Time Makes Sense: A Historically-Aware Approach to Targeted Sense Disambiguation

As languages evolve historically, making computational approaches sensitive to time can improve performance on specific tasks. In this work, we assess whether applying historical language models and time-aware methods helps with determining the correct sense of polysemous words. We outline the task of time-sensitive Targeted Sense Disambiguation (TSD), which aims...

Beelen, Kaspar ; Nanni, Federico ; Coll Ardanuy, Mariona ; Hosseini, Kasra ; Tolfo, Giorgia …

2020

Conference paper (published)

Living Machines: A study of atypical animacy

This paper proposes a new approach to animacy detection, the task of determining whether an entity is represented as animate in a text. In particular, this work is focused on atypical animacy and examines the scenario in which typically inanimate objects, specifically machines, are given animate attributes. To address it,...

Coll Ardanuy, Mariona ; Nanni, Federico ; Beelen, Kaspar ; Hosseini, Kasra ; Ahnert, Ruth …

nineteenth-century English, living machines, BERT, and animacy

2020

Conference paper (published)

DeezyMatch: A Flexible Deep Learning Approach to Fuzzy String Matching

We present DeezyMatch, a free, open-source software library written in Python for fuzzy string matching and candidate ranking. Its pair classifier supports various deep neural network architectures for training new classifiers and for fine-tuning a pretrained model, which paves the way for transfer learning in fuzzy string matching. This approach...

Hosseini, Kasra ; Nanni, Federico ; Coll Ardanuy, Mariona

Natural Language Processing, string matching, toponym matching, machine learning, and digital humanities

2020

Dataset

Living Machines atypical animacy dataset

Atypical animacy detection dataset, based on nineteenth-century sentences in English extracted from an open dataset of nineteenth-century books digitized by the British Library (available via https://doi.org/10.21250/db14, British Library Labs, 2014). This dataset contains 598 sentences containing mentions of machines. Each sentence has been annotated according to the animacy and humanness...

Tolfo, Giorgia ; Ahnert, Ruth ; Beelen, Kaspar ; Coll Ardanuy, Mariona ; Lawrence, Jon …

digital history, natural language processing, computational linguistics, Living with Machines, atypical animacy, and digital humanities
