Index Catalog // British Library

2022

Conference paper (published)

Entities, Dates, and Languages: Zero-Shot on Historical Texts with T0

In this work, we explore whether the recently demonstrated zero-shot abilities of the T0 model extend to Named Entity Recognition for out-of-distribution languages and time periods. Using a historical newspaper corpus in 3 languages as test-bed, we use prompts to extract possible named entities. Our results show that a naive...

De Toni, Francesco ; Akiki, Christopher ; De La Rosa, Javier ; Fourrier, Clémentine ; Manjavacas, Enrique …

digital humanities, T0, and named entity recognition

2022

Journal article

Under the Impression: Multispectral Imaging of Lord Frederick Campbell Charter XXI 5

Lord Frederick Campbell Charter 5 is the only surviving English document that still has an authentic, legible, pre-Conquest seal attached to it. The text purports to be a writ of Edward the Confessor (1003x5–1066) granting a slew of rights to Christ Church Cathedral, Canterbury. We examined the writ using multispectral...

Hudson, Alison ; Duffy, Christina

multispectral imaging, digital humanities, conservation, seals, early medieval history, writs, and Norman Conquest

2022

Abstract

Historic machines from 'prams' to 'Parliament': new avenues for collaborative linguistic research

Research in computational linguistics has made successful attempts at modelling word meaning at scale, but much remains to be done to put these computational models to the test of historical scholarship (see e.g. Beelen et al. 2021). More importantly, a lot of computational research looks at texts in a historical...

Ridge, Mia ; Tolfo, Giorgia ; Westerling, Kalle ; Pedrazzini, Nilo ; McGillivray, Barbara

crowdsourcing, computational linguistics, and digital humanities

2020

Conference paper (unpublished)

A Deep Learning Approach to Geographical Candidate Selection through Toponym Matching

Recognizing toponyms and resolving them to their real-world referents is required for providing advanced semantic access to textual data. This process is often hindered by the high degree of variation in toponyms. Candidate selection is the task of identifying the potential entities that can be referred to by a toponym...

Coll Ardanuy, Mariona ; Hosseini, Kasra ; McDonough, Katherine ; Krause, Amrey ; van Strien, Daniel …

fuzzy string matching, toponym matching, toponym resolution, entity linking, and digital humanities

2021

Dataset

StopsGB: Structured Timeline of Passenger Stations in Great Britain

Michael Quick's book _Railway Passenger Stations in Great Britain: a Chronology_ offers a uniquely rich and detailed account of Britain's changing railway infrastructure. Its listing of over 12,000 stations allows us to reconstruct the coming of rail at both micro- and macro-scales. However, being published originally as a book (and...

Coll Ardanuy, Mariona ; Beelen, Kaspar ; Lawrence, Jon ; McDonough, Katherine ; Nanni, Federico …

toponym resolution, open science, digital humanities, Living with Machines, railway stations, and entity linking

2021

Journal article

Can I believe what I see? Data visualisation and trust in the humanities

Questions of trust are increasingly important in relation to data and its use. The authors focus on humanities data and its visualisation, through analysis of their own recent projects with museums, archives and libraries internationally. Their account connects the specifics of hands-on digital humanities work to larger epistemological questions. They...

Boyd Davis, Stephen ; Vane, Olivia ; Kräutli, Florian

scepticism, critical design, interdisciplinarity, ethics, digital humanities, interrogability, data visualisation, and GLAM

2020

Conference paper (published)

DeezyMatch: A Flexible Deep Learning Approach to Fuzzy String Matching

We present DeezyMatch, a free, open-source software library written in Python for fuzzy string matching and candidate ranking. Its pair classifier supports various deep neural network architectures for training new classifiers and for fine-tuning a pretrained model, which paves the way for transfer learning in fuzzy string matching. This approach...

Hosseini, Kasra ; Nanni, Federico ; Coll Ardanuy, Mariona

Natural Language Processing, string matching, toponym matching, machine learning, and digital humanities

2020

Dataset

Living Machines atypical animacy dataset

Atypical animacy detection dataset, based on nineteenth-century sentences in English extracted from an open dataset of nineteenth-century books digitized by the British Library (available via https://doi.org/10.21250/db14, British Library Labs, 2014). This dataset contains 598 sentences containing mentions of machines. Each sentence has been annotated according to the animacy and humanness...

Tolfo, Giorgia ; Ahnert, Ruth ; Beelen, Kaspar ; Coll Ardanuy, Mariona ; Lawrence, Jon …

digital history, natural language processing, computational linguistics, Living with Machines, atypical animacy, and digital humanities

2020

Dataset

Living with Machines alpha and beta Zooniverse 'accident' task data

Data created through crowdsourcing tasks hosted on the Zooniverse platform. Members of the public were asked to look at a selection of articles from 19th century newspapers that mentioned machines and decide if they described an industrial accident. A further task asked participants to transcribe personal, organisational and place names...

Zooniverse volunteers

crowdsourcing, digital history, citizen history, Living with Machines, newspapers, and digital humanities

2020

Conference paper (unpublished)

Assessing the Impact of OCR Quality on Downstream NLP Tasks

A growing volume of heritage data is being digitized and made available as text via optical character recognition (OCR). Scholars and libraries are increasingly using OCR-generated text for retrieval and analysis. However, the process of creating text through OCR introduces varying degrees of error to the text. The impact of...

van Strien, Daniel ; Beelen, Kaspar ; Coll Ardanuy, Mariona ; Hosseini, Kasra ; McGillivray, Barbara …

Natural Language Processing, OCR, Optical Character Recognition, information retrieval, NLP, and digital humanities

Research Repository

