Search Results
-
Abstract
Using smart annotations to map the geography of newspapers
Geographic information is a key component in the description of collection objects, and yet its format is often unsuited for use with methods of geographic analysis. Catalogue entries are often inconsistent, in plain text, and without geographic coordinates (much less coordinates linked to authority records). Georesolution of the relevant fields...
Ryan, Yann ; Coll Ardanuy, Mariona ; van Strien, Daniel ; Hosseini, Kasra ; Beelen, Kaspar …
-
Conference paper (unpublished)
Assessing the Impact of OCR Quality on Downstream NLP Tasks
A growing volume of heritage data is being digitized and made available as text via optical character recognition (OCR). Scholars and libraries are increasingly using OCR-generated text for retrieval and analysis. However, the process of creating text through OCR introduces varying degrees of error to the text. The impact of...
-
Conference paper (published)
Living Machines: A study of atypical animacy
This paper proposes a new approach to animacy detection, the task of determining whether an entity is represented as animate in a text. In particular, this work is focused on atypical animacy and examines the scenario in which typically inanimate objects, specifically machines, are given animate attributes. To address it,...
Coll Ardanuy, Mariona ; Nanni, Federico ; Beelen, Kaspar ; Hosseini, Kasra ; Ahnert, Ruth …
nineteenth-century English, living machines, BERT, and animacy
-
Dataset
Living Machines atypical animacy dataset
Atypical animacy detection dataset, based on nineteenth-century sentences in English extracted from an open dataset of nineteenth-century books digitized by the British Library (available via https://doi.org/10.21250/db14, British Library Labs, 2014). This dataset contains 598 sentences containing mentions of machines. Each sentence has been annotated according to the animacy and humanness...
-
Conference paper (published)
When Time Makes Sense: A Historically-Aware Approach to Targeted Sense Disambiguation
As languages evolve historically, making computational approaches sensitive to time can improve performance on specific tasks. In this work, we assess whether applying historical language models and time-aware methods help with determining the correct sense of polysemous words. We outline the task of time-sensitive Targeted Sense Disambiguation (TSD), which aims...
Beelen, Kaspar ; Nanni, Federico ; Coll Ardanuy, Mariona ; Hosseini, Kasra ; Tolfo, Giorgia …
-
Journal article
MapReader: A Computer Vision Pipeline for the Semantic Exploration of Maps at Scale
We present MapReader, a free, open-source software library written in Python for analyzing large map collections (scanned or born-digital). This library transforms the way historians can use maps by turning extensive, homogeneous map sets into searchable primary sources. MapReader allows users with little or no computer vision expertise to i)...
Hosseini, Kasra ; Wilson, Daniel C.S. ; Beelen, Kaspar ; McDonough, Katherine