Search results
-
Abstract
Historic machines from 'prams' to 'Parliament': new avenues for collaborative linguistic research
Research in computational linguistics has made successful attempts at modelling word meaning at scale, but much remains to be done to put these computational models to the test of historical scholarship (see e.g. Beelen et al. 2021). More importantly, a lot of computational research looks at texts in a historical...
Ridge, Mia; Tolfo, Giorgia; Westerling, Kalle; Pedrazzini, Nilo; McGillivray, Barbara
digital humanities, crowdsourcing, and computational linguistics
-
Journal article
Can I believe what I see? Data visualisation and trust in the humanities
Questions of trust are increasingly important in relation to data and its use. The authors focus on humanities data and its visualisation, through analysis of their own recent projects with museums, archives and libraries internationally. Their account connects the specifics of hands-on digital humanities work to larger epistemological questions. They...
Boyd Davis, Stephen; Vane, Olivia; Kräutli, Florian
scepticism, critical design, interdisciplinarity, ethics, digital humanities, interrogability, data visualisation, and GLAM
-
Conference paper (published)
DeezyMatch: A Flexible Deep Learning Approach to Fuzzy String Matching
We present DeezyMatch, a free, open-source software library written in Python for fuzzy string matching and candidate ranking. Its pair classifier supports various deep neural network architectures for training new classifiers and for fine-tuning a pretrained model, which paves the way for transfer learning in fuzzy string matching. This approach...
Hosseini, Kasra; Nanni, Federico; Coll Ardanuy, Mariona
machine learning, Natural Language Processing, string matching, digital humanities, and toponym matching
-
Dataset
Living Machines atypical animacy dataset
Atypical animacy detection dataset, based on nineteenth-century sentences in English extracted from an open dataset of nineteenth-century books digitized by the British Library (available via https://doi.org/10.21250/db14, British Library Labs, 2014). This dataset contains 598 sentences containing mentions of machines. Each sentence has been annotated according to the animacy and humanness...
-
Dataset
Living with Machines alpha and beta Zooniverse 'accident' task data
Data created through crowdsourcing tasks hosted on the Zooniverse platform. Members of the public were asked to look at a selection of articles from 19th century newspapers that mentioned machines and decide if they described an industrial accident. A further task asked participants to transcribe personal, organisational and place names...
Zooniverse volunteers
citizen history, digital history, digital humanities, Living with Machines, newspapers, and crowdsourcing
-
Conference paper (unpublished)
Assessing the Impact of OCR Quality on Downstream NLP Tasks
A growing volume of heritage data is being digitized and made available as text via optical character recognition (OCR). Scholars and libraries are increasingly using OCR-generated text for retrieval and analysis. However, the process of creating text through OCR introduces varying degrees of error to the text. The impact of...