Buscar
Resultados de la búsqueda
-
Dataset
Diachronic word embeddings from 19th-century newspapers digitised by the British Library (1800-1919)
Word vectors related to the paper "Machines in the media: semantic change in the lexicon of mechanization in 19th-century British newspapers" by Nilo Pedrazzini and Barbara McGillivray (2022). The embeddings were trained on a 4.2-billion-word corpus of 19th-century British newspapers using Word2Vec and specific parameters. The embeddings are divided into...Pedrazzini, Nilo ; McGillivray, Barbara
historical semantics, word-vectors, late-modern-english, newspapers, diachronic-embeddings, and word2vec
-
Dataset
Diachronic and diatopic word embeddings from newspapers digitised by the British Library (1830-1889): North and South England
Diachronic word embeddings (decade-level) trained with Word2Vec (via Gensim) on different geographic subcorpora of the Heritage Made Digital British and the Living with Machines historical newspaper collections: - North England (north.zip) - South England (south.zip) At the moment, for each subcorpus, Word2Vec models are available for each decade in the...Pedrazzini, Nilo ; McGillivray, Barbara
historical semantics, diachronic embeddings, late modern English, word embeddings, word vectors, word2vec, and diatopic embeddings
-
Conference paper (published)
When Time Makes Sense: A Historically-Aware Approach to Targeted Sense Disambiguation
As languages evolve historically, making computational approaches sensitive to time can improve performance on specific tasks. In this work, we assess whether applying historical language models and time-aware methods help with determining the correct sense of polysemous words. We outline the task of time-sensitive Targeted Sense Disambiguation (TSD), which aims...Beelen, Kaspar ; Nanni, Federico ; Coll Ardanuy, Mariona ; Hosseini, Kasra ; Tolfo, Giorgia …
-
Abstract
Using smart annotations to map the geography of newspapers
Geographic information is a key component in the description of collection objects, and yet its format is often unsuited for use with methods of geographic analysis. Catalogue entries are often inconsistent, in plain text, and without geographic coordinates (much less coordinates linked to authority records). Georesolution of the relevant fields...Ryan, Yann ; Coll Ardanuy, Mariona ; van Strien, Daniel ; Hosseini, Kasra ; Beelen, Kaspar …