Search results
-
Dataset
Decade-level Word2Vec models from automatically transcribed 19th-century newspapers digitised by the British Library (1800-1919)
Word embeddings trained on a 4.2-billion-word corpus of 19th-century British newspapers using Word2Vec and specific parameters. The embeddings are divided into periods of ten years each. Unlike those in this repository, these were not aligned and OCR errors skimmed from the vocabulary. See related GitHub repository for the full documentation:...
Pedrazzini, Nilo
historical semantics, British newspapers, word embeddings, word vectors, word2vec, and Late Modern English
-
Dataset
Diachronic and diatopic word embeddings from newspapers digitised by the British Library (1830-1889): North and South England
Diachronic word embeddings (decade-level) trained with Word2Vec (via Gensim) on different geographic subcorpora of the Heritage Made Digital British and the Living with Machines historical newspaper collections:
  - North England (north.zip)
  - South England (south.zip)
At the moment, for each subcorpus, Word2Vec models are available for each decade in the...
Pedrazzini, Nilo; McGillivray, Barbara
historical semantics, diachronic embeddings, late modern English, word embeddings, word vectors, word2vec, and diatopic embeddings