Index Catalog // British Library

Research Repository

Filtered by: Availability: External link (access may be restricted); Keyword: word embeddings; Collection: Living with Machines

2023

Dataset

Decade-level Word2Vec models from automatically transcribed 19th-century newspapers digitised by the British Library (1800-1919)

Word embeddings trained on a 4.2-billion-word corpus of 19th-century British newspapers using Word2Vec and specific parameters. The embeddings are divided into ten-year periods. Unlike those in this repository, these were not aligned, and OCR errors were not skimmed from the vocabulary. See the related GitHub repository for the full documentation:...

Pedrazzini, Nilo
historical semantics, British newspapers, word embeddings, word vectors, word2vec, and Late Modern English
2023

Dataset

Diachronic and diatopic word embeddings from newspapers digitised by the British Library (1830-1889): North and South England

Diachronic word embeddings (decade-level) trained with Word2Vec (via Gensim) on different geographic subcorpora of the Heritage Made Digital British and the Living with Machines historical newspaper collections:
- North England (north.zip)
- South England (south.zip)
At the moment, for each subcorpus, Word2Vec models are available for each decade in the...

Pedrazzini, Nilo ; McGillivray, Barbara
historical semantics, diachronic embeddings, late modern English, word embeddings, word vectors, word2vec, and diatopic embeddings