Neural Language Models for Nineteenth-Century English
Deposited
Creator
Hosseini, Kasra
Beelen, Kaspar
Colavizza, Giovanni
Coll Ardanuy, Mariona
2021
Abstract
We present four types of neural language models trained on a large historical dataset of books in English, published between 1760 and 1900 and comprising ~5.1 billion tokens. The language model architectures include static (word2vec and fastText) and contextualized (BERT and Flair) models. For each architecture, we trained a model instance using the whole dataset. Additionally, we trained separate instances on text published before 1850 for the two static models, and four instances on different time slices for BERT. Our models have already been used in various downstream tasks, where they consistently improved performance. In this paper, we describe how the models were created and outline their reuse potential.
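As a rough illustration of the reuse potential mentioned above, the sketch below shows how the static and contextualized models might be loaded with standard tooling (gensim for word2vec, Hugging Face transformers for BERT). The file name and model path are placeholders, not artefact names confirmed by this record; consult the deposit itself for the actual files and formats. The fastText and Flair instances would be loaded analogously with their respective libraries.

```python
# Minimal sketch: loading and querying the historical models.
# The file name and model path below are assumptions for illustration,
# not names taken from this deposit record.

from gensim.models import Word2Vec
from transformers import pipeline

# --- Static embeddings (word2vec) ---
# Assumes a gensim-format model file downloaded from the deposit.
w2v = Word2Vec.load("word2vec_1760_1900.model")  # hypothetical filename
# Nearest neighbours reflect nineteenth-century usage of the query word.
print(w2v.wv.most_similar("machine", topn=5))

# --- Contextualized model (BERT) ---
# Assumes a BERT checkpoint released with the deposit; the path is illustrative.
fill_mask = pipeline("fill-mask", model="path/to/bert_1760_1900")  # hypothetical path
# Predict the masked token in a period-flavoured sentence.
for pred in fill_mask("The [MASK] engine transformed manufacturing."):
    print(pred["token_str"], round(pred["score"], 3))
```

Comparing the output of the whole-corpus instance against the pre-1850 or time-sliced instances is one straightforward way to probe semantic change across the period.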