DUKweb (Diachronic UK web) - British Library Research Repository
Dataset

DUKweb (Diachronic UK web)

8 October 2020

Abstract

We present DUKweb, a set of large-scale resources for the diachronic analysis of contemporary English. The dataset is derived from the JISC UK Web Domain Dataset (1996-2013), which collects resources from the Internet Archive that were hosted on domains ending in ‘.uk’. The dataset includes co-occurrence matrices for each year and two types of word vectors for each year: Temporal Random Indexing vectors and word2vec embeddings.

Files

File name      Date uploaded   Visibility   File size
2000.csv.zip   19 Oct 2020     Public       115 MB
2001.csv.zip   19 Oct 2020     Public       198 MB
2002.csv.zip   19 Oct 2020     Public       278 MB
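
The yearly files are distributed as zipped CSVs, so it can help to inspect the first few rows before loading a whole year into memory. The following is a minimal sketch using only the Python standard library. This page does not document the column layout, so the code makes no assumption about it. The function name peek_zipped_csv, the UTF-8 encoding, the comma delimiter, and the choice to read the first archive member are all assumptions made for illustration, not part of the dataset's documentation.

```python
import csv
import io
import zipfile
from itertools import islice


def peek_zipped_csv(zip_path, n_rows=5, delimiter=",", encoding="utf-8"):
    """Return up to n_rows parsed rows from the first member of a zip archive.

    Hypothetical helper: the delimiter and encoding are assumptions and may
    need adjusting once the actual file layout is known.
    """
    with zipfile.ZipFile(zip_path) as archive:
        # Assumption: the archive holds a single CSV, so take the first member.
        member = archive.namelist()[0]
        with archive.open(member) as raw:
            # Decode the byte stream lazily so the file is never fully
            # extracted to disk or read into memory.
            text = io.TextIOWrapper(raw, encoding=encoding, newline="")
            reader = csv.reader(text, delimiter=delimiter)
            return list(islice(reader, n_rows))


# Example call, assuming 2000.csv.zip has been downloaded locally:
# for row in peek_zipped_csv("2000.csv.zip", n_rows=3):
#     print(row)
```

Because the rows are read as a stream, the same pattern extends to processing a full year one row at a time, for example by iterating over the reader instead of slicing it.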