Index Catalog // British Library

2015

Dataset

Volumes of Lysons Collectanea (Trades), comprising advertisements, cuttings, and illustrations relating to trades, professions, medical cures. 1660-1825.

The dataset comprises the OCR text derived from four digitised volumes of a collection of advertisements, cuttings and illustrations relating to trades, professions and medical cures from 1660 to 1825.

British Library

text, newspapers, OCR, trades, and adverts

2015

Dataset

Volumes of Lysons Collectanea (Amusements), comprising broadsides, cuttings, advertisements on amusements 1660-1840

The dataset comprises nine digitised volumes of a collection of broadsides, cuttings and advertisements relating to public exhibitions and places of amusement from 1660 to 1840, with OCR-derived text. Part of the Lysons Collectanea collection.

British Library

amusements, text, newspapers, broadsides, OCR, and adverts

2019

Dataset

British and Irish Newspapers

A title-level list of British, Irish, British Overseas Territories and Crown Dependencies newspapers held by the British Library.

British Library

datasets, catalogues, media, newspapers, periodicals, and metadata

2020

Dataset

Living with Machines alpha and beta Zooniverse 'accident' task data

Data created through crowdsourcing tasks hosted on the Zooniverse platform. Members of the public were asked to look at a selection of articles from 19th-century newspapers that mentioned machines and decide if they described an industrial accident. A further task asked participants to transcribe personal, organisational and place names...

Zooniverse volunteers

crowdsourcing, digital history, citizen history, Living with Machines, newspapers, and digital humanities

2021

Dataset

Dataset for Toponym Resolution in Nineteenth-Century English Newspapers

We present a new dataset for the task of toponym resolution in digitised historical newspapers in English. It consists of 343 annotated articles from newspapers based in four different locations in England (Manchester, Ashton-under-Lyne, Poole and Dorchester), published between 1780 and 1870. The articles have been manually annotated with mentions...

Coll Ardanuy, Mariona ; Beavan, David ; Beelen, Kaspar ; Hosseini, Kasra ; Lawrence, Jon …

nineteenth-century English, geographic information retrieval, newspapers, toponym resolution, and dataset

2021

Dataset

Dataset for Toponym Resolution in Nineteenth-Century English Newspapers

We present a new dataset (version 2) for the task of toponym resolution in digitised historical newspapers in English. It consists of 455 annotated articles from newspapers based in four different locations in England (Manchester, Ashton-under-Lyne, Poole and Dorchester), published between 1780 and 1870. The articles have been manually annotated...

Coll Ardanuy, Mariona ; Beavan, David ; Beelen, Kaspar ; Hosseini, Kasra ; Lawrence, Jon …

nineteenth-century English, dataset, newspapers, toponym resolution, and geographic information retrieval

2022

Dataset

British Library Newspaper Title-level List: A list of catalogued newspaper titles held by the British Library

A title-level list of catalogued newspapers held by the British Library.

British Library

datasets, catalogues, media, newspapers, periodicals, and metadata

2023

Dataset

The Newspaper Press Directory (1846-1920) - enriched and structured version

Mitchell's Newspaper Press Directories contained an almost complete list of newspapers published in England, Wales, Scotland and Ireland. It was published regularly from 1846 onwards and provided a detailed description of the newspaper landscape over time. This version contains a structured, tabular representation of the directories (as CSV or Excel...

C. Mitchell and Co. ; British Library

press directories and newspapers

2022

Dataset

Diachronic word embeddings from 19th-century newspapers digitised by the British Library (1800-1919)

Word vectors related to the paper "Machines in the media: semantic change in the lexicon of mechanization in 19th-century British newspapers" by Nilo Pedrazzini and Barbara McGillivray (2022). The embeddings were trained on a 4.2-billion-word corpus of 19th-century British newspapers using Word2Vec and specific parameters. The embeddings are divided into...

Pedrazzini, Nilo ; McGillivray, Barbara

historical semantics, word-vectors, late-modern-english, newspapers, diachronic-embeddings, and word2vec

2023

Dataset

DeezyMatch training set for OCR

Optical character recognition (OCR) is the process of automatically transcribing text from images. The presence of OCR-induced errors in digitised text is a common problem in the digital humanities. OCR errors are usually due to the misrecognition of characters, such as "h" recognised as "b", or "c" recognised as "o"....

Coll Ardanuy, Mariona ; Nanni, Federico ; Pedrazzini, Nilo

OCR, fuzzy string matching, string variation, newspapers, digital humanities, natural language processing, DeezyMatch, and Living with Machines
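The description above attributes OCR errors mainly to character misrecognition ("h" read as "b", "c" read as "o"). As a rough illustration of why such errors call for fuzzy rather than exact string matching, the sketch below applies those two confusions to a word and measures the damage with plain Levenshtein edit distance. This is a generic baseline written for illustration only, not the DeezyMatch library or its method; the names `OCR_CONFUSIONS`, `simulate_ocr_errors` and `levenshtein` are hypothetical.

```python
# Hypothetical confusion table built only from the two examples in the description
# ("h" misread as "b", "c" misread as "o"). Not taken from the dataset itself.
OCR_CONFUSIONS = {"h": "b", "c": "o"}


def simulate_ocr_errors(word: str) -> str:
    """Replace each character that has a listed confusion with its misreading."""
    return "".join(OCR_CONFUSIONS.get(ch, ch) for ch in word)


def levenshtein(a: str, b: str) -> int:
    """Dynamic-programming edit distance counting insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,         # deletion
                            curr[j - 1] + 1,     # insertion
                            prev[j - 1] + cost)) # substitution or match
        prev = curr
    return prev[-1]


clean = "machine"
noisy = simulate_ocr_errors(clean)
print(noisy, levenshtein(clean, noisy))  # maobine 2
```

An exact lookup for "machine" would miss "maobine" entirely, while an edit distance of 2 flags it as a close variant; a training set such as this one is aimed at learning that kind of similarity from real OCR variation rather than from a fixed rule.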
