Index Catalog // British Library

Research Repository


Filtered by: Institution: British Library Research Repository; Creator: Pedrazzini, Nilo; Availability: File publicly available; Resource Type: Dataset

2023

Dataset

Language of Mechanisation: annotated historical newspaper articles

Datasets produced through crowdsourcing tasks run on the Zooniverse platform by the Living with Machines ‘language of mechanisation’ project team. Building on earlier work classifying machines by function, we asked volunteers on Zooniverse 'how did the word x change over time and place?' and presented them with options for...

British Library ; Ridge, Mia ; Pedrazzini, Nilo ; McGillivray, Barbara
crowdsourcing, 19th century British English, annotation, historical newspapers, mechanisation, data visualisation, historical semantics, and transport history
2023

Dataset

DeezyMatch training set for OCR

Optical character recognition (OCR) is the process of automatically transcribing text from images. The presence of OCR-induced errors in digitised text is a common problem in the digital humanities. OCR errors are usually due to the misrecognition of characters, such as "h" recognised as "b", or "c" recognised as "o"....

Coll Ardanuy, Mariona ; Nanni, Federico ; Pedrazzini, Nilo
OCR, fuzzy string matching, string variation, newspapers, digital humanities, natural language processing, DeezyMatch, and Living with Machines
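The entry above describes OCR errors as character misrecognitions, such as "h" read as "b". A simple way to see why such errors call for fuzzy string matching is to measure how many single-character edits separate a correct word from its OCR variant. DeezyMatch itself learns string similarity with a neural model, so the sketch below is not its method. It is a classic Levenshtein edit-distance baseline, and the example word pairs are hypothetical, chosen only to mirror the "h"/"b" and "c"/"o" confusions mentioned in the description.

```python
def levenshtein(a: str, b: str) -> int:
    """Return the minimum number of single-character insertions,
    deletions and substitutions needed to turn string a into string b."""
    # prev[j] holds the edit distance between the previous prefix of a and b[:j]
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty prefix of b
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute (or match)
        prev = curr
    return prev[-1]


# Hypothetical OCR variants: "h" misread as "b", "c" misread as "o"
for correct, ocr in [("machine", "macbine"), ("carriage", "oarriage")]:
    print(correct, ocr, levenshtein(correct, ocr))  # → distance 1 for each pair
```

A single substitution leaves the OCR variant one edit away from the correct form, which is why matching by string similarity, rather than exact equality, can recover words from noisy digitised text.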