Ricerca
Risultati della ricerca
-
Dataset
DeezyMatch training set for OCR
Optical character recognition (OCR) is the process of automatically transcribing text from images. The presence of OCR-induced errors in digitised text is a common problem in the digital humanities. OCR errors are usually due to the misrecognition of characters, such as "h" recognised as "b", or "c" recognised as "o".... -
Dataset
Living with Machines alpha and beta Zooniverse 'accident' task data
Data created through crowdsourcing tasks hosted on the Zooniverse platform. Members of the public were asked to look at a selection of articles from 19th century newspapers that mentioned machines and decide if they described an industrial accident. A further task asked participants to transcribe personal, organisational and place names...Zooniverse volunteers
crowdsourcing, digital history, citizen history, Living with Machines, newspapers, and digital humanities