Index Catalog // British Library

2023

Dataset

Incunabula Printed Catalogue Dataset: Volumes 1-10

This dataset includes the catalogue entries derived from volumes 1-10 of the "Catalogue of books printed in the 15th century now at the British Museum" (known as BMC). The BMC was published between 1908 and 2007 and comprises detailed descriptions of the incunabula collection at the British Library. The dataset was created...

British Library

datasets, catalogues, early printing, book history, early printed books, metadata, and incunabula

2023

Dataset

Incunabula Printed Catalogue Dataset: Volumes 1-10 copy of github repository

This dataset includes the GitHub repository used to derive catalogue entries from volumes 1-10 of the "Catalogue of books printed in the 15th century now at the British Museum" (known as BMC). The BMC was published between 1908 and 2007 and comprises detailed descriptions of the incunabula collection at the British Library....

British Library

book history, metadata, catalogues, datasets, incunabula, early printed books, and early printing

2023

Dataset

Decade-level Word2Vec models from automatically transcribed 19th-century newspapers digitised by the British Library (1800-1919)

Word embeddings trained on a 4.2-billion-word corpus of 19th-century British newspapers using Word2Vec and specific parameters. The embeddings are divided into periods of ten years each. Unlike those in this repository, these embeddings were not aligned, and OCR errors were not skimmed from the vocabulary. See related GitHub repository for the full documentation:...

Pedrazzini, Nilo

historical semantics, British newspapers, word embeddings, word vectors, word2vec, and Late Modern English

2022

Dataset

Diachronic word embeddings from 19th-century newspapers digitised by the British Library (1800-1919)

Word vectors related to the paper "Machines in the media: semantic change in the lexicon of mechanization in 19th-century British newspapers" by Nilo Pedrazzini and Barbara McGillivray (2022). The embeddings were trained on a 4.2-billion-word corpus of 19th-century British newspapers using Word2Vec and specific parameters. The embeddings are divided into...

Pedrazzini, Nilo ; McGillivray, Barbara

historical semantics, word-vectors, late-modern-english, newspapers, diachronic-embeddings, and word2vec

2023

Dataset

Datasets for toponym recognition and disambiguation for nineteenth-century English newspapers

We present two datasets, one for the task of toponym recognition and one for the task of toponym disambiguation. The datasets are derived from the "Dataset for Toponym Resolution in Nineteenth-Century English Newspapers" (DOI: https://doi.org/10.23636/r7d4-kw08). The toponym recognition dataset consists of two JSON files (ner_fine_train.json and ner_fine_dev.json), whereas the toponym...

Coll Ardanuy, Mariona ; Nanni, Federico

toponym disambiguation, nineteenth-century newspapers, named entity recognition, entity linking, toponym resolution, toponym recognition, and dataset

2023

Dataset

DeezyMatch training set for OCR

Optical character recognition (OCR) is the process of automatically transcribing text from images. The presence of OCR-induced errors in digitised text is a common problem in the digital humanities. OCR errors are usually due to the misrecognition of characters, such as "h" recognised as "b", or "c" recognised as "o"....

Coll Ardanuy, Mariona ; Nanni, Federico ; Pedrazzini, Nilo

OCR, fuzzy string matching, string variation, newspapers, digital humanities, natural language processing, DeezyMatch, and Living with Machines

2022

Dataset

MapReader_Data_SIGSPATIAL_2022

Hosseini, Kasra ; Wilson, Daniel C.S. ; Beelen, Kaspar ; McDonough, Katherine

Deep learning, Supervised learning, Computer vision, Historical maps, Digital libraries and archives, and Classification

2023

Geographical dataset

Sarah FitzGerald's PhD placement project folder

This dataset is a zip file that contains the complete folder structure that Sarah used to manage this project. The content includes her planning, work, and outcomes, in the form of reports, presentations and blog posts. In addition to the data visualisations on the projects relating to Africa, Sarah also...

FitzGerald, Sarah

West Africa, research collaboration, projects, Africa, humanities, digital scholarship, and data visualisation

2023

Software

Living-with-machines/MapReader: End of LwM

This release marks the end of the current funding for MapReader during the Living with Machines (LwM) project. @kasra-hosseini @andrewphilipsmith @rwood-97 @kmcdono2 @dcsw2 @kallewesterling @kasparvonbeelen

Hosseini, Kasra ; Wood, Rosie ; Smith, Andy ; McDonough, Katie ; Wilson, Daniel C. S. …

computer vision and maps

2021

Dataset

EAP696 Catalogue Metadata

This Excel spreadsheet contains the metadata that describes the archival collection digitised in Bulgaria by the EAP696 "Minority press in Ottoman Turkish in Bulgaria" project team. The metadata was originally created by the EAP696 project team that digitised the archive in 2014. The project team was led by Mr Stoyan...

EAP696 Project Team

Turkish minority press, Bulgaria, and Ottoman Empire
