Text extracted from digitised maps of eastern Africa circa 1880-1940 - British Library Research Repository
Shared Research Repository
Dataset


4 May 2020

Abstract

This dataset comprises an Excel spreadsheet of text extracted from almost 2,000 digital images of maps and documents held in the War Office Archive, covering a large part of eastern Africa between c.1880 and 1940. The items were catalogued and digitised with generous funding from the Indigo Trust. The harvested text includes names of historical settlements and ethnic regions in eastern Africa, descriptions of historical land use, topography and vegetation, and notes on ethnographic, military or administrative context. The text was extracted automatically using the Google Vision API. The spreadsheet provides the text found, Google Vision's confidence score for the accuracy of each transcription, the location of the text on each image, and links to a geographical search interface that gives access to the relevant images and catalogue records hosted on the British Library (BL) website. A large number of erroneous results were removed from the full set of Google Vision responses, but some errors remain: for example, individual characters may be transcribed incorrectly within words, although the words themselves should still be identifiable. In addition, not every word appearing on the maps was captured.
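Because each transcription comes with a confidence score, a common first step is to keep only the higher-confidence text and group it by image. The sketch below shows one way to do this in Python. It is illustrative only: the column names ("text", "confidence", "image") are hypothetical placeholders that should be checked against the spreadsheet's actual header row, the 0.8 threshold is an arbitrary example, and scores are assumed to be numbers between 0 and 1, the range Google Vision uses. Reading the .xlsx file relies on the third-party openpyxl package.

```python
from collections import defaultdict
from typing import Any, Iterable, Mapping

# Hypothetical column names: replace with the real headers from the spreadsheet.
TEXT_COL = "text"
CONF_COL = "confidence"
IMAGE_COL = "image"


def load_rows(path: str) -> list[dict[str, Any]]:
    """Read the first worksheet into dicts keyed by the header row.

    openpyxl is imported here so the filtering function below can be
    used without it installed.
    """
    from openpyxl import load_workbook

    wb = load_workbook(path, read_only=True)
    rows = wb.active.iter_rows(values_only=True)
    header_row = next(rows, None)
    if header_row is None:  # empty sheet
        wb.close()
        return []
    header = [str(h) for h in header_row]
    records = [dict(zip(header, r)) for r in rows]
    wb.close()
    return records


def confident_text_by_image(
    rows: Iterable[Mapping[str, Any]],
    threshold: float = 0.8,  # illustrative cut-off, not taken from the dataset
    text_col: str = TEXT_COL,
    conf_col: str = CONF_COL,
    image_col: str = IMAGE_COL,
) -> dict[Any, list[str]]:
    """Group transcriptions whose confidence is >= threshold by image."""
    grouped: dict[Any, list[str]] = defaultdict(list)
    for row in rows:
        text = row.get(text_col)
        conf = row.get(conf_col)
        if text is None or conf is None:  # skip incomplete rows
            continue
        if float(conf) >= threshold:
            grouped[row.get(image_col)].append(str(text))
    return dict(grouped)
```

A possible usage, assuming the file has been downloaded locally: `confident_text_by_image(load_rows("War_Office_Archive_Eastern_Africa_Text_Harvested.xlsx"), threshold=0.9)`. Raising the threshold trades recall for precision; given that some character-level errors remain in the cleaned data, low-confidence rows may still contain recognisable words worth checking by hand.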

Files

File name: War_Office_Archive_Eastern_Africa_Text_Harvested.xlsx
Date uploaded: 22 May 2020
Visibility: Public
File size: 12.2 MB