Using smart annotations to map the geography of newspapers
Creator
Ryan, Yann
Coll Ardanuy, Mariona
van Strien, Daniel
Hosseini, Kasra
Beelen, Kaspar
Hetherington, James
McDonough, Katherine
McGillivray, Barbara
Ridge, Mia
Vane, Olivia
Wilson, Daniel C.S.
2020
Abstract
Geographic information is a key component in the description of collection objects, and yet its format is often unsuited for use with methods of geographic analysis. Catalogue entries are often inconsistent, in plain text, and without geographic coordinates (much less coordinates linked to authority records). Georesolution of the relevant fields (by matching text strings to citable external resources which do have spatial coordinates) makes catalogue data machine-readable and allows collection exploration that more fully takes the geographic dimension of metadata into account.
Geographic metadata analysis requires very high-quality resolution of the relevant metadata fields. However, georeferencing by hand is time-consuming, and both state-of-the-art georesolution systems and off-the-shelf geocoders vary widely in performance depending on the dataset they are applied to (Gritta 2019, DeLozier 2015, Alex 2015). Here, we propose a strategy for resolving place names in metadata that uses an active learning method based on heterogeneous uncertainty sampling (Lewis and Catlett 1994). This method, which we call “smart annotation” (because it depends on selective human feedback), significantly reduces the number of manual annotations by querying the user only about the less certain matches. We applied this method to the British Library (BL) newspaper title catalogue and obtained 25,000 high-quality georeferenced records in less than three hours.
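The querying loop described above can be illustrated with a minimal sketch. It is not the authors' implementation: it uses plain uncertainty sampling with a single model rather than the heterogeneous variant of Lewis and Catlett, a one-feature logistic regression over a `difflib` string-similarity score as a stand-in for whatever features the real system uses, and a lookup table in place of the human annotator. All function names (`smart_annotate`, `similarity`, `train`, `predict`) and the candidate pairs are hypothetical and chosen for illustration only.

```python
import math
from difflib import SequenceMatcher

def similarity(a, b):
    """String similarity in [0, 1] between a catalogue string and a gazetteer name."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def train(examples, epochs=500, lr=0.5):
    """Fit a one-feature logistic regression (weight, bias) by stochastic gradient steps."""
    w, b = 0.0, 0.0
    for _ in range(epochs):
        for x, y in examples:
            p = sigmoid(w * x + b)
            w += lr * (y - p) * x
            b += lr * (y - p)
    return w, b

def predict(model, x):
    """Estimated probability that a candidate pair is a correct match."""
    w, b = model
    return sigmoid(w * x + b)

def smart_annotate(pairs, seed_labels, oracle, budget):
    """Uncertainty-sampling loop (assumed simplification of the abstract's method).

    pairs:       list of (catalogue_string, gazetteer_name) candidates
    seed_labels: dict index -> 0/1, a few initial hand labels
    oracle:      callable(index) -> 0/1, stands in for the human annotator
    budget:      maximum number of questions put to the oracle
    """
    feats = [similarity(a, b) for a, b in pairs]
    labels = dict(seed_labels)
    queried = []
    for _ in range(budget):
        unlabelled = [i for i in range(len(pairs)) if i not in labels]
        if not unlabelled:
            break
        model = train([(feats[i], y) for i, y in labels.items()])
        # Ask only about the candidate whose match probability is closest to 0.5.
        i = min(unlabelled, key=lambda j: abs(predict(model, feats[j]) - 0.5))
        labels[i] = oracle(i)
        queried.append(i)
    # Retrain on everything labelled so far and classify the rest automatically.
    model = train([(feats[i], y) for i, y in labels.items()])
    predictions = {i: int(predict(model, feats[i]) >= 0.5)
                   for i in range(len(pairs)) if i not in labels}
    return queried, labels, predictions

# Made-up candidate pairs; truth[i] is 1 when the pair is a correct match.
pairs = [
    ("Manchester", "Manchester"),
    ("Manchestr.", "Manchester"),
    ("Newcastle-on-Tyne", "Newcastle upon Tyne"),
    ("Newcastle-on-Tyne", "Newcastle-under-Lyme"),
    ("Leeds", "Leek"),
    ("Bath", "Perth"),
]
truth = [1, 1, 1, 0, 0, 0]
seed = {0: 1, 5: 0}
queried, labels, predictions = smart_annotate(pairs, seed, lambda i: truth[i], budget=2)
```

In this sketch the annotator is consulted at most `budget` times, and every remaining pair receives an automatic label from the final model. A real system would presumably use richer features than a single similarity score and would check the quality of the automatic labels before accepting them.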