# StopsGB: Structured Timeline of Passenger Stations in Great Britain

## Abstract

Michael Quick's book _Railway Passenger Stations in Great Britain: a Chronology_ offers a uniquely rich and detailed account of Britain's changing railway infrastructure. Its listing of over 12,000 stations allows us to reconstruct the coming of rail at both micro- and macro-scales. However, being published originally as a book (and subsequently online as a PDF created from an underlying MS Word document), this resource was not well suited for systematic linking to other data. We now present a new, automatically generated dataset that provides the rich detail of this exceptional resource in a structured format. Each station described in the _Chronology_ is given certain attributes, such as operating companies and opening and closing dates, and is georeferenced and linked, whenever possible, to its corresponding entry on Wikidata. We name this structured, linked, and georeferenced dataset 'StopsGB' (Structured Timeline of Passenger Stations in Great Britain), and we make it openly available. We believe this dataset (and the method used to create it) will be of widespread interest across the historical, digital library and semantic web communities, and that it will be a key resource for ongoing research into the impact of the railway in Great Britain.

## Dataset description

The dataset is a TSV file with the following fields:

* **PlaceId:** Internal autoincrement ID of the place where the station is located.
* **StationId:** Internal autoincrement ID of the station.
* **Place:** Name of the place where the station is located.
* **AbbrStation:** Name of the station as it appears in the _Chronology_, i.e. abbreviated (e.g. 'A TOWN' for 'ABBEY TOWN').
* **Station:** Automatically expanded name of the station (e.g. 'ABBEY TOWN' for 'A TOWN').
* **Company:** Wikidata ID of the first company mentioned for the station in the _Chronology_.
* **AltCompanies:** Wikidata IDs of any other companies mentioned.
* **Altnames:** Any alternate name for the railway station, mentioned in the entry's description in the _Chronology_.
* **Referenced:** References to other railway stations, mentioned in the entry's description in the _Chronology_.
* **Opening:** Date when the railway station first opened, either in YYYY-MM-DD format or "unknown" (if the opening date is not in the _Chronology_ or could not be automatically detected).
* **Closing:** Date when the railway station last closed, either in YYYY-MM-DD format, "unknown" (if the closing date is not in the _Chronology_ or could not be automatically detected), or "still open" (if the _Chronology_ specifies that the station was still open at the time of writing).
* **Interrupted:** Whether there was any interruption, such as the station closing temporarily (True or False).
* **selected_entity:** Wikidata ID for the entity selected as the best match for this _Chronology_ entry.
* **selected_entity_label:** Name of the Wikidata entity that our method has selected as best match for this _Chronology_ entry.
* **selected_entity_type:** Whether the selected entity is based on the "Station" field (in which case we would expect coordinates to exactly match the location of the railway station) or on the "Place" field (in which case coordinates are an approximation, and based on the location of the matched "Place").
* **selected_entity_latitude:** Latitude of our method's selected Wikidata entity.
* **selected_entity_longitude:** Longitude of our method's selected Wikidata entity.
* **predicted_station:** Our method's prediction for the "Station" field (it will be the same as `selected_entity` unless `conf_station` is below a certain threshold).
* **conf_station:** Confidence of our method's prediction for the "Station" field.
* **predicted_place:** Our method's prediction for the "Place" field (it will be the same as `selected_entity` if `conf_station` is below a certain threshold).
* **conf_place:** Confidence of our method's prediction for the "Place" field.
* **cross_ref:** True if the entry is a cross-reference to another entry, False otherwise. Filter these entries out to identify unique stations.
* **ghost_entry:** True if the entry is just a header for other stations, False otherwise. Filter these entries out to identify unique stations.

The dataset lists 12,676 railway stations in 9,667 places.

## License

The dataset is released under the open license CC-BY-NC-SA, available at https://creativecommons.org/licenses/by-nc-sa/4.0/.

## Copyright notice

Original data from _Railway Passenger Stations in Great Britain: a Chronology_ by Michael Quick. This dataset was created from v5.02 (2020) of the above work. Used with permission from The Railway and Canal Historical Society ©. Future updates to the source data will be posted by the RCHS at https://rchs.org.uk/railway-passenger-stations-in-great-britain-a-chronology/.

Additional data from Living with Machines © The Alan Turing Institute, British Library Board, Queen Mary University of London, University of Exeter, University of East Anglia and University of Cambridge.

The above acknowledgement must be included in all copies or substantial portions or reuses of this dataset.

## Funding statement

This work was supported by Living with Machines (AHRC grant AH/S01179X/1). This project, funded by the UK Research and Innovation (UKRI) Strategic Priority Fund, is a multidisciplinary collaboration delivered by the Arts and Humanities Research Council (AHRC), with The Alan Turing Institute, the British Library and the Universities of Cambridge, East Anglia, Exeter, and Queen Mary University of London.

## Dataset creators

Work for this paper was produced as part of the Living with Machines project.

Contributors:

* Conceptualisation: Katherine McDonough, Jon Lawrence, Daniel C.S. Wilson
* Methodology: Mariona Coll Ardanuy, Federico Nanni, Kaspar Beelen
* Implementation: Mariona Coll Ardanuy, Federico Nanni, Kaspar Beelen, Giorgia Tolfo
* Reproducibility: Federico Nanni, Mariona Coll Ardanuy
* Historical Analysis: Kaspar Beelen, Katherine McDonough, Jon Lawrence, Joshua Rhodes, Daniel C.S. Wilson
* Data Acquisition and Curation: Daniel C.S. Wilson, Mariona Coll Ardanuy, Giorgia Tolfo, Federico Nanni
* Annotation: Jon Lawrence, Katherine McDonough
* Project management: Mariona Coll Ardanuy
* Writing and editing: all authors.

## Cite

Mariona Coll Ardanuy, Kaspar Beelen, Jon Lawrence, Katherine McDonough, Federico Nanni, Joshua Rhodes, Giorgia Tolfo, and Daniel C.S. Wilson (2021). Station to Station: Linking and Enriching Historical British Railway Data. In _Computational Humanities Research_ (CHR 2021).
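
## Example: reading the dataset

The fields listed under "Dataset description" can be read with Python's standard `csv` module. The following is a minimal sketch rather than part of the released code. It assumes that the column names match the field names above and that booleans are serialised as the strings "True" and "False". The helper names `parse_date` and `unique_stations` are hypothetical, and the sample rows are invented placeholders, not records from the dataset.

```python
import csv
import io
from datetime import date


def parse_date(value):
    """Return a date for YYYY-MM-DD values, or None for 'unknown' / 'still open'."""
    try:
        return date.fromisoformat(value)
    except ValueError:
        return None


def unique_stations(tsv_file):
    """Yield rows that are neither cross-references nor header-only entries."""
    reader = csv.DictReader(tsv_file, delimiter="\t")
    for row in reader:
        # Assumption: booleans are stored as the strings "True" / "False".
        if row["cross_ref"] == "True" or row["ghost_entry"] == "True":
            continue
        yield row


# Invented placeholder rows (NOT real data), using a subset of the columns.
SAMPLE = (
    "StationId\tStation\tOpening\tClosing\tcross_ref\tghost_entry\n"
    "1\tEXAMPLE TOWN\t1850-01-01\tstill open\tFalse\tFalse\n"
    "2\tEXAMPLE HEADER\tunknown\tunknown\tFalse\tTrue\n"
    "3\tEXAMPLE XREF\tunknown\tunknown\tTrue\tFalse\n"
)

rows = list(unique_stations(io.StringIO(SAMPLE)))
for row in rows:
    print(row["Station"], parse_date(row["Opening"]), parse_date(row["Closing"]))
```

To apply the same functions to the released file, pass a file object opened in text mode with `newline=""` in place of the `io.StringIO` sample. Returning `None` for "unknown" and "still open" keeps date comparisons explicit. Callers that need to distinguish the two cases can inspect the raw `Closing` string.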