Search results
-
Conference paper (published)
MapReader: a computer vision pipeline for the semantic exploration of maps at scale
We present MapReader, a free, open-source software library written in Python for analyzing large map collections. MapReader allows users with little computer vision expertise to i) retrieve maps via web-servers; ii) preprocess and divide them into patches; iii) annotate patches; iv) train, fine-tune, and evaluate deep neural network models; and...Hosseini, Kasra ; Wilson, Daniel C. S. ; Beelen, Kaspar ; McDonough, Katherine
maps and ordnance survey
-
Conference paper (published)
“Webcomics Archive? Now I'm Interested”: Comics Readers Seeking Information in Web Archives
There is a longstanding tradition of understanding information needs and interaction behavior across different user groups to inform the design of digital products and services. There is a gap in such research on comics readers, specifically how they seek and interact with the information and interfaces of web-based archives provided...Berube, Linda ; Makri, Stephann ; Cooke, Ian ; Priego, Ernesto ; Wisdom, Stella
-
Conference paper (published)
Locating a National Collection through Audience Research DH2022 Long Abstract
This abstract was submitted to the DH2022 conference where I presented a long paper. It explores how geography can help to engage the public with digital cultural heritage collections. It draws on audience research that examined values and motivations in the UK alongside the use of location-based interfaces such as...Rees, Gethin ; Vitale, Valeria ; Hunt, Alex ; Horgan, John ; Strachan, Peter
location, metadata, web maps, geography, cultural heritage, and interface design
-
Conference paper (published)
The BigScience ROOTS Corpus: A 1.6TB Composite Multilingual Dataset
As language models grow ever larger, the need for large-scale high-quality text datasets has never been more pressing, especially in multilingual settings. The BigScience workshop, a 1-year international and multidisciplinary initiative, was formed with the goal of researching and training large language models as a values-driven undertaking, putting issues of...Laurençon, Hugo ; Saulnier, Lucile ; Wang, Thomas ; Akiki, Christopher ; Villanova del Moral, Albert …
-
Conference paper (published)
Entities, Dates, and Languages: Zero-Shot on Historical Texts with T0
In this work, we explore whether the recently demonstrated zero-shot abilities of the T0 model extend to Named Entity Recognition for out-of-distribution languages and time periods. Using a historical newspaper corpus in 3 languages as test-bed, we use prompts to extract possible named entities. Our results show that a naive...De Toni, Francesco ; Akiki, Christopher ; De La Rosa, Javier ; Fourrier, Clémentine ; Manjavacas, Enrique …
-
Conference paper (published)
Towards a Collections Model for Preservation Planning at the British Library
The development of a framework for preservation planning at the British Library has highlighted the need for a more-structured understanding of its digital collections, in particular with regard to identifying the specific sets of objects that would be the focus of preservation plans. Work has recently commenced on developing a...Day, Michael ; Pennock, Maureen
-
Conference paper (published)
Design Patterns in Digital Preservation: Understanding Information Flows
This paper proposes a framework to help understand the different ways digital preservation goals can be achieved, and the contextual factors these choices depend on. This is done through a worked example: three different design patterns representing the three possible modes of archival information flow, each illustrated with realistic examples and...Jackson, Andrew N
OAIS, design patterns, community, risk management, and innovation
-
Conference paper (published)
Exploring Software, Tools and Methods used in Web Archive Research
This paper is one part of a larger research project titled Web Archives - Researcher Skills and Tools (WARST). In this poster we focus on the data from the WARST study which examines the software, tools and methods used in the web archive research lifecycle.Schmid, Katharina ; Healy, Sharon ; Byrne, Helena
web archiving, web archive research, web archive users, and web archive creators
-
Conference paper (published)
When Time Makes Sense: A Historically-Aware Approach to Targeted Sense Disambiguation
As languages evolve historically, making computational approaches sensitive to time can improve performance on specific tasks. In this work, we assess whether applying historical language models and time-aware methods helps with determining the correct sense of polysemous words. We outline the task of time-sensitive Targeted Sense Disambiguation (TSD), which aims...Beelen, Kaspar ; Nanni, Federico ; Coll Ardanuy, Mariona ; Hosseini, Kasra ; Tolfo, Giorgia …
-
Conference paper (published)
Living Machines: A study of atypical animacy
This paper proposes a new approach to animacy detection, the task of determining whether an entity is represented as animate in a text. In particular, this work is focused on atypical animacy and examines the scenario in which typically inanimate objects, specifically machines, are given animate attributes. To address it,...Coll Ardanuy, Mariona ; Nanni, Federico ; Beelen, Kaspar ; Hosseini, Kasra ; Ahnert, Ruth …
nineteenth-century English, living machines, BERT, and animacy
-
Conference paper (published)
Towards a Foundation for Collaborative Digital Archiving with Local Concert-Giving Organisations
The centenaries of former chapters of the British Music Society (BMS), established in 1918, have prompted their governing bodies to take stock of their histories and build on the cataloguing, documentation and preservation of their archival collections. The InterMusE project aims to support this shared instinct to archive by capturing...
-
Conference paper (published)
Developing an Open-Source Corpus of Yoruba Speech
This paper introduces an open-source speech dataset for Yoruba — one of the largest low-resource West African languages spoken by at least 22 million people. Yoruba is one of the official languages of Nigeria, Benin and Togo, and is spoken in other neighboring African countries and beyond. The corpus consists...Gutkin, Alexander ; Demirşahin, Işın ; Kjartansson, Oddur ; Rivera, Clara ; Túbọ̀sún, Kọ́lá
-
Conference paper (published)
ICIDS2020 Panel: Building the Discipline of Interactive Digital Narratives
Building our discipline has been an ongoing discussion since the early days of ICIDS. From earlier international joint efforts to integrate research from multiple fields of study to today’s endeavours by researchers to provide scholarly works of reference, the discussion on how to continue building Interactive Digital Narratives as a...Bernstein, Mark ; Palosaari Eladhari, Mirjam ; Koenitz, Hartmut ; Louchart, Sandy ; Nack, Frank …
-
Conference paper (published)
DeezyMatch: A Flexible Deep Learning Approach to Fuzzy String Matching
We present DeezyMatch, a free, open-source software library written in Python for fuzzy string matching and candidate ranking. Its pair classifier supports various deep neural network architectures for training new classifiers and for fine-tuning a pretrained model, which paves the way for transfer learning in fuzzy string matching. This approach...Hosseini, Kasra ; Nanni, Federico ; Coll Ardanuy, Mariona
Natural Language Processing, string matching, toponym matching, machine learning, and digital humanities
-
Conference paper (published)
Archiving Interactive Narratives at the British Library
This paper describes the creation of the Interactive Narratives collection in the UK Web Archive, as part of the UK Legal Deposit Libraries Emerging Formats Project. The aim of the project is to identify, collect and preserve complex digital publications that are in scope for collection under UK Non-Print Legal...Clark, Lynda ; Rossi, Giulia Carla ; Wisdom, Stella
Emerging Formats, digital storytelling, new media collection management, Interactive Narratives collection, digital preservation, and web archiving
-
Conference paper (published)
Crowd- and Community-Fuelled Archaeology. Early Results from the MicroPasts Project
The MicroPasts project is a novel experiment in the use of crowd-based methodologies to enable participatory archaeological research. Building on a long tradition of offline community archaeology in the UK, this initiative aims to integrate crowd-sourcing, crowd-funding and forum-based discussion to encourage groups of academics and volunteers to collaborate on...Bonacchi, Chiara ; Bevan, Andrew ; Pett, Daniel ; Keinan-Schoonbaert, Adi
Public Archaeology, crowd-funding, crowd-sourcing, and online communities
-
Conference paper (published)
Preserving eBooks: Past, Present and Future - A Series of National Library Perspectives
This panel will present and discuss different eBook workflows and challenges from four national libraries, considering a range of issues from technical complexities to evolution of the content type and changes in the publishing/collecting landscape.Owens, Trevor ; Pennock, Maureen ; Smyth, Tom ; Steinke, Tobias
access, ingest, ebooks, digital preservation, formats, and scale
-
Conference paper (published)
Dawn of Digital Repositories Certification under ISO 16363. Exploring the Horizon and beyond
The dawn of Trustworthy Digital Repository Certification under the ISO 16363:2012 standard is on the horizon. Across the digital preservation community, institutions are eager to learn more about the processes of preparing for and undergoing an ISO 16363 audit from an accredited third-party organization. As the first ISO 16363 audits...Giaretta, David ; LaPlant, Lisa ; Shiers, Jamie ; Tieman, Jessica ; Pennock, Maureen …
repository, certification, trustworthy, audit, and standards
-
Conference paper (published)
Resolving places, past and present: toponym resolution in historical British newspapers using multiple resources
Newspapers and their metadata are richly geographical, not only in their distribution but also their content. Attending to these spatial features is a prerequisite in newspaper research. Following other projects to have geoparsed place names in newspapers, we describe our approach to linking historical geospatial information in text to real-world...Coll Ardanuy, Mariona ; McDonough, Katherine ; Krause, Amrey ; Wilson, Daniel C.S. ; Hosseini, Kasra …
-
Conference paper (published)
CRISP: Crowdsourcing Representation Information to Support Preservation
In this paper, we describe a new collaborative approach to the collection of representation information to ensure long term access to digital content. Representation information is essential for successful rendering of digital content in the future. Manual collection and maintenance of RI has so far proven to be highly resource...Pennock, Maureen ; Jackson, Andrew N. ; Wheatley, Paul
-
Conference paper (published)
Cross-disciplinary Collaborations to Enrich Access to Non-Western Language Material in the Cultural Heritage Sector
The British Library is home to millions of items representing every age of written civilisation, including books, manuscripts and newspapers in all written languages. Large digitisation programmes currently underway are opening up access to this rich and unique historical content on an ever increasing scale. However, particularly for historical material...Derrick, Tom ; McGregor, Nora
HTR, page analysis, layout analysis, recognition, Bangla script, Arabic script, OCR, and datasets
-
Conference paper (published)
Multi-spectral Imaging at the British Library
The British Library is the national library of the United Kingdom and holds over 150 million items with an additional three million new items added each year. The 625 km of shelving contains manuscripts, maps, newspapers, magazines, prints, drawings, music scores and patents. The fundamental purpose of the Library is...Duffy, Christina
multi-spectral imaging, text-recovery, iron gall ink, digital, digitization, reagent, data, library, fire-damage, imaging, and erasure
-
Conference paper (published)
Using METS, PREMIS and MODS for Archiving eJournals: Paper - iPRES 2008 - London
As institutions turn towards developing archival digital repositories, many decisions on the use of metadata have to be made. In addition to deciding on the more traditional descriptive and administrative metadata, particular care needs to be given to the choice of structural and preservation metadata, as well as to integrating...Dappert, Angela ; Enders, Marcus
-
Conference paper (published)
Risk Assessment; using a risk based approach to prioritise handheld digital information
The British Library (BL) Digital Library Programme (DLP) has a broad set of objectives to achieve over the next few years, from web-archiving to the ingest of e-journals through to mass digitisation of newspapers and books. These projects are decided by the DLP programme board and are managed by the...McLeod, Rory
-
Conference paper (published)
Modeling Organizational Preservation Goals to Guide Digital Preservation
Digital preservation activities can only succeed if they go beyond the technical properties of digital objects. They must consider the strategy, policy, goals, and constraints of the institution that undertakes them and take into account the cultural and institutional framework in which data, documents and records are preserved. Furthermore, because...Dappert, Angela ; Farquhar, Adam
-
Conference paper (published)
Costing the Digital Preservation Lifecycle More Effectively
Having confidence in the permanence of a digital resource requires a deep understanding of the preservation activities that will need to be performed throughout its lifetime and an ability to plan and resource for those activities. The LIFE (Lifecycle Information For E-Literature) and LIFE2 Projects have advanced understanding of the...Wheatley, Paul
-
Conference paper (published)
Adapting Existing Technologies for Digitally Archiving Personal Lives. Digital Forensics, Ancestral Computing, and Evolutionary Perspectives and Tools
The adoption of existing technologies for digital curation, most especially digital capture, is outlined in the context of personal digital archives and the Digital Manuscripts Project at the British Library. Technologies derived from computer forensics, data conversion and classic computing, and evolutionary computing are considered. The practical imperative of moving...John, Jeremy Leighton
-
Conference paper (published)
The Integrated Preservation Suite: Scaled and automated preservation planning for highly diverse digital collections (long paper)
The Integrated Preservation Suite is an internally funded project at the British Library to develop and enhance the Library's preservation planning capabilities, largely focussed on automation and addressing the Library's heterogeneous collections. Through agile development practices, the project is iteratively designing and implementing the technical infrastructure for the suite as...May, Peter ; Pennock, Maureen ; Russo, David
software preservation, knowledge base, preservation watch, and preservation planning
-
Conference paper (published)
Developing a robust migration workflow for preserving and curating hand-held media
Many memory institutions hold large collections of hand-held media, which can comprise hundreds of terabytes of data spread over many thousands of data-carriers. Many of these carriers are at risk of significant physical degradation over time, depending on their composition. Unfortunately, handling them manually is enormously time consuming and so...Dappert, Angela ; Jackson, Andrew ; Kimura, Akiko
disk-copying robot, iPRES, data-carrier stabilization, auto loader, and digital preservation
-
Conference paper (published)
An analysis of contemporary JPEG2000 codecs for image format migration
This paper presents results of an analysis of different implementations of the JPEG2000 standard, specifically part 1: JP2, an image format that is currently popular within the digital preservation community. In particular we are interested in the effect different JPEG2000 codecs (encoders and decoders) have on image quality in response...Palmer, William ; May, Peter ; Cliff, Peter
TIFF, image quality, generational loss, JPEG2000, migration, codec, and PSNR
-
Conference paper (published)
Quality assured image file format migration in large digital object repositories
This article gives an overview on how different components developed by the SCAPE project are intended to be used in composite file format migration workflows; it will explain how the SCAPE platform can be employed to make sure that the workflows can be used to migrate very large image collections...Schlarb, Sven ; Cliff, Peter ; May, Peter ; Palmer, William ; Hahn, Matthias …
-
Conference paper (published)
Capturing and replaying streaming media in a web archive – a British Library case study
A prerequisite for digital preservation is to be able to capture and retain the content which is considered worth preserving. This has been a significant challenge for web archiving, especially for websites with embedded streaming media content, which cannot be copied via a simple HTTP request to a URL. This...Hockx-Yu, Helen ; Crawford, Lewis ; Coram, Roger ; Johnson, Stephen
-
Conference paper (published)
LIFE3: A predictive costing tool for digital collections
Predicting the costs of long-term digital preservation is a crucial yet complex task for even the largest repositories and institutions. For smaller projects and individual researchers faced with preservation requirements, the problem is even more overwhelming, as they lack the accumulated experience of the former. Yet being able to estimate...Hole, Brian ; Lin, Li ; McCann, Patrick ; Wheatley, Paul
-
Conference paper (published)
LIFE3: Predicting Long Term Digital Preservation Costs
As we develop our ability to preserve digital collections through techniques such as migration and emulation, the decision process of what action to take and when to take it becomes increasingly complex. Cost is a crucial factor to consider but the financial implications of preservation planning decisions are not typically...Wheatley, Paul ; Hole, Brian
-
Conference paper (published)
Implementing metadata that guides digital preservation services
Effective digital preservation depends on a set of preservation services that work together to ensure that digital objects can be preserved for the long-term. These services need digital preservation metadata, in particular, descriptions of the properties that digital objects may have and descriptions of the requirements that guide digital preservation...Dappert, Angela ; Farquhar, Adam
-
Conference paper (published)
A framework for distributed preservation workflows
The Planets project is developing a service-oriented environment for the definition and evaluation of preservation strategies for human-centric data. It focuses on the question of logically preserving digital materials, as opposed to the physical preservation of content bit-streams. This includes the development of preservation tools for the automated characterization, migration,...Schmidt, Rainer ; King, Ross ; Steeg, Fabian ; Melms, Peter ; Jackson, Andrew …
-
Conference paper (published)
Deal with conflict, capture the relationship: the case of digital object properties
Properties of digital objects play a central role in digital preservation. All key preservation services are linked via a common understanding of the properties which describe the digital objects in a repository's care. Unfortunately, different services deal with properties on sometimes different levels of description. While, for example, a preservation...Dappert, Angela
-
Conference paper (published)
A METS based information package for long term accessibility of web archives
The British Library’s web archive comprises several terabytes of harvested websites. Like other content streams this data should be ingested into the library’s central preservation repository. The repository requires a standardized Submission- and Archival Information Package. Harvested websites are stored in Archival Information Packages (AIP). Each AIP is described by...Enders, Markus
-
Conference paper (published)
Using Automated Dependency Analysis to generate representation information
To preserve access to digital content, we must preserve the representation information that captures the intended interpretation of the data. In particular, we must be able to capture performance dependency requirements, i.e. to identify the other resources that are required in order for the intended interpretation to be constructed successfully....Jackson, Andrew
-
Conference paper (published)
ArchivePress: A Really Simple Solution to Archiving Blog Content
Blog archiving and preservation is not a new challenge. Current solutions are commonly based on typical web archiving activities, whereby a crawler is configured to harvest a copy of the blog and return the copy to a web archive. Yet this is not the only solution, nor is it always...Pennock, Maureen ; Davis, Richard
-
Conference paper (published)
Considerations on the acquisition and preservation of ebook mobile apps
In 2018 and 2019, as part of the UK Legal Deposit Libraries’ sponsored ‘Emerging Formats’ project, the British Library’s digital preservation team undertook a program of research into the preservation of new forms of content. One of these content types was eBooks published as Mobile Apps. Research considered a relatively...Pennock, Maureen ; May, Peter ; Day, Michael
access, mobile apps, acquisition, digital preservation, and preservation
-
Conference paper (published)
Are you ready? Assessing whether organisations are prepared for digital preservation
In the last few years digital preservation has started to transition from a theoretical discipline to one where real solutions are beginning to be used. The Planets project has analyzed the readiness of libraries, archives and related organizations to begin to use the outputs of various digital preservation initiatives (and,...Sinclair, Pauline ; Billenness, Clive ; Duckworth, James ; Farquhar, Adam ; Humphreys, Jane …
-
Conference paper (published)
People mashing: Agile digital preservation and the AQuA Project
Manual quality assurance (QA) of digitised content is typically fallible and can result in collections that are marred by a variety of quality and access issues. Poor storage conditions, technology obsolescence and other unforeseen problems can also leave digital objects in an unusable state. Detecting, identifying and ultimately fixing these...Wheatley, Paul ; Middleton, Bo ; Double, Jodie ; Jackson, Andrew ; McGuinness, Rebecca
-
Conference paper (published)
Practical analysis of TIFF file size reductions achievable through compression
This paper presents results of a practical analysis into the effects of three main lossless TIFF compression algorithms – LZW, ZIP and Group 4 – on the storage requirements for a small set of digitized materials. In particular we are interested in understanding which algorithm achieves a greater reduction in...May, Peter ; Davies, Kevin
LZW, Group 4, LibTiff, TIFF, ZIP, compression, and ImageMagick
-
Conference paper (published)
Not just a British library: enabling a global discovery experience
Within the walls of the British Library lies one of the greatest collections in the world. However, the value of the British Library lies not only in the preservation of heritage items, but also in its determination to keep pace with the many changes in the global information environment. As...Flanagan, Dimity
open access, repositories, discovery, persistent identifiers, text and data mining, and digitisation
-
Conference paper (published)
Issues raised by a 'rap' translation of a poem by Velimir Khlebnikov 'Kamennaia baba'
Chadwick, Brian
-
Conference paper (published)
Early printed books containing Andean languages in the British Library, London
West, Geoffrey
-
Conference paper (published)
Atlantic crossings: the trade in Latin American books in Europe in the nineteenth century
West, Geoffrey
-
Conference paper (published)
Where are they now? The dispersal of Spanish printed book collections, 1810-1850
West, Geoffrey
-
Conference paper (published)
Adventures with ePub3: when rendering goes wrong
The role of standards in digital preservation is widely acknowledged. The current version of the ePub standard, used for publishing and disseminating eBooks, is ePub3, specifically 3.1 (January 2017). A marked difference from ePub2 is support for fixed layout files and, whilst several different ePub readers are available, not all...Pennock, Maureen ; Day, Michael
-
Conference paper (published)
Preservation planning for emerging formats at the British Library
The British Library and the other UK Legal Deposit Libraries have been collecting various forms of born-digital publications since 2013 as part of what is known as Non-Print Legal Deposit (NPLD). In 2017, the UK Legal Deposit Libraries established an Emerging Formats project to look at selected types of...Day, Michael ; Pennock, Maureen ; Smith, Caylin ; Jenkins, Jeremy ; Cooke, Ian
-
Conference paper (published)
Co-operation for digital preservation and curation: Collaboration for collection development in institutional repository network
The digital preservation problem is a series of interrelated technical and organizational challenges that can only be met co-operatively by the many different stakeholders that are involved. The rise of the institutional repository paradigm backs this up with its focus on co-operation within national or subject-based networks and the wider...Day, Michael ; Pennock, Maureen ; Allinson, Julie
-
Conference paper (published)
Implementing Digital Preservation Strategy: Developing content collection profiles at the British Library
The British Library is increasingly a digital library. Through both digitization and acquisition, it has built up significant collections of digital content covering a very wide range of content types. Most recently, the extension of legal deposit provisions to non-print works in 2013 has meant that it - working in...Day, Michael ; McDonald, Ann ; Pennock, Maureen ; Kimura, Akiko
content management, digital libraries, content collection profiles, Internet, and digital preservation
-
Conference paper (published)
The Flashback Project: Rescuing disk-based content from the 1980s to the present day
This paper introduces the British Library's Flashback project, a proof-of-concept that explored the practical challenges of preserving digital content stored on physical media (magnetic and optical disks) using a sample of content from hybrid collection items dating from between 1980 and 2010. It describes some of the activities undertaken by...Pennock, Maureen ; May, Peter ; Day, Michael ; Davies, Kevin ; Whibley, Simon …
-
Conference paper (published)
Archiving web site resources: a records management view
In this paper, we propose the use of records management principles to identify and manage Web site resources with enduring value as records. Current Web archiving activities, collaborative or organisational, whilst extremely valuable in their own right, often do not and cannot incorporate requirements for proper records management. Material collected...Pennock, Maureen ; Kelly, Brian
-
Conference paper (published)
Sustainability assessments at the British Library: Formats, frameworks and findings
File format assessments have been the subject of much debate in and outside of the preservation community in the past decade. Recognizing the unique structural, operational, and collecting context of the British Library, the Library’s digital preservation team recently initiated new format assessment work to deliver recommendations on which file...Pennock, Maureen ; Wheatley, Paul ; May, Peter
file formats, British Library, assessments, transparency, sustainability, and preservation master
-
Conference paper (published)
Identifying digital preservation requirements: Digital Preservation Strategy and collection profiling at the British Library
The British Library is increasingly a digital library. Over past decades, it has built up significant collections of digital content covering a very wide range of content types. In addition to the increasing amounts of digital content acquired by purchase or donation, the Library and its partners have also invested...Day, Michael ; McDonald, Ann ; Kimura, Akiko ; Pennock, Maureen
preservation planning, institutional contexts of preservation, collection content profiling, and digital preservation
-
Conference paper (published)
Arabic dialect identification in the context of bivalency and code-switching
In this paper we use a novel approach towards Arabic dialect identification using language bivalency and written code-switching. Bivalency between languages or dialects is where a word or element is treated by language users as having a fundamentally similar semantic content in more than one language or dialect. Arabic dialect...El-Haj, Mahmoud ; Rayson, Paul ; Aboelezz, Mariam
Arabic, machine learning, dialects, language identification, NLP, and bivalency
-
Conference paper (published)
Latinised Arabic and connections to bilingual ability
As software support for non-Latin scripts is becoming more readily available, the continuing use of Latinised forms in online discourse highlights an interesting phenomenon. This paper focuses on Latinised Arabic (LA) as one manifestation of this trend. While there appears to be significant variation in the conventions used to Latinise...Aboelezz, Mariam
-
Conference paper (published)
Prospects for a Big Data History of Music
This position paper sets out the possibility of a musicology based on the analysis of musical-bibliographical metadata as Big Data. It outlines the work underway, as part of the AHRC-funded project A Big Data History of Music, to align seven major datasets of musical-bibliographical metadata. After discussing some of the...Rose, Stephen ; Tuppen, Sandra