Search
Search results
-
Conference paper (published)
Risk Assessment; using a risk based approach to prioritise handheld digital information
The British Library (BL) Digital Library Programme (DLP) has a broad set of objectives to achieve over the next few years, from web-archiving to the ingest of e-journals through to mass digitisation of newspapers and books. These projects are decided by the DLP programme board and are managed by the...McLeod, Rory
-
Conference paper (published)
Modeling Organizational Preservation Goals to Guide Digital Preservation
Digital preservation activities can only succeed if they go beyond the technical properties of digital objects. They must consider the strategy, policy, goals, and constraints of the institution that undertakes them and take into account the cultural and institutional framework in which data, documents and records are preserved. Furthermore, because...Dappert, Angela ; Farquhar, Adam
-
Conference paper (published)
Costing the Digital Preservation Lifecycle More Effectively
Having confidence in the permanence of a digital resource requires a deep understanding of the preservation activities that will need to be performed throughout its lifetime and an ability to plan and resource for those activities. The LIFE (Lifecycle Information For E-Literature) and LIFE2 Projects have advanced understanding of the...Wheatley, Paul
-
Conference paper (published)
Adapting Existing Technologies for Digitally Archiving Personal Lives. Digital Forensics, Ancestral Computing, and Evolutionary Perspectives and Tools
The adoption of existing technologies for digital curation, most especially digital capture, is outlined in the context of personal digital archives and the Digital Manuscripts Project at the British Library. Technologies derived from computer forensics, data conversion and classic computing, and evolutionary computing are considered. The practical imperative of moving...John, Jeremy Leighton
-
Conference paper (published)
Using METS, PREMIS and MODS for Archiving eJournals: Paper - iPRES 2008 - London
As institutions turn towards developing archival digital repositories, many decisions on the use of metadata have to be made. In addition to deciding on the more traditional descriptive and administrative metadata, particular care needs to be given to the choice of structural and preservation metadata, as well as to integrating...Dappert, Angela ; Enders, Markus
-
Conference paper (published)
Deal with conflict, capture the relationship: the case of digital object properties
Properties of digital objects play a central role in digital preservation. All key preservation services are linked via a common understanding of the properties which describe the digital objects in a repository's care. Unfortunately, different services deal with properties on sometimes different levels of description. While, for example, a preservation...Dappert, Angela
-
Conference paper (published)
A METS based information package for long term accessibility of web archives
The British Library’s web archive comprises several terabytes of harvested websites. Like other content streams, this data should be ingested into the library’s central preservation repository. The repository requires a standardized Submission- and Archival Information Package. Harvested Websites are stored in Archival Information Packages (AIP). Each AIP is described by...Enders, Markus
-
Conference paper (published)
LIFE3: A predictive costing tool for digital collections
Predicting the costs of long-term digital preservation is a crucial yet complex task for even the largest repositories and institutions. For smaller projects and individual researchers faced with preservation requirements, the problem is even more overwhelming, as they lack the accumulated experience of the former. Yet being able to estimate...Hole, Brian ; Lin, Li ; McCann, Patrick ; Wheatley, Paul
-
Conference paper (published)
Capturing and replaying streaming media in a web archive – a British Library case study
A prerequisite for digital preservation is to be able to capture and retain the content which is considered worth preserving. This has been a significant challenge for web archiving, especially for websites with embedded streaming media content, which cannot be copied via a simple HTTP request to a URL. This...Hockx-Yu, Helen ; Crawford, Lewis ; Coram, Roger ; Johnson, Stephen
-
Conference paper (published)
A Latinised Arabic for All? Issues of representation, purpose and audience
This paper reviews two major issues which account for much of the variation in representing Arabic using Latin characters. Since the Latinisation of Arabic entails encoding additional phonetic information (by adding short vowels), how we choose to represent Arabic for Latinisation becomes a central issue. This representation may either reflect the...Aboelezz, Mariam
-
Conference paper (published)
Developing a robust migration workflow for preserving and curating hand-held media
Many memory institutions hold large collections of hand-held media, which can comprise hundreds of terabytes of data spread over many thousands of data-carriers. Many of these carriers are at risk of significant physical degradation over time, depending on their composition. Unfortunately, handling them manually is enormously time consuming and so...Dappert, Angela ; Jackson, Andrew ; Kimura, Akiko
disk-copying robot, iPRES, data-carrier stabilization, auto loader, and digital preservation
-
Conference paper (published)
An analysis of contemporary JPEG2000 codecs for image format migration
This paper presents results of an analysis of different implementations of the JPEG2000 standard, specifically part 1: JP2, an image format that is currently popular within the digital preservation community. In particular we are interested in the effect different JPEG2000 codecs (encoders and decoders) have on image quality in response...Palmer, William ; May, Peter ; Cliff, Peter
TIFF, image quality, generational loss, JPEG2000, migration, codec, and PSNR
-
Conference paper (published)
Quality assured image file format migration in large digital object repositories
This article gives an overview on how different components developed by the SCAPE project are intended to be used in composite file format migration workflows; it will explain how the SCAPE platform can be employed to make sure that the workflows can be used to migrate very large image collections...Schlarb, Sven ; Cliff, Peter ; May, Peter ; Palmer, William ; Hahn, Matthias …
-
Conference paper (published)
Sustainability assessments at the British Library: Formats, frameworks and findings
File format assessments have been the subject of much debate in and outside of the preservation community in the past decade. Recognizing the unique structural, operational, and collecting context of the British Library, the Library’s digital preservation team recently initiated new format assessment work to deliver recommendations on which file...Pennock, Maureen ; Wheatley, Paul ; May, Peter
file formats, British Library, assessments, transparency, sustainability, and preservation master
-
Conference paper (published)
Identifying digital preservation requirements: Digital Preservation Strategy and collection profiling at the British Library
The British Library is increasingly a digital library. Over past decades, it has built up significant collections of digital content covering a very wide range of content types. In addition to the increasing amounts of digital content acquired by purchase or donation, the Library and its partners have also invested...Day, Michael ; McDonald, Ann ; Kimura, Akiko ; Pennock, Maureen
preservation planning, institutional contexts of preservation, collection content profiling, and digital preservation
-
Conference paper (published)
Prospects for a Big Data History of Music
This position paper sets out the possibility of a musicology based on the analysis of musical-bibliographical metadata as Big Data. It outlines the work underway, as part of the AHRC-funded project A Big Data History of Music, to align seven major datasets of musical-bibliographical metadata. After discussing some of the...Rose, Stephen ; Tuppen, Sandra
-
Conference paper (published)
Practical analysis of TIFF file size reductions achievable through compression
This paper presents results of a practical analysis into the effects of three main lossless TIFF compression algorithms – LZW, ZIP and Group 4 – on the storage requirements for a small set of digitized materials. In particular we are interested in understanding which algorithm achieves a greater reduction in...May, Peter ; Davies, Kevin
LZW, Group 4, LibTiff, TIFF, ZIP, compression, and ImageMagick
-
Conference paper (published)
The Flashback Project: Rescuing disk-based content from the 1980s to the present day
This paper introduces the British Library's Flashback project, a proof-of-concept that explored the practical challenges of preserving digital content stored on physical media (magnetic and optical disks) using a sample of content from hybrid collection items dating from between 1980 and 2010. It describes some of the activities undertaken by...Pennock, Maureen ; May, Peter ; Day, Michael ; Davies, Kevin ; Whibley, Simon …
-
Conference paper (published)
Arabic dialect identification in the context of bivalency and code-switching
In this paper we use a novel approach towards Arabic dialect identification using language bivalency and written code-switching. Bivalency between languages or dialects is where a word or element is treated by language users as having a fundamentally similar semantic content in more than one language or dialect. Arabic dialect...El-Haj, Mahmoud ; Rayson, Paul ; Aboelezz, Mariam
Arabic, machine learning, dialects, language identification, NLP, and bivalency
-
Conference paper (published)
Adventures with ePub3: when rendering goes wrong
The role of standards in digital preservation is widely acknowledged. The current version of the ePub standard, used for publishing and disseminating eBooks, is ePub3, specifically 3.1 (January 2017). A marked difference from ePub2 is support for fixed layout files and, whilst several different ePub readers are available, not all...Pennock, Maureen ; Day, Michael
-
Conference paper (published)
Preservation planning for emerging formats at the British Library
The British Library and the other UK Legal Deposit Libraries have been collecting various forms of born-digital publications since 2013 as part of what is known as Non-Print Legal Deposit (NPLD). In 2017, the UK Legal Deposit Libraries established an Emerging Formats project to look at selected types of...Day, Michael ; Pennock, Maureen ; Smith, Caylin ; Jenkins, Jeremy ; Cooke, Ian
-
Conference paper (published)
Considerations on the acquisition and preservation of ebook mobile apps
In 2018 and 2019, as part of the UK Legal Deposit Libraries’ sponsored ‘Emerging Formats’ project, the British Library’s digital preservation team undertook a program of research into the preservation of new forms of content. One of these content types was eBooks published as Mobile Apps. Research considered a relatively...Pennock, Maureen ; May, Peter ; Day, Michael
access, mobile apps, acquisition, digital preservation, and preservation
-
Conference paper (published)
Not just a British library: enabling a global discovery experience
Within the walls of the British Library lies one of the greatest collections in the world. However, the value of the British Library lies not only in the preservation of heritage items, but also in its determination to keep pace with the many changes in the global information environment. As...Flanagan, Dimity
open access; repositories; discovery; persistent identifiers; text and data mining; digitisation
-
Conference paper (published)
The Integrated Preservation Suite: Scaled and automated preservation planning for highly diverse digital collections (long paper)
The Integrated Preservation Suite is an internally funded project at the British Library to develop and enhance the Library's preservation planning capabilities, largely focussed on automation and addressing the Library's heterogeneous collections. Through agile development practices, the project is iteratively designing and implementing the technical infrastructure for the suite as...May, Peter ; Pennock, Maureen ; Russo, David
software preservation, knowledge base, preservation watch, and preservation planning
-
Conference paper (published)
Cross-disciplinary Collaborations to Enrich Access to Non-Western Language Material in the Cultural Heritage Sector
The British Library is home to millions of items representing every age of written civilisation, including books, manuscripts and newspapers in all written languages. Large digitisation programmes currently underway are opening up access to this rich and unique historical content on an ever increasing scale. However, particularly for historical material...Derrick, Tom ; McGregor, Nora
HTR, page analysis, layout analysis, recognition, Bangla script, Arabic script, OCR, and datasets
-
Conference paper (published)
Dawn of Digital Repositories Certification under ISO 16363. Exploring the Horizon and beyond
The dawn of Trustworthy Digital Repository Certification under the ISO 16363:2012 standard is on the horizon. Across the digital preservation community, institutions are eager to learn more about the processes of preparing for and undergoing an ISO 16363 audit from an accredited third-party organization. As the first ISO 16363 audits...Giaretta, David ; LaPlant, Lisa ; Shiers, Jamie ; Tieman, Jessica ; Pennock, Maureen …
repository, certification, trustworthy, audit, and standards
-
Conference paper (published)
Preserving eBooks: Past, Present and Future - A Series of National Library Perspectives
This panel will present and discuss different eBook workflows and challenges from four national libraries, considering a range of issues from technical complexities to evolution of the content type and changes in the publishing/collecting landscape.
Owens, Trevor ; Pennock, Maureen ; Smyth, Tom ; Steinke, Tobias
access, ingest, ebooks, digital preservation, formats, and scale
-
Conference paper (published)
Resolving places, past and present: toponym resolution in historical British newspapers using multiple resources
Newspapers and their metadata are richly geographical, not only in their distribution but also their content. Attending to these spatial features is a prerequisite in newspaper research. Following other projects to have geoparsed place names in newspapers, we describe our approach to linking historical geospatial information in text to real-world...Coll Ardanuy, Mariona ; McDonough, Katherine ; Krause, Amrey ; Wilson, Daniel C.S. ; Hosseini, Kasra …
-
Conference paper (published)
Archiving Interactive Narratives at the British Library
This paper describes the creation of the Interactive Narratives collection in the UK Web Archive, as part of the UK Legal Deposit Libraries Emerging Formats Project. The aim of the project is to identify, collect and preserve complex digital publications that are in scope for collection under UK Non-Print Legal...Clark, Lynda ; Rossi, Giulia Carla ; Wisdom, Stella
Emerging Formats, digital storytelling, new media collection management, Interactive Narratives collection, digital preservation, and web archiving
-
Conference paper (published)
Developing an Open-Source Corpus of Yoruba Speech
This paper introduces an open-source speech dataset for Yoruba — one of the largest low-resource West African languages spoken by at least 22 million people. Yoruba is one of the official languages of Nigeria, Benin and Togo, and is spoken in other neighboring African countries and beyond. The corpus consists...Gutkin, Alexander ; Demirşahin, Işın ; Kjartansson, Oddur ; Rivera, Clara ; Túbọ̀sún, Kọ́lá
-
Conference paper (published)
DeezyMatch: A Flexible Deep Learning Approach to Fuzzy String Matching
We present DeezyMatch, a free, open-source software library written in Python for fuzzy string matching and candidate ranking. Its pair classifier supports various deep neural network architectures for training new classifiers and for fine-tuning a pretrained model, which paves the way for transfer learning in fuzzy string matching. This approach...Hosseini, Kasra ; Nanni, Federico ; Coll Ardanuy, Mariona
Natural Language Processing, string matching, toponym matching, machine learning, and digital humanities
-
Conference paper (published)
Living Machines: A study of atypical animacy
This paper proposes a new approach to animacy detection, the task of determining whether an entity is represented as animate in a text. In particular, this work is focused on atypical animacy and examines the scenario in which typically inanimate objects, specifically machines, are given animate attributes. To address it,...Coll Ardanuy, Mariona ; Nanni, Federico ; Beelen, Kaspar ; Hosseini, Kasra ; Ahnert, Ruth …
nineteenth-century English, living machines, BERT, and animacy
-
Conference paper (published)
Towards a Foundation for Collaborative Digital Archiving with Local Concert-Giving Organisations
The centenaries of former chapters of the British Music Society (BMS), established in 1918, have prompted their governing bodies to take stock of their histories and build on the cataloguing, documentation and preservation of their archival collections. The InterMusE project aims to support this shared instinct to archive by capturing...
-
Conference paper (published)
Station to Station: Linking and Enriching Historical British Railway Data
The transformative impact of the railway on nineteenth-century British society has been widely recognized, but understanding that process at scale remains challenging because the Victorian rail network was both vast and in a state of constant flux. Michael Quick’s reference work Railway Passenger Stations in Great Britain: a Chronology offers...Coll Ardanuy, Mariona ; Beelen, Kaspar ; Lawrence, Jon ; McDonough, Katherine ; Nanni, Federico …
-
Conference paper (published)
Exploring Software, Tools and Methods used in Web Archive Research
This paper is one part of a larger research project, titled, Web Archives - Researcher Skills and Tools (WARST). In this poster we focus on the data from the WARST study which examines the software, tools and methods used in the web archive research lifecycle.
Schmid, Katharina ; Healy, Sharon ; Byrne, Helena
web archiving, web archive research, web archive users, and web archive creators
-
Conference paper (published)
Design Patterns in Digital Preservation: Understanding Information Flows
This paper proposes a framework to help understand the different ways digital preservation goals can be achieved, and the contextual factors these choices depend on. This is done through a worked example: three different design patterns representing the three possible modes of archival information flow, each illustrated with realistic examples and...Jackson, Andrew N
OAIS, design patterns, community, risk management, and innovation
-
Conference paper (published)
Entities, Dates, and Languages: Zero-Shot on Historical Texts with T0
In this work, we explore whether the recently demonstrated zero-shot abilities of the T0 model extend to Named Entity Recognition for out-of-distribution languages and time periods. Using a historical newspaper corpus in 3 languages as test-bed, we use prompts to extract possible named entities. Our results show that a naive...De Toni, Francesco ; Akiki, Christopher ; De La Rosa, Javier ; Fourrier, Clémentine ; Manjavacas, Enrique …
-
Conference paper (published)
The BigScience ROOTS Corpus: A 1.6TB Composite Multilingual Dataset
As language models grow ever larger, the need for large-scale high-quality text datasets has never been more pressing, especially in multilingual settings. The BigScience workshop, a 1-year international and multidisciplinary initiative, was formed with the goal of researching and training large language models as a values-driven undertaking, putting issues of...Laurençon, Hugo ; Saulnier, Lucile ; Wang, Thomas ; Akiki, Christopher ; Villanova del Moral, Albert …