Cross-disciplinary Collaborations to Enrich Access to Non-Western Language Material in the Cultural Heritage Sector - British Library Research Repository
Conference paper (published)

2019

Abstract

The British Library is home to millions of items representing every age of written civilisation, including books, manuscripts and newspapers in all written languages. Large digitisation programmes currently underway are opening up access to this rich and unique historical content on an ever-increasing scale. However, enriched full-text discovery and analysis across the digitised output, which would truly transform access and scholarship, is still out of reach, particularly for historical material written in non-Latin scripts. This is due in part to the commercial text recognition solutions currently on the market having largely been optimised for modern documents and Latin scripts. This paper reports on a series of initiatives undertaken by the British Library to investigate, evaluate and support new research into enhancing text recognition capabilities for two major digitised collections of non-Western language material: printed Bangla and handwritten Arabic. It presents lessons learned and opportunities gained from cross-disciplinary collaboration between the cultural heritage sector and researchers working at the cutting edge of text recognition, with a view towards informing and encouraging such partnerships in future.

Files

There is 1 file associated with this work, which is available for download.

Metadata

  • Resource type
    • Conference paper (published)

  • Institution
    • British Library

  • Organisational unit
    • Digital Scholarship

  • Event title
    • DATeCH2019: 3rd International Conference on Digital Access to Textual Cultural Heritage

  • Event location
    • Brussels, Belgium

  • Book title
    • DATeCH2019 Proceedings of the 3rd International Conference on Digital Access to Textual Cultural Heritage

  • Pagination
    • 111-116

  • Publisher
    • ACM Press

  • ISBN
    • 9781450371940

  • DOI
    • doi.org/10.1145/3322905.3322907

  • Additional information
    • The attached file is the authors' accepted manuscript of this paper.