Automated Language Identification of Bibliographic Resources - British Library Research Repository
Journal article

Automated Language Identification of Bibliographic Resources

20 December 2019

Abstract

This article describes experiments at the British Library in using machine learning techniques to assign language codes to catalog records, so that each record carries information about the language of the content of the resource it describes. In the first phase of the project, language codes were assigned to 1.15 million records with 99.7% confidence. The automated language identification tools developed will be used to contribute to the future enhancement of over 4 million legacy records.
