Entities, Dates, and Languages: Zero-Shot on Historical Texts with T0
Creators
De Toni, Francesco
Akiki, Christopher
de la Rosa, Javier
Fourrier, Clémentine
Manjavacas, Enrique
Schweter, Stefan
van Strien, Daniel
Abstract
In this work, we explore whether the recently demonstrated zero-shot abilities of the T0 model extend to Named Entity Recognition (NER) for out-of-distribution languages and time periods. With a historical newspaper corpus in three languages as a test-bed, we use prompts to extract candidate named entities. Our results show that a naive approach to prompt-based zero-shot multilingual NER is error-prone, but they also highlight the potential of such an approach for historical languages that lack labeled datasets. Moreover, we find that T0-like models can be probed to predict the publication date and language of a document, which could be highly relevant for the study of historical texts.
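As a rough illustration of the prompting setup summarized above, the following Python sketch frames NER, date prediction, and language identification as questions to a text-to-text model and parses the free-text answers. It is not the authors' code: the prompt templates and all helper names (`extract_entities`, `predict_year`, `predict_language`, `make_t0_generator`) are hypothetical, and the model wrapper assumes the Hugging Face `transformers` library with the public `bigscience/T0_3B` checkpoint.

```python
import re
from typing import Callable, List, Optional

# Hypothetical prompt templates: the abstract does not give the wording used in the paper.
NER_TEMPLATE = (
    "Input: {text}\n"
    "Question: Which {entity_type} names are mentioned in the input? "
    "Answer with a comma-separated list, or 'None'."
)
DATE_TEMPLATE = "{text}\nQuestion: In which year was this newspaper article published?"
LANGUAGE_TEMPLATE = "{text}\nQuestion: In which language is this text written?"


def parse_entity_list(answer: str) -> List[str]:
    """Split a comma-separated answer into unique candidate entities, in order."""
    answer = answer.strip()
    if not answer or answer.lower() == "none":
        return []
    names: List[str] = []
    for part in answer.split(","):
        name = part.strip().strip(".")  # drop whitespace and trailing full stops
        if name and name not in names:
            names.append(name)
    return names


def parse_year(answer: str) -> Optional[int]:
    """Return the first standalone four-digit number in the answer, if any."""
    match = re.search(r"\b(\d{4})\b", answer)
    return int(match.group(1)) if match else None


def extract_entities(
    text: str, entity_type: str, generate: Callable[[str], str]
) -> List[str]:
    """Prompt a text-to-text model for entities of one type and parse the reply."""
    prompt = NER_TEMPLATE.format(text=text, entity_type=entity_type)
    return parse_entity_list(generate(prompt))


def predict_year(text: str, generate: Callable[[str], str]) -> Optional[int]:
    """Probe the model for the publication year of a document."""
    return parse_year(generate(DATE_TEMPLATE.format(text=text)))


def predict_language(text: str, generate: Callable[[str], str]) -> str:
    """Probe the model for the language of a document."""
    return generate(LANGUAGE_TEMPLATE.format(text=text)).strip()


def make_t0_generator(
    checkpoint: str = "bigscience/T0_3B", max_new_tokens: int = 32
) -> Callable[[str], str]:
    """Wrap a T0 checkpoint from the Hugging Face Hub as a prompt -> answer function.

    Requires `transformers` and `torch`; the import is deferred so the
    parsing helpers above can be used without them.
    """
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(checkpoint)
    model = AutoModelForSeq2SeqLM.from_pretrained(checkpoint)

    def generate(prompt: str) -> str:
        inputs = tokenizer(prompt, return_tensors="pt", truncation=True)
        outputs = model.generate(**inputs, max_new_tokens=max_new_tokens)
        return tokenizer.decode(outputs[0], skip_special_tokens=True)

    return generate
```

A call such as `extract_entities(page_text, "person", make_t0_generator())` would return the model's candidate names. Keeping the model behind a plain callable makes it easy to swap in another T0-like checkpoint, or a stub when checking the parsing logic; the parsed output is only as reliable as the model's answers, which the abstract reports to be error-prone.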