This question is often neglected because of the dominance of English and other Indo-European analytic languages in computational linguistics.
The project employs the morphological encoding developed at the Eep Talstra Centre for Bible and Computer (ETCBC). The encoding yields a compact, structured string that contains the letters of the Hebrew phrase in transliteration, with the morphological codes added in-line rather than as flags or footnotes. Linguistic encoding can therefore be treated as the transformation of one string into another, which makes sequence-to-sequence (seq2seq) models well suited to the morphological parsing of inflectional languages.
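The string-to-string framing can be illustrated with a minimal Python sketch. It is not the project's actual pipeline: the marker characters, the example word pairs, and the helper names (`strip_markers`, `to_tokens`, `build_examples`) are all hypothetical, and it assumes for simplicity that encoding only inserts marker characters into the transliterated text.

```python
# Hypothetical marker characters; the real ETCBC inventory may differ.
MARKERS = set("-/[!")


def strip_markers(encoded: str) -> str:
    """Recover the plain transliteration by deleting in-line markers."""
    return "".join(ch for ch in encoded if ch not in MARKERS)


def to_tokens(text: str) -> list[str]:
    """Split a string into character-level tokens with start/end symbols."""
    return ["<s>"] + list(text) + ["</s>"]


# Invented pairs for illustration: plain input and its encoded counterpart.
PAIRS = [("WKTB", "W-KTB["), ("MLK", "MLK/")]


def build_examples(pairs):
    """Turn (source, target) strings into token-sequence training pairs."""
    examples = []
    for source, target in pairs:
        # Sanity check: the target must be the source plus in-line markers only.
        if strip_markers(target) != source:
            raise ValueError(f"target {target!r} does not match source {source!r}")
        examples.append((to_tokens(source), to_tokens(target)))
    return examples


if __name__ == "__main__":
    for src, tgt in build_examples(PAIRS):
        print(" ".join(src), "->", " ".join(tgt))
```

Because the codes sit inside the string itself, each training example is simply a pair of character sequences, which is the input/output shape a seq2seq model expects.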
After the initial project, funded by the Netherlands eScience Center, we continued our efforts to improve the morphological encodings predicted by the AI models. Our current focus is on Syriac texts. Thanks to external support, the AI-generated encodings are manually revised by Gegham Bdoyan, Matthias Benabdellah and Logan Copley, using Qoroyo, a tool developed by Yusuf Çelik.
More about this Research Project
Start/end Date
Ongoing since 1 May 2024
Team
Leader: Willem Th. van Peursen
Development team: Yusuf Çelik, Mathias Coeckelbergs (2021–2022), Martijn Naaijer and Constantijn Sikkel; eScience Center support (2021–2022): Jisk Attema and Dafne van Kuppevelt; correction of AI-generated encodings (from 2023): Gegham Bdoyan, Matthias Benabdellah and Logan Copley.
Fund
Funding provided by the eScience Center, the Peshitta Foundation, Brill Publisher and the Charis Foundation
Websites
Publication: A Transformer-based parser for Syriac morphology - ACL Anthology