Morphological Parser for Inflectional Languages via Deep Learning

The research project Morphological Parser for Inflectional Languages Using Deep Learning started as an eScience Center Open Small Scale Initiative (SSI) project. It focuses on the question: how can we build a machine-learning-based parser for the morphology of inflectional languages?

This question is often neglected because of the dominance of English and other Indo-European analytic languages in computational linguistics.

The project employs the morphological encoding developed at the Eep Talstra Centre for Bible and Computer (ETCBC). This encoding yields a concisely structured string that contains the letters of the Hebrew phrase in transliteration, with the morphological encodings added in-line rather than as flags or footnotes. Linguistic encoding can therefore be conceptualized as the transformation of one string into another, which makes sequence-to-sequence (seq2seq) models well suited to the morphological parsing of inflectional languages.
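To make the string-to-string framing concrete, the following minimal sketch shows how pairs of plain and encoded strings might be prepared as character-level input for a seq2seq model. It is an illustration only, not the project's actual pipeline: the example words, the in-line markers (`-`, `[`, `/`), the special tokens and all function names are hypothetical and are not claimed to follow the ETCBC conventions.

```python
from typing import Dict, List

# Special tokens commonly used in seq2seq setups (names are assumptions).
PAD, SOS, EOS = "<pad>", "<sos>", "<eos>"


def build_vocab(strings: List[str]) -> Dict[str, int]:
    """Map every character seen in the data, plus special tokens, to an id."""
    symbols = sorted({ch for s in strings for ch in s})
    return {tok: i for i, tok in enumerate([PAD, SOS, EOS] + symbols)}


def encode(s: str, vocab: Dict[str, int]) -> List[int]:
    """Turn a string into a list of ids framed by start and end tokens."""
    return [vocab[SOS]] + [vocab[ch] for ch in s] + [vocab[EOS]]


def decode(ids: List[int], vocab: Dict[str, int]) -> str:
    """Turn a list of ids back into a string, dropping special tokens."""
    inverse = {i: tok for tok, i in vocab.items()}
    return "".join(inverse[i] for i in ids if inverse[i] not in (PAD, SOS, EOS))


# Hypothetical toy pairs: (transliterated text, text with in-line encoding).
pairs = [("WJQM", "W-JQM["), ("MLK", "MLK/")]

# Source and target share one character vocabulary in this sketch.
vocab = build_vocab([s for pair in pairs for s in pair])
dataset = [(encode(src, vocab), encode(tgt, vocab)) for src, tgt in pairs]
```

A model trained on such pairs would learn to emit the target id sequence given the source id sequence; `decode` then turns a predicted sequence back into an encoded string that a human corrector can review.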

After the initial project funded by the Netherlands eScience Center, we continued our efforts to improve the predictions for the morphological encoding generated by the AI models. Our current focus is on Syriac texts. Thanks to external support, the AI-generated encodings are manually revised by Gegham Bdoyan, Matthias Benabdellah and Logan Copley, using Qoroyo, a tool developed by Yusuf Çelik.

More about this Research Project

Start/end Date

Ongoing since 1 May 2024

Team

Leader: Willem Th. van Peursen
Development team: Yusuf Çelik, Mathias Coeckelbergs (2021–2022), Martijn Naaijer and Constantijn Sikkel; eScience Center support (2021–2022): Jisk Attema and Dafne van Kuppevelt; correction of AI-generated encodings (from 2023): Gegham Bdoyan, Matthias Benabdellah and Logan Copley.

Fund

Funding provided by the eScience Center, the Peshitta Foundation, Brill Publisher and the Charis Foundation

Websites

GitHub - ETCBC/ssi_morphology: Contains stuff for the Hebrew/Syriac morphology project of the Escience center/ETCBC

blogpost 1

blogpost 2

Publication: A Transformer-based parser for Syriac morphology - ACL Anthology
