A Comparative Study of MBart and Alternative Transformer Models for Kumauni Language Translation

Neelesh Kumar Tanwar; Atul Joshi; Ankur Singh Bist

doi:10.56557/ajomcor/2025/v32i39482

Published: 2025-07-10

Page: 147-161

Issue: 2025 - Volume 32 [Issue 3]

Neelesh Kumar Tanwar

IIT Kharagpur, India.

Atul Joshi *

Graphic Era Hill University, Bhimtal Campus, India.

Ankur Singh Bist

Graphic Era Hill University, Bhimtal Campus, India.

*Author to whom correspondence should be addressed.

Abstract

The archiving and computational treatment of low-resource languages pose daunting challenges for NLP. This research examines the application of recent multilingual transformer architectures to machine translation for Kumaoni, an Indo-Aryan language spoken in Northern India that has few digital resources. Because Kumaoni is closely related to Hindi, Hindi is used as a proxy language for model training and cross-lingual transfer, which is a major methodological consideration. The performance of MBart (Multilingual Denoising Pre-training for Neural Machine Translation) is compared against two other transformer models, MarianMT and mT5, using a custom parallel dataset of roughly [insert dataset size] sentence pairs. The evaluation metrics are BLEU, ROUGE-L, and TER. Results show that MBart outperforms both baselines in BLEU, with an absolute gain of 2.45 points over MarianMT and almost 4 points over mT5. Despite this BLEU advantage, its fluency and error rate are expected to improve further through additional experiments with larger datasets and further fine-tuning. These findings suggest that multilingual pre-training and cross-lingual transfer hold promise for low-resource translation, and the study introduces a replicable framework intended to advance NLP for other poorly resourced languages.

Keywords: Machine translation, MBart, alternative transformer models, Kumauni language translation

How to Cite

Tanwar, Neelesh Kumar, Atul Joshi, and Ankur Singh Bist. 2025. “A Comparative Study of MBart and Alternative Transformer Models for Kumauni Language Translation”. Asian Journal of Mathematics and Computer Research 32 (3):147-61. https://doi.org/10.56557/ajomcor/2025/v32i39482.
