Nesciun Lengaz Lascià Endò: Machine Translation for Fassa Ladin

MIT License

This repository contains code and data associated with the CLiC-it 2024 paper:

Giovanni Valer, Nicolò Penzo and Jacopo Staiano. 2024. Nesciun Lengaz Lascià Endò: Machine Translation for Fassa Ladin. In Proceedings of the Tenth Italian Conference on Computational Linguistics, Pisa, Italy.

Overview

Introduction

We built the first Fassa Ladin-Italian-English parallel corpus and trained a machine translation model on it.

🔥 English → Ladin demo

You can try translating text from English or Italian into Fassa Ladin with the model hosted on Hugging Face Spaces 🦀.

Data

The dataset draws on multiple resources spanning five domains: literature, news, games, laws, and brochures. It is available in the data directory, either as a single file or split into train, validation, in-domain test, and out-of-domain test sets.
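The released files are already split, so the sketch below only illustrates what the four splits mean. It assumes that the out-of-domain test set is a whole domain never seen during training, which this README does not state. The record layout (`lld`, `ita`, `eng`, `domain` keys), the choice of held-out domain, and the split fractions are hypothetical and not taken from the repository.

```python
import random

# Hypothetical record layout: one dict per aligned sentence triple, tagged with
# its source domain. The actual files in the data directory may use different
# column names and a different on-disk format.
DOMAINS = ("literature", "news", "games", "laws", "brochures")


def split_corpus(records, ood_domain, val_frac=0.1, test_frac=0.1, seed=0):
    """Hold out one whole domain as the out-of-domain test set (an assumption),
    then shuffle the remaining records into train / validation / in-domain test."""
    if ood_domain not in DOMAINS:
        raise ValueError(f"unknown domain: {ood_domain}")
    rng = random.Random(seed)  # fixed seed for a reproducible split
    in_domain = [r for r in records if r["domain"] != ood_domain]
    ood_test = [r for r in records if r["domain"] == ood_domain]
    rng.shuffle(in_domain)

    n_val = int(len(in_domain) * val_frac)
    n_test = int(len(in_domain) * test_frac)
    return {
        "validation": in_domain[:n_val],
        "test_in_domain": in_domain[n_val:n_val + n_test],
        "train": in_domain[n_val + n_test:],
        "test_out_of_domain": ood_test,
    }
```

Holding out an entire domain keeps the out-of-domain set free of any vocabulary or style the model saw during training, while the in-domain test set shares domains with the training data.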

Experiments

Preliminary Experiments

Open In Colab

Evaluate the performance of the pre-trained models.
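This README does not say which metrics the notebook reports. Purely as an illustration, the sketch below scores a placeholder `translate` function with a simplified character n-gram F-score in the spirit of chrF. The function names and the sentence-level averaging are hypothetical choices; for numbers comparable to published results, use an established implementation such as sacreBLEU rather than this simplified version.

```python
from collections import Counter


def char_ngrams(text, n):
    """Multiset of character n-grams, ignoring spaces."""
    s = text.replace(" ", "")
    return Counter(s[i:i + n] for i in range(len(s) - n + 1))


def char_fscore(hyp, ref, max_n=6, beta=2.0):
    """Simplified chrF-style score in [0, 100]: average character n-gram
    precision and recall over n = 1..max_n, then combine them with an
    F-beta score (beta = 2 weights recall more than precision)."""
    precisions, recalls = [], []
    for n in range(1, max_n + 1):
        h, r = char_ngrams(hyp, n), char_ngrams(ref, n)
        if not h or not r:
            continue  # skip orders longer than one of the strings
        match = sum((h & r).values())  # clipped n-gram matches
        precisions.append(match / sum(h.values()))
        recalls.append(match / sum(r.values()))
    if not precisions:
        return 0.0
    p = sum(precisions) / len(precisions)
    rc = sum(recalls) / len(recalls)
    if p + rc == 0:
        return 0.0
    b2 = beta ** 2
    return 100 * (1 + b2) * p * rc / (b2 * p + rc)


def evaluate(translate, sources, references):
    """Average sentence-level score of a translation function over a test set.
    `translate` is a placeholder for whatever model wrapper is being evaluated."""
    hyps = [translate(s) for s in sources]
    scores = [char_fscore(h, r) for h, r in zip(hyps, references)]
    return sum(scores) / len(scores)
```

Character-level metrics are often preferred for morphologically rich, low-resource languages because partial word matches still earn credit.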

Finetuning

Open In Colab

Fine-tune the pre-trained models on the Fassa Ladin-Italian-English parallel corpus using two approaches: multilingual translation and zero-shot pivot-based transfer learning.
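The README only names the two approaches. The sketch below shows one plausible way to build training examples for each, assuming that the multilingual setup trains on all six directions between Ladin, Italian, and English, and that the zero-shot setup fine-tunes on Italian-Ladin only and then evaluates English-Ladin without direct training data, with Italian as the pivot. Both readings, the language codes, and the example layout are assumptions; the fine-tuning notebook is the authoritative description.

```python
from itertools import permutations

# Hypothetical language codes for the three sides of each aligned triple.
LANGS = ("lld", "ita", "eng")

# Multilingual setup (assumed): train on every direction between the three languages.
MULTILINGUAL = list(permutations(LANGS, 2))

# Zero-shot setup (an assumed reading of the README): train only on
# Italian<->Ladin, so English<->Ladin is never seen during fine-tuning and is
# evaluated zero-shot, with Italian acting as the pivot.
ZERO_SHOT_PIVOT = [("ita", "lld"), ("lld", "ita")]


def make_examples(triples, directions):
    """Expand aligned (lld, ita, eng) triples into one source/target example
    per requested translation direction."""
    examples = []
    for triple in triples:
        row = dict(zip(LANGS, triple))
        for src, tgt in directions:
            examples.append({
                "src_lang": src,
                "tgt_lang": tgt,
                "source": row[src],
                "target": row[tgt],
            })
    return examples
```

Keeping the direction list separate from the expansion logic means both setups share the same data pipeline and differ only in which language pairs reach the model.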

Evaluation

Open In Colab

Evaluate the models' performance and investigate transfer learning across domains and forgetting of previously acquired knowledge.
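As a rough illustration of the two analyses, the sketch below compares per-direction scores before and after fine-tuning (forgetting) and compares a model's score on other domains with its score on its own training domain (cross-domain transfer). The function names, the "eng-ita" style direction keys, and the nested score layout are hypothetical; the scores can come from any translation metric, and this is not necessarily how the evaluation notebook computes either quantity.

```python
def forgetting(before, after):
    """Per-direction score drop between the pre-trained and the fine-tuned
    model; positive values mean the fine-tuned model got worse on that
    direction. Only directions scored for both models are compared."""
    return {d: before[d] - after[d] for d in sorted(before.keys() & after.keys())}


def domain_transfer(scores, train_domain):
    """Given scores[train_domain][test_domain], report each other domain's
    score relative to the score on the model's own training domain
    (negative values mean the model does worse outside its domain)."""
    own = scores[train_domain][train_domain]
    return {d: s - own for d, s in scores[train_domain].items() if d != train_domain}
```

Measuring forgetting on directions the pre-trained model already handled (for example English-Italian) shows whether fine-tuning on Ladin data degrades existing translation ability.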

Citation

If you use or build on top of this work, please cite our paper as follows:

@inproceedings{valer-etal-2024-nesciun,
    title={Nesciun Lengaz Lascià Endò: {M}achine Translation for {F}assa {L}adin},
    author={Valer, Giovanni and Penzo, Nicolò and Staiano, Jacopo},
    booktitle={Proceedings of the 10th Italian Conference on Computational Linguistics},
    publisher={CEUR-WS.org},
    year={2024},
    month={december},
    address={Pisa, Italy}
}