## Introduction
LLMTrad-IBE is a strategic research initiative dedicated to overcoming the digital divide affecting the minority Romance languages of the Iberian Peninsula. By leveraging state-of-the-art Natural Language Processing (NLP), we aim to ensure these languages are not left behind in the era of Artificial Intelligence.
This project is a key component of the AI-TraLow coordinated framework (AI-Driven Translation for Low-Resource Languages and Cultures), supported by the Spanish Ministry of Science, Innovation, and Universities (MCIU/AEI/10.13039/501100011033/FEDER, UE) under reference PID2024-158157OB-C33.
## Mission and Scope
Our research focuses on the development, adaptation, and evaluation of Large Language Models (LLMs) for four specific linguistic varieties characterized by limited digital resources:
* Asturian
* Aragonese
* Aranese
* Eonavian
## Strategic Research Areas
We employ a hybrid methodology that integrates the structural precision of symbolic systems with the generative power of neural architectures:
* LLM Specialization: Fine-tuning decoder-only architectures and exploring parameter-efficient fine-tuning (PEFT) strategies for translation.
* Knowledge Distillation: Developing compact and efficient models to facilitate sustainable deployment in standard computing environments.
* Resource Synthesis: Expanding Apertium-based lexical resources and curating high-quality benchmarks, including FLORES+ and NTREX adaptations.
* Ethical AI: Implementing rigorous evaluation frameworks to detect and mitigate gender bias and ensure linguistic authenticity.
## Collaborative Network
LLMTrad-IBE is a collaboration among four academic institutions:
* Universitat Oberta de Catalunya (UOC) — Coordinating Institution
* Universitat Autònoma de Barcelona (UAB)
* Universidad de Oviedo
* Universidad de Zaragoza
## Commitment to Open Science
As part of our commitment to the scientific community and linguistic heritage, all models, datasets, and tools developed within this project are released under permissive open-source licenses.