MT5-es-to-quy
This model was trained on synthetic data across several iterations of state-of-the-art Spanish to Quechua (Y) datasets.
Model Details
Model Description
The model achieves a test BLEU of 2.0391 and a test chrF of 22.6785, measured on a separate contest dataset. On our synthetic validation set, it achieves a BLEU of 11.0126 and a chrF of 31.8344.
- Developed by:
- Julio Santisteban Pablo
- Ricardo Lazo Vasquez
- Shared by: Julio Santisteban Pablo
- Model type: Translation
- Language(s) (NLP): Spanish to Quechua (Y)
- License: Apache 2.0
- Fine-tuned from model: MT5
Model Sources
- Repository: Julio/mt5-es-to-quy
Uses
The overall goal is to improve translation from Spanish to Quechua (Y) and its related applications.
Direct Use
Improve translations from Spanish to Quechua (Y).
Downstream Use
Build high-impact platforms that help Quechua speakers.
Out-of-Scope Use
- Generating malicious content.
- Serving as a 100% trustworthy platform.
Bias, Risks, and Limitations
The main limitation of this model is that it is a work in progress, so it is not recommended for critical scenarios.
Recommendations
It is important to evaluate the model on your own data before relying on it. If the end user considers additional training necessary, it must be done under the same conditions on the expected production data.
How to Get Started with the Model
The model can be loaded in the usual way, as an MT5-based model.
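A minimal sketch of what loading might look like with the Hugging Face `transformers` library. It assumes the checkpoint is published on the Hub under the repository id listed in Model Sources, that it loads through `AutoModelForSeq2SeqLM`, and that no task prefix is required. None of these are confirmed by this card. The helper names `load_model` and `translate` and the `max_new_tokens` value are illustrative.

```python
MODEL_ID = "Julio/mt5-es-to-quy"  # repository id from Model Sources (assumed to be the Hub id)


def load_model(model_id=MODEL_ID):
    """Load the tokenizer and model from the Hugging Face Hub (needs network access)."""
    # Imported lazily so the helpers below can be imported without transformers installed.
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForSeq2SeqLM.from_pretrained(model_id)
    return tokenizer, model


def translate(text, tokenizer, model, max_new_tokens=64):
    """Translate one Spanish sentence; max_new_tokens is an arbitrary cap, not a tuned value."""
    inputs = tokenizer(text, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

With the packages installed, `tokenizer, model = load_model()` followed by `translate("Buenos días", tokenizer, model)` would return the model's Quechua (Y) output as a string. Given the reported scores, the output should be reviewed before use.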
Training Details
Training Data
The model was trained using synthetic data from different sources. The overall dataset will be published in the future and these notes will be updated accordingly.
Training Procedure
The model was trained on the synthetic dataset and tested on a contest dataset. (In the future, we will give more details).
Preprocessing
Preprocessing was applied several times for each attempt at delivering a final model. (In the future, we will give more details).
Training Hyperparameters
(In the future, we will give more details).
Speeds, Sizes, Times
(In the future, we will give more details).
Evaluation
Testing Data, Factors & Metrics
Testing Data
(In the future, we will give more details).
Factors
The main end users are speakers of the Quechua (Y) dialect.
Metrics
We evaluate our model using the BLEU and chrF metrics.
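To illustrate what chrF measures, here is a simplified sentence-level sketch: character n-gram precision and recall are averaged over n-gram lengths and combined into an F-score that weights recall more heavily. This is not the scoring code used for the reported numbers, which this card does not specify. The settings (n-grams up to length 6, beta of 2, whitespace ignored) are assumed defaults, and the function names are illustrative. For comparable scores, an established tool such as sacreBLEU is the usual choice.

```python
from collections import Counter


def char_ngrams(text, n):
    """Count character n-grams, ignoring spaces (a common chrF convention)."""
    chars = text.replace(" ", "")
    return Counter(chars[i:i + n] for i in range(len(chars) - n + 1))


def simple_chrf(hypothesis, reference, max_n=6, beta=2.0):
    """Simplified chrF on a 0-100 scale for one hypothesis/reference pair."""
    precisions, recalls = [], []
    for n in range(1, max_n + 1):
        hyp = char_ngrams(hypothesis, n)
        ref = char_ngrams(reference, n)
        if not hyp or not ref:
            continue  # one of the strings is shorter than n characters
        overlap = sum((hyp & ref).values())  # clipped n-gram matches
        precisions.append(overlap / sum(hyp.values()))
        recalls.append(overlap / sum(ref.values()))
    if not precisions:
        return 0.0
    p = sum(precisions) / len(precisions)
    r = sum(recalls) / len(recalls)
    if p + r == 0:
        return 0.0
    b2 = beta ** 2
    return 100 * (1 + b2) * p * r / (b2 * p + r)
```

Because it works on characters rather than whole words, chrF gives partial credit for near-miss word forms, which is one reason it is often reported alongside BLEU for morphologically rich languages.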
Results
The model achieves a test BLEU of 2.0391 and a test chrF of 22.6785, measured on a separate contest dataset. On our synthetic validation set, it achieves a BLEU of 11.0126 and a chrF of 31.8344.
Summary
This work was carried out by several researchers from universities in Peru.
(In the future, we will give more details).
Model Examination
(In the future, we will give more details).
Environmental Impact
Carbon emissions can be estimated using the Machine Learning Impact calculator presented in Lacoste et al. (2019).
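Such an estimate is, roughly, the energy consumed during training multiplied by the carbon intensity of the electricity grid. The sketch below shows only that arithmetic. It is not the calculator's implementation, the function and parameter names are hypothetical, and the example inputs are placeholders, since this card does not yet report hardware, runtime, or region.

```python
def estimate_emissions_kg(power_watts, hours, carbon_intensity_kg_per_kwh):
    """Rough CO2-equivalent estimate: energy used (kWh) times grid carbon intensity."""
    energy_kwh = power_watts / 1000 * hours
    return energy_kwh * carbon_intensity_kg_per_kwh


# Placeholder inputs for illustration only (NOT measurements for this model).
example_kg = estimate_emissions_kg(
    power_watts=250, hours=10, carbon_intensity_kg_per_kwh=0.4
)
```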
Technical Specifications
Model Architecture and Objective
(In the future, we will give more details).
Compute Infrastructure
(In the future, we will give more details).
Hardware
(In the future, we will give more details).
Software
The model was trained using PyTorch.
Citation
(In the future, we will give more details).
More Information
Please feel free to email the main researcher, Julio Santisteban, at:
jsantisteban@ucsp.edu.pe
Model Card Authors
Ricardo Lazo Vasquez
Model Card Contact
Please email us at jsantisteban@ucsp.edu.pe, with a copy (CC) to ricardo.lazo@ucsp.edu.pe.