mbart-neutralization

This model is a fine-tuned version of facebook/mbart-large-50 on the hackathon-pln-es/neutral-es dataset. It achieves the following results on the evaluation set:

  • Loss: 0.0138
  • Bleu: 98.4772
  • Gen Len: 18.5104

Model description

This model is a fine-tuned version of facebook/mbart-large-50, a multilingual sequence-to-sequence Transformer, adapted for the task of Spanish gender neutralization: transforming gender-marked Spanish sentences into gender-neutral reformulations that preserve meaning while reducing grammatical gender marking. The task can be framed as a monolingual translation problem (Spanish → neutral Spanish).

The model was trained using the Hugging Face Transformers library and follows a standard encoder–decoder architecture with transfer learning from the pretrained mBART model. The resulting system performs controlled rewriting rather than translation between languages, making it suitable for experiments in:

  • inclusive language generation
  • stylistic rewriting
  • bias reduction in text
  • controlled text transformation
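
A minimal inference sketch for the rewriting task described above. The Hub id `CarolGuga/mbart-neutralization` is taken from this card; the `es_XX` language code and the `forced_bos_token_id` call follow the usual mBART-50 convention and are assumptions, since the card does not spell out the decoding setup.

```python
# Sketch only: loads the fine-tuned model and rewrites one sentence.
# Assumes mBART-50 conventions (src_lang / forced BOS = "es_XX").
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

MODEL_ID = "CarolGuga/mbart-neutralization"

def neutralize(text: str) -> str:
    """Rewrite a gender-marked Spanish sentence into a neutral form."""
    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID, src_lang="es_XX")
    model = AutoModelForSeq2SeqLM.from_pretrained(MODEL_ID)
    inputs = tokenizer(text, return_tensors="pt")
    # Force Spanish as the target language, as mBART-50 expects.
    output_ids = model.generate(
        **inputs,
        forced_bos_token_id=tokenizer.convert_tokens_to_ids("es_XX"),
        max_length=64,
    )
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

For example, `neutralize("Los alumnos llegaron temprano")` should return a gender-neutral reformulation of the input sentence.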

Intended uses & limitations

This model is intended for:

  • Research experiments in NLP and inclusive language
  • Educational purposes in courses on Machine Translation or Text Generation
  • Demonstrations of transfer learning using multilingual seq2seq models
  • Automatic rewriting of short Spanish sentences into gender-neutral forms

Training and evaluation data

The model was trained on the Spanish Gender Neutralization dataset available on Hugging Face: hackathon-pln-es/neutral-es. The dataset contains pairs of aligned sentences:

  • gendered: original sentence with grammatical gender marking
  • neutral: reformulated gender-neutral version

The dataset already includes a predefined split:

  • Training set
  • Test set

The dataset is relatively small and designed mainly for educational and experimental purposes, not for large-scale production systems. Before training, the data was:

  • tokenized using the mBART tokenizer
  • truncated/padded to model limits
  • converted into input/label format for seq2seq training
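
The preprocessing steps above can be sketched as a batch-mapping function for `dataset.map(..., batched=True)`. The column names `gendered` and `neutral` come from the dataset; `max_length=128` is an assumption, since the card only says "truncated/padded to model limits":

```python
# Hedged sketch of the tokenization / input-label formatting described above.
def make_preprocess_fn(tokenizer, max_length=128):
    """Build a batch-mapping function for seq2seq training."""
    def preprocess(batch):
        # text_target tokenizes the neutral sentences as labels in one call.
        return tokenizer(
            batch["gendered"],
            text_target=batch["neutral"],
            max_length=max_length,
            truncation=True,
            padding="max_length",
        )
    return preprocess
```

With the mBART tokenizer this would be used as `dataset.map(make_preprocess_fn(tokenizer), batched=True)`.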

Evaluation was performed using the BLEU score (sacrebleu), a standard metric in machine translation.

Training procedure

The model was trained using the Hugging Face Trainer API for sequence-to-sequence learning. Training steps:

  1. The pretrained model facebook/mbart-large-50 was loaded.
  2. The dataset was tokenized using the corresponding mBART tokenizer.
  3. Inputs were formatted as:
    • source: gendered sentence
    • target: neutral sentence
  4. The model was fine-tuned using transfer learning.
  5. Training was performed on GPU in Google Colab.
  6. Evaluation during training used the sacrebleu metric.
  7. The final model was uploaded to the Hugging Face Hub.

The model therefore learns to perform monolingual rewriting through a multilingual translation architecture.

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5.6e-05
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 42
  • optimizer: AdamW (torch) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: linear
  • num_epochs: 2

Training results

| Training Loss | Epoch | Step | Validation Loss | Bleu    | Gen Len |
|:-------------:|:-----:|:----:|:---------------:|:-------:|:-------:|
| No log        | 1.0   | 440  | 0.0246          | 98.2861 | 18.5729 |
| 0.2226        | 2.0   | 880  | 0.0138          | 98.4772 | 18.5104 |

Framework versions

  • Transformers 4.51.2
  • PyTorch 2.10.0+cu128
  • Datasets 4.6.0
  • Tokenizers 0.21.4