---
tags:
- text2text-generation
- transformers
- english
- bart
- sign-language
library_name: transformers
language:
- en
metrics:
- bertscore
- bleu
- rouge
base_model:
- facebook/bart-base
---
# Model Card for Glossa-BART
This model is a fine-tuned version of `facebook/bart-base`, trained to convert American Sign Language (ASL) gloss sequences into fluent English sentences. It is designed to assist in research, education, and accessibility applications involving gloss-based ASL interpretation.
The model was trained on high-quality aligned pairs of gloss annotations and English translations, and evaluated with BERTScore, BLEU, and ROUGE.
## Model Details
### Model Description
- **Developed by:** Dongjun Kim
- **Model type:** Text2Text Generation, Gloss2Eng
- **Language(s) (NLP):** English
## Intended Uses
This model is fine-tuned for translating American Sign Language (ASL) gloss input sequences into natural, grammatically correct English sentences. It can be used for:
- Building real-time sign language interpretation systems
- Research in sign language understanding and low-resource language translation
- Educational tools for ASL learners to see gloss-to-English transformation
- Data augmentation for multimodal ASL translation tasks
## Out-of-Scope Uses
The model is **not** suitable for:
- Translating from ASL videos or images directly (no visual input is processed)
- Formal legal or medical translation without human validation
- General-purpose translation outside ASL gloss context
- Languages other than English
## How to Use
```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
tokenizer = AutoTokenizer.from_pretrained("rrrr66254/bart-gloss-finetuned")
model = AutoModelForSeq2SeqLM.from_pretrained("rrrr66254/bart-gloss-finetuned")
gloss_input = "YOU GO STORE TOMORROW?"
inputs = tokenizer(gloss_input, return_tensors="pt")
output = model.generate(**inputs)
print(tokenizer.decode(output[0], skip_special_tokens=True))
# Expected output is "Are you going to the store tomorrow?"
```
## Bias, Risks, and Limitations
This model is trained on American Sign Language (ASL) glosses mapped to natural English sentences. As such, it may inherit several limitations:
- **Data bias**: If the training data overrepresents certain sentence structures, cultural expressions, or gloss forms, the model may produce outputs that lack variety or inclusivity.
- **Limited linguistic scope**: The model only understands **ASL gloss** as input and **English** as output. It does not cover other sign languages or spoken/written languages.
- **Context loss**: ASL gloss does not encode facial expressions, spatial grammar, or non-manual signals, which are essential in ASL. The model may misrepresent meaning as a result.
- **Generalization risk**: The model may not generalize well to gloss styles or sentence structures it wasn’t trained on.
Outputs should not be used in **critical settings** (e.g., legal, medical, or emergency interpreting) without human review.
### Recommendations
- Human-in-the-loop: Always have a fluent signer or linguist verify model outputs in any production or educational setting.
- Data expansion: Consider fine-tuning with more diverse gloss datasets that include different dialects or informal structures.
- Downstream use: If used as part of a larger translation or accessibility pipeline, include disclaimers about potential misinterpretation due to a lack of non-manual signals.
## Training Details
### Training Data
The model was fine-tuned on a custom dataset of 1:1 pairs of ASL gloss and fluent English sentences.
The glosses are structured representations of ASL without punctuation, articles, or verb conjugation. Each gloss sentence is paired with a corresponding English sentence that captures its intended meaning.
The dataset was cleaned to remove non-English outputs, duplicates, and ill-formed pairs using custom filters.
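The custom filters themselves are not published; a minimal sketch of the kind of cleaning described above (the heuristics and helper names here are assumptions, not the original pipeline):

```python
def is_clean_pair(gloss: str, english: str) -> bool:
    """Heuristic filter for one gloss/English pair (illustrative only)."""
    # Drop empty or ill-formed entries
    if not gloss.strip() or not english.strip():
        return False
    # Drop outputs with non-ASCII characters (crude proxy for non-English text)
    if not english.isascii():
        return False
    # Drop pairs with extreme length mismatch (likely misaligned)
    if len(english.split()) > 4 * max(len(gloss.split()), 1):
        return False
    return True

def dedupe_pairs(pairs):
    """Remove exact duplicate pairs while preserving order."""
    seen, out = set(), []
    for p in pairs:
        if p not in seen:
            seen.add(p)
            out.append(p)
    return out

pairs = [
    ("YOU GO STORE TOMORROW?", "Are you going to the store tomorrow?"),
    ("YOU GO STORE TOMORROW?", "Are you going to the store tomorrow?"),  # duplicate
    ("ME HUNGRY", ""),  # ill-formed
]
clean = [p for p in dedupe_pairs(pairs) if is_clean_pair(*p)]
```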
### Training Procedure
The training used the Hugging Face `Trainer` API with a sequence-to-sequence objective.
The training leveraged a BART-based architecture (facebook/bart-base) to learn a mapping from gloss to fluent English sentences.
#### Preprocessing
- Input text was trimmed and normalized
- Tokenizer: Pretrained BART tokenizer
- Special tokens: `[INST]` and `[/INST]` were used to delimit gloss input and output reference
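The exact prompt template is not shown in this card; one plausible formatting step, assuming the `[INST]`/`[/INST]` delimiters wrap the gloss input and the English reference follows during training:

```python
from typing import Optional

def format_example(gloss: str, reference: Optional[str] = None) -> str:
    """Wrap a gloss sequence in the [INST]...[/INST] delimiters described above.
    The exact template used in training is an assumption."""
    text = f"[INST] {gloss.strip()} [/INST]"
    if reference is not None:
        text += f" {reference.strip()}"
    return text

formatted = format_example("YOU GO STORE TOMORROW?")
```

Since `[INST]` and `[/INST]` are not in the base BART vocabulary, they would presumably be registered via `tokenizer.add_special_tokens({"additional_special_tokens": ["[INST]", "[/INST]"]})` followed by `model.resize_token_embeddings(len(tokenizer))`.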
#### Training Hyperparameters
- **Base model**: `facebook/bart-base`
- **Epochs**: 3
- **Learning rate**: 5e-5
- **Batch size**: 4 per device (both train and eval)
- **Gradient accumulation**: Not used
- **Weight decay**: 0.01
- **Learning rate scheduler**: Linear (default in Trainer)
- **Precision**: Mixed precision (fp16=True)
- **Evaluation strategy**: Per epoch
- **Save strategy**: Per epoch (with `save_total_limit=2`)
- **Logging frequency**: Every 50 steps
- **Early stopping**: Custom callback based on BERTScore with patience = 2
- **Evaluation metric**: BERTScore (F1), computed with `microsoft/deberta-xlarge-mnli`
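The hyperparameters above map onto a `Seq2SeqTrainingArguments` configuration roughly as follows. This is a sketch: the output path, the metric key, and the wiring of the custom BERTScore early-stopping callback are assumptions.

```python
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="bart-gloss-finetuned",     # assumed path
    num_train_epochs=3,
    learning_rate=5e-5,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    weight_decay=0.01,
    lr_scheduler_type="linear",            # Trainer default
    fp16=True,
    eval_strategy="epoch",
    save_strategy="epoch",
    save_total_limit=2,
    logging_steps=50,
    load_best_model_at_end=True,           # pairs with the BERTScore early-stopping callback
    metric_for_best_model="bertscore_f1",  # assumed metric key
    greater_is_better=True,
)
```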
#### Factors
This model does not explicitly disaggregate results by demographic group, signer identity, or domain. However, the training data may implicitly reflect distributional biases present in publicly available gloss datasets.
#### Metrics
- **Metrics**: BERTScore (F1, primary), plus BLEU and ROUGE
- **Model selection**: Best checkpoint based on highest validation BERTScore-F1
- BERTScore is used to evaluate semantic alignment, while BLEU and ROUGE provide additional insight into surface-level n-gram overlap. All metrics were evaluated using the same held-out set of 500 gloss-reference pairs.
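For intuition on the surface-level metrics, BLEU-n is built on modified n-gram precision: candidate n-gram counts are clipped by their counts in the reference. A minimal illustration (not the evaluation script used here, which relied on standard metric implementations):

```python
from collections import Counter

def ngram_precision(candidate: str, reference: str, n: int) -> float:
    """Modified n-gram precision: candidate n-gram counts clipped by reference counts."""
    cand = candidate.lower().split()
    ref = reference.lower().split()
    cand_ngrams = Counter(tuple(cand[i:i + n]) for i in range(len(cand) - n + 1))
    ref_ngrams = Counter(tuple(ref[i:i + n]) for i in range(len(ref) - n + 1))
    if not cand_ngrams:
        return 0.0
    clipped = sum(min(c, ref_ngrams[g]) for g, c in cand_ngrams.items())
    return clipped / sum(cand_ngrams.values())

p1 = ngram_precision("are you going to the store tomorrow",
                     "are you going to the store tomorrow ?", 1)
```

Full BLEU additionally combines precisions over n = 1..4 with a brevity penalty; the table below reports each order separately.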
### Results
Early stopping selected the checkpoint after 2 of the 3 configured epochs, which achieved the following on the 500-pair evaluation set:
| Metric | Score |
| ---------------- | ------ |
| **BERTScore-F1** | 0.7191 |
| **BERTScore-P** | 0.7399 |
| **BERTScore-R** | 0.6983 |
| **BLEU-1** | 0.7063 |
| **BLEU-2** | 0.6175 |
| **BLEU-3** | 0.5479 |
| **BLEU-4** | 0.4821 |
| **ROUGE-1** | 0.7587 |
| **ROUGE-2** | 0.5874 |
| **ROUGE-L** | 0.7312 |
- Qualitative inspection shows that most model outputs are fluent and contextually accurate. Common errors include omission of function words and minor verb tense mismatches.
#### Summary
This model demonstrates strong potential for gloss-to-English translation, with near-human fluency in many cases. However, further work is needed to improve generalization to informal gloss styles and integrate non-manual features.
## Model Card Authors
- Dongjun Kim
## Model Card Contact
- rrrr66254@gmail.com