---
tags:
- text2text-generation
- transformers
- english
- bart
- sign-language
library_name: transformers
language:
- en
metrics:
- bertscore
- bleu
- rouge
base_model:
- facebook/bart-base
---

# Model Card for rrrr66254/bart-gloss-finetuned

This model is a fine-tuned version of `facebook/bart-base`, trained to convert American Sign Language (ASL) gloss sequences into fluent English sentences. It is designed to support research, education, and accessibility applications involving gloss-based ASL interpretation.

The model was trained on high-quality aligned pairs of gloss annotations and English translations, and evaluated with BERTScore, BLEU, and ROUGE.

## Model Details

### Model Description

A BART-based sequence-to-sequence model that maps ASL gloss input to natural English output.

- **Developed by:** Dongjun Kim
- **Model type:** Text2Text Generation (Gloss2Eng)
- **Language(s) (NLP):** English
- **Finetuned from model:** `facebook/bart-base`

## Intended Uses

This model is fine-tuned to translate American Sign Language (ASL) gloss input sequences into natural, grammatically correct English sentences. It can be used for:

- Building real-time sign language interpretation systems
- Research in sign language understanding and low-resource language translation
- Educational tools that show ASL learners the gloss-to-English transformation
- Data augmentation for multimodal ASL translation tasks

## Out-of-Scope Uses

The model is **not** suitable for:

- Translating directly from ASL videos or images (no visual input is processed)
- Formal legal or medical translation without human validation
- General-purpose translation outside the ASL gloss context
- Languages other than English

## How to Use

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("rrrr66254/bart-gloss-finetuned")
model = AutoModelForSeq2SeqLM.from_pretrained("rrrr66254/bart-gloss-finetuned")

gloss_input = "YOU GO STORE TOMORROW?"
inputs = tokenizer(gloss_input, return_tensors="pt")
output = model.generate(**inputs)
print(tokenizer.decode(output[0], skip_special_tokens=True))
# Example output: "Are you going to the store tomorrow?"
```
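
Decoding parameters are not specified in this card; if output quality needs tuning, beam-search settings such as the following (illustrative values, not taken from the training run) can be passed to `generate`:

```python
# Hypothetical generation settings -- tune for your use case.
gen_kwargs = {
    "num_beams": 4,              # beam search width; greedy decoding is num_beams=1
    "max_new_tokens": 64,        # cap output length
    "no_repeat_ngram_size": 3,   # discourage repeated phrases
    "early_stopping": True,      # stop beams once all are finished
}

# Then: output = model.generate(**inputs, **gen_kwargs)
```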

## Bias, Risks, and Limitations

This model is trained on American Sign Language (ASL) glosses mapped to natural English sentences. As such, it may inherit several limitations:

- **Data bias**: If the training data overrepresents certain sentence structures, cultural expressions, or gloss forms, the model may produce outputs that lack variety or inclusivity.
- **Limited linguistic scope**: The model only accepts **ASL gloss** as input and produces **English** as output. It does not cover other sign languages or spoken/written languages.
- **Context loss**: ASL gloss does not encode facial expressions, spatial grammar, or non-manual signals, which are essential in ASL. The model may misrepresent meaning as a result.
- **Generalization risk**: The model may not generalize well to gloss styles or sentence structures it was not trained on.

Outputs should not be used in **critical settings** (e.g., legal, medical, or emergency interpreting) without human review.

### Recommendations

- Human-in-the-loop: Always have a fluent signer or linguist verify model outputs in any production or educational setting.
- Data expansion: Consider fine-tuning with more diverse gloss datasets that include different dialects or informal structures.
- Downstream use: If used as part of a larger translation or accessibility pipeline, include disclaimers about potential misinterpretation due to the lack of non-manual signals.

## Training Details

### Training Data

The model was fine-tuned on a custom dataset of one-to-one pairs of ASL gloss and fluent English sentences.
The glosses are structured representations of ASL without punctuation, articles, or verb conjugation. Each gloss sentence is paired with an English sentence that captures its intended meaning.
The dataset was cleaned with custom filters to remove non-English outputs, duplicates, and ill-formed pairs.
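
The exact filters are not published; a minimal sketch of this kind of cleaning, assuming simple heuristics (empty-side, non-ASCII, duplicate, and length-ratio checks are all hypothetical), might look like:

```python
def clean_pairs(pairs):
    """Filter (gloss, english) pairs: drop ill-formed, non-English-looking,
    and duplicate entries. Hypothetical heuristics, not the actual filters."""
    seen = set()
    kept = []
    for gloss, english in pairs:
        gloss, english = gloss.strip(), english.strip()
        if not gloss or not english:
            continue  # ill-formed: one side is empty
        if not english.isascii():
            continue  # crude stand-in for a non-English filter
        if (gloss, english) in seen:
            continue  # exact duplicate
        if len(english.split()) / max(len(gloss.split()), 1) > 4:
            continue  # suspicious gloss/translation length mismatch
        seen.add((gloss, english))
        kept.append((gloss, english))
    return kept
```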

### Training Procedure

Training used the Hugging Face `Trainer` API with a sequence-to-sequence objective, fine-tuning the BART architecture (`facebook/bart-base`) to map gloss input to fluent English sentences.
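
As a sketch, the reported settings map onto `Seq2SeqTrainingArguments` roughly as follows (the actual training script is not published; argument names follow the `transformers` API, where `evaluation_strategy` has been renamed `eval_strategy` in newer releases):

```python
from transformers import Seq2SeqTrainingArguments

# Sketch only -- reconstructs the reported hyperparameters, not the real script.
training_args = Seq2SeqTrainingArguments(
    output_dir="bart-gloss-finetuned",
    num_train_epochs=3,
    learning_rate=5e-5,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    weight_decay=0.01,
    fp16=True,                      # mixed precision
    evaluation_strategy="epoch",
    save_strategy="epoch",
    save_total_limit=2,
    logging_steps=50,
    load_best_model_at_end=True,    # keep the best checkpoint for early stopping
    predict_with_generate=True,     # needed to compute generation metrics
)
```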

#### Preprocessing

- Input text was trimmed and normalized
- Tokenizer: pretrained BART tokenizer
- Special tokens: `[INST]` and `[/INST]` were used to delimit the gloss input and the output reference
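
Given those special tokens, the input template presumably looked something like the following (a hypothetical reconstruction; the exact template and normalization steps, including uppercasing, are assumptions):

```python
def format_gloss(gloss: str) -> str:
    """Trim, normalize, and wrap a gloss sequence in the [INST] delimiters.
    Hypothetical reconstruction of the preprocessing template."""
    return f"[INST] {gloss.strip().upper()} [/INST]"
```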

#### Training Hyperparameters

- **Base model**: `facebook/bart-base`
- **Epochs**: 3
- **Learning rate**: 5e-5
- **Batch size**: 4 per device (both train and eval)
- **Gradient accumulation**: not used
- **Weight decay**: 0.01
- **Learning rate scheduler**: linear (`Trainer` default)
- **Precision**: mixed precision (`fp16=True`)
- **Evaluation strategy**: per epoch
- **Save strategy**: per epoch (with `save_total_limit=2`)
- **Logging frequency**: every 50 steps
- **Early stopping**: custom callback based on BERTScore with patience = 2
- **Evaluation metric**: BERTScore (F1), computed with `microsoft/deberta-xlarge-mnli`
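
The early-stopping callback is custom and not published; its patience logic can be sketched in plain Python as (class and method names are hypothetical):

```python
class BertScoreEarlyStopper:
    """Track validation BERTScore-F1 per epoch and signal a stop after
    `patience` consecutive epochs without improvement.
    A sketch of the callback's logic, not the actual implementation."""

    def __init__(self, patience: int = 2):
        self.patience = patience
        self.best = float("-inf")
        self.bad_epochs = 0

    def update(self, score: float) -> bool:
        """Record this epoch's score; return True when training should stop."""
        if score > self.best:
            self.best = score
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience
```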

## Evaluation

#### Factors

This model does not explicitly disaggregate results by demographic group, signer identity, or domain. However, the training data may implicitly reflect distributional biases present in publicly available gloss datasets.

#### Metrics

- **Primary metrics**: BERTScore (F1), BLEU, and ROUGE
- **Model selection**: best checkpoint by highest validation BERTScore-F1
- BERTScore evaluates semantic alignment, while BLEU and ROUGE provide additional insight into surface-level n-gram overlap. All metrics were computed on the same held-out set of 500 gloss-reference pairs.
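
As a toy illustration of the surface-overlap idea behind BLEU (not the evaluation script used here), the BLEU-1 building block is clipped unigram precision:

```python
from collections import Counter

def bleu1_precision(candidate, reference):
    """Clipped unigram precision, the BLEU-1 building block.
    `candidate` and `reference` are lists of tokens; each candidate token
    counts at most as often as it appears in the reference."""
    cand_counts = Counter(candidate)
    ref_counts = Counter(reference)
    clipped = sum(min(n, ref_counts[tok]) for tok, n in cand_counts.items())
    return clipped / max(len(candidate), 1)

# "you", "store", and "tomorrow" match; "go" does not -> 3/4 = 0.75
bleu1_precision("you go store tomorrow".split(),
                "are you going to the store tomorrow".split())
```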

### Results

After 2 epochs of training (the best checkpoint under early stopping, out of 3 scheduled epochs), the model achieved the following on the 500-pair evaluation set:

| Metric           | Score  |
| ---------------- | ------ |
| **BERTScore-F1** | 0.7191 |
| **BERTScore-P**  | 0.7399 |
| **BERTScore-R**  | 0.6983 |
| **BLEU-1**       | 0.7063 |
| **BLEU-2**       | 0.6175 |
| **BLEU-3**       | 0.5479 |
| **BLEU-4**       | 0.4821 |
| **ROUGE-1**      | 0.7587 |
| **ROUGE-2**      | 0.5874 |
| **ROUGE-L**      | 0.7312 |

Qualitative inspection shows that most model outputs are fluent and contextually accurate. Common errors include omission of function words and minor verb tense mismatches.

#### Summary

This model demonstrates strong potential for gloss-to-English translation, with near-human fluency in many cases. However, further work is needed to improve generalization to informal gloss styles and integrate non-manual features.

## Model Card Authors

- Dongjun Kim

## Model Card Contact

- rrrr66254@gmail.com