---
tags:
- text2text-generation
- transformers
- english
- bart
- sign-language
library_name: transformers
language:
- en
metrics:
- bertscore
- bleu
- rouge
base_model:
- facebook/bart-base
---

# Model Card for bart-gloss-finetuned

This model is a fine-tuned version of `facebook/bart-base`, trained to convert American Sign Language (ASL) gloss sequences into fluent English sentences. It is designed to assist in research, education, and accessibility applications involving gloss-based ASL interpretation. The model was trained on high-quality aligned pairs of gloss annotations and English translations, and evaluated with BERTScore, BLEU, and ROUGE.

## Model Details

### Model Description

- **Developed by:** Dongjun Kim
- **Model type:** Text2Text generation (ASL gloss to English)
- **Language(s) (NLP):** English
- **Finetuned from:** `facebook/bart-base`

## Intended Uses

This model is fine-tuned for translating American Sign Language (ASL) gloss input sequences into natural, grammatically correct English sentences. It can be used for:

- Building real-time sign language interpretation systems
- Research in sign language understanding and low-resource language translation
- Educational tools for ASL learners to see gloss-to-English transformation
- Data augmentation for multimodal ASL translation tasks

## Out-of-Scope Uses

The model is **not** suitable for:

- Translating from ASL videos or images directly (no visual input is processed)
- Formal legal or medical translation without human validation
- General-purpose translation outside the ASL gloss context
- Languages other than English

## How to Use

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("rrrr66254/bart-gloss-finetuned")
model = AutoModelForSeq2SeqLM.from_pretrained("rrrr66254/bart-gloss-finetuned")

# Tokenize the ASL gloss input and generate the English translation
gloss_input = "YOU GO STORE TOMORROW?"
inputs = tokenizer(gloss_input, return_tensors="pt")
output = model.generate(**inputs)

print(tokenizer.decode(output[0], skip_special_tokens=True))
# Expected output: "Are you going to the store tomorrow?"
```

## Bias, Risks, and Limitations

This model is trained on American Sign Language (ASL) glosses mapped to natural English sentences. As such, it may inherit several limitations:

- **Data bias**: If the training data overrepresents certain sentence structures, cultural expressions, or gloss forms, the model may produce outputs that lack variety or inclusivity.
- **Limited linguistic scope**: The model only accepts **ASL gloss** as input and produces **English** as output. It does not cover other sign languages or spoken/written languages.
- **Context loss**: ASL gloss does not encode facial expressions, spatial grammar, or non-manual signals, which are essential in ASL. The model may misrepresent meaning as a result.
- **Generalization risk**: The model may not generalize well to gloss styles or sentence structures it wasn't trained on. Outputs should not be used in **critical settings** (e.g., legal, medical, or emergency interpreting) without human review.

### Recommendations

- **Human-in-the-loop**: Always have a fluent signer or linguist verify model outputs in any production or educational setting.
- **Data expansion**: Consider fine-tuning with more diverse gloss datasets that include different dialects or informal structures.
- **Downstream use**: If used as part of a larger translation or accessibility pipeline, include disclaimers about potential misinterpretation due to the lack of non-manual signals.

## Training Details

### Training Data

The model was fine-tuned on a custom dataset of one-to-one pairs of ASL gloss and fluent English sentences. The glosses are structured representations of ASL without punctuation, articles, or verb conjugation. Each gloss sentence is paired with a corresponding English sentence that captures its intended meaning. The dataset was cleaned with custom filters to remove non-English outputs, duplicates, and ill-formed pairs.

### Training Procedure

Training used the Hugging Face `Trainer` API with a sequence-to-sequence objective, fine-tuning the BART-based `facebook/bart-base` architecture to learn a mapping from gloss to fluent English sentences.

#### Preprocessing

- Input text was trimmed and normalized
- Tokenizer: pretrained BART tokenizer
- Special tokens: `[INST]` and `[/INST]` were used to delimit the gloss input and the output reference

#### Training Hyperparameters

- **Base model**: `facebook/bart-base`
- **Epochs**: 3
- **Learning rate**: 5e-5
- **Batch size**: 4 per device (both train and eval)
- **Gradient accumulation**: Not used
- **Weight decay**: 0.01
- **Learning rate scheduler**: Linear (default in `Trainer`)
- **Precision**: Mixed precision (`fp16=True`)
- **Evaluation strategy**: Per epoch
- **Save strategy**: Per epoch (with `save_total_limit=2`)
- **Logging frequency**: Every 50 steps
- **Early stopping**: Custom callback based on BERTScore with patience = 2
- **Evaluation metric**: BERTScore (F1), computed with `microsoft/deberta-xlarge-mnli`
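For reference, here is a minimal sketch of how the configuration above maps onto the Hugging Face `Trainer` API. It is an illustration rather than the exact training script: the toy dataset, the `compute_metrics` helper, and the built-in `EarlyStoppingCallback` (standing in for the custom BERTScore callback described above) are assumptions.

```python
import numpy as np
import evaluate
from datasets import Dataset
from transformers import (
    AutoModelForSeq2SeqLM,
    AutoTokenizer,
    DataCollatorForSeq2Seq,
    EarlyStoppingCallback,
    Seq2SeqTrainer,
    Seq2SeqTrainingArguments,
)

tokenizer = AutoTokenizer.from_pretrained("facebook/bart-base")
model = AutoModelForSeq2SeqLM.from_pretrained("facebook/bart-base")

# Hypothetical toy pairs standing in for the custom gloss/English dataset
raw = Dataset.from_dict({
    "gloss": ["YOU GO STORE TOMORROW?", "ME HUNGRY NOW"],
    "text": ["Are you going to the store tomorrow?", "I am hungry right now."],
})

def preprocess(batch):
    model_inputs = tokenizer(batch["gloss"], truncation=True, max_length=128)
    labels = tokenizer(text_target=batch["text"], truncation=True, max_length=128)
    model_inputs["labels"] = labels["input_ids"]
    return model_inputs

train_dataset = raw.map(preprocess, batched=True, remove_columns=["gloss", "text"])
eval_dataset = train_dataset  # a real run would use a held-out split

# BERTScore (F1) with the deberta-xlarge-mnli backbone, as used for model selection
bertscore = evaluate.load("bertscore")

def compute_metrics(eval_pred):
    preds, labels = eval_pred
    if isinstance(preds, tuple):
        preds = preds[0]
    # Replace label padding (-100) before decoding
    labels = np.where(labels != -100, labels, tokenizer.pad_token_id)
    decoded_preds = tokenizer.batch_decode(preds, skip_special_tokens=True)
    decoded_labels = tokenizer.batch_decode(labels, skip_special_tokens=True)
    scores = bertscore.compute(
        predictions=decoded_preds,
        references=decoded_labels,
        model_type="microsoft/deberta-xlarge-mnli",
    )
    return {"bertscore_f1": float(np.mean(scores["f1"]))}

training_args = Seq2SeqTrainingArguments(
    output_dir="bart-gloss-finetuned",
    num_train_epochs=3,
    learning_rate=5e-5,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    weight_decay=0.01,
    fp16=True,                   # mixed precision (requires a GPU)
    eval_strategy="epoch",       # `evaluation_strategy` on older transformers releases
    save_strategy="epoch",
    save_total_limit=2,
    logging_steps=50,
    predict_with_generate=True,  # decode generated ids in compute_metrics
    load_best_model_at_end=True,
    metric_for_best_model="bertscore_f1",
)

trainer = Seq2SeqTrainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
    compute_metrics=compute_metrics,
    # Approximates the custom BERTScore-based early stopping (patience = 2)
    callbacks=[EarlyStoppingCallback(early_stopping_patience=2)],
)
trainer.train()
```

With `load_best_model_at_end=True` and `metric_for_best_model="bertscore_f1"`, the checkpoint with the highest validation BERTScore-F1 is restored after training, matching the model-selection rule described under Metrics below.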
#### Factors

Results are not explicitly disaggregated by demographic group, signer identity, or domain. However, the training data may implicitly reflect distributional biases present in publicly available gloss datasets.

#### Metrics

- **Primary metric**: BERTScore (F1); BLEU and ROUGE are reported as supporting metrics
- **Model selection**: Best checkpoint based on highest validation BERTScore-F1
- BERTScore evaluates semantic alignment, while BLEU and ROUGE provide additional insight into surface-level n-gram overlap

All metrics were computed on the same held-out set of 500 gloss-reference pairs.

### Results

After 2 epochs of training, the model achieved the following scores on the 500-pair evaluation set:

| Metric           | Score  |
| ---------------- | ------ |
| **BERTScore-F1** | 0.7191 |
| **BERTScore-P**  | 0.7399 |
| **BERTScore-R**  | 0.6983 |
| **BLEU-1**       | 0.7063 |
| **BLEU-2**       | 0.6175 |
| **BLEU-3**       | 0.5479 |
| **BLEU-4**       | 0.4821 |
| **ROUGE-1**      | 0.7587 |
| **ROUGE-2**      | 0.5874 |
| **ROUGE-L**      | 0.7312 |

Qualitative inspection shows that most model outputs are fluent and contextually accurate. Common errors include omission of function words and minor verb tense mismatches.

#### Summary

This model demonstrates strong potential for gloss-to-English translation, with near-human fluency in many cases. However, further work is needed to improve generalization to informal gloss styles and to integrate non-manual features.

## Model Card Authors

- Dongjun Kim

## Model Card Contact

- rrrr66254@gmail.com