---
tags:
  - text2text-generation
  - transformers
  - english
  - bart
  - sign-language
library_name: transformers
language:
- en
metrics:
- bertscore
- bleu
- rouge
base_model:
- facebook/bart-base
---

# Model Card for bart-gloss-finetuned

This model is a fine-tuned version of `facebook/bart-base`, trained to convert American Sign Language (ASL) gloss sequences into fluent English sentences. It is designed to support research, education, and accessibility applications involving gloss-based ASL interpretation.
The model was trained on high-quality aligned pairs of gloss annotations and English translations, and evaluated with BERTScore, BLEU, and ROUGE.

## Model Details

### Model Description

This model adapts `facebook/bart-base` to a gloss-to-English translation task: given a gloss sequence such as `YOU GO STORE TOMORROW?`, it generates the corresponding fluent English sentence.

- **Developed by:** Dongjun Kim
- **Model type:** Text2Text Generation, Gloss2Eng
- **Language(s) (NLP):** English


## Intended Uses

This model is fine-tuned for translating American Sign Language (ASL) gloss input sequences into natural, grammatically correct English sentences. It can be used for:

- Building real-time sign language interpretation systems
- Research in sign language understanding and low-resource language translation
- Educational tools for ASL learners to see gloss-to-English transformation
- Data augmentation for multimodal ASL translation tasks

## Out-of-Scope Uses

The model is **not** suitable for:

- Translating from ASL videos or images directly (no visual input is processed)
- Formal legal or medical translation without human validation
- General-purpose translation outside ASL gloss context
- Languages other than English

## How to Use

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("rrrr66254/bart-gloss-finetuned")
model = AutoModelForSeq2SeqLM.from_pretrained("rrrr66254/bart-gloss-finetuned")

gloss_input = "YOU GO STORE TOMORROW?"
inputs = tokenizer(gloss_input, return_tensors="pt")
output = model.generate(**inputs)
print(tokenizer.decode(output[0], skip_special_tokens=True))
# Example output: "Are you going to the store tomorrow?"
```

## Bias, Risks, and Limitations

This model is trained on American Sign Language (ASL) glosses mapped to natural English sentences. As such, it may inherit several limitations:

- **Data bias**: If the training data overrepresents certain sentence structures, cultural expressions, or gloss forms, the model may produce outputs that lack variety or inclusivity.
- **Limited linguistic scope**: The model only understands **ASL gloss** as input and **English** as output. It does not cover other sign languages or spoken/written languages.
- **Context loss**: ASL gloss does not encode facial expressions, spatial grammar, or non-manual signals, which are essential in ASL. The model may misrepresent meaning as a result.
- **Generalization risk**: The model may not generalize well to gloss styles or sentence structures it wasn’t trained on.

Outputs should not be used in **critical settings** (e.g., legal, medical, or emergency interpreting) without human review.

### Recommendations

- Human-in-the-loop: Always have a fluent signer or linguist verify model outputs in any production or educational setting.
- Data expansion: Consider fine-tuning with more diverse gloss datasets that include different dialects or informal structures.
- Downstream use: If used as part of a larger translation or accessibility pipeline, include disclaimers about potential misinterpretation due to a lack of non-manual signals.

## Training Details

### Training Data

The model was fine-tuned on a custom dataset of 1:1 pairs of ASL gloss and fluent English sentences.
The glosses are structured representations of ASL without punctuation, articles, or verb conjugation. Each gloss sentence is paired with a corresponding English sentence that captures its intended meaning.
The dataset was cleaned to remove non-English outputs, duplicates, and ill-formed pairs using custom filters.
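The exact filters are not published; a minimal sketch of this kind of cleaning (with a crude ASCII-based check standing in for the non-English filter) might look like:

```python
def clean_pairs(pairs):
    """Filter (gloss, english) pairs: drop ill-formed entries, outputs with
    non-ASCII text (crude non-English stand-in), and exact duplicates."""
    seen = set()
    cleaned = []
    for gloss, english in pairs:
        gloss, english = gloss.strip(), english.strip()
        if not gloss or not english:
            continue  # ill-formed: one side missing
        if not english.isascii():
            continue  # crude non-English filter
        key = (gloss, english)
        if key in seen:
            continue  # exact duplicate
        seen.add(key)
        cleaned.append((gloss, english))
    return cleaned

pairs = [
    ("YOU GO STORE TOMORROW?", "Are you going to the store tomorrow?"),
    ("YOU GO STORE TOMORROW?", "Are you going to the store tomorrow?"),  # duplicate
    ("ME HUNGRY", ""),                                                   # ill-formed
]
print(clean_pairs(pairs))
```

A real pipeline would likely use a language-identification model rather than an ASCII check, but the dedup-and-validate structure is the same.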

### Training Procedure

Training used the Hugging Face `Trainer` API with a standard sequence-to-sequence objective, fine-tuning the BART architecture (`facebook/bart-base`) to map gloss sequences to fluent English sentences.

#### Preprocessing

- Input text was trimmed and normalized
- Tokenizer: Pretrained BART tokenizer
- Special tokens: `[INST]` and `[/INST]` were used to delimit gloss input and output reference
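The card does not publish the exact template; assuming the delimiters wrap the gloss ahead of the output reference, a sketch of the normalization and formatting step might be:

```python
def preprocess(gloss: str, reference: str) -> str:
    """Trim and normalize whitespace, then wrap the gloss in the
    [INST] ... [/INST] delimiters used during fine-tuning.
    The exact ordering of gloss and reference is an assumption."""
    gloss = " ".join(gloss.split())
    reference = " ".join(reference.split())
    return f"[INST] {gloss} [/INST] {reference}"

print(preprocess("  YOU GO   STORE TOMORROW? ", "Are you going to the store tomorrow?"))
# [INST] YOU GO STORE TOMORROW? [/INST] Are you going to the store tomorrow?
```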



#### Training Hyperparameters

- **Base model**: `facebook/bart-base`
- **Epochs**: 3
- **Learning rate**: 5e-5
- **Batch size**: 4 per device (both train and eval)
- **Gradient accumulation**: Not used
- **Weight decay**: 0.01
- **Learning rate scheduler**: Linear (default in Trainer)
- **Precision**: Mixed precision (fp16=True)
- **Evaluation strategy**: Per epoch
- **Save strategy**: Per epoch (with `save_total_limit=2`)
- **Logging frequency**: Every 50 steps
- **Early stopping**: Custom callback based on BERTScore with patience = 2
- **Evaluation metric**: BERTScore (F1), computed with `microsoft/deberta-xlarge-mnli`
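As a configuration sketch, the list above maps onto `Seq2SeqTrainingArguments` roughly as follows; the `output_dir` and the best-model metric key are placeholders, not values from the card, and the BERTScore-based early stopping was a custom callback rather than the built-in one:

```python
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="bart-gloss-finetuned",   # placeholder
    num_train_epochs=3,
    learning_rate=5e-5,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    weight_decay=0.01,
    lr_scheduler_type="linear",          # Trainer default
    fp16=True,
    eval_strategy="epoch",               # `evaluation_strategy` on older transformers
    save_strategy="epoch",
    save_total_limit=2,
    logging_steps=50,
    load_best_model_at_end=True,
    metric_for_best_model="bertscore_f1",  # assumed key reported by the custom callback
)
```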

#### Factors

This model does not explicitly disaggregate results by demographic group, signer identity, or domain. However, the training data may implicitly reflect distributional biases present in publicly available gloss datasets.


#### Metrics

- **Primary metric**: BERTScore (F1); **secondary metrics**: BLEU and ROUGE
- **Model selection**: Best checkpoint based on highest validation BERTScore-F1
- BERTScore is used to evaluate semantic alignment, while BLEU and ROUGE provide additional insight into surface-level n-gram overlap. All metrics were evaluated using the same held-out set of 500 gloss-reference pairs.
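To illustrate what the surface-overlap metrics measure, sentence-level BLEU-1 is the modified unigram precision of the candidate against the reference; a minimal sketch (ignoring the brevity penalty and corpus-level aggregation of standard BLEU):

```python
from collections import Counter

def bleu1(candidate: str, reference: str) -> float:
    """Modified unigram precision: each candidate word counts only as
    many times as it appears in the reference (no brevity penalty)."""
    cand = candidate.lower().split()
    ref_counts = Counter(reference.lower().split())
    matched = sum(min(count, ref_counts[word])
                  for word, count in Counter(cand).items())
    return matched / len(cand) if cand else 0.0

print(bleu1("are you going to store tomorrow",
            "are you going to the store tomorrow"))
# 1.0 — every candidate unigram appears in the reference
```

BLEU-n generalizes this to n-gram overlap, while BERTScore instead matches contextual embeddings, which is why it is the primary metric for semantic alignment here.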

### Results

After 2 epochs of training, the model achieved the following on the 500-pair evaluation set:

| Metric           | Score  |
| ---------------- | ------ |
| **BERTScore-F1** | 0.7191 |
| **BERTScore-P**  | 0.7399 |
| **BERTScore-R**  | 0.6983 |
| **BLEU-1**       | 0.7063 |
| **BLEU-2**       | 0.6175 |
| **BLEU-3**       | 0.5479 |
| **BLEU-4**       | 0.4821 |
| **ROUGE-1**      | 0.7587 |
| **ROUGE-2**      | 0.5874 |
| **ROUGE-L**      | 0.7312 |

Qualitative inspection shows that most model outputs are fluent and contextually accurate. Common errors include omission of function words and minor verb-tense mismatches.


#### Summary

This model demonstrates strong potential for gloss-to-English translation, with near-human fluency in many cases. However, further work is needed to improve generalization to informal gloss styles and integrate non-manual features.

## Model Card Authors

- Dongjun Kim

## Model Card Contact

- rrrr66254@gmail.com