ArSL-Models — Arabic Sign Language Recognition

Two PyTorch models for Arabic Sign Language (ArSL) recognition, used by the CSLR app (deployed as a Hugging Face Space):

File	Task	Architecture	Classes
`improved_arsl_model.pth`	Alphabet (finger-spelling)	ResNet18 + BiLSTM + Attention	29
`sign_word_t5_classifier_best_3d.pth`	Word signs	T5-encoder over hand landmarks	10

⚠️ Note: Quantitative results below (accuracy, F1, dataset sizes) are placeholders pending the accompanying paper. They will be updated with the exact reported figures.

1. Alphabet model — `improved_arsl_model.pth`

A spatial-sequential classifier for static Arabic letter signs.

Architecture (ArSLAttentionLSTM)

Backbone: ResNet18 (ImageNet-pretrained), final pooling/fc removed → 512×7×7 feature map.
Sequence: the 7×7 grid is flattened to a length-49 sequence of 512-d vectors.
Recurrence: 2-layer bidirectional LSTM, hidden size 512 (→ 1024-d outputs).
Attention: additive attention pools the LSTM outputs into one context vector.
Head: 1024 → 512 → 256 → 29 MLP with BatchNorm, ReLU, dropout (0.5).
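
The attention pooling step above can be illustrated in isolation. The following is a minimal pure-Python sketch, assuming a Bahdanau-style scoring function e_t = v · tanh(W h_t + b); the card only says "additive attention", so the exact parameterization, the function name `additive_attention_pool`, and the toy dimensions are assumptions rather than the checkpoint's actual layers.

```python
import math
import random

def additive_attention_pool(hidden_states, W, b, v):
    """Pool a sequence of vectors into one context vector.

    hidden_states: list of T vectors of length D (e.g. the BiLSTM outputs).
    W: A x D matrix, b: length-A bias, v: length-A scoring vector.
    Returns (context, weights).
    """
    scores = []
    for h in hidden_states:
        # e_t = v . tanh(W h_t + b)  (assumed Bahdanau-style form)
        proj = [math.tanh(sum(w_ij * h_j for w_ij, h_j in zip(row, h)) + b_i)
                for row, b_i in zip(W, b)]
        scores.append(sum(v_i * p_i for v_i, p_i in zip(v, proj)))

    # Softmax over the sequence positions, shifted by the max for stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]

    # Context vector = attention-weighted sum of the hidden states.
    dim = len(hidden_states[0])
    context = [sum(w * h[d] for w, h in zip(weights, hidden_states))
               for d in range(dim)]
    return context, weights


# Toy sizes for illustration: T = 49 matches the flattened 7x7 grid,
# but D is kept small here (the card describes 1024-d LSTM outputs).
rng = random.Random(0)
T, D, A = 49, 8, 4
H = [[rng.uniform(-1, 1) for _ in range(D)] for _ in range(T)]
W = [[rng.uniform(-1, 1) for _ in range(D)] for _ in range(A)]
b = [0.0] * A
v = [rng.uniform(-1, 1) for _ in range(A)]

context, weights = additive_attention_pool(H, W, b, v)
print(len(context), len(weights), round(sum(weights), 6))
```

The weights sum to one across the 49 grid positions, so the context vector stays in the same space as the individual LSTM outputs and can be fed directly to the MLP head.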

Input

RGB image, resized to 224×224, normalized with ImageNet mean/std ([0.485, 0.456, 0.406] / [0.229, 0.224, 0.225]).
Hand presence is verified with MediaPipe before classification.
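
The normalization step can be sketched without any image library. This example assumes pixel values are first scaled from 0..255 to 0..1 (as torchvision's `ToTensor` does) before the ImageNet mean and standard deviation are applied; that scaling, the helper names, and the channel-first output layout are assumptions, and resizing to 224×224 is left out.

```python
IMAGENET_MEAN = (0.485, 0.456, 0.406)
IMAGENET_STD = (0.229, 0.224, 0.225)

def normalize_pixel(rgb):
    """Map one 8-bit (R, G, B) pixel to ImageNet-normalized floats."""
    return tuple((c / 255.0 - m) / s
                 for c, m, s in zip(rgb, IMAGENET_MEAN, IMAGENET_STD))

def normalize_image(image):
    """Normalize an H x W image given as nested lists of (R, G, B) pixels.

    Returns channel-first data indexed as out[channel][row][col].
    """
    normalized = [[normalize_pixel(px) for px in row] for row in image]
    return [[[px[ch] for px in row] for row in normalized] for ch in range(3)]


tiny = [[(0, 0, 0), (255, 255, 255)]]  # a 1 x 2 image: one black, one white pixel
chw = normalize_image(tiny)
print(len(chw), len(chw[0]), len(chw[0][0]))
```

In a real pipeline the same arithmetic would typically be done by a tensor library on the whole 3×224×224 array at once.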

Output: softmax over 29 classes.

Alphabet label map (index → letter)

idx	letter	idx	letter	idx	letter	idx	letter	idx	letter
0	ع	6	ف	12	ك	18	ر	24	ث
1	أ	7	ق	13	خ	19	ص	25	ذ
2	ب	8	غ	14	لا	20	س	26	و
3	د	9	ه	15	ل	21	ش	27	ي
4	ظ	10	ح	16	م	22	ط	28	ز
5	ض	11	ج	17	ن	23	ت
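
For convenience, the table above can be written as a Python list whose position is the class index. The decoding helper below is an illustrative sketch (the name `decode_letter` is hypothetical); it assumes the model output has already been converted to a list of 29 probabilities.

```python
# Index -> letter, copied from the alphabet label map above.
ALPHABET_LABELS = [
    "ع", "أ", "ب", "د", "ظ", "ض", "ف", "ق", "غ", "ه",
    "ح", "ج", "ك", "خ", "لا", "ل", "م", "ن", "ر", "ص",
    "س", "ش", "ط", "ت", "ث", "ذ", "و", "ي", "ز",
]

def decode_letter(probs):
    """Return (index, letter, probability) for the most likely class."""
    if len(probs) != len(ALPHABET_LABELS):
        raise ValueError(f"expected {len(ALPHABET_LABELS)} probabilities, got {len(probs)}")
    idx = max(range(len(probs)), key=probs.__getitem__)
    return idx, ALPHABET_LABELS[idx], probs[idx]


# Example with a made-up probability vector that peaks at index 2.
example = [0.01] * 29
example[2] = 0.72
print(decode_letter(example))
```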

Reported metrics (to update from paper)

Metric	Value
Test accuracy	TBD
Macro F1	TBD
Dataset / split	TBD

2. Word model — `sign_word_t5_classifier_best_3d.pth`

A landmark-based classifier for dynamic word signs.

Architecture (T5EncoderClassifier)

Base: encoder of google-t5/t5-small (d_model = 512).
Input projection: Linear(feature_dim → 512) → Dropout → LayerNorm → GELU.
Pooling: first-token (CLS-style) hidden state of the encoder.
Head: Dropout → Linear(512 → 10).

Input

MediaPipe hand landmarks: 21 landmarks × 3 coords (x, y, z) for 1 hand → feature_dim = 63.
Landmarks are wrist-centered and scaled by the wrist→middle-MCP distance.
A single frame's landmarks are tiled to a sequence length of 100 with an all-ones attention mask.
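
The preprocessing described above can be sketched in plain Python. This is an illustration under stated assumptions, not the CSLR repository's code: it uses MediaPipe's landmark indexing (0 for the wrist, 9 for the middle-finger MCP), the function names are hypothetical, and the fallback for a zero-length wrist→MCP distance is an added safeguard that the card does not mention.

```python
import math

WRIST, MIDDLE_MCP = 0, 9   # MediaPipe Hands landmark indices
SEQ_LEN = 100              # sequence length stated in the card

def normalize_landmarks(landmarks):
    """Wrist-center and scale 21 (x, y, z) landmarks, then flatten to 63 floats."""
    if len(landmarks) != 21:
        raise ValueError(f"expected 21 landmarks, got {len(landmarks)}")
    wx, wy, wz = landmarks[WRIST]
    centered = [(x - wx, y - wy, z - wz) for x, y, z in landmarks]
    # Scale = Euclidean distance from the wrist to the middle-finger MCP.
    scale = math.sqrt(sum(c * c for c in centered[MIDDLE_MCP]))
    if scale < 1e-8:
        scale = 1.0  # assumption: avoid division by zero on degenerate input
    return [c / scale for point in centered for c in point]

def tile_frame(features, seq_len=SEQ_LEN):
    """Repeat one frame's features seq_len times with an all-ones mask."""
    sequence = [list(features) for _ in range(seq_len)]
    attention_mask = [1] * seq_len
    return sequence, attention_mask


# Synthetic landmarks purely for demonstration.
hand = [(0.5 + 0.01 * i, 0.5 - 0.02 * i, 0.0) for i in range(21)]
features = normalize_landmarks(hand)
sequence, attention_mask = tile_frame(features)
print(len(features), len(sequence), sum(attention_mask))
```

Because every position in the sequence holds the same frame, the encoder sees no temporal variation; the tiling only brings the input to the sequence length the model expects.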

Output: softmax over 10 classes.

Word label map (index → Arabic → English)

idx	Arabic	English
0	ينام	sleep
1	يسكت	be silent
2	حب	love
3	يدخن	smoke
4	دعم	support
5	مرتبك	confused
6	قلق	worried
7	هنا	here
8	السلام عليكم	greeting (peace be upon you)
9	شكرا	thanks
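
The word label map can likewise be kept as a list indexed by class. The sketch below assumes the classifier head returns 10 raw logits that still need a softmax; the helper names `softmax` and `top_k_words` are illustrative only.

```python
import math

# Index -> (Arabic, English), copied from the word label map above.
WORD_LABELS = [
    ("ينام", "sleep"),
    ("يسكت", "be silent"),
    ("حب", "love"),
    ("يدخن", "smoke"),
    ("دعم", "support"),
    ("مرتبك", "confused"),
    ("قلق", "worried"),
    ("هنا", "here"),
    ("السلام عليكم", "greeting (peace be upon you)"),
    ("شكرا", "thanks"),
]

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def top_k_words(logits, k=3):
    """Return the k most likely (index, arabic, english, probability) tuples."""
    probs = softmax(logits)
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return [(i, WORD_LABELS[i][0], WORD_LABELS[i][1], probs[i]) for i in ranked[:k]]


# Made-up logits for demonstration: index 9 is the clear winner.
demo_logits = [0.0] * 10
demo_logits[9] = 5.0
demo_logits[2] = 2.0
for idx, arabic, english, prob in top_k_words(demo_logits, k=2):
    print(idx, arabic, english, round(prob, 3))
```

Reporting the top few candidates with their probabilities, rather than only the argmax, makes it easier to apply a confidence threshold in an interactive demo.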

Reported metrics (to update from paper)

Metric	Value
Test accuracy	TBD
Macro F1	TBD
Dataset / split	TBD

Usage

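import torch
from huggingface_hub import hf_hub_download

# ArSLAttentionLSTM is defined in the CSLR repo, which must be on the import path.
from models.alphabet_model import ArSLAttentionLSTM

# --- Alphabet model ---
# Download the checkpoint from the Hub (cached locally after the first call).
ckpt = hf_hub_download("FatimahEmadEldin/ArSL-Models", "improved_arsl_model.pth")
model = ArSLAttentionLSTM(num_classes=29, hidden_size=512, num_layers=2,
                          bidirectional=True, dropout_rate=0.5)
state = torch.load(ckpt, map_location="cpu")
# The file may hold a bare state dict or a dict wrapping it under "model_state_dict".
state = state.get("model_state_dict", state) if isinstance(state, dict) else state
# strict=False skips missing or unexpected keys without raising an error.
model.load_state_dict(state, strict=False)
model.eval()  # inference mode: disables dropout, uses BatchNorm running statistics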

The full inference pipeline (MediaPipe hand detection, preprocessing, the T5 word model, and a web UI) is available in the CSLR repository.

Intended use & limitations

Intended use: education, accessibility demos, and research on Arabic sign language recognition.
Limitations: trained on a limited label set (29 letters / 10 words); accuracy depends on lighting, camera angle, hand visibility, and signing style. The word model classifies from a single tiled frame and is not a full continuous-sign sequence model. Not validated for clinical or safety-critical use.

Citation

@misc{arsl_models_2026,
  title  = {ArSL-Models: Arabic Sign Language Recognition},
  author = {Fatimah Emad Eldin},
  year   = {2026},
  howpublished = {\url{https://huggingface.co/FatimahEmadEldin/ArSL-Models}}
}

Paper details and full results to be added.

License

MIT
