ArSL-Models: Arabic Sign Language Recognition

Two PyTorch models for Arabic Sign Language (ArSL) recognition, used by the CSLR app (deployed as a Hugging Face Space):

| File | Task | Architecture | Classes |
| --- | --- | --- | --- |
| improved_arsl_model.pth | Alphabet (finger-spelling) | ResNet18 + BiLSTM + Attention | 29 |
| sign_word_t5_classifier_best_3d.pth | Word signs | T5-encoder over hand landmarks | 10 |

⚠️ Note: The quantitative results below (accuracy, F1, dataset sizes) are placeholders pending the accompanying paper. They will be replaced with the exact reported figures.


1. Alphabet model: improved_arsl_model.pth

A spatial-sequential classifier for static Arabic letter signs.

Architecture (ArSLAttentionLSTM)

  • Backbone: ResNet18 (ImageNet-pretrained), final pooling/fc removed → 512×7×7 feature map.
  • Sequence: the 7×7 grid is flattened to a length-49 sequence of 512-d vectors.
  • Recurrence: 2-layer bidirectional LSTM, hidden size 512 (→ 1024-d outputs).
  • Attention: additive attention pools the LSTM outputs into one context vector.
  • Head: 1024 → 512 → 256 → 29 MLP with BatchNorm, ReLU, dropout (0.5).
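
The layers after the backbone can be sketched in PyTorch as follows. This is a minimal illustration under stated assumptions, not the CSLR implementation: the class name `AttentionLSTMHead`, the size of the tanh-scored attention layer, and the exact ordering of BatchNorm, ReLU, and dropout inside the head are guesses, and a random tensor stands in for the ResNet18 feature map.

```python
import torch
import torch.nn as nn


class AttentionLSTMHead(nn.Module):
    """Hypothetical sketch of the layers that follow the ResNet18 backbone."""

    def __init__(self, feat_dim=512, hidden_size=512, num_classes=29, dropout=0.5):
        super().__init__()
        self.lstm = nn.LSTM(feat_dim, hidden_size, num_layers=2,
                            batch_first=True, bidirectional=True)
        out_dim = 2 * hidden_size  # forward + backward states -> 1024
        # Additive attention: one scalar score per sequence position (sizes assumed).
        self.attn = nn.Sequential(nn.Linear(out_dim, out_dim), nn.Tanh(),
                                  nn.Linear(out_dim, 1))
        self.head = nn.Sequential(
            nn.Linear(out_dim, 512), nn.BatchNorm1d(512), nn.ReLU(), nn.Dropout(dropout),
            nn.Linear(512, 256), nn.BatchNorm1d(256), nn.ReLU(), nn.Dropout(dropout),
            nn.Linear(256, num_classes),
        )

    def forward(self, fmap):                             # fmap: (B, 512, 7, 7)
        seq = fmap.flatten(2).transpose(1, 2)            # (B, 49, 512)
        out, _ = self.lstm(seq)                          # (B, 49, 1024)
        weights = torch.softmax(self.attn(out), dim=1)   # (B, 49, 1), sums to 1 over positions
        context = (weights * out).sum(dim=1)             # (B, 1024) context vector
        return self.head(context)                        # (B, 29) logits


head = AttentionLSTMHead().eval()
with torch.no_grad():
    logits = head(torch.randn(2, 512, 7, 7))  # random stand-in for backbone features
print(logits.shape)
```

Tracing the shapes this way makes the 512×7×7 → 49×512 → 49×1024 → 1024 → 29 progression explicit.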

Input

  • RGB image, resized to 224×224, normalized with ImageNet mean/std ([0.485, 0.456, 0.406] / [0.229, 0.224, 0.225]).
  • Hand presence is verified with MediaPipe before classification.
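
As a rough illustration of the normalization step, the snippet below applies the ImageNet statistics with NumPy. It assumes pixel values are first scaled to [0, 1] and that the model expects channel-first input, as is usual for PyTorch pipelines; the function name `normalize_image` is hypothetical, and resizing and the MediaPipe hand check are left out.

```python
import numpy as np

IMAGENET_MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
IMAGENET_STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)


def normalize_image(rgb_uint8):
    """Turn an HxWx3 uint8 RGB image (already 224x224) into a 3xHxW float array."""
    x = rgb_uint8.astype(np.float32) / 255.0   # scale to [0, 1] (assumed convention)
    x = (x - IMAGENET_MEAN) / IMAGENET_STD     # per-channel standardization
    return np.transpose(x, (2, 0, 1))          # HWC -> CHW


image = np.full((224, 224, 3), 255, dtype=np.uint8)  # synthetic all-white image
chw = normalize_image(image)
print(chw.shape)
```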

Output: softmax over 29 classes.

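Alphabet label map (index → letter)

| idx | letter | idx | letter | idx | letter | idx | letter | idx | letter |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | ع | 6 | ف | 12 | ك | 18 | ر | 24 | ث |
| 1 | أ | 7 | ق | 13 | خ | 19 | ص | 25 | ذ |
| 2 | ب | 8 | غ | 14 | لا | 20 | س | 26 | و |
| 3 | د | 9 | ه | 15 | ل | 21 | ش | 27 | ي |
| 4 | ظ | 10 | ح | 16 | م | 22 | ط | 28 | ز |
| 5 | ض | 11 | ج | 17 | ن | 23 | ت | | |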
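
To show how the label map is applied to the model output, here is a small, dependency-free sketch that converts a vector of 29 logits into a letter and its softmax probability. The helper name `decode` and the fake logits are illustrative only.

```python
import math

# Index -> letter, following the alphabet label map.
ALPHABET_LABELS = ["ع", "أ", "ب", "د", "ظ", "ض", "ف", "ق", "غ", "ه",
                   "ح", "ج", "ك", "خ", "لا", "ل", "م", "ن", "ر", "ص",
                   "س", "ش", "ط", "ت", "ث", "ذ", "و", "ي", "ز"]


def decode(logits):
    """Return (letter, probability) for the highest-scoring class."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]  # subtract the max for numerical stability
    total = sum(exps)
    probs = [e / total for e in exps]
    idx = max(range(len(probs)), key=probs.__getitem__)
    return ALPHABET_LABELS[idx], probs[idx]


fake_logits = [0.0] * 29
fake_logits[14] = 5.0  # pretend the model favors class 14
letter, prob = decode(fake_logits)
print(letter, round(prob, 3))
```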

Reported metrics (to update from paper)

| Metric | Value |
| --- | --- |
| Test accuracy | TBD |
| Macro F1 | TBD |
| Dataset / split | TBD |
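
For readers unfamiliar with the metric, macro F1 is the unweighted mean of the per-class F1 scores, so each of the classes counts equally regardless of how many test samples it has. The sketch below computes it on toy labels; the values are for illustration only and are not model results. Scoring a class with no true or predicted samples as 0 is an assumed convention.

```python
def macro_f1(y_true, y_pred, num_classes):
    """Unweighted mean of per-class F1 scores."""
    scores = []
    for c in range(num_classes):
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)  # F1 = 2TP / (2TP + FP + FN)
    return sum(scores) / num_classes


# Toy labels for illustration only; these are not model results.
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
score = macro_f1(y_true, y_pred, num_classes=3)
print(round(score, 4))
```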

2. Word model: sign_word_t5_classifier_best_3d.pth

A landmark-based classifier for dynamic word signs.

Architecture (T5EncoderClassifier)

  • Base: encoder of google-t5/t5-small (d_model = 512).
  • Input projection: Linear(feature_dim → 512) → Dropout → LayerNorm → GELU.
  • Pooling: first-token (CLS-style) hidden state of the encoder.
  • Head: Dropout → Linear(512 → 10).
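
The components listed above could be wired together roughly as follows, using the Hugging Face `transformers` library. This is a hedged re-creation rather than the CSLR source: the class name `T5EncoderClassifierSketch`, the dropout rate of 0.1, and the `pretrained` flag are assumptions. By default the encoder is built from a config with random weights so that the example runs without downloading the checkpoint.

```python
import torch
import torch.nn as nn
from transformers import T5Config, T5EncoderModel


class T5EncoderClassifierSketch(nn.Module):
    """Hypothetical re-creation of the described layers; not the CSLR source."""

    def __init__(self, feature_dim=63, num_classes=10, dropout=0.1, pretrained=False):
        super().__init__()
        if pretrained:
            self.encoder = T5EncoderModel.from_pretrained("google-t5/t5-small")
        else:
            # Same dimensions as t5-small, but randomly initialized (no download).
            config = T5Config(d_model=512, d_kv=64, d_ff=2048, num_layers=6, num_heads=8)
            self.encoder = T5EncoderModel(config)
        d_model = self.encoder.config.d_model
        self.proj = nn.Sequential(nn.Linear(feature_dim, d_model), nn.Dropout(dropout),
                                  nn.LayerNorm(d_model), nn.GELU())
        self.head = nn.Sequential(nn.Dropout(dropout), nn.Linear(d_model, num_classes))

    def forward(self, features, attention_mask):
        # features: (B, T, feature_dim); attention_mask: (B, T)
        hidden = self.encoder(inputs_embeds=self.proj(features),
                              attention_mask=attention_mask).last_hidden_state
        return self.head(hidden[:, 0])  # first-token pooling -> (B, num_classes)


model = T5EncoderClassifierSketch().eval()
with torch.no_grad():
    logits = model(torch.randn(2, 100, 63), torch.ones(2, 100, dtype=torch.long))
print(logits.shape)
```

Passing `inputs_embeds` bypasses the T5 token embedding, which is what lets continuous landmark features replace token ids.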

Input

  • MediaPipe hand landmarks: 21 landmarks × 3 coords (x, y, z) for 1 hand → feature_dim = 63.
  • Landmarks are wrist-centered and scaled by the wrist→middle-MCP distance.
  • A single frame's landmarks are tiled to a sequence length of 100 with an all-ones attention mask.
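
The input preparation described above can be sketched with NumPy as follows. Indices 0 and 9 are the wrist and middle-finger MCP in the MediaPipe Hands landmark numbering; the function name `landmarks_to_sequence`, the per-landmark (x, y, z) flattening order, and the small epsilon guard are assumptions rather than details confirmed by the source.

```python
import numpy as np

WRIST, MIDDLE_MCP = 0, 9  # MediaPipe Hands landmark indices
SEQ_LEN = 100


def landmarks_to_sequence(landmarks):
    """Map one frame of 21 (x, y, z) landmarks to a (100, 63) array plus a mask."""
    pts = np.asarray(landmarks, dtype=np.float32).reshape(21, 3)
    pts = pts - pts[WRIST]                        # wrist becomes the origin
    scale = float(np.linalg.norm(pts[MIDDLE_MCP]))  # wrist -> middle-MCP distance
    pts = pts / max(scale, 1e-6)                  # guard against a zero distance (assumed)
    frame = pts.reshape(-1)                       # 21 * 3 = 63 features
    sequence = np.tile(frame, (SEQ_LEN, 1))       # repeat the single frame 100 times
    mask = np.ones(SEQ_LEN, dtype=np.int64)       # all-ones attention mask
    return sequence, mask


rng = np.random.default_rng(0)
sequence, mask = landmarks_to_sequence(rng.random((21, 3)))  # random stand-in landmarks
print(sequence.shape, mask.shape)
```

Because every row of the sequence is identical, the encoder sees no temporal variation, which is consistent with the limitation noted for the word model.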

Output: softmax over 10 classes.

Word label map (index → Arabic → English)

| idx | Arabic | English |
| --- | --- | --- |
| 0 | ينام | sleep |
| 1 | يسكت | be silent |
| 2 | حب | love |
| 3 | يدخن | smoke |
| 4 | دعم | support |
| 5 | مرتبك | confused |
| 6 | قلق | worried |
| 7 | هنا | here |
| 8 | السلام عليكم | greeting (peace be upon you) |
| 9 | شكرا | thanks |

Reported metrics (to update from paper)

| Metric | Value |
| --- | --- |
| Test accuracy | TBD |
| Macro F1 | TBD |
| Dataset / split | TBD |

Usage

```python
import torch
from huggingface_hub import hf_hub_download

# --- Alphabet model ---
from models.alphabet_model import ArSLAttentionLSTM   # from the CSLR repo

ckpt = hf_hub_download("FatimahEmadEldin/ArSL-Models", "improved_arsl_model.pth")
model = ArSLAttentionLSTM(num_classes=29, hidden_size=512, num_layers=2,
                          bidirectional=True, dropout_rate=0.5)
state = torch.load(ckpt, map_location="cpu")
state = state.get("model_state_dict", state) if isinstance(state, dict) else state
model.load_state_dict(state, strict=False)
model.eval()
```

The full inference pipeline (MediaPipe hand detection, preprocessing, the T5 word model, and a web UI) is available in the CSLR repository.

Intended use & limitations

  • Intended use: education, accessibility demos, and research on Arabic sign language recognition.
  • Limitations: trained on a limited label set (29 letters / 10 words); accuracy depends on lighting, camera angle, hand visibility, and signing style. The word model classifies from a single tiled frame and is not a full continuous-sign sequence model. Not validated for clinical or safety-critical use.

Citation

@misc{arsl_models_2026,
  title  = {ArSL-Models: Arabic Sign Language Recognition},
  author = {Fatimah Emad Eldin},
  year   = {2026},
  howpublished = {\url{https://huggingface.co/FatimahEmadEldin/ArSL-Models}}
}

Paper details and full results to be added.

License

MIT
