---
license: mit
library_name: pytorch
tags:
- sign-language-recognition
- transformer
- mediapipe
- skeleton-based-action-recognition
datasets:
- luciayen/CASL-W60-Landmarks
metrics:
- accuracy
pipeline_tag: video-classification
---

# CASL-TransSLR: Robust Sign Language Transformer

**SignVLM-v4 Champion Model**

This repository contains the Transformer architecture for the **CASL (Chinese-American Sign Language) Research Project**. This version (v4) is optimized for **Signer Independence**: it is designed to recognize signs from signers who do not appear in the training data.

## Performance Metrics (Unseen Signers)
Evaluated on **862 files** from independent signers:

| | Metric | Value | |
| | :--- | :--- | |
| | **Overall Accuracy** | **80.39%** | |
| | **Weighted F1-Score** | **78.33%** | |
| | **Classes** | 60 Signs | |
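
For readers who want to reproduce these two metrics on their own predictions, here is a minimal sketch of how accuracy and support-weighted F1 are commonly computed. The function name `accuracy_and_weighted_f1` and the toy labels are illustrative assumptions, not the evaluation script used for this model.

```python
import numpy as np


def accuracy_and_weighted_f1(y_true, y_pred, num_classes):
    """Return (accuracy, weighted F1) for integer class labels."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    accuracy = float(np.mean(y_true == y_pred))

    weighted_sum = 0.0
    for c in range(num_classes):
        tp = int(np.sum((y_true == c) & (y_pred == c)))
        fp = int(np.sum((y_true != c) & (y_pred == c)))
        fn = int(np.sum((y_true == c) & (y_pred != c)))
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom > 0 else 0.0
        # Each class contributes in proportion to its support (true count).
        weighted_sum += (tp + fn) * f1
    return accuracy, weighted_sum / len(y_true)


# Toy labels for illustration only (3 classes, 6 samples).
acc, wf1 = accuracy_and_weighted_f1([0, 0, 1, 1, 2, 2], [0, 1, 1, 1, 2, 0], num_classes=3)
print(f"accuracy={acc:.4f}, weighted_f1={wf1:.4f}")
```

Because the F1 score is weighted by class support, it can fall below accuracy when some classes are recognized much worse than others.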

## Architecture Insight
The model uses a hybrid **Feature Extractor + Transformer Encoder** approach:
* **Feature Extractor:** A Linear layer (225 → 512) followed by **Temporal BatchNorm** (64 frames) to normalize motion across time.
* **Transformer:** 4 layers of Multi-Head Attention ($d_{model}=512$, $n_{head}=8$, $ff_{dim}=1024$).
* **Classifier:** A 2-layer MLP with Dropout (0.5) for robust generalization.
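
The layout above can be sketched in PyTorch roughly as follows. This is a hypothetical reconstruction for illustration, not the `model.py` shipped in the repository: the class name `SketchSignTransformer`, the MLP hidden width (256), the mean pooling over frames, and the choice to apply `BatchNorm1d` with frames as channels are all assumptions. Positional encoding is omitted because the description does not mention it.

```python
import torch
import torch.nn as nn


class SketchSignTransformer(nn.Module):
    """Illustrative layout only; the real architecture lives in model.py."""

    def __init__(self, in_dim=225, d_model=512, n_frames=64, n_heads=8,
                 ff_dim=1024, n_layers=4, n_classes=60, dropout=0.5):
        super().__init__()
        self.proj = nn.Linear(in_dim, d_model)
        # Assumption: "Temporal BatchNorm (64 frames)" means one channel per frame.
        self.temporal_bn = nn.BatchNorm1d(n_frames)
        layer = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=n_heads,
            dim_feedforward=ff_dim, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)
        # Assumption: hidden width 256 for the 2-layer MLP head.
        self.classifier = nn.Sequential(
            nn.Linear(d_model, 256), nn.ReLU(),
            nn.Dropout(dropout), nn.Linear(256, n_classes))

    def forward(self, x):
        x = self.proj(x)          # (batch, 64, 225) -> (batch, 64, 512)
        x = self.temporal_bn(x)   # BatchNorm1d reads this as (N, C=64, L=512)
        x = self.encoder(x)       # (batch, 64, 512)
        x = x.mean(dim=1)         # assumption: average over the 64 frames
        return self.classifier(x)  # (batch, 60) class logits


sketch = SketchSignTransformer().eval()
with torch.no_grad():
    logits = sketch(torch.randn(2, 64, 225))
print(logits.shape)  # torch.Size([2, 60])
```

Weights from `pytorch_model.bin` will not load into this sketch unless its layer names and shapes happen to match the original definition.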

## Pre-processing Requirements
**IMPORTANT:** This model expects normalized landmarks. Passing raw MediaPipe coordinates will reduce accuracy significantly.
1. **Centering:** Translate all points so that the **Mid-Hip** (Point 0) becomes the origin.
2. **Scaling:** Divide by the **Shoulder-to-Shoulder** distance to account for different body sizes.
3. **Shape:** Input must be a tensor of shape `(Batch, 64, 225)`.
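
The three steps above can be sketched with NumPy as shown below. The helper name `normalize_landmarks`, the per-frame scaling, the 75 points × 3 coordinates layout, and the shoulder indices used in the demo are assumptions made for illustration. Check the dataset's landmark ordering before reusing any index.

```python
import numpy as np


def normalize_landmarks(seq, left_shoulder_idx, right_shoulder_idx,
                        hip_idx=0, eps=1e-6):
    """Map (frames, points, 3) landmarks to (frames, points * 3) features."""
    seq = np.asarray(seq, dtype=np.float32)
    # 1. Centering: make the mid-hip point the origin of every frame.
    centered = seq - seq[:, hip_idx:hip_idx + 1, :]
    # 2. Scaling: divide by the shoulder-to-shoulder distance.
    #    Assumption: the distance is measured separately in each frame.
    shoulder_vec = (centered[:, left_shoulder_idx, :]
                    - centered[:, right_shoulder_idx, :])
    dist = np.linalg.norm(shoulder_vec, axis=-1)   # shape (frames,)
    scale = np.maximum(dist, eps)[:, None, None]   # guard against zero distance
    scaled = centered / scale
    # 3. Shape: flatten the points of each frame into one feature vector.
    return scaled.reshape(seq.shape[0], -1)


# Placeholder data, not real landmarks: 64 frames, 75 points, (x, y, z).
rng = np.random.default_rng(0)
demo = rng.normal(size=(64, 75, 3))
# Shoulder indices 1 and 2 are hypothetical; use the dataset's real ones.
features = normalize_landmarks(demo, left_shoulder_idx=1, right_shoulder_idx=2)
batch = features[None, ...]   # add the batch dimension
print(batch.shape)  # (1, 64, 225)
```

The resulting array can be converted with `torch.from_numpy(batch)` before it is passed to the model. Sequences that do not have exactly 64 frames need to be padded, truncated, or resampled first.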

## How to Load and Use

```python
import importlib.util

import torch
from huggingface_hub import hf_hub_download

# 1. Download the weights and the architecture definition from the Hub
repo_id = "luciayen/CASL-TransSLR"
model_bin = hf_hub_download(repo_id=repo_id, filename="pytorch_model.bin")
model_script = hf_hub_download(repo_id=repo_id, filename="model.py")

# 2. Import the architecture from the downloaded model.py
spec = importlib.util.spec_from_file_location("model_arch", model_script)
model_arch = importlib.util.module_from_spec(spec)
spec.loader.exec_module(model_arch)

# 3. Initialize the model, load the weights, and switch to inference mode
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = model_arch.SignVLM().to(device)
model.load_state_dict(torch.load(model_bin, map_location=device))
model.eval()