---
license: mit
library_name: pytorch
tags:
- sign-language-recognition
- transformer
- mediapipe
- skeleton-based-action-recognition
datasets:
- luciayen/CASL-W60-Landmarks
metrics:
- accuracy
pipeline_tag: video-classification
---
# CASL-TransSLR: Robust Sign Language Transformer
**SignVLM-v4 Champion Model**
This repository contains the state-of-the-art Transformer architecture for the **CASL (Chinese-American Sign Language) Research Project**. This specific version (v4) is optimized for **Signer Independence**, meaning it is designed to recognize signs from people the model has never seen before.
## Performance Metrics (Unseen Signers)
Evaluated on **862 files** from independent signers:
| Metric | Value |
| :--- | :--- |
| **Overall Accuracy** | **80.39%** |
| **Weighted F1-Score** | **78.33%** |
| **Classes** | 60 Signs |
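
For readers who want to reproduce the two headline numbers on their own predictions, the following is a minimal sketch of how overall accuracy and a support-weighted F1 are commonly computed. The function name `accuracy_and_weighted_f1` is hypothetical, and this is not the evaluation script used for this model; it only illustrates the standard definitions.

```python
from collections import Counter


def accuracy_and_weighted_f1(y_true, y_pred):
    """Return (accuracy, weighted F1) for two equal-length label sequences.

    Hypothetical helper for illustration; not the project's evaluation code.
    """
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and equal length")

    n = len(y_true)
    correct = sum(t == p for t, p in zip(y_true, y_pred))

    support = Counter(y_true)      # TP + FN per class
    predicted = Counter(y_pred)    # TP + FP per class
    true_pos = Counter(t for t, p in zip(y_true, y_pred) if t == p)

    weighted_f1 = 0.0
    for label, n_true in support.items():
        # F1 = 2TP / (2TP + FP + FN) = 2TP / (support + predicted)
        f1 = 2 * true_pos[label] / (n_true + predicted[label])
        # Each class contributes in proportion to its number of true samples.
        weighted_f1 += f1 * n_true / n

    return correct / n, weighted_f1


# Toy example with three classes (made-up labels, not model output).
acc, f1 = accuracy_and_weighted_f1([0, 0, 1, 1, 2, 2], [0, 0, 1, 2, 2, 2])
```

Because the F1 is weighted by class support, classes with more test files influence the score more than rare ones.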
## Architecture Insight
The model uses a hybrid **Feature Extractor + Transformer Encoder** approach:
* **Feature Extractor:** A Linear layer (225 → 512) followed by **Temporal BatchNorm** (64 frames) to normalize motion across time.
* **Transformer:** 4 Layers of Multi-Head Attention ($d_{model}=512$, $n_{head}=8$, $ff_{dim}=1024$).
* **Classifier:** A 2-layer MLP with Dropout (0.5) for robust generalization.
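
The three components above could be wired together roughly as follows. This is a sketch under stated assumptions, not the contents of `model.py`: the class name `SketchSLR`, the classifier's hidden width, the ReLU activation, and mean pooling over frames are all guesses, and no positional encoding is included because the description above does not mention one.

```python
import torch
import torch.nn as nn


class SketchSLR(nn.Module):
    """Illustrative stand-in for the described architecture (hypothetical name)."""

    def __init__(self, in_dim=225, d_model=512, n_frames=64, n_heads=8,
                 ff_dim=1024, n_layers=4, n_classes=60, dropout=0.5):
        super().__init__()
        # Feature extractor: Linear (225 -> 512).
        self.proj = nn.Linear(in_dim, d_model)
        # "Temporal" BatchNorm: a (B, 64, 512) tensor is read by BatchNorm1d
        # as 64 channels, so statistics are kept per frame index.
        self.temporal_bn = nn.BatchNorm1d(n_frames)
        # Transformer encoder: 4 layers, d_model=512, 8 heads, ff_dim=1024.
        layer = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=n_heads,
            dim_feedforward=ff_dim, batch_first=True,
        )
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)
        # 2-layer MLP with Dropout(0.5); hidden width 256 is an assumption.
        self.classifier = nn.Sequential(
            nn.Linear(d_model, d_model // 2),
            nn.ReLU(),
            nn.Dropout(dropout),
            nn.Linear(d_model // 2, n_classes),
        )

    def forward(self, x):
        # x: (Batch, 64, 225)
        x = self.temporal_bn(self.proj(x))   # (Batch, 64, 512)
        x = self.encoder(x)                  # (Batch, 64, 512)
        x = x.mean(dim=1)                    # assumed pooling over frames
        return self.classifier(x)            # (Batch, 60) logits


# Shape check on a dummy batch of two sequences.
sketch = SketchSLR().eval()
with torch.no_grad():
    logits = sketch(torch.randn(2, 64, 225))
print(logits.shape)  # torch.Size([2, 60])
```

The released weights will only load into the class defined in the repository's `model.py`; the sketch is meant for reading, not as a drop-in replacement.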
## Pre-processing Requirements
**IMPORTANT:** This model expects landmarks to be normalized. If you pass raw MediaPipe coordinates, the accuracy will drop significantly.
1. **Centering:** Translate all points relative to the **Mid-Hip** (Point 0).
2. **Scaling:** Normalize by the **Shoulder-to-Shoulder** distance to account for different body types.
3. **Shape:** Input must be a tensor of shape `(Batch, 64, 225)`.
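
The three steps above can be sketched as a single function. Several details here are assumptions rather than the project's actual pipeline: the input is taken to be a `(T, 75, 3)` tensor (75 points with 3 coordinates each, which matches the 225 feature width), the shoulder indices `11` and `12` are placeholder defaults that must be adapted to the dataset's landmark layout, scaling is done per frame, and sequences are truncated or zero-padded to 64 frames. The function name `normalize_landmarks` is hypothetical.

```python
import torch


def normalize_landmarks(seq, root_idx=0, l_shoulder_idx=11, r_shoulder_idx=12,
                        num_frames=64, eps=1e-6):
    """Center, scale, and reshape one landmark sequence.

    seq: float tensor of shape (T, 75, 3). Returns a (num_frames, 225) tensor.
    All index defaults are assumptions; check them against your landmark order.
    """
    # 1. Centering: subtract the root point (Mid-Hip, Point 0) in every frame.
    centered = seq - seq[:, root_idx:root_idx + 1, :]

    # 2. Scaling: divide by the per-frame shoulder-to-shoulder distance.
    shoulder = (seq[:, l_shoulder_idx] - seq[:, r_shoulder_idx]).norm(dim=-1)
    scaled = centered / shoulder.clamp_min(eps).view(-1, 1, 1)

    # 3. Shape: flatten to (T, 225), then truncate or zero-pad to num_frames.
    flat = scaled.reshape(scaled.shape[0], -1)
    if flat.shape[0] >= num_frames:
        return flat[:num_frames]
    pad = torch.zeros(num_frames - flat.shape[0], flat.shape[1], dtype=flat.dtype)
    return torch.cat([flat, pad], dim=0)


# Example: a made-up 40-frame clip becomes one model-ready batch.
clip = torch.randn(40, 75, 3)
batch = normalize_landmarks(clip).unsqueeze(0)  # shape (1, 64, 225)
```

If the training pipeline resampled clips to 64 frames instead of padding them, inference should do the same; the padding choice here is only a placeholder.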
## How to Load and Use
```python
import torch
from huggingface_hub import hf_hub_download
import importlib.util
# 1. Download files
repo_id = "luciayen/CASL-TransSLR"
model_bin = hf_hub_download(repo_id=repo_id, filename="pytorch_model.bin")
model_script = hf_hub_download(repo_id=repo_id, filename="model.py")
# 2. Import architecture
spec = importlib.util.spec_from_file_location("model_arch", model_script)
model_arch = importlib.util.module_from_spec(spec)
spec.loader.exec_module(model_arch)
# 3. Initialize & Load
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = model_arch.SignVLM().to(device)
model.load_state_dict(torch.load(model_bin, map_location=device))
model.eval()
```