Geo-Sign projects pose-based sign-language features into a learnable **Poincaré ball** and aligns them with text embeddings via a geometric contrastive loss.
Compared with the strong Uni-Sign pose baseline, Geo-Sign boosts BLEU-4 by **+1.81** and ROUGE-L by **+3.03** on the CSL-Daily benchmark while keeping privacy-friendly skeletal inputs only.
## Model Details

| Detail | Description |
|---|---|
| **Backbone** | Four part-specific **ST-GCNs** (body / left hand / right hand / face) feeding an mT5-Base decoder |
| **Hyperbolic branch** | Learnable curvature \(c\) (init 1.5); 256-D Poincaré embeddings; weighted Fréchet-mean pooling (global) or token-level hyperbolic attention (this checkpoint = **Token**) |
| **Training data** | Pose encoder pre-trained on **CSL-News** (1,985 h), then fine-tuned for 40 epochs on **CSL-Daily** (20k videos) |
| **Objective** | Cross-entropy translation loss + hyperbolic InfoNCE (α = 0.7), optimised with Riemannian Adam |
| **Params** | 589 M (adds < 0.25 % over Uni-Sign) |
| **Frameworks** | PyTorch 2 · Hugging Face Transformers · Geoopt |

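The geometry behind the hyperbolic branch can be sketched in plain Python. This is an illustrative reconstruction, not the released code: the function names (`exp_map_zero`, `hyperbolic_infonce`), the temperature, and the exact loss form are our own assumptions; only the curvature initialisation of 1.5 and the use of an InfoNCE objective over Poincaré embeddings come from the card.

```python
# Illustrative sketch (not the authors' implementation): project Euclidean
# features into a Poincaré ball of curvature c, then score pose/text pairs
# by negative hyperbolic distance inside an InfoNCE loss.
import math


def exp_map_zero(v, c=1.5):
    """Exponential map at the origin of a Poincare ball with curvature c."""
    sqrt_c = math.sqrt(c)
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        return list(v)
    scale = math.tanh(sqrt_c * norm) / (sqrt_c * norm)
    return [scale * x for x in v]


def mobius_add(x, y, c=1.5):
    """Mobius addition x (+)_c y, the ball's analogue of vector addition."""
    xy = sum(a * b for a, b in zip(x, y))
    x2 = sum(a * a for a in x)
    y2 = sum(b * b for b in y)
    num = [(1 + 2 * c * xy + c * y2) * a + (1 - c * x2) * b
           for a, b in zip(x, y)]
    den = 1 + 2 * c * xy + c * c * x2 * y2
    return [n / den for n in num]


def poincare_dist(x, y, c=1.5):
    """Geodesic distance between two points in the ball of curvature c."""
    diff = mobius_add([-a for a in x], y, c)
    norm = math.sqrt(sum(d * d for d in diff))
    return (2.0 / math.sqrt(c)) * math.atanh(math.sqrt(c) * norm)


def hyperbolic_infonce(pose_feats, text_feats, c=1.5, temp=0.1):
    """InfoNCE over negative hyperbolic distances; matched pairs sit on the diagonal."""
    pose = [exp_map_zero(v, c) for v in pose_feats]
    text = [exp_map_zero(v, c) for v in text_feats]
    loss = 0.0
    for i, p in enumerate(pose):
        logits = [-poincare_dist(p, t, c) / temp for t in text]
        m = max(logits)  # stabilise the log-sum-exp
        log_z = m + math.log(sum(math.exp(l - m) for l in logits))
        loss += log_z - logits[i]  # -log softmax of the positive pair
    return loss / len(pose)
```

In practice Geoopt provides these operations (and the Riemannian Adam optimiser) directly on tensors, with the curvature as a learnable parameter; the sketch above only shows the math being optimised.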
## Intended Uses & Scope
* **Primary** – Sign-language-to-text translation research, especially for resource-constrained or privacy-sensitive settings where RGB video is unavailable.
* **Out-of-scope** – Real-time production deployment without reliable pose estimation, medical or legal interpretation, or sign languages beyond the datasets the model was trained on.