+---
+license: cc-by-nc-4.0
+library_name: transformers
+license: mit
+model_name: Geo-Sign (Hyperbolic-Token)
+paperswithcode_id: geo-sign-hyperbolic-contrastive-regularisation
+tags:
+  - sign-language-translation
+  - skeleton-based
+  - hyperbolic-geometry
+  - mT5
+datasets:
+  - CSL-Daily
+  - CSL-News
+language:
+  - zh
+task:
+  - sign-language-translation
+---
+# Geo-Sign 🌐✋ → 📝
+**Hyperbolic Contrastive Regularisation for Geometrically-Aware Sign-Language Translation**
+**Paper**: *Geo-Sign: Hyperbolic Contrastive Regularisation for Geometrically Aware Sign-Language Translation*
+Edward Fish, Richard Bowden, CVSSP – University of Surrey (arXiv:2506.00129, May 2025)
+**Code**: <https://github.com/ed-fish/geo-sign>
+## TL;DR
+Geo-Sign projects pose-based sign-language features into a learnable **Poincaré ball** and aligns them with text embeddings via a geometric contrastive loss.
+Compared with the strong Uni-Sign pose baseline, Geo-Sign improves BLEU-4 by **+1.81** and ROUGE-L by **+3.03** on the CSL-Daily benchmark while using only privacy-friendly skeletal inputs.
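
To make the geometry concrete, here is a minimal, dependency-free sketch of the two operations the summary relies on: mapping a Euclidean feature onto a Poincaré ball of curvature \(-c\), and measuring geodesic distance there. The function names (`expmap0`, `poincare_dist`) and the toy 2-D vectors are illustrative assumptions and are not taken from the Geo-Sign code base, which presumably works on batched PyTorch/Geoopt tensors.

```python
import math
from typing import List, Sequence


def norm(v: Sequence[float]) -> float:
    """Euclidean norm of a vector."""
    return math.sqrt(sum(x * x for x in v))


def expmap0(v: Sequence[float], c: float) -> List[float]:
    """Exponential map at the origin: sends a Euclidean (tangent) vector
    onto the Poincare ball of curvature -c (radius 1 / sqrt(c))."""
    n = norm(v)
    if n == 0.0:
        return [0.0] * len(v)
    sqrt_c = math.sqrt(c)
    scale = math.tanh(sqrt_c * n) / (sqrt_c * n)
    return [scale * x for x in v]


def poincare_dist(x: Sequence[float], y: Sequence[float], c: float) -> float:
    """Geodesic distance between two points that already lie inside the ball."""
    sq_diff = sum((a - b) ** 2 for a, b in zip(x, y))
    denom = (1.0 - c * sum(a * a for a in x)) * (1.0 - c * sum(b * b for b in y))
    return math.acosh(1.0 + 2.0 * c * sq_diff / denom) / math.sqrt(c)


# Toy usage: c = 1.5 matches the initial curvature listed in this card;
# the 2-D feature vector is made up for illustration.
c = 1.5
feature = [0.3, -0.4]
point = expmap0(feature, c)
print(norm(point) < 1.0 / math.sqrt(c))       # the point stays inside the ball
print(poincare_dist([0.0, 0.0], point, c))    # distance from the origin
```

Distances grow rapidly towards the boundary of the ball, which is the property that makes hyperbolic space attractive for hierarchical structure.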
+## Model Details
+| | |
+|---|---|
+| **Backbone** | Four part-specific **ST-GCNs** (body / L-hand / R-hand / face) feeding an mT5-Base decoder |
+| **Hyperbolic branch** | • Learnable curvature \(c\) (init 1.5) • 256-D Poincaré embeddings • Weighted Fréchet-mean pooling (global) or token-level hyperbolic attention (this checkpoint uses the **Token** variant) |
+| **Train data** | Pose encoder pre-trained on **CSL-News** (1,985 h), then fine-tuned for 40 epochs on **CSL-Daily** (20 k videos) |
+| **Objective** | Cross-entropy translation loss + hyperbolic InfoNCE loss (α = 0.7), optimised with Riemannian Adam |
+| **Params** | 589 M (adds < 0.25 % over Uni-Sign) |
+| **Frameworks** | PyTorch 2 · Hugging Face Transformers · Geoopt |
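
The objective row in the table above can be illustrated with a small sketch of an InfoNCE loss whose similarity score is the negative Poincaré distance. This is an assumption-laden illustration and not the released training code: the `temperature` value, the function names, and the way α enters the total loss (here as a plain weight on the contrastive term) are hypothetical, since this card does not specify them. It is written in pure Python for portability; a real implementation would operate on batched tensors.

```python
import math
from typing import Sequence

Point = Sequence[float]


def poincare_dist(x: Point, y: Point, c: float) -> float:
    """Geodesic distance on the Poincare ball of curvature -c."""
    sq_diff = sum((a - b) ** 2 for a, b in zip(x, y))
    denom = (1.0 - c * sum(a * a for a in x)) * (1.0 - c * sum(b * b for b in y))
    return math.acosh(1.0 + 2.0 * c * sq_diff / denom) / math.sqrt(c)


def hyperbolic_info_nce(pose: Sequence[Point], text: Sequence[Point],
                        c: float, temperature: float = 0.1) -> float:
    """InfoNCE over a batch where pose[i] and text[i] form the positive pair.
    Logits are negative hyperbolic distances scaled by a (hypothetical) temperature."""
    losses = []
    for i, p in enumerate(pose):
        logits = [-poincare_dist(p, t, c) / temperature for t in text]
        m = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_z = m + math.log(sum(math.exp(l - m) for l in logits))
        losses.append(log_z - logits[i])  # equals -log softmax(logits)[i]
    return sum(losses) / len(losses)


def total_loss(ce: float, nce: float, alpha: float = 0.7) -> float:
    """ASSUMPTION: alpha weights the contrastive term; the card only states alpha = 0.7."""
    return ce + alpha * nce


# Toy batch of three pose/text embeddings already inside the ball (c = 1.5).
pose = [[0.10, 0.00], [0.00, 0.50], [-0.40, -0.20]]
text = [[0.12, 0.01], [0.02, 0.48], [-0.38, -0.22]]
aligned = hyperbolic_info_nce(pose, text, c=1.5)
shuffled = hyperbolic_info_nce(pose, [text[1], text[2], text[0]], c=1.5)
print(aligned < shuffled)  # matched pairs yield the lower loss
```

Because the contrastive term only needs distances between already-embedded points, it adds little on top of the translation model, in line with the small parameter overhead reported in the table.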
+## Intended Uses & Scope
+* **Primary** – Sign-language-to-text translation research, especially for resource-constrained or privacy-sensitive settings where RGB video is unavailable.
+* **Out-of-scope** – Real-time production deployments without reliable pose estimation, medical or legal interpretation, and sign languages not covered by the datasets the model was trained on.