Update README.md
README.md
CHANGED
@@ -1,3 +1,46 @@
--
-
----
+Geo-Sign
+
+---
+license: cc-by-nc-4.0
+library_name: transformers
+license: mit
+model_name: Geo-Sign (Hyperbolic-Token)
+paperswithcode_id: geo-sign-hyperbolic-contrastive-regularisation
+tags:
+- sign-language-translation
+- skeleton-based
+- hyperbolic-geometry
+- mT5
+datasets:
+- CSL-Daily
+- CSL-News
+language:
+- zh
+task:
+- sign-language-translation
+---
+
+# Geo-Sign 🌐✋ → 📝
+**Hyperbolic Contrastive Regularisation for Geometrically-Aware Sign-Language Translation**
+
+**Paper**: *Geo-Sign: Hyperbolic Contrastive Regularisation for Geometrically Aware Sign-Language Translation*
+Edward Fish, Richard Bowden, CVSSP – University of Surrey (arXiv:2506.00129, May 2025)
+**Code**: <https://github.com/ed-fish/geo-sign>
+
+## TL;DR
+Geo-Sign projects pose-based sign-language features into a **Poincaré ball** with learnable curvature and aligns them with text embeddings via a geometric contrastive loss.
+Compared with the strong Uni-Sign pose baseline, Geo-Sign improves BLEU-4 by **+1.81** and ROUGE-L by **+3.03** on the CSL-Daily benchmark while using only privacy-friendly skeletal inputs.
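To make the projection step concrete, here is a minimal sketch of mapping a Euclidean feature vector into a Poincaré ball and measuring geodesic distance there. It uses the standard textbook formulas for a ball of curvature \(-c\) and plain Python lists; the function names `expmap0` and `poincare_dist` are illustrative assumptions and are not taken from the Geo-Sign code base, which relies on Geoopt.

```python
import math

def expmap0(v, c):
    """Exponential map at the origin: tangent vector -> point in the ball of curvature -c."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        return [0.0] * len(v)
    sqrt_c = math.sqrt(c)
    # tanh squashes the norm so the result stays inside radius 1/sqrt(c).
    scale = math.tanh(sqrt_c * norm) / (sqrt_c * norm)
    return [scale * x for x in v]

def poincare_dist(x, y, c):
    """Geodesic distance between two points strictly inside the ball."""
    sq = lambda u: sum(a * a for a in u)
    diff = sq([a - b for a, b in zip(x, y)])
    denom = (1.0 - c * sq(x)) * (1.0 - c * sq(y))
    return math.acosh(1.0 + 2.0 * c * diff / denom) / math.sqrt(c)

c = 1.5                       # initial curvature value quoted in this card
point = expmap0([0.3, 0.4], c)
origin = [0.0, 0.0]
dist_from_origin = poincare_dist(origin, point, c)
```

Under this convention the distance from the origin to `expmap0(v, c)` is twice the Euclidean norm of `v`, which is a handy sanity check. Points very close to the boundary make the denominator vanish, so a real implementation would clip norms for numerical stability.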
+
+## Model Details
+| Component | Details |
+|---|---|
+| **Backbone** | Four part-specific **ST-GCNs** (body / L-hand / R-hand / face) feeding an mT5-Base decoder |
+| **Hyperbolic branch** | • Learnable curvature \(c\) (init 1.5) • 256-D Poincaré embeddings • Weighted Fréchet-mean pooling (global) or token-level hyperbolic attention (this checkpoint = **Token**) |
+| **Train data** | Pose encoder pre-trained on **CSL-News** (1,985 h), then fine-tuned for 40 epochs on **CSL-Daily** (20 k videos) |
+| **Objective** | Cross-entropy translation + hyperbolic InfoNCE (α = 0.7) with Riemannian Adam optimisation |
+| **Params** | 589 M (adds < 0.25 % over Uni-Sign) |
+| **Frameworks** | PyTorch 2 · Hugging Face Transformers · Geoopt |
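The hyperbolic InfoNCE term in the objective can be illustrated with a small sketch. It assumes that similarity is the negative Poincaré distance divided by a temperature, and that the pose and text embeddings at the same batch index form the positive pair; the `temperature` parameter, the function names, and the batch layout are assumptions for illustration rather than details confirmed by this card. How α weights this term against the cross-entropy loss is not specified here, so the sketch covers only the contrastive part.

```python
import math

def poincare_dist(x, y, c):
    """Geodesic distance in the Poincaré ball of curvature -c."""
    sq = lambda u: sum(a * a for a in u)
    diff = sq([a - b for a, b in zip(x, y)])
    denom = (1.0 - c * sq(x)) * (1.0 - c * sq(y))
    return math.acosh(1.0 + 2.0 * c * diff / denom) / math.sqrt(c)

def hyperbolic_info_nce(pose, text, c=1.5, temperature=0.1):
    """Mean InfoNCE loss; pose[i] and text[i] are treated as the positive pair (assumption)."""
    n = len(pose)
    total = 0.0
    for i in range(n):
        # Closer in hyperbolic space means a larger logit.
        logits = [-poincare_dist(pose[i], text[j], c) / temperature for j in range(n)]
        m = max(logits)  # subtract the max for a stable log-sum-exp
        log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_denom - logits[i]
    return total / n

pose = [[0.10, 0.00], [0.00, 0.10]]
aligned_loss = hyperbolic_info_nce(pose, [[0.10, 0.00], [0.00, 0.10]])
swapped_loss = hyperbolic_info_nce(pose, [[0.00, 0.10], [0.10, 0.00]])
```

In this toy batch the aligned pairing yields a lower loss than the swapped one, which is the behaviour the contrastive term is meant to encourage.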
+
+## Intended Uses & Scope
+* **Primary** – Sign-language-to-text translation research, especially for resource-constrained or privacy-sensitive settings where RGB video is unavailable.
+* **Out-of-scope** – Real-time production deployments without reliable pose estimation, medical or legal interpretation, and sign languages not covered by the training datasets.