---
license: cc-by-nc-4.0
library_name: transformers
model_name: Geo-Sign (Hyperbolic-Token)
paperswithcode_id: geo-sign-hyperbolic-contrastive-regularisation
tags:
- sign-language-translation
- skeleton-based
- hyperbolic-geometry
- mT5
datasets:
- CSL-Daily
- CSL-News
language:
- zh
task:
- sign-language-translation
---

# Geo-Sign 🌐✋ → 📝
**Hyperbolic Contrastive Regularisation for Geometrically-Aware Sign-Language Translation**

**Paper**: *Geo-Sign: Hyperbolic Contrastive Regularisation for Geometrically Aware Sign-Language Translation*
Edward Fish, Richard Bowden, CVSSP – University of Surrey (arXiv:2506.00129, May 2025)
**Code**: <https://github.com/ed-fish/geo-sign>

## TL;DR
Geo-Sign projects pose-based sign-language features into a learnable **Poincaré ball** and aligns them with text embeddings via a geometric contrastive loss. Compared with the strong Uni-Sign pose baseline, it improves BLEU-4 by **+1.81** and ROUGE-L by **+3.03** on the CSL-Daily benchmark while relying only on privacy-preserving skeletal inputs.

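The Poincaré-ball projection above can be sketched in plain NumPy: Euclidean pose and text features are mapped into the ball with the exponential map at the origin and compared by geodesic distance. This is an illustrative sketch, not the released implementation (which uses Geoopt with a learnable curvature); here the curvature is frozen at its initial value of 1.5, and the function names are ours.

```python
import numpy as np

def expmap0(v, c=1.5):
    """Exponential map at the origin of a Poincare ball with curvature c.

    Maps a Euclidean (tangent-space) feature vector into the ball, so its
    norm stays strictly below the ball radius 1/sqrt(c).
    """
    sqrt_c = np.sqrt(c)
    norm = np.linalg.norm(v)
    if norm < 1e-9:
        return np.zeros_like(v)
    return np.tanh(sqrt_c * norm) * v / (sqrt_c * norm)

def mobius_add(x, y, c=1.5):
    """Mobius addition, the ball's analogue of vector addition."""
    xy, x2, y2 = x @ y, x @ x, y @ y
    num = (1 + 2 * c * xy + c * y2) * x + (1 - c * x2) * y
    den = 1 + 2 * c * xy + c ** 2 * x2 * y2
    return num / den

def poincare_dist(x, y, c=1.5):
    """Geodesic distance between two points inside the ball."""
    sqrt_c = np.sqrt(c)
    diff = mobius_add(-x, y, c)
    arg = np.clip(sqrt_c * np.linalg.norm(diff), 0.0, 1.0 - 1e-9)
    return (2.0 / sqrt_c) * np.arctanh(arg)

# Project two toy pose/text features into the ball and measure their distance.
pose_feat = expmap0(np.array([0.3, -0.8, 0.5]))
text_feat = expmap0(np.array([0.2, -0.7, 0.6]))
print(poincare_dist(pose_feat, text_feat))
```

In the model itself these operations run on the 256-D part-specific features, with the curvature updated by the optimiser rather than fixed.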
## Model Details

| | |
|---|---|
| **Backbone** | Four part-specific **ST-GCNs** (body / L-hand / R-hand / face) feeding an mT5-Base decoder |
| **Hyperbolic branch** | Learnable curvature $c$ (init 1.5); 256-D Poincaré embeddings; weighted Fréchet-mean pooling (global) or token-level hyperbolic attention (this checkpoint = **Token**) |
| **Training data** | Pose encoder pre-trained on **CSL-News** (1,985 h), then fine-tuned for 40 epochs on **CSL-Daily** (20 k videos) |
| **Objective** | Cross-entropy translation loss + hyperbolic InfoNCE (α = 0.7), optimised with Riemannian Adam |
| **Params** | 589 M (< 0.25 % more than Uni-Sign) |
| **Frameworks** | PyTorch 2 · Hugging Face Transformers · Geoopt |

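The objective row combines translation cross-entropy with a hyperbolic InfoNCE term weighted by α = 0.7: matched pose/text pairs should sit closer in the ball than mismatched ones, so negative geodesic distances serve as contrastive logits. A minimal NumPy sketch under stated assumptions (the temperature value and all function names are ours, not from the paper; the real model trains this with Riemannian Adam via Geoopt):

```python
import numpy as np

def poincare_dist(x, y, c=1.5):
    """Geodesic distance in a Poincare ball of curvature c (single vectors)."""
    sqrt_c = np.sqrt(c)
    xy, x2, y2 = x @ y, x @ x, y @ y
    # Mobius addition (-x) (+) y, written out with <-x, y> = -xy.
    num = (1 - 2 * c * xy + c * y2) * (-x) + (1 - c * x2) * y
    den = 1 - 2 * c * xy + c ** 2 * x2 * y2
    arg = np.clip(sqrt_c * np.linalg.norm(num / den), 0.0, 1.0 - 1e-9)
    return (2.0 / sqrt_c) * np.arctanh(arg)

def hyperbolic_info_nce(pose_emb, text_emb, temperature=0.1, c=1.5):
    """InfoNCE over a batch of ball embeddings; positives on the diagonal.

    Logits are negative geodesic distances, so closer pairs score higher.
    """
    n = len(pose_emb)
    logits = np.array([[-poincare_dist(p, t, c) / temperature
                        for t in text_emb] for p in pose_emb])
    # Row-wise softmax cross-entropy with the diagonal as the positive class.
    logits -= logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(n), np.arange(n)].mean()

# Toy batch of 4 pose/text embeddings, well inside the ball radius 1/sqrt(1.5).
alpha = 0.7
rng = np.random.default_rng(0)
pose = rng.normal(scale=0.1, size=(4, 8))
text = pose + rng.normal(scale=0.01, size=(4, 8))
contrastive = hyperbolic_info_nce(pose, text)
# total_loss = translation_ce + alpha * contrastive   (translation_ce from mT5)
print(contrastive)
```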
## Intended Uses & Scope
* **Primary** – Sign-language-to-text translation research, especially in resource-constrained or privacy-sensitive settings where RGB video is unavailable.
* **Out-of-scope** – Real-time production deployment without reliable pose estimation, medical or legal interpreting, and languages other than those covered by the training datasets.