---
datasets:
- CSL-Daily
- CSL-News
language:
- zh
library_name: transformers
license: mit
model_name: Geo-Sign (Hyperbolic-Token)
tags:
- sign-language-translation
- skeleton-based
- hyperbolic-geometry
- mT5
paperswithcode_id: geo-sign-hyperbolic-contrastive-regularisation
task:
- sign-language-translation
pipeline_tag: video-text-to-text
---

# Geo-Sign 🌐✋ → 📝  
**Hyperbolic Contrastive Regularisation for Geometrically-Aware Sign-Language Translation**

**Paper**: *Geo-Sign: Hyperbolic Contrastive Regularisation for Geometrically Aware Sign-Language Translation*  
Edward Fish, Richard Bowden, CVSSP – University of Surrey (arXiv:2506.00129, May 2025)  
**Code**: <https://github.com/ed-fish/geo-sign>  
**PDF**: <https://arxiv.org/pdf/2506.00129v1>  
**Venue**: NeurIPS 2025

## Code Use
Download the weights and data labels from the Files section of this repo and add them to a local clone of the GitHub repository <https://github.com/ed-fish/geo-sign>.

You will also need the mT5-base model from <https://huggingface.co/google/mt5-base>; place it in the `pretrained_weight` folder.

Each item goes in the following location, relative to the root of the cloned repository:

*   `Data` -> `./Data`
*   `best.pth` -> `./checkpoints/best.pth`
*   `pretraining.pth` -> `./checkpoints/pretraining.pth`
*   `google/mt5-base` (<https://huggingface.co/google/mt5-base>) -> `./pretrained_weight`
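
Before running any training or evaluation script, it can help to confirm that everything landed where the code expects it. The following is a minimal sketch that only checks the four paths listed above; the helper name `missing_artifacts` is hypothetical and not part of the Geo-Sign repository.

```python
from pathlib import Path

# Paths taken from the file mapping above, relative to the cloned repository root.
EXPECTED_PATHS = (
    "Data",
    "checkpoints/best.pth",
    "checkpoints/pretraining.pth",
    "pretrained_weight",
)


def missing_artifacts(repo_root):
    """Return the expected paths that do not exist under repo_root."""
    root = Path(repo_root)
    return [rel for rel in EXPECTED_PATHS if not (root / rel).exists()]


if __name__ == "__main__":
    # Run from the root of the cloned repository.
    missing = missing_artifacts(".")
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("All expected files and folders are in place.")
```

An empty list means the layout matches the mapping. The check does not verify file contents or checksums.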

## TL;DR  
Geo-Sign projects pose-based sign-language features into a learnable **Poincaré ball** and aligns them with text embeddings via a geometric contrastive loss.  
Compared with the strong Uni-Sign pose baseline, Geo-Sign improves BLEU-4 by **+1.81** and ROUGE-L by **+3.03** on the CSL-Daily benchmark while using only privacy-friendly skeletal inputs.
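
To make the idea concrete, here is a small pure-Python sketch of the standard Poincaré-ball building blocks: the exponential map at the origin, the geodesic distance, and an InfoNCE-style loss that uses negative hyperbolic distance as the similarity. It is an illustration under assumptions, not the authors' implementation: the function names, the fixed curvature `c`, and the `temperature` value are all hypothetical, and the paper's actual loss and learnable curvature may differ in detail.

```python
import math


def expmap0(v, c=1.0):
    """Map a Euclidean vector (tangent at the origin) onto the Poincaré ball of curvature -c."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        return list(v)
    scale = math.tanh(math.sqrt(c) * norm) / (math.sqrt(c) * norm)
    return [scale * x for x in v]


def poincare_distance(x, y, c=1.0):
    """Geodesic distance between two points strictly inside the ball."""
    def sq(u):
        return sum(a * a for a in u)

    diff = sq([a - b for a, b in zip(x, y)])
    denom = (1.0 - c * sq(x)) * (1.0 - c * sq(y))
    return math.acosh(1.0 + 2.0 * c * diff / denom) / math.sqrt(c)


def contrastive_loss(pose_points, text_points, temperature=0.1, c=1.0):
    """InfoNCE with logits = -distance / temperature; pair i matches pair i (assumed setup)."""
    total = 0.0
    for i, p in enumerate(pose_points):
        logits = [-poincare_distance(p, t, c) / temperature for t in text_points]
        m = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_z = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_z - logits[i]
    return total / len(pose_points)
```

With this formulation, pose and text embeddings of the same sentence are pulled together along geodesics of the ball while mismatched pairs are pushed apart. Inputs with very large norms saturate `tanh` and land numerically on the boundary, so a practical implementation would clip points to stay inside the ball.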

## Intended Uses & Scope
*   **Primary** – Sign-language-to-text translation research, especially for resource-constrained or privacy-sensitive settings where RGB video is unavailable.  
*   **Out-of-scope** – Real-time production deployments without reliable pose estimation, medical or legal interpretation, and sign languages not covered by the training datasets.

## Evaluation

| Dataset          | Modality | BLEU-4 ↑ | ROUGE-L ↑ |
|------------------|----------|----------|-----------|
| CSL-Daily (test) | Pose-only | **27.42** | **57.95** |

Geo-Sign outperforms all previous gloss-free pose-only methods and rivals many RGB- or gloss-based systems.
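
For readers unfamiliar with the metric, ROUGE-L scores a hypothesis by the longest common subsequence (LCS) it shares with the reference. The sketch below computes it over characters, which is a common choice for Chinese text but is an assumption here, as is the default `beta=1.0`; the paper's evaluation script may tokenise differently or weight recall more heavily, so this will not necessarily reproduce the reported numbers.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two sequences."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, 1):
            cur.append(prev[j - 1] + 1 if x == y else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def rouge_l(candidate, reference, beta=1.0):
    """ROUGE-L F-measure between two token sequences (strings are treated as character sequences)."""
    if not candidate or not reference:
        return 0.0
    lcs = lcs_length(candidate, reference)
    if lcs == 0:
        return 0.0
    precision = lcs / len(candidate)
    recall = lcs / len(reference)
    return (1 + beta**2) * precision * recall / (recall + beta**2 * precision)
```

For example, `rouge_l("我爱手语", "我喜欢手语")` shares the subsequence 我手语 (length 3), giving precision 3/4 and recall 3/5.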

---

## Limitations & Ethical Considerations

*   **Pose-estimation dependency** – Errors in upstream key-points propagate to the translation.  
*   **Training latency** – Hyperbolic operations slow training by roughly 4–6× but add **no** cost at inference.  
*   **Generalisation** – Evaluated only on Chinese Sign Language; performance on other sign languages is not guaranteed.  
*   **Mis-translation risk** – Automatic SLT can mis-communicate; keep a human in the loop for critical use cases.  
*   **Biases** – CSL-Daily is domain-specific (news/TV); outputs may reflect that linguistic style.

---

## Citation

```bibtex
@article{fish2025geo,
  title={Geo-Sign: Hyperbolic Contrastive Regularisation for Geometrically Aware Sign Language Translation},
  author={Fish, Edward and Bowden, Richard},
  journal={arXiv preprint arXiv:2506.00129},
  year={2025}
}
```