---
datasets:
- CSL-Daily
- CSL-News
language:
- zh
library_name: transformers
license: mit
model_name: Geo-Sign (Hyperbolic-Token)
tags:
- sign-language-translation
- skeleton-based
- hyperbolic-geometry
- mT5
paperswithcode_id: geo-sign-hyperbolic-contrastive-regularisation
task:
- sign-language-translation
pipeline_tag: video-text-to-text
---
# Geo-Sign
**Hyperbolic Contrastive Regularisation for Geometrically-Aware Sign-Language Translation**
**Paper**: *Geo-Sign: Hyperbolic Contrastive Regularisation for Geometrically Aware Sign-Language Translation*
Edward Fish, Richard Bowden, CVSSP, University of Surrey (arXiv:2506.00129, May 2025)
**Code**: <https://github.com/ed-fish/geo-sign>
**PDF**: <https://arxiv.org/pdf/2506.00129v1>
**NeurIPS 2025**
## Code Use
Download the weights and data labels from the Files section of this repo and add them to a clone of the GitHub repository <https://github.com/ed-fish/geo-sign>.
You will also need the mT5-base model from <https://huggingface.co/google/mt5-base>; place it in the `pretrained_weight` folder.

Place the files as follows:

* `Data` -> `./Data`
* `best.pth` -> `./checkpoints/best.pth`
* `pretraining.pth` -> `./checkpoints/pretraining.pth`
* <https://huggingface.co/google/mt5-base> -> `./pretrained_weight`
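
Before running the repository's scripts, it can help to confirm that everything landed where the mapping above says it should. The following is a minimal sketch, not part of the Geo-Sign codebase: the helper name `missing_paths` is hypothetical, and it only checks that the four paths exist, not that their contents are valid.

```python
from pathlib import Path

# Paths taken from the file mapping above, relative to the cloned geo-sign repo.
EXPECTED_PATHS = [
    "Data",
    "checkpoints/best.pth",
    "checkpoints/pretraining.pth",
    "pretrained_weight",
]


def missing_paths(repo_root="."):
    """Return the expected paths that do not exist under repo_root.

    Hypothetical helper for illustration; it checks existence only.
    """
    root = Path(repo_root)
    return [p for p in EXPECTED_PATHS if not (root / p).exists()]


if __name__ == "__main__":
    missing = missing_paths(".")
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("All expected files and folders are in place.")
```

Run it from the root of the cloned repository; any path it lists still needs to be downloaded or moved.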
## TL;DR
Geo-Sign projects pose-based sign-language features into a learnable **Poincaré ball** and aligns them with text embeddings via a geometric contrastive loss.
Compared with the strong Uni-Sign pose baseline, Geo-Sign improves BLEU-4 by **+1.81** and ROUGE-L by **+3.03** on the CSL-Daily benchmark while using only privacy-friendly skeletal inputs.
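
To make the idea of aligning features in a Poincaré ball concrete, here is a minimal pure-Python sketch. It is an illustration under stated assumptions, not the loss used in the paper or the repository: the curvature `c` is fixed rather than learned, features are plain lists of floats, and the loss is a generic InfoNCE-style objective that uses negative hyperbolic distance as the similarity. The names `expmap0`, `poincare_distance`, and `contrastive_loss`, and the `temperature` value, are all hypothetical.

```python
import math


def expmap0(v, c=1.0):
    """Map a Euclidean vector into the Poincare ball of curvature -c
    using the exponential map at the origin."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        return list(v)
    sqrt_c = math.sqrt(c)
    # Clip just inside the boundary so later distances stay finite.
    max_norm = (1.0 - 1e-5) / sqrt_c
    new_norm = min(math.tanh(sqrt_c * norm) / sqrt_c, max_norm)
    return [new_norm * x / norm for x in v]


def poincare_distance(x, y, c=1.0):
    """Geodesic distance between two points inside the ball of curvature -c."""
    sq_diff = sum((a - b) ** 2 for a, b in zip(x, y))
    sq_x = sum(a * a for a in x)
    sq_y = sum(b * b for b in y)
    arg = 1.0 + 2.0 * c * sq_diff / ((1.0 - c * sq_x) * (1.0 - c * sq_y))
    # Guard against rounding that pushes arg slightly below 1.
    return math.acosh(max(arg, 1.0)) / math.sqrt(c)


def contrastive_loss(pose_feats, text_feats, c=1.0, temperature=0.1):
    """InfoNCE-style loss: pose_feats[i] should be closest to text_feats[i].

    Assumption for illustration only; similarity is negative hyperbolic distance.
    """
    pose_h = [expmap0(p, c) for p in pose_feats]
    text_h = [expmap0(t, c) for t in text_feats]
    total = 0.0
    for i, p in enumerate(pose_h):
        logits = [-poincare_distance(p, t, c) / temperature for t in text_h]
        m = max(logits)  # subtract the max for a stable log-sum-exp
        log_norm = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_norm - logits[i]
    return total / len(pose_h)


if __name__ == "__main__":
    pose = [[1.0, 0.0], [0.0, 1.0]]
    aligned_text = [[1.0, 0.0], [0.0, 1.0]]
    swapped_text = [[0.0, 1.0], [1.0, 0.0]]
    print("aligned:", contrastive_loss(pose, aligned_text))
    print("swapped:", contrastive_loss(pose, swapped_text))
```

In this sketch the loss is lower when each pose feature is paired with its matching text embedding than when the pairs are swapped, which is the behaviour a contrastive regulariser is meant to encourage.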
## Intended Uses & Scope
* **Primary**: Sign-language-to-text translation research, especially for resource-constrained or privacy-sensitive settings where RGB video is unavailable.
* **Out-of-scope**: Real-time production deployments without reliable pose estimation, medical or legal interpretation, and sign languages other than those in the training datasets.
## Evaluation
| Dataset          | Modality  | BLEU-4 ↑  | ROUGE-L ↑ |
|------------------|-----------|-----------|-----------|
| CSL-Daily (test) | Pose-only | **27.42** | **57.95** |
Geo-Sign outperforms all previous gloss-free pose-only methods and rivals many RGB- or gloss-based systems.
---
## Limitations & Ethical Considerations
* **Pose-estimation dependency**: Errors in upstream keypoints propagate to the translation.
* **Training latency**: Hyperbolic operations slow training by roughly 4-6× but add **no** cost at inference.
* **Generalisation**: Evaluated only on Chinese Sign Language; performance on other sign languages is not guaranteed.
* **Mis-translation risk**: Automatic SLT can miscommunicate; keep a human in the loop for critical use cases.
* **Biases**: CSL-Daily is domain-specific (news/TV); outputs may reflect that linguistic style.
---
## Citation
```bibtex
@article{fish2025geo,
title={Geo-Sign: Hyperbolic Contrastive Regularisation for Geometrically Aware Sign Language Translation},
author={Fish, Edward and Bowden, Richard},
journal={arXiv preprint arXiv:2506.00129},
year={2025}
}
```