---
library_name: transformers
license: mit
model_name: Geo-Sign (Hyperbolic-Token)
…

**Paper**: *Geo-Sign: Hyperbolic Contrastive Regularisation for Geometrically Aware Sign-Language Translation*
Edward Fish, Richard Bowden, CVSSP – University of Surrey (arXiv:2506.00129, May 2025)

**Code**: <https://github.com/ed-fish/geo-sign>

**PDF**: <https://arxiv.org/pdf/2506.00129v1>

## Code Use

Download the weights and data labels from the Files section of this repository and copy them into a clone of the GitHub repository <https://github.com/ed-fish/geo-sign>: the data labels go in the `data` folder and the weights in the `checkpoints` folder.

## TL;DR

Geo-Sign projects pose-based sign-language features into a learnable **Poincaré ball** and aligns them with text embeddings via a geometric contrastive loss.
Compared with the strong Uni-Sign pose baseline, Geo-Sign boosts BLEU-4 by **+1.…**

## Model Details

| | |
|---|---|
| **Backbone** | Four part-specific **ST-GCNs** (body / left hand / right hand / face) feeding an mT5-Base decoder |
| **Hyperbolic branch** | • Learnable curvature *c* (initialised to 1.5) • 256-D Poincaré embeddings • Weighted Fréchet-mean pooling (global variant) or token-level hyperbolic attention (token variant); this checkpoint uses **Token** |
| **Train data** | Pose encoder pre-trained on **CSL-News** (1,985 h), then fine-tuned for 40 epochs on **CSL-Daily** (20k videos) |

## Intended Uses & Scope

* **Primary** – Sign-language-to-text translation research, especially in resource-constrained or privacy-sensitive settings where RGB video is unavailable.
* **Out-of-scope** – Real-time production deployment without reliable pose estimation; medical or legal interpretation; sign languages not covered by the training data.

## Evaluation

| Dataset | Modality | BLEU-4 ↑ | ROUGE-L ↑ |
|---|---|---|---|
| CSL-Daily (test) | Pose-only | **27.42** | **57.95** |

On CSL-Daily, Geo-Sign outperforms all previous gloss-free pose-only methods and rivals many RGB- or gloss-based systems.

---

## Limitations & Ethical Considerations

* **Pose-estimation dependency** – Errors in upstream keypoint estimation propagate into the translation.
* **Training latency** – Hyperbolic operations slow training (roughly 4–6×) but add **no** cost at inference.
* **Generalisation** – Evaluated only on Chinese Sign Language; performance on other sign languages is not guaranteed.
* **Mistranslation risk** – Automatic SLT can miscommunicate; keep a human in the loop for critical use cases.
* **Biases** – The training data are domain-specific (CSL-News: TV news; CSL-Daily: everyday-life topics); outputs may reflect their linguistic style.

---

## Citation

```bibtex
@article{fish2025geo,
  title={Geo-Sign: Hyperbolic Contrastive Regularisation for Geometrically Aware Sign Language Translation},
  author={Fish, Edward and Bowden, Richard},
  journal={arXiv preprint arXiv:2506.00129},
  year={2025}
}
```