Add pipeline tag: video-text-to-text

This PR adds `pipeline_tag: video-text-to-text` to the model card metadata. The tag categorizes the model's function, translating sign language from visual input (specifically pose data) into text, and improves its discoverability on the Hugging Face Hub.

The existing `library_name: transformers` and `license: mit` fields are retained because the content of the GitHub repository confirms them. No sample usage section is added, in keeping with the guidelines, which require inference examples to come from explicit code snippets in the original repository's README.
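
For readers who want to see the metadata change in isolation, here is a minimal sketch of setting a scalar field in a README's YAML front matter using only the Python standard library. The helper name `add_front_matter_field` is hypothetical and is not part of this PR or of any Hugging Face tooling; it assumes the README starts with a `---` delimited block and handles top-level `key: value` pairs only, not lists.

```python
def add_front_matter_field(readme_text: str, key: str, value: str) -> str:
    """Return readme_text with `key: value` set in its leading YAML front matter.

    Hypothetical helper for illustration; handles scalar top-level keys only.
    """
    lines = readme_text.splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("README has no YAML front matter")

    # Locate the closing '---' that ends the front matter block.
    end = next((i for i in range(1, len(lines)) if lines[i].strip() == "---"), None)
    if end is None:
        raise ValueError("front matter is not terminated by '---'")

    new_line = f"{key}: {value}"
    for i in range(1, end):
        if lines[i].startswith(f"{key}:"):
            lines[i] = new_line  # key already present: overwrite its value
            break
    else:
        lines.insert(end, new_line)  # key absent: add it just before the closing '---'

    return "\n".join(lines) + "\n"


readme = "---\nlibrary_name: transformers\nlicense: mit\n---\n# Geo-Sign\n"
updated = add_front_matter_field(readme, "pipeline_tag", "video-text-to-text")
print(updated)
```

Calling the helper a second time with the same arguments leaves the text unchanged, since the existing `pipeline_tag` line is overwritten rather than duplicated.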

Files changed (1)

README.md +20 -20


@@ -1,20 +1,21 @@
 ---
+datasets:
+- CSL-Daily
+- CSL-News
+language:
+- zh
 library_name: transformers
 license: mit
 model_name: Geo-Sign (Hyperbolic-Token)
-paperswithcode_id: geo-sign-hyperbolic-contrastive-regularisation
 tags:
-  - sign-language-translation
-  - skeleton-based
-  - hyperbolic-geometry
-  - mT5
-datasets:
-  - CSL-Daily
-  - CSL-News
-language:
-  - zh
+- sign-language-translation
+- skeleton-based
+- hyperbolic-geometry
+- mT5
+paperswithcode_id: geo-sign-hyperbolic-contrastive-regularisation
 task:
-  - sign-language-translation
+- sign-language-translation
+pipeline_tag: video-text-to-text
 ---
 # Geo-Sign 🌐✋ → 📝
@@ -43,8 +44,8 @@ Geo-Sign projects pose-based sign-language features into a learnable **Poincaré
 Compared with the strong Uni-Sign pose baseline, Geo-Sign boosts BLEU-4 by **+1.81** and ROUGE-L by **+3.03** on the CSL-Daily benchmark while keeping privacy-friendly skeletal inputs only.
 ## Intended Uses & Scope
-* **Primary** – Sign-language-to-text translation research, especially for resource-constrained or privacy-sensitive settings where RGB video is unavailable.
-* **Out-of-scope** – Real-time production deployments without reliable pose estimation, medical or legal interpretations, or languages beyond datasets the model was trained on.
+*   **Primary** – Sign-language-to-text translation research, especially for resource-constrained or privacy-sensitive settings where RGB video is unavailable.
+*   **Out-of-scope** – Real-time production deployments without reliable pose estimation, medical or legal interpretations, or languages beyond datasets the model was trained on.
 ## Evaluation
@@ -58,11 +59,11 @@ Geo-Sign outperforms all previous gloss-free pose-only methods and rivals many R
 ## Limitations & Ethical Considerations
-* **Pose-estimation dependency** – Errors in upstream key-points propagate to the translation.
-* **Training latency** – Hyperbolic operations slow training (~4–6 ×) but add **no** cost at inference.
-* **Generalisation** – Evaluated only on Chinese Sign Language; other sign languages are not guaranteed.
-* **Mis-translation risk** – Automatic SLT can mis-communicate; keep a human in the loop for critical use cases.
-* **Biases** – CSL-Daily is domain-specific (news/TV); outputs may reflect that linguistic style.
+*   **Pose-estimation dependency** – Errors in upstream key-points propagate to the translation.
+*   **Training latency** – Hyperbolic operations slow training (~4–6 ×) but add **no** cost at inference.
+*   **Generalisation** – Evaluated only on Chinese Sign Language; other sign languages are not guaranteed.
+*   **Mis-translation risk** – Automatic SLT can mis-communicate; keep a human in the loop for critical use cases.
+*   **Biases** – CSL-Daily is domain-specific (news/TV); outputs may reflect that linguistic style.
 ---
@@ -74,5 +75,4 @@ Geo-Sign outperforms all previous gloss-free pose-only methods and rivals many R
   author={Fish, Edward and Bowden, Richard},
   journal={arXiv preprint arXiv:2506.00129},
   year={2025}
-}```
+}```