nielsr HF Staff committed on
Commit
79c747f
·
verified ·
1 Parent(s): d447423

Improve model card: Add pipeline tag, library name, and links


This PR improves the model card for `MomaGraph-R1` with the following updates:

- Adds `pipeline_tag: image-text-to-text` to the metadata, enabling better discoverability and the inference widget on the Hugging Face Hub.
- Adds `library_name: transformers` to the metadata, as indicated by the presence of `transformers_version` and `Qwen2_5_VLForConditionalGeneration` in `config.json`. This will enable the automated "how to use" code snippet.
- Adds a comprehensive model description based on the paper abstract.
- Includes links to the official paper, project page, and GitHub repository in the model card content.
- Adds a "Usage" section that directs users to the GitHub repository for code examples and a "Citation" section.

Please review and merge if these improvements align with the project goals.

Files changed (1)
  1. README.md +36 -3
README.md CHANGED
@@ -1,3 +1,36 @@
- ---
- license: mit
- ---
+ ---
+ license: mit
+ pipeline_tag: image-text-to-text
+ library_name: transformers
+ ---
+
+ # MomaGraph-R1
+
+ This repository contains **MomaGraph-R1**, a 7B vision-language model presented in the paper [MomaGraph: State-Aware Unified Scene Graphs with Vision-Language Model for Embodied Task Planning](https://huggingface.co/papers/2512.16909).
+
+ MomaGraph-R1 introduces a unified scene representation for embodied agents, integrating spatial-functional relationships and part-level interactive elements to address the needs of mobile manipulators in household environments. Trained with reinforcement learning on the MomaGraph-Scenes dataset, MomaGraph-R1 predicts task-oriented scene graphs and serves as a zero-shot task planner under a Graph-then-Plan framework.
+
+ It achieves state-of-the-art results among open-source models, reaching 71.6% accuracy on the MomaGraph-Bench benchmark (+11.4% over the best baseline), while generalizing across public benchmarks and transferring effectively to real-robot experiments.
+
+ * **Project Page:** https://hybridrobotics.github.io/MomaGraph/
+ * **Code:** https://github.com/HybridRobotics/MomaGraph
+
+ ## Usage
+
+ This model is compatible with the Hugging Face `transformers` library. For detailed usage instructions and code examples, please refer to the official [GitHub repository](https://github.com/HybridRobotics/MomaGraph).
+
+ ## Citation
+
+ If you find our work helpful or inspiring, please consider citing the paper:
+
+ ```bibtex
+ @misc{ju2025momagraph,
+       title={MomaGraph: State-Aware Unified Scene Graphs with Vision-Language Model for Embodied Task Planning},
+       author={Yuanchen Ju and Yongyuan Liang and Yen-Jen Wang and Nandiraju Gireesh and Yuanliang Ju and Seungjae Lee and Qiao Gu and Elvis Hsieh and Furong Huang and Koushil Sreenath},
+       year={2025},
+       eprint={2512.16909},
+       archivePrefix={arXiv},
+       primaryClass={cs.RO},
+       url={https://arxiv.org/abs/2512.16909},
+ }
+ ```
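
Since the diff adds `pipeline_tag: image-text-to-text` and `library_name: transformers` (with `Qwen2_5_VLForConditionalGeneration` in `config.json`), inference would follow the standard Qwen2.5-VL chat-message pattern. Below is a minimal, dependency-free sketch of that input format; the repo id `HybridRobotics/MomaGraph-R1` and the prompt text are assumptions for illustration, and the actual model/processor loading is shown only as a commented outline.

```python
# Sketch of the image-text-to-text message format used by
# Qwen2.5-VL-style models such as MomaGraph-R1. Only the message
# construction runs here; loading weights (commented out below)
# requires `transformers` and the hosted checkpoint.

def build_messages(image_path: str, prompt: str) -> list[dict]:
    """Compose a single-turn conversation with one image and one text prompt."""
    return [
        {
            "role": "user",
            "content": [
                {"type": "image", "image": image_path},
                {"type": "text", "text": prompt},
            ],
        }
    ]

# Hypothetical prompt; the repo's own examples may differ.
messages = build_messages("scene.jpg", "Predict the task-oriented scene graph.")

# With transformers installed (repo id is an assumption, not confirmed):
# from transformers import AutoProcessor, Qwen2_5_VLForConditionalGeneration
# model = Qwen2_5_VLForConditionalGeneration.from_pretrained("HybridRobotics/MomaGraph-R1")
# processor = AutoProcessor.from_pretrained("HybridRobotics/MomaGraph-R1")
```

The automated "how to use" snippet that `library_name: transformers` enables on the Hub would fill in the exact loading calls; the GitHub repository remains the authoritative source for usage.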