---
license: mit
pipeline_tag: image-text-to-text
library_name: transformers
---
# MomaGraph-R1
This repository contains **MomaGraph-R1**, a 7B vision-language model presented in the paper [MomaGraph: State-Aware Unified Scene Graphs with Vision-Language Model for Embodied Task Planning](https://huggingface.co/papers/2512.16909).
MomaGraph-R1 introduces a unified scene representation for embodied agents, integrating spatial-functional relationships and part-level interactive elements to address the needs of mobile manipulators in household environments. Trained with reinforcement learning on the MomaGraph-Scenes dataset, MomaGraph-R1 predicts task-oriented scene graphs and serves as a zero-shot task planner under a Graph-then-Plan framework. It achieves state-of-the-art results among open-source models, reaching 71.6% accuracy on MomaGraph-Bench (+11.4% over the best baseline), while generalizing across public benchmarks and transferring effectively to real-robot experiments.
* **Project Page:** https://hybridrobotics.github.io/MomaGraph/
* **Code:** https://github.com/HybridRobotics/MomaGraph
## Usage

This model is compatible with the Hugging Face `transformers` library. For detailed usage instructions and code examples, please refer to the official [GitHub repository](https://github.com/HybridRobotics/MomaGraph).
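As a quick start, the snippet below sketches how a checkpoint with the `image-text-to-text` pipeline tag can typically be loaded via the generic `transformers` pipeline. The repo id, example image URL, and prompt wording here are illustrative assumptions, not the official interface; consult the GitHub repository for the exact scene-graph prompt format used by MomaGraph-R1.

```python
from transformers import pipeline

# Illustrative repo id -- replace with the actual Hub id of this checkpoint.
pipe = pipeline("image-text-to-text", model="HybridRobotics/MomaGraph-R1")

# Hypothetical prompt: the real scene-graph prompt schema is documented in the GitHub repo.
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://example.com/kitchen.jpg"},
            {"type": "text", "text": "Predict a task-oriented scene graph for the task: put the mug into the sink."},
        ],
    }
]

# Generate the scene-graph prediction from the image + instruction.
outputs = pipe(text=messages, max_new_tokens=512)
print(outputs[0]["generated_text"])
```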
## Citation

If you find our work helpful or inspiring, please consider citing the paper:
```bibtex
@misc{ju2025momagraph,
  title={MomaGraph: State-Aware Unified Scene Graphs with Vision-Language Model for Embodied Task Planning},
  author={Yuanchen Ju and Yongyuan Liang and Yen-Jen Wang and Nandiraju Gireesh and Yuanliang Ju and Seungjae Lee and Qiao Gu and Elvis Hsieh and Furong Huang and Koushil Sreenath},
  year={2025},
  eprint={2512.16909},
  archivePrefix={arXiv},
  primaryClass={cs.RO},
  url={https://arxiv.org/abs/2512.16909},
}
```