	---
	license: mit
	library_name: pytorch
	pipeline_tag: robotics
	tags:
	- robotics
	- vision-language-action
	- vla
	- libero
	- manipulation
	- qwen-vl
	---

	# SemanticVLA · LIBERO

	> 🎉 Accepted to [CVPR 2026](https://cvpr.thecvf.com/virtual/2026/poster/39352).<br>
	> ✍️ Fei Ni¹, Zhuo Chen², Yifu Yuan³, Zibin Dong³, Xianze Yao³, Shan Luo², Jianye Hao³, Jiankang Deng¹†, Stefanos Zafeiriou¹†<br>
	> 🏫 ¹Imperial College London    ²King's College London    ³Tianjin University<br>
	> ✉️ Primary contact: [f.ni@imperial.ac.uk](mailto:f.ni@imperial.ac.uk)

	[SemanticVLA](https://github.com/Fei-Ni/SemanticVLA_Offcial) fine-tuned on the [LIBERO](https://github.com/Lifelong-Robot-Learning/LIBERO) benchmark. The unified OXE LAM serves as the latent-action tokenizer, and the trace and latent-action auxiliary heads are supervised in the VLM's language stream.

	## Headline result

	| Suite | Success rate |
	|---|---:|
	| `libero_spatial` | 0.988 |
	| `libero_object` | 0.996 |
	| `libero_goal` | 0.974 |
	| `libero_10` | 0.970 |
	| 4-suite mean | 0.982 |
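
The 4-suite mean is consistent with an unweighted average of the four per-suite success rates in the table above. The snippet below only reproduces that arithmetic; the dictionary name is illustrative and not part of the released code.

```python
# Per-suite success rates copied from the table above.
success_rates = {
    "libero_spatial": 0.988,
    "libero_object": 0.996,
    "libero_goal": 0.974,
    "libero_10": 0.970,
}

# Unweighted mean over the four suites.
mean_success = sum(success_rates.values()) / len(success_rates)
print(f"{mean_success:.3f}")  # 0.982
```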

	## Architecture

	| Component | Choice |
	|---|---|
	| VLM backbone | Qwen3-VL-4B-Instruct |
	| Action head | DiT-B (flow matching) |
	| LAM tokenizer | [`SemanticVLA-LAM`](https://huggingface.co/spikefly/SemanticVLA-LAM) (unified OXE LAM) |
	| Semantic supervision | Trace + latent action tokens predicted in the VLM's language stream; action decoder unmodified |
	| Latent vocabulary size | 32 |
	| Latent tokens per sample | 4 |
	| Action horizon | 8 |
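
For a sense of scale of the latent-action settings above: with a vocabulary of 32 and 4 latent tokens per sample, the tokenizer can express at most 32^4 distinct token sequences. The sketch below works that out. It assumes every one of the 4 positions draws from the same 32-entry vocabulary, which the table does not state explicitly, and the variable names are illustrative.

```python
import math

latent_vocab_size = 32        # "Latent vocabulary size" from the table
latent_tokens_per_sample = 4  # "Latent tokens per sample" from the table

# Assumption: every position shares the same vocabulary, so the number of
# distinct latent-token sequences is vocab_size ** tokens_per_sample.
num_sequences = latent_vocab_size ** latent_tokens_per_sample
bits_per_sample = latent_tokens_per_sample * math.log2(latent_vocab_size)

print(num_sequences)    # 1048576
print(bits_per_sample)  # 20.0
```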

	## Files

	```
	SemanticVLA-LIBERO/
	├── README.md
	├── config.yaml                # loadable model config
	├── dataset_statistics.json    # action normalization stats
	└── final_model/
	    └── pytorch_model.pt       # policy state_dict
	```

	## How to load

	```python
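	from semanticvla.model.framework.base_framework import baseframework

	# Pass the path to the checkpoint file. In the released layout it sits
	# under final_model/, so adjust the path to your download location.
	policy = baseframework.from_pretrained("pytorch_model.pt")
	policy.eval()  # switch to evaluation mode for inference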
	```

	`baseframework.from_pretrained()` walks two directory levels up from the checkpoint file to locate `config.yaml` and `dataset_statistics.json`. The released layout follows this convention.
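
The lookup convention described above can be illustrated with `pathlib`. This is a sketch of the path arithmetic only, not the actual `from_pretrained()` implementation; the helper name `locate_sidecar_files` is hypothetical.

```python
from pathlib import Path


def locate_sidecar_files(checkpoint_path):
    """Return (config, stats) paths two directory levels above the checkpoint.

    Hypothetical helper: it mirrors the convention described above and is
    not part of the SemanticVLA codebase.
    """
    ckpt = Path(checkpoint_path).resolve()
    # parents[0] is final_model/, parents[1] is the repo root.
    root = ckpt.parents[1]
    return root / "config.yaml", root / "dataset_statistics.json"


config_path, stats_path = locate_sidecar_files(
    "SemanticVLA-LIBERO/final_model/pytorch_model.pt"
)
print(config_path.name, stats_path.name)
```

Under this convention, a path that includes `final_model/` resolves both files to the repository root, whereas a bare file name would resolve relative to the current working directory.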

	To run a full LIBERO evaluation, see [`examples/LIBERO/`](https://github.com/Fei-Ni/SemanticVLA_Offcial/tree/main/examples/LIBERO) in the code repo.

	## Sibling SemanticVLA checkpoint repos

	| Repo | Purpose |
	|---|---|
	| 🤗 [`SemanticVLA-LAM`](https://huggingface.co/spikefly/SemanticVLA-LAM) | Unified OXE LAM consumed by this policy |
	| 🤗 [`SemanticVLA-SimplerEnv`](https://huggingface.co/spikefly/SemanticVLA-SimplerEnv) | SimplerEnv WidowX policy |

	## Related resources

	- Code: https://github.com/Fei-Ni/SemanticVLA_Offcial
	- Datasets collection: https://hf.co/collections/spikefly/semanticvla-datasets
	- Model Zoo collection: https://hf.co/collections/spikefly/semanticvla-model-zoo

	## Citation

	```bibtex
	@inproceedings{ni2026semanticvla,
	title = {SemanticVLA: Towards Semantic Reasoning over Action Memorization via Synergistic Explicit Trace and Latent Action Planning},
	author = {Ni, Fei and Chen, Zhuo and Yuan, Yifu and Dong, Zibin and Yao, Xianze and Luo, Shan and Hao, Jianye and Deng, Jiankang and Zafeiriou, Stefanos},
	booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
	year = {2026}
	}
	```

	## License

	Released under the [MIT License](https://github.com/Fei-Ni/SemanticVLA_Offcial/blob/main/LICENSE).