Improve model card: add metadata, paper, project, and code links

#1 by nielsr (HF Staff) - opened
Files changed (1)
  1. README.md +48 -3
README.md CHANGED
@@ -1,3 +1,48 @@
- ---
- license: apache-2.0
- ---
+ ---
+ license: apache-2.0
+ pipeline_tag: robotics
+ library_name: transformers
+ ---
+
+ # Rethinking Visual-Language-Action Model Scaling: Alignment, Mixture, and Regularization
+
+ This repository contains the weights for the Vision-Language-Action (VLA) models presented in the paper [Rethinking Visual-Language-Action Model Scaling: Alignment, Mixture, and Regularization](https://huggingface.co/papers/2602.09722).
+
+ [**Project Website**](https://research.beingbeyond.com/rethink_vla) | [**GitHub Repository**](https://github.com/BeingBeyond/Rethink_VLA)
+
+ ## Summary
+
+ This work presents a systematic and controlled study of Vision-Language-Action (VLA) model scaling, aiming to clarify whether standard data scaling recipes apply to robotics given the inherent heterogeneity of training data across embodiments, sensors, and action spaces.
+
+ The analysis targets three key dimensions of VLA scaling:
+ 1. **Physical alignment**: A unified end-effector (EEF)-relative action representation is critical for robust cross-embodiment transfer.
+ 2. **Embodiment mixture**: Naively pooling heterogeneous robot datasets often leads to negative transfer, highlighting the challenges of indiscriminate data scaling.
+ 3. **Training regularization**: Intuitive strategies such as sensory dropout and multi-stage fine-tuning do not consistently improve performance at scale.
+
+ ## Usage
+
+ Please refer to the [GitHub Repository](https://github.com/BeingBeyond/Rethink_VLA) for detailed instructions on pre-training, post-training, and evaluation on benchmarks such as LIBERO and RoboCasa.
+
+ ## Citation
+
+ If you find this work useful, please cite it as:
+
+ ```bibtex
+ @article{rethinkvla2025,
+   title={Rethinking Visual-Language-Action Model Scaling: Alignment, Mixture, and Regularization},
+   author={Anonymous Authors},
+   journal={arXiv preprint arXiv:2602.09722},
+   year={2025}
+ }
+ ```
+
+ ## Acknowledgments
+
+ We thank the authors of the following projects for their contributions to the robotics and machine learning communities:
+
+ * [BeingH0.5](https://github.com/BeingBeyond/Being-H): VLA framework
+ * [InternVL](https://github.com/OpenGVLab/InternVL): Vision-Language model backbone
+ * [Bagel](https://github.com/ByteDance-Seed/Bagel): Training framework
+ * [Qwen](https://github.com/QwenLM/Qwen): Language model
+ * [LIBERO](https://github.com/Lifelong-Robot-Learning/LIBERO): Benchmark for lifelong robot learning
+ * [RoboCasa](https://github.com/robocasa/robocasa): Large-scale simulation benchmark for everyday tasks