Improve model card and add metadata
#2
by nielsr HF Staff - opened
README.md
CHANGED
|
@@ -1,11 +1,13 @@
|
|
| 1 |
-
---
|
| 2 |
-
license: apache-2.0
|
| 3 |
-
|
|
|
|
|
|
|
| 4 |
# PhysBrain-VLA
|
| 5 |
|
| 6 |
PhysBrain 1.0: Physical Intelligence for Embodied General AI
|
| 7 |
|
| 8 |
-
[
|
| 9 |
|
| 10 |
---
|
| 11 |
|
|
@@ -15,25 +17,43 @@ PhysBrain 1.0: Physical Intelligence for Embodied General AI
|
|
| 15 |
|
| 16 |
The data engine processes over **3,000 hours** of human video with fine-grained annotations across spatial relationships, action feasibility, and multi-step logical reasoning in real 3D environments. This corpus moves beyond simple action replication to extract the underlying physical laws and commonsense logic embedded in everyday human activity. When injected into a multimodal large model, it successfully elicits **human-like physical intelligence**, enabling the model to *understand physics* rather than merely imitate motions.
|
| 17 |
|
| 18 |
-
The resulting PhysBrain base model achieves **state-of-the-art (SOTA) performance** across multiple authoritative benchmarks in spatial intelligence and embodied interaction.
|
| 19 |
-
|
| 20 |
Built upon this foundation, this repository provides the **Vision-Language-Action (VLA)** model for robot control, the bridge from physical intelligence to real-world robotic applications.
|
| 21 |
|
| 22 |
## Key Technologies
|
| 23 |
|
| 24 |
### PhysBrain Data Engine
|
| 25 |
-
|
| 26 |
-
A scalable pipeline that transforms raw human egocentric video into structured, multimodal embodied training data, annotated with spatial structure, motion feasibility, and causal reasoning chains. This zero-cost approach to **physical commonsense injection** removes the bottleneck of robot-collected data and enables training at unprecedented scale.
|
| 27 |
|
| 28 |
### TwinBrainVLA: Dual-Brain Fusion Architecture
|
| 29 |
-
|
| 30 |
-
A novel architecture that addresses the industry-wide challenge of **catastrophic forgetting** during embodied fine-tuning. By maintaining a parallel general-purpose brain alongside a task-specific embodied brain, TwinBrainVLA achieves **"generalist-specialist fusion"**, retaining broad semantic understanding while efficiently acquiring domain-specific embodied skills.
|
| 31 |
|
| 32 |
### LangForce: Physics-Grounded Training Strategy
|
| 33 |
|
| 34 |
-
|
| 35 |
|
| 36 |
-
|
| 37 |
|
| 38 |
## Citation
|
| 39 |
|
|
@@ -63,4 +83,4 @@ If you find PhysBrain useful in your research, please consider citing our work:
|
|
| 63 |
|
| 64 |
## License
|
| 65 |
|
| 66 |
-
This project is released under the [Apache 2.0 License](LICENSE).
|
|
|
|
| 1 |
+
---
|
| 2 |
+
license: apache-2.0
|
| 3 |
+
pipeline_tag: robotics
|
| 4 |
+
---
|
| 5 |
+
|
| 6 |
# PhysBrain-VLA
|
| 7 |
|
| 8 |
PhysBrain 1.0: Physical Intelligence for Embodied General AI
|
| 9 |
|
| 10 |
+
[Paper](https://huggingface.co/papers/2605.15298) • [Project Page](https://phys-brain.github.io/) • [Code](https://github.com/Phys-Brain/PhysBrain-VLA)
|
| 11 |
|
| 12 |
---
|
| 13 |
|
|
|
|
| 17 |
|
| 18 |
The data engine processes over **3,000 hours** of human video with fine-grained annotations across spatial relationships, action feasibility, and multi-step logical reasoning in real 3D environments. This corpus moves beyond simple action replication to extract the underlying physical laws and commonsense logic embedded in everyday human activity. When injected into a multimodal large model, it successfully elicits **human-like physical intelligence**, enabling the model to *understand physics* rather than merely imitate motions.
|
| 19 |
|
|
|
|
|
|
|
| 20 |
Built upon this foundation, this repository provides the **Vision-Language-Action (VLA)** model for robot control, the bridge from physical intelligence to real-world robotic applications.
|
| 21 |
|
| 22 |
## Key Technologies
|
| 23 |
|
| 24 |
### PhysBrain Data Engine
|
| 25 |
+
A scalable pipeline that transforms raw human egocentric video into structured, multimodal embodied training data, annotated with spatial structure, motion feasibility, and causal reasoning chains.
|
|
|
|
| 26 |
|
| 27 |
### TwinBrainVLA: Dual-Brain Fusion Architecture
|
| 28 |
+
A novel architecture that addresses the industry-wide challenge of **catastrophic forgetting** during embodied fine-tuning. By maintaining a parallel general-purpose brain alongside a task-specific embodied brain, TwinBrainVLA achieves **"generalist-specialist fusion"**.
|
|
|
|
| 29 |
|
| 30 |
### LangForce: Physics-Grounded Training Strategy
|
| 31 |
+
A principled training methodology that breaks the **visual shortcut dilemma** in VLA learning through a Bayesian statistical lens. LangForce fundamentally shifts the training objective from behavioral cloning to **physical commonsense acquisition**.
|
| 32 |
+
|
| 33 |
+
## Open Source Plan
|
| 34 |
+
|
| 35 |
+
All PhysBrain 1.0 VLA model checkpoints are now available. You can find the full collection at [🤗 Hugging Face](https://huggingface.co/collections/Phys-Brain/physbrain-10-vla).
|
| 36 |
+
|
| 37 |
+
| Component | Status |
|
| 38 |
+
| ------------------------------------------------------ | ------------ |
|
| 39 |
+
| PhysBrain 1.0 VLA (RoboCasa Fine-Tuned) | ✅ Available |
|
| 40 |
+
| PhysBrain 1.0 VLA (LIBERO Fine-Tuned) | ✅ Available |
|
| 41 |
+
| PhysBrain 1.0 VLA (SIMPLER WidowX Robot Fine-Tuned) | ✅ Available |
|
| 42 |
+
| PhysBrain 1.0 VLA (SIMPLER Google Robot Fine-Tuned) | ✅ Available |
|
| 43 |
+
| Inference Code | ✅ Available |
|
| 44 |
+
|
| 45 |
+
## Getting Started
|
| 46 |
+
|
| 47 |
+
PhysBrain-VLA is built on top of the **starVLA** scaffold. To use it, follow these two steps:
|
| 48 |
+
|
| 49 |
+
1. **Copy the framework file** into your starVLA codebase:
|
| 50 |
+
```bash
|
| 51 |
+
cp physbrain_vla/PhysBrainVLA.py <path-to-starVLA>/starVLA/model/framework/
|
| 52 |
+
```
|
| 53 |
|
| 54 |
+
2. **Load and deploy** the model following the standard starVLA checkpoint loading workflow.
|
| 55 |
|
| 56 |
+
For detailed starVLA setup and inference instructions, please refer to the starVLA repository.
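
As an illustration only (not part of the README change), step 1 could be scripted with a small check that the destination really looks like a starVLA checkout before copying. This is a minimal sketch: the helper name `install_framework_file` is hypothetical, and the only path it assumes is the `starVLA/model/framework/` directory named in the `cp` command above.

```python
import shutil
from pathlib import Path


def install_framework_file(source_file: str, starvla_root: str) -> Path:
    """Copy the framework file into <starvla_root>/starVLA/model/framework/.

    Hypothetical helper; it mirrors the `cp` command from the README and
    does nothing starVLA-specific beyond building the destination path.
    """
    source = Path(source_file)
    # Destination directory as given in the README's cp command.
    target_dir = Path(starvla_root) / "starVLA" / "model" / "framework"

    if not source.is_file():
        raise FileNotFoundError(f"framework file not found: {source}")
    if not target_dir.is_dir():
        raise NotADirectoryError(f"expected starVLA framework directory: {target_dir}")

    # copy2 keeps file metadata and returns the path of the copied file.
    return Path(shutil.copy2(source, target_dir))
```

Calling `install_framework_file("physbrain_vla/PhysBrainVLA.py", "<path-to-starVLA>")` with a real checkout path would then be equivalent to the shell command; step 2 (checkpoint loading) is left to the starVLA workflow and is not sketched here.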
|
| 57 |
|
| 58 |
## Citation
|
| 59 |
|
|
|
|
| 83 |
|
| 84 |
## License
|
| 85 |
|
| 86 |
+
This project is released under the [Apache 2.0 License](LICENSE).
|