# PI0.5 Base (PyTorch, 32-bit floating point)

This is a PyTorch version of the PI0.5 `pi05_base` model, converted from the original JAX/Flax implementation.
## Model Details

- **Architecture**: PI0.5 (Vision-Language-Action model with discrete state input)
- **Model Type**: PI0.5
- **Domain**: Base model (general purpose)
- **Precision**: 32-bit floating point (fp32)
- **Action Dimension**: 32
- **Action Horizon**: 50
- **Max Token Length**: 200
- **Vision Model**: PaliGemma (gemma_2b)
- **Action Expert**: gemma_300m
## Key Features

- **Discrete State Input**: Uses discrete language tokens for state representation
- **Flow Matching**: Uses adaRMSNorm to inject the flow-matching timestep into the action expert (see the sketch after this list)
- **Enhanced Action Modeling**: Improved action prediction via a flow-matching approach
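The adaRMSNorm mechanism conditions each normalization layer of the action expert on the flow-matching timestep. Below is a minimal sketch of the general pattern; the module and argument names are illustrative assumptions, not the OpenPI implementation:

```python
import torch
import torch.nn as nn

class AdaRMSNorm(nn.Module):
    """RMSNorm whose gain and shift are predicted from a conditioning
    vector (e.g. a flow-matching timestep embedding) instead of being
    fixed learned parameters."""

    def __init__(self, dim: int, cond_dim: int, eps: float = 1e-6):
        super().__init__()
        self.eps = eps
        self.to_scale_shift = nn.Linear(cond_dim, 2 * dim)

    def forward(self, x: torch.Tensor, cond: torch.Tensor) -> torch.Tensor:
        # x: [batch, seq, dim], cond: [batch, cond_dim]
        rms = torch.rsqrt(x.pow(2).mean(dim=-1, keepdim=True) + self.eps)
        x = x * rms
        scale, shift = self.to_scale_shift(cond).chunk(2, dim=-1)
        # Timestep-dependent modulation, broadcast over the sequence axis.
        return x * (1 + scale.unsqueeze(1)) + shift.unsqueeze(1)
```

Here `cond` would be the embedded flow-matching timestep, so each denoising step modulates the action expert's activations differently.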
## Conversion Details

This model was converted from JAX to PyTorch using the OpenPI conversion script:

```bash
python examples/convert_jax_model_to_pytorch.py \
    --checkpoint_dir /fsx/pepijn/pi05_base \
    --config_name pi05_base \
    --output_path /fsx/pepijn/pi05_base/pytorch/fp32/ \
    --precision float32
```
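After conversion, it is worth confirming that the exported weights really are fp32. A quick sanity-check sketch, assuming the converted checkpoint is stored as a safetensors file (the `model.safetensors` filename is an assumption; adjust to whatever `--output_path` actually produced):

```python
import torch
from safetensors.torch import load_file

# Hypothetical filename: adjust to the actual conversion output.
state = load_file("/fsx/pepijn/pi05_base/pytorch/fp32/model.safetensors")
bad = {k: v.dtype for k, v in state.items() if v.dtype != torch.float32}
assert not bad, f"non-fp32 tensors found: {bad}"
```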
**Conversion Date**: 2025-09-09

## Usage
```python
from openpi.models_pytorch.pi0_pytorch import PI0Pytorch
import torch

# Load the model
model = PI0Pytorch.from_pretrained("pepijn223/pi05_base_fp32")

# The model expects inputs in the format:
# - images: torch.Tensor of shape [batch, height, width, channels]
# - text: tokenized text prompts
# - proprioceptive_state: robot state information (if applicable)
```
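Under the hood, the model generates an action chunk by integrating a learned flow-matching velocity field from noise to actions. The following is a conceptual, self-contained sketch of that sampling loop, not the OpenPI API; the time direction, step count, and conditioning interface are assumptions:

```python
import torch

@torch.no_grad()
def sample_action_chunk(velocity_fn, batch=1, horizon=50, action_dim=32, steps=10):
    """Euler-integrate a flow-matching velocity field from noise (t=0)
    toward an action chunk (t=1). `velocity_fn(x, t)` stands in for the
    action expert conditioned on vision/language features."""
    x = torch.randn(batch, horizon, action_dim)  # start from Gaussian noise
    dt = 1.0 / steps
    for i in range(steps):
        t = torch.full((batch,), i * dt)
        x = x + dt * velocity_fn(x, t)           # one Euler step along the flow
    return x                                     # shape [batch, 50, 32]

# Toy velocity field for illustration; a real model uses the action expert.
actions = sample_action_chunk(lambda x, t: -x)
```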
## Model Architecture

The model consists of:

1. **Vision Encoder**: PaliGemma-based vision processing
2. **Language Encoder**: Text prompt understanding
3. **Action Expert**: Specialized network for action prediction
4. **Integration Layer**: Combines multimodal information for action output
## Training Data

This model was trained on robotics datasets appropriate for its domain:

- **DROID models**: Trained on diverse robot manipulation data
- **ALOHA models**: Trained on bimanual manipulation tasks
- **LIBERO models**: Trained on diverse tabletop manipulation scenarios
- **Base models**: Trained on general robotics datasets
## Limitations

- Model performance depends on the similarity between deployment and training environments
- May require domain-specific fine-tuning for optimal performance
- The action space must match the trained action dimension (32); lower-dimensional robots typically pad, as sketched below
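For robots with fewer degrees of freedom, a common convention (used by OpenPI's data transforms, though the exact helpers may differ) is to zero-pad state and action vectors up to the model's action dimension. A minimal sketch:

```python
import torch

def pad_to_dim(x: torch.Tensor, target_dim: int = 32) -> torch.Tensor:
    """Zero-pad the trailing (feature) dimension up to the model's
    action dimension."""
    pad = target_dim - x.shape[-1]
    if pad < 0:
        raise ValueError(f"input dim {x.shape[-1]} exceeds target {target_dim}")
    return torch.nn.functional.pad(x, (0, pad))

actions_14dof = torch.randn(1, 50, 14)  # e.g. a bimanual 14-DoF action chunk
padded = pad_to_dim(actions_14dof)      # -> shape [1, 50, 32]
```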
## Citation

If you use this model, please cite the original OpenPI work:

```bibtex
@article{openpi2024,
  title={Open-World Robotic Manipulation with Vision-Language-Action Models},
  author={Physical Intelligence},
  year={2024},
  url={https://github.com/Physical-Intelligence/openpi}
}
```

## Original Repository

[OpenPI GitHub Repository](https://github.com/Physical-Intelligence/openpi)

## License

This model follows the same license as the original OpenPI repository.