---
emoji: πŸ€–
license: mit
model-cards:
- asgard-robot/groot-potato-inference
- asgard-robot/groot-condiment-handover
datasets:
- asgard-robot/asgard_training_data_potato
- asgard-robot/asgard_training_data_condiment
pipeline_tag: robotics
---

# ASGARD Robot πŸ€–

**Creating Intelligent Home Assistant Robots for Human-Robot Interaction**

ASGARD (Autonomous Service Generation for Advanced Robot Deployment) is a research and development initiative focused on creating practical home assistant robots that can safely interact with humans in domestic environments.

---

## 🎯 Mission

To develop autonomous robots that can:
- **Handle everyday household tasks** safely and reliably
- **Interact naturally** with humans in home environments
- **Hand over objects** to humans with proper coordination and social awareness
- **Adapt** to diverse home environments and situations

---

## 🏠 Focus Areas

### 1. Home Environment Manipulation
Our robots are designed to handle common household objects:
- Food items (potatoes, condiments, containers)
- Daily-use objects (cups, utensils, small tools)
- Delicate items requiring careful handling

### 2. Human-Robot Handover
Developing sophisticated coordination for:
- **Gesture Recognition**: Understanding when and how humans want to receive objects
- **Force Feedback**: Proper force control during handover to prevent accidents
- **Timing Coordination**: Synchronizing robot and human movements
- **Social Awareness**: Reading human intent and nonverbal cues

### 3. Multi-Modal Understanding
Our robots integrate:
- **Vision**: Dual camera systems (wrist + external) for comprehensive scene understanding
- **Touch**: Force/torque feedback for delicate manipulation
- **Language**: Natural language understanding for task specification
- **Context**: Awareness of household context and social norms

---

## πŸ“Š Current Models

### Trained GR00T Models

#### 1. Potato Manipulation Model
- **Model:** [groot-potato-inference](https://huggingface.co/asgard-robot/groot-potato-inference)
- **Task:** Potato handling and cleaning in kitchen environments
- **Checkpoint:** Step 2000
- **Base Model:** NVIDIA GR00T N1.5-3B
- **Robot:** ASGARD so101_follower (single-arm, 6 DOF)
- **Performance:** 99.53% reduction in training loss from initialization
- **Dataset:** 40 episodes, 30,795 frames

#### 2. Condiment Handover Model
- **Model:** [groot-condiment-handover](https://huggingface.co/asgard-robot/groot-condiment-handover)
- **Task:** Condiment bottle handling and handover to humans
- **Checkpoint:** Step 2000
- **Base Model:** NVIDIA GR00T N1.5-3B
- **Robot:** ASGARD so101_follower (single-arm, 6 DOF)
- **Dataset:** 40 episodes, 31,522 frames
- **Focus:** Human-robot coordination for object handover

---

+ ## πŸ—‚οΈ Datasets
80
+
81
+ ### Training Datasets
82
+
83
+ #### 1. Potato Training Data
84
+ - **Dataset:** [asgard_training_data_potato](https://huggingface.co/datasets/asgard-robot/asgard_training_data_potato)
85
+ - **Type:** LeRobot v3.0 format
86
+ - **Episodes:** 40 demonstrations
87
+ - **Frames:** 30,795 (avg 770 per episode)
88
+ - **Duration:** ~26 seconds per episode at 30 FPS
89
+ - **Modalities:**
90
+ - Dual RGB cameras (wrist + realsense)
91
+ - 6 DOF joint positions
92
+ - Force feedback
93
+ - **Task:** Potato manipulation and cleaning
94
+
95
+ #### 2. Condiment Training Data
96
+ - **Dataset:** [asgard_training_data_condiment](https://huggingface.co/datasets/asgard-robot/asgard_training_data_condiment)
97
+ - **Type:** LeRobot v3.0 format
98
+ - **Episodes:** 40 demonstrations
99
+ - **Frames:** 31,522 (avg 788 per episode)
100
+ - **Duration:** ~26 seconds per episode at 30 FPS
101
+ - **Modalities:**
102
+ - Dual RGB cameras (wrist + realsense)
103
+ - 6 DOF joint positions
104
+ - Force feedback
105
+ - **Task:** Condiment handling and human handover
106
+
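The per-episode averages and durations quoted above follow directly from the episode and frame counts. A quick sanity check in plain Python, using only figures from this card:

```python
# Sanity-check the dataset statistics quoted on this card.
FPS = 30  # recording rate stated above

datasets = {
    # name: (episodes, total_frames)
    "potato": (40, 30_795),
    "condiment": (40, 31_522),
}

for name, (episodes, frames) in datasets.items():
    avg_frames = frames / episodes   # frames per episode
    avg_seconds = avg_frames / FPS   # episode duration at 30 FPS
    print(f"{name}: {avg_frames:.0f} frames/episode, ~{avg_seconds:.0f} s/episode")
    # potato: 770 frames/episode, ~26 s/episode
    # condiment: 788 frames/episode, ~26 s/episode
```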
---

## πŸ€– Robot Platform

### ASGARD so101_follower
- **Type:** Single-arm manipulator
- **Degrees of Freedom:** 6 (shoulder_pan, shoulder_lift, elbow_flex, wrist_flex, wrist_roll, gripper)
- **Sensors:**
  - Wrist-mounted RGB camera (640Γ—480)
  - External RGB camera (640Γ—480)
  - Force/torque sensors
  - Joint position encoders
- **Capabilities:**
  - Precise object manipulation
  - Force-controlled grasping
  - Human-safe operation
  - Real-time perception

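As a minimal illustration of the six-DOF state above, a container using the joint names listed on this card. The class itself is hypothetical, not part of any ASGARD or LeRobot API:

```python
# Illustrative only: a state container for the so101_follower's six DOF.
# Field names mirror the joint list above; the class is hypothetical.
from dataclasses import dataclass, fields

@dataclass
class So101JointState:
    shoulder_pan: float
    shoulder_lift: float
    elbow_flex: float
    wrist_flex: float
    wrist_roll: float
    gripper: float

    def as_vector(self) -> list[float]:
        """Joint positions in a fixed order, e.g. as a policy input."""
        return [getattr(self, f.name) for f in fields(self)]

home = So101JointState(0.0, 0.0, 0.0, 0.0, 0.0, 0.0)
assert len(home.as_vector()) == 6  # one value per degree of freedom
```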
---

## 🧠 Technology Stack

### Base Models
- **NVIDIA GR00T N1.5-3B**: Foundation model for robot manipulation
  - Generalist robot foundation model
  - Trained on diverse manipulation tasks
  - Multi-modal understanding (vision + language + actions)
  - Flow matching for continuous action generation

### Training Framework
- **LeRobot**: PyTorch-based robotics framework
  - ASGARD teleop control branch
  - GR00T policy support
  - Dataset format v3.0
  - Multi-GPU training with Hugging Face Accelerate

### Hardware
- **Training:** 4Γ— NVIDIA H100 PCIe GPUs (80 GB VRAM each)
- **Inference:** Optimized for edge deployment
- **Compute:** 320 GB total VRAM for full fine-tuning

---

## πŸ”¬ Research Goals

### Short-Term
1. **Robust Manipulation**: Reliable handling of diverse household objects
2. **Safe Handover**: Zero accidents in human-robot handover scenarios
3. **Context Awareness**: Understanding household context and social norms
4. **Adaptation**: Quick adaptation to new objects and scenarios

### Long-Term
1. **General Household Assistance**: Cooking, cleaning, organization
2. **Human-Robot Collaboration**: Seamless teamwork with humans
3. **Learning from Demonstration**: Improved generalization from limited data
4. **Real-Time Adaptation**: Dynamic adjustment to unexpected situations

---

## πŸ—οΈ Architecture

### Model Architecture
Our models are fine-tuned from GR00T N1.5-3B:
- **Frozen Components:**
  - Vision encoder (preserves visual understanding)
  - LLM (maintains language understanding)
- **Trainable Components:**
  - Diffusion transformer (action generation)
  - Projector (vision-language β†’ action mapping)

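A rough sketch of the freeze/train split described above. The component names here are illustrative, not the actual GR00T module names; in a PyTorch setting the same filter would decide which parameters get `requires_grad = False`:

```python
# Hypothetical sketch of the frozen/trainable split; component names
# are illustrative, not the real GR00T N1.5-3B module names.
FROZEN = {"vision_encoder", "llm"}
TRAINABLE = {"diffusion_transformer", "projector"}

def is_trainable(param_name: str) -> bool:
    """A parameter is updated iff its top-level component is trainable."""
    return param_name.split(".", 1)[0] in TRAINABLE

params = [
    "vision_encoder.patch_embed.weight",     # frozen
    "llm.layers.0.attn.weight",              # frozen
    "diffusion_transformer.block_0.weight",  # trained
    "projector.weight",                      # trained
]
trainable = [p for p in params if is_trainable(p)]
assert trainable == ["diffusion_transformer.block_0.weight", "projector.weight"]
```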
### Training Strategy
- **Full Fine-Tuning**: All trainable parameters updated
- **Batch Size:** 512 global (128 per GPU Γ— 4 GPUs)
- **Training Steps:** 2,000 per task
- **Approx. Epochs:** ~33 (potato) / ~32 (condiment)
- **Learning Rate:** 1e-4 with warmup
- **Precision:** bf16 mixed precision

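The "~33 / ~32 epochs" figures follow from the step count, global batch size, and dataset sizes quoted on this card:

```python
# Effective epochs = (training steps * global batch size) / dataset frames.
# All figures are taken from this card.
STEPS = 2_000
GLOBAL_BATCH = 128 * 4  # 128 per GPU * 4 GPUs = 512

for name, frames in [("potato", 30_795), ("condiment", 31_522)]:
    epochs = STEPS * GLOBAL_BATCH / frames
    print(f"{name}: ~{epochs:.0f} epochs")  # potato ~33, condiment ~32
```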
---

## πŸ“ˆ Performance

### Training Results
Both models converge cleanly:
- **Loss Reduction:** 99%+ from initial to final
- **Stability:** No overfitting observed
- **Convergence:** Reached between steps 1,200 and 1,600
- **Final Loss:** ~0.006 (from initial ~1.2)

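The "99%+" figure is consistent with the approximate initial and final losses quoted above:

```python
# Relative loss reduction from the approximate values on this card.
initial_loss = 1.2
final_loss = 0.006
reduction = (initial_loss - final_loss) / initial_loss * 100
print(f"~{reduction:.1f}% reduction")  # ~99.5%, matching the "99%+" claim
```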
### Metrics
- **Training Time:** ~2 hours per model
- **Memory Usage:** 60-70 GB per GPU
- **Throughput:** 2-3 samples/second per GPU
- **Checkpoints:** 5 per training run (steps 400, 800, 1200, 1600, 2000)

---

## 🀝 Contributing

We welcome contributions in:
- Additional household task datasets
- Improved handover algorithms
- Multi-robot coordination
- Human behavior modeling
- Safety protocols

---

## πŸ“š Citations

If you use our models or datasets, please cite:

```bibtex
@misc{asgard_robot_2024,
  title  = {ASGARD Robot: Home Assistant Robot for Human-Robot Interaction},
  author = {{ASGARD Team}},
  year   = {2024},
  url    = {https://huggingface.co/asgard-robot},
  note   = {Models: groot-potato-inference, groot-condiment-handover.
            Datasets: asgard_training_data_potato, asgard_training_data_condiment}
}
```

---

## πŸ“ž Contact

- **Organization:** [asgard-robot](https://huggingface.co/asgard-robot)
- **Models:** https://huggingface.co/asgard-robot
- **Datasets:** https://huggingface.co/asgard-robot

---

## πŸŽ–οΈ Acknowledgments

- **Base Model:** NVIDIA GR00T N1.5-3B
- **Framework:** LeRobot (Hugging Face)
- **Hardware:** Shadeform H100 multi-GPU cluster
- **Research:** ASGARD Team

---

## 🌟 Vision

We envision a future where robots integrate seamlessly into home environments, assisting humans with daily tasks while maintaining the highest standards of safety, reliability, and social awareness. Our work focuses on practical applications that improve quality of life and enable independent living.

---

**Building the future of home robotics, one handover at a time.** πŸ€–β€οΈ