Hello, amazing robotics people! We have finally delivered on your biggest request: Ark just got a major upgrade.
We've now integrated Vision-Language-Action models (VLAs) into Ark. VLAs are models that connect vision + language → robot actions (see image).
What does this mean?
- Give robots natural-language instructions and they act
- Combine perception + language for real-world control
- Powered by pi0 pretrained models for fast prototyping
- Easy data collection and fine-tuning within Ark in a couple of lines of code
Next, we're heading into the world of designing worlds. Who knows? Maybe those video models really are zero-shot learners and reasoners.