Video-Text-to-Text
robotic-manipulation
reinforcement-learning
chain-of-thought