---
title: VoxPoserExamples
emoji: 🔥
colorFrom: pink
colorTo: green
sdk: gradio
sdk_version: 3.40.1
app_file: app.py
pinned: false
---
# VoxPoser API Examples
## Usage
```bash
python3 app.py
```
1. In the interface, enter your OpenAI API key and the proxy address to use, then select the configuration you need.
2. Click Setup/Reset Simulation.
3. Enter a custom instruction.
4. Click Run to execute it (this takes a long time).
## Example
### VLM & Perception
1. Open-vocabulary object detection: [OWL-ViT](https://huggingface.co/docs/transformers/model_doc/owlvit)
2. Segmentation: [SAM](https://github.com/facebookresearch/segment-anything)
3. Object mask tracking: [XMem](https://github.com/hkchengrex/XMem)
4. Capture a depth map with a RealSense camera
5. Compute surface normals from the depth map (used for the grasp pose)

Possible substitutes:
- [x] OWL-ViT -> Grounded SAM / YOLO
- [x] SAM -> FastSAM / YOLO-seg
- [ ] XMem -> DeepSORT(?) ByteTrack(?)
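
Step 5 of the perception list can be illustrated with a minimal NumPy sketch. It is not taken from the VoxPoser code: the pinhole intrinsics `fx`, `fy`, `cx`, `cy` and both function names are assumptions made for illustration, and the normals are estimated with simple finite differences rather than any particular library routine.

```python
import numpy as np

def depth_to_points(depth, fx, fy, cx, cy):
    """Back-project a depth image (H, W), in meters, to camera-frame points (H, W, 3)."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return np.stack([x, y, depth], axis=-1)

def estimate_normals(points):
    """Estimate per-pixel unit normals from an organized point map (H, W, 3)."""
    # Finite differences of the 3D points along image columns and rows.
    d_du = np.gradient(points, axis=1)
    d_dv = np.gradient(points, axis=0)
    normals = np.cross(d_du, d_dv)
    length = np.linalg.norm(normals, axis=-1, keepdims=True)
    normals = normals / np.clip(length, 1e-12, None)
    # Flip normals so they point toward the camera (negative z in the camera frame).
    facing_away = normals[..., 2:3] > 0
    return np.where(facing_away, -normals, normals)

# Hypothetical intrinsics and a flat surface 1 m in front of the camera.
depth = np.full((4, 5), 1.0)
points = depth_to_points(depth, fx=500.0, fy=500.0, cx=2.0, cy=1.5)
normals = estimate_normals(points)
```

For the flat surface in the usage lines, every normal points straight back at the camera. Real RealSense depth is noisy, so in practice the depth map would likely need smoothing before differencing.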
### LMP (Language Model Programs)
Language model programming uses GPT-4.
VoxPoser requires three main types of LMP:
1. Planner
2. Composer
3. Value map generator

Possible substitutes:
- [ ] GPT-4 -> LLaMA2 (?)
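## LMPs
### Planner
An LMP outputs a sequence of programmatic API calls. The Planner turns the language instruction into a sequence of high-level plan steps, and the Composer then executes each step.
The planner is not used in the simulation environment, because the evaluated tasks consist of a single manipulation stage.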
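
The relationship between Planner and Composer can be sketched as a small dispatch loop. This is a hedged illustration only: `run_instruction` and `toy_planner` are hypothetical names, and the toy planner splits on a fixed phrase where a real Planner LMP would query the language model.

```python
def run_instruction(instruction, composer, planner=None):
    """Send an instruction to the composer, optionally via a planner."""
    # Without a planner (as in the simulation setup), the whole instruction
    # is treated as a single sub-task.
    sub_tasks = planner(instruction) if planner is not None else [instruction]
    for sub_task in sub_tasks:
        composer(sub_task)
    return sub_tasks

def toy_planner(instruction):
    """Stand-in for a Planner LMP: split the instruction on ', then '."""
    return [step.strip() for step in instruction.split(", then ")]

# Record what the composer receives instead of driving a robot.
executed = []
run_instruction("open the drawer, then put the apple in it", executed.append, toy_planner)

single_stage = []
run_instruction("push the button", single_stage.append)
```

With the planner, the composer receives two sub-tasks in order; without it, the instruction passes through unchanged.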
### Composer
The Composer LMP calls the following modules in order:
1. Call the perception module to obtain the perception results
2. [optional] Affordance LMP
3. [optional] Avoidance LMP
4. [optional] End Effector Velocity LMP
5. [optional] End Effector Rotation LMP
6. [optional] Gripper Action LMP
7. Execute
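
The call order above can be sketched as follows. The function `compose`, the stage names, and the way the perception result is passed to each module are all assumptions for illustration; only the ordering and the optional/mandatory split come from the list.

```python
# Fixed call order for the optional value-map modules.
OPTIONAL_STAGES = ["affordance", "avoidance", "velocity", "rotation", "gripper"]

def compose(perceive, execute, **optional_modules):
    """Run perception, then each provided optional module in order, then execute."""
    unknown = set(optional_modules) - set(OPTIONAL_STAGES)
    if unknown:
        raise ValueError(f"unknown modules: {sorted(unknown)}")
    observation = perceive()
    maps = {}
    for name in OPTIONAL_STAGES:
        module = optional_modules.get(name)
        if module is not None:  # skipped stages leave no entry in maps
            maps[name] = module(observation)
    return execute(observation, maps)

# Toy usage: only the affordance and gripper modules are supplied.
result = compose(
    perceive=lambda: {"drawer_handle": (0.4, 0.1, 0.3)},
    execute=lambda observation, maps: list(maps),
    gripper=lambda observation: "open",
    affordance=lambda observation: observation["drawer_handle"],
)
```

Here `result` lists the modules that actually ran, in pipeline order rather than in the order the keyword arguments were given.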
### Value Maps
TODO
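### Execution
1. Motion Planner: a greedy search produces a sequence of end-effector poses, using only the Affordance Map and the Avoidance Map
2. Cost map: $W = -2 \cdot \text{norm}(\text{Affordance}) - \text{norm}(\text{Avoidance})$
3. Depending on whether the motion moves away from or approaches the target, use the Affordance Map along the positive or negative direction of the target's normal vector
4. Adjust the Avoidance Map according to the occupancy grid (`occupancy_map`) of the obstacles to avoid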
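
A minimal NumPy sketch of steps 1 and 2 follows. Several things here are assumptions and not confirmed by the source: `norm` is taken to be min-max scaling to [0, 1], both maps are assumed to be scored so that higher is better (so the avoidance map is low at obstacles), and the greedy search is assumed to step to the lowest-cost neighboring voxel. The function names are hypothetical.

```python
from itertools import product

import numpy as np

def normalize(voxel_map):
    """Min-max scale a map to [0, 1]; a constant map becomes all zeros."""
    lo, hi = voxel_map.min(), voxel_map.max()
    if hi == lo:
        return np.zeros_like(voxel_map, dtype=float)
    return (voxel_map - lo) / (hi - lo)

def cost_map(affordance, avoidance):
    """W = -2 * norm(Affordance) - norm(Avoidance)."""
    return -2.0 * normalize(affordance) - normalize(avoidance)

def greedy_path(costmap, start, max_steps=100):
    """Step to the lowest-cost neighboring voxel until no neighbor improves."""
    offsets = [o for o in product((-1, 0, 1), repeat=costmap.ndim) if any(o)]
    current = tuple(start)
    path = [current]
    for _ in range(max_steps):
        best, best_cost = current, costmap[current]
        for offset in offsets:
            nxt = tuple(c + d for c, d in zip(current, offset))
            in_bounds = all(0 <= n < s for n, s in zip(nxt, costmap.shape))
            if in_bounds and costmap[nxt] < best_cost:
                best, best_cost = nxt, costmap[nxt]
        if best == current:  # local minimum reached
            break
        current = best
        path.append(current)
    return path

# Toy 5x5x5 workspace: affordance peaks at the target voxel,
# avoidance is low at a single obstacle voxel.
grid = np.stack(np.indices((5, 5, 5)), axis=-1)
target = np.array([4, 4, 4])
affordance = -np.linalg.norm(grid - target, axis=-1)
avoidance = np.ones((5, 5, 5))
avoidance[2, 2, 2] = 0.0
path = greedy_path(cost_map(affordance, avoidance), start=(0, 0, 0))
```

In this toy workspace the path ends at the target voxel and steps around the obstacle voxel. A greedy search can stall in local minima on less regular maps, which is why the sketch caps the number of steps.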