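# Batch Video Annotation Guide

## Overview

`infer_caption_batch.py` is the batch-processing version of `infer_caption_v0.py`. It supports:

- ✅ Automatic discovery of video files that match the filter rules
- ✅ Batch inference and annotation
- ✅ Flexible configuration of the processing range
- ✅ Results saved as structured JSON
- ✅ Progress display and error handling
- ✅ Automatic metadata extraction (data source, task description, etc.)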

## Video Filter Rules

Videos are looked up under `/playpen-ssd/dataset/droid_raw/1.0.1/`:

```
Dataset layout:
/playpen-ssd/dataset/droid_raw/1.0.1/
├── AUTOLab/
│   ├── failure/
│   │   └── 2023-07-12/
│   │       └── Wed_Jul_12_12:25:25_2023/
│   │           ├── recordings/
│   │           │   └── MP4/
│   │           │       ├── 22008760.mp4         ✓ match (starts with 2)
│   │           │       ├── 24400334.mp4         ✓ match
│   │           │       ├── 18026681.mp4         ✗ no match (starts with 1)
│   │           │       └── 22008760-stereo.mp4  ✗ no match (contains "stereo")
│   │           └── metadata_*.json
│   └── success/
│       └── ...
├── CLVR/
├── GuptaLab/
└── ...
```

**Filter criteria**:
1. Only the `failure` folder is processed (configurable to `success` or `both`)
2. The file name starts with the digit `2`
3. The file name does not contain `stereo`
4. The file extension is `.mp4`
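
The four criteria above can be sketched as a small search function. This is a minimal sketch, not the script's actual code: the function name `find_video_files` appears later in this document, but its signature, its parameters, and the assumption that the second path component is always `failure` or `success` are illustrative guesses.

```python
from fnmatch import fnmatch
from pathlib import Path


def find_video_files(root, process_type="failure", pattern="2*.mp4",
                     exclude_stereo=True):
    """Return sorted video paths under root that satisfy the filter rules."""
    root = Path(root)
    # "both" means failure and success folders are accepted.
    wanted = {"failure", "success"} if process_type == "both" else {process_type}
    matches = []
    for path in root.rglob("*.mp4"):  # rule 4: extension is .mp4
        parts = path.relative_to(root).parts
        # Assumed layout: <source>/<failure|success>/<date>/<episode>/recordings/MP4/<file>
        if len(parts) < 2 or parts[1] not in wanted:  # rule 1
            continue
        if not fnmatch(path.name, pattern):  # rule 2 (via the name pattern)
            continue
        if exclude_stereo and "stereo" in path.name:  # rule 3
            continue
        matches.append(path)
    return sorted(matches)
```

Sorting the result keeps indices stable between runs, which matters when a later run resumes from a given `START_INDEX`.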

## Configuration

Edit the configuration at the top of `infer_caption_batch.py`:

```python
# ========== Dataset ==========
ROOT = Path("/playpen-ssd/dataset/droid_raw/1.0.1")
OUTPUT_FILE = "./output/caption_results_batch.json"

# ========== Model ==========
MODEL_NAME = 'Qwen/Qwen3-VL-8B-Instruct'
MAX_BATCH_SIZE = 2
MAX_TOKENS = 1024
TEMPERATURE = 0

# ========== Batch processing ==========
START_INDEX = 0       # Index of the first video to process (0 = from the beginning)
MAX_VIDEOS = 10       # Maximum number of videos to process (None = all)
PROCESS_BATCH = 1     # Videos per inference call (limited by GPU memory)

# ========== Video filter rules ==========
PROCESS_TYPE = "failure"        # "failure" | "success" | "both"
VIDEO_NAME_PATTERN = "2*.mp4"   # File name pattern
EXCLUDE_STEREO = True           # Whether to exclude stereo files
```

### Parameter reference

| Parameter | Description | Examples |
|-----|------|-----|
| `START_INDEX` | Index of the first video to process (0-based) | `0` (from the beginning), `100` (skip the first 100) |
| `MAX_VIDEOS` | Maximum number of videos to process | `10` (process 10), `None` (all) |
| `PROCESS_BATCH` | Videos per inference call | `1` (one at a time, stable), `2` (batched, faster but needs more memory) |
| `PROCESS_TYPE` | Which episode type to process | `"failure"`, `"success"`, `"both"` |
| `VIDEO_NAME_PATTERN` | File name glob pattern | `"2*.mp4"`, `"*.mp4"` |

## Usage

### 1. Basic usage

```bash
cd /home/jqliu/projects/RewardModel/caption

# Process the first 10 failure videos
python infer_caption_batch.py
```

### 2. Process a specific range

```python
# Edit the configuration
START_INDEX = 50  # Start at index 50
MAX_VIDEOS = 20   # Process 20 videos
```

```bash
python infer_caption_batch.py
```

This processes the videos at indices 50 to 69 (0-based, 20 in total).
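
How `START_INDEX`, `MAX_VIDEOS`, and `PROCESS_BATCH` interact can be illustrated with plain list slicing. The helper names `select_range` and `chunked` below are hypothetical and only show the arithmetic; the script itself may be organized differently.

```python
def select_range(videos, start_index=0, max_videos=None):
    """Return the slice of videos chosen by START_INDEX and MAX_VIDEOS."""
    end = None if max_videos is None else start_index + max_videos
    return videos[start_index:end]


def chunked(items, size):
    """Yield consecutive groups of at most `size` items (one inference call each)."""
    for i in range(0, len(items), size):
        yield items[i:i + size]


if __name__ == "__main__":
    all_videos = [f"video_{i}.mp4" for i in range(100)]
    selected = select_range(all_videos, start_index=50, max_videos=20)
    print(selected[0], selected[-1], len(selected))
    for batch in chunked(selected, 2):
        pass  # each `batch` would be sent to the model together
```

Because a Python slice stops at the end of the list, a range that runs past the last video simply returns fewer items rather than failing.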

### 3. Process all videos

```python
START_INDEX = 0
MAX_VIDEOS = None  # Process everything
```

### 4. Process success episodes

```python
PROCESS_TYPE = "success"
```

### 5. Process all camera views

```python
VIDEO_NAME_PATTERN = "*.mp4"  # No restriction on the file name
EXCLUDE_STEREO = False        # Include stereo videos
```

## Output Format

The output is a JSON file with the following structure:

```json
{
  "config": {
    "root": "/playpen-ssd/dataset/droid_raw/1.0.1",
    "process_type": "failure",
    "pattern": "2*.mp4",
    "exclude_stereo": true,
    "start_index": 0,
    "max_videos": 10,
    "model": "Qwen/Qwen3-VL-8B-Instruct",
    "total_videos_found": 5432,
    "videos_processed": 10
  },
  "results": [
    {
      "index": 0,
      "metadata": {
        "video_path": "/playpen-ssd/.../22008760.mp4",
        "video_name": "22008760.mp4",
        "source": "AUTOLab",
        "task_type": "failure",
        "date": "2023-07-12",
        "task_description": "Move object into or out of container",
        "metadata_path": "/playpen-ssd/.../metadata_*.json"
      },
      "caption": [
        {
          "stage": 0,
          "stage_name": "reach",
          "start": 0,
          "end": 45,
          "caption": "Robot arm extends toward the container",
          "reason": "Arm motion indicates reaching phase"
        },
        {
          "stage": 1,
          "stage_name": "grasp",
          "start": 46,
          "end": 78,
          "caption": "Gripper closes around the object",
          "reason": "Visible gripper closure and contact with object"
        },
        {
          "task_success": 0,
          "reason": "Object slipped from gripper during lift phase"
        }
      ],
      "raw_caption": "...",
      "timestamp": "2025-12-10T12:34:56"
    },
    ...
  ],
  "timestamp": "2025-12-10T12:35:00"
}
```

### Field reference

- `config`: the configuration used for the run
- `results`: list of annotation results, one entry per video
  - `index`: global index of the video
  - `metadata`: video metadata
    - `video_path`: full path
    - `source`: data source (e.g. AUTOLab)
    - `task_description`: task description (extracted from the metadata JSON)
  - `caption`: parsed annotation result (JSON); the last entry holds the `task_success` verdict
  - `raw_caption`: raw model output
  - `timestamp`: processing time
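
The `metadata` fields can be derived from the directory layout shown earlier. The sketch below is one possible way to do it, under several assumptions: the function name `extract_metadata` is hypothetical, the path layout is taken from the dataset tree above, and the key that holds the task description inside `metadata_*.json` is not stated in this document, so it is exposed as the `description_key` parameter (its default `"current_task"` is a guess).

```python
import json
from pathlib import Path


def extract_metadata(video_path, root, description_key="current_task"):
    """Derive metadata fields from a video path and its episode metadata file."""
    video_path = Path(video_path)
    # Assumed layout: <source>/<task_type>/<date>/<episode>/recordings/MP4/<file>
    parts = video_path.relative_to(root).parts
    source, task_type, date = parts[0], parts[1], parts[2]
    episode_dir = Path(root).joinpath(*parts[:4])

    # Use the first metadata_*.json found in the episode directory, if any.
    metadata_files = sorted(episode_dir.glob("metadata_*.json"))
    metadata_path = metadata_files[0] if metadata_files else None
    task_description = None
    if metadata_path is not None:
        with open(metadata_path, "r", encoding="utf-8") as f:
            # description_key is an assumption; adjust it to the real key name.
            task_description = json.load(f).get(description_key)

    return {
        "video_path": str(video_path),
        "video_name": video_path.name,
        "source": source,
        "task_type": task_type,
        "date": date,
        "task_description": task_description,
        "metadata_path": str(metadata_path) if metadata_path else None,
    }
```

Returning `None` for a missing metadata file keeps the record shape constant, so downstream analysis does not need to special-case incomplete episodes.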

## Inspecting Results

### 1. With jq

```bash
# Show the configuration
jq '.config' output/caption_results_batch.json

# Count the processed videos
jq '.results | length' output/caption_results_batch.json

# Show the first result
jq '.results[0]' output/caption_results_batch.json

# Show the success/failure verdict for every video
jq '.results[].caption[-1].task_success' output/caption_results_batch.json

# Compute the success rate (records without a verdict, e.g. errors, are skipped)
jq '[.results[].caption[-1].task_success | select(. != null)] | add / length' output/caption_results_batch.json
```

### 2. With Python

```python
import json

with open('output/caption_results_batch.json', 'r') as f:
    data = json.load(f)

# Count successes and failures (records whose caption is null are skipped)
success = sum(1 for r in data['results']
              if r['caption'] and r['caption'][-1].get('task_success') == 1)
failure = sum(1 for r in data['results']
              if r['caption'] and r['caption'][-1].get('task_success') == 0)

print(f"Success: {success}, Failure: {failure}")

# Print the failure reasons
for r in data['results']:
    if r['caption'] and r['caption'][-1].get('task_success') == 0:
        reason = r['caption'][-1].get('reason')
        print(f"Failure: {reason}")
```

## Batching Strategy

When there are many videos, process them in several runs:

### Option 1: Process in segments

```bash
# Note: the VAR=value prefixes only take effect if the script reads these
# environment variables; otherwise edit the constants at the top of the script.

# First run: indices 0-99
START_INDEX=0 MAX_VIDEOS=100 python infer_caption_batch.py

# Second run: indices 100-199
START_INDEX=100 MAX_VIDEOS=100 python infer_caption_batch.py

# Third run: indices 200-299
START_INDEX=200 MAX_VIDEOS=100 python infer_caption_batch.py
```

### Option 2: Automate the runs with a script

Create `run_batch.sh`:

```bash
#!/bin/bash

TOTAL=1000
BATCH_SIZE=100

# Stop at TOTAL - 1 so that no extra, empty run starts at index TOTAL
for i in $(seq 0 "$BATCH_SIZE" $((TOTAL - 1))); do
    echo "Processing batch starting at $i"

    # Override the configuration and run
    # (assumes the script accepts these command-line flags)
    python infer_caption_batch.py \
        --start-index "$i" \
        --max-videos "$BATCH_SIZE" \
        --output "output/batch_${i}.json"
done
```
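
Both options above only work if the script accepts overrides from outside, whereas the configuration section describes constants edited at the top of the file. A minimal sketch of how such overrides could be wired in with `argparse` follows; the function name `parse_overrides` is hypothetical, and reading `START_INDEX`, `MAX_VIDEOS`, and `OUTPUT_FILE` from the environment is an assumption about how Option 1 is meant to work, not a confirmed feature of the script.

```python
import argparse
import os


def parse_overrides(argv=None, environ=None):
    """Read run settings from CLI flags, falling back to environment variables."""
    env = os.environ if environ is None else environ

    def env_int(name, default):
        # Treat unset/empty as the default and the literal "none" as None.
        raw = env.get(name)
        if raw is None or raw == "":
            return default
        return None if raw.lower() == "none" else int(raw)

    parser = argparse.ArgumentParser(description="Batch caption overrides (sketch)")
    parser.add_argument("--start-index", type=int,
                        default=env_int("START_INDEX", 0))
    parser.add_argument("--max-videos", type=int,
                        default=env_int("MAX_VIDEOS", None))
    parser.add_argument("--output",
                        default=env.get("OUTPUT_FILE",
                                        "./output/caption_results_batch.json"))
    return parser.parse_args(argv)


if __name__ == "__main__":
    args = parse_overrides()
    print(args.start_index, args.max_videos, args.output)
```

Command-line flags take precedence over environment variables here, because the environment only supplies the defaults that `argparse` uses when a flag is absent.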

## Performance Tuning

### GPU memory

If you run into OOM (out of memory) errors:

```python
import os

# Option 1: reduce the batch size
PROCESS_BATCH = 1  # One video at a time

# Option 2: reduce the number of frames
os.environ['FPS_MAX_FRAMES'] = '30'  # Cap at 30 frames

# Option 3: use a smaller model
MODEL_NAME = 'Qwen/Qwen2.5-VL-7B-Instruct'
```

### Speeding up

If memory is plentiful:

```python
# Increase the batch size
PROCESS_BATCH = 4  # Process 4 videos per inference call
MAX_BATCH_SIZE = 4
```

## Error Handling

The script handles errors automatically:

1. **A single video fails**: the error is recorded and processing continues with the next video
2. **A batch fails**: the whole batch is marked as failed and processing continues with the next batch
3. **Model loading fails**: the program exits

Example error record:

```json
{
  "index": 5,
  "metadata": {...},
  "caption": null,
  "raw_caption": null,
  "error": "CUDA out of memory",
  "timestamp": "2025-12-10T12:35:00"
}
```
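
The per-video behavior described above (record the error, keep going) can be sketched with a `try`/`except` around each inference call. This is an illustration under stated assumptions: `caption_videos` and the `infer_fn` callable are hypothetical names, and the record shape follows the error example above rather than the script's real code.

```python
import json
from datetime import datetime


def caption_videos(metadata_list, infer_fn, start_index=0):
    """Run infer_fn on each video; record failures instead of stopping."""
    results = []
    for index, metadata in enumerate(metadata_list, start=start_index):
        record = {"index": index, "metadata": metadata,
                  "caption": None, "raw_caption": None}
        try:
            raw = infer_fn(metadata["video_path"])  # hypothetical model call
            record["raw_caption"] = raw
            record["caption"] = json.loads(raw)  # parse the model's JSON output
        except Exception as exc:
            # Any per-video failure (inference or parsing) is stored, not raised.
            record["error"] = str(exc)
        record["timestamp"] = datetime.now().isoformat(timespec="seconds")
        results.append(record)
    return results
```

If the model answers but its output is not valid JSON, `raw_caption` is still kept alongside the `error` message, which makes such cases easy to inspect afterwards.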

## FAQ

### Q1: How do I process only one data source?

Modify the search logic:

```python
# Inside the find_video_files function
sources = [d for d in root_dir.iterdir()
           if d.is_dir() and d.name == "AUTOLab"]  # Only process AUTOLab
```

### Q2: How do I process every video that starts with 2, including stereo ones?

```python
EXCLUDE_STEREO = False
```

### Q3: How do I check progress?

The program uses `tqdm` to show a progress bar that updates in real time:
```
Processing:  45%|████████      | 45/100 [12:34<14:23, 0.06it/s]
```

### Q4: How do I verify the quality of the results?

```python
# Spot-check a random sample (`data` is the loaded output JSON)
import random
results = data['results']
sample = random.sample(results, 5)

for r in sample:
    print(f"Video: {r['metadata']['video_name']}")
    print(f"Caption: {r['caption']}")
    print("-" * 50)
```
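
## Future Improvements

Possible additions:

1. **Parallel processing**: run on multiple GPUs in parallel
2. **Checkpointing**: save intermediate results and resume from where a run stopped
3. **Quality checks**: automatically validate the format of the output JSON
4. **Visualization**: generate an HTML report of the annotation results
5. **Export**: convert to other formats (CSV, HDF5, etc.)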
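
As an illustration of the quality-check idea, the sketch below validates one parsed `caption` against the structure shown in the output format section: stage entries followed by a final verdict entry. It is not part of the current script; the name `validate_caption` and the specific checks are assumptions derived from the example output.

```python
STAGE_KEYS = {"stage", "stage_name", "start", "end", "caption", "reason"}
VERDICT_KEYS = {"task_success", "reason"}


def validate_caption(caption):
    """Return a list of problems in one parsed caption; an empty list means valid."""
    if not isinstance(caption, list) or not caption:
        return ["caption is not a non-empty list"]
    problems = []
    *stages, verdict = caption  # the last entry is expected to be the verdict
    for i, stage in enumerate(stages):
        if not isinstance(stage, dict):
            problems.append(f"stage {i} is not an object")
            continue
        missing = STAGE_KEYS - stage.keys()
        if missing:
            problems.append(f"stage {i} is missing {sorted(missing)}")
        elif stage["start"] > stage["end"]:
            problems.append(f"stage {i} starts after it ends")
    if not isinstance(verdict, dict) or not VERDICT_KEYS <= verdict.keys():
        problems.append("final entry lacks task_success or reason")
    elif verdict["task_success"] not in (0, 1):
        problems.append("task_success is not 0 or 1")
    return problems
```

Running this over `data['results']` and collecting the non-empty lists would give a quick count of malformed annotations before any further analysis.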

## Example: Complete Workflow

```bash
# 1. First test on a single video
python infer_caption_v0.py

# 2. Test batch processing on a small batch
# Edit infer_caption_batch.py: MAX_VIDEOS = 5
python infer_caption_batch.py

# 3. Check the results
jq '.results[0]' output/caption_results_batch.json

# 4. If the results look right, process everything
# Edit: MAX_VIDEOS = None
python infer_caption_batch.py

# 5. Analyze the results
python analyze_captions.py output/caption_results_batch.json
```