# Preparing and Training with Video Metadata

This guide walks you through preparing your video metadata, splitting it for efficient training, extracting video features, and running the training scripts.

## 1. Prepare Your Data in `.jsonl` Format

Your video metadata should be organized in JSON Lines (`.jsonl`) format, where each line is a valid JSON object representing one video. The example below is pretty-printed for readability; in the actual file, each record must occupy a single line.

**Example:**

```json
{
  "video_path": "data/infinitystar_toy_data/videos/e06b8ca5dbc6.mp4",
  "begin_frame_id": 0,
  "end_frame_id": 120,
  "tarsier2_caption": "The video features an animated character with long light orange hair and brown eyes.",
  "width": 1280,
  "height": 720,
  "h_div_w": 0.5625,
  "fps": 24
}
```
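
Before moving on, it can help to sanity-check the metadata file. The following is a minimal sketch, not part of the repository: the function names `check_record` and `check_jsonl_lines` are hypothetical, and it assumes that every field in the example record is required, that `end_frame_id` must exceed `begin_frame_id`, and that `h_div_w` equals `height / width` (as it does in the example, where 720 / 1280 = 0.5625).

```python
import json

# Field names taken from the example record; treating all of them as
# required is an assumption made for this sketch.
REQUIRED_FIELDS = (
    "video_path", "begin_frame_id", "end_frame_id", "tarsier2_caption",
    "width", "height", "h_div_w", "fps",
)


def check_record(record, tol=1e-3):
    """Return a list of problems found in one metadata record (empty if none)."""
    problems = [f"missing field: {name}" for name in REQUIRED_FIELDS if name not in record]
    if problems:
        return problems
    # Assumption: a clip must span at least one frame.
    if record["end_frame_id"] <= record["begin_frame_id"]:
        problems.append("end_frame_id must be greater than begin_frame_id")
    # In the example record, h_div_w equals height / width.
    if not record["width"]:
        problems.append("width must be non-zero")
    elif abs(record["h_div_w"] - record["height"] / record["width"]) > tol:
        problems.append("h_div_w does not match height / width")
    return problems


def check_jsonl_lines(lines):
    """Parse an iterable of JSONL lines; map 1-based line numbers to problems."""
    report = {}
    for lineno, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # skip blank lines
        try:
            record = json.loads(line)
        except json.JSONDecodeError as exc:
            report[lineno] = [f"invalid JSON: {exc.msg}"]
            continue
        if not isinstance(record, dict):
            report[lineno] = ["record is not a JSON object"]
            continue
        problems = check_record(record)
        if problems:
            report[lineno] = problems
    return report
```

Because `check_jsonl_lines` accepts any iterable of lines, an open file handle can be passed directly, for example `check_jsonl_lines(open("metadata.jsonl", encoding="utf-8"))`, where `metadata.jsonl` is a placeholder file name.
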

## 2. Split Metadata for Training

For efficient training, split large `.jsonl` files into smaller chunks. Replace `JSONL_DIR` with the folder containing your `.jsonl` files and `SAVE_DIR` with the output folder:

```bash
python3 data/infinitystar_toy_data/split_jsonls_for_training.py \
  --jsonl_folder_list JSONL_DIR \
  --save_dir SAVE_DIR \
  --chunk_size 100
```
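
To illustrate the idea of chunking, here is a minimal sketch of splitting one `.jsonl` file into fixed-size pieces. It is not the implementation of `split_jsonls_for_training.py`: the function name `split_jsonl`, the output naming scheme, and the interpretation of the chunk size as a number of records per file are all assumptions made for illustration.

```python
from itertools import islice
from pathlib import Path


def split_jsonl(src_path, save_dir, chunk_size=100):
    """Write consecutive groups of `chunk_size` records from src_path into save_dir."""
    src_path, save_dir = Path(src_path), Path(save_dir)
    save_dir.mkdir(parents=True, exist_ok=True)
    written = []
    with src_path.open("r", encoding="utf-8") as src:
        # Lazily iterate over non-blank lines so large files are never fully loaded.
        records = (line for line in src if line.strip())
        index = 0
        while True:
            chunk = list(islice(records, chunk_size))
            if not chunk:
                break
            # Hypothetical naming scheme: <source stem>_<4-digit chunk index>.jsonl
            out_path = save_dir / f"{src_path.stem}_{index:04d}.jsonl"
            with out_path.open("w", encoding="utf-8") as dst:
                for line in chunk:
                    dst.write(line.rstrip("\n") + "\n")
            written.append(out_path)
            index += 1
    return written
```

Under these assumptions, a file with 250 records and `chunk_size=100` yields three output files holding 100, 100, and 50 records.
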

## 3. Extract Video Features

To extract video features, edit `scripts/extract_video_features.sh`: set `video_data_path`, then choose the resolution and duration by setting `pn` (and `video_frames`, where listed):

* **480p (5s):** `pn=0.40M`
* **480p (10s):** `pn=0.40M` with `video_frames=161`
* **720p (5s):** `pn=0.90M`

Then, run the script:

```bash
bash scripts/extract_video_features.sh
```
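
Feature extraction reads the videos referenced by the metadata, so a quick pre-flight check for missing files may save a failed run. The sketch below is an illustration only: `find_missing_videos` is a hypothetical helper, and it assumes each `video_path` is relative to a root directory you supply (the example record uses a path relative to the repository root).

```python
import json
from pathlib import Path


def find_missing_videos(jsonl_path, root="."):
    """Return the video_path values in jsonl_path that do not exist under root."""
    missing = []
    with open(jsonl_path, "r", encoding="utf-8") as f:
        for line in f:
            if not line.strip():
                continue  # skip blank lines
            video_path = json.loads(line)["video_path"]
            # Assumption: paths in the metadata are relative to `root`.
            if not (Path(root) / video_path).is_file():
                missing.append(video_path)
    return missing
```

An empty return value means every referenced file was found under `root`.
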

## 4. Run Training Scripts

Once your metadata is prepared and features are extracted, you can start training.

**480p Training (5s or 10s):**

```bash
bash scripts/train_480p.sh
```

The 480p configuration supports both 5-second and 10-second video training. For 10-second training, ensure that `video_frames` is set to `161` in both `scripts/extract_video_features.sh` and `scripts/train_480p.sh`.

**720p Training (only 5s):**

```bash
bash scripts/train_720p.sh
```
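
The supported combinations of resolution and duration can be summarized as a small lookup table. This is a sketch for reference, not code from the repository: `PRESETS` and `get_preset` are hypothetical names, the values are copied from the list in section 3, and `None` for `video_frames` means "keep the value already in the script", since this guide only states a frame count for the 10-second case.

```python
# Values copied from the list in section 3. `None` for video_frames means
# "leave the script's existing value unchanged" (not stated in the guide).
PRESETS = {
    ("480p", 5): {"pn": "0.40M", "video_frames": None, "train_script": "scripts/train_480p.sh"},
    ("480p", 10): {"pn": "0.40M", "video_frames": 161, "train_script": "scripts/train_480p.sh"},
    ("720p", 5): {"pn": "0.90M", "video_frames": None, "train_script": "scripts/train_720p.sh"},
}


def get_preset(resolution, seconds):
    """Look up the settings for a resolution/duration pair, or raise ValueError."""
    try:
        return PRESETS[(resolution, seconds)]
    except KeyError:
        supported = ", ".join(f"{r} ({s}s)" for r, s in PRESETS)
        raise ValueError(
            f"unsupported combination {resolution} ({seconds}s); supported: {supported}"
        ) from None
```

For example, `get_preset("480p", 10)` returns the entry with `video_frames` set to `161`, while `get_preset("720p", 10)` raises a `ValueError` because 720p training supports only 5-second clips.
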