# Preparing and Training with Video Metadata

This guide walks you through preparing your video metadata, splitting it into chunks for efficient training, extracting video features, and running the training scripts.

## 1. Prepare Your Data in `.jsonl` Format

Your video metadata should be organized in JSON Lines (`.jsonl`) format, where each line is a valid JSON object representing one video.

**Example** (pretty-printed for readability; in the file itself, each record occupies one line):
```json
{
  "video_path": "data/infinitystar_toy_data/videos/e06b8ca5dbc6.mp4",
  "begin_frame_id": 0,
  "end_frame_id": 120,
  "tarsier2_caption": "The video features an animated character with long light orange hair and brown eyes.",
  "width": 1280,
  "height": 720,
  "h_div_w": 0.5625,
  "fps": 24
}
```
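The Python sketch below writes records as single-line JSON and flags lines that fail to parse or lack a key. It assumes the eight keys in the example are the required set, and it derives `h_div_w` as `height / width` (720 / 1280 = 0.5625, matching the example). The helper names `make_record`, `write_jsonl`, and `validate_jsonl` are hypothetical and not part of the repository.

```python
import json

# Keys copied from the example record; whether the training code needs
# every one of them (or any others) is an assumption.
REQUIRED_KEYS = {
    "video_path", "begin_frame_id", "end_frame_id", "tarsier2_caption",
    "width", "height", "h_div_w", "fps",
}


def make_record(video_path, caption, width, height, fps, begin_frame_id, end_frame_id):
    """Build one metadata record; h_div_w is height divided by width."""
    return {
        "video_path": video_path,
        "begin_frame_id": begin_frame_id,
        "end_frame_id": end_frame_id,
        "tarsier2_caption": caption,
        "width": width,
        "height": height,
        "h_div_w": height / width,
        "fps": fps,
    }


def write_jsonl(records, path):
    """Write each record as compact JSON on its own line."""
    with open(path, "w", encoding="utf-8") as f:
        for record in records:
            f.write(json.dumps(record, ensure_ascii=False) + "\n")


def validate_jsonl(path):
    """Return (line_number, problem) pairs; an empty list means the file looks valid."""
    problems = []
    with open(path, encoding="utf-8") as f:
        for line_no, line in enumerate(f, start=1):
            if not line.strip():
                problems.append((line_no, "empty line"))
                continue
            try:
                record = json.loads(line)
            except json.JSONDecodeError as exc:
                problems.append((line_no, f"invalid JSON: {exc.msg}"))
                continue
            if not isinstance(record, dict):
                problems.append((line_no, "not a JSON object"))
                continue
            missing = REQUIRED_KEYS - set(record)
            if missing:
                problems.append((line_no, f"missing keys: {sorted(missing)}"))
    return problems
```

For example, `write_jsonl([make_record("data/infinitystar_toy_data/videos/e06b8ca5dbc6.mp4", "The video features ...", 1280, 720, 24, 0, 120)], "meta.jsonl")` followed by `validate_jsonl("meta.jsonl")` returns an empty list.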

## 2. Split Metadata for Training

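For more efficient training, large `.jsonl` files can be split into smaller chunks. In the command below, replace `JSONL_DIR` with the directory containing your `.jsonl` files and `SAVE_DIR` with the directory where the chunks will be written: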

```bash
python3 data/infinitystar_toy_data/split_jsonls_for_training.py --jsonl_folder_list JSONL_DIR --save_dir SAVE_DIR --chunk_size 100
```
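To make the chunking step concrete, here is a minimal Python sketch of what splitting a JSONL file into fixed-size pieces involves. It is not the repository's `split_jsonls_for_training.py`: it handles a single file rather than a list of folders, assumes `chunk_size` counts records (lines) per output file, and uses a made-up output naming scheme. The function name `split_jsonl` is hypothetical.

```python
from pathlib import Path


def split_jsonl(jsonl_path, save_dir, chunk_size=100):
    """Split one .jsonl file into files holding at most `chunk_size` records each."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be a positive integer")
    save_dir = Path(save_dir)
    save_dir.mkdir(parents=True, exist_ok=True)
    stem = Path(jsonl_path).stem
    written, chunk = [], []

    def flush():
        # Hypothetical naming scheme: <stem>_00000.jsonl, <stem>_00001.jsonl, ...
        out_path = save_dir / f"{stem}_{len(written):05d}.jsonl"
        out_path.write_text("".join(chunk), encoding="utf-8")
        written.append(out_path)
        chunk.clear()

    with open(jsonl_path, encoding="utf-8") as f:
        for line in f:
            if not line.strip():
                continue  # skip blank lines so every output line is a record
            chunk.append(line if line.endswith("\n") else line + "\n")
            if len(chunk) == chunk_size:
                flush()
    if chunk:
        flush()  # final, possibly shorter, chunk
    return written
```

Running `split_jsonl("meta.jsonl", "chunks", chunk_size=100)` on a 250-record file produces three files holding 100, 100, and 50 records.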

## 3. Extract Video Features

To extract video features, edit `scripts/extract_video_features.sh`: set `video_data_path` and choose the desired resolution:

*   **480p (5s):** `pn=0.40M`
*   **480p (10s):** `pn=0.40M` with `video_frames=161`
*   **720p (5s):** `pn=0.90M`

Then, run the script:
```bash
bash scripts/extract_video_features.sh
```
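The resolution options above amount to a small lookup table. The following Python sketch encodes exactly those three presets and rejects any other combination, such as 720p at 10 seconds. The function name `feature_settings` is hypothetical, and settings the list does not mention (for instance `video_frames` for 5-second clips) are deliberately left out rather than guessed.

```python
# Presets transcribed from the resolution list; anything not listed there
# is intentionally absent.
PRESETS = {
    ("480p", 5): {"pn": "0.40M"},
    ("480p", 10): {"pn": "0.40M", "video_frames": 161},
    ("720p", 5): {"pn": "0.90M"},
}


def feature_settings(resolution, seconds):
    """Return the extract_video_features.sh settings for a resolution/duration pair."""
    try:
        return dict(PRESETS[(resolution, seconds)])  # copy so callers can't mutate PRESETS
    except KeyError:
        supported = ", ".join(f"{r} ({s}s)" for r, s in PRESETS)
        raise ValueError(
            f"unsupported combination {resolution} ({seconds}s); supported: {supported}"
        ) from None
```

`feature_settings("480p", 10)` returns `{"pn": "0.40M", "video_frames": 161}`.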

## 4. Run Training Scripts

Once your metadata is prepared and features are extracted, you can start training.

**480p Training (5s or 10s):**
```bash
bash scripts/train_480p.sh
```

**720p Training (5s only):**
```bash
bash scripts/train_720p.sh
```
The 480p configuration supports both 5-second and 10-second video training. For 10-second training, set `video_frames` to `161` in both `scripts/extract_video_features.sh` and `scripts/train_480p.sh` so that the extracted features match the training configuration.
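A mismatch between the two scripts is easy to miss, so a quick check before a long run can help. This Python sketch assumes each script sets the value either as a `video_frames=161` assignment or as `video_frames 161` / `video_frames=161` on a command line; the regular expression, the function names `find_video_frames` and `check_10s_config`, and the default script paths are assumptions to adapt to your checkout.

```python
import re
from pathlib import Path

# Assumed syntax: "video_frames=161", "--video_frames=161" or "--video_frames 161".
VIDEO_FRAMES_RE = re.compile(r"video_frames(?:\s*=\s*|\s+)(\d+)")


def find_video_frames(script_path):
    """Return every video_frames value found in a shell script."""
    text = Path(script_path).read_text(encoding="utf-8")
    return [int(v) for v in VIDEO_FRAMES_RE.findall(text)]


def check_10s_config(
    scripts=("scripts/extract_video_features.sh", "scripts/train_480p.sh"),
    expected=161,
):
    """Return a list of problems; an empty list means every script uses `expected`."""
    problems = []
    for script in scripts:
        values = find_video_frames(script)
        if not values:
            problems.append(f"{script}: no video_frames setting found")
        elif any(v != expected for v in values):
            problems.append(f"{script}: video_frames={values}, expected {expected}")
    return problems
```

Calling `check_10s_config()` from the repository root returns an empty list when both scripts set `video_frames` to 161, and otherwise one message per offending script.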