Changli committed 7a4fb89 (verified)
1 parent: cec43c6

Update README.md

Files changed (1)
  1. README.md +5 -3
README.md CHANGED
@@ -21,13 +21,15 @@ video-SALMONN 2+ is built on Qwen 2.5-VL using a similar pipeline of video-SALMO
 
 ## How to Use
 
+**IMPORTANT**: To get the same evaluation result, please use `--max_frames 768 --max_pixels 61250`. Using excessively high resolution or frame rate for evaluation may lead to too much input token count for the model, potentially causing performance degradation.
+
 1. Prepare the dataset following `scripts/example_av.json`, `scripts/example_v.json`, `scripts/example_dpo.json`, and `scripts/example_a.json`
 2. Prepare base audio model through modifying the path in `gen_audio_model.py`
 3. To conduct audio alignment, use the following script:
 ```bash
 bash scripts/train.sh --interval 0.1 --run_name audio_alignment --dataset path_to_dataset --lr 2e-5 --train_qformer --max_frames 768 --max_pixels 61250 --model path_to_audio_model --model_base path_to_audio_model --bs 16 --epoch 5 --save_steps 5000
 ```
-4. To conduct audio visual SFT, use the following script:
+4. To conduct audio-visual SFT, use the following script:
 ```bash
 bash scripts/train.sh --interval 0.1 --run_name av_sft --dataset path_to_dataset --lr 2e-5 --train_qformer --train_proj --max_frames 768 --max_pixels 61250 --model audio_align_model --model_base path_to_audio_model --epoch 5 --save_steps 2000 --use_lora --lora_r 128 --lora_alpha 256
 ```
@@ -35,11 +37,11 @@ video-SALMONN 2+ is built on Qwen 2.5-VL using a similar pipeline of video-SALMO
 ```bash
 bash scripts/train.sh --interval 0.1 --run_name dpo --dataset path_to_dataset --max_frames 768 --max_pixels 61250 --model audio_visual_base --model_base audio_align_model --lora_ckpt audio_visual_checkpoint --train_type gdpo --use_lora --lora_r 128 --lora_alpha 256 --lr 5e-6 --epoch 1 --save_steps 200 --train_qformer --train_proj
 ```
-6. To evaluate 3B/7B model, use the following script:
+6. To evaluate 7B model, use the following script:
 ```bash
 bash scripts/test.sh --interval 0.1 --run_name eval --dataset path_to_dataset --max_frames 768 --max_pixels 61250 --model path_to_audio_model --model_base path_to_audio_model --lora_ckpt model_ckpt
 ```
 7. To evaluate 72B model, use the following script:
 ```bash
 bash scripts/test_8.sh --interval 0.1 --run_name eval --dataset path_to_dataset --max_frames 768 --max_pixels 61250 --model path_to_audio_model --model_base path_to_audio_model --lora_ckpt model_ckpt
-```
+```
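After this commit, the two evaluation steps differ only in the launcher (`scripts/test.sh` for 7B, `scripts/test_8.sh` for 72B), and the new IMPORTANT note asks for `--max_frames 768 --max_pixels 61250` at evaluation time. Below is a minimal dry-run sketch that assembles either command with that budget pinned. The function names `eval_script_for` and `build_eval_cmd` are hypothetical and not part of the repository. The sketch only prints the command, and the dataset, model, and checkpoint arguments are placeholders, as in the README.

```shell
#!/usr/bin/env bash
# Hypothetical dry-run helper (not part of the repository): prints the
# README's evaluation command instead of running it.

# Frame/pixel budget recommended by the README's IMPORTANT note.
EVAL_MAX_FRAMES=768
EVAL_MAX_PIXELS=61250

# Map a model size to the launcher the updated README names (steps 6 and 7).
# Only the two sizes listed after this commit are covered.
eval_script_for() {
  case "$1" in
    7B)  echo "scripts/test.sh" ;;
    72B) echo "scripts/test_8.sh" ;;
    *)   echo "no evaluation script listed for model size: $1" >&2; return 1 ;;
  esac
}

# build_eval_cmd SIZE DATASET MODEL LORA_CKPT
# Like the README, it passes the same model path to --model and --model_base.
build_eval_cmd() {
  local size="$1" dataset="$2" model="$3" lora_ckpt="$4"
  local script
  script="$(eval_script_for "$size")" || return 1
  printf 'bash %s --interval 0.1 --run_name eval --dataset %s --max_frames %s --max_pixels %s --model %s --model_base %s --lora_ckpt %s\n' \
    "$script" "$dataset" "$EVAL_MAX_FRAMES" "$EVAL_MAX_PIXELS" "$model" "$model" "$lora_ckpt"
}
```

For example, `build_eval_cmd 72B path_to_dataset path_to_audio_model model_ckpt` prints the step 7 command verbatim. Printing the command instead of executing it keeps the sketch side-effect free, so the output can be reviewed before it is run.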
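The IMPORTANT note ties reproducible evaluation to two specific flags, so a guard that runs before `scripts/test.sh` could catch a deviating budget. The sketch below is hypothetical: `check_eval_budget` is not in the repository. It assumes the space-separated `--flag value` form used by the README commands and fails if either flag is missing or differs from the recommended value.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check (not part of the repository): fails unless the
# arguments set the evaluation budget recommended in the README.

RECOMMENDED_MAX_FRAMES=768
RECOMMENDED_MAX_PIXELS=61250

# check_eval_budget ARGS...
# Assumes "--max_frames 768" style arguments, not "--max_frames=768".
check_eval_budget() {
  local frames="" pixels=""
  while [ "$#" -gt 0 ]; do
    case "$1" in
      --max_frames) frames="${2-}" ;;
      --max_pixels) pixels="${2-}" ;;
    esac
    shift  # step one argument at a time; a flag's value is ignored on its own turn
  done
  if [ "$frames" != "$RECOMMENDED_MAX_FRAMES" ] || [ "$pixels" != "$RECOMMENDED_MAX_PIXELS" ]; then
    echo "expected --max_frames $RECOMMENDED_MAX_FRAMES --max_pixels $RECOMMENDED_MAX_PIXELS," \
      "got --max_frames ${frames:-<unset>} --max_pixels ${pixels:-<unset>}" >&2
    return 1
  fi
}
```

One possible use is a wrapper that runs `check_eval_budget "$@" && bash scripts/test.sh "$@"`. The comparison is an exact string match because the note asks for these exact values rather than an upper bound.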