Commit ce47a31 by yuzaa (verified)
1 parent: 14e45aa

Update README.md

Files changed (1)
  1. README.md +1 -1
README.md CHANGED
@@ -213,7 +213,7 @@ You can customize image/video processing by passing additional parameters to `ap
  |-----------|---------|------------|-------------|
  | `downsample_mode` | `"16x"` | Image & Video | Visual token downsampling. `"16x"` merges tokens for efficiency; `"4x"` keeps 4× more tokens for finer detail. Must also be passed to `generate()`. |
  | `max_slice_nums` | `9` | Image & Video | Maximum number of slices when splitting a high-resolution image. Higher values preserve more detail for large images. Recommended: `36` for image, `1` for video. |
- | `max_num_frames` | `128` | Video only | Maximum number of main frames sampled from the video. |
+ | `max_num_frames` | `128` | Video only | The `max_num_frames` parameter dynamically controls the temporal context length and prevents VRAM overflow: <br> **Short Videos** (duration ≤ `max_num_frames` sec): The processor defaults to **1 FPS**, capturing second-by-second details without hitting the upper limit. <br> **Long Videos** (duration > `max_num_frames` sec): The processor automatically switches to **uniform sampling**, selecting exactly `max_num_frames` evenly spaced across the entire timeline. |
  | `stack_frames` | `1` | Video only | Total sample points per second. `1` = main frame only (no stacking). `N` (N>1) = 1 main frame + N−1 sub-frames per second; the sub-frames are composited into a grid image and interleaved with main frames. Recommended setting is `1` for short videos, and `3` or `5` for long videos. |
  | `use_image_id` | `True` | Image & Video | Whether to prepend `<image_id>N</image_id>` tags before each image/frame placeholder. Set `True` for image, `False` for video. |
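
The sampling rule that the new `max_num_frames` description states can be sketched in a few lines of Python. This is an illustration only, not the processor's actual code: the function name `sample_frame_times` is hypothetical, and placing each long-video sample at the midpoint of an equal-length segment is an assumption, since the README only says the frames are evenly spaced.

```python
import math


def sample_frame_times(duration_sec: float, max_num_frames: int = 128) -> list[float]:
    """Return main-frame timestamps (in seconds) for a video.

    Hypothetical helper mirroring the README description:
    - duration <= max_num_frames seconds: sample at 1 FPS.
    - duration >  max_num_frames seconds: sample exactly
      max_num_frames evenly spaced timestamps.
    """
    if max_num_frames < 1:
        raise ValueError("max_num_frames must be at least 1")
    if duration_sec <= 0:
        return []

    if duration_sec <= max_num_frames:
        # Short video: one frame per second, starting at t = 0.
        # ceil(duration) <= max_num_frames, so the cap is never hit.
        return [float(t) for t in range(math.ceil(duration_sec))]

    # Long video: split the timeline into max_num_frames equal segments
    # and take the midpoint of each (assumed placement, see lead-in).
    step = duration_sec / max_num_frames
    return [(i + 0.5) * step for i in range(max_num_frames)]
```

With the default of `128`, a 5-second clip yields five timestamps one second apart, while a 10-minute clip yields exactly 128 timestamps about 4.7 seconds apart, so the frame count (and with it the visual token budget) stops growing once the video is longer than `max_num_frames` seconds.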