**Tip:** Our inference code is still being updated. You can pass `--include '*.py'` to `huggingface-cli` to fetch only the updated inference code instead of re-downloading the whole model.
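For example, assuming the model lives in a Hugging Face repo (the repo id below is a placeholder), the code-only update can be fetched with:

```shell
# Fetch only the updated *.py inference files; skip the large weight files.
# "your-org/your-model" is a placeholder repo id, not this repository's real id.
huggingface-cli download your-org/your-model --include '*.py' --local-dir ./your-model
```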
---

### 1. Inference w/o. Efficiency Optimization

```python
from transformers import AutoTokenizer, AutoModel, AutoConfig, BitsAndBytesConfig
import torch

# ... (model loading and generation code elided in this excerpt)

print(response)
```
---

### 2. Inference w. Chunk-based Pre-filling

Chunk-based prefill significantly reduces memory demands and response latency by encoding the video input in a streaming manner. The advantage becomes particularly noticeable with longer videos.
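Conceptually, chunk-based prefill encodes the video a fixed-size chunk at a time instead of all frames at once, so the peak working set is one chunk rather than the full video. A toy, framework-free sketch of the idea (all names here are illustrative, not this repo's API):

```python
def chunked_prefill(frames, chunk_size, encode):
    """Encode frames chunk-by-chunk, streaming states into a KV-cache-like list.

    Only one chunk of frames is held in the working set at a time; the
    accumulated cache is what later decoding steps would attend over.
    """
    kv_cache = []
    for start in range(0, len(frames), chunk_size):
        chunk = frames[start:start + chunk_size]
        kv_cache.extend(encode(chunk))  # append this chunk's encoded states
    return kv_cache

# Toy "encoder": one state per frame (doubling stands in for real encoding).
frames = list(range(10))
cache = chunked_prefill(frames, chunk_size=4, encode=lambda c: [f * 2 for f in c])
```

The final cache is identical to what a one-shot encode of all frames would produce; only the peak memory profile differs.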
To enable this mode, set `enable_chunk_prefill` to `True` and configure the `prefill_config` parameters:

```python
# ... (full chunk-prefill example elided in this excerpt)

print(response)
```
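A hypothetical shape for these arguments is sketched below; the keys inside `prefill_config` are assumptions for illustration, not documented values from this repository:

```python
# Hypothetical sketch -- parameter names inside prefill_config are
# illustrative assumptions, not this repository's documented options.
generation_kwargs = dict(
    enable_chunk_prefill=True,   # turn on streaming (chunked) prefill
    prefill_config={
        "chunk_size": 4,         # frames encoded per chunk (assumed name)
    },
)
```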
---

### 3. Inference w. Chunk-based Pre-filling & Bi-level KVs Decoding

Coming soon.