Allen8/TVC-7B

@@ -1,14 +1,18 @@
 ---
 library_name: transformers
 license: apache-2.0
-base_model: Qwen/Qwen2-VL-7B-Instruct
 tags:
 - llama-factory
 - full
 - generated_from_trainer
 model-index:
 - name: TVC-7B
   results: []
 ---
 ## Model Summary
@@ -16,10 +20,10 @@ model-index:
 The TVC models are 7B-parameter models based on the Qwen2-VL-7B-Instruct model, with a context window of 8K tokens.
 - **Repository:** https://github.com/sun-hailong/TVC
 - **Languages:** English, Chinese
 - **Paper:** https://arxiv.org/abs/2503.13360
 ### Model Architecture
 - **Architecture:** Qwen2-VL-7B-Instruct
@@ -39,6 +43,54 @@ The TVC models are 7B parameter models based on Qwen2-VL-7B-Instruct model with
 - Datasets 3.1.0
 - Tokenizers 0.20.3
 ## Citation
 ```

 ---
+base_model: Qwen/Qwen2-VL-7B-Instruct
 library_name: transformers
 license: apache-2.0
 tags:
 - llama-factory
 - full
 - generated_from_trainer
+- long-context
+- reasoning
+- multi-modal
 model-index:
 - name: TVC-7B
   results: []
+pipeline_tag: image-text-to-text
 ---
 ## Model Summary
 The TVC models are 7B-parameter models based on the Qwen2-VL-7B-Instruct model, with a context window of 8K tokens.
 - **Repository:** https://github.com/sun-hailong/TVC
+- **Project Page:** https://sun-hailong.github.io/projects/TVC/
 - **Languages:** English, Chinese
 - **Paper:** https://arxiv.org/abs/2503.13360
 ### Model Architecture
 - **Architecture:** Qwen2-VL-7B-Instruct
 - Datasets 3.1.0
 - Tokenizers 0.20.3
+## Quick Start
+```python
+from vllm import LLM, SamplingParams
+from PIL import Image
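+# This card describes TVC-7B, so load that checkpoint.
+model_name = "Allen8/TVC-7B"
+llm = LLM(
+    model=model_name,
+    trust_remote_code=True,
+    # Must evenly divide the model's attention-head count; 1 runs on a single GPU.
+    tensor_parallel_size=1,
+)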
+question = (
+    "Hint: Please answer the question requiring an integer answer and provide the final value, e.g., 1, 2, 3, at the end.\n"
+    "Question: Subtract all red things. Subtract all tiny matte balls. How many objects are left?\n"
+    "Please answer the question using a long-chain reasoning style and think step by step."
+)
+placeholder = "<|image_pad|>"
+prompt = (
+    "<|im_start|>system\nYou are a helpful assistant.<|im_end|>\n"
+    f"<|im_start|>user\n<|vision_start|>{placeholder}<|vision_end|>"
+    f"{question}<|im_end|>\n"
+    "<|im_start|>assistant\n"
+)
+sampling_params = SamplingParams(
+    temperature=0.0,
+    top_k=1,
+    top_p=1.0,
+    stop_token_ids=[],
+    repetition_penalty=1.05,
+    max_tokens=8192
+)
+image = Image.open("images/case1.png")
+inputs = {
+    "prompt": prompt,
+    "multi_modal_data": {"image": image},
+}
+outputs = llm.generate([inputs], sampling_params=sampling_params)
+print(outputs[0].outputs[0].text)
+```
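The prompt in the Quick Start follows a fixed chat layout: a system turn, a user turn carrying one image placeholder followed by the question, and an open assistant turn. As a minimal sketch, assuming a single image per request, that string could be assembled by a small helper. `build_prompt` and `IMAGE_PLACEHOLDER` are hypothetical names used for illustration only; they are not part of the TVC repository or of vLLM.

```python
# Hypothetical helper (an illustration, not part of the TVC repository):
# rebuilds the chat-style prompt string used in the Quick Start snippet.
IMAGE_PLACEHOLDER = "<|image_pad|>"


def build_prompt(question: str, system: str = "You are a helpful assistant.") -> str:
    """Return a single-image prompt that ends at the open assistant turn."""
    return (
        f"<|im_start|>system\n{system}<|im_end|>\n"
        f"<|im_start|>user\n<|vision_start|>{IMAGE_PLACEHOLDER}<|vision_end|>"
        f"{question}<|im_end|>\n"
        "<|im_start|>assistant\n"
    )


if __name__ == "__main__":
    # Show the exact string, including the escaped newlines.
    print(repr(build_prompt("How many objects are left?")))
```

The returned string can be passed as the `"prompt"` entry of the `inputs` dictionary in the Quick Start. Keeping the newlines as `\n` escapes inside the literals avoids the unterminated-string errors that appear when the template is pasted with raw line breaks.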
 ## Citation
 ```