TVP-Thinking with Visual Primitives
How to use yunfengwang/TVP-SFTBox-Qwen2VL-2B with PEFT:
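# Qwen2-VL is a vision-language model, so the base is loaded with its
# conditional-generation class rather than AutoModelForCausalLM.
from peft import PeftModel
from transformers import Qwen2VLForConditionalGeneration
base_model = Qwen2VLForConditionalGeneration.from_pretrained("Qwen/Qwen2-VL-2B-Instruct")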
model = PeftModel.from_pretrained(base_model, "yunfengwang/TVP-SFTBox-Qwen2VL-2B")

Box expert LoRA adapter for Thinking with Visual Primitives.
Stage 2: Specialized SFT (Box Expert). Grounding, counting, and spatial reasoning with structured thinking.
1. **Analyzing the request**
The user asks me to locate the person in this image.
2. **Object grounding**
I see a <|ref|>person<|/ref|><|box|>[[511,208,738,963]]<|/box|>.
3. **Conclusion**
The person is located at the specified coordinates.
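
The sample output above embeds each detection as a `<|ref|>label<|/ref|><|box|>[[...]]<|/box|>` pair. As a minimal sketch of how such text might be post-processed, the snippet below pulls out the labels and their four-number boxes with a regular expression. The function name `parse_box_primitives` is hypothetical and not part of the project; the coordinate convention (pixel values versus a normalized grid, and the order of the four numbers) is not stated in this card, so the values are returned unchanged.

```python
import ast
import re

# Matches one <|ref|>label<|/ref|><|box|>[[...]]<|/box|> pair, following the
# format seen in the sample output (an assumption based on that one example).
PRIMITIVE_RE = re.compile(
    r"<\|ref\|>(?P<label>.*?)<\|/ref\|>\s*<\|box\|>(?P<boxes>.*?)<\|/box\|>",
    re.DOTALL,
)


def parse_box_primitives(text):
    """Return a list of (label, [n1, n2, n3, n4]) tuples found in `text`."""
    results = []
    for match in PRIMITIVE_RE.finditer(text):
        label = match.group("label").strip()
        try:
            # The payload is a list of four-number lists, e.g. [[511,208,738,963]].
            boxes = ast.literal_eval(match.group("boxes").strip())
        except (ValueError, SyntaxError):
            continue  # skip payloads that are not valid list literals
        if not isinstance(boxes, list):
            continue
        for box in boxes:
            # Keep only well-formed boxes with exactly four entries.
            if isinstance(box, list) and len(box) == 4:
                results.append((label, [int(v) for v in box]))
    return results


sample = "I see a <|ref|>person<|/ref|><|box|>[[511,208,738,963]]<|/box|>."
print(parse_box_primitives(sample))  # [('person', [511, 208, 738, 963])]
```

A label followed by several boxes yields one tuple per box, which keeps downstream drawing or counting code simple.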
See the project repo for full instructions.