Instructions to use prithivMLmods/Lumian-VLR-7B-Thinking with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use prithivMLmods/Lumian-VLR-7B-Thinking with Transformers:
```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("image-text-to-text", model="prithivMLmods/Lumian-VLR-7B-Thinking")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"},
        ],
    },
]
pipe(text=messages)
```

```python
# Load model directly
from transformers import AutoProcessor, AutoModelForImageTextToText

processor = AutoProcessor.from_pretrained("prithivMLmods/Lumian-VLR-7B-Thinking")
model = AutoModelForImageTextToText.from_pretrained("prithivMLmods/Lumian-VLR-7B-Thinking")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"},
        ],
    },
]
inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)
outputs = model.generate(**inputs, max_new_tokens=40)
print(processor.decode(outputs[0][inputs["input_ids"].shape[-1]:]))
```

- Notebooks
- Google Colab
- Kaggle
- Local Apps
- vLLM
How to use prithivMLmods/Lumian-VLR-7B-Thinking with vLLM:
Install from pip and serve the model
```shell
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "prithivMLmods/Lumian-VLR-7B-Thinking"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "prithivMLmods/Lumian-VLR-7B-Thinking",
    "messages": [
      {
        "role": "user",
        "content": [
          {"type": "text", "text": "Describe this image in one sentence."},
          {"type": "image_url", "image_url": {"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"}}
        ]
      }
    ]
  }'
```

Use Docker

```shell
docker model run hf.co/prithivMLmods/Lumian-VLR-7B-Thinking
```
- SGLang
How to use prithivMLmods/Lumian-VLR-7B-Thinking with SGLang:
Install from pip and serve the model
```shell
# Install SGLang from pip:
pip install sglang

# Start the SGLang server:
python3 -m sglang.launch_server \
  --model-path "prithivMLmods/Lumian-VLR-7B-Thinking" \
  --host 0.0.0.0 \
  --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "prithivMLmods/Lumian-VLR-7B-Thinking",
    "messages": [
      {
        "role": "user",
        "content": [
          {"type": "text", "text": "Describe this image in one sentence."},
          {"type": "image_url", "image_url": {"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"}}
        ]
      }
    ]
  }'
```

Use Docker images

```shell
docker run --gpus all \
  --shm-size 32g \
  -p 30000:30000 \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  --env "HF_TOKEN=<secret>" \
  --ipc=host \
  lmsysorg/sglang:latest \
  python3 -m sglang.launch_server \
    --model-path "prithivMLmods/Lumian-VLR-7B-Thinking" \
    --host 0.0.0.0 \
    --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "prithivMLmods/Lumian-VLR-7B-Thinking",
    "messages": [
      {
        "role": "user",
        "content": [
          {"type": "text", "text": "Describe this image in one sentence."},
          {"type": "image_url", "image_url": {"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"}}
        ]
      }
    ]
  }'
```

- Docker Model Runner
How to use prithivMLmods/Lumian-VLR-7B-Thinking with Docker Model Runner:
```shell
docker model run hf.co/prithivMLmods/Lumian-VLR-7B-Thinking
```
Lumian-VLR-7B-Thinking
Lumian-VLR-7B-Thinking is an experimental, high-fidelity vision-language reasoning model designed for fine-grained multimodal understanding. Built on Qwen2.5-VL-7B-Instruct, it enhances image captioning, sampled video reasoning, and document comprehension through explicit grounded reasoning: the model produces structured reasoning traces aligned with visual coordinates, enabling explainable multimodal reasoning. Trained via supervised fine-tuning (SFT) on visually grounded reasoning traces and further refined with GRPO reinforcement learning, Lumian delivers step-by-step chain-of-thought reasoning with strong visual grounding.
Model Subfolder: Lumian-VLR-7B-Thinking (think-preview)
Model Folder: Lumian-VLR-7B-Thinking (no-think-single-shot)
Quick Start with Transformers (think-preview) 🤗
```shell
pip install git+https://github.com/huggingface/transformers.git
```
```python
# Load Lumian-VLR-7B-Thinking (think-preview subfolder)
import torch
from transformers import Qwen2_5_VLForConditionalGeneration, AutoProcessor
from qwen_vl_utils import process_vision_info

MODEL_ID = "prithivMLmods/Lumian-VLR-7B-Thinking"
SUBFOLDER = "think-preview"
device = "cuda" if torch.cuda.is_available() else "cpu"

processor = AutoProcessor.from_pretrained(MODEL_ID, trust_remote_code=True, subfolder=SUBFOLDER)
model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    MODEL_ID,
    trust_remote_code=True,
    subfolder=SUBFOLDER,
    torch_dtype=torch.float16,
).to(device).eval()
```
Key Enhancements
- Visually-Grounded Reasoning and Thinking Traces: Generates explicit reasoning traces tied to image regions and document structures for transparent and explainable outputs.
- Advanced Image Captioning: Produces detailed, grounded captions with reasoning steps for improved scene understanding.
- Sampled Video Reasoning: Handles long-duration videos with temporal reasoning for question answering and summarization.
- Context-Aware Document Analysis: Excels at structured and unstructured content extraction with visual grounding.
- Fine-Grained Visual Grounding: Accurately links reasoning steps to tables, charts, and graphical elements.
- Reinforcement-Learned Thinking: GRPO training incentivizes accurate, grounded reasoning with minimal hallucinations.
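As an illustration of the sampled-video capability above, a video query uses the same chat schema as an image query, following Qwen2.5-VL message conventions. A minimal sketch (the file path and `fps` value are placeholders, not from this model card):

```python
# Sketch of a sampled-video message in the Qwen2.5-VL chat schema.
# The local path and fps below are illustrative placeholders.
video_message = {
    "role": "user",
    "content": [
        {
            "type": "video",
            "video": "file:///path/to/clip.mp4",  # placeholder path
            "fps": 1.0,  # sample roughly one frame per second
        },
        {"type": "text", "text": "Summarize this video with thinking traces."},
    ],
}

# The message plugs into the same apply_chat_template / process_vision_info
# pipeline shown in the image examples on this page:
messages = [video_message]
content_types = [part["type"] for part in messages[0]["content"]]
print(content_types)  # → ['video', 'text']
```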
Thinking Traces
The model outputs reasoning and answers in a structured format:
```
<think>
Step 1: Identify the main elements in the image and their positions.
Step 2: Analyze the relationships between objects and surrounding context.
Step 3: Derive the final answer based on spatial reasoning and visual cues.
</think>
<answer>
The image depicts a person holding an open book with highlighted sections on the left page.
</answer>
```
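Because the reasoning and answer arrive in fixed tags, the two parts can be separated with light post-processing. A minimal sketch (the helper name and regex are ours, not part of the model's API):

```python
import re

def split_trace(output: str) -> tuple[str, str]:
    """Split a Lumian response into (reasoning, answer) strings.

    Returns an empty string for a missing part, since the model may
    occasionally omit a tag.
    """
    think = re.search(r"<think>(.*?)</think>", output, re.DOTALL)
    answer = re.search(r"<answer>(.*?)</answer>", output, re.DOTALL)
    return (
        think.group(1).strip() if think else "",
        answer.group(1).strip() if answer else "",
    )

sample = (
    "<think>\nStep 1: Identify the main elements.\n</think>\n"
    "<answer>\nThe image depicts a person holding an open book.\n</answer>"
)
reasoning, answer = split_trace(sample)
print(answer)  # → The image depicts a person holding an open book.
```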
Quick Start with Transformers (single-shot)
```python
from transformers import Qwen2_5_VLForConditionalGeneration, AutoProcessor
from qwen_vl_utils import process_vision_info

model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    "prithivMLmods/Lumian-VLR-7B-Thinking", torch_dtype="auto", device_map="auto"
)
processor = AutoProcessor.from_pretrained("prithivMLmods/Lumian-VLR-7B-Thinking")

messages = [
    {
        "role": "user",
        "content": [
            {
                "type": "image",
                "image": "https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen-VL/assets/demo.jpeg",
            },
            {"type": "text", "text": "Describe this image with thinking traces."},
        ],
    }
]

text = processor.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
image_inputs, video_inputs = process_vision_info(messages)
inputs = processor(
    text=[text],
    images=image_inputs,
    videos=video_inputs,
    padding=True,
    return_tensors="pt",
)
inputs = inputs.to("cuda")

generated_ids = model.generate(**inputs, max_new_tokens=256)
generated_ids_trimmed = [
    out_ids[len(in_ids):] for in_ids, out_ids in zip(inputs.input_ids, generated_ids)
]
output_text = processor.batch_decode(
    generated_ids_trimmed, skip_special_tokens=True, clean_up_tokenization_spaces=False
)
print(output_text)
```
Intended Use
- Visual reasoning with grounded, step-by-step thinking traces.
- Explainable image captioning and sampled video reasoning.
- Multimodal document retrieval, extraction, and analytical interpretation.
- Transparent chain-of-thought reasoning for educational, research, and enterprise use.
- Multilingual reasoning and structured content extraction.
- Robotic and mobile vision-based automation with grounded decision-making.
Limitations
- High memory requirements for long videos and large document batches.
- Degraded accuracy on extremely low-resolution or obscured visuals.
- Suboptimal for real-time inference on edge devices.
- Visual token configuration strongly influences reasoning fidelity.
- Occasional reasoning drift or partial grounding errors.
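The visual-token sensitivity noted above can be reasoned about quantitatively. Qwen2.5-VL-based models split each image into 14x14 patches and merge them 2x2 before the language model, so an image of h x w pixels costs roughly ceil(h/28) * ceil(w/28) visual tokens. A back-of-the-envelope sketch (the patch and merge sizes are the Qwen2.5-VL defaults; the exact count also depends on the processor's resizing, which this ignores):

```python
import math

PATCH = 14   # Qwen2.5-VL vision patch size (pixels)
MERGE = 2    # 2x2 patch merging before the language model

def approx_visual_tokens(height: int, width: int) -> int:
    """Approximate visual-token cost of an image, ignoring resizing."""
    unit = PATCH * MERGE  # 28 pixels of image per merged token side
    return math.ceil(height / unit) * math.ceil(width / unit)

# A full document page costs far more tokens than a small thumbnail:
print(approx_visual_tokens(1344, 1008))  # 48 * 36 = 1728 tokens
print(approx_visual_tokens(448, 448))    # 16 * 16 = 256 tokens
```

This is why processor settings such as `min_pixels`/`max_pixels` (which bound the resized image area) trade reasoning fidelity against memory and latency.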
References
- YaRN: Efficient Context Window Extension of Large Language Models
- Qwen2-VL: Enhancing Vision-Language Model's Perception of the World at Any Resolution
- Qwen-VL: A Versatile Vision-Language Model for Understanding, Localization, Text Reading, and Beyond
- A Comprehensive and Challenging OCR Benchmark for Evaluating Large Multimodal Models in Literacy
- Ground-R1: Incentivizing Grounded Visual Reasoning via Reinforcement Learning
Model tree for prithivMLmods/Lumian-VLR-7B-Thinking
Base model: Qwen/Qwen2.5-VL-7B-Instruct