# Qwen3-VL-8B — Document → Markdown (Fine-Tuned)

- Developed by: vanishingradient
- License: Apache-2.0
- Base model: unsloth/Qwen3-VL-8B-Instruct-unsloth-bnb-4bit
This is a fine-tuned Qwen3-VL-8B vision-language model optimized for document understanding and structured markdown generation from images such as scanned pages, rendered PDF pages, screenshots, and technical documents.

The model was fine-tuned with Unsloth and Hugging Face TRL, which speed up training and reduce VRAM usage while maintaining output fidelity.
## Capabilities
- Image → structured Markdown
- Document layout preservation
- Headings, lists, tables, inline formatting
- Technical and academic documents
- Low-VRAM inference (4-bit quantized)
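Because the model emits plain markdown, its output can be spot-checked with lightweight standard-library tooling. The helper below is an illustrative sketch (not part of the model or its pipeline) that counts the structural elements a document conversion should preserve: headings, list items, and table rows.

```python
import re

def markdown_structure_summary(md: str) -> dict:
    """Count structural elements in generated markdown.

    Purely illustrative: a cheap sanity check on converted documents,
    not part of the model's tooling.
    """
    lines = md.splitlines()
    return {
        "headings": sum(1 for l in lines if re.match(r"#{1,6}\s", l)),
        "list_items": sum(1 for l in lines if re.match(r"\s*[-*+]\s", l)),
        "table_rows": sum(1 for l in lines if l.lstrip().startswith("|")),
    }

example = "# Title\n\n- item one\n- item two\n\n| a | b |\n|---|---|\n| 1 | 2 |\n"
print(markdown_structure_summary(example))
```

A mismatch between the counts for the source layout and the generated markdown (e.g. a table collapsing to zero rows) is a quick signal that a page needs re-conversion.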
## Training Details
- Framework: Unsloth + Hugging Face TRL
- Quantization: 4-bit (bnb)
- Objective: Instruction-tuned image-to-text generation
- Domain focus: Documents and structured layouts
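The training script itself was not published; the sketch below shows roughly what an Unsloth + TRL vision fine-tune of this base model looks like. All hyperparameters, the LoRA settings, and the dataset variable are assumptions for illustration, not the actual recipe.

```python
# Hypothetical fine-tuning sketch -- hyperparameters and dataset are
# assumptions, not the published recipe.
from unsloth import FastVisionModel, is_bfloat16_supported
from unsloth.trainer import UnslothVisionDataCollator
from trl import SFTTrainer, SFTConfig

model, tokenizer = FastVisionModel.from_pretrained(
    "unsloth/Qwen3-VL-8B-Instruct-unsloth-bnb-4bit",
    load_in_4bit=True,
)

# LoRA adapters on the language layers (illustrative r/alpha values).
model = FastVisionModel.get_peft_model(
    model,
    finetune_vision_layers=False,
    finetune_language_layers=True,
    r=16,
    lora_alpha=16,
)

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    data_collator=UnslothVisionDataCollator(model, tokenizer),
    train_dataset=dataset,  # image + markdown pairs, prepared separately (not shown)
    args=SFTConfig(
        per_device_train_batch_size=2,
        gradient_accumulation_steps=4,
        learning_rate=2e-4,
        fp16=not is_bfloat16_supported(),
        bf16=is_bfloat16_supported(),
        output_dir="outputs",
        remove_unused_columns=False,
        dataset_kwargs={"skip_prepare_dataset": True},
    ),
)
trainer.train()
```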
## Inference Example
```python
from transformers import (
    AutoModelForVision2Seq,
    AutoProcessor,
    BitsAndBytesConfig,
    TextStreamer,
)
import torch
from PIL import Image

model_id = "vanishingradient/qwen-docs-finetuned"

# Load the model in 4-bit (fits on 16 GB VRAM).
# Passing `load_in_4bit=True` directly is deprecated; use BitsAndBytesConfig.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,
)
model = AutoModelForVision2Seq.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",
    trust_remote_code=True,
)
processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)

# --------------------------------------------------
# PLACEHOLDER: path to your local image file
# --------------------------------------------------
image = Image.open("/path/to/your/document_image.png")

messages = [
    {
        "role": "user",
        "content": [
            {"type": "image"},
            {"type": "text", "text": "Convert this image to markdown format."},
        ],
    }
]

# Build the chat-formatted prompt, then tokenize text and image together.
text = processor.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)
inputs = processor(
    text=[text],
    images=[image],
    return_tensors="pt",
).to(model.device)

# Stream the generated markdown token by token.
streamer = TextStreamer(processor.tokenizer, skip_prompt=True)
_ = model.generate(
    **inputs,
    streamer=streamer,
    max_new_tokens=1024,
    do_sample=True,  # required for temperature to take effect
    temperature=0.1,
)
```
## Model Tree

Model tree for vanishingradient/qwen-docs-finetuned:

- Base model: Qwen/Qwen3-VL-8B-Instruct