Qwen3-VL-8B — Document → Markdown (Fine-Tuned)

Developed by: vanishingradient
License: Apache-2.0
Base model: unsloth/Qwen3-VL-8B-Instruct-unsloth-bnb-4bit

This is a fine-tuned Qwen3-VL-8B vision-language model optimized for document understanding and structured Markdown generation from images such as scanned pages, rendered PDFs, screenshots, and technical documents.

The model was fine-tuned with Unsloth and Hugging Face TRL, which enables faster training and lower VRAM usage while preserving output fidelity.


Capabilities

  • Image → structured Markdown
  • Document layout preservation
  • Headings, lists, tables, inline formatting
  • Technical and academic documents
  • Low-VRAM inference (4-bit quantized)

Training Details

  • Framework: Unsloth + Hugging Face TRL
  • Quantization: 4-bit (bnb)
  • Objective: Instruction-tuned image-to-text generation
  • Domain focus: Documents and structured layouts

Inference Example

import torch
from PIL import Image
from transformers import (
    AutoModelForVision2Seq,
    AutoProcessor,
    BitsAndBytesConfig,
    TextStreamer,
)

model_id = "vanishingradient/qwen-docs-finetuned"

# Load model in 4-bit (fits comfortably on 16 GB VRAM).
# Passing load_in_4bit directly to from_pretrained is deprecated;
# use a BitsAndBytesConfig instead.
model = AutoModelForVision2Seq.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    device_map="auto",
    trust_remote_code=True,
    quantization_config=BitsAndBytesConfig(load_in_4bit=True),
)

processor = AutoProcessor.from_pretrained(
    model_id,
    trust_remote_code=True
)

# --------------------------------------------------
# PLACEHOLDER: path to your local image file
# --------------------------------------------------
image = Image.open("/path/to/your/document_image.png")

messages = [
    {
        "role": "user",
        "content": [
            {"type": "image"},
            {"type": "text", "text": "Convert this image to markdown format."}
        ]
    }
]

text = processor.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)

inputs = processor(
    text=[text],
    images=[image],
    return_tensors="pt"
).to(model.device)

streamer = TextStreamer(
    processor.tokenizer,
    skip_prompt=True,
    skip_special_tokens=True,
)

_ = model.generate(
    **inputs,
    streamer=streamer,
    max_new_tokens=1024,
    do_sample=True,  # required for temperature to take effect
    temperature=0.1,
)
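If you need the Markdown as a string (for example, to save it to disk) rather than streamed to stdout, generation can run without the streamer. This continues from the example above (`model`, `processor`, and `inputs` already defined); the `output.md` filename is only an illustration.

```python
# Continuation of the example above: capture the output instead of streaming.
output_ids = model.generate(**inputs, max_new_tokens=1024, do_sample=False)

# Drop the prompt tokens so only the newly generated text remains.
generated = output_ids[:, inputs["input_ids"].shape[1]:]
markdown = processor.batch_decode(generated, skip_special_tokens=True)[0]

with open("output.md", "w", encoding="utf-8") as f:  # illustrative filename
    f.write(markdown)
```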