qwen3vl-resume-parser

A QLoRA fine-tune of Qwen/Qwen3-VL-8B-Instruct that reads resume/CV page images and returns a fixed 23-field JSON record. Published as merged full weights (BF16 safetensors, ~9B params), so it loads like any standard Qwen3-VL checkpoint — no adapter to attach.

It started as an internal project at Corporate Solutions Group: the production resume parser ran on Qwen2.5-VL-32B (8-bit, ~50 GB VRAM) behind a vLLM server, and the goal was a smaller model that kept parsing quality while cutting GPU cost. This repo is the public, portfolio version of that work. The training data is a private internal dataset and is not redistributed; all code (data prep, training, eval) is open at github.com/sukhrobnurali/resume-trainer.

TL;DR

Base: Qwen/Qwen3-VL-8B-Instruct (QLoRA, then merged → BF16).
Task: resume page image(s) → structured JSON (23 fields: identity, contact, skills, experiences, educations, languages, certificates, projects, preferences).
Why fine-tune: the 23-field schema and the project's formatting rules are baked into the weights, so a one-line prompt replaces the ~280-line schema prompt the 32B base needed.
Measured (full 51-sample held-out split, A100, BF16, greedy): 83.9% weighted score, 88.2% unweighted, 88.2% JSON-valid. See Evaluation for the honest caveats.
Footprint: ~23 GB VRAM in BF16 at 16K context (vs. ~50 GB for the 32B it replaces).

Intended use

Extracting structured data from resume/CV documents rendered to images (PDF → PNG per page). The model is tuned for a specific downstream schema (below) used by a recruiting/ATS pipeline, including its enum vocabularies (PascalCase country names, a fixed list of roles/technologies/industries). It is most useful when you want one model call to turn a resume into a database-ready record.

It is not a general document-VQA model and should not be used to make automated decisions about candidates — see Out-of-scope.

Input / output schema

Input: one or more page images of a single resume, plus the short instruction the model was trained with (see How to use).

Output: a single JSON object with 23 top-level fields. Scalars are null when absent, list fields default to [], and address is always an object with the keys country_name and region_name.

| Field | Type | Notes |
| --- | --- | --- |
| `first_name`, `last_name` | string | |
| `email`, `phone` | string | |
| `date_of_birth` | string | `YYYY-MM-DD` |
| `desired_position` | string | mapped to a fixed role vocabulary |
| `about` | string | free-text summary |
| `job_experience` | number | total years |
| `job_expectations`, `min_salary`, `max_salary` | string / number | |
| `ready_to_relocation` | bool | |
| `work_modes`, `employment_types`, `employment_durations` | string[] | enum values |
| `hobbies` | string | |
| `address` | object | `{country_name, region_name}` |
| `skills` | object[] | `{skill_name, level}` |
| `experiences` | object[] | `{company_name, job, date_from, date_to, description, country_name}` |
| `educations` | object[] | `{name, degree, location, programme, date_from, date_to, country_name}` |
| `languages` | object[] | `{language_name, level}` (level is an int) |
| `certificates` | object[] | `{certificate_name, certificate_programme, issuing_date, expiring_date}` |
| `projects` | object[] | `{title, summary, used_technologies[], role, industries[]}` |
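
The defaults described above can be made concrete with a small helper. This is a minimal sketch, not code from the repo: the names `empty_record` and `missing_or_extra_fields` are hypothetical, and the null values inside `address` are an assumption (the card only states the object's keys).

```python
# Field groups copied from the schema table. The default values follow the
# stated rule (scalars null, lists empty); null address values are an assumption.
SCALAR_FIELDS = [
    "first_name", "last_name", "email", "phone", "date_of_birth",
    "desired_position", "about", "job_experience", "job_expectations",
    "min_salary", "max_salary", "ready_to_relocation", "hobbies",
]
LIST_FIELDS = [
    "work_modes", "employment_types", "employment_durations",
    "skills", "experiences", "educations", "languages",
    "certificates", "projects",
]


def empty_record() -> dict:
    """Build a record with all 23 top-level fields set to their defaults."""
    record = {name: None for name in SCALAR_FIELDS}
    record.update({name: [] for name in LIST_FIELDS})
    record["address"] = {"country_name": None, "region_name": None}
    return record


def missing_or_extra_fields(record: dict) -> tuple[set, set]:
    """Compare a parsed record's keys against the 23 expected ones."""
    expected = set(empty_record())
    return expected - set(record), set(record) - expected
```

A check like `missing_or_extra_fields(resume)` can run right after parsing to catch records that drifted from the schema before they reach a database.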

Dates are normalized to YYYY-MM-DD (year-only ranges expand to Jan 1 / Dec 31; ongoing roles set date_to: null). Classification fields (desired_position, project role / used_technologies / industries, and all country_name fields) are mapped to predefined option lists, falling back to "Other" when nothing matches.
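
The normalization rules in the paragraph above can be sketched in a few lines. This is an illustration of the stated rules only, not the project's actual code: `normalize_date`, `map_to_vocabulary`, and the list of "ongoing" markers are hypothetical, and the case-insensitive matching is an assumption.

```python
import re
from datetime import date

ONGOING_MARKERS = {"", "present", "current", "now"}  # assumed spellings


def normalize_date(raw, is_end=False):
    """Return YYYY-MM-DD, or None for missing, ongoing, or unrecognized values."""
    if raw is None:
        return None
    text = str(raw).strip()
    if text.lower() in ONGOING_MARKERS:
        return None
    if re.fullmatch(r"\d{4}", text):
        # Year-only values expand to Jan 1 (start) or Dec 31 (end).
        month, day = (12, 31) if is_end else (1, 1)
        return date(int(text), month, day).isoformat()
    if re.fullmatch(r"\d{4}-\d{2}-\d{2}", text):
        try:
            return date.fromisoformat(text).isoformat()
        except ValueError:
            return None
    return None  # this sketch does not guess at other formats


def map_to_vocabulary(value, options):
    """Match a value against a predefined option list, falling back to "Other"."""
    lookup = {opt.lower(): opt for opt in options}
    return lookup.get(str(value).strip().lower(), "Other")
```

For example, `normalize_date("2019", is_end=True)` gives `"2019-12-31"`, and a technology that is not in the option list maps to `"Other"`.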

Real (anonymized) output example:

{
  "first_name": "Jane",
  "last_name": "Doe",
  "date_of_birth": null,
  "email": "jane@example.com",
  "phone": "+1-555-0100",
  "desired_position": "Android Developer",
  "about": null,
  "job_experience": null,
  "job_expectations": null,
  "min_salary": null,
  "max_salary": null,
  "ready_to_relocation": false,
  "work_modes": [],
  "employment_types": [],
  "employment_durations": [],
  "hobbies": null,
  "address": { "country_name": "Uzbekistan", "region_name": "Tashkent" },
  "skills": [
    { "skill_name": "Android Development", "level": null },
    { "skill_name": "Kotlin", "level": null },
    { "skill_name": "Firebase", "level": null }
  ],
  "experiences": [
    {
      "company_name": "Android Development Course",
      "job": "Student / Trainee (Android Development)",
      "date_from": "2021-01-01",
      "date_to": null,
      "description": "Android development course focused on Java/Kotlin/Android.",
      "country_name": null
    }
  ],
  "languages": [
    { "language_name": "Uzbek", "level": 6 },
    { "language_name": "English", "level": 2 },
    { "language_name": "Russian", "level": 0 }
  ],
  "educations": [
    {
      "name": "Tashkent University of Information Technologies",
      "degree": "Bachelor",
      "location": "Tashkent",
      "programme": "E-Commerce",
      "date_from": null,
      "date_to": "2019-01-01",
      "country_name": "Uzbekistan"
    }
  ],
  "certificates": [],
  "projects": [
    {
      "title": "Wallpaper App",
      "summary": "Wallpaper app based on MVVM, Coin, Flow, Retrofit.",
      "used_technologies": ["Kotlin", "Other"],
      "role": "Mobile Developer(IOS/Android)",
      "industries": ["Other"]
    }
  ]
}

Training data

513 human-verified resume samples (private internal dataset). Each sample is a PDF rendered to one or more page PNGs plus a verified ground-truth JSON record.
Split: 462 train / 51 held-out eval, 90/10, fixed seed 42. Samples whose estimated token length exceeded ~15.2K (1K below the 16,384 context budget) were dropped from training, so the effective training count is ≤462.
Page distribution: 276 single-page, 136 two-page, 101 three-or-more-page (up to 8).
Language: predominantly English; some records contain non-English values (e.g. Russian/Uzbek company or language names).

The dataset is not released. Code to rebuild splits and bundles is in the repo (src/data_prep.py, src/export_eval_bundle.py).
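
For readers without access to the repo, a seeded 90/10 split of 513 samples can be sketched as follows. This is an assumption about the general approach, not a copy of `src/data_prep.py`; `split_samples` is a hypothetical name, and the real script may shuffle or round differently.

```python
import random


def split_samples(sample_ids, eval_fraction=0.10, seed=42):
    """Shuffle deterministically, then hold out the first eval_fraction of ids."""
    ids = sorted(sample_ids)            # stable order before shuffling
    random.Random(seed).shuffle(ids)    # fixed seed makes the split repeatable
    n_eval = int(len(ids) * eval_fraction)
    return ids[n_eval:], ids[:n_eval]   # (train, eval)


train_ids, eval_ids = split_samples(range(513))
print(len(train_ids), len(eval_ids))  # 462 51
```

Rounding the eval share down (51 of 513) is what yields the 462 / 51 counts reported above.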

Training procedure

QLoRA via Unsloth (FastVisionModel) + TRL SFTTrainer. The 4-bit base (unsloth/Qwen3-VL-8B-Instruct-unsloth-bnb-4bit, nf4) was adapted with LoRA on both the vision and language towers (attention + MLP modules), then the adapter was merged back into the full model and published.

Each training example is a single user turn — the page images followed by the combined system+user instruction — with the ground-truth JSON as the assistant target. There is no separate system role; this is why inference uses the same short prompt.

Dtype note: the merge used Unsloth's merged_16bit, and the original upload was labeled "float16", but the published config.json and stored tensors are bfloat16. Treat this model as BF16.

Hyperparameters

| Hyperparameter | Value |
| --- | --- |
| Method | QLoRA (4-bit nf4 base + LoRA, merged after training) |
| LoRA rank / alpha / dropout | 16 / 16 / 0 |
| Target modules | vision + language layers, attention + MLP (`bias="none"`, no rslora) |
| Learning rate | 2e-4 |
| LR scheduler / warmup | cosine / 10 steps |
| Optimizer | `adamw_8bit` |
| Weight decay | 0.01 |
| Per-device batch / grad-accum | 1 / 4 (effective batch 4) |
| Epochs | 1 |
| Max sequence length | 16,384 |
| Precision | bf16 (fp16 fallback if unsupported) |
| Seed | 3407 |
| Hardware | Google Colab L4 (24 GB) |

Training time and final loss were not captured from the run.
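
As a rough illustration of what the batch settings imply, the number of optimizer steps can be estimated from the sample count. This is back-of-the-envelope arithmetic, not a value logged from the run: `optimizer_steps` is a hypothetical helper, 462 is an upper bound on the effective training count, and the trainer may round the last partial batch differently.

```python
import math


def optimizer_steps(num_samples, per_device_batch=1, grad_accum=4,
                    epochs=1, num_devices=1):
    """Estimate optimizer steps for a run, rounding the last partial batch up."""
    effective_batch = per_device_batch * grad_accum * num_devices
    return math.ceil(num_samples / effective_batch) * epochs


steps = optimizer_steps(462)
print(steps)  # 116
```

Under these assumptions the 10 warmup steps cover a little under a tenth of the single epoch.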

Evaluation

Measured on 2026-06-05 with notebooks/eval_finetuned.ipynb against the held-out split, using the project's field-weighted scorer (src/evaluation.py). Setup: the published BF16 weights on a single A100, greedy decoding (do_sample=False, max_new_tokens=4096), on the full 51-sample held-out split.

| Metric | Result |
| --- | --- |
| Overall weighted score | 83.9% |
| Overall unweighted score | 88.2% |
| JSON validity | 88.2% (45/51 parsed; 6 failures) |
| Avg. inference | ~92.0 s/resume |
| Peak VRAM | 23.4 GB |

Per-field accuracy (worst → best):

| Field | Acc | Field | Acc |
| --- | --- | --- | --- |
| skills | 67.5% | ready_to_relocation | 88.2% |
| phone | 74.5% | certificates | 90.8% |
| desired_position | 79.2% | projects | 91.0% |
| address | 81.2% | job_expectations | 92.7% |
| experiences | 81.7% | hobbies | 96.1% |
| first_name | 82.3% | date_of_birth | 98.0% |
| last_name | 82.3% | work_modes | 98.0% |
| email | 84.3% | employment_types | 98.0% |
| job_experience | 84.3% | employment_durations | 98.0% |
| educations | 84.5% | min_salary | 100.0% |
| languages | 87.2% | max_salary | 100.0% |
| about | 88.2% | | |

Read these numbers with the following caveats:

Full held-out split, single run. These are all 51 held-out samples with greedy decoding — a real measurement, but one run on a modest test set, not a large benchmark.
Partial-credit metric. The scorer uses fuzzy string ratios, date/numeric tolerances, and greedy best-match over object arrays, with fields weighted by importance (work experience is weighted highest). It is not strict exact-match and is not comparable to other parsers' published numbers — it is an internal quality signal. The weighted score (83.9%) is below the unweighted (88.2%) because the highest-weighted fields — experiences, skills, identity/contact — are also the hardest ones.
The top-scoring fields are mostly "correctly empty." min_salary/max_salary (100%) and date_of_birth, work_modes, employment_types, employment_durations (~98%) are almost always absent in this data, so high scores largely reflect correctly returning empty — not hard extraction.
6/51 invalid JSON (~12%). The most likely cause is output truncated at the 4096-token generation limit (max_new_tokens=4096) on long multi-page resumes; downstream code must handle unparseable output (retry, repair, or shorter prompts).
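
To show how a partial-credit, field-weighted score behaves, here is a minimal sketch using the standard library's `difflib`. It is not the scorer in `src/evaluation.py`: the function names, the treatment of empty fields, and the weights are all hypothetical, and the real scorer also handles dates, numbers, and object arrays.

```python
from difflib import SequenceMatcher


def fuzzy_ratio(pred, truth):
    """Similarity in [0, 1]; two empty values count as a full match."""
    if pred is None and truth is None:
        return 1.0
    if pred is None or truth is None:
        return 0.0
    return SequenceMatcher(None, str(pred).lower(), str(truth).lower()).ratio()


def score_record(pred, truth, weights):
    """Return (weighted, unweighted) scores over the fields named in weights."""
    per_field = {f: fuzzy_ratio(pred.get(f), truth.get(f)) for f in weights}
    unweighted = sum(per_field.values()) / len(per_field)
    weighted = sum(per_field[f] * w for f, w in weights.items()) / sum(weights.values())
    return weighted, unweighted


# Hypothetical weights: the harder field carries more weight.
weights = {"first_name": 3.0, "hobbies": 1.0}
truth = {"first_name": "Jane", "hobbies": None}
pred = {"first_name": "Jan", "hobbies": None}
weighted, unweighted = score_record(pred, truth, weights)
```

In this toy case the "correctly empty" `hobbies` field scores 1.0 while the near-miss on `first_name` is weighted more heavily, so the weighted score lands below the unweighted one, the same pattern described in the caveats.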

For context, the model-selection benchmark that led to Qwen3-VL-8B (base models, ~10 samples, not reproducible from committed outputs) is noted in the repo's SESSION_LOG.md; it is not a fine-tuned result and is excluded here.

How to use

Requires a recent transformers (≥4.57 for Qwen3-VL; latest recommended). The published processor carries the correct chat template, so the modern image-in-messages path works without extra utilities.

# pip install -U transformers accelerate
import json
from transformers import AutoProcessor, Qwen3VLForConditionalGeneration

model_id = "sukhrobnurali/qwen3vl-resume-parser"
model = Qwen3VLForConditionalGeneration.from_pretrained(
    model_id, dtype="auto", device_map="auto", attn_implementation="sdpa",
)
processor = AutoProcessor.from_pretrained(model_id)

# The 23-field schema is baked into the weights, so the short training prompt is all it needs.
SYSTEM_PROMPT = "You are a resume parser. Extract information from resume images into structured JSON."
USER_PROMPT = "Parse this resume and return the structured JSON."

# One entry per page, top to bottom. "url" accepts a local file path or an http(s) URL.
pages = ["resume_page_1.png", "resume_page_2.png"]

messages = [{
    "role": "user",
    "content": (
        [{"type": "text", "text": SYSTEM_PROMPT}]
        + [{"type": "image", "url": p} for p in pages]
        + [{"type": "text", "text": USER_PROMPT}]
    ),
}]

inputs = processor.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)
inputs.pop("token_type_ids", None)

generated = model.generate(**inputs, max_new_tokens=4096, do_sample=False)
trimmed = generated[:, inputs["input_ids"].shape[1]:]
text = processor.batch_decode(trimmed, skip_special_tokens=True)[0]

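try:
    resume = json.loads(text)  # the 23-field record
    print(json.dumps(resume, indent=2, ensure_ascii=False))
except json.JSONDecodeError as err:
    # Output can be truncated or malformed (see Evaluation); handle it explicitly.
    print(f"Model output was not valid JSON: {err}")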

Use greedy decoding (do_sample=False) for stable structured output. For long multi-page resumes, raise max_new_tokens if you see truncated JSON.
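
Because truncated or malformed output does occur, a tolerant parser is a reasonable first line of defense. The helper below is a sketch under simple assumptions (the name `parse_model_json` is hypothetical): it only trims text around the outermost braces and does not attempt to repair truncated JSON.

```python
import json


def parse_model_json(text):
    """Return the parsed record as a dict, or None if no JSON object is recoverable."""
    start = text.find("{")
    end = text.rfind("}")
    if start == -1 or end <= start:
        return None  # no complete object, e.g. output cut off mid-record
    try:
        parsed = json.loads(text[start:end + 1])
    except json.JSONDecodeError:
        return None
    return parsed if isinstance(parsed, dict) else None
```

A `None` result is the signal to retry, for example with a larger `max_new_tokens`, or to route the resume to manual review.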

vLLM serving (the original deployment target):

vllm serve sukhrobnurali/qwen3vl-resume-parser \
  --dtype bfloat16 --max-model-len 16384 --trust-remote-code

When calling through the OpenAI-compatible API with the Python client, pass extra_body={"chat_template_kwargs": {"enable_thinking": False}} to keep the model in non-thinking (direct-JSON) mode.
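
A request body for the server's `/v1/chat/completions` endpoint might be assembled as follows. This is a sketch, not the production client: `image_part` and `build_payload` are hypothetical helpers, sending pages as base64 data URLs is one option among several, and `temperature` and `max_tokens` are chosen here to mirror the greedy, 4096-token setup used elsewhere in this card. With a raw JSON body, `chat_template_kwargs` sits at the top level, which is where the client's `extra_body` would place it.

```python
import base64
import json
from pathlib import Path

SYSTEM_PROMPT = "You are a resume parser. Extract information from resume images into structured JSON."
USER_PROMPT = "Parse this resume and return the structured JSON."


def image_part(path):
    """Encode one page PNG as an OpenAI-style image_url content part."""
    data = base64.b64encode(Path(path).read_bytes()).decode("ascii")
    return {"type": "image_url", "image_url": {"url": f"data:image/png;base64,{data}"}}


def build_payload(page_paths, model="sukhrobnurali/qwen3vl-resume-parser"):
    """Single user turn: instruction text, page images in order, then the request."""
    content = [{"type": "text", "text": SYSTEM_PROMPT}]
    content += [image_part(p) for p in page_paths]
    content.append({"type": "text", "text": USER_PROMPT})
    return {
        "model": model,
        "messages": [{"role": "user", "content": content}],
        "temperature": 0,     # greedy decoding
        "max_tokens": 4096,
        "chat_template_kwargs": {"enable_thinking": False},
    }
```

The resulting dict can be serialized with `json.dumps` and posted with any HTTP client.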

Limitations

Domain skew. Training resumes skew toward IT/software roles, and the enum vocabularies (roles, technologies, industries) are IT-centric. Expect degradation on non-technical resumes, unusual layouts, scans/photos, or handwriting.
Language. English-dominant; non-English resumes are under-represented.
Schema lock-in. The model is tuned to one specific 23-field schema and its enum lists. It will coerce values toward those vocabularies (including "Other"), which may not match a different downstream schema.
Invalid JSON happens (~12% on the held-out split). Always parse defensively.
Latency. ~90 s/resume on an A100 at 16K context — batch/offline, not real-time.
Quantization. BF16 peaks at ~23 GB VRAM; it runs in 4-bit on a 16 GB GPU, but accuracy was only measured in BF16.

Out-of-scope and responsible use

No automated candidate decisions. Resume parsing for screening/ranking carries fairness and bias risk. Keep a human in the loop; do not use this model to make or materially influence hiring decisions without review.
Not a general VQA / OCR model. It is specialized for this resume schema.
PII. Resumes contain personal data. Handle outputs under the applicable privacy law (e.g. GDPR) — secure storage, access control, retention limits, and a lawful basis for processing.
Verify before trusting. Outputs are model predictions, not ground truth; validate critical fields (contact info, dates) downstream.
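
Downstream validation of the critical fields can start small. The checks below are a sketch with hypothetical names (`field_problems`, `is_iso_date`) and a deliberately loose email pattern; they flag suspicious values for human review and do not prove a value is correct.

```python
import re
from datetime import date

EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")   # loose shape check only
ISO_DATE_RE = re.compile(r"^\d{4}-\d{2}-\d{2}$")


def is_iso_date(value):
    """True if value is a real calendar date written as YYYY-MM-DD."""
    if not isinstance(value, str) or not ISO_DATE_RE.match(value):
        return False
    try:
        date.fromisoformat(value)
        return True
    except ValueError:
        return False


def field_problems(record):
    """Return a list of fields that look wrong; empty (null) fields are not flagged."""
    problems = []
    email = record.get("email")
    if email is not None and not EMAIL_RE.match(str(email)):
        problems.append("email")
    dob = record.get("date_of_birth")
    if dob is not None and not is_iso_date(dob):
        problems.append("date_of_birth")
    for i, exp in enumerate(record.get("experiences") or []):
        start, end = exp.get("date_from"), exp.get("date_to")
        for key, value in (("date_from", start), ("date_to", end)):
            if value is not None and not is_iso_date(value):
                problems.append(f"experiences[{i}].{key}")
        # ISO dates compare correctly as strings.
        if is_iso_date(start) and is_iso_date(end) and start > end:
            problems.append(f"experiences[{i}]: date_from after date_to")
    return problems
```

An empty list means nothing was flagged by these checks; any entry is a reason to send the record to a reviewer.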

License

Released under Apache-2.0, inherited from the Qwen/Qwen3-VL-8B-Instruct base model.

Citation

@misc{nurali2026qwen3vlresumeparser,
  title        = {qwen3vl-resume-parser: a Qwen3-VL-8B fine-tune for resume-to-JSON extraction},
  author       = {Nurali, Sukhrob},
  year         = {2026},
  howpublished = {\url{https://huggingface.co/sukhrobnurali/qwen3vl-resume-parser}}
}

Built on Qwen3-VL by the Qwen team; see the Qwen3-VL model card and Unsloth for the training stack.

Author

Sukhrob Nurali (sukhrobnurali@gmail.com)
Hugging Face: @sukhrobnurali · GitHub: @sukhrobnurali
