qwen3vl-resume-parser
A QLoRA fine-tune of Qwen/Qwen3-VL-8B-Instruct that reads resume/CV page images and returns a fixed 23-field JSON record. Published as merged full weights (BF16 safetensors, ~9B params), so it loads like any standard Qwen3-VL checkpoint — no adapter to attach.
It started as an internal project at Corporate Solutions Group: the production resume parser ran on Qwen2.5-VL-32B (8-bit, ~50 GB VRAM) behind a vLLM server, and the goal was a smaller model that kept parsing quality while cutting GPU cost. This repo is the public, portfolio version of that work. The training data is a private internal dataset and is not redistributed; all code (data prep, training, eval) is open at github.com/sukhrobnurali/resume-trainer.
TL;DR
- Base: Qwen/Qwen3-VL-8B-Instruct (QLoRA, then merged → BF16).
- Task: resume page image(s) → structured JSON (23 fields: identity, contact, skills, experiences, educations, languages, certificates, projects, preferences).
- Why fine-tune: the 23-field schema and the project's formatting rules are baked into the weights, so a one-line prompt replaces the ~280-line schema prompt the 32B base needed.
- Measured (full 51-sample held-out split, A100, BF16, greedy): 83.9% weighted score, 88.2% unweighted, 88.2% JSON-valid. See Evaluation for the honest caveats.
- Footprint: ~23 GB VRAM in BF16 at 16K context (vs. ~50 GB for the 32B it replaces).
Intended use
Extracting structured data from resume/CV documents rendered to images (PDF → PNG per page). The model is tuned for a specific downstream schema (below) used by a recruiting/ATS pipeline, including its enum vocabularies (PascalCase country names, a fixed list of roles/technologies/industries). It is most useful when you want one model call to turn a resume into a database-ready record.
It is not a general document-VQA model and should not be used to make automated decisions about candidates — see Out-of-scope.
Input / output schema
Input: one or more page images of a single resume, plus the short instruction the model was trained with (see How to use).
Output: a single JSON object with 23 top-level fields. Scalars are null when absent;
list fields default to []; address defaults to {country_name, region_name}.
| Field | Type | Notes |
|---|---|---|
| first_name, last_name | string | |
| email, phone | string | |
| date_of_birth | string | YYYY-MM-DD |
| desired_position | string | mapped to a fixed role vocabulary |
| about | string | free-text summary |
| job_experience | number | total years |
| job_expectations, min_salary, max_salary | string / number | |
| ready_to_relocation | bool | |
| work_modes, employment_types, employment_durations | string[] | enum values |
| hobbies | string | |
| address | object | {country_name, region_name} |
| skills | object[] | {skill_name, level} |
| experiences | object[] | {company_name, job, date_from, date_to, description, country_name} |
| educations | object[] | {name, degree, location, programme, date_from, date_to, country_name} |
| languages | object[] | {language_name, level} (level is an int) |
| certificates | object[] | {certificate_name, certificate_programme, issuing_date, expiring_date} |
| projects | object[] | {title, summary, used_technologies[], role, industries[]} |
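As a minimal sketch, downstream code might check a parsed record against the 23 top-level fields in the table and fill in the documented defaults like this. The helper names (`with_defaults`, `unexpected_fields`) and the constants are hypothetical and not part of the project's code; only the field names and default values come from the schema above.

```python
# Field groups copied from the schema table; the helpers are a hypothetical sketch.
SCALAR_FIELDS = [
    "first_name", "last_name", "email", "phone", "date_of_birth",
    "desired_position", "about", "job_experience", "job_expectations",
    "min_salary", "max_salary", "ready_to_relocation", "hobbies",
]
LIST_FIELDS = [
    "work_modes", "employment_types", "employment_durations",
    "skills", "experiences", "educations", "languages",
    "certificates", "projects",
]
ALL_FIELDS = SCALAR_FIELDS + LIST_FIELDS + ["address"]  # 13 + 9 + 1 = 23


def with_defaults(record: dict) -> dict:
    """Return a copy of `record` in which every top-level field is present."""
    # Scalars are null (None) when absent.
    out = {name: record.get(name) for name in SCALAR_FIELDS}
    # List fields default to [].
    for name in LIST_FIELDS:
        value = record.get(name)
        out[name] = value if isinstance(value, list) else []
    # address always carries both keys, even when their values are unknown.
    address = record.get("address")
    if not isinstance(address, dict):
        address = {}
    out["address"] = {
        "country_name": address.get("country_name"),
        "region_name": address.get("region_name"),
    }
    return out


def unexpected_fields(record: dict) -> list:
    """List top-level keys that are not part of the 23-field schema."""
    return sorted(set(record) - set(ALL_FIELDS))
```

Running `with_defaults` on every parsed output gives the rest of a pipeline a stable shape to work with, and `unexpected_fields` is a cheap way to notice when the model drifts from the schema.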
Dates are normalized to YYYY-MM-DD (year-only ranges expand to Jan 1 / Dec 31; ongoing
roles set date_to: null). Classification fields (desired_position, project role /
used_technologies / industries, and all country_name fields) are mapped to predefined
option lists, falling back to "Other" when nothing matches.
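The normalization rules above can be illustrated with a short sketch, for example to validate model outputs after the fact. The function names are hypothetical, and the case-insensitive exact matching in `map_to_vocabulary` is an assumption: the project's actual mapping logic is not described in this card, and the option list in the usage note is a made-up stand-in for the real vocabularies.

```python
import re
from typing import Optional


def normalize_date(value: Optional[str], end_of_range: bool = False) -> Optional[str]:
    """Normalize a date string to YYYY-MM-DD, expanding year-only values."""
    if value is None:
        return None  # e.g. an ongoing role keeps date_to as null
    value = value.strip()
    if re.fullmatch(r"\d{4}-\d{2}-\d{2}", value):
        return value  # already in the target format
    if re.fullmatch(r"\d{4}", value):
        # Year-only: Jan 1 for the start of a range, Dec 31 for the end.
        return f"{value}-12-31" if end_of_range else f"{value}-01-01"
    return None  # other formats are left for the caller to handle


def map_to_vocabulary(value: Optional[str], options: list) -> Optional[str]:
    """Map a free-text value onto a predefined option list, else "Other"."""
    if value is None:
        return None
    lookup = {option.casefold(): option for option in options}
    return lookup.get(value.strip().casefold(), "Other")
```

For instance, `normalize_date("2019", end_of_range=True)` returns `"2019-12-31"`, and `map_to_vocabulary("Coin", ["Kotlin", "Java"])` returns `"Other"`.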
Real (anonymized) output example:
{
"first_name": "Jane",
"last_name": "Doe",
"date_of_birth": null,
"email": "jane@example.com",
"phone": "+1-555-0100",
"desired_position": "Android Developer",
"about": null,
"job_experience": null,
"job_expectations": null,
"min_salary": null,
"max_salary": null,
"ready_to_relocation": false,
"work_modes": [],
"employment_types": [],
"employment_durations": [],
"hobbies": null,
"address": { "country_name": "Uzbekistan", "region_name": "Tashkent" },
"skills": [
{ "skill_name": "Android Development", "level": null },
{ "skill_name": "Kotlin", "level": null },
{ "skill_name": "Firebase", "level": null }
],
"experiences": [
{
"company_name": "Android Development Course",
"job": "Student / Trainee (Android Development)",
"date_from": "2021-01-01",
"date_to": null,
"description": "Android development course focused on Java/Kotlin/Android.",
"country_name": null
}
],
"languages": [
{ "language_name": "Uzbek", "level": 6 },
{ "language_name": "English", "level": 2 },
{ "language_name": "Russian", "level": 0 }
],
"educations": [
{
"name": "Tashkent University of Information Technologies",
"degree": "Bachelor",
"location": "Tashkent",
"programme": "E-Commerce",
"date_from": null,
"date_to": "2019-01-01",
"country_name": "Uzbekistan"
}
],
"certificates": [],
"projects": [
{
"title": "Wallpaper App",
"summary": "Wallpaper app based on MVVM, Coin, Flow, Retrofit.",
"used_technologies": ["Kotlin", "Other"],
"role": "Mobile Developer(IOS/Android)",
"industries": ["Other"]
}
]
}
Training data
- 513 human-verified resume samples (private internal dataset). Each sample is a PDF rendered to one or more page PNGs plus a verified ground-truth JSON record.
- Split: 462 train / 51 held-out eval, 90/10, fixed seed 42. Samples whose estimated token length exceeded ~15.2K (about 1K below the 16,384 context budget) were dropped from training, so the effective training count is ≤462.
- Page distribution: 276 single-page, 136 two-page, 101 three-or-more-page (up to 8).
- Language: predominantly English; some records contain non-English values (e.g. Russian/Uzbek company or language names).
The dataset is not released. Code to rebuild splits and bundles is in the repo
(src/data_prep.py, src/export_eval_bundle.py).
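For readers without access to the repo, a seeded 90/10 split with a token-length filter can be sketched as follows. This is not the code in src/data_prep.py; the function names, the integer sample IDs, and the exact 15,200 threshold (an approximation of the "~15.2K" figure above) are assumptions made for illustration.

```python
import random


def split_samples(sample_ids, eval_fraction=0.10, seed=42):
    """Shuffle deterministically, then hold out `eval_fraction` of the samples."""
    ids = sorted(sample_ids)  # stable starting order, independent of input order
    random.Random(seed).shuffle(ids)
    n_eval = int(len(ids) * eval_fraction)
    return ids[n_eval:], ids[:n_eval]  # (train, eval)


def drop_long(samples, estimated_tokens, max_tokens=15_200):
    """Keep only samples whose estimated token length fits the training budget."""
    return [s for s in samples if estimated_tokens[s] <= max_tokens]


# 513 samples -> 462 train / 51 eval, matching the counts reported above.
train_ids, eval_ids = split_samples(range(513))
print(len(train_ids), len(eval_ids))  # 462 51
```

Fixing the seed makes the held-out set reproducible across runs, which is what lets the evaluation numbers refer to the same 51 samples every time.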
Training procedure
QLoRA via Unsloth (FastVisionModel) + TRL SFTTrainer. The 4-bit base
(unsloth/Qwen3-VL-8B-Instruct-unsloth-bnb-4bit, nf4) was adapted with LoRA on both the
vision and language towers (attention + MLP modules), then the adapter was merged back into
the full model and published.
Each training example is a single user turn — the page images followed by the combined system+user instruction — with the ground-truth JSON as the assistant target. There is no separate system role; this is why inference uses the same short prompt.
Dtype note: the merge used Unsloth's merged_16bit, and the original upload was labeled "float16", but the published config.json and stored tensors are bfloat16. Treat this model as BF16.
Hyperparameters
| Hyperparameter | Value |
|---|---|
| Method | QLoRA (4-bit nf4 base + LoRA, merged after training) |
| LoRA rank / alpha / dropout | 16 / 16 / 0 |
| Target modules | vision + language layers, attention + MLP (bias="none", no rslora) |
| Learning rate | 2e-4 |
| LR scheduler / warmup | cosine / 10 steps |
| Optimizer | adamw_8bit |
| Weight decay | 0.01 |
| Per-device batch / grad-accum | 1 / 4 (effective batch 4) |
| Epochs | 1 |
| Max sequence length | 16,384 |
| Precision | bf16 (fp16 fallback if unsupported) |
| Seed | 3407 |
| Hardware | Google Colab L4 (24 GB) |
Training time and final loss were not captured from the run.
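The table translates roughly into the following configuration sketch. The key names mirror keyword arguments commonly passed to Unsloth's `FastVisionModel.get_peft_model` and to TRL's `SFTConfig`, but that mapping is an assumption: the actual training script lives in the repo and may differ, and applying the single seed from the table to both the LoRA initialization and the trainer is also a guess. The step count is a rough upper bound derived from the 462-sample training split.

```python
import math

# LoRA settings from the table. Key names are assumed to match the keyword
# arguments of Unsloth's FastVisionModel.get_peft_model.
lora_kwargs = dict(
    finetune_vision_layers=True,
    finetune_language_layers=True,
    finetune_attention_modules=True,
    finetune_mlp_modules=True,
    r=16,
    lora_alpha=16,
    lora_dropout=0,
    bias="none",
    use_rslora=False,
    random_state=3407,  # assumption: the table's seed is used here as well
)

# Trainer settings from the table. Key names are assumed to match TRL's SFTConfig.
sft_kwargs = dict(
    per_device_train_batch_size=1,
    gradient_accumulation_steps=4,
    num_train_epochs=1,
    learning_rate=2e-4,
    lr_scheduler_type="cosine",
    warmup_steps=10,
    optim="adamw_8bit",
    weight_decay=0.01,
    seed=3407,
)

MAX_SEQ_LENGTH = 16_384  # context budget from the table

effective_batch = (
    sft_kwargs["per_device_train_batch_size"]
    * sft_kwargs["gradient_accumulation_steps"]
)
# At most 462 training samples, one epoch: an upper bound on optimizer steps.
max_steps = math.ceil(462 / effective_batch)
print(effective_batch, max_steps)  # 4 116
```

With only about a hundred optimizer steps in the whole run, the 10 warmup steps cover a noticeable share of training.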
Evaluation
Measured on 2026-06-05 with notebooks/eval_finetuned.ipynb against the held-out split,
using the project's field-weighted scorer (src/evaluation.py). Setup: the published BF16
weights on a single A100, greedy decoding (do_sample=False, max_new_tokens=4096), on
the full 51-sample held-out split.
| Metric | Result |
|---|---|
| Overall weighted score | 83.9% |
| Overall unweighted score | 88.2% |
| JSON validity | 88.2% (45/51 parsed; 6 failures) |
| Avg. inference | ~92.0 s/resume |
| Peak VRAM | 23.4 GB |
Per-field accuracy (worst → best):
| Field | Acc | Field | Acc |
|---|---|---|---|
| skills | 67.5% | ready_to_relocation | 88.2% |
| phone | 74.5% | certificates | 90.8% |
| desired_position | 79.2% | projects | 91.0% |
| address | 81.2% | job_expectations | 92.7% |
| experiences | 81.7% | hobbies | 96.1% |
| first_name | 82.3% | date_of_birth | 98.0% |
| last_name | 82.3% | work_modes | 98.0% |
| email | 84.3% | employment_types | 98.0% |
| job_experience | 84.3% | employment_durations | 98.0% |
| educations | 84.5% | min_salary | 100.0% |
| languages | 87.2% | max_salary | 100.0% |
| about | 88.2% | | |
Read these numbers with the following caveats:
- Full held-out split, single run. These are all 51 held-out samples with greedy decoding: a real measurement, but one run on a modest test set, not a large benchmark.
- Partial-credit metric. The scorer uses fuzzy string ratios, date/numeric tolerances, and greedy best-match over object arrays, with fields weighted by importance (work experience is weighted highest). It is not strict exact-match and is not comparable to other parsers' published numbers; it is an internal quality signal. The weighted score (83.9%) is below the unweighted (88.2%) because the highest-weighted fields (experiences, skills, identity/contact) are also the hardest ones.
- The top-scoring fields are mostly "correctly empty." min_salary/max_salary (100%) and date_of_birth, work_modes, employment_types, employment_durations (~98%) are almost always absent in this data, so high scores largely reflect correctly returning empty, not hard extraction.
- 6/51 invalid JSON (~12%). Most likely 4096-token truncation on long multi-page resumes; downstream code must handle un-parseable output (retry, repair, or shorter prompts).
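To make the partial-credit idea above concrete, here is a small sketch of a fuzzy, field-weighted scorer built on the standard library's `difflib`. It is not the project's src/evaluation.py: the function names, the similarity measure, the way extra or missing objects are penalized, and the example weights are all assumptions chosen for illustration.

```python
from difflib import SequenceMatcher


def fuzzy(a, b) -> float:
    """Similarity in [0, 1]; two missing values count as a match."""
    if a is None and b is None:
        return 1.0
    if a is None or b is None:
        return 0.0
    return SequenceMatcher(None, str(a).casefold(), str(b).casefold()).ratio()


def object_score(pred: dict, truth: dict) -> float:
    """Mean fuzzy similarity over the keys of the ground-truth object."""
    if not truth:
        return 1.0 if not pred else 0.0
    return sum(fuzzy(pred.get(k), v) for k, v in truth.items()) / len(truth)


def list_score(pred: list, truth: list) -> float:
    """Greedy best-match: each truth object claims its best unused prediction."""
    if not truth:
        return 1.0 if not pred else 0.0
    remaining = list(pred)
    total = 0.0
    for t in truth:
        if not remaining:
            break
        best = max(remaining, key=lambda p: object_score(p, t))
        total += object_score(best, t)
        remaining.remove(best)
    # Dividing by the longer list penalizes both missing and extra objects.
    return total / max(len(pred), len(truth))


def weighted_score(field_scores: dict, weights: dict) -> float:
    """Weighted mean of per-field scores; unlisted fields get weight 1."""
    total_weight = sum(weights.get(f, 1.0) for f in field_scores)
    weighted_sum = sum(s * weights.get(f, 1.0) for f, s in field_scores.items())
    return weighted_sum / total_weight


# Hypothetical weights: a hard, heavily weighted field pulls the total down.
scores = {"experiences": 0.8, "min_salary": 1.0}
print(round(weighted_score(scores, {"experiences": 3.0}), 3))  # 0.85
print(round(weighted_score(scores, {}), 3))  # 0.9
```

The two printed values show the same effect as the reported results: when the lowest-scoring field carries the most weight, the weighted score falls below the unweighted one.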
For context, the model-selection benchmark that led to Qwen3-VL-8B (base models, ~10 samples,
not reproducible from committed outputs) is noted in the repo's SESSION_LOG.md; it is not a
fine-tuned result and is excluded here.
How to use
Requires a recent transformers (≥4.57 for Qwen3-VL; latest recommended). The published
processor carries the correct chat template, so the modern image-in-messages path works
without extra utilities.
# pip install -U transformers accelerate
import json
from transformers import AutoProcessor, Qwen3VLForConditionalGeneration
model_id = "sukhrobnurali/qwen3vl-resume-parser"
model = Qwen3VLForConditionalGeneration.from_pretrained(
    model_id, dtype="auto", device_map="auto", attn_implementation="sdpa",
)
processor = AutoProcessor.from_pretrained(model_id)
# The 23-field schema is baked into the weights, so the short training prompt is all it needs.
SYSTEM_PROMPT = "You are a resume parser. Extract information from resume images into structured JSON."
USER_PROMPT = "Parse this resume and return the structured JSON."
# One entry per page, top to bottom. "url" accepts a local file path or an http(s) URL.
pages = ["resume_page_1.png", "resume_page_2.png"]
messages = [{
    "role": "user",
    "content": (
        [{"type": "text", "text": SYSTEM_PROMPT}]
        + [{"type": "image", "url": p} for p in pages]
        + [{"type": "text", "text": USER_PROMPT}]
    ),
}]
inputs = processor.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)
inputs.pop("token_type_ids", None)
generated = model.generate(**inputs, max_new_tokens=4096, do_sample=False)
trimmed = generated[:, inputs["input_ids"].shape[1]:]
text = processor.batch_decode(trimmed, skip_special_tokens=True)[0]
resume = json.loads(text)  # the 23-field record
print(json.dumps(resume, indent=2, ensure_ascii=False))
Use greedy decoding (do_sample=False) for stable structured output. For long multi-page
resumes, raise max_new_tokens if you see truncated JSON.
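Because a share of outputs fail to parse, a bare `json.loads` call is worth wrapping. The sketch below shows one defensive way to do that; the helper name `parse_model_json` is hypothetical, and trimming to the outermost braces is a simple heuristic rather than a full JSON repair step.

```python
import json
from typing import Optional


def parse_model_json(text: str) -> Optional[dict]:
    """Return the parsed record, or None when the output is not a JSON object."""
    candidate = text.strip()
    # Keep only the span from the first "{" to the last "}", which drops
    # code fences or stray text around the object.
    start, end = candidate.find("{"), candidate.rfind("}")
    if start == -1 or end <= start:
        return None
    try:
        parsed = json.loads(candidate[start:end + 1])
    except json.JSONDecodeError:
        return None  # e.g. output truncated at max_new_tokens
    return parsed if isinstance(parsed, dict) else None
```

A caller can treat a `None` result as the signal to retry with a larger `max_new_tokens` or to route the resume to manual review.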
vLLM serving (the original deployment target):
vllm serve sukhrobnurali/qwen3vl-resume-parser \
--dtype bfloat16 --max-model-len 16384 --trust-remote-code
When calling through the OpenAI-compatible API, pass
extra_body={"chat_template_kwargs": {"enable_thinking": false}} to keep the model in
non-thinking (direct-JSON) mode.
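As a sketch of what such a request could look like, the helper below builds the JSON body for `POST /v1/chat/completions` using only the standard library. The function names are hypothetical. Mirroring the local prompt layout (system text, page images, user text), sending pages as base64 data URLs, and using `temperature: 0` as the counterpart of greedy decoding are assumptions, not settings confirmed by the original deployment. With the OpenAI Python client, `extra_body` is merged into the request body, which is why `chat_template_kwargs` appears at the top level here.

```python
import base64
from pathlib import Path

MODEL_ID = "sukhrobnurali/qwen3vl-resume-parser"
SYSTEM_PROMPT = "You are a resume parser. Extract information from resume images into structured JSON."
USER_PROMPT = "Parse this resume and return the structured JSON."


def image_part(path: str) -> dict:
    """Encode a local PNG page as a data URL in the OpenAI image_url format."""
    encoded = base64.b64encode(Path(path).read_bytes()).decode("ascii")
    return {
        "type": "image_url",
        "image_url": {"url": f"data:image/png;base64,{encoded}"},
    }


def build_request(pages: list, max_tokens: int = 4096) -> dict:
    """Build the JSON body for POST /v1/chat/completions."""
    content = (
        [{"type": "text", "text": SYSTEM_PROMPT}]
        + [image_part(p) for p in pages]  # one entry per page, top to bottom
        + [{"type": "text", "text": USER_PROMPT}]
    )
    return {
        "model": MODEL_ID,
        "messages": [{"role": "user", "content": content}],
        "temperature": 0,  # assumption: stands in for do_sample=False
        "max_tokens": max_tokens,
        "chat_template_kwargs": {"enable_thinking": False},
    }
```

The returned dictionary can be sent with any HTTP client to the server started above, for example to `http://localhost:8000/v1/chat/completions` if vLLM runs on its default port.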
Limitations
- Domain skew. Training resumes skew toward IT/software roles, and the enum vocabularies (roles, technologies, industries) are IT-centric. Expect degradation on non-technical resumes, unusual layouts, scans/photos, or handwriting.
- Language. English-dominant; non-English resumes are under-represented.
- Schema lock-in. The model is tuned to one specific 23-field schema and its enum lists. It will coerce values toward those vocabularies (including "Other"), which may not match a different downstream schema.
- Invalid JSON happens (~12% on the held-out split). Always parse defensively.
- Latency. ~90 s/resume on an A100 at 16K context — batch/offline, not real-time.
- Quantization. BF16 peaks at ~23 GB VRAM; it runs in 4-bit on a 16 GB GPU, but accuracy was only measured in BF16.
Out-of-scope and responsible use
- No automated candidate decisions. Resume parsing for screening/ranking carries fairness and bias risk. Keep a human in the loop; do not use this model to make or materially influence hiring decisions without review.
- Not a general VQA / OCR model. It is specialized for this resume schema.
- PII. Resumes contain personal data. Handle outputs under the applicable privacy law (e.g. GDPR) — secure storage, access control, retention limits, and a lawful basis for processing.
- Verify before trusting. Outputs are model predictions, not ground truth; validate critical fields (contact info, dates) downstream.
License
Released under Apache-2.0, inherited from the Qwen/Qwen3-VL-8B-Instruct base model.
Citation
@misc{nurali2026qwen3vlresumeparser,
title = {qwen3vl-resume-parser: a Qwen3-VL-8B fine-tune for resume-to-JSON extraction},
author = {Nurali, Sukhrob},
year = {2026},
howpublished = {\url{https://huggingface.co/sukhrobnurali/qwen3vl-resume-parser}}
}
Built on Qwen3-VL by the Qwen team; see the Qwen3-VL model card and Unsloth for the training stack.
Author
Sukhrob Nurali — sukhrobnurali@gmail.com Hugging Face: @sukhrobnurali · GitHub: @sukhrobnurali