# doc-extractor-vl
Document data extraction model based on Qwen2.5-VL-7B-Instruct, configured for structured JSON output from document images (invoices, forms, receipts, etc.).
## Key Features
- Cyrillic-free output: Includes a pre-computed logit bias file that blocks all 4129 Cyrillic tokens, preventing the Cyrillic/Latin script confusion common in multilingual VL models
- Structured JSON output: The system prompt instructs the model to respond with JSON only
- Multilingual: Aimed at Slovenian, English, German, Croatian, and other Latin-script languages
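As described above, JSON-only output comes from the system prompt rather than from constrained decoding (the logit bias only removes Cyrillic tokens), so it is prudent to validate replies before using them. The sketch below is an assumption-laden helper, not part of this repository: `parse_json_reply` and `missing_keys` are hypothetical names, and stripping a surrounding Markdown code fence is a defensive guess in case the model wraps its answer.

```python
import json
import re

# Matches a reply wrapped in an optional Markdown code fence (with or without a "json" tag).
_FENCE_RE = re.compile(r"^\s*```(?:json)?\s*(.*?)\s*```\s*$", re.DOTALL)


def parse_json_reply(text: str) -> dict:
    """Parse a model reply expected to contain a single JSON object.

    Strips a surrounding code fence if present, then parses with json.loads.
    Raises ValueError (json.JSONDecodeError is a subclass) on invalid JSON
    or when the top-level value is not an object.
    """
    match = _FENCE_RE.match(text)
    payload = match.group(1) if match else text.strip()
    data = json.loads(payload)
    if not isinstance(data, dict):
        raise ValueError(f"expected a JSON object, got {type(data).__name__}")
    return data


def missing_keys(data: dict, schema: dict) -> list[str]:
    """Return the top-level schema keys that the parsed reply does not contain."""
    return [key for key in schema if key not in data]
```

A typical call would be `parse_json_reply(response.choices[0].message.content)`, followed by `missing_keys` against the same schema dictionary that was sent in the prompt.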
## Files

| File | Description |
|---|---|
| `cyrillic_logit_bias.json` | 4129 token IDs with bias -100 to block Cyrillic generation |
| `system_prompt.txt` | System prompt template for document extraction |
| `serving_config.yaml` | Recommended vLLM serving parameters |
| `generate_cyrillic_bias.py` | Script to regenerate the logit bias file |
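The contents of `generate_cyrillic_bias.py` are not shown in this card. The following is a minimal sketch of one way such a file could be built, assuming the tokenizer vocabulary has already been decoded to text per token ID. `is_cyrillic`, `build_cyrillic_bias`, and `write_bias_file` are hypothetical names, and the sketch is not guaranteed to reproduce the 4129-entry file exactly. It writes string keys with a value of -100, the same shape the request example reads back.

```python
import json
import unicodedata


def is_cyrillic(ch: str) -> bool:
    """True if the character's Unicode name marks it as Cyrillic (covers all Cyrillic blocks)."""
    return "CYRILLIC" in unicodedata.name(ch, "")


def build_cyrillic_bias(decoded_vocab: dict[int, str], bias: int = -100) -> dict[str, int]:
    """Map every token ID whose decoded text contains a Cyrillic character to `bias`.

    `decoded_vocab` maps token ID -> decoded token text (hypothetical input format).
    Keys are strings because JSON object keys are always strings.
    """
    return {
        str(token_id): bias
        for token_id, text in sorted(decoded_vocab.items())
        if any(is_cyrillic(ch) for ch in text)
    }


def write_bias_file(bias_map: dict[str, int], path: str) -> None:
    """Write the bias map as a JSON object, e.g. to cyrillic_logit_bias.json."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(bias_map, f, indent=2)
```

To obtain `decoded_vocab`, one option is `AutoTokenizer.from_pretrained("Qwen/Qwen2.5-VL-7B-Instruct")` from `transformers`, then `tokenizer.decode([token_id])` for each ID in `range(len(tokenizer))`. Because the tokenizer is byte-level, a token holding only part of a multi-byte Cyrillic character decodes to the replacement character U+FFFD and is not caught by this name-based check.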
## Usage with vLLM

### Serving

```bash
vllm serve mikrografija/doc-extractor-vl --max-model-len 4096
```
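### Request with Cyrillic blocking

```python
import json

from openai import OpenAI

# vLLM's OpenAI-compatible server ignores the API key unless started with --api-key.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="dummy")

# Load the Cyrillic logit bias: {token_id: -100, ...}
with open("cyrillic_logit_bias.json", encoding="utf-8") as f:
    cyrillic_bias = {int(k): v for k, v in json.load(f).items()}

# Load the extraction system prompt
with open("system_prompt.txt", encoding="utf-8") as f:
    system_prompt = f.read()

response = client.chat.completions.create(
    model="mikrografija/doc-extractor-vl",
    messages=[
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": [
            {"type": "image_url", "image_url": {"url": "data:image/png;base64,..."}},
            {"type": "text", "text": "Extract data into this JSON schema: {\"issuer\": \"\", \"date\": \"\", \"total\": \"\", \"items\": []}"},
        ]},
    ],
    logit_bias=cyrillic_bias,  # suppress every Cyrillic token ID
    temperature=0.0,           # greedy decoding for deterministic extraction
    # Prompt tokens (including image tokens) plus max_tokens must fit within
    # --max-model-len (4096 above), so max_tokens cannot also be 4096.
    max_tokens=2048,
)
print(response.choices[0].message.content)
```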
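The request above leaves the image payload elided as `data:image/png;base64,...`. A minimal sketch of filling it from a local file follows, using only the standard library; `bytes_to_data_url` and `file_to_data_url` are hypothetical helpers, and inferring the MIME type from the file extension is an assumption.

```python
import base64
import mimetypes
from pathlib import Path


def bytes_to_data_url(data: bytes, filename: str) -> str:
    """Encode raw image bytes as a base64 data URL; the MIME type is guessed from `filename`."""
    mime, _ = mimetypes.guess_type(filename)
    if mime is None or not mime.startswith("image/"):
        raise ValueError(f"cannot infer an image MIME type from {filename!r}")
    encoded = base64.b64encode(data).decode("ascii")
    return f"data:{mime};base64,{encoded}"


def file_to_data_url(path: str) -> str:
    """Read a local image file and return it as a data URL for an image_url content part."""
    return bytes_to_data_url(Path(path).read_bytes(), path)
```

With this, the image content part becomes `{"type": "image_url", "image_url": {"url": file_to_data_url("invoice.png")}}`, where `invoice.png` is a placeholder filename.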
## Why Cyrillic Blocking?
Qwen2.5-VL models are trained on multilingual data that includes Cyrillic scripts. When processing Latin-script documents (especially Slovenian, Croatian, or other languages with diacritics), the model occasionally substitutes Latin characters with visually similar Cyrillic ones (e.g., Latin "a" → Cyrillic "а"). The logit bias approach blocks this at the decoding level: a bias of -100 on every Cyrillic token ID makes those tokens effectively impossible to generate.
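The effect of the bias can be seen in a toy calculation. The OpenAI API defines `logit_bias` as a value added to a token's logit before sampling; the sketch below reproduces that arithmetic on a hypothetical three-token vocabulary with made-up logits. It illustrates the idea only and is not taken from Qwen2.5-VL or vLLM internals.

```python
import math


def softmax(logits: list[float]) -> list[float]:
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def apply_logit_bias(logits: list[float], bias: dict[int, float]) -> list[float]:
    """Add a per-token bias to the raw logits, as logit_bias does before sampling."""
    return [x + bias.get(i, 0.0) for i, x in enumerate(logits)]


# Hypothetical vocabulary: 0 = Latin "a", 1 = Cyrillic "а", 2 = some other token.
# The Cyrillic lookalike starts slightly ahead of the Latin letter.
logits = [5.0, 5.2, 1.0]
before = softmax(logits)
after = softmax(apply_logit_bias(logits, {1: -100.0}))

# Without the bias, greedy decoding (temperature 0) would pick the Cyrillic token;
# with it, that token's probability drops to roughly 1e-43 and the Latin one wins.
print([round(p, 3) for p in before])
print(after[1])
```

Under greedy decoding, a biased token could only be chosen if its raw logit exceeded every other logit by more than 100, which is why -100 works in practice as a ban.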
## Base Model
This model uses unmodified Qwen2.5-VL-7B-Instruct weights. No fine-tuning was applied. The configuration files provide the Cyrillic blocking and structured output enforcement.
Model tree for mikrografija/doc-extractor-vl
Base model
Qwen/Qwen2.5-VL-7B-Instruct