Driver License Reader

Donut-based image-to-JSON extraction for driver's license documents.

What it does

This model is a fine-tuned Donut VisionEncoderDecoderModel for extracting structured fields from driver's license images without a separate OCR pipeline. It converts an input image into JSON-like fields such as:

  • name
  • state
  • date
  • dob
  • person

The model is intended for demos, prototyping, and research around document AI workflows. It is not a production identity-verification system.

Public-safety notes

  • Do not upload real driver's licenses unless you have permission and a lawful basis to process the data.
  • Prefer synthetic, redacted, or consented images for testing.
  • Outputs can be wrong or incomplete. Use human review before relying on extracted identity fields.
  • Weights are distributed as model.safetensors, so loading the model does not require deserializing Pickle-based weight files.
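
The human-review note above can be supported with a small check that flags fields the model left out or returned empty. This is a minimal sketch, not part of the model or the Transformers API: the helper name `fields_needing_review` is hypothetical, and the field list is assumed from the examples in "What it does", so the real output schema may differ.

```python
from typing import Any

# Assumption: field names copied from the "What it does" list; adjust to the actual output.
EXPECTED_FIELDS = ("name", "state", "date", "dob", "person")


def fields_needing_review(parsed: dict[str, Any]) -> list[str]:
    """Return the expected fields that are missing or blank in a parsed result."""
    flagged = []
    for field in EXPECTED_FIELDS:
        value = parsed.get(field)
        # Absent keys, None, and whitespace-only strings all need a human look.
        if value is None or (isinstance(value, str) and not value.strip()):
            flagged.append(field)
    return flagged


# Hypothetical parsed output, as a dict like the one token2json returns.
example = {"name": "JANE SAMPLE", "state": "", "dob": "01/01/1990"}
print(fields_needing_review(example))  # ['state', 'date', 'person']
```

An empty list only means every expected field has some value. It says nothing about whether those values are correct, so it complements human review rather than replacing it.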

Quick start

import re
import torch
from PIL import Image
from transformers import DonutProcessor, VisionEncoderDecoderModel

model_id = "lucky-verma/driver-license-reader"
processor = DonutProcessor.from_pretrained(model_id)
model = VisionEncoderDecoderModel.from_pretrained(model_id)

device = "cuda" if torch.cuda.is_available() else "cpu"
model.to(device).eval()

image = Image.open("redacted_or_synthetic_license.jpg").convert("RGB")
pixel_values = processor(image, return_tensors="pt").pixel_values.to(device)

# Task start token that prompts the decoder; it is tokenized as-is, without extra special tokens.
task_prompt = "<s_cord-v2>"
decoder_input_ids = processor.tokenizer(
    task_prompt,
    add_special_tokens=False,
    return_tensors="pt",
)["input_ids"].to(device)

# Greedy decoding: generation stops at the EOS token or at the decoder's maximum length.
with torch.inference_mode():
    outputs = model.generate(
        pixel_values,
        decoder_input_ids=decoder_input_ids,
        max_length=model.decoder.config.max_position_embeddings,
        pad_token_id=processor.tokenizer.pad_token_id,
        eos_token_id=processor.tokenizer.eos_token_id,
        use_cache=True,
        num_beams=1,  # early_stopping only affects beam search, so it is omitted here
        bad_words_ids=[[processor.tokenizer.unk_token_id]],  # never emit the unknown token
        return_dict_in_generate=True,
    )

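# Decode the generated ids, drop EOS/PAD tokens, and strip the leading task start token.
sequence = processor.batch_decode(outputs.sequences)[0]
sequence = sequence.replace(processor.tokenizer.eos_token, "")
sequence = sequence.replace(processor.tokenizer.pad_token, "")
sequence = re.sub(r"<.*?>", "", sequence, count=1).strip()

# Convert the remaining tag sequence into a Python dict of fields.
print(processor.token2json(sequence))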
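
To make the last step less opaque: Donut emits each field as a value wrapped in `<s_key>` and `</s_key>` tokens, and `token2json` turns that tag sequence into a dict. The sketch below is a simplified stand-in for illustration only. It handles flat, non-repeated fields, the function name `parse_flat_fields` is hypothetical, and the sample string is invented rather than real model output. Use `processor.token2json` for actual results.

```python
import re

# Matches <s_key>value</s_key> pairs; nested or repeated groups are not handled.
FIELD_PATTERN = re.compile(
    r"<s_(?P<key>[^>]+)>(?P<value>.*?)</s_(?P=key)>", re.DOTALL
)


def parse_flat_fields(sequence: str) -> dict[str, str]:
    """Illustrative flat parser for a Donut-style tag sequence."""
    return {
        match.group("key"): match.group("value").strip()
        for match in FIELD_PATTERN.finditer(sequence)
    }


# Invented sample in the tag format, using field names from "What it does".
sample = "<s_name>JANE SAMPLE</s_name><s_state>CA</s_state><s_dob>01/01/1990</s_dob>"
print(parse_flat_fields(sample))
# {'name': 'JANE SAMPLE', 'state': 'CA', 'dob': '01/01/1990'}
```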

Model details

  • Architecture: Donut / vision encoder-decoder
  • Base model: nielsr/donut-base
  • Format: model.safetensors
  • Input: image of a driver's license-like document
  • Output: structured JSON-style fields
  • Language: English

Limitations

This model was trained on a small driver's-license dataset and may fail on unseen layouts, glare, blur, occlusion, low-resolution scans, non-English documents, or non-US license formats. Treat the output as an extraction suggestion, not a verified identity record.
