Donut Receipt Parser

This is a fine-tuned Donut (Document Understanding Transformer) model for parsing receipt images into structured output.

Model Details

  • Base Model: naver-clova-ix/donut-base
  • Parameters: ~0.2B (F32 safetensors)
  • Fine-tuned on: Receipt dataset
  • Task: Parse receipt images and extract structured information
  • Training Date: 2025-10-13

Usage

from transformers import DonutProcessor, VisionEncoderDecoderModel
from PIL import Image
import torch

# Load model and processor
processor = DonutProcessor.from_pretrained("jvilchesf/donut-base-finetuned-receipts")
model = VisionEncoderDecoderModel.from_pretrained("jvilchesf/donut-base-finetuned-receipts")

# Load and process image
image = Image.open("receipt.jpg").convert("RGB")
pixel_values = processor(image, return_tensors="pt").pixel_values

# Generate — the task prompt must match the one used during fine-tuning
task_prompt = "<parsing>"
decoder_input_ids = processor.tokenizer(
    task_prompt,
    add_special_tokens=False,
    return_tensors="pt"
).input_ids

outputs = model.generate(
    pixel_values,
    decoder_input_ids=decoder_input_ids,
    max_length=model.config.decoder.max_length,
    early_stopping=True,
    pad_token_id=processor.tokenizer.pad_token_id,
    eos_token_id=processor.tokenizer.eos_token_id,
    use_cache=True,
    num_beams=1,
    bad_words_ids=[[processor.tokenizer.unk_token_id]],
    return_dict_in_generate=True,
)

# Decode
sequence = processor.batch_decode(outputs.sequences)[0]
sequence = sequence.replace(processor.tokenizer.eos_token, "").replace(processor.tokenizer.pad_token, "")
sequence = sequence.strip()
print(sequence)
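The decoded sequence above still begins with the task-start token. A common cleanup step, mirroring the official Donut post-processing examples, strips the EOS token and the leading task token before converting the tag structure to JSON with `processor.token2json`. A minimal standalone sketch of that cleanup (the sample sequence and its fields are illustrative, not real model output):

```python
import re

# Hypothetical decoded output; real sequences come from processor.batch_decode.
sequence = "<parsing><s_store>ACME</s_store><s_total>12.50</s_total></s>"

# Strip the end-of-sequence token, then drop the leading task-start token.
sequence = sequence.replace("</s>", "")
sequence = re.sub(r"<.*?>", "", sequence, count=1)
print(sequence)
```

The remaining tag structure can then be passed to `processor.token2json(sequence)` to obtain a Python dict of the extracted fields.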

Training Configuration

  • Epochs: 30
  • Batch Size: 2 (effective: 4 with gradient accumulation)
  • Learning Rate: 2e-5
  • LR Scheduler: Cosine with warmup
  • Max Length: 1200 tokens
  • Image Size: 720x960
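A minimal sketch of how these hyperparameters might map onto Hugging Face Seq2SeqTrainingArguments. The output directory and warmup ratio are assumptions for illustration; the card only states "cosine with warmup" and does not give those values:

```python
from transformers import Seq2SeqTrainingArguments

# Sketch of the configuration listed above; output_dir and warmup_ratio
# are illustrative assumptions, not values stated on the card.
training_args = Seq2SeqTrainingArguments(
    output_dir="donut-base-finetuned-receipts",  # assumed name
    num_train_epochs=30,
    per_device_train_batch_size=2,
    gradient_accumulation_steps=2,   # effective batch size: 4
    learning_rate=2e-5,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,                # assumed; card only says "with warmup"
)
```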

Model Card

For more details, visit the model repository.
