# Donut Receipt Parser
This is a fine-tuned Donut model for receipt parsing.
## Model Details
- Base Model: naver-clova-ix/donut-base
- Fine-tuned on: Receipt dataset
- Task: Parse receipts and extract structured information
- Training Date: 2025-10-13
## Usage
```python
from transformers import DonutProcessor, VisionEncoderDecoderModel
from PIL import Image
import torch

# Load model and processor
processor = DonutProcessor.from_pretrained("jvilchesf/donut-base-finetuned-receipts")
model = VisionEncoderDecoderModel.from_pretrained("jvilchesf/donut-base-finetuned-receipts")

# Load and process image
image = Image.open("receipt.jpg").convert("RGB")
pixel_values = processor(image, return_tensors="pt").pixel_values

# Generate
task_prompt = "<parsing>"
decoder_input_ids = processor.tokenizer(
    task_prompt,
    add_special_tokens=False,
    return_tensors="pt",
).input_ids

outputs = model.generate(
    pixel_values,
    decoder_input_ids=decoder_input_ids,
    max_length=model.config.decoder.max_length,
    early_stopping=True,
    pad_token_id=processor.tokenizer.pad_token_id,
    eos_token_id=processor.tokenizer.eos_token_id,
    use_cache=True,
    num_beams=1,
    bad_words_ids=[[processor.tokenizer.unk_token_id]],
    return_dict_in_generate=True,
)

# Decode
sequence = processor.batch_decode(outputs.sequences)[0]
sequence = sequence.replace(processor.tokenizer.eos_token, "").replace(processor.tokenizer.pad_token, "")
sequence = sequence.strip()
print(sequence)
```
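Donut models serialize their output as XML-like tags, which `DonutProcessor.token2json(sequence)` converts into a Python dict. To illustrate what that conversion does, here is a minimal sketch of the flat case; the tag names in the example are hypothetical, not necessarily this model's schema, and the real `token2json` also handles nesting and repeated groups:

```python
import re

def tokens_to_dict(sequence: str) -> dict:
    """Minimal sketch of Donut's tag-to-dict conversion.

    Handles flat <s_key>value</s_key> pairs only; use
    DonutProcessor.token2json for the full format.
    """
    result = {}
    # \1 backreference ensures opening and closing tags match
    for key, value in re.findall(r"<s_(\w+)>(.*?)</s_\1>", sequence):
        result[key] = value.strip()
    return result

# Hypothetical output sequence, for illustration only
example = "<s_store>Corner Cafe</s_store><s_total>12.50</s_total>"
print(tokens_to_dict(example))  # {'store': 'Corner Cafe', 'total': '12.50'}
```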
## Training Configuration
- Epochs: 30
- Batch Size: 2 (effective: 4 with gradient accumulation)
- Learning Rate: 2e-5
- LR Scheduler: Cosine with warmup
- Max Length: 1200 tokens
- Image Size: 720x960
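The hyperparameters above roughly correspond to a `Seq2SeqTrainingArguments` setup like the following sketch. This is not the actual training script: the output path and warmup ratio are assumptions (the card only says "cosine with warmup"), and only values listed above are reproduced:

```python
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="./donut-receipts",     # assumed path, not from the card
    num_train_epochs=30,
    per_device_train_batch_size=2,
    gradient_accumulation_steps=2,     # 2 x 2 = effective batch size of 4
    learning_rate=2e-5,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,                  # assumed; card only states "with warmup"
)
```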
## Model Card
For more details, visit the model repository.