metadata
license: apache-2.0
base_model: Qwen/Qwen2.5-VL-3B-Instruct
tags:
- lora
- peft
- vision-language-model
- receipt-extraction
- cord-v2
- qwen2.5-vl
language:
- en
- id
datasets:
- naver-clova-ix/cord-v2
receipt_detector
LoRA adapter for Qwen2.5-VL-3B-Instruct, fine-tuned on the CORD-v2 receipt extraction dataset.
Model Details
| Item | Value |
|---|---|
| Base model | Qwen/Qwen2.5-VL-3B-Instruct |
| Method | LoRA SFT |
| LoRA rank | 8 |
| LoRA alpha | 16 |
| Target modules | q_proj, k_proj, v_proj, o_proj |
| Training epochs | 2 |
| Dataset | CORD-v2 (Indonesian receipts) |
| Task | Structured JSON extraction from receipt images |
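To give a feel for what rank 8 and alpha 16 imply, here is a minimal arithmetic sketch. It assumes the standard LoRA formulation, in which each target projection gains two small matrices, A (rank x d_in) and B (d_out x rank), and the update B·A is scaled by alpha / rank. The 2048 x 2048 projection size is a hypothetical placeholder, not a value taken from this card; substitute the real dimensions of each target module.

```python
def lora_extra_params(d_in, d_out, rank):
    """Parameters added by one LoRA pair: A is (rank x d_in), B is (d_out x rank)."""
    return rank * (d_in + d_out)


RANK = 8    # LoRA rank from the table above
ALPHA = 16  # LoRA alpha from the table above

# Standard LoRA scaling applied to the low-rank update B @ A.
scaling = ALPHA / RANK

# Hypothetical square projection, used only to illustrate the arithmetic.
example_params = lora_extra_params(d_in=2048, d_out=2048, rank=RANK)

print(f"scaling factor: {scaling}")
print(f"extra parameters for one 2048x2048 projection: {example_params}")
```

With these settings the low-rank update is scaled by 2, and the number of trainable parameters grows linearly with the rank.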
Usage
from peft import PeftModel
from transformers import AutoProcessor, Qwen2_5_VLForConditionalGeneration
from qwen_vl_utils import process_vision_info
import torch
processor = AutoProcessor.from_pretrained("LinBMS410/receipt_detector")
model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    "Qwen/Qwen2.5-VL-3B-Instruct",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
model = PeftModel.from_pretrained(model, "LinBMS410/receipt_detector")
model.eval()

# Inference
def predict(image):
    # The prompts are kept verbatim in Chinese. In English:
    #   system: "You are a professional receipt information extraction assistant.
    #            Extract all structured information from the image and output it as JSON."
    #   user:   "Extract all information from this receipt image and output it as JSON."
    messages = [
        {"role": "system", "content": [{"type": "text", "text": "你是專業的收據資訊抽取助手。請從圖片中抽取所有結構化資訊,以 JSON 格式輸出。"}]},
        {"role": "user", "content": [{"type": "image", "image": image}, {"type": "text", "text": "請從這張收據圖片中抽取所有資訊,以 JSON 格式輸出。"}]},
    ]
    text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
    image_inputs, _ = process_vision_info(messages)
    inputs = processor(text=[text], images=image_inputs, return_tensors="pt").to(model.device)
    with torch.no_grad():
        output_ids = model.generate(**inputs, max_new_tokens=768, do_sample=False)
    # Drop the prompt tokens so that only the newly generated answer is decoded.
    generated_ids = output_ids[0][len(inputs["input_ids"][0]):]
    return processor.tokenizer.decode(generated_ids, skip_special_tokens=True)
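Since `predict` returns a plain string, callers usually need to turn it into a Python object. The helper below is a minimal sketch of one way to do that; `parse_receipt_json` is a hypothetical name, not part of this repository. It assumes the model may wrap the JSON in extra text or a code fence, and that generation may be cut off at `max_new_tokens`, in which case it returns `None` rather than raising.

```python
import json


def parse_receipt_json(raw_text):
    """Return the outermost JSON object found in raw_text, or None if none parses."""
    start = raw_text.find("{")
    end = raw_text.rfind("}")
    if start == -1 or end <= start:
        return None  # no complete-looking object in the output
    try:
        return json.loads(raw_text[start:end + 1])
    except json.JSONDecodeError:
        return None  # malformed or truncated JSON


# Hypothetical model output, for illustration only.
sample_output = 'Result: {"total": {"total_price": "45,500"}}'
parsed = parse_receipt_json(sample_output)
```

Returning `None` on failure keeps batch processing simple: the caller can log the raw string and move on instead of handling exceptions per image.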
Training Details
- Image preprocessing: contrast enhancement (1.5x) + sharpness (2.0x)
- Image token masking: <|image_pad|> tokens excluded from loss
- Loss computed on assistant response only
- Optimizer: AdamW 8-bit
- Mixed precision: bfloat16
- Gradient checkpointing: enabled
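The preprocessing and loss-masking bullets above could be sketched roughly as follows. This is not the training script used for this adapter. It assumes that Pillow's `ImageEnhance` provides the contrast and sharpness steps (applied in the listed order), that labels are a copy of `input_ids` with ignored positions set to -100 (the default `ignore_index` of PyTorch's cross-entropy loss), and that the caller already knows where the assistant response starts. The names `enhance_receipt` and `build_labels` and the toy token IDs are hypothetical.

```python
IGNORE_INDEX = -100  # positions with this label are skipped by the loss


def enhance_receipt(image, contrast=1.5, sharpness=2.0):
    """Apply contrast, then sharpness enhancement to a PIL image."""
    # Imported lazily so the label helper below works without Pillow installed.
    from PIL import ImageEnhance

    image = ImageEnhance.Contrast(image.convert("RGB")).enhance(contrast)
    return ImageEnhance.Sharpness(image).enhance(sharpness)


def build_labels(input_ids, response_start, image_pad_id):
    """Copy input_ids into labels, masking the prompt and image placeholder tokens."""
    labels = []
    for position, token_id in enumerate(input_ids):
        if position < response_start or token_id == image_pad_id:
            labels.append(IGNORE_INDEX)  # prompt token or <|image_pad|>: no loss
        else:
            labels.append(token_id)  # assistant response token: contributes to loss
    return labels


# Toy example: 99 stands in for the <|image_pad|> ID, and the response starts at index 4.
toy_labels = build_labels([1, 99, 99, 2, 7, 8], response_start=4, image_pad_id=99)
```

In a real pipeline the placeholder ID would come from the tokenizer, for example via `processor.tokenizer.convert_tokens_to_ids("<|image_pad|>")`, rather than being hard-coded.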