license: apache-2.0
base_model: Qwen/Qwen2.5-VL-3B-Instruct
tags:
  - lora
  - peft
  - vision-language-model
  - receipt-extraction
  - cord-v2
  - qwen2.5-vl
language:
  - en
  - id
datasets:
  - naver-clova-ix/cord-v2

receipt_detector

LoRA adapter for Qwen2.5-VL-3B-Instruct, fine-tuned on the CORD-v2 receipt extraction dataset.

Model Details

Item	Value
Base model	Qwen/Qwen2.5-VL-3B-Instruct
Method	LoRA SFT
LoRA rank	8
LoRA alpha	16
Target modules	q_proj, k_proj, v_proj, o_proj
Training epochs	2
Dataset	CORD-v2 (Indonesian receipts)
Task	Structured JSON extraction from receipt images
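To give a feel for what rank 8 and alpha 16 mean in practice, the sketch below computes the LoRA scaling factor and the number of trainable parameters the adapter adds per decoder layer. It assumes the standard alpha / rank scaling (not rsLoRA), and the projection shapes in `EXAMPLE_SHAPES` are illustrative assumptions rather than values stated in this card; check the base model's config for the real dimensions.

```python
# Hyperparameters taken from the Model Details table.
LORA_RANK = 8
LORA_ALPHA = 16
TARGET_MODULES = ["q_proj", "k_proj", "v_proj", "o_proj"]


def lora_scaling(rank: int, alpha: int) -> float:
    """Scaling applied to the low-rank update B @ A (standard LoRA: alpha / rank)."""
    return alpha / rank


def lora_param_count(in_features: int, out_features: int, rank: int) -> int:
    """Trainable parameters added to one linear layer.

    A has shape (rank, in_features) and B has shape (out_features, rank).
    """
    return rank * (in_features + out_features)


# ASSUMPTION: illustrative (in_features, out_features) shapes for one
# attention block; they are not taken from this model card.
EXAMPLE_SHAPES = {
    "q_proj": (2048, 2048),
    "k_proj": (2048, 256),
    "v_proj": (2048, 256),
    "o_proj": (2048, 2048),
}

scaling = lora_scaling(LORA_RANK, LORA_ALPHA)
per_layer = sum(
    lora_param_count(*EXAMPLE_SHAPES[name], LORA_RANK) for name in TARGET_MODULES
)
print(f"scaling={scaling}, trainable params per layer={per_layer}")
```

Under these assumed shapes the adapter stays small relative to the frozen base weights, which is the usual motivation for targeting only the attention projections.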

Usage

from peft import PeftModel
from transformers import AutoProcessor, Qwen2_5_VLForConditionalGeneration
from qwen_vl_utils import process_vision_info
import torch

processor = AutoProcessor.from_pretrained("LinBMS410/receipt_detector")
model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    "Qwen/Qwen2.5-VL-3B-Instruct",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
model = PeftModel.from_pretrained(model, "LinBMS410/receipt_detector")
model.eval()

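# Inference
def predict(image):
    # The prompts are kept in the original Traditional Chinese. English translations:
    #   system: "You are a professional receipt information extraction assistant.
    #            Extract all structured information from the image and output it in JSON format."
    #   user:   "Extract all information from this receipt image and output it in JSON format."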
    messages = [
        {"role": "system", "content": [{"type": "text", "text": "你是專業的收據資訊抽取助手。請從圖片中抽取所有結構化資訊，以 JSON 格式輸出。"}]},
        {"role": "user", "content": [{"type": "image", "image": image}, {"type": "text", "text": "請從這張收據圖片中抽取所有資訊，以 JSON 格式輸出。"}]},
    ]
    text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
    image_inputs, _ = process_vision_info(messages)
    inputs = processor(text=[text], images=image_inputs, return_tensors="pt").to(model.device)
    with torch.no_grad():
        output_ids = model.generate(**inputs, max_new_tokens=768, do_sample=False)
    generated_ids = output_ids[0][len(inputs["input_ids"][0]):]
    return processor.tokenizer.decode(generated_ids, skip_special_tokens=True)
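Since `predict` returns a plain string, callers usually need to turn it into a Python object. The helper below is a minimal sketch of one way to do that, assuming the model may occasionally wrap its JSON in a Markdown code fence or add surrounding text. The name `parse_receipt_json` and the field names in the usage lines are hypothetical and not part of this repository.

```python
import json
import re

FENCE = "`" * 3  # Markdown code-fence delimiter, built here to avoid a literal fence
FENCED_BLOCK = re.compile(FENCE + r"(?:json)?\s*(.*?)" + FENCE, flags=re.DOTALL)


def parse_receipt_json(raw: str):
    """Best-effort conversion of the generated text into a Python object.

    Returns None when no valid JSON can be recovered.
    """
    text = raw.strip()

    # Unwrap an optional Markdown code fence around the JSON payload.
    fenced = FENCED_BLOCK.search(text)
    if fenced:
        text = fenced.group(1).strip()

    try:
        return json.loads(text)
    except json.JSONDecodeError:
        pass

    # Fall back to the outermost {...} span if extra text surrounds the JSON.
    start, end = text.find("{"), text.rfind("}")
    if start != -1 and end > start:
        try:
            return json.loads(text[start:end + 1])
        except json.JSONDecodeError:
            return None
    return None


# Hypothetical output strings; real field names depend on the fine-tuned model.
print(parse_receipt_json('{"total": "12.000"}'))
print(parse_receipt_json("no structured output"))
```

Returning `None` instead of raising keeps batch processing simple, at the cost of hiding why a parse failed. Log the raw string if that matters.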

Training Details

Image preprocessing: contrast enhancement (1.5x) and sharpness enhancement (2.0x)
Image token masking: <|image_pad|> tokens excluded from loss
Loss computed on assistant response only
Optimizer: AdamW 8-bit
Mixed precision: bfloat16
Gradient checkpointing: enabled
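The two masking rules above (image pad tokens excluded, loss on the assistant response only) can be sketched as a label-building step. This is an illustration under stated assumptions, not the training code used for this adapter: `build_labels`, `assistant_start`, and the toy token ids are hypothetical, and how the start of the assistant turn is located depends on the training pipeline. The value -100 is the default `ignore_index` of PyTorch's cross-entropy loss.

```python
IGNORE_INDEX = -100  # positions with this label are skipped by the loss


def build_labels(input_ids, assistant_start, image_pad_id):
    """Build per-token labels for supervised fine-tuning.

    input_ids:       token ids of the full chat sequence (list of int)
    assistant_start: index of the first assistant-response token (hypothetical input)
    image_pad_id:    id of the image placeholder token, e.g. <|image_pad|>
    """
    labels = []
    for position, token_id in enumerate(input_ids):
        if position < assistant_start or token_id == image_pad_id:
            # Prompt tokens and image placeholders do not contribute to the loss.
            labels.append(IGNORE_INDEX)
        else:
            labels.append(token_id)
    return labels


# Toy example: id 9 stands in for the image pad token, and the assistant turn starts at index 5.
toy_ids = [1, 9, 9, 5, 6, 7, 8, 2]
toy_labels = build_labels(toy_ids, assistant_start=5, image_pad_id=9)
print(toy_labels)
```

In a real pipeline the pad token id would come from the tokenizer rather than being hard-coded.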