---
license: apache-2.0
base_model: Qwen/Qwen2.5-VL-3B-Instruct
tags:
- lora
- peft
- vision-language-model
- receipt-extraction
- cord-v2
- qwen2.5-vl
language:
- en
- id
datasets:
- naver-clova-ix/cord-v2
---

# receipt_detector

LoRA adapter for **Qwen2.5-VL-3B-Instruct**, fine-tuned on the [CORD-v2](https://huggingface.co/datasets/naver-clova-ix/cord-v2) receipt extraction dataset.

## Model Details

| Item | Value |
|------|-------|
| Base model | Qwen/Qwen2.5-VL-3B-Instruct |
| Method | LoRA SFT |
| LoRA rank | 8 |
| LoRA alpha | 16 |
| Target modules | q_proj, k_proj, v_proj, o_proj |
| Training epochs | 2 |
| Dataset | CORD-v2 (Indonesian receipts) |
| Task | Structured JSON extraction from receipt images |

## Usage

```python
import torch
from peft import PeftModel
from qwen_vl_utils import process_vision_info
from transformers import AutoProcessor, Qwen2_5_VLForConditionalGeneration

processor = AutoProcessor.from_pretrained("LinBMS410/receipt_detector")
model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    "Qwen/Qwen2.5-VL-3B-Instruct",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
model = PeftModel.from_pretrained(model, "LinBMS410/receipt_detector")
model.eval()

# Inference
def predict(image):
    # image: a PIL image, a local file path, or a URL (all accepted by qwen_vl_utils)
    # The prompts are given in Chinese; English translation:
    #   System: "You are a professional receipt information extraction assistant.
    #            Extract all structured information from the image and output it as JSON."
    #   User:   "Extract all information from this receipt image and output it as JSON."
    messages = [
        {"role": "system", "content": [{"type": "text", "text": "你是專業的收據資訊抽取助手。請從圖片中抽取所有結構化資訊,以 JSON 格式輸出。"}]},
        {"role": "user", "content": [
            {"type": "image", "image": image},
            {"type": "text", "text": "請從這張收據圖片中抽取所有資訊,以 JSON 格式輸出。"},
        ]},
    ]
    text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
    image_inputs, _ = process_vision_info(messages)  # second value is video inputs (unused)
    inputs = processor(text=[text], images=image_inputs, return_tensors="pt").to(model.device)
    with torch.no_grad():
        output_ids = model.generate(**inputs, max_new_tokens=768, do_sample=False)
    # Drop the prompt tokens and decode only the newly generated answer.
    generated_ids = output_ids[0][len(inputs["input_ids"][0]):]
    return processor.tokenizer.decode(generated_ids, skip_special_tokens=True)
```

## Training Details

- Image preprocessing: contrast enhancement (1.5x) + sharpness enhancement (2.0x)
- Image token masking: `<|image_pad|>` tokens excluded from the loss
- Loss computed on the assistant response only
- Optimizer: AdamW 8-bit
- Mixed precision: bfloat16
- Gradient checkpointing: enabled
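The image preprocessing bullet can be sketched with Pillow's `ImageEnhance` module. This is a hypothetical reconstruction: the card does not say which library was used, in which order the two enhancements were applied, or whether the same step should be repeated at inference time. Contrast-then-sharpness is assumed here, and `enhance_receipt` is an illustrative name.

```python
from PIL import Image, ImageEnhance


def enhance_receipt(image: Image.Image, contrast: float = 1.5, sharpness: float = 2.0) -> Image.Image:
    """Apply the contrast (1.5x) and sharpness (2.0x) enhancement listed on the card.

    Assumptions (not stated on the card): Pillow's ImageEnhance is used,
    and contrast is applied before sharpness.
    """
    image = image.convert("RGB")  # normalize modes such as RGBA, P, or L to 3-channel RGB
    # Factor > 1 pushes pixel values away from the image's mean gray level.
    image = ImageEnhance.Contrast(image).enhance(contrast)
    # Factor > 1 extrapolates away from a smoothed copy, i.e. sharpens edges.
    image = ImageEnhance.Sharpness(image).enhance(sharpness)
    return image
```

If the same preprocessing is wanted at inference, it can be applied before calling `predict`, e.g. `predict(enhance_receipt(Image.open("receipt.jpg")))` (file name hypothetical). Whether this improves results for this adapter is not reported on the card.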
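The two loss-related bullets (image-token masking and assistant-only loss) amount to building a label sequence in which every position outside the assistant reply, and every `<|image_pad|>` position, is set to an ignore index. Below is a minimal standard-library sketch. It assumes labels aligned with `input_ids` (Hugging Face causal-LM heads shift them internally) and an assistant-token mask computed elsewhere. `build_labels` and `assistant_mask` are hypothetical names, not part of the training code behind this adapter.

```python
from typing import List, Sequence

# Label value ignored by PyTorch's CrossEntropyLoss (its default ignore_index).
IGNORE_INDEX = -100


def build_labels(
    input_ids: Sequence[int],
    assistant_mask: Sequence[bool],
    image_pad_id: int,
) -> List[int]:
    """Hypothetical helper: keep loss only on assistant tokens, never on image pads.

    input_ids:      token IDs of the full chat-formatted example
    assistant_mask: True where the token belongs to the assistant's reply (assumed precomputed)
    image_pad_id:   token ID of <|image_pad|>
    """
    if len(input_ids) != len(assistant_mask):
        raise ValueError("input_ids and assistant_mask must have the same length")
    labels = []
    for token_id, is_assistant in zip(input_ids, assistant_mask):
        if not is_assistant or token_id == image_pad_id:
            labels.append(IGNORE_INDEX)  # excluded from the loss
        else:
            labels.append(token_id)      # supervised target
    return labels
```

In a real pipeline the pad ID can be looked up with `processor.tokenizer.convert_tokens_to_ids("<|image_pad|>")`. In the single-turn format shown under Usage, image tokens sit in the user turn, so the assistant mask alone already excludes them. The explicit check mirrors the card's separate bullet and guards against formats where that does not hold.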
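The rank and alpha listed under Model Details control how strongly the adapter perturbs each targeted projection. Assuming PEFT's default (non-rsLoRA) scaling of $\alpha / r$, each adapted layer computes the following. The 2048 by 2048 shape in the second line is only an illustrative example, not a figure taken from this card.

$$
h = W_0 x + \frac{\alpha}{r}\, B A x, \qquad
B \in \mathbb{R}^{d_{\text{out}} \times r},\quad A \in \mathbb{R}^{r \times d_{\text{in}}}
$$

$$
\frac{\alpha}{r} = \frac{16}{8} = 2, \qquad
N_{\text{LoRA}} = r\,(d_{\text{in}} + d_{\text{out}}) = 8 \times (2048 + 2048) = 32{,}768
$$

Only $A$ and $B$ are trained; $W_0$ stays frozen, so the trainable parameter count per adapted matrix grows linearly with $r$.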
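For readers who want to train a similar adapter, the Model Details table maps onto a PEFT `LoraConfig` roughly as follows. This is a hypothetical reconstruction, not the author's training script. `task_type` is an assumption, and dropout, bias, and any other settings not listed on the card are left at PEFT defaults rather than guessed.

```python
from peft import LoraConfig

# Hypothetical reconstruction of the Model Details table (not the original training config).
lora_config = LoraConfig(
    r=8,            # LoRA rank
    lora_alpha=16,  # alpha; effective scale alpha / r = 2 under default scaling
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # attention projections
    task_type="CAUSAL_LM",  # assumption: not stated on the card
)
```

The authoritative values ship with the adapter repository and can be inspected with `PeftConfig.from_pretrained("LinBMS410/receipt_detector")`.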
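`predict` in the Usage section returns raw decoded text. The card does not document the output JSON schema, or whether the model sometimes wraps its answer in a Markdown code fence or surrounding prose. A defensive parser is therefore a reasonable precaution. This is a minimal sketch under that assumption; `parse_receipt_json` is a hypothetical helper, and it only handles a single top-level JSON object.

```python
import json
import re
from typing import Any

# Matches an optional Markdown code fence (three backticks, optional "json" tag).
_FENCE = re.compile(r"`{3}(?:json)?\s*(.*?)\s*`{3}", re.DOTALL)


def parse_receipt_json(generated: str) -> Any:
    """Best-effort parse of model output into a Python object (hypothetical helper)."""
    match = _FENCE.search(generated)
    text = match.group(1) if match else generated
    # Trim any prose before the first "{" or after the last "}".
    start, end = text.find("{"), text.rfind("}")
    if start != -1 and end > start:
        text = text[start:end + 1]
    return json.loads(text)  # raises json.JSONDecodeError if the output is not valid JSON
```

Typical use: `data = parse_receipt_json(predict(image))`. Callers may want to catch `json.JSONDecodeError`, since greedy decoding capped at 768 new tokens can truncate long receipts mid-object.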