---
license: apache-2.0
base_model: Qwen/Qwen2.5-VL-3B-Instruct
tags:
- lora
- peft
- vision-language-model
- receipt-extraction
- cord-v2
- qwen2.5-vl
language:
- en
- id
datasets:
- naver-clova-ix/cord-v2
---
# receipt_detector

LoRA adapter for **Qwen2.5-VL-3B-Instruct**, fine-tuned on the [CORD-v2](https://huggingface.co/datasets/naver-clova-ix/cord-v2) receipt extraction dataset.
## Model Details

| Item | Value |
|------|-------|
| Base model | Qwen/Qwen2.5-VL-3B-Instruct |
| Method | LoRA SFT |
| LoRA rank | 8 |
| LoRA alpha | 16 |
| Target modules | q_proj, k_proj, v_proj, o_proj |
| Training epochs | 2 |
| Dataset | CORD-v2 (Indonesian receipts) |
| Task | Structured JSON extraction from receipt images |
## Usage

```python
import torch
from peft import PeftModel
from qwen_vl_utils import process_vision_info
from transformers import AutoProcessor, Qwen2_5_VLForConditionalGeneration

# Load the processor from the adapter repo and the base model from Qwen.
processor = AutoProcessor.from_pretrained("LinBMS410/receipt_detector")
model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    "Qwen/Qwen2.5-VL-3B-Instruct",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
# Attach the LoRA adapter on top of the base model.
model = PeftModel.from_pretrained(model, "LinBMS410/receipt_detector")
model.eval()

# Prompts are kept verbatim from the original card; English translations are in the comments.
# "You are a professional receipt information extraction assistant.
#  Extract all structured information from the image and output it in JSON format."
SYSTEM_PROMPT = "你是專業的收據資訊抽取助手。請從圖片中抽取所有結構化資訊,以 JSON 格式輸出。"
# "Extract all information from this receipt image and output it in JSON format."
USER_PROMPT = "請從這張收據圖片中抽取所有資訊,以 JSON 格式輸出。"


# Inference
def predict(image):
    messages = [
        {"role": "system", "content": [{"type": "text", "text": SYSTEM_PROMPT}]},
        {"role": "user", "content": [{"type": "image", "image": image}, {"type": "text", "text": USER_PROMPT}]},
    ]
    text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
    image_inputs, _ = process_vision_info(messages)
    inputs = processor(text=[text], images=image_inputs, return_tensors="pt").to(model.device)
    with torch.no_grad():
        output_ids = model.generate(**inputs, max_new_tokens=768, do_sample=False)
    # Keep only the newly generated tokens, dropping the prompt.
    generated_ids = output_ids[0][len(inputs["input_ids"][0]):]
    return processor.tokenizer.decode(generated_ids, skip_special_tokens=True)
```
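
`predict` returns the decoded text as a plain string, so callers usually want to turn it into a Python object. The sketch below shows one defensive way to do that, assuming the model may occasionally wrap the JSON in extra text. `parse_receipt_json` is a hypothetical helper, not part of this repository, and the sample string is made up for illustration.

```python
import json


def parse_receipt_json(generated_text: str):
    """Return the parsed JSON value, or None if no valid JSON can be recovered."""
    text = generated_text.strip()
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        pass
    # Fall back to the outermost {...} span in case of surrounding text.
    start, end = text.find("{"), text.rfind("}")
    if start == -1 or end <= start:
        return None
    try:
        return json.loads(text[start:end + 1])
    except json.JSONDecodeError:
        return None


# Made-up sample output; the field names are illustrative only.
sample = 'Result: {"total": {"total_price": "45,000"}}'
receipt = parse_receipt_json(sample)
print(receipt)
```

Returning `None` instead of raising lets a batch job log the failure and move on to the next receipt.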
## Training Details

- Image preprocessing: contrast enhancement (1.5x) + sharpness (2.0x)
- Image token masking: `<|image_pad|>` tokens excluded from loss
- Loss computed on assistant response only
- Optimizer: AdamW 8-bit
- Mixed precision: bfloat16
- Gradient checkpointing: enabled
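
The two loss-masking points above can be illustrated with a small, framework-free sketch. It is not the training code used for this adapter. It assumes the common convention of setting ignored label positions to -100 (the default `ignore_index` of PyTorch's `CrossEntropyLoss`), and the token ids, `IMAGE_PAD_ID`, and `build_labels` are hypothetical.

```python
IGNORE_INDEX = -100  # default ignore_index of torch.nn.CrossEntropyLoss


def build_labels(input_ids, assistant_start, image_pad_id):
    """Copy input_ids to labels, masking the prompt and any image pad tokens."""
    labels = []
    for pos, tok in enumerate(input_ids):
        if pos < assistant_start or tok == image_pad_id:
            labels.append(IGNORE_INDEX)  # excluded from the loss
        else:
            labels.append(tok)  # assistant response token, contributes to the loss
    return labels


IMAGE_PAD_ID = 9  # hypothetical id standing in for <|image_pad|>
# Toy sequence: prompt with three image pad tokens, then a 3-token assistant response.
input_ids = [1, 9, 9, 9, 2, 3, 40, 41, 42]
labels = build_labels(input_ids, assistant_start=6, image_pad_id=IMAGE_PAD_ID)
print(labels)  # [-100, -100, -100, -100, -100, -100, 40, 41, 42]
```

Only the last three positions keep their token ids, so the loss is computed on the assistant response alone.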