Customs Declaration-Form-Audit VLM

 A multimodal vision-language model optimized for intelligent auditing of customs declaration documents.


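 ## 🎯 Model Capabilities

 This model focuses on intelligent auditing of import and export customs declaration documents. Its core capabilities are:

 ### Document Information Extraction
 - **Certificate type recognition**: health certificates, certificates of origin, inspection reports, contracts, invoices, and more
 - **Key field extraction**: certificate number, container number, number of packages, net weight, gross weight
 - **Line-item parsing**: row-by-row extraction of table data (product name, quantity, amount, and so on)
 - **Date extraction**: issue date, validity period, production date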
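
Extracted container numbers can be sanity-checked downstream, independently of the model. The sketch below assumes the container numbers follow ISO 6346 (four letters, six digits, one check digit) and validates the check digit. `is_valid_container_number` is a hypothetical helper for illustration, not part of this model or its API.

```python
import re


def _letter_values():
    """Build the ISO 6346 letter table: A=10 upward, skipping multiples of 11."""
    table, value = {}, 10
    for letter in "ABCDEFGHIJKLMNOPQRSTUVWXYZ":
        if value % 11 == 0:
            value += 1
        table[letter] = value
        value += 1
    return table


LETTER_VALUES = _letter_values()


def is_valid_container_number(raw: str) -> bool:
    """Return True if `raw` looks like an ISO 6346 code with a correct check digit."""
    # Remove spaces and hyphens that OCR output or document layout may introduce.
    code = re.sub(r"[\s\-]", "", raw).upper()
    if not re.fullmatch(r"[A-Z]{4}\d{7}", code):
        return False
    # Weight each of the first 10 characters by 2**position and sum.
    total = 0
    for position, char in enumerate(code[:10]):
        value = LETTER_VALUES[char] if char.isalpha() else int(char)
        total += value * (2 ** position)
    # A remainder of 10 maps to check digit 0.
    return (total % 11) % 10 == int(code[10])
```

A failed check does not say which character is wrong, only that the extracted value should be reviewed.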

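 ### Table Data Processing
 - Scans complex multi-column tables row by row
 - Accurately recognizes mixed numeric, date, and text content
 - Handles merged cells and multi-column layouts automatically

 ### Multilingual OCR
 - Recognizes mixed-language documents in Chinese, English, Spanish, Japanese, Russian, and other languages
 - Supports documents that mix handwritten and printed text
 - Optimized for recognizing blurred characters

 ### Document Cross-Checking
 - Checks the customs declaration against its accompanying certificates for consistency
 - Flags data anomalies and potential risk points
 - Produces structured audit results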
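
As a rough illustration of what a consistency check can look like downstream, the sketch below compares two already-extracted field dictionaries and returns a structured list of findings. The field names, the `compare_documents` helper, and the normalization rules are all assumptions made for this example; the model's own audit logic is not documented here.

```python
from decimal import Decimal, InvalidOperation

# Hypothetical field names; adjust to whatever schema your prompts request.
NUMERIC_FIELDS = {"packages", "net_weight_kg", "gross_weight_kg"}


def normalize(field, value):
    """Make values comparable: numbers numerically, text ignoring case and spaces."""
    if isinstance(value, (list, tuple, set)):
        return tuple(sorted(normalize(field, v) for v in value))
    text = str(value).strip()
    if field in NUMERIC_FIELDS:
        try:
            return Decimal(text.replace(",", ""))
        except InvalidOperation:
            return text
    return "".join(text.split()).upper()


def compare_documents(declaration, certificate, fields):
    """Compare the given fields and report match, mismatch, or missing for each."""
    findings = []
    for field in fields:
        left, right = declaration.get(field), certificate.get(field)
        if left is None or right is None:
            status = "missing"
        elif normalize(field, left) == normalize(field, right):
            status = "match"
        else:
            status = "mismatch"
        findings.append({
            "field": field,
            "declaration": left,
            "certificate": right,
            "status": status,
        })
    return findings
```

Treating a missing field separately from a mismatch keeps "the document does not state this" apart from "the documents disagree", which usually call for different follow-up.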

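 ## 🔧 Model Training

 ### Training Method
 The model is trained with an initial stage of domain knowledge injection for document auditing via continued pre-training (CPT), followed by multi-stage supervised fine-tuning (SFT) and two-stage reinforcement learning (RL):

 1. **Vision-language alignment**: strengthens the model's understanding of document images
 2. **Domain data adaptation**: learns the specialized terminology and formats of customs declaration documents
 3. **Task-specific optimization**: targeted training for concrete tasks such as table extraction and field recognition
 4. **Multi-task fusion**: improves all auditing capabilities jointly

 ### Training Data Scale
 - Supervised learning stage: about 700,000 high-quality annotated samples
 - Reinforcement learning stage: about 150,000 audit task samples
 - Covers document formats from 20+ countries and regions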

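 ## 📄 Handling Difficult PDFs

 ### Low-Quality Image Optimization
 Training specifically targeted the difficult PDFs that occur in real business workflows:

 1. Certificate numbers in unusual formats
 2. Extraction and aggregation of complex table data
 3. Extraction from irregular tables
 4. Extraction and accumulation across multi-page documents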
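
Once rows have been extracted page by page, totals across pages can be accumulated in post-processing. The following is a minimal sketch, assuming each page yields a hypothetical dict with a `rows` list whose entries carry `packages` and `net_weight_kg`. It illustrates the bookkeeping only and is not the model's internal mechanism.

```python
from decimal import Decimal


def accumulate_pages(pages):
    """Merge per-page rows and sum package counts and net weights across pages."""
    rows = []
    totals = {"packages": 0, "net_weight_kg": Decimal("0")}
    for page in pages:
        for row in page.get("rows", []):
            rows.append(row)
            totals["packages"] += int(row.get("packages", 0))
            # Convert via str so float inputs such as 30.1 do not carry binary noise.
            totals["net_weight_kg"] += Decimal(str(row.get("net_weight_kg", "0")))
    return rows, totals


# Usage with made-up values for two pages of one document.
pages = [
    {"rows": [{"packages": 10, "net_weight_kg": "100.5"},
              {"packages": 5, "net_weight_kg": "50.25"}]},
    {"rows": [{"packages": 3, "net_weight_kg": 30.1}]},
]
all_rows, totals = accumulate_pages(pages)
print(totals["packages"], totals["net_weight_kg"])
```

Using `Decimal` rather than `float` keeps the summed weight exactly comparable with the total printed on the document.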

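 ### Measured Results

 | Test scenario | Accuracy |
 |---------------|----------|
 | Certificate number recognition | 99%+ |
 | Container number extraction | 98%+ |
 | Table data extraction | 99%+ |
 | Package count and weight recognition | 99%+ |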

 ## 🚀 Quick Start

 ### Install Dependencies

 ```bash
 pip install transformers torch pillow accelerate
 ```

 ### Python Inference

 ```python
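 from transformers import AutoModelForImageTextToText, AutoProcessor
 from PIL import Image
 import torch

 # Load the model. The Auto class resolves the architecture from the checkpoint
 # config; the base model is Qwen3-VL, so a transformers release with
 # Qwen3-VL support is required.
 model = AutoModelForImageTextToText.from_pretrained(
     "shihao1989/Declaration-Form-Audit",
     torch_dtype=torch.bfloat16,
     device_map="auto"
 )
 processor = AutoProcessor.from_pretrained("shihao1989/Declaration-Form-Audit")

 # Prepare the input: one image plus a text instruction
 image = Image.open("certificate.jpg")
 messages = [
     {
         "role": "user",
         "content": [
             {"type": "image"},
             {
                 "type": "text",
                 "text": "Extract the certificate number, container number, number of packages, and net weight from this certificate."
             }
         ]
     }
 ]

 # Run inference
 text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
 inputs = processor(text=[text], images=[image], return_tensors="pt").to(model.device)

 output = model.generate(**inputs, max_new_tokens=512, temperature=0.1)
 # Drop the prompt tokens so that only the newly generated answer is decoded
 generated = output[:, inputs["input_ids"].shape[1]:]
 result = processor.batch_decode(generated, skip_special_tokens=True)[0]
 print(result)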
 ```

 ### vLLM Deployment (recommended for production)

 ```bash
 docker run -d \
   --name declaration-audit \
   --runtime=nvidia \
   -e NVIDIA_VISIBLE_DEVICES=0 \
   --ipc=host \
   -p 8000:8000 \
   vllm/vllm-openai:latest \
   --model shihao1989/Declaration-Form-Audit \
   --trust-remote-code \
   --max-model-len 32000 \
   --gpu-memory-utilization 0.9
 ```

 ### API Usage

 ```python
 import requests
 import base64

 with open("certificate.jpg", "rb") as f:
     image_b64 = base64.b64encode(f.read()).decode()

 response = requests.post("http://localhost:8000/v1/chat/completions", json={
     "model": "shihao1989/Declaration-Form-Audit",
     "messages": [
         {
             "role": "user",
             "content": [
                 {"type": "text", "text": "Extract the certificate number and net weight"},
                 {"type": "image_url", "image_url": {"url": f"data:image/jpeg;base64,{image_b64}"}}
             ]
         }
     ],
     "max_tokens": 512,
     "temperature": 0.1
 })

 print(response.json()["choices"][0]["message"]["content"])
 ```

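 ## 💡 Best Practices

 ### Prompt Design Tips

 **Recommended format (structured output):**
 ```
 Extract the following fields from this certificate of origin and return them as JSON:
 {
   "cert_code": "certificate number",
   "containers": ["list of container numbers"],
   "packages": number of packages (integer),
   "net_weight_kg": net weight (number)
 }
 Output only the JSON, with no additional text.
 ```

 **Key principles:**
 - State the fields to extract and the output format explicitly
 - Give the alternative names a field may appear under (for example, "证书编号/Certificate No.")
 - Use a structured format such as JSON to simplify post-processing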
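
Because the recommended prompt asks for JSON only, the response can be parsed and checked before any further processing. The sketch below assumes the field names from the recommended prompt. `parse_model_json` is a hypothetical helper, and it tolerates stray text around the JSON object in case the model adds any.

```python
import json

# Field names taken from the recommended prompt format.
REQUIRED_FIELDS = ("cert_code", "containers", "packages", "net_weight_kg")


def parse_model_json(raw: str) -> dict:
    """Extract the outermost JSON object from a model response and check its fields."""
    start, end = raw.find("{"), raw.rfind("}")
    if start == -1 or end <= start:
        raise ValueError("no JSON object found in model output")
    # json.JSONDecodeError is a subclass of ValueError, so callers can catch one type.
    data = json.loads(raw[start:end + 1])
    missing = [name for name in REQUIRED_FIELDS if name not in data]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    return data


# Usage with a made-up response that wraps the JSON in extra text.
raw_output = (
    "Here is the result:\n"
    '{"cert_code": "CO-001", "containers": ["CSQU3054383"], '
    '"packages": 12, "net_weight_kg": 340.5}\n'
    "Done."
)
fields = parse_model_json(raw_output)
print(fields["cert_code"], fields["packages"])
```

Raising on missing fields, rather than filling in defaults, makes an incomplete extraction visible instead of letting it pass silently into the audit.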

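 ## 📜 License

 This model is released under the Apache 2.0 license.

 ## 🙏 Acknowledgments

 - The Qwen team, for the excellent base model
 - Customs domain experts, for their guidance on domain knowledge

 ## 📮 Contact

 For questions or suggestions, please use Hugging Face Discussions.
 Email: 199416378@qq.com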

Base model: Qwen/Qwen3-VL-8B-Instruct (this model is a fine-tune of it).