rabbit

update

6381f12 about 1 month ago

2.98 kB

license: apache-2.0
language:
  - en
  - vi
tags:
  - vision-language
  - document-ai
  - vlm
  - ocr
pipeline_tag: image-to-text

Doc2Bit-VL-7B

Doc2Bit-VL-7B is a vision-language model fine-tuned for document understanding.

📄 Document Information Extraction VLM

A Vision-Language Model (VLM) specialized in document understanding and information extraction, supporting both unstructured information and structured data (tables) from document images.

This model is optimized for production usage via vLLM serving with an OpenAI-compatible API.

🚀 Features

Vision-Language Model for document images
Extracts unstructured key–value information
Extracts structured table data, including column-wise extraction
Handles complex layouts (forms, invoices, reports, product tables)
Strict output formatting (no hallucination)
Compatible with vLLM OpenAI-style API
Prompting optimized for Vietnamese instructions

📌 Supported Data Types

1. Unstructured Information

Extract specific fields defined by the user, such as:

Invoice number
Date
Company name
Address
Total amount
Custom document attributes

2. Structured Table Data

Designed for extracting individual columns from tables, especially product tables. Capabilities:

Column-level extraction
Ignore non-product rows
Markdown-formatted output
Clean and deterministic structure

🔧 Deployment (vLLM)

This model is intended to be deployed using vLLM with an OpenAI-compatible interface. Example:

vllm serve <model-path-or-name> \
  --served-model-name document-vlm \
  --port 8000

Prompt Usage

Unstructured Data Extraction Prompt Example

prompt = f"""QUERY Trích xuất thông tin: {field_names}.
            INSTRUCTION:
            - Bắt buộc dữ liệu trả về theo format <index>. <key>:<value>, trong đó <index> là số thứ tự (1, 2, 3, 4, ...)
            - key lấy chính xác từ trong QUERY của tôi
            - không tự bịa dữ liệu và coi đó là điều hiển nhiên
            - nếu không thể trích xuất thì hãy trả lời: tôi không thể tìm thấy dữ liệu này
            """

Expected Output

1. Số hóa đơn: INV-001
2. Ngày phát hành: 12/03/2024
3. Tổng tiền: 1.250.000 VND

If data cannot be extracted:

tôi không thể tìm thấy dữ liệu này

Structured Table (Column-wise) Extraction Prompt Example

prompt = (
    f"trích xuất thông tin tương ứng với sản phẩm của cột {col_name} trong bảng sản phẩm.\n"
    "INSTRUCTION:\n"
    "Xuất kết quả dưới dạng markdown một cột.\n"
    "Bỏ qua những hàng không phải sản phẩm.\n"
    f"Yêu cầu tiêu đề cột là |{col_name}|.\n"
)

Expected Output

|Tên sản phẩm|
|------------|
|Sản phẩm A|
|Sản phẩm B|
|Sản phẩm C|