File size: 2,975 Bytes

---
license: apache-2.0
language:
- en
- vi
tags:
- vision-language
- document-ai
- vlm
- ocr
pipeline_tag: image-to-text
---

# Doc2Bit-VL-7B

Doc2Bit-VL-7B is a vision-language model fine-tuned for document understanding.

# 📄 Document Information Extraction VLM

A Vision-Language Model (VLM) specialized in **document understanding and information extraction**, supporting both **unstructured information** and **structured data (tables)** from document images.

This model is optimized for production usage via **vLLM serving** with an **OpenAI-compatible API**.

---
## 🚀 Features
- Vision-Language Model for document images
- Extracts **unstructured key–value information**
- Extracts **structured table data**, including **column-wise extraction**
- Handles complex layouts (forms, invoices, reports, product tables)
- Strict output formatting (no hallucination)
- Compatible with **vLLM OpenAI-style API**
- Prompting optimized for **Vietnamese instructions**
---

## 📌 Supported Data Types
### 1. Unstructured Information

Extract specific fields defined by the user, such as:
- Invoice number
- Date
- Company name
- Address
- Total amount
- Custom document attributes

---
### 2. Structured Table Data

Designed for extracting **individual columns** from tables, especially product tables.
Capabilities:
- Column-level extraction
- Ignore non-product rows
- Markdown-formatted output
- Clean and deterministic structure

---

## 🔧 Deployment (vLLM)
This model is intended to be deployed using **vLLM** with an OpenAI-compatible interface.
Example:
```bash
vllm serve <model-path-or-name> \
  --served-model-name document-vlm \
  --port 8000
```
## Prompt Usage
Unstructured Data Extraction Prompt Example

```bash
prompt = f"""QUERY Trích xuất thông tin: {field_names}.
            INSTRUCTION:
            - Bắt buộc dữ liệu trả về theo format <index>. <key>:<value>, trong đó <index> là số thứ tự (1, 2, 3, 4, ...)
            - key lấy chính xác từ trong QUERY của tôi
            - không tự bịa dữ liệu và coi đó là điều hiển nhiên
            - nếu không thể trích xuất thì hãy trả lời: tôi không thể tìm thấy dữ liệu này
            """
```
Expected Output
```bash
1. Số hóa đơn: INV-001
2. Ngày phát hành: 12/03/2024
3. Tổng tiền: 1.250.000 VND
```
If data cannot be extracted:
```bash
tôi không thể tìm thấy dữ liệu này
```
Structured Table (Column-wise) Extraction Prompt Example

```bash
prompt = (
    f"trích xuất thông tin tương ứng với sản phẩm của cột {col_name} trong bảng sản phẩm.\n"
    "INSTRUCTION:\n"
    "Xuất kết quả dưới dạng markdown một cột.\n"
    "Bỏ qua những hàng không phải sản phẩm.\n"
    f"Yêu cầu tiêu đề cột là |{col_name}|.\n"
)
```
Expected Output
```bash
|Tên sản phẩm|
|------------|
|Sản phẩm A|
|Sản phẩm B|
|Sản phẩm C|
```