How to use with llama.cpp
Install with Homebrew
brew install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf MayankLad31/invoice_schema:Q8_0
# Run inference directly in the terminal:
llama-cli -hf MayankLad31/invoice_schema:Q8_0
Install with WinGet (Windows)
winget install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf MayankLad31/invoice_schema:Q8_0
# Run inference directly in the terminal:
llama-cli -hf MayankLad31/invoice_schema:Q8_0
Use a pre-built binary
# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases
# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf MayankLad31/invoice_schema:Q8_0
# Run inference directly in the terminal:
./llama-cli -hf MayankLad31/invoice_schema:Q8_0
Build from source code
git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli
# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf MayankLad31/invoice_schema:Q8_0
# Run inference directly in the terminal:
./build/bin/llama-cli -hf MayankLad31/invoice_schema:Q8_0
Use Docker
docker model run hf.co/MayankLad31/invoice_schema:Q8_0
Model Description

Extracts data from invoices into any user-specified JSON schema.

Update: I fine-tuned qwen3.5 0.8B a bit. We now have qwen_finetune.Q8_0.gguf and qwen_finetune.F16-mmproj.gguf, which you may use. I found the base model was already decent and needed only a little fine-tuning. I used Unsloth for training.

Previously, I took a small 1.5B model, fine-tuned it with RL (GRPO on Qwen2.5-Coder), and trained it to extract structured JSON from OCR text based on any user-defined schema. You can find both the model and the GGUF here (100% local). I am not completely happy with that training and it still needs more work, but it works! You can ignore it now.

How to Get Started with the Model

Use qwen_finetune.Q8_0.gguf and qwen_finetune.F16-mmproj.gguf. Ignore the other files.

Start the llama.cpp server (CPU-only):

llama-server \
  -m qwen_finetune.Q8_0.gguf \
  --mmproj qwen_finetune.F16-mmproj.gguf \
  --host 0.0.0.0 \
  --port 8000 \
  --jinja \
  --reasoning off \
  -ngl 0 \
  -t 4 \
  -n 1024
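Before sending requests, you can check that the server came up; llama-server exposes a /health endpoint. A minimal sketch, assuming the server is running on localhost:8000 as above (`check_health` is a hypothetical helper name, not part of this model card):

```python
import urllib.request
import urllib.error

def check_health(base_url: str, timeout: float = 2.0) -> bool:
    """Return True if the llama-server /health endpoint answers with HTTP 200."""
    try:
        with urllib.request.urlopen(f"{base_url}/health", timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        return False

if __name__ == "__main__":
    # Prints True once llama-server has finished loading the model
    print(check_health("http://localhost:8000"))
```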

You can then use the OpenAI SDK as follows; you can specify any schema for your invoice, as in the example below.

from openai import OpenAI
import base64

# Point the OpenAI client at the local llama-server; no real API key is needed
client = OpenAI(
    base_url="http://localhost:8000/v1", 
    api_key="sk-no-key-required"
)

def encode_image(image_path):
    with open(image_path, "rb") as image_file:
        return base64.b64encode(image_file.read()).decode('utf-8')

base64_image = encode_image("out.jpeg")

response = client.chat.completions.create(
    model="local-model",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": """Extract the data in JSON format using the schema: `{ "date": "string", "invoice_id": "string","all_items":[//list of items {"description":"string","quantity":"number","unit_price":"number","line_total":"number"}]}`"""},
                {
                    "type": "image_url",
                    "image_url": {"url": f"data:image/jpeg;base64,{base64_image}"},
                },
            ],
        }
    ],
    temperature=0.1,
    # 'min_p' is passed via 'extra_body' for OpenAI-compatible local servers
    extra_body={"min_p": 0.1} 
)

print(response.choices[0].message.content)
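The reply arrives as plain text that should contain JSON, sometimes wrapped in a ```json fence. A small sketch for turning it into a Python dict (`parse_model_json` is my own helper name, not part of the model card):

```python
import json
import re

def parse_model_json(reply: str) -> dict:
    """Parse a model reply into a dict, stripping an optional ```json fence."""
    match = re.search(r"```(?:json)?\s*(.*?)\s*```", reply, re.DOTALL)
    payload = match.group(1) if match else reply.strip()
    return json.loads(payload)

# Works on fenced and bare replies alike:
print(parse_model_json('```json\n{"invoice_id": "INV1048"}\n```'))  # -> {'invoice_id': 'INV1048'}
print(parse_model_json('{"total": 1140.87}'))                       # -> {'total': 1140.87}
```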

For the run below, we asked: Extract the data in JSON format using the schema: `{ "date": "string", "invoice_id": "string","bill_to":"string" // name and address,"ship_to":"string","all_items":[//list of items {"description":"string","quantity":"number","unit_price":"number","line_total":"number"}],"total":"number"}`

Example response for the image below (a random invoice used purely for testing; I am not the owner), produced with the code above:

{'date': 'August 20, 2006', 'invoice_id': 'INV1048', 'bill_to': 'C1003, Test Customer Two, 88 WILLIAM Square, Sydney 12345, Australia', 'ship_to': '', 'all_items': [{'description': 'Very long product description that occupies more than 1 line - in fact, it occupies 2 lines', 'quantity': 1, 'unit_price': 199.99, 'line_total': 199.99}, {'description': 'One line product description', 'quantity': 2, 'unit_price': 420.0, 'line_total': 840.0}], 'total': 1140.87}

Connect with me on LinkedIn if you have an interesting project.

https://www.linkedin.com/in/mayankladdha31/


Previous model (in case you still want to try it, though I would not recommend it): inv.Q8_0.gguf.

Use it in combination with PaddleOCR. Define any schema and you should get the JSON back. It needs some more work, but it still works!

from llama_cpp import Llama
from paddleocr import PaddleOCR
import re

# Run OCR and concatenate the recognized lines into a single text blob
text = ""
ocr = PaddleOCR(use_angle_cls=True, lang='en')
result = ocr.ocr("test_image.jpg", cls=True)
for idx in range(len(result)):
    res = result[idx]
    for line in res:
        text = text + line[-1][0] + "\n"

llm = Llama(model_path="inv.Q8_0.gguf", n_ctx=2048)

def extract_largest_json_block(text):
    pattern = r"```json\s*(.*?)\s*```"
    blocks = re.findall(pattern, text, re.DOTALL)
    if not blocks:
        return None
    return max(blocks, key=len)


def extract_xml_answer(text: str) -> str:
    answer = text.split("<answer>")[-1]
    answer = answer.split("</answer>")[0]
    return extract_largest_json_block(answer.strip())

messages = [
    {"role": "system", "content": """Respond in the following format:
<reasoning>
...
</reasoning>
<answer>
```json 
```
</answer>"""},
    {"role": "user", "content": f"{text}\n"+"""

Extract the data in JSON format using the schema: 

{
  "invoice_no":"string",
  "issued_to": {
    "name": "string", 
    "address": "string" // Address of the client
  },
  "pay_to": {
    "bank_name": "string",  // Name of the bank
    "name": "string", // Name 
    "account_no": "number" 
  },
  "items":[
      {
        "description": "string",
        "quantity": "number",
        "unit_price": "number",
        "total":"number"
      }
    ],
  "subtotal":"number",
  "total":"number"
} """},
]

output = llm.create_chat_completion(messages, max_tokens=1000)

print(extract_xml_answer(output['choices'][0]['message']['content']))

# Close the sampler and model explicitly to release llama.cpp resources
llm._sampler.close()
llm.close()
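To illustrate what the two helper functions above do, here is a self-contained sketch that runs them on a synthetic model reply (the helpers are copied from the script above; the reply string is invented test data):

```python
import re

def extract_largest_json_block(text):
    # Return the longest ```json ... ``` block, or None if there is none
    blocks = re.findall(r"```json\s*(.*?)\s*```", text, re.DOTALL)
    return max(blocks, key=len) if blocks else None

def extract_xml_answer(text: str) -> str:
    # Keep only what sits between <answer> and </answer>, then pull out the JSON
    answer = text.split("<answer>")[-1].split("</answer>")[0]
    return extract_largest_json_block(answer.strip())

# Build a synthetic reply; the code fence is assembled to keep this example readable
fence = "`" * 3
reply = (
    "<reasoning>\nThe invoice number is on the first line.\n</reasoning>\n"
    "<answer>\n"
    f"{fence}json\n"
    '{"invoice_no": "INV-001", "total": 42.5}\n'
    f"{fence}\n"
    "</answer>"
)

print(extract_xml_answer(reply))  # -> {"invoice_no": "INV-001", "total": 42.5}
```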