## Model Description
Extracts text into any user-specified schema.

Update: I fine-tuned qwen3.5 0.8B a bit. We now have `qwen_finetune.Q8_0.gguf` and `qwen_finetune.F16-mmproj.gguf`, which you may use. I found the base model was already decent and needed only a little fine-tuning; I used Unsloth.

Previously, I took a small 1.5B model, fine-tuned it with RL (GRPO on Qwen2.5-Coder), and asked it to extract structured JSON from OCR text based on any user-defined schema. You can find that model and its GGUF here as well (100% local). I am not completely happy with that training and it still needs more work, but it works! You can ignore it now.
## How to Get Started with the Model
Use `qwen_finetune.Q8_0.gguf` and `qwen_finetune.F16-mmproj.gguf`; ignore the other files.
Start the llama.cpp server (CPU-only in this example):

```shell
llama-server \
  -m qwen_finetune.Q8_0.gguf \
  --mmproj qwen_finetune.F16-mmproj.gguf \
  --host 0.0.0.0 \
  --port 8000 \
  --jinja \
  --reasoning off \
  -ngl 0 \
  -t 4 \
  -n 1024
```
You can then use the OpenAI SDK as follows. You can specify any schema, as in the invoice example below:
```python
from openai import OpenAI
import base64

client = OpenAI(
    base_url="http://localhost:8000/v1",
    api_key="sk-no-key-required",
)

def encode_image(image_path):
    with open(image_path, "rb") as image_file:
        return base64.b64encode(image_file.read()).decode("utf-8")

base64_image = encode_image("out.jpeg")

response = client.chat.completions.create(
    model="local-model",
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": """Extract the data in JSON format using the schema: `{"date": "string", "invoice_id": "string", "all_items": [ // list of items {"description": "string", "quantity": "number", "unit_price": "number", "line_total": "number"}]}`""",
                },
                {
                    "type": "image_url",
                    "image_url": {"url": f"data:image/jpeg;base64,{base64_image}"},
                },
            ],
        }
    ],
    temperature=0.1,
    # 'min_p' is passed via 'extra_body' for OpenAI-compatible local servers
    extra_body={"min_p": 0.1},
)
print(response.choices[0].message.content)
```
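The model usually returns bare JSON, but local models sometimes wrap their reply in a fenced ```json block. A small helper (my own sketch, not part of the model card) makes parsing robust either way:

````python
import json
import re

def parse_model_json(content: str) -> dict:
    """Parse a model reply into a dict, tolerating an optional ```json fence."""
    # If the reply contains a fenced block, keep only its contents.
    match = re.search(r"```(?:json)?\s*(.*?)\s*```", content, re.DOTALL)
    if match:
        content = match.group(1)
    return json.loads(content)

# Works on bare JSON and on fenced JSON alike:
print(parse_model_json('{"invoice_id": "INV1048"}'))
print(parse_model_json('```json\n{"invoice_id": "INV1048"}\n```'))
````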
For the example below, we asked: "Extract the data in JSON format using the schema:"

```
{"date": "string", "invoice_id": "string", "bill_to": "string" // name and address, "ship_to": "string", "all_items": [ // list of items {"description": "string", "quantity": "number", "unit_price": "number", "line_total": "number"}], "total": "number"}
```
Example response for the image below (a random invoice used just for testing; I am not the owner), using the code above:

```python
{'date': 'August 20, 2006', 'invoice_id': 'INV1048', 'bill_to': 'C1003, Test Customer Two, 88 WILLIAM Square, Sydney 12345, Australia', 'ship_to': '', 'all_items': [{'description': 'Very long product description that occupies more than 1 line - in fact, it occupies 2 lines', 'quantity': 1, 'unit_price': 199.99, 'line_total': 199.99}, {'description': 'One line product description', 'quantity': 2, 'unit_price': 420.0, 'line_total': 840.0}], 'total': 1140.87}
```
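Small models can still hallucinate numbers, so it is worth sanity-checking the arithmetic in the extracted JSON before trusting it. A minimal check (my own addition, written against the schema above):

```python
def check_line_totals(invoice: dict, tol: float = 0.01) -> list[str]:
    """Return warnings for items where line_total != quantity * unit_price."""
    warnings = []
    for item in invoice.get("all_items", []):
        expected = item["quantity"] * item["unit_price"]
        if abs(expected - item["line_total"]) > tol:
            warnings.append(
                f"{item['description']!r}: expected {expected}, got {item['line_total']}"
            )
    return warnings

# Hypothetical extracted invoice with one deliberately wrong line total:
invoice = {
    "all_items": [
        {"description": "Item A", "quantity": 2, "unit_price": 420.0, "line_total": 840.0},
        {"description": "Item B", "quantity": 1, "unit_price": 199.99, "line_total": 189.99},
    ]
}
print(check_line_totals(invoice))  # flags "Item B" only
```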
Connect with me on LinkedIn if you have an interesting project: https://www.linkedin.com/in/mayankladdha31/
Previous model (in case you still want to try it, but I would not recommend it): `inv.Q8_0.gguf`. Use it in combination with PaddleOCR. Define any schema and hopefully you get the JSON back. It needs some more work, but it still works!
````python
import re

from llama_cpp import Llama
from paddleocr import PaddleOCR

# Run OCR and collect all recognized text lines.
text = ""
ocr = PaddleOCR(use_angle_cls=True, lang='en')
result = ocr.ocr("test_image.jpg", cls=True)
for idx in range(len(result)):
    res = result[idx]
    for line in res:
        text = text + line[-1][0] + "\n"

llm = Llama(model_path="inv.Q8_0.gguf", n_ctx=2048)

def extract_largest_json_block(text):
    # The model emits its JSON inside a ```json fence; keep the largest block.
    pattern = r"```json\s*(.*?)\s*```"
    blocks = re.findall(pattern, text, re.DOTALL)
    if not blocks:
        return None
    return max(blocks, key=len)

def extract_xml_answer(text: str) -> str:
    # The model wraps its final answer in <answer>...</answer> tags.
    answer = text.split("<answer>")[-1]
    answer = answer.split("</answer>")[0]
    return extract_largest_json_block(answer.strip())

messages = [
    {"role": "system", "content": """Respond in the following format:
<reasoning>
...
</reasoning>
<answer>
```json
```
</answer>"""},
    {"role": "user", "content": f"{text}\n" + """
Extract the data in JSON format using the schema:
{
  "invoice_no": "string",
  "issued_to": {
    "name": "string",
    "address": "string" // Address of the client
  },
  "pay_to": {
    "bank_name": "string", // Name of the bank
    "name": "string", // Name
    "account_no": "number"
  },
  "items": [
    {
      "description": "string",
      "quantity": "number",
      "unit_price": "number",
      "total": "number"
    }
  ],
  "subtotal": "number",
  "total": "number"
} """},
]

output = llm.create_chat_completion(messages, max_tokens=1000)
print(extract_xml_answer(output['choices'][0]['message']['content']))
llm._sampler.close()
llm.close()
````
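The answer-extraction helpers in the script can be exercised on their own, without the model or OCR. Here is a self-contained sketch; the `reply` string is made up for illustration and follows the format the system prompt requests:

````python
import re

def extract_largest_json_block(text):
    # Keep the largest ```json fenced block, if any.
    blocks = re.findall(r"```json\s*(.*?)\s*```", text, re.DOTALL)
    return max(blocks, key=len) if blocks else None

def extract_xml_answer(text: str) -> str:
    # Pull the contents of the <answer>...</answer> tags, then the JSON inside.
    answer = text.split("<answer>")[-1].split("</answer>")[0]
    return extract_largest_json_block(answer.strip())

# Hypothetical model reply in the expected format:
reply = """<reasoning>
Found the invoice number near the top of the page.
</reasoning>
<answer>
```json
{"invoice_no": "A-17"}
```
</answer>"""

print(extract_xml_answer(reply))  # → {"invoice_no": "A-17"}
````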