Instructions for using NaughtyDog97/DiagramFormalizer with libraries and local apps.

Libraries

How to use NaughtyDog97/DiagramFormalizer with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="NaughtyDog97/DiagramFormalizer", trust_remote_code=True)
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly
from transformers import AutoModelForCausalLM
model = AutoModelForCausalLM.from_pretrained("NaughtyDog97/DiagramFormalizer", trust_remote_code=True, dtype="auto")

Local Apps

vLLM

How to use NaughtyDog97/DiagramFormalizer with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "NaughtyDog97/DiagramFormalizer"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "NaughtyDog97/DiagramFormalizer",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker

docker model run hf.co/NaughtyDog97/DiagramFormalizer

SGLang

How to use NaughtyDog97/DiagramFormalizer with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "NaughtyDog97/DiagramFormalizer" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "NaughtyDog97/DiagramFormalizer",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "NaughtyDog97/DiagramFormalizer" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "NaughtyDog97/DiagramFormalizer",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
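
The curl calls above can also be issued from Python. The following is a minimal sketch using only the standard library; the helper names `build_chat_request` and `chat` are hypothetical and not part of vLLM or SGLang. It assumes the server is already running and returns the usual OpenAI-style response with a `choices` list.

```python
import json
import urllib.request


def build_chat_request(base_url, model, user_message):
    """Build a POST request for the OpenAI-compatible /v1/chat/completions route."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": user_message}],
    }
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(base_url, model, user_message, timeout=60):
    """Send the request and return the assistant's reply text.

    Assumes an OpenAI-style response body: choices[0].message.content.
    """
    request = build_chat_request(base_url, model, user_message)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]
```

For example, `chat("http://localhost:30000", "NaughtyDog97/DiagramFormalizer", "What is the capital of France?")` targets the SGLang server started above; use port 8000 for the vLLM server.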

Docker Model Runner
How to use NaughtyDog97/DiagramFormalizer with Docker Model Runner:
```
docker model run hf.co/NaughtyDog97/DiagramFormalizer
```

Diagram Formalizer

Model Structure:

Diagram Encoder: siglip-so400m-patch14-384
Lightweight LLM: Qwen2-0.5B-Instruct

Quick Start

Before running the script, install the required dependencies:

pip install torch==2.4.0 transformers==4.40.0 accelerate pillow sentencepiece

The following script predicts the ConsCDL (construction_cdl) and ImgCDL (image_cdl) for a geometric diagram:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from PIL import Image

# set device
device = 'cuda'  # or 'cpu'
torch.set_default_device(device)

# create model
model = AutoModelForCausalLM.from_pretrained(
    'NaughtyDog97/DiagramFormalizer',
    torch_dtype=torch.float16, # float32 for cpu
    device_map='auto',
    trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained(
    'NaughtyDog97/DiagramFormalizer',
    use_fast=True,
    padding_side="right",
    trust_remote_code=True)

# text prompt
img_path = 'sample/4927.png'
prompt = 'Based on the image, first describe what you see in the figure, then predict the construction_cdl and image_cdl and calibrate it.'
text = f'<|im_start|>system\nYou are a helpful assistant.<|im_end|>\n<|im_start|>user\n<image>\n{prompt}<|im_end|>\n<|im_start|>assistant\n'

def tokenizer_image_token(prompt, tokenizer, image_token_index, return_tensors=None):
    prompt_chunks = [tokenizer(chunk).input_ids for chunk in prompt.split('<image>')]

    def insert_separator(X, sep):
        return [ele for sublist in zip(X, [sep] * len(X)) for ele in sublist][:-1]

    input_ids = []
    offset = 0
    if len(prompt_chunks) > 0 and len(prompt_chunks[0]) > 0 and prompt_chunks[0][0] == tokenizer.bos_token_id:
        offset = 1
        input_ids.append(prompt_chunks[0][0])

    for x in insert_separator(prompt_chunks, [image_token_index] * (offset + 1)):
        input_ids.extend(x[offset:])

    if return_tensors is not None:
        if return_tensors == 'pt':
            return torch.tensor(input_ids, dtype=torch.long)
        raise ValueError(f'Unsupported tensor type: {return_tensors}')
    return input_ids

# -200 is the placeholder id that marks where the image features are inserted
input_ids = tokenizer_image_token(text, tokenizer, -200, return_tensors='pt').unsqueeze(0).to(device)

# image, sample images can be found in images folder
image = Image.open(img_path).convert('RGB')

image_tensor = model.process_images([image], model.config).to(dtype=model.dtype, device=device)

# generate
with torch.inference_mode():
    output_ids = model.generate(
        input_ids,
        images=image_tensor,
        do_sample=False,
        temperature=None,
        top_p=None,
        top_k=None,
        num_beams=1,
        max_new_tokens=3500,
        eos_token_id=tokenizer.eos_token_id,
        repetition_penalty=None,
        use_cache=True
    )[0]


# decode only the newly generated tokens, skipping the prompt
response = tokenizer.decode(output_ids[input_ids.shape[1]:], skip_special_tokens=True).strip()
print(response)
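
To make the role of `tokenizer_image_token` easier to see, here is a simplified, torch-free sketch of the same splicing idea. It assumes the tokenizer does not prepend a BOS token (so the `offset` branch in the script is not taken). `toy_encode` is a hypothetical stand-in for a real tokenizer, and `splice_image_token` is an illustrative name, not part of the model's code.

```python
IMAGE_TOKEN_INDEX = -200  # placeholder id, same value as in the script above


def toy_encode(text):
    """Hypothetical tokenizer stand-in: one integer id per whitespace-separated word."""
    return [sum(map(ord, word)) for word in text.split()]


def splice_image_token(prompt, encode, image_token_index=IMAGE_TOKEN_INDEX):
    """Tokenize the text around each '<image>' tag and put the placeholder id in between."""
    chunks = [encode(chunk) for chunk in prompt.split("<image>")]
    input_ids = []
    for i, chunk in enumerate(chunks):
        if i > 0:
            # one placeholder per '<image>' tag found in the prompt
            input_ids.append(image_token_index)
        input_ids.extend(chunk)
    return input_ids


ids = splice_image_token("user <image> describe the figure", toy_encode)
print(ids)
```

The negative placeholder cannot collide with a real vocabulary id, which is presumably why the model code can later locate it and substitute the image features passed through `images=`.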

Our model supports the following recognition instructions:

Natural Language Description:
- Describe what you see in the figure.
- Tell me what you observe in the image.
Predicting ConsCDL only:
- Based on the image, predict the construction_cdl.
- Based on the image, predict the construction_cdl and calibrate it.
- Based on the image, first describe what you see in the figure, then predict the construction_cdl.
- Based on the image, first describe what you see in the figure, then predict the construction_cdl and calibrate it.
Predicting ImgCDL only:
- Based on the image, predict the image_cdl.
- Based on the image, predict the image_cdl and calibrate it.
- Based on the image, first describe what you see in the figure, then predict the image_cdl.
- Based on the image, first describe what you see in the figure, then predict the image_cdl and calibrate it.
Predicting ConsCDL and ImgCDL simultaneously:
- Based on the image, predict the construction_cdl and image_cdl.
- Based on the image, first predict the construction_cdl and image_cdl and calibrate it.
- Based on the image, first describe what you see in the figure, then predict the construction_cdl and image_cdl.
- Based on the image, first describe what you see in the figure, then predict the construction_cdl and image_cdl and calibrate it.
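
The CDL instructions above follow a regular pattern, so they can be assembled programmatically. The sketch below is one possible helper, not part of the model's code: `build_instruction` and `wrap_chat` are hypothetical names. The chat template is copied from the Quick Start script, and the "first predict" wording of the combined calibrate-only instruction is kept exactly as listed.

```python
def build_instruction(construction=True, image=True, describe=False, calibrate=False):
    """Return one of the CDL recognition instructions listed above."""
    targets = [
        name
        for name, enabled in (("construction_cdl", construction), ("image_cdl", image))
        if enabled
    ]
    if not targets:
        raise ValueError("select at least one of construction or image")
    target = " and ".join(targets)

    if describe:
        text = (
            "Based on the image, first describe what you see in the figure, "
            f"then predict the {target}"
        )
    elif calibrate and len(targets) == 2:
        # The list above words this one variant with "first predict"; kept verbatim.
        text = f"Based on the image, first predict the {target}"
    else:
        text = f"Based on the image, predict the {target}"

    if calibrate:
        text += " and calibrate it"
    return text + "."


def wrap_chat(prompt):
    """Wrap an instruction in the chat template used by the Quick Start script."""
    return (
        "<|im_start|>system\nYou are a helpful assistant.<|im_end|>\n"
        f"<|im_start|>user\n<image>\n{prompt}<|im_end|>\n"
        "<|im_start|>assistant\n"
    )


text = wrap_chat(build_instruction(describe=True, calibrate=True))
```

With `describe=True, calibrate=True` and both targets enabled, `build_instruction` reproduces the prompt used in the Quick Start script. The two natural-language description prompts are free-form and are not covered by this helper.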

Performance of Diagram Formalizer on the formalgeo7k test set

| Model | ConsCdlAcc | ConsCdlPerfect | ImgCdlAcc | ImgCdlPerfect | BothPerfect |
|---|---|---|---|---|---|
| Diagram Formalizer | 90.25 | 72.29 | 92.88 | 84.38 | 65.05 |

Safetensors

Model size: 0.9B params
Tensor type: F16