Instructions for using omercelik/Trace-Inverter-4B-MLX-8bit with libraries and local apps.

Libraries

How to use omercelik/Trace-Inverter-4B-MLX-8bit with MLX:

# Make sure mlx-lm is installed
# pip install --upgrade mlx-lm

# Generate text with mlx-lm
from mlx_lm import load, generate

model, tokenizer = load("omercelik/Trace-Inverter-4B-MLX-8bit")

prompt = "Write a story about Einstein"
messages = [{"role": "user", "content": prompt}]
prompt = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True
)

text = generate(model, tokenizer, prompt=prompt, verbose=True)

Local Apps

Pi

How to use omercelik/Trace-Inverter-4B-MLX-8bit with Pi:

Start the MLX server

# Install MLX LM:
uv tool install mlx-lm
# Start a local OpenAI-compatible server:
mlx_lm.server --model "omercelik/Trace-Inverter-4B-MLX-8bit"

Configure the model in Pi

# Install Pi:
npm install -g @mariozechner/pi-coding-agent
# Add to ~/.pi/agent/models.json:
{
  "providers": {
    "mlx-lm": {
      "baseUrl": "http://localhost:8080/v1",
      "api": "openai-completions",
      "apiKey": "none",
      "models": [
        {
          "id": "omercelik/Trace-Inverter-4B-MLX-8bit"
        }
      ]
    }
  }
}

Run Pi

# Start Pi in your project directory:
pi

Hermes Agent

How to use omercelik/Trace-Inverter-4B-MLX-8bit with Hermes Agent:

Start the MLX server

# Install MLX LM:
uv tool install mlx-lm
# Start a local OpenAI-compatible server:
mlx_lm.server --model "omercelik/Trace-Inverter-4B-MLX-8bit"

Configure Hermes

# Install Hermes:
curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash
hermes setup
# Point Hermes at the local server:
hermes config set model.provider custom
hermes config set model.base_url http://127.0.0.1:8080/v1
hermes config set model.default omercelik/Trace-Inverter-4B-MLX-8bit

Run Hermes

hermes

MLX LM

How to use omercelik/Trace-Inverter-4B-MLX-8bit with MLX LM:

Start an interactive chat session

# Install MLX LM
uv tool install mlx-lm
# Interactive chat REPL
mlx_lm.chat --model "omercelik/Trace-Inverter-4B-MLX-8bit"

Run an OpenAI-compatible server

# Install MLX LM
uv tool install mlx-lm
# Start the server
mlx_lm.server --model "omercelik/Trace-Inverter-4B-MLX-8bit"
# Calling the OpenAI-compatible server with curl
curl -X POST "http://localhost:8080/v1/chat/completions" \
   -H "Content-Type: application/json" \
   --data '{
     "model": "omercelik/Trace-Inverter-4B-MLX-8bit",
     "messages": [
       {"role": "user", "content": "Hello"}
     ]
   }'
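
The same request can be sent from Python using only the standard library. This is a minimal sketch, assuming mlx_lm.server is running on its default port 8080 and returns an OpenAI-style body with a `choices[0].message.content` field; the helper names `build_chat_request` and `chat` are hypothetical, not part of mlx-lm.

```python
import json
import urllib.request

# Assumption: mlx_lm.server is running locally on its default port (8080).
BASE_URL = "http://localhost:8080/v1"
MODEL_ID = "omercelik/Trace-Inverter-4B-MLX-8bit"


def build_chat_request(prompt: str, base_url: str = BASE_URL,
                       model: str = MODEL_ID,
                       max_tokens: int = 512) -> urllib.request.Request:
    """Build (but do not send) a POST request for /chat/completions."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }
    return urllib.request.Request(
        url=f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(prompt: str, **kwargs) -> str:
    """Send the request and return the assistant's message text."""
    request = build_chat_request(prompt, **kwargs)
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    # OpenAI-style response shape: choices[0].message.content
    return body["choices"][0]["message"]["content"]
```

With the server up, `chat("Hello")` sends the same request as the curl call. Building the request separately from sending it keeps the payload easy to inspect or reuse with other OpenAI-compatible clients.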

Trace-Inverter-4B-MLX-8bit

This is an 8-bit MLX conversion of Jackrong/Trace-Inverter-4B, a Qwen3-based trace inversion model.

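The model is intended to reconstruct a detailed synthetic reasoning trace from three inputs:

Problem + Model final answer + Reasoning Bubbles

Reasoning bubbles are short, compressed summaries of the reasoning; the model expands them into a full trace that plausibly leads to the given final answer.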
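
The user message in the MLX example on this card follows this layout. A small helper can assemble it from the three inputs; this sketch simply mirrors the section labels used in that example (`Problem:`, `Model final answer:`, `Reasoning Bubbles:`). The function name `build_trace_prompt` is hypothetical, and joining several bubbles one per line is an assumption, since the card does not document a separator.

```python
def build_trace_prompt(problem: str, final_answer: str, bubbles: list[str]) -> str:
    """Assemble the user message: problem, final answer, reasoning bubbles.

    Assumption: multiple bubbles are joined one per line.
    """
    bubble_text = "\n".join(b.strip() for b in bubbles)
    return (
        f"Problem:\n{problem.strip()}\n\n"
        f"Model final answer:\n{final_answer.strip()}\n\n"
        f"Reasoning Bubbles:\n{bubble_text}\n\n"
        "Reconstruct the full reasoning trace."
    )
```

The returned string can be used as the `content` of the user message passed to `tokenizer.apply_chat_template`.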

The original weights are BF16. This MLX version was converted with mlx-lm using 8-bit affine quantization with group size 64.
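
To make "8-bit affine quantization with group size 64" concrete, here is a minimal pure-Python sketch of per-group min/max affine quantization: each group of 64 weights gets its own scale and bias, weights become integer codes in [0, 255], and four 8-bit codes are packed into one 32-bit word. It is illustrative only and does not reproduce MLX's kernels, exact rounding, or bit layout; the packing order and all helper names here are assumptions.

```python
import random

BITS = 8
GROUP_SIZE = 64
LEVELS = 2**BITS - 1  # 255 steps between the group min and max


def quantize_group(group):
    """Affine-quantize one group of floats to integer codes in [0, LEVELS].

    Returns (codes, scale, bias); the bias is the group minimum.
    """
    lo, hi = min(group), max(group)
    scale = (hi - lo) / LEVELS if hi > lo else 1.0  # guard against a constant group
    codes = [min(LEVELS, max(0, round((w - lo) / scale))) for w in group]
    return codes, scale, lo


def dequantize_group(codes, scale, bias):
    """Reconstruct approximate weights: w_hat = scale * q + bias."""
    return [scale * q + bias for q in codes]


def pack_u32(codes):
    """Pack 32 // BITS codes per 32-bit word, first code in the lowest bits (assumed order)."""
    per_word = 32 // BITS
    words = []
    for start in range(0, len(codes), per_word):
        word = 0
        for offset, q in enumerate(codes[start:start + per_word]):
            word |= q << (BITS * offset)
        words.append(word)
    return words


# One synthetic group of small weights (illustrative data only).
random.seed(0)
weights = [random.gauss(0.0, 0.02) for _ in range(GROUP_SIZE)]
codes, scale, bias = quantize_group(weights)
restored = dequantize_group(codes, scale, bias)
packed = pack_u32(codes)
max_error = max(abs(w - r) for w, r in zip(weights, restored))
```

Rounding to the nearest code bounds the per-weight error by half a step (`scale / 2`). Assuming the per-group scale and bias are stored in BF16, they add 2 × 16 / 64 = 0.5 bits per weight, so quantized layers take roughly 8.5 bits per weight.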

Use With MLX

pip install mlx-lm

from mlx_lm import load, generate

model, tokenizer = load("omercelik/Trace-Inverter-4B-MLX-8bit")

messages = [
    {
        "role": "system",
        "content": (
            "You are a trace inversion model. Given a problem, a final answer, "
            "and several compressed reasoning bubbles, reconstruct a detailed "
            "reasoning trace that could plausibly lead to the final answer."
        ),
    },
    {
        "role": "user",
        "content": """Problem:
If a pizza needs 10 cups of water, 16 cups of flour, and salt equal to half the flour amount, what is the combined total?

Model final answer:
34 cups.

Reasoning Bubbles:
I need to calculate the salt first because it is defined as half of the flour amount. Then I should add water, flour, and salt together to get the combined total.

Reconstruct the full reasoning trace.""",
    },
]

prompt = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_dict=False,
)

response = generate(
    model,
    tokenizer,
    prompt=prompt,
    max_tokens=512,
    verbose=True,
)
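
For reference, the expected answer in this example follows directly from the stated quantities:

```latex
\text{salt} = \tfrac{1}{2} \cdot 16 = 8, \qquad 10 + 16 + 8 = 34 \ \text{cups}
```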

Notes

The source checkpoint stores PEFT-style LoRA-wrapped tensors inside the safetensors files. For MLX compatibility, the LoRA tensors were merged into plain model weights before conversion. The inferred LoRA scale used for the merge was 1.0.
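
Merging a LoRA adapter folds its low-rank update into the base weight: W' = W + s · (B · A), where A is r × in, B is out × r (the PEFT layout), and s is the scale (in PEFT normally `lora_alpha / r`; the inferred value here was 1.0). The pure-Python sketch below shows that update for a single linear layer with made-up toy matrices; it is not the script used for this conversion, and `merge_lora` is a hypothetical name.

```python
def matmul(a, b):
    """Naive matrix product of two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def merge_lora(weight, lora_a, lora_b, scale=1.0):
    """Return W + scale * (B @ A).

    weight: out x in, lora_a: r x in, lora_b: out x r.
    """
    delta = matmul(lora_b, lora_a)  # out x in low-rank update
    return [
        [w + scale * d for w, d in zip(w_row, d_row)]
        for w_row, d_row in zip(weight, delta)
    ]


# Toy example: a 2 x 3 weight with a rank-1 adapter.
W = [[1.0, 0.0, 0.0],
     [0.0, 1.0, 0.0]]
A = [[1.0, 2.0, 3.0]]  # r=1 x in=3
B = [[0.5], [-1.0]]    # out=2 x r=1
merged = merge_lora(W, A, B, scale=1.0)
# merged == [[1.5, 1.0, 1.5], [-1.0, -1.0, -3.0]]
```

After merging, the layer is an ordinary dense weight with no adapter tensors, which is what a plain conversion-and-quantization pipeline expects.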

The source model card notes that outputs may occasionally include stray tool tags such as <tool_call>. Post-processing is recommended when generating datasets.
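
A minimal post-processing sketch, assuming the stray tags take the form `<tool_call>...</tool_call>` blocks or lone opening/closing tags; the exact shapes that appear in practice are not documented here, so adjust the patterns to the outputs you observe. `clean_trace` is a hypothetical helper.

```python
import re

# Assumption: stray tags look like Qwen-style <tool_call> ... </tool_call>.
PAIRED_TOOL_CALL = re.compile(r"<tool_call>.*?</tool_call>", re.DOTALL)
LONE_TOOL_TAG = re.compile(r"</?tool_call>")


def clean_trace(text: str) -> str:
    """Drop tool-call blocks and orphan tags, then tidy leftover blank lines."""
    text = PAIRED_TOOL_CALL.sub("", text)   # remove whole <tool_call> blocks
    text = LONE_TOOL_TAG.sub("", text)      # remove unmatched opening/closing tags
    text = re.sub(r"\n{3,}", "\n\n", text)  # collapse runs of blank lines
    return text.strip()
```

Removing paired blocks before lone tags matters: stripping lone tags first would leave the tool-call payload behind as ordinary text.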

Generated traces are synthetic reasoning traces. They should not be treated as recovered hidden chain-of-thought from any closed model.


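Safetensors model size: 1B params (as reported by the Hub; the 8-bit weights are packed four per U32 element, so this figure understates the roughly 4B parameters of the underlying model)

Tensor types: BF16, U32

Format: MLX, 8-bit quantization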

Model tree for omercelik/Trace-Inverter-4B-MLX-8bit

Base model: Qwen/Qwen3-4B-Instruct-2507
Finetuned: Jackrong/Trace-Inverter-4B
Quantized: this model