Trace-Inverter-4B-MLX-8bit
This is an 8-bit MLX conversion of Jackrong/Trace-Inverter-4B, a Qwen3-based trace inversion model.
The model is intended to reconstruct a detailed synthetic reasoning trace from:
Problem + Model final answer + Reasoning Bubbles
The original weights are BF16. This MLX version was converted with mlx-lm using 8-bit affine quantization with group size 64.
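To make "8-bit affine quantization with group size 64" concrete, here is a minimal pure-Python sketch of the idea: each group of 64 weights is stored as integer codes plus one scale and one bias, and is reconstructed as `code * scale + bias`. The function names are hypothetical, and the rounding and min/max choices are assumptions for illustration; the kernels mlx-lm actually uses pack the codes and may pick scale and bias differently.

```python
def quantize_group(values, bits=8):
    """Affine-quantize one group of floats to unsigned integer codes."""
    levels = (1 << bits) - 1                 # 255 distinct steps for 8 bits
    lo, hi = min(values), max(values)
    scale = (hi - lo) / levels or 1.0        # fall back to 1.0 for constant groups
    codes = [min(levels, max(0, round((v - lo) / scale))) for v in values]
    return codes, scale, lo                  # lo acts as the per-group bias


def quantize(weights, group_size=64, bits=8):
    """Split a flat weight list into groups and quantize each one separately."""
    return [
        quantize_group(weights[start:start + group_size], bits)
        for start in range(0, len(weights), group_size)
    ]


def dequantize(groups):
    """Rebuild approximate floats from (codes, scale, bias) triples."""
    restored = []
    for codes, scale, bias in groups:
        restored.extend(code * scale + bias for code in codes)
    return restored
```

Because every group carries its own scale and bias, the rounding error of a weight is bounded by the value range of its own 64-element group rather than by the range of the whole tensor.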
Use With MLX
pip install mlx-lm
from mlx_lm import load, generate

# Load the 8-bit MLX weights and the tokenizer from the Hub (or the local cache).
model, tokenizer = load("omercelik/Trace-Inverter-4B-MLX-8bit")

messages = [
    {
        "role": "system",
        "content": (
            "You are a trace inversion model. Given a problem, a final answer, "
            "and several compressed reasoning bubbles, reconstruct a detailed "
            "reasoning trace that could plausibly lead to the final answer."
        ),
    },
    {
        "role": "user",
        "content": """Problem:
If a pizza needs 10 cups of water, 16 cups of flour, and salt equal to half the flour amount, what is the combined total?
Model final answer:
34 cups.
Reasoning Bubbles:
I need to calculate the salt first because it is defined as half of the flour amount. Then I should add water, flour, and salt together to get the combined total.
Reconstruct the full reasoning trace.""",
    },
]

# Apply the model's chat template and append the assistant generation prompt.
prompt = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_dict=False,
)

# Generate the reconstructed trace; verbose=True prints tokens as they are produced.
response = generate(
    model,
    tokenizer,
    prompt=prompt,
    max_tokens=512,
    verbose=True,
)
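When generating many traces, it may help to build the user message programmatically. The sketch below assumes the three-part layout shown in the example above (Problem, Model final answer, Reasoning Bubbles, followed by the closing instruction); `build_inversion_prompt` is a hypothetical helper, not part of mlx-lm or the source model.

```python
def build_inversion_prompt(problem, final_answer, bubbles):
    """Assemble the user message in the layout used by the example above.

    Assumption: the model expects exactly these section labels in this order.
    """
    return (
        "Problem:\n"
        f"{problem.strip()}\n"
        "Model final answer:\n"
        f"{final_answer.strip()}\n"
        "Reasoning Bubbles:\n"
        f"{bubbles.strip()}\n"
        "Reconstruct the full reasoning trace."
    )


user_message = {
    "role": "user",
    "content": build_inversion_prompt(
        problem="What is 12 plus 30?",
        final_answer="42.",
        bubbles="Add the two numbers directly.",
    ),
}
```

The resulting dictionary can be appended after the system message in `messages` before calling `tokenizer.apply_chat_template`.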
Notes
The source checkpoint stores PEFT-style LoRA-wrapped tensors inside the safetensors files. For MLX compatibility, the LoRA tensors were merged into plain model weights before conversion. The inferred LoRA scale used for the merge was 1.0.
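As an illustration of what merging LoRA tensors into plain weights means, the sketch below computes `W + scale * (B @ A)` on small nested-list matrices. It assumes the usual PEFT shapes, with `lora_a` of shape (rank, in) and `lora_b` of shape (out, rank); `merge_lora` is a hypothetical name, and this is not the script used for the actual conversion.

```python
def merge_lora(base, lora_a, lora_b, scale=1.0):
    """Return base + scale * (lora_b @ lora_a) for nested-list matrices.

    base:   out_dim x in_dim
    lora_a: rank x in_dim
    lora_b: out_dim x rank
    """
    rank = len(lora_a)
    out_dim, in_dim = len(base), len(base[0])
    merged = []
    for i in range(out_dim):
        row = []
        for j in range(in_dim):
            # Entry (i, j) of the low-rank update B @ A.
            delta = sum(lora_b[i][r] * lora_a[r][j] for r in range(rank))
            row.append(base[i][j] + scale * delta)
        merged.append(row)
    return merged


# Tiny rank-1 example with the scale of 1.0 reported in the note above.
merged = merge_lora(
    base=[[1.0, 0.0], [0.0, 1.0]],
    lora_a=[[1.0, 2.0]],
    lora_b=[[3.0], [4.0]],
    scale=1.0,
)
```

After the merge, the adapter tensors are no longer needed, so the checkpoint contains only ordinary weight matrices that a converter can quantize.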
The source model card notes that outputs may occasionally include stray tool tags such as <tool_call>. Post-processing is recommended when generating datasets.
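One possible post-processing step is sketched below. It assumes the stray tags are literal `<tool_call>` and `</tool_call>` markers and removes only the tags, leaving any text between them in place; `strip_tool_tags` is a hypothetical helper, and the pattern would need extending if other tag names appear in your outputs.

```python
import re

# Assumption: stray tags are exactly <tool_call> or </tool_call>.
_TOOL_TAG = re.compile(r"</?tool_call>")


def strip_tool_tags(text):
    """Remove stray tool tags and tidy the whitespace they leave behind."""
    cleaned = _TOOL_TAG.sub("", text)
    cleaned = re.sub(r"[ \t]+\n", "\n", cleaned)   # drop trailing spaces on lines
    cleaned = re.sub(r"\n{3,}", "\n\n", cleaned)   # collapse runs of blank lines
    return cleaned.strip()
```

Applying this to each generated trace before writing it to a dataset keeps the reasoning text while dropping the markers.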
Generated traces are synthetic reasoning traces. They should not be treated as recovered hidden chain-of-thought from any closed model.
Model tree for omercelik/Trace-Inverter-4B-MLX-8bit
Base model
Qwen/Qwen3-4B-Instruct-2507