Carnice-9b-MLX

4-bit MLX quantization of kai-os/Carnice-9b.

This conversion was produced using mlx_lm.convert for use with MLX on Apple Silicon.

Original Model

Carnice-9b by kai-os is a standalone merged model built on Qwen/Qwen3.5-9B, tuned specifically for the Hermes Agent harness. It was trained in two stages:

  • Stage A — reasoning repair pass on high-signal reasoning data
  • Stage B — Hermes-specific refresh pass built around harness-native traces and action structure

The model is optimized for terminal-heavy task execution, file editing, structured tool use, browser-assisted agent behavior, and multi-turn tool calling inside the Hermes runtime.

See the original model card for full training details and data sources.

Quantization Details

| Property       | Value    |
|----------------|----------|
| Quantization   | 4-bit    |
| Group size     | 64       |
| Mode           | Affine   |
| Original dtype | bfloat16 |
| File size      | ~4.7 GB  |
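The file size follows roughly from the quantization settings. A back-of-envelope sketch, under the assumption that each group of 64 weights stores an fp16 scale and fp16 bias alongside the 4-bit codes (an assumption about MLX's affine storage layout, not a documented guarantee):

```python
# Rough size estimate for 4-bit affine quantization with group size 64.
# Assumption: each 64-weight group adds one fp16 scale and one fp16 bias,
# i.e. 32 extra bits per 64 weights = 0.5 extra bits per weight.
params = 8.95e9
bits_per_weight = 4 + 32 / 64            # 4.5 effective bits per weight
size_gib = params * bits_per_weight / 8 / 2**30
print(f"{size_gib:.2f} GiB")             # ~4.69 GiB, consistent with ~4.7 GB
```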

Model Architecture

| Property           | Value                                                      |
|--------------------|------------------------------------------------------------|
| Architecture       | Qwen3_5ForCausalLM                                         |
| Parameters         | ~8.95B                                                     |
| Hidden layers      | 32                                                         |
| Hidden size        | 4096                                                       |
| Attention heads    | 16 (4 KV heads)                                            |
| Head dim           | 256                                                        |
| Intermediate size  | 12288                                                      |
| Max context length | 262,144 tokens (256K)                                      |
| Vocab size         | 248,320                                                    |
| Attention          | Hybrid (linear + full; every 4th layer is full attention)  |
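The hybrid attention layout can be sketched as below. The exact indexing convention (whether the full-attention layers are layers 3, 7, 11, … or 0, 4, 8, …) is an assumption; the architecture only states that every 4th layer uses full attention:

```python
# Sketch of the hybrid attention layout: 32 hidden layers, with full
# attention on every 4th layer and linear attention on the rest.
# The 1-based "every 4th" indexing here is an assumption.
num_layers = 32
layer_types = ["full" if (i + 1) % 4 == 0 else "linear" for i in range(num_layers)]
print(layer_types.count("full"), layer_types.count("linear"))  # 8 24
```

With this layout, 8 of the 32 layers keep a full KV cache while the remaining 24 use linear attention, which is what makes the 256K context window tractable in memory.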

Features

  • Reasoning — supports <think> / </think> tags for chain-of-thought reasoning
  • Tool calling — Hermes-style tool use with <tool_call> formatting
  • Long context — 256K token context window
  • Multimodal tokens — tokenizer includes vision and audio special tokens (inherited from Qwen3.5 base)
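A minimal sketch of the Hermes-style tool-call format the model emits: a JSON payload wrapped in `<tool_call>` tags. The tool name and arguments below are hypothetical, purely for illustration:

```python
import json

# Hypothetical tool call: "read_file" and its arguments are illustrative only.
call = {"name": "read_file", "arguments": {"path": "README.md"}}

# Hermes-style models wrap the JSON payload in <tool_call> tags;
# the harness parses this block back out of the model's output.
message = f"<tool_call>\n{json.dumps(call)}\n</tool_call>"
print(message)
```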

Benchmarks

Sampled evaluation (30 items per benchmark) on the 4-bit MLX quantized model:

| Benchmark     | Accuracy | Correct | Total | Sampled from |
|---------------|----------|---------|-------|--------------|
| MMLU          | 66.7%    | 20      | 30    | 14,042       |
| HellaSwag     | 90.0%    | 27      | 30    | 10,042       |
| TruthfulQA    | 86.7%    | 26      | 30    | 817          |
| HumanEval     | 83.3%    | 25      | 30    | 164          |
| LiveCodeBench | 30.0%    | 9       | 30    | 1,055        |

Note: These are sampled results (30 items each), not full benchmark runs. They give a rough signal of quantized model quality but should not be compared directly to full-run scores.
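To put the 30-item sample size in perspective, the binomial standard error at these accuracies is substantial. For the MMLU score, a normal-approximation 95% interval looks like this:

```python
import math

# 20/30 correct on the MMLU sample; 95% interval via normal approximation.
p, n = 20 / 30, 30
margin = 1.96 * math.sqrt(p * (1 - p) / n)
print(f"66.7% ± {margin:.1%}")  # roughly ±17 percentage points
```

An interval that wide is why these numbers should be read as a sanity check on the quantization, not as benchmark scores.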

Usage

With mlx-lm

pip install mlx-lm

from mlx_lm import load, generate

model, tokenizer = load("jason-schulz/Carnice-9b-MLX")

prompt = "Explain the difference between linear and full attention."
messages = [{"role": "user", "content": prompt}]
text = tokenizer.apply_chat_template(messages, add_generation_prompt=True, tokenize=False)

response = generate(
    model,
    tokenizer,
    prompt=text,
    max_tokens=512,
)
print(response)

With MLX chat

mlx_lm.chat --model jason-schulz/Carnice-9b-MLX

Conversion

Produced with:

mlx_lm.convert \
    --hf-path kai-os/Carnice-9b \
    --mlx-path Carnice-9b-MLX \
    -q \
    --q-bits 4 \
    --q-group-size 64 \
    --q-mode affine

Conversion Note

The original model's config.json uses model_type: "qwen_3_5_text", which mlx_lm.convert does not recognize. The conversion will fail with an unsupported model type error. To work around this, manually change model_type to "qwen3_5" in the source model's config.json before running the conversion. The resulting model loads and runs correctly under MLX.
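The workaround can be scripted. A small sketch, where the path is illustrative and assumes a local snapshot of the source model:

```python
import json
from pathlib import Path

def patch_model_type(config_path: str) -> None:
    """Rewrite model_type so mlx_lm.convert recognizes the architecture."""
    path = Path(config_path)
    config = json.loads(path.read_text())
    if config.get("model_type") == "qwen_3_5_text":
        config["model_type"] = "qwen3_5"
        path.write_text(json.dumps(config, indent=2))

# Run against a local snapshot of kai-os/Carnice-9b before converting:
# patch_model_type("Carnice-9b/config.json")  # path is illustrative
```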
