Libraries

How to use liminalstoat/osim-4b-mlx-4bit with MLX:

# Make sure mlx-lm is installed
# pip install --upgrade mlx-lm

# Generate text with mlx-lm
from mlx_lm import load, generate

model, tokenizer = load("liminalstoat/osim-4b-mlx-4bit")

prompt = "Write a story about Einstein"
messages = [{"role": "user", "content": prompt}]
prompt = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True
)

text = generate(model, tokenizer, prompt=prompt, verbose=True)

Local Apps

How to use liminalstoat/osim-4b-mlx-4bit with Pi:

Start the MLX server

# Install MLX LM:
uv tool install mlx-lm
# Start a local OpenAI-compatible server:
mlx_lm.server --model "liminalstoat/osim-4b-mlx-4bit"

Configure the model in Pi

# Install Pi:
npm install -g @mariozechner/pi-coding-agent
# Add to ~/.pi/agent/models.json:
{
  "providers": {
    "mlx-lm": {
      "baseUrl": "http://localhost:8080/v1",
      "api": "openai-completions",
      "apiKey": "none",
      "models": [
        {
          "id": "liminalstoat/osim-4b-mlx-4bit"
        }
      ]
    }
  }
}
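If ~/.pi/agent/models.json already lists other providers, pasting the block above over it would drop them. The following is a minimal sketch of merging the entry instead. It assumes the file is plain JSON with a top-level "providers" object, as in the snippet above; the helper name add_provider is hypothetical and not part of Pi.

```python
import json
from pathlib import Path

# Provider entry copied from the configuration shown above.
MLX_PROVIDER = {
    "baseUrl": "http://localhost:8080/v1",
    "api": "openai-completions",
    "apiKey": "none",
    "models": [{"id": "liminalstoat/osim-4b-mlx-4bit"}],
}


def add_provider(config_path, name="mlx-lm", provider=MLX_PROVIDER):
    """Insert or replace one provider entry, keeping any others intact."""
    path = Path(config_path).expanduser()
    # Start from the existing config if there is one, otherwise from scratch.
    config = json.loads(path.read_text()) if path.exists() else {}
    config.setdefault("providers", {})[name] = provider
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(config, indent=2) + "\n")
    return config
```

Calling add_provider("~/.pi/agent/models.json") would write the entry to the location named above; back up the file first if it already holds settings you care about.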

Run Pi

# Start Pi in your project directory:
pi

Hermes Agent

How to use liminalstoat/osim-4b-mlx-4bit with Hermes Agent:

Start the MLX server

# Install MLX LM:
uv tool install mlx-lm
# Start a local OpenAI-compatible server:
mlx_lm.server --model "liminalstoat/osim-4b-mlx-4bit"

Configure Hermes

# Install Hermes:
curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash
hermes setup
# Point Hermes at the local server:
hermes config set model.provider custom
hermes config set model.base_url http://127.0.0.1:8080/v1
hermes config set model.default liminalstoat/osim-4b-mlx-4bit

Run Hermes

hermes

OpenClaw

How to use liminalstoat/osim-4b-mlx-4bit with OpenClaw:

Start the MLX server

# Install MLX LM:
uv tool install mlx-lm
# Start a local OpenAI-compatible server:
mlx_lm.server --model "liminalstoat/osim-4b-mlx-4bit"

Configure OpenClaw

# Install OpenClaw:
npm install -g openclaw@latest
# Register the local server and set it as the default model:
openclaw onboard --non-interactive --mode local \
  --auth-choice custom-api-key \
  --custom-base-url http://127.0.0.1:8080/v1 \
  --custom-model-id "liminalstoat/osim-4b-mlx-4bit" \
  --custom-provider-id mlx-lm \
  --custom-compatibility openai \
  --custom-text-input \
  --accept-risk \
  --skip-health

Run OpenClaw

openclaw agent --local --agent main --message "Hello from Hugging Face"

MLX LM

How to use liminalstoat/osim-4b-mlx-4bit with MLX LM:

Generate or start a chat session

# Install MLX LM
uv tool install mlx-lm
# Interactive chat REPL
mlx_lm.chat --model "liminalstoat/osim-4b-mlx-4bit"

Run an OpenAI-compatible server

# Install MLX LM
uv tool install mlx-lm
# Start the server
mlx_lm.server --model "liminalstoat/osim-4b-mlx-4bit"
# Call the OpenAI-compatible server with curl (mlx_lm.server listens on port 8080 by default)
curl -X POST "http://localhost:8080/v1/chat/completions" \
   -H "Content-Type: application/json" \
   --data '{
     "model": "liminalstoat/osim-4b-mlx-4bit",
     "messages": [
       {"role": "user", "content": "Hello"}
     ]
   }'
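The same request can be sent from Python using only the standard library. This is a minimal sketch, assuming mlx_lm.server is running locally on its default port (8080) and returns an OpenAI-style response body; the helper names build_chat_request and chat are illustrative.

```python
import json
import urllib.request

# Assumption: mlx_lm.server is running locally on its default port, 8080.
BASE_URL = "http://localhost:8080/v1"
MODEL_ID = "liminalstoat/osim-4b-mlx-4bit"


def build_chat_request(content, base_url=BASE_URL, model=MODEL_ID):
    """Build a POST request for the /chat/completions endpoint."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": content}],
    }
    return urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(content, timeout=120):
    """Send one user message and return the reply text."""
    with urllib.request.urlopen(build_chat_request(content), timeout=timeout) as resp:
        body = json.load(resp)
    # OpenAI-style responses put the reply under choices[0].message.content.
    return body["choices"][0]["message"]["content"]
```

With the server up, print(chat("Hello")) should print the model's reply. Since OSim simulates a user rather than an assistant, that reply may itself read like a question or a request for help.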

OSim-4B · MLX · 4-bit

A 4-bit MLX build of cmu-lti/osim-4b — CMU's OSim / OdysSim human-behavior-simulation model — for running on Apple Silicon (Mac, iPhone, iPad).

This is the instruct-derived OSim 4B (built on Qwen/Qwen3-4B), so it carries the correct Qwen3 chat template and <|im_end|> stop token — chat works out of the box.

Format: MLX · 4-bit · group size 64
Size: 2.1G
Built with: mlx-lm 0.31.3 on 2026-06-28

What OSim is (read this — it is not an assistant)

OSim (OdysSim) is a family of foundation models for human-behavior simulation from CMU LTI. It's trained to simulate how a person behaves in a conversation — to play the user, not the helpful assistant. Prompt it like a chatbot and it will do un-assistant-like things: ask its own questions, act like someone seeking help, hold a persona. That's the model working as intended. Use it where you want a synthetic human counterpart — dialogue-system testing, user simulation, behavioral data generation.
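For dialogue-system testing, one way to use a user simulator is to alternate it with the system under test and record the transcript. The sketch below shows only that loop, with stand-in functions so it runs without a model. How a transcript should be turned into an OSim prompt is not covered here, so wrapping OSim as the simulated_user callable is left as an assumption; all names in the block are hypothetical.

```python
from typing import Callable, Dict, List

Turn = Dict[str, str]  # {"speaker": "user" or "assistant", "text": "..."}
Responder = Callable[[List[Turn]], str]


def run_dialogue(simulated_user: Responder, assistant: Responder,
                 opening: str, max_turns: int = 3) -> List[Turn]:
    """Alternate between the system under test and a simulated user."""
    transcript: List[Turn] = [{"speaker": "user", "text": opening}]
    for _ in range(max_turns):
        transcript.append({"speaker": "assistant", "text": assistant(transcript)})
        transcript.append({"speaker": "user", "text": simulated_user(transcript)})
    return transcript


# Stand-in callables so the sketch runs without loading any model.
def stub_user(transcript: List[Turn]) -> str:
    return f"follow-up question {len(transcript) // 2}"


def stub_assistant(transcript: List[Turn]) -> str:
    return f"answer to: {transcript[-1]['text']}"


log = run_dialogue(stub_user, stub_assistant, "My laptop won't boot.", max_turns=2)
for turn in log:
    print(f"{turn['speaker']}: {turn['text']}")
```

Keeping both sides behind the same callable signature means the stubs can be swapped for real model calls without changing the loop.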

Base model: cmu-lti/osim-4b (MIT)
Foundation: Qwen/Qwen3-4B (Apache-2.0)
Project / paper: OdysSim — Building Foundation Models for Human Behavior Simulation · code: github.com/sunnweiwei/OdysSim

A note on quality

This is a 4-bit quant of a 4B model, so there's some loss versus full precision: expect occasional arithmetic or reasoning slips and the odd repetition. For more headroom, convert a higher-bit MLX build (5-, 6-, or 8-bit) from the same source, or run cmu-lti/osim-4b directly on a larger machine. None of this is a prompting problem; it's the cost of 4-bit quantization.
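To get a feel for what a higher-bit build would cost in memory, here is a back-of-the-envelope estimate. It assumes group-wise quantization stores one 16-bit scale and one 16-bit bias per group, that every parameter is quantized, and that the model has a round 4.0 billion parameters. None of these are exact, so treat the numbers as rough guides.

```python
def bits_per_weight(q_bits: int, group_size: int, meta_bits: int = 32) -> float:
    """Effective bits per quantized weight, counting per-group metadata.

    Assumption: each group stores one 16-bit scale and one 16-bit bias
    (32 bits in total), shared by group_size weights.
    """
    return q_bits + meta_bits / group_size


def estimated_size_gib(n_params: float, q_bits: int, group_size: int = 64) -> float:
    """Rough weight size in GiB if every parameter were quantized."""
    return n_params * bits_per_weight(q_bits, group_size) / 8 / 2**30


N_PARAMS = 4.0e9  # assumed round figure for a "4B" model, not an exact count

for bits in (4, 5, 6, 8):
    print(f"{bits}-bit: ~{estimated_size_gib(N_PARAMS, bits):.1f} GiB")
```

Under these assumptions the 4-bit, group-size-64 case works out to 4.5 effective bits per weight and roughly 2.1 GiB, in line with the size listed above; an 8-bit build would need about twice that.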

Run it on Mac (Apple Silicon)

pip install mlx-lm
mlx_lm.generate --model liminalstoat/osim-4b-mlx-4bit \
  --prompt "Hi, what can you help me with?" --max-tokens 256

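from mlx_lm import load, generate
model, tokenizer = load("liminalstoat/osim-4b-mlx-4bit")
messages = [{"role": "user", "content": "Hi, what can you help me with?"}]
# generate() uses the prompt as given, so apply the chat template first
prompt = tokenizer.apply_chat_template(messages, add_generation_prompt=True)
print(generate(model, tokenizer, prompt=prompt, max_tokens=256))

The chat template ships with the model. The mlx_lm.generate command applies it automatically; in Python, apply it with tokenizer.apply_chat_template before calling generate, as above.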

Run it on iPhone / iPad

MLX runs on-device through mlx-swift. The most direct path is Apple's mlx-swift-examples app — point it at this repo or a local copy — or your own mlx-swift harness. Some MLX-based iOS chat apps can also load custom Hugging Face MLX repos; if yours supports adding a model by ID, use liminalstoat/osim-4b-mlx-4bit.

How it was made

source:   cmu-lti/osim-4b   (instruct-derived; Qwen3-4B foundation)
tool:     mlx_lm.convert --quantize --q-bits 4 --q-group-size 64
mlx-lm:   0.31.3

A straight 4-bit MLX conversion of CMU's published weights — no fine-tuning or merging, built from full-precision source (not from a pre-quantized model).
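The recipe above omits the source and output paths. As a sketch of the full invocation, the helper below assembles an mlx_lm.convert command line using the --hf-path and --mlx-path options alongside the quantization flags listed above. The helper name and the output directory are hypothetical, and the example targets an 8-bit build rather than reproducing this repo.

```python
import shlex


def build_convert_command(hf_path, mlx_path, q_bits=4, q_group_size=64):
    """Return the mlx_lm.convert invocation as an argument list."""
    return [
        "mlx_lm.convert",
        "--hf-path", hf_path,
        "--mlx-path", mlx_path,
        "--quantize",
        "--q-bits", str(q_bits),
        "--q-group-size", str(q_group_size),
    ]


# The output directory name is a hypothetical choice for this example.
cmd = build_convert_command("cmu-lti/osim-4b", "osim-4b-mlx-8bit", q_bits=8)
print(shlex.join(cmd))
```

On a Mac with mlx-lm installed, the printed command can be pasted into a shell, or the list can be passed to subprocess.run(cmd, check=True).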

Intended use & limitations

A research / tinkering artifact for on-device human-behavior simulation. It inherits the intended uses and limitations of the base cmu-lti/osim-4b, plus quantization loss. Not validated for production or factual QA. Because it simulates human behavior, outputs can be inconsistent, opinionated, or persona-driven by design.

License & attribution

This quant: MIT, following the base model.
Base: cmu-lti/osim-4b — MIT (CMU LTI).
Foundation: Qwen/Qwen3-4B — Apache-2.0 (Qwen Team).

Citation

OdysSim — Building Foundation Models for Human Behavior Simulation (CMU LTI). Code: github.com/sunnweiwei/OdysSim.
Qwen3 — Qwen Team, Qwen3 Technical Report, arXiv:2505.09388.