OSim-4B · MLX · 4-bit
A 4-bit MLX build of cmu-lti/osim-4b — CMU's OSim / OdysSim human-behavior-simulation model — for running on Apple Silicon (Mac, iPhone, iPad).
This is the instruct-derived OSim 4B (built on Qwen/Qwen3-4B), so it carries the correct Qwen3 chat template and <|im_end|> stop token — chat works out of the box.
- Format: MLX · 4-bit · group size 64
- Size: 2.1G
- Built with: mlx-lm 0.31.3 on 2026-06-28
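As a rough sanity check on the size listed above, the sketch below estimates the on-disk footprint of a grouped quantized build. It assumes each group of weights stores one 16-bit scale and one 16-bit bias next to the packed values, and that the model has about 4.0 billion parameters that are all quantized; both are simplifying assumptions, not figures taken from this repo. The function names are illustrative.

```python
def bits_per_weight(q_bits: int, group_size: int,
                    scale_bits: int = 16, bias_bits: int = 16) -> float:
    """Effective bits per weight: packed value plus the amortized per-group overhead."""
    return q_bits + (scale_bits + bias_bits) / group_size


def estimate_gib(n_params: float, q_bits: int, group_size: int) -> float:
    """Approximate model size in GiB, assuming every parameter is quantized."""
    total_bits = n_params * bits_per_weight(q_bits, group_size)
    return total_bits / 8 / 2**30


ASSUMED_PARAMS = 4.0e9  # assumption: round parameter count for a "4B" model

for bits in (4, 5, 6, 8):
    size = estimate_gib(ASSUMED_PARAMS, bits, group_size=64)
    print(f"{bits}-bit, group size 64: ~{size:.2f} GiB")
```

Under these assumptions, 4-bit with group size 64 works out to 4.5 effective bits per weight, which lands close to the 2.1G reported here. Real files differ slightly because some tensors (for example normalization weights) stay unquantized.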
What OSim is (read this — it is not an assistant)
OSim (OdysSim) is a family of foundation models for human-behavior simulation from CMU LTI. It's trained to simulate how a person behaves in a conversation — to play the user, not the helpful assistant. Prompt it like a chatbot and it will do un-assistant-like things: ask its own questions, act like someone seeking help, hold a persona. That's the model working as intended. Use it where you want a synthetic human counterpart — dialogue-system testing, user simulation, behavioral data generation.
- Base model: cmu-lti/osim-4b (MIT)
- Foundation: Qwen/Qwen3-4B (Apache-2.0)
- Project / paper: OdysSim: Building Foundation Models for Human Behavior Simulation · code: github.com/sunnweiwei/OdysSim
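To make the dialogue-system-testing use case described above concrete, here is a minimal sketch of a test loop in which a simulated user and a system under test take turns. Everything in it is hypothetical scaffolding: `run_simulated_dialogue`, `scripted_user`, and `echo_assistant` are illustrative names, and the stubs stand in for real model calls. How OSim should be wired into the `simulate_user` callable (role mapping, persona prompts) is not specified in this card, so check the upstream OdysSim documentation before relying on any particular prompt format.

```python
from typing import Callable, Dict, List

Message = Dict[str, str]


def run_simulated_dialogue(
    simulate_user: Callable[[List[Message]], str],
    assistant_reply: Callable[[List[Message]], str],
    opening: str,
    max_turns: int = 3,
) -> List[Message]:
    """Alternate between the system under test and the simulated user."""
    history: List[Message] = [{"speaker": "simulated_user", "text": opening}]
    for _ in range(max_turns):
        # The system under test answers the latest user message.
        history.append({"speaker": "system_under_test",
                        "text": assistant_reply(history)})
        # The simulated user (where OSim would plug in) reacts to that answer.
        history.append({"speaker": "simulated_user",
                        "text": simulate_user(history)})
    return history


# Stubs for illustration only; replace them with real model calls.
def scripted_user(history: List[Message]) -> str:
    return f"Follow-up question after {len(history)} messages."


def echo_assistant(history: List[Message]) -> str:
    return "You said: " + history[-1]["text"]


transcript = run_simulated_dialogue(
    scripted_user, echo_assistant, "Hi, I need help with my order.", max_turns=2
)
for turn in transcript:
    print(f'{turn["speaker"]}: {turn["text"]}')
```

Keeping both sides behind plain callables means the same loop can drive a local MLX model, a remote endpoint, or a scripted stub in unit tests.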
A note on quality
This is a 4-bit quant of a 4B model, so there's some loss versus full precision: expect occasional arithmetic/reasoning slips and the odd repetition. For more headroom, convert a higher-bit MLX build (5/6/8-bit) from the same source, or run cmu-lti/osim-4b directly on a larger machine. None of this is a prompting problem; it's the trade-off that comes with 4-bit quantization.
Run it on Mac (Apple Silicon)
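```shell
pip install mlx-lm
mlx_lm.generate --model liminalstoat/osim-4b-mlx-4bit \
  --prompt "Hi, what can you help me with?" --max-tokens 256
```

```python
from mlx_lm import load, generate

model, tokenizer = load("liminalstoat/osim-4b-mlx-4bit")
# Wrap the message in the model's chat template before generating.
messages = [{"role": "user", "content": "Hi, what can you help me with?"}]
prompt = tokenizer.apply_chat_template(messages, add_generation_prompt=True)
print(generate(model, tokenizer, prompt=prompt, max_tokens=256))
```

The chat template ships with the model. The mlx_lm.generate command applies it automatically; in Python, apply it with tokenizer.apply_chat_template before calling generate, as shown above.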
Run it on iPhone / iPad
MLX runs on-device through mlx-swift. The most direct path is Apple's mlx-swift-examples app — point it at this repo or a local copy — or your own mlx-swift harness. Some MLX-based iOS chat apps can also load custom Hugging Face MLX repos; if yours supports adding a model by ID, use liminalstoat/osim-4b-mlx-4bit.
How it was made
source: cmu-lti/osim-4b (instruct-derived; Qwen3-4B foundation)
tool: mlx_lm.convert --quantize --q-bits 4 --q-group-size 64
mlx-lm: 0.31.3
A straight 4-bit MLX conversion of CMU's published weights — no fine-tuning or merging, built from full-precision source (not from a pre-quantized model).
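The recipe above can also be expressed through the `convert` function that mlx-lm exposes in Python. The sketch below is one way to do that, not a record of the exact command used for this repo: the output directory name is a hypothetical choice, and the import is done lazily because mlx-lm only installs on supported (Apple Silicon) machines.

```python
# Settings mirroring the recipe: 4-bit weights, group size 64.
CONVERT_KWARGS = {
    "hf_path": "cmu-lti/osim-4b",    # full-precision source repo
    "mlx_path": "osim-4b-mlx-4bit",  # hypothetical local output directory
    "quantize": True,
    "q_bits": 4,
    "q_group_size": 64,
}


def run_conversion(**overrides):
    """Download the source weights and write a quantized MLX copy."""
    # Imported here so the settings above can be inspected without mlx-lm installed.
    from mlx_lm import convert

    kwargs = {**CONVERT_KWARGS, **overrides}
    convert(**kwargs)
    return kwargs["mlx_path"]
```

Calling `run_conversion()` reproduces the 4-bit settings. For a higher-bit build, something like `run_conversion(q_bits=8, mlx_path="osim-4b-mlx-8bit")` should work, where the second directory name is again just an example.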
Intended use & limitations
A research / tinkering artifact for on-device human-behavior simulation. It inherits the intended uses and limitations of the base cmu-lti/osim-4b, plus quantization loss. Not validated for production or factual QA. Because it simulates human behavior, outputs can be inconsistent, opinionated, or persona-driven by design.
License & attribution
- This quant: MIT, following the base model.
- Base: cmu-lti/osim-4b · MIT (CMU LTI).
- Foundation: Qwen/Qwen3-4B · Apache-2.0 (Qwen Team).
Citation
- OdysSim — Building Foundation Models for Human Behavior Simulation (CMU LTI). Code: github.com/sunnweiwei/OdysSim.
- Qwen3 — Qwen Team, Qwen3 Technical Report, arXiv:2505.09388.