OSim-4B · MLX · 4-bit

A 4-bit MLX build of cmu-lti/osim-4b — CMU's OSim / OdysSim human-behavior-simulation model — for running on Apple Silicon (Mac, iPhone, iPad).

This is the instruct-derived OSim 4B (built on Qwen/Qwen3-4B), so it carries the correct Qwen3 chat template and <|im_end|> stop token — chat works out of the box.

  • Format: MLX · 4-bit · group size 64
  • Size: 2.1G
  • Built with: mlx-lm 0.31.3 on 2026-06-28
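
The size above can be sanity-checked with a little arithmetic. The sketch below assumes that group-wise quantization stores one 16-bit scale and one 16-bit bias per group of 64 weights, that essentially every parameter is quantized, and that the model has a nominal 4.0e9 parameters; none of these figures are stated in this card, so treat the result as a rough estimate rather than a measurement.

```python
def bits_per_weight(q_bits: int, group_size: int, overhead_bits: int = 32) -> float:
    """Effective storage per weight for group-wise quantization.

    overhead_bits is the per-group metadata. Assumption: one 16-bit scale
    plus one 16-bit bias per group.
    """
    return q_bits + overhead_bits / group_size


def estimated_size_gib(n_params: float, q_bits: int = 4, group_size: int = 64) -> float:
    """Rough weight size in GiB, assuming every parameter is quantized."""
    total_bytes = n_params * bits_per_weight(q_bits, group_size) / 8
    return total_bytes / 2**30


# 4.0e9 is an assumed nominal parameter count for a "4B" model.
print(bits_per_weight(4, 64))               # 4.5
print(round(estimated_size_gib(4.0e9), 2))  # roughly 2.1
```

Under those assumptions the estimate lands close to the 2.1G listed above.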

What OSim is (read this — it is not an assistant)

OSim (OdysSim) is a family of foundation models for human-behavior simulation from CMU LTI. It's trained to simulate how a person behaves in a conversation — to play the user, not the helpful assistant. Prompt it like a chatbot and it will do un-assistant-like things: ask its own questions, act like someone seeking help, hold a persona. That's the model working as intended. Use it where you want a synthetic human counterpart — dialogue-system testing, user simulation, behavioral data generation.

A note on quality

This is a 4-bit quant of a 4B model, so some quality is lost relative to full precision: expect occasional arithmetic or reasoning slips and the odd repetition. For more headroom, convert a higher-bit MLX build (5-, 6-, or 8-bit) from the same source, or run cmu-lti/osim-4b directly on a larger machine. None of this is a prompting problem; it is the cost of 4-bit quantization.
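
A higher-bit build can be produced from Python as well as from the command line. This is a minimal sketch using the `convert` function exported by mlx-lm; the helper name `convert_higher_bit` and the output directory name are hypothetical, and the restriction to 5, 6, or 8 bits simply mirrors the options mentioned above.

```python
def convert_higher_bit(q_bits: int = 8, group_size: int = 64) -> str:
    """Convert cmu-lti/osim-4b to a higher-bit MLX build and return the output dir."""
    if q_bits not in (5, 6, 8):
        raise ValueError("expected one of the higher-bit options: 5, 6, or 8")
    out_dir = f"osim-4b-mlx-{q_bits}bit"  # hypothetical local directory name

    # Imported lazily so the helper can be defined without mlx-lm installed.
    from mlx_lm import convert

    # Downloads the full-precision source weights, then quantizes them.
    convert(
        "cmu-lti/osim-4b",
        mlx_path=out_dir,
        quantize=True,
        q_bits=q_bits,
        q_group_size=group_size,
    )
    return out_dir
```

Calling `convert_higher_bit(8)` on an Apple Silicon machine should leave an 8-bit build in `osim-4b-mlx-8bit`, which can then be passed to `load` as a local path. Expect the download and the result to be noticeably larger than the 4-bit build.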

Run it on Mac (Apple Silicon)

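pip install mlx-lm
mlx_lm.generate --model liminalstoat/osim-4b-mlx-4bit \
  --prompt "Hi, what can you help me with?" --max-tokens 256

Or from Python:

from mlx_lm import load, generate

model, tokenizer = load("liminalstoat/osim-4b-mlx-4bit")
# generate() uses the prompt as given, so apply the chat template explicitly.
messages = [{"role": "user", "content": "Hi, what can you help me with?"}]
prompt = tokenizer.apply_chat_template(messages, add_generation_prompt=True)
print(generate(model, tokenizer, prompt=prompt, max_tokens=256))

The chat template ships with the model. The mlx_lm.generate command applies it automatically; from Python, apply it with tokenizer.apply_chat_template as above.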
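
For dialogue-system testing, the model can be placed in a loop opposite the system under test. The sketch below is one possible harness, not the OSim authors' method. It assumes, following the example prompt above, that the other party's lines go in the "user" slot of the chat template and that OSim's own earlier lines go in the "assistant" slot. The names `to_osim_messages`, `simulate`, and `make_osim_reply` are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

# (speaker, text); speaker is "system_under_test" or "simulated_user"
Turn = Tuple[str, str]


def to_osim_messages(transcript: List[Turn]) -> List[Dict[str, str]]:
    """Map a transcript onto chat roles from OSim's point of view (assumed mapping)."""
    role_for = {"system_under_test": "user", "simulated_user": "assistant"}
    return [{"role": role_for[speaker], "content": text} for speaker, text in transcript]


def simulate(
    opening: str,
    osim_reply: Callable[[List[Dict[str, str]]], str],
    system_reply: Callable[[List[Turn]], str],
    n_rounds: int = 3,
) -> List[Turn]:
    """Alternate simulated-user and system turns, starting from the system's opening line."""
    transcript: List[Turn] = [("system_under_test", opening)]
    for _ in range(n_rounds):
        transcript.append(("simulated_user", osim_reply(to_osim_messages(transcript))))
        transcript.append(("system_under_test", system_reply(transcript)))
    return transcript


def make_osim_reply(repo: str = "liminalstoat/osim-4b-mlx-4bit", max_tokens: int = 256):
    """Build an osim_reply callback backed by mlx-lm (imported lazily)."""
    from mlx_lm import load, generate

    model, tokenizer = load(repo)

    def reply(messages: List[Dict[str, str]]) -> str:
        prompt = tokenizer.apply_chat_template(messages, add_generation_prompt=True)
        return generate(model, tokenizer, prompt=prompt, max_tokens=max_tokens)

    return reply
```

In use, `simulate("Hi, what can you help me with?", make_osim_reply(), my_system)` would return the full transcript, where `my_system` is whatever callable wraps the dialogue system being tested. Keeping the two sides behind plain callables makes it easy to swap in stubs while developing the harness.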

Run it on iPhone / iPad

MLX runs on-device through mlx-swift. The most direct path is Apple's mlx-swift-examples app — point it at this repo or a local copy — or your own mlx-swift harness. Some MLX-based iOS chat apps can also load custom Hugging Face MLX repos; if yours supports adding a model by ID, use liminalstoat/osim-4b-mlx-4bit.

How it was made

source:   cmu-lti/osim-4b   (instruct-derived; Qwen3-4B foundation)
tool:     mlx_lm.convert --quantize --q-bits 4 --q-group-size 64
mlx-lm:   0.31.3

A straight 4-bit MLX conversion of CMU's published weights — no fine-tuning or merging, built from full-precision source (not from a pre-quantized model).
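
To make the conversion settings concrete, here is a simplified pure-Python illustration of group-wise affine quantization with 4 bits and a group size of 64. It shows the general technique only: each group is mapped to 16 integer levels plus a scale and a bias. MLX's actual rounding, packing, and kernels may differ, and the function names are hypothetical.

```python
from typing import List, Tuple


def quantize_group(weights: List[float], bits: int = 4) -> Tuple[List[int], float, float]:
    """Affine-quantize one group to integers in [0, 2**bits - 1], plus scale and bias."""
    levels = 2**bits - 1
    lo, hi = min(weights), max(weights)
    scale = (hi - lo) / levels or 1.0  # fall back to 1.0 for a constant group
    codes = [round((w - lo) / scale) for w in weights]
    return codes, scale, lo


def dequantize_group(codes: List[int], scale: float, bias: float) -> List[float]:
    """Reconstruct approximate weights from integer codes."""
    return [c * scale + bias for c in codes]


def roundtrip(weights: List[float], bits: int = 4, group_size: int = 64) -> List[float]:
    """Quantize and dequantize group by group, returning the lossy reconstruction."""
    restored: List[float] = []
    for start in range(0, len(weights), group_size):
        group = weights[start:start + group_size]
        restored.extend(dequantize_group(*quantize_group(group, bits)))
    return restored


weights = [i / 100 for i in range(-64, 64)]  # two groups of 64 toy values
restored = roundtrip(weights)
worst = max(abs(a - b) for a, b in zip(weights, restored))
print(f"worst-case reconstruction error: {worst:.4f}")
```

The reconstruction error in each group is bounded by half of that group's scale, which is the source of the small quality loss relative to the full-precision weights.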

Intended use & limitations

A research / tinkering artifact for on-device human-behavior simulation. It inherits the intended uses and limitations of the base cmu-lti/osim-4b, plus quantization loss. Not validated for production or factual QA. Because it simulates human behavior, outputs can be inconsistent, opinionated, or persona-driven by design.

License & attribution

Citation

  • OdysSim: Building Foundation Models for Human Behavior Simulation (CMU LTI). Code: github.com/sunnweiwei/OdysSim.
  • Qwen3 — Qwen Team, Qwen3 Technical Report, arXiv:2505.09388.