OSim-8B (post-trained, text)

OSim-8B is the post-trained (text) checkpoint of OSim (OdysSim), a foundation model for human behavior simulation — trained to imitate the human / user side of interactions rather than to behave as a helpful assistant. It is the text counterpart of the midtrained cmu-lti/osim-8b-mid: Qwen/Qwen3-8B midtrained on the OdysSim corpus (62 behavioral datasets along the five Soul axes — CONV/SS/COG/ROLE/EVAL) and then post-trained (task-specific RL + expert consolidation).

(Mirror of sunweiwei/OSim-8B, the text post model. For the VL variant see sunweiwei/OSim-Inst-8B.)

Intended use

Simulating the human/user side of conversations — user simulation for agent evaluation, social simulation, persona / role-play. Conditioned on a "social-context" system prompt (who is speaking: role, goal, background, style); given the other party's turns it generates the next human turn.

Results

Evaluated out-of-distribution as the user simulator in the τ-USI agentic benchmark (τ-bench airline+retail, 165 tasks, fixed GPT-5.2 agent), OSim-8B reaches USI 75.6 — the best behavioral / specialized user simulator, surpassing same-size general instruct models and every prior specialized simulator (CoSER-8B 67.2, UserLM-8B 62.0). It is distinctively human-like in reactivity (Sørensen–Dice D4 ≈ 93, matching the human inter-annotator level) and in outcome calibration (best ECE among compared models), with essentially none of the long-horizon agentic failure modes (timeouts/perseveration) seen in non-behavioral baselines.

Training

  • Base: Qwen3-8B
  • Stages: midtraining on the OdysSim corpus → task-specific reinforcement learning + expert consolidation.

Citation

If you use this model, please cite the OdysSim paper (Building Foundation Models for Human Behavior Simulation). Code: https://github.com/sunnweiwei/OdysSim

Downloads last month
225
Safetensors
Model size
8B params
Tensor type
BF16
·
Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for cmu-lti/osim-8b

Finetuned
Qwen/Qwen3-8B
Finetuned
(1718)
this model

Collection including cmu-lti/osim-8b