Instructions for using sunweiwei/Ditto-8B with libraries and local apps.

Libraries

How to use sunweiwei/Ditto-8B with Transformers:

```python
# Use a pipeline as a high-level helper
from transformers import pipeline

# The messages below contain an image, so the "image-text-to-text" task is used;
# the "text-generation" pipeline does not accept the `text=` keyword or image content.
pipe = pipeline("image-text-to-text", model="sunweiwei/Ditto-8B")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
print(pipe(text=messages))
```

```python
# Load model directly
from transformers import AutoProcessor, AutoModelForImageTextToText

processor = AutoProcessor.from_pretrained("sunweiwei/Ditto-8B")
model = AutoModelForImageTextToText.from_pretrained("sunweiwei/Ditto-8B")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
# Decode only the newly generated tokens, skipping the prompt
print(processor.decode(outputs[0][inputs["input_ids"].shape[-1]:]))
```

Local Apps

vLLM

How to use sunweiwei/Ditto-8B with vLLM:

Install from pip and serve model

```bash
# Install vLLM from pip:
pip install vllm
# Start the vLLM server (this command keeps running in the foreground):
vllm serve "sunweiwei/Ditto-8B"
# In a second terminal, call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "sunweiwei/Ditto-8B",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
```
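The same request can also be sent from Python. The following is a minimal sketch using only the standard library, assuming the server started above is listening on `localhost:8000`. The helper names `build_chat_request` and `send_chat_request` are illustrative and not part of vLLM.

```python
import json
import urllib.request


def build_chat_request(prompt, model="sunweiwei/Ditto-8B",
                       base_url="http://localhost:8000"):
    """Build (but do not send) a POST request for /v1/chat/completions."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_chat_request(request, timeout=60):
    """Send the request and return the assistant's reply text."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    # OpenAI-compatible servers return the reply under choices[0].message.content
    return body["choices"][0]["message"]["content"]
```

With the server running, `send_chat_request(build_chat_request("What is the capital of France?"))` returns the reply text. Passing `base_url="http://localhost:30000"` targets a server on the SGLang port used further down.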

Use Docker

```
docker model run hf.co/sunweiwei/Ditto-8B
```

SGLang

How to use sunweiwei/Ditto-8B with SGLang:

Install from pip and serve model

```bash
# Install SGLang from pip:
pip install sglang
# Start the SGLang server (this command keeps running in the foreground):
python3 -m sglang.launch_server \
    --model-path "sunweiwei/Ditto-8B" \
    --host 0.0.0.0 \
    --port 30000
# In a second terminal, call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "sunweiwei/Ditto-8B",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
```

Use Docker images

```bash
# Start the SGLang server inside the official Docker image
# (replace <secret> with your Hugging Face token):
docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "sunweiwei/Ditto-8B" \
        --host 0.0.0.0 \
        --port 30000
# In a second terminal, call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "sunweiwei/Ditto-8B",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
```

Docker Model Runner
How to use sunweiwei/Ditto-8B with Docker Model Runner:
```
docker model run hf.co/sunweiwei/Ditto-8B
```

	---
	license: apache-2.0
	base_model:
	- Qwen/Qwen3-VL-8B-Instruct
	pipeline_tag: text-generation
	library_name: transformers
	tags:
	- human-simulation
	- role-play
	- social-intelligence
	---

	# Ditto-8B

	Ditto-8B is an 8B open-weight model for human behavior simulation, covering theory of
	mind, character role-play, social skills, learner simulation, user simulation, and persona
	simulation.

	- 📄 Paper: [Reinforcing Human Behavior Simulation via Verbal Feedback](https://arxiv.org/abs/2605.20506)
	- 💻 Code: https://github.com/sunnweiwei/OdysSim
	- 📊 Data (SOUL): https://huggingface.co/datasets/sunweiwei/Soul

	## Method

	Ditto-8B is trained with DITTO, a reinforcement learning method that uses **verbal
	feedback** as the learning signal. After each output, the model receives descriptive feedback
	and produces an improved version; both are jointly optimized with GRPO. This distills the
	verbal guidance into the policy, so no feedback is needed at inference time.
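GRPO scores each sampled output relative to the other outputs in its group. The sketch below illustrates that normalization on hypothetical rewards. It is not the authors' training code: the reward values are made up, and pooling first attempts and feedback-guided revisions into one group is an assumption, since this card does not specify how DITTO forms its groups.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize rewards within one group: (r - mean) / (std + eps)."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rewards for one prompt (assumed values, for illustration only):
# two first attempts, and the two revisions written after verbal feedback.
initial_rewards = [0.2, 0.4]
revised_rewards = [0.6, 0.8]

# Assumption: both kinds of output share one group, so revisions that score
# above the group mean receive positive advantages.
advantages = group_relative_advantages(initial_rewards + revised_rewards)
```

Outputs that score above the group mean get a positive advantage and are reinforced; those below the mean get a negative one. If every reward in the group is identical, all advantages are zero and the group contributes no learning signal.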

	## Results

	The table reports the primary metric for each benchmark (higher is better).

	| Dim | Benchmark | GPT 5.5 | Gemini 3.1 Pro | Claude Opus 4.7 | Qwen 3.6 Plus | Others* | Qwen3 8B Inst | Ditto-8B |
	|---|---|---|---|---|---|---|---|---|
	| CONV | UserLLM | 65.3 | 67.7 | 57.6 | 72.1 | 44.6 | 46.0 | 91.5 |
	| CONV | MirrorBench | 56.7 | 48.3 | 63.7 | 48.0 | 45.4 | 54.0 | 73.4 |
	| CONV | Humanual-Chat | 28.2 | 21.0 | 22.6 | 22.2 | 25.8 | 24.7 | 21.0 |
	| CONV | SimArena-Doc | 83.4 | 83.0 | 83.5 | 82.4 | 83.5 | 83.6 | 84.4 |
	| SS | Sotopia-Hard | 31.9 | 27.8 | 32.4 | 28.3 | 31.7 | 27.7 | 45.8 |
	| COG | Fantom | 93.0 | 93.0 | 80.0 | 89.0 | 70.0 | 23.0 | 92.0 |
	| COG | Hitom | 82.0 | 86.0 | 93.0 | 73.0 | 56.0 | 62.0 | 79.0 |
	| COG | Paratomi | 99.0 | 97.0 | 90.0 | 94.0 | 75.0 | 67.0 | 95.0 |
	| COG | Social-R1 | 69.0 | 79.0 | 67.0 | 67.0 | 47.0 | 54.0 | 50.0 |
	| ROLE | Coser | 66.2 | 62.1 | 66.5 | 55.9 | 30.3 | 43.5 | 64.4 |
	| ROLE | Lifechoices | 91.0 | 84.0 | 92.0 | 79.0 | 67.0 | 70.0 | 70.0 |
	| ROLE | Twinvoice | 74.0 | 86.0 | 83.0 | 71.0 | 40.0 | 42.0 | 71.0 |
	| ROLE | BehaviorChain | 95.0 | 92.0 | 96.0 | 85.0 | 36.0 | 41.0 | 44.0 |
	| ROLE | SimArena-Math | 68.5 | 71.5 | 68.7 | 70.9 | 70.5 | 68.9 | 69.6 |
	| ROLE | Mistakes | 72.0 | 73.0 | 74.0 | 67.0 | 56.0 | 27.0 | 36.0 |
	| ROLE | Humanual-Email | 50.1 | 46.9 | 50.4 | 47.9 | 42.8 | 43.7 | 40.8 |
	| ROLE | Humanual-News | 40.2 | 42.3 | 41.3 | 41.8 | 33.1 | 32.5 | 27.5 |
	| ROLE | Humanual-Politics | 42.0 | 32.5 | 43.5 | 31.6 | 34.2 | 33.2 | 29.7 |
	| EVAL | AlignX | 71.2 | 73.4 | 71.6 | 69.8 | 66.8 | 68.6 | 67.4 |
	| EVAL | Humanllm | 45.7 | 46.9 | 44.2 | 42.7 | 35.2 | 34.1 | 33.1 |
	| EVAL | Socsci210 | 77.2 | 78.0 | 77.2 | 74.5 | 75.2 | 73.6 | 72.5 |
	| EVAL | Humanual-Book | 57.6 | 62.4 | 61.4 | 58.4 | 50.2 | 53.6 | 53.4 |
	| EVAL | Humanual-Opinion | 39.8 | 36.0 | 46.2 | 34.2 | 37.4 | 37.2 | 30.3 |

	\* Others: best result among other specialized human-simulation models (HumanLM-8B, Sotopia-RL-7B, UserLM-8B, Coser-8B).

	> Note. The released Ditto-8B is a single generalist distilled from a set of task-specific DITTO experts via rejection sampling on the training set.
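Rejection sampling, as mentioned in the note above, generally means sampling several candidates per prompt and keeping only those that pass a quality check. The sketch below is a generic illustration of that idea, not the authors' pipeline. The `generate` and `score` callables, the best-of-N selection rule, and the threshold are all hypothetical.

```python
def rejection_sample(prompts, generate, score, num_samples=4, threshold=0.5):
    """Keep, per prompt, the best-scoring candidate if it clears the threshold.

    generate(prompt) -> str and score(prompt, response) -> float are
    hypothetical stand-ins for an expert model and a task-specific scorer.
    """
    kept = []
    for prompt in prompts:
        candidates = [generate(prompt) for _ in range(num_samples)]
        scored = [(score(prompt, c), c) for c in candidates]
        best_score, best = max(scored, key=lambda pair: pair[0])
        if best_score >= threshold:
            kept.append({"prompt": prompt, "response": best})
    return kept
```

The kept prompt/response pairs would then serve as supervised training data for the single generalist model; prompts with no candidate above the threshold are dropped.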

	## Usage

	```python
	from transformers import AutoModelForCausalLM, AutoTokenizer

	model_name = "sunweiwei/Ditto-8B"
	# Load the tokenizer and model; dtype and device placement are chosen automatically
	tokenizer = AutoTokenizer.from_pretrained(model_name)
	model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype="auto", device_map="auto")

	# Render the conversation with the model's chat template and append the generation prompt
	messages = [{"role": "user", "content": "Hello!"}]
	text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
	inputs = tokenizer(text, return_tensors="pt").to(model.device)

	# Generate, then decode only the new tokens that follow the prompt
	outputs = model.generate(**inputs, max_new_tokens=512)
	print(tokenizer.decode(outputs[0][inputs.input_ids.shape[1]:], skip_special_tokens=True))
	```
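For persona or role-play simulation, the conversation usually starts with a description of who the model should be. This card does not document a prompt format for Ditto-8B, so the sketch below is only one plausible way to assemble the messages. The helper name `build_persona_messages` and the system-prompt wording are assumptions.

```python
def build_persona_messages(persona, conversation):
    """Prepend a persona description as a system message to a chat history.

    persona: free-text description of the person to simulate (assumed format).
    conversation: list of (role, content) pairs with role "user" or "assistant".
    """
    messages = [{
        "role": "system",
        "content": f"You are role-playing the following persona:\n{persona}",
    }]
    for role, content in conversation:
        if role not in ("user", "assistant"):
            raise ValueError(f"unexpected role: {role}")
        messages.append({"role": role, "content": content})
    return messages


messages = build_persona_messages(
    "A first-year student who is shy and unsure about algebra.",
    [("user", "Can you tell me how you would solve 2x + 3 = 7?")],
)
```

The resulting list can be passed to `tokenizer.apply_chat_template` in place of `messages` in the snippet above.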

	## Citation

	```bibtex
	@article{sun2026ditto,
	title = {Reinforcing Human Behavior Simulation via Verbal Feedback},
	author = {Sun, Weiwei and Zhou, Xuhui and Liu, Jiarui and Du, Weihua and Sun, Haojia and Xie, Yiqing and Ma, Qianou and Chen, Sihao and Wan, Mengting and Yang, Longqi and Zhou, Pei and Wu, Sherry and Welleck, Sean and Neubig, Graham and Yang, Yiming and Sap, Maarten},
	year = {2026},
	eprint = {2605.20506},
	archivePrefix = {arXiv},
	url = {https://arxiv.org/abs/2605.20506}
	}
	```