Instructions to use ByteDance-Seed/Stable-DiffCoder-8B-Instruct with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
  - Transformers
How to use ByteDance-Seed/Stable-DiffCoder-8B-Instruct with Transformers:
```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline(
    "text-generation",
    model="ByteDance-Seed/Stable-DiffCoder-8B-Instruct",
    trust_remote_code=True,
)
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)
```

```python
# Load model directly
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained(
    "ByteDance-Seed/Stable-DiffCoder-8B-Instruct", trust_remote_code=True
)
model = AutoModel.from_pretrained(
    "ByteDance-Seed/Stable-DiffCoder-8B-Instruct", trust_remote_code=True
)

messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))
```
- Notebooks
  - Google Colab
  - Kaggle
- Local Apps
  - vLLM
How to use ByteDance-Seed/Stable-DiffCoder-8B-Instruct with vLLM:
Install from pip and serve model
```sh
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "ByteDance-Seed/Stable-DiffCoder-8B-Instruct"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "ByteDance-Seed/Stable-DiffCoder-8B-Instruct",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
```
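Once the server is up, it can also be called from Python instead of curl. A minimal sketch using the official openai client against the OpenAI-compatible endpoint (the base URL assumes the default port from the command above):

```python
from openai import OpenAI

# vLLM exposes an OpenAI-compatible API; no real key is needed for a local server.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="ByteDance-Seed/Stable-DiffCoder-8B-Instruct",
    messages=[{"role": "user", "content": "What is the capital of France?"}],
)
print(response.choices[0].message.content)
```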
  - SGLang
How to use ByteDance-Seed/Stable-DiffCoder-8B-Instruct with SGLang:
Install from pip and serve model
```sh
# Install SGLang from pip:
pip install sglang

# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "ByteDance-Seed/Stable-DiffCoder-8B-Instruct" \
    --host 0.0.0.0 \
    --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "ByteDance-Seed/Stable-DiffCoder-8B-Instruct",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
```
Use Docker images
```sh
docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "ByteDance-Seed/Stable-DiffCoder-8B-Instruct" \
        --host 0.0.0.0 \
        --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "ByteDance-Seed/Stable-DiffCoder-8B-Instruct",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
```
  - Docker Model Runner
How to use ByteDance-Seed/Stable-DiffCoder-8B-Instruct with Docker Model Runner:
```sh
docker model run hf.co/ByteDance-Seed/Stable-DiffCoder-8B-Instruct
```
Issues running your model in LM Studio
Hello.
I can’t get the model to work: no matter what settings I choose, it only outputs fragments of sentences, code, and Chinese characters.
I’m using LM Studio on a MacBook Pro (M1, 16 GB).
Could you please advise how I can launch and use your model correctly?
Thank you.
Thank you for your feedback and for trying out our model.
While the model architecture may look similar to Llama, the Stable-DiffCoder family requires specialized inference logic adapted for Diffusion Language Models; it cannot be run with standard autoregressive inference directly.
To run the model correctly, please ensure you are using the dedicated inference code we provide. You can find the core implementation here:
https://huggingface.co/ByteDance-Seed/Stable-DiffCoder-8B-Instruct/blob/main/modeling_seed_diffcoder.py
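As a sanity check, here is one way to confirm that a Transformers setup is actually going through that custom code path rather than a generic architecture (a minimal sketch; the exact module name mentioned in the comment is an assumption):

```python
from transformers import AutoModel

# trust_remote_code=True makes from_pretrained load the custom class shipped
# in the repo (modeling_seed_diffcoder.py) instead of a generic architecture.
model = AutoModel.from_pretrained(
    "ByteDance-Seed/Stable-DiffCoder-8B-Instruct", trust_remote_code=True
)

# If the diffusion code path is active, the class should come from the repo's
# modeling file (e.g. a module name containing "modeling_seed_diffcoder").
print(type(model).__module__)
```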
If you are using a quantized version of the model, you will need to port this diffusion inference logic into your quantization framework or tool. Using standard autoregressive sampling with a quantized DiffCoder model will produce incorrect outputs (such as sentence fragments, stray code snippets, or garbled text).
Could you please first confirm that your current setup is using the correct diffusion-based inference logic as linked above? If the issue persists after adapting the code, please provide more detailed information about your environment and the exact steps you are taking, and I will be glad to help you resolve it.
Best regards
Thank you. I am not a programmer and only use LM Studio, so I will wait until the model can be used in LM Studio. :)
Xie-xie (thank you), dear developers!
Hello, can I use bitsandbytes to quantize this model? Will that work?
If bitsandbytes (bnb) doesn't change the inference logic, it should be fine.
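For reference, a minimal sketch of that approach with Transformers, assuming weight-only 4-bit quantization via BitsAndBytesConfig (which replaces linear-layer weights but leaves the custom generate() logic untouched); untested with this specific repo:

```python
import torch
from transformers import AutoModel, BitsAndBytesConfig

# bitsandbytes quantizes the linear layers in place; the diffusion inference
# code loaded via trust_remote_code is unchanged.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModel.from_pretrained(
    "ByteDance-Seed/Stable-DiffCoder-8B-Instruct",
    quantization_config=bnb_config,
    device_map="auto",
    trust_remote_code=True,
)
```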