Instructions to use ByteDance-Seed/Stable-DiffCoder-8B-Instruct with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
  - Transformers
How to use ByteDance-Seed/Stable-DiffCoder-8B-Instruct with Transformers:
```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline(
    "text-generation",
    model="ByteDance-Seed/Stable-DiffCoder-8B-Instruct",
    trust_remote_code=True,
)
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)
```

```python
# Load model directly
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained(
    "ByteDance-Seed/Stable-DiffCoder-8B-Instruct", trust_remote_code=True
)
model = AutoModel.from_pretrained(
    "ByteDance-Seed/Stable-DiffCoder-8B-Instruct", trust_remote_code=True
)

messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))
```
- Notebooks
  - Google Colab
  - Kaggle
- Local Apps
  - vLLM
How to use ByteDance-Seed/Stable-DiffCoder-8B-Instruct with vLLM:
Install from pip and serve model
```sh
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "ByteDance-Seed/Stable-DiffCoder-8B-Instruct"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "ByteDance-Seed/Stable-DiffCoder-8B-Instruct",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
```
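Once the server is up, it can also be called from Python instead of curl. A minimal sketch using the official openai client against the OpenAI-compatible endpoint (the base URL assumes the default port from the command above):

```python
from openai import OpenAI

# vLLM exposes an OpenAI-compatible API; no real key is needed for a local server.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="ByteDance-Seed/Stable-DiffCoder-8B-Instruct",
    messages=[{"role": "user", "content": "What is the capital of France?"}],
)
print(response.choices[0].message.content)
```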
  - SGLang
How to use ByteDance-Seed/Stable-DiffCoder-8B-Instruct with SGLang:
Install from pip and serve model
```sh
# Install SGLang from pip:
pip install sglang

# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "ByteDance-Seed/Stable-DiffCoder-8B-Instruct" \
    --host 0.0.0.0 \
    --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "ByteDance-Seed/Stable-DiffCoder-8B-Instruct",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
```
Use Docker images
```sh
docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "ByteDance-Seed/Stable-DiffCoder-8B-Instruct" \
        --host 0.0.0.0 \
        --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "ByteDance-Seed/Stable-DiffCoder-8B-Instruct",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
```
  - Docker Model Runner
How to use ByteDance-Seed/Stable-DiffCoder-8B-Instruct with Docker Model Runner:
```sh
docker model run hf.co/ByteDance-Seed/Stable-DiffCoder-8B-Instruct
```
Issues running your model in LM Studio
Hello.
I can’t get the model to work: no matter what settings I choose, it only outputs fragments of sentences, code, and Chinese characters.
I’m using LM Studio on a MacBook Pro (M1, 16 GB).
Could you please advise how I can launch and use your model correctly?
Thank you.
Thank you for your feedback and for trying out our model.
While the model architecture may look similar to Llama, the Stable-DiffCoder family requires specialized inference logic adapted for Diffusion Language Models; it cannot be run with standard autoregressive inference directly.
To run the model correctly, please ensure you are using the dedicated inference code we provide. You can find the core implementation here:
https://huggingface.co/ByteDance-Seed/Stable-DiffCoder-8B-Instruct/blob/main/modeling_seed_diffcoder.py
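As a sanity check, here is one way to confirm that a Transformers setup is actually going through that custom code path rather than a generic architecture (a minimal sketch; the exact module name mentioned in the comment is an assumption):

```python
from transformers import AutoModel

# trust_remote_code=True makes from_pretrained load the custom class shipped
# in the repo (modeling_seed_diffcoder.py) instead of a generic architecture.
model = AutoModel.from_pretrained(
    "ByteDance-Seed/Stable-DiffCoder-8B-Instruct", trust_remote_code=True
)

# If the diffusion code path is active, the class should come from the repo's
# modeling file (e.g. a module name containing "modeling_seed_diffcoder").
print(type(model).__module__)
```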
If you are using a quantized version of the model, you will need to port this diffusion inference logic into your quantization framework or tool. Using standard autoregressive sampling with a quantized DiffCoder model will produce incorrect outputs (such as sentence fragments, stray code snippets, or garbled text).
Could you please first confirm that your current setup is using the correct diffusion-based inference logic as linked above? If the issue persists after adapting the code, please provide more detailed information about your environment and the exact steps you are taking, and I will be glad to help you resolve it.
Best regards
Thank you. I am not a programmer and only use LM Studio, so I will wait until the model can be used in LM Studio. :)
Xie-xie (thank you), dear developers!
Hello, can I use bitsandbytes to quantize this model? Will that work?
If bitsandbytes (bnb) doesn't change the inference logic, it should be fine.
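For reference, a minimal sketch of that approach with Transformers, assuming weight-only 4-bit quantization via BitsAndBytesConfig (which replaces linear-layer weights but leaves the custom generate() logic untouched); untested with this specific repo:

```python
import torch
from transformers import AutoModel, BitsAndBytesConfig

# bitsandbytes quantizes the linear layers in place; the diffusion inference
# code loaded via trust_remote_code is unchanged.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModel.from_pretrained(
    "ByteDance-Seed/Stable-DiffCoder-8B-Instruct",
    quantization_config=bnb_config,
    device_map="auto",
    trust_remote_code=True,
)
```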