Instructions for using ByteDance-Seed/Seed-OSS-36B-Instruct with libraries and local apps.

Libraries

How to use ByteDance-Seed/Seed-OSS-36B-Instruct with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="ByteDance-Seed/Seed-OSS-36B-Instruct")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("ByteDance-Seed/Seed-OSS-36B-Instruct")
model = AutoModelForCausalLM.from_pretrained("ByteDance-Seed/Seed-OSS-36B-Instruct")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

Local Apps

vLLM

How to use ByteDance-Seed/Seed-OSS-36B-Instruct with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "ByteDance-Seed/Seed-OSS-36B-Instruct"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "ByteDance-Seed/Seed-OSS-36B-Instruct",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
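
The curl call above can also be made from Python. The following is a minimal sketch using only the standard library; the helper names `build_chat_request` and `ask` are hypothetical, and the default port 8000 is taken from the curl example (pass `base_url="http://localhost:30000"` for a server started on that port). It assumes the server returns the usual OpenAI-style chat completion body.

```python
import json
import urllib.request

MODEL_ID = "ByteDance-Seed/Seed-OSS-36B-Instruct"


def build_chat_request(prompt, base_url="http://localhost:8000", model=MODEL_ID):
    """Build (but do not send) a POST request for /v1/chat/completions."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(prompt, **kwargs):
    """Send the request and return the assistant's reply text.

    Assumes an OpenAI-style response: choices[0].message.content.
    """
    with urllib.request.urlopen(build_chat_request(prompt, **kwargs)) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

With a server running, `ask("What is the capital of France?")` should return the model's answer as a string. Keeping request construction separate from sending makes the payload easy to inspect before anything goes over the network.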

Use Docker

docker model run hf.co/ByteDance-Seed/Seed-OSS-36B-Instruct

SGLang

How to use ByteDance-Seed/Seed-OSS-36B-Instruct with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "ByteDance-Seed/Seed-OSS-36B-Instruct" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "ByteDance-Seed/Seed-OSS-36B-Instruct",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "ByteDance-Seed/Seed-OSS-36B-Instruct" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "ByteDance-Seed/Seed-OSS-36B-Instruct",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Docker Model Runner
How to use ByteDance-Seed/Seed-OSS-36B-Instruct with Docker Model Runner:
```
docker model run hf.co/ByteDance-Seed/Seed-OSS-36B-Instruct
```

Getting an error when deploying on HF Inference with 2 A100 GPUs on AWS in region us-east

#19

by streebo - opened Aug 28, 2025

Endpoint encountered an error.
You can try restarting it using the "retry" button above. Check logs for more details.
[Server message]Endpoint failed to start
Exit code: 1. Reason: eturn _run_code(code, main_globals, None,
File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
exec(code, run_globals)
File "/sgl-workspace/sglang/python/sglang/launch_server.py", line 11, in <module>
server_args = prepare_server_args(sys.argv[1:])
File "/sgl-workspace/sglang/python/sglang/srt/server_args.py", line 2003, in prepare_server_args
server_args = ServerArgs.from_cli_args(raw_args)
File "/sgl-workspace/sglang/python/sglang/srt/server_args.py", line 1815, in from_cli_args
return cls(**{attr: getattr(args, attr) for attr in attrs})
File "<string>", line 183, in __init__
File "/sgl-workspace/sglang/python/sglang/srt/server_args.py", line 333, in __post_init__
model_config = ModelConfig.from_server_args(self)
File "/sgl-workspace/sglang/python/sglang/srt/configs/model_config.py", line 284, in from_server_args
return ModelConfig(
File "/sgl-workspace/sglang/python/sglang/srt/configs/model_config.py", line 79, in __init__
self.hf_config = get_config(
File "/sgl-workspace/sglang/python/sglang/srt/utils.py", line 2745, in wrapper
result = func(*args, **kwargs)
File "/sgl-workspace/sglang/python/sglang/srt/hf_transformers_utils.py", line 123, in get_config
config = AutoConfig.from_pretrained(
File "/usr/local/lib/python3.10/dist-packages/transformers/models/auto/configuration_auto.py", line 1267, in from_pretrained
raise ValueError(
ValueError: The checkpoint you are trying to load has model type `seed_oss` but Transformers does not recognize this architecture. This could be because of an issue with the checkpoint, or because your version of Transformers is out of date.

You can update Transformers with the command `pip install --upgrade transformers`. If this does not work, and the checkpoint is very new, then there may not be a release version that supports this model yet. In this case, you can get the most up-to-date code by installing Transformers from source with the command `pip install git+https://github.com/huggingface/transformers.git`
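
The error says the Transformers build inside the serving image does not know the `seed_oss` model type. One way to confirm that before (or after) upgrading is to ask the installed package directly. This is a rough sketch, not an official diagnostic: the helper names are hypothetical, and it assumes `CONFIG_MAPPING` in `transformers.models.auto.configuration_auto` is the registry that `AutoConfig` consults. It has to run in the same environment as the server, since the traceback shows Transformers being loaded from the container's own `dist-packages`.

```python
import importlib
from importlib import metadata
from typing import Optional


def transformers_version() -> Optional[str]:
    """Return the installed Transformers version, or None if it is not installed."""
    try:
        return metadata.version("transformers")
    except metadata.PackageNotFoundError:
        return None


def recognizes_model_type(model_type: str) -> Optional[bool]:
    """Return True/False if Transformers is importable, None if it is not."""
    try:
        auto_cfg = importlib.import_module(
            "transformers.models.auto.configuration_auto"
        )
    except ImportError:
        return None
    # CONFIG_MAPPING maps model_type strings to their config classes.
    return model_type in auto_cfg.CONFIG_MAPPING


print("transformers version:", transformers_version())
print("recognizes seed_oss:", recognizes_model_type("seed_oss"))
```

If the second line prints `False`, the installed release predates support for this architecture and the upgrade suggested in the error message is the next thing to try.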
