Assembly of Experts: Linear-time construction of the Chimera LLM variants with emergent and adaptable behaviors
Paper: arXiv 2506.14794
How to use stanzy/7007777 with Transformers:
# Use a pipeline as a high-level helper
from transformers import pipeline
pipe = pipeline("text-generation", model="stanzy/7007777", trust_remote_code=True)
messages = [
{"role": "user", "content": "Who are you?"},
]
pipe(messages)
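Generation arguments can be passed straight through the pipeline call; for example (the max_new_tokens value here is just an illustration):
pipe(messages, max_new_tokens=40)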
# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM
tokenizer = AutoTokenizer.from_pretrained("stanzy/7007777", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("stanzy/7007777", trust_remote_code=True)
messages = [
{"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
messages,
add_generation_prompt=True,
tokenize=True,
return_dict=True,
return_tensors="pt",
).to(model.device)
outputs = model.generate(**inputs, max_new_tokens=40)
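# Decode only the newly generated tokens, slicing off the prompt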
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))
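Greedy decoding is the default for model.generate. The DeepSeek-R1 model card recommends sampling instead (temperature around 0.6, top_p around 0.95); whether those settings carry over to this merged model is an assumption, but a sketch would look like:
outputs = model.generate(**inputs, max_new_tokens=40, do_sample=True, temperature=0.6, top_p=0.95)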
How to use stanzy/7007777 with vLLM:
# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "stanzy/7007777"
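A model of this scale will not fit on a single GPU. vLLM supports tensor parallelism via the --tensor-parallel-size flag; the value below is an assumption, set it to the number of GPUs you actually have:
vllm serve "stanzy/7007777" --tensor-parallel-size 8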
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
-H "Content-Type: application/json" \
--data '{
"model": "stanzy/7007777",
"messages": [
{
"role": "user",
"content": "What is the capital of France?"
}
]
}'
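Because the server speaks the OpenAI-compatible API, the official openai Python client works as well (the base_url and the placeholder API key below are assumptions matching the local server started above):
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")
response = client.chat.completions.create(
    model="stanzy/7007777",
    messages=[{"role": "user", "content": "What is the capital of France?"}],
)
print(response.choices[0].message.content)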
How to use stanzy/7007777 with SGLang:
# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
--model-path "stanzy/7007777" \
--host 0.0.0.0 \
--port 30000
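As with vLLM, a model of this size needs multiple GPUs. SGLang exposes tensor parallelism via the --tp flag (the value below is an assumption, set it to your GPU count):
python3 -m sglang.launch_server \
--model-path "stanzy/7007777" \
--tp 8 \
--host 0.0.0.0 \
--port 30000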
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
-H "Content-Type: application/json" \
--data '{
"model": "stanzy/7007777",
"messages": [
{
"role": "user",
"content": "What is the capital of France?"
}
]
}'
# Or run the SGLang server with Docker:
docker run --gpus all \
--shm-size 32g \
-p 30000:30000 \
-v ~/.cache/huggingface:/root/.cache/huggingface \
--env "HF_TOKEN=<secret>" \
--ipc=host \
lmsysorg/sglang:latest \
python3 -m sglang.launch_server \
--model-path "stanzy/7007777" \
--host 0.0.0.0 \
--port 30000
# Call the server using curl (OpenAI-compatible API) exactly as shown above.
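The openai Python client snippet from the vLLM section also works against this server; just point base_url at http://localhost:30000/v1.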
How to use stanzy/7007777 with Docker Model Runner:
docker model run hf.co/stanzy/7007777
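Run without arguments for an interactive chat. Recent Docker Model Runner releases also accept a one-shot prompt as a trailing argument, though treat the exact CLI shape as an assumption and verify with docker model run --help:
docker model run hf.co/stanzy/7007777 "What is the capital of France?"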
Model merge of DeepSeek-R1 and DeepSeek-V3 (0324)
An open weights model combining the intelligence of R1 with the token efficiency of V3.
For details on the construction process and analyses of Chimera model variants, please read our paper.
Paper on arXiv | Announcement on X | LinkedIn post | Try it on OpenRouter
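For intuition only: an Assembly-of-Experts-style construction builds the child model in weight space rather than by retraining. The sketch below shows plain linear interpolation of matching tensors between two parent checkpoints; it is a simplification under assumed names (the actual method described in the paper merges selected expert tensors with per-tensor rules):
import torch  # parent state dicts map parameter names to torch tensors

def merge_state_dicts(sd_a, sd_b, alpha=0.5):
    # Child weights as a convex combination of the two parents,
    # tensor by tensor: alpha * A + (1 - alpha) * B
    return {name: alpha * sd_a[name] + (1 - alpha) * sd_b[name]
            for name in sd_a}

# Usage sketch (parent state dicts loaded elsewhere):
# child_sd = merge_state_dicts(r1_state_dict, v3_state_dict, alpha=0.5)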
Regarding R1T Chimera, we ask you to follow the careful guidelines that Microsoft created for their DeepSeek-based "MAI-DS-R1" model; these guidelines are available on Hugging Face.
Citation
@misc{tng_technology_consulting_gmbh_2025,
author = { TNG Technology Consulting GmbH },
title = { DeepSeek-R1T-Chimera },
year = 2025,
month = {April},
url = { https://huggingface.co/tngtech/DeepSeek-R1T-Chimera },
doi = { 10.57967/hf/5330 },
publisher = { Hugging Face }
}
Base model: deepseek-ai/DeepSeek-R1