Instructions for using tencent/Hunyuan-A13B-Instruct with libraries and local apps.

Libraries

How to use tencent/Hunyuan-A13B-Instruct with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

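pipe = pipeline(
    "text-generation",
    model="tencent/Hunyuan-A13B-Instruct",
    trust_remote_code=True,
    device_map="auto",   # requires `accelerate`; places weights on available GPUs
    torch_dtype="auto",  # use the dtype recorded in the checkpoint config
)
messages = [
    {"role": "user", "content": "Who are you?"},
]
print(pipe(messages))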
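With chat-style input, recent Transformers versions return the whole conversation under `generated_text`, with the model's reply appended as the last message. The helper below is a minimal sketch that assumes this output shape (`[{"generated_text": [...messages...]}]`). `last_assistant_reply` is a hypothetical name, and the sample is a hand-written placeholder rather than real model output.

```python
from typing import Any


def last_assistant_reply(pipeline_output: list[dict[str, Any]]) -> str:
    """Return the content of the final assistant message.

    Assumes the chat-style text-generation pipeline shape
    [{"generated_text": [{"role": ..., "content": ...}, ...]}],
    i.e. the input conversation with the generated reply appended.
    """
    conversation = pipeline_output[0]["generated_text"]
    # Walk backwards so the newest assistant turn is picked.
    for message in reversed(conversation):
        if message.get("role") == "assistant":
            return message["content"]
    raise ValueError("no assistant message in pipeline output")


# Hand-written placeholder shaped like the result of pipe(messages).
sample = [
    {
        "generated_text": [
            {"role": "user", "content": "Who are you?"},
            {"role": "assistant", "content": "placeholder reply"},
        ]
    }
]
print(last_assistant_reply(sample))
```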

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("tencent/Hunyuan-A13B-Instruct", trust_remote_code=True)
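model = AutoModelForCausalLM.from_pretrained(
    "tencent/Hunyuan-A13B-Instruct",
    trust_remote_code=True,
    device_map="auto",   # requires `accelerate`; places weights on available GPUs
    torch_dtype="auto",  # use the dtype recorded in the checkpoint config
)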
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))
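For decoder-only models, `generate` returns the prompt tokens followed by the newly generated ones. This is why the snippet slices from `inputs["input_ids"].shape[-1]` before decoding. The sketch below applies the same slicing to plain Python lists with made-up token IDs, so it runs without downloading the model. `strip_prompt` is a hypothetical helper name.

```python
def strip_prompt(output_ids: list[int], prompt_len: int) -> list[int]:
    """Drop the echoed prompt from one row of a decoder-only generate() result.

    `output_ids` holds the prompt tokens followed by the newly generated
    tokens. `prompt_len` plays the role of inputs["input_ids"].shape[-1]
    in the Transformers snippet.
    """
    if prompt_len > len(output_ids):
        raise ValueError("prompt is longer than the generated sequence")
    return output_ids[prompt_len:]


# Made-up token IDs: 4 prompt tokens, then 3 generated ones.
prompt_ids = [101, 2040, 2024, 2017]
generated_row = prompt_ids + [1045, 2572, 102]

new_tokens = strip_prompt(generated_row, len(prompt_ids))
print(new_tokens)  # [1045, 2572, 102]
```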

Local Apps

vLLM

How to use tencent/Hunyuan-A13B-Instruct with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "tencent/Hunyuan-A13B-Instruct" --trust-remote-code
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "tencent/Hunyuan-A13B-Instruct",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
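The curl call can also be made from Python using only the standard library. The following is a minimal sketch. It assumes the `vllm serve` process above is listening on http://localhost:8000 and returns the standard OpenAI chat-completion shape (`choices[0].message.content`). `build_chat_request`, `extract_reply`, and `ask` are hypothetical helper names.

```python
import json
import urllib.request
from typing import Any


def build_chat_request(base_url: str, model: str, prompt: str) -> urllib.request.Request:
    """Build a POST request equivalent to the curl call above."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def extract_reply(response_body: bytes) -> str:
    """Pull the assistant text out of an OpenAI-style chat completion (assumed shape)."""
    body: dict[str, Any] = json.loads(response_body)
    return body["choices"][0]["message"]["content"]


def ask(base_url: str, model: str, prompt: str, timeout: float = 120.0) -> str:
    """Send one chat turn to a running server and return the reply text."""
    request = build_chat_request(base_url, model, prompt)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return extract_reply(response.read())


# Example (needs the vLLM server above running on port 8000):
# print(ask("http://localhost:8000", "tencent/Hunyuan-A13B-Instruct",
#           "What is the capital of France?"))
```

Because the request is built separately from the network call, the payload can be inspected without a server. The same helper should work against any OpenAI-compatible server by changing `base_url`.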


SGLang

How to use tencent/Hunyuan-A13B-Instruct with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "tencent/Hunyuan-A13B-Instruct" \
    --host 0.0.0.0 \
    --port 30000 \
    --trust-remote-code
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "tencent/Hunyuan-A13B-Instruct",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
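Both servers expose an OpenAI-compatible API. Assuming it mirrors OpenAI's streaming behavior, adding `"stream": true` to the request body makes the server send the reply as server-sent-event lines of the form `data: {...}`, ending with `data: [DONE]`. The parser below is a sketch under that assumption. `iter_stream_text` is a hypothetical name, and the sample lines are hand-written, not captured from SGLang.

```python
import json
from typing import Iterable, Iterator


def iter_stream_text(lines: Iterable[str]) -> Iterator[str]:
    """Yield text fragments from OpenAI-style streaming lines (assumed format).

    Each event line looks like 'data: {...json...}' and the stream ends with
    'data: [DONE]'. Blank keep-alive lines and non-data lines are skipped.
    """
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            return
        chunk = json.loads(data)
        for choice in chunk.get("choices", []):
            # The first delta often carries only the role, with no content.
            piece = choice.get("delta", {}).get("content")
            if piece:
                yield piece


# Hand-written sample stream, not real server output.
sample_stream = [
    'data: {"choices":[{"index":0,"delta":{"role":"assistant"}}]}',
    "",
    'data: {"choices":[{"index":0,"delta":{"content":"The capital "}}]}',
    'data: {"choices":[{"index":0,"delta":{"content":"is Paris."}}]}',
    "data: [DONE]",
]
print("".join(iter_stream_text(sample_stream)))
```

Against a live server, the decoded response lines, for example `(b.decode("utf-8") for b in response)`, can be passed to `iter_stream_text` to print the reply as it arrives.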

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "tencent/Hunyuan-A13B-Instruct" \
        --host 0.0.0.0 \
        --port 30000 \
        --trust-remote-code
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "tencent/Hunyuan-A13B-Instruct",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Docker Model Runner
How to use tencent/Hunyuan-A13B-Instruct with Docker Model Runner:
```
docker model run hf.co/tencent/Hunyuan-A13B-Instruct
```

Update README.md

by jaxchang - opened Jun 27, 2025

base: refs/heads/main ← from: refs/pr/4

Files changed (1): README.md +10 -10

@@ -76,16 +76,16 @@ Note: The following benchmarks are evaluated by TRT-LLM-backend
 Hunyuan-A13B-Instruct has achieved highly competitive performance across multiple benchmarks, particularly in mathematics, science, agent domains, and more. We compared it with several powerful models, and the results are shown below.
-| Topic               | Bench                         | OpenAI-o1-1217 | DeepSeek R1 | Qwen3-A22B | Hunyuan-A13B-Instruct |
-|:-------------------:|:-----------------------------:|:-------------:|:------------:|:-----------:|:---------------------:|
-| **Mathematics**     | AIME 2024<br>AIME 2025<br>MATH | 74.3<br>79.2<br>96.4 | 79.8<br>70<br>94.9 | 85.7<br>81.5<br>94.0 | 87.3<br>76.8<br>94.3 |
-| **Science**         | GPQA-Diamond<br>OlympiadBench | 78<br>83.1 | 71.5<br>82.4 | 71.1<br>85.7 | 71.2<br>82.7 |
-| **Coding**          | Livecodebench<br>Fullstackbench<br>ArtifactsBench | 63.9<br>64.6<br>38.6 | 65.9<br>71.6<br>44.6 | 70.7<br>65.6<br>44.6 | 63.9<br>67.8<br>43 |
-| **Reasoning**       | BBH<br>DROP<br>ZebraLogic    | 80.4<br>90.2<br>81 | 83.7<br>92.2<br>78.7 | 88.9<br>90.3<br>80.3 | 89.1<br>91.1<br>84.7 |
-| **Instruction<br>Following** | IF-Eval<br>SysBench  | 91.8<br>82.5 | 88.3<br>77.7 | 83.4<br>74.2 | 84.7<br>76.1 |
-| **Text<br>Creation**| LengthCtrl<br>InsCtrl       | 60.1<br>74.8 | 55.9<br>69 | 53.3<br>73.7 | 55.4<br>71.9 |
-| **NLU**             | ComplexNLU<br>Word-Task     | 64.7<br>67.1 | 64.5<br>76.3 | 59.8<br>56.4 | 61.2<br>62.9 |
-| **Agent**           | BDCL v3<br> τ-Bench<br>ComplexFuncBench<br> C3-Bench | 67.8<br>60.4<br>47.6<br>58.8 | 56.9<br>43.8<br>41.1<br>55.3 | 70.8<br>44.6<br>40.6<br>51.7 | 78.3<br>54.7<br>61.2<br>63.5 |
 &nbsp;

 Hunyuan-A13B-Instruct has achieved highly competitive performance across multiple benchmarks, particularly in mathematics, science, agent domains, and more. We compared it with several powerful models, and the results are shown below.
+|           Topic           |                        Bench                       |        OpenAI-o1-1217        |          DeepSeek R1         |          Qwen3-A22B          |       Hunyuan-A13B-Instruct      |
+| :-----------------------: | :------------------------------------------------: | :--------------------------: | :--------------------------: | :--------------------------: | :------------------------------: |
+|      **Mathematics**      |           AIME 2024<br>AIME 2025<br>MATH           |   74.3<br>79.2<br>**96.4**   |      79.8<br>70<br>94.9      |     85.7<br>81.5<br>94.0     |       87.3<br>76.8<br>94.3       |
+|        **Science**        |            GPQA-Diamond<br>OlympiadBench           |          78<br>83.1          |         71.5<br>82.4         |       71.1<br>**85.7**       |           71.2<br>82.7           |
+|         **Coding**        |  Livecodebench<br>Fullstackbench<br>ArtifactsBench |     63.9<br>64.6<br>38.6     |   65.9<br>**71.6**<br>44.6   |     70.7<br>65.6<br>44.6     |        63.9<br>67.8<br>43        |
+|       **Reasoning**       |              BBH<br>DROP<br>ZebraLogic             |      80.4<br>90.2<br>81      |   83.7<br>**92.2**<br>78.7   |     88.9<br>90.3<br>80.3     |       89.1<br>91.1<br>84.7       |
+| **Instruction Following** |                 IF-Eval<br>SysBench                |       **91.8**<br>82.5       |         88.3<br>77.7         |         83.4<br>74.2         |           84.7<br>76.1           |
+|     **Text Creation**     |                LengthCtrl<br>InsCtrl               |       60.1<br>**74.8**       |          55.9<br>69          |         53.3<br>73.7         |           55.4<br>71.9           |
+|          **NLU**          |               ComplexNLU<br>Word-Task              |         64.7<br>67.1         |       64.5<br>**76.3**       |         59.8<br>56.4         |           61.2<br>62.9           |
+|         **Agent**         | BDCL v3<br>τ-Bench<br>ComplexFuncBench<br>C3-Bench | 67.8<br>60.4<br>47.6<br>58.8 | 56.9<br>43.8<br>41.1<br>55.3 | 70.8<br>44.6<br>40.6<br>51.7 | **78.3**<br>54.7<br>61.2<br>63.5 |
 &nbsp;
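Each cell in the table stacks several benchmarks, so comparing models one benchmark at a time takes some unpacking. The sketch below transcribes the scores from the table, keeping the labels exactly as written there (including `BDCL v3`). It reports which model has the highest score on each benchmark, and every tied model counts as a leader. `leaders` and `count_wins` are illustrative helper names.

```python
# Scores transcribed from the benchmark table above (column order preserved).
MODELS = ("OpenAI-o1-1217", "DeepSeek R1", "Qwen3-A22B", "Hunyuan-A13B-Instruct")

SCORES: dict[str, tuple[float, ...]] = {
    "AIME 2024": (74.3, 79.8, 85.7, 87.3),
    "AIME 2025": (79.2, 70.0, 81.5, 76.8),
    "MATH": (96.4, 94.9, 94.0, 94.3),
    "GPQA-Diamond": (78.0, 71.5, 71.1, 71.2),
    "OlympiadBench": (83.1, 82.4, 85.7, 82.7),
    "Livecodebench": (63.9, 65.9, 70.7, 63.9),
    "Fullstackbench": (64.6, 71.6, 65.6, 67.8),
    "ArtifactsBench": (38.6, 44.6, 44.6, 43.0),
    "BBH": (80.4, 83.7, 88.9, 89.1),
    "DROP": (90.2, 92.2, 90.3, 91.1),
    "ZebraLogic": (81.0, 78.7, 80.3, 84.7),
    "IF-Eval": (91.8, 88.3, 83.4, 84.7),
    "SysBench": (82.5, 77.7, 74.2, 76.1),
    "LengthCtrl": (60.1, 55.9, 53.3, 55.4),
    "InsCtrl": (74.8, 69.0, 73.7, 71.9),
    "ComplexNLU": (64.7, 64.5, 59.8, 61.2),
    "Word-Task": (67.1, 76.3, 56.4, 62.9),
    "BDCL v3": (67.8, 56.9, 70.8, 78.3),
    "τ-Bench": (60.4, 43.8, 44.6, 54.7),
    "ComplexFuncBench": (47.6, 41.1, 40.6, 61.2),
    "C3-Bench": (58.8, 55.3, 51.7, 63.5),
}


def leaders(row: tuple[float, ...]) -> list[str]:
    """Return every model tied for the top score on one benchmark."""
    best = max(row)
    return [model for model, score in zip(MODELS, row) if score == best]


def count_wins(scores: dict[str, tuple[float, ...]]) -> dict[str, int]:
    """Count the benchmarks on which each model has the top score (ties included)."""
    wins = {model: 0 for model in MODELS}
    for row in scores.values():
        for model in leaders(row):
            wins[model] += 1
    return wins


for bench, row in SCORES.items():
    print(f"{bench:<17} {', '.join(leaders(row))} ({max(row)})")
print(count_wins(SCORES))
# {'OpenAI-o1-1217': 8, 'DeepSeek R1': 4, 'Qwen3-A22B': 4, 'Hunyuan-A13B-Instruct': 6}
```

With these numbers, OpenAI-o1-1217 has the top score on 8 of the 21 benchmarks and Hunyuan-A13B-Instruct on 6. DeepSeek R1 and Qwen3-A22B lead on 4 each, tying on ArtifactsBench. A raw win count ignores margins, so it complements the per-benchmark scores rather than replacing them.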