lbourdois committed
Commit 043fe2d · verified · 1 Parent(s): 493f1bb

Improve language tag


Hi! As the model is multilingual, this PR adds languages other than English to the language tag to improve referencing. Note that 29 languages are announced in the README, but only 13 are explicitly listed, so I was only able to add those 13 languages.
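For reference, the `language` field this PR puts in the README's YAML frontmatter would look like the sketch below (the 13 explicitly listed languages, using the ISO 639-3 codes applied in this PR):

```yaml
language:
- zho  # Chinese
- eng  # English
- fra  # French
- spa  # Spanish
- por  # Portuguese
- deu  # German
- ita  # Italian
- rus  # Russian
- jpn  # Japanese
- kor  # Korean
- vie  # Vietnamese
- tha  # Thai
- ara  # Arabic
```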

Files changed (1)
  1. README.md +95 -83
README.md CHANGED
@@ -1,84 +1,96 @@
- ---
- license: other
- language:
- - en
- library_name: transformers
- tags:
- - RLHF
- - Nexusflow
- - Athene
- - Chat Model
- base_model:
- - Qwen/Qwen2.5-72B-Instruct
- ---
- # Athene-V2-Chat-72B: Rivaling GPT-4o across Benchmarks
-
- <p align="center">
- <a href="https://huggingface.co/Nexusflow" target="_blank">Nexusflow HF</a> - <a href="https://discord.gg/HDSVmNAs3y" target="_blank">Nexusflow Discord</a> - <a href="https://nexusflow.ai/blogs/athene-v2" target="_blank">Athene-V2 Blogpost</a>
- </p>
-
-
- We introduce Athene-V2-Chat-72B, an open-weights LLM on-par with GPT-4o across benchmarks. It is currently the best open model according to [Chatbot Arena](https://lmarena.ai/?leaderboard), where it beats GPT-4o-0513 (the best GPT-4o model on Arena) in hard and math category, and is on-par with GPT-4o-0513 in coding, instruction following, longer query and multi-turn.
-
- It is trained through RLHF with Qwen-2.5-72B-Instruct as base model. Athene-V2-Chat-72B excels in chat, math, and coding. Its sister model, [Athene-V2-Agent-72B](https://huggingface.co/Nexusflow/Athene-V2-Agent), surpasses GPT-4o in complex function calling and agentic applications.
-
-
- <p align="center" width="100%">
- <a><img src="arena.png" alt="Arena" style="width: 100%; min-width: 300px; display: block; margin: auto;"></a>
- </p>
-
- <p align="center" width="100%">
- <a><img src="benchmark.png" alt="Benchmark" style="width: 100%; min-width: 300px; display: block; margin: auto;"></a>
- </p>
-
- - **Developed by:** The Nexusflow Team
- - **Model type:** Chat Model
- - **Finetuned from model:** [Qwen 2.5 72B-Instruct](https://huggingface.co/Qwen/Qwen2.5-72B-Instruct)
- - **License**: [Nexusflow Research License](https://huggingface.co/Nexusflow/Athene-V2-Chat/blob/main/Nexusflow_Research_License_.pdf)
- - **Blog**: https://nexusflow.ai/blogs/athene-v2
-
- ## Usage
- Athene-V2-Chat uses the same chat template as Qwen2.5-72B-Instruct. Below is an example simple usage using the Transformers library.
-
- ```Python
- from transformers import AutoModelForCausalLM, AutoTokenizer
-
- model_name = "Nexusflow/Athene-V2-Chat"
-
- model = AutoModelForCausalLM.from_pretrained(
-     model_name,
-     torch_dtype="auto",
-     device_map="auto"
- )
- tokenizer = AutoTokenizer.from_pretrained(model_name)
-
- prompt = "Write a Python function to return the nth Fibonacci number in log n runtime."
-
- messages = [
-     {"role": "user", "content": prompt}
- ]
-
- text = tokenizer.apply_chat_template(
-     messages,
-     tokenize=False,
-     add_generation_prompt=True
- )
-
- model_inputs = tokenizer([text], return_tensors="pt").to(model.device)
-
- generated_ids = model.generate(
-     **model_inputs,
-     max_new_tokens=2048
- )
-
- generated_ids = [
-     output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
- ]
-
- response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
- ```
-
- Note that by adding a system prompt that encourages the model to think step by step, the model can improve further on difficult math queries and problems like counting `r`s in strawberry. For fairness consideration we **do not** include such system prompt during chat evaluation.
-
- ## Acknowledgment
  We would like to thank the [LMSYS Organization](https://lmsys.org/) for their support of testing the model. We would like to thank Qwen Team and the open source community for their efforts in providing the datasets and base models.
 
+ ---
+ license: other
+ language:
+ - zho
+ - eng
+ - fra
+ - spa
+ - por
+ - deu
+ - ita
+ - rus
+ - jpn
+ - kor
+ - vie
+ - tha
+ - ara
+ library_name: transformers
+ tags:
+ - RLHF
+ - Nexusflow
+ - Athene
+ - Chat Model
+ base_model:
+ - Qwen/Qwen2.5-72B-Instruct
+ ---
+ # Athene-V2-Chat-72B: Rivaling GPT-4o across Benchmarks
+
+ <p align="center">
+ <a href="https://huggingface.co/Nexusflow" target="_blank">Nexusflow HF</a> - <a href="https://discord.gg/HDSVmNAs3y" target="_blank">Nexusflow Discord</a> - <a href="https://nexusflow.ai/blogs/athene-v2" target="_blank">Athene-V2 Blogpost</a>
+ </p>
+
+
+ We introduce Athene-V2-Chat-72B, an open-weights LLM on-par with GPT-4o across benchmarks. It is currently the best open model according to [Chatbot Arena](https://lmarena.ai/?leaderboard), where it beats GPT-4o-0513 (the best GPT-4o model on Arena) in hard and math category, and is on-par with GPT-4o-0513 in coding, instruction following, longer query and multi-turn.
+
+ It is trained through RLHF with Qwen-2.5-72B-Instruct as base model. Athene-V2-Chat-72B excels in chat, math, and coding. Its sister model, [Athene-V2-Agent-72B](https://huggingface.co/Nexusflow/Athene-V2-Agent), surpasses GPT-4o in complex function calling and agentic applications.
+
+
+ <p align="center" width="100%">
+ <a><img src="arena.png" alt="Arena" style="width: 100%; min-width: 300px; display: block; margin: auto;"></a>
+ </p>
+
+ <p align="center" width="100%">
+ <a><img src="benchmark.png" alt="Benchmark" style="width: 100%; min-width: 300px; display: block; margin: auto;"></a>
+ </p>
+
+ - **Developed by:** The Nexusflow Team
+ - **Model type:** Chat Model
+ - **Finetuned from model:** [Qwen 2.5 72B-Instruct](https://huggingface.co/Qwen/Qwen2.5-72B-Instruct)
+ - **License**: [Nexusflow Research License](https://huggingface.co/Nexusflow/Athene-V2-Chat/blob/main/Nexusflow_Research_License_.pdf)
+ - **Blog**: https://nexusflow.ai/blogs/athene-v2
+
+ ## Usage
+ Athene-V2-Chat uses the same chat template as Qwen2.5-72B-Instruct. Below is an example simple usage using the Transformers library.
+
+ ```Python
+ from transformers import AutoModelForCausalLM, AutoTokenizer
+
+ model_name = "Nexusflow/Athene-V2-Chat"
+
+ model = AutoModelForCausalLM.from_pretrained(
+     model_name,
+     torch_dtype="auto",
+     device_map="auto"
+ )
+ tokenizer = AutoTokenizer.from_pretrained(model_name)
+
+ prompt = "Write a Python function to return the nth Fibonacci number in log n runtime."
+
+ messages = [
+     {"role": "user", "content": prompt}
+ ]
+
+ text = tokenizer.apply_chat_template(
+     messages,
+     tokenize=False,
+     add_generation_prompt=True
+ )
+
+ model_inputs = tokenizer([text], return_tensors="pt").to(model.device)
+
+ generated_ids = model.generate(
+     **model_inputs,
+     max_new_tokens=2048
+ )
+
+ generated_ids = [
+     output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
+ ]
+
+ response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
+ ```
+
+ Note that by adding a system prompt that encourages the model to think step by step, the model can improve further on difficult math queries and problems like counting `r`s in strawberry. For fairness consideration we **do not** include such system prompt during chat evaluation.
+
+ ## Acknowledgment
  We would like to thank the [LMSYS Organization](https://lmsys.org/) for their support of testing the model. We would like to thank Qwen Team and the open source community for their efforts in providing the datasets and base models.
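
The README's closing note says a system prompt that encourages step-by-step thinking can further improve the model on hard math queries. A minimal sketch of how such a prompt would be prepended to the `messages` list from the usage example above (the system prompt wording here is hypothetical, not taken from the model card):

```python
# Hypothetical step-by-step system prompt; the model card does not specify
# the exact wording it used.
system_prompt = "Think through the problem step by step before answering."

prompt = "How many r's are in the word strawberry?"

# Prepend the system message to the usual user message; the rest of the
# pipeline (tokenizer.apply_chat_template, model.generate) is unchanged.
messages = [
    {"role": "system", "content": system_prompt},
    {"role": "user", "content": prompt},
]
```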