The V3-AWQ model's response seems not as expected

by juxing - opened Feb 24, 2025

Discussion

juxing

Feb 24, 2025

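Which is larger, 3.11 or 3.9?

3.11 and 3.9 refer to the magnitude of the numbers; 3.11 is larger than 3.9. 3.11 and 3.9 refer to the magnitude of the numbers; 3.11 is larger than 3.9.

3.11 and 3.9 refer to the magnitude of the numbers; 3.11 is larger than 3.9.

3.11 and 3.9 refer to the magnitude of the numbers; 3.11 is larger than 3.9.

3.11 and 3.

3.11 and 3.9 refer to the magnitude of the numbers; 3.11 is larger than 3.9.

3.3.11 refers to the magnitude of the numbers; 3.11 is larger than 3.9.

3.3 refers to the magnitude of the numbers; 3.11 is larger than 3.9.

3. refers to the magnitude of the numbers; 3.11 is larger than 3.9.

3. refers to the magnitude of the numbers; 3.11 than 3.

3. refers to the magnitude of the numbers; 3. than 3.

3. refers to the magnitude of the numbers; 3. than.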

juxing

Feb 24, 2025

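How should a model be deployed with vLLM?

To deploy a model, you first need to prepare the model and then deploy it using appropriate tools and techniques. You can use various programming languages and frameworks to deploy the model, such as Python, Java, and C++. You can use various tools and techniques to deploy the model, such as Docker, Kubernetes, and Ansible. You can use various cloud services to deploy the model, such as AWS, Azure, and Google Cloud. You can use various databases to store and query model data, such as MySQL, PostgreSQL, and MongoDB. You can use various APIs to access and operate the model, such as REST APIs and GraphQL APIs. You can use various SDKs to develop and use the model, such as the TensorFlow SDK and the PyTorch SDK. You can use various libraries to train and evaluate the model, such as the Scikit-learn library and the XGBoost library. You can use various algorithms to train and evaluate the model, such as the decision tree algorithm and the random forest algorithm. You can use various tools to monitor and manage the model, such as the Prometheus tool and the Grafana tool. You can use various platforms to deploy and manage the model, such as the Kubernetes platform and the OpenShift platform. You can use various services to deploy and manage the model, such as the Amazon Sage service and the Google Cloud service You can use various technologies to deploy and manage the model, such as microservice technology and container technology. You can use various methods to deploy and manage the model, such as the continuous integration method and the continuous delivery method. You can use various processes to deploy and manage the model, such as the DevOps process and the Agile process. You can use various methods to deploy and manage the model, such as the blue-green deployment method and the canary deployment method. You can use various strategies to deploy and manage the model, such as the rolling update strategy and the blue-green deployment strategy. You can use various tools to deploy and manage the model, such as the Jenkins tool and the Ansible tool. You can use various methods to deploy and manage the model, such as the continuous integration method and the continuous delivery method. You can use various technologies to deploy and manage the model, such as microservice technology and container technology. You can use various platforms to deploy and manage the model, such as the Kubernetes platform and the OpenShift platform. You can use various services to deploy and manage the model, such as the Amazon service and the Google service. You can use various services to deploy and manage the model, such as the Amazon service and the Google service. You can use various tools to deploy and manage the model, such as the Prometheus tool and the GGrafana tool. You can use various tools to deploy and manage the model, such as the Prometheus tool and the Grafana tool. You can use various tools to deploy and manage the model, such as the Prometheus tool and the Grafana tool. You can use various tools to deploy and manage the model, such as the Prometheus tool and the Grafana tool. You can use various tools to deploy and manage the model, such as the Prometheus tool and the GGrafana tool. You can use various tools to deploy and manage the model, such as the Prometheus tool and the Grafana tool. You can use various tools to deploy and manage the model, such as the Prometheus tool and the Grafana tool. You can use various tools to deploy and manage the model, such as the Prometheus tool and the GGrafana tool. You can use various tools to deploy and manage the model, such as the Prometheus tool and the Ggrafana tool. You can use various tools to deploy and manage the model, such as the Prometheus tool and the Ggrafana tool. You can use various tools to deploy and manage the model, such as the Prometheus tool and the Ggrafana tool. You can use various tools to deploy and manage the model, such as the Prometheus tool and the Ggrafana tool.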
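Both outputs above degrade in the same way: a sentence starts repeating, sometimes with small corruptions. One rough way to flag this kind of output automatically is to measure how many sentences duplicate an earlier one. The sketch below is only an illustration; the function name `repeated_sentence_ratio` and the simple punctuation-based sentence split are assumptions, not part of vLLM or of the model repository.

```python
import re
from collections import Counter


def repeated_sentence_ratio(text: str) -> float:
    """Return the fraction of sentences that duplicate an earlier sentence.

    Sentences are split on CJK sentence punctuation or newlines, or on
    ASCII sentence punctuation followed by whitespace (so "3.11" stays whole).
    """
    parts = re.split(r"[。！？\n]+|(?<=[.!?])\s+", text)
    sentences = [p.strip() for p in parts if p.strip()]
    if not sentences:
        return 0.0
    counts = Counter(sentences)
    # Every occurrence beyond the first counts as a duplicate.
    duplicates = sum(c - 1 for c in counts.values())
    return duplicates / len(sentences)
```

A ratio near 0 means almost no exact repeats; a value well above that on a short answer suggests the model has fallen into a loop. Any cut-off for "too repetitive" would have to be chosen per use case.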

v2ray

Feb 24, 2025

Start up command? The inference parameters? Temperature?

juxing

Feb 25, 2025

vllm serve $MODEL \
  --dtype float16 \
  --tensor-parallel-size $GPU_CNT \
  --max-model-len 8192 \
  --gpu-memory-utilization 0.9 \
  --host 0.0.0.0 \
  --port 80 \
  --trust-remote-code
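
The launch command covers only the server side; the inference parameters asked about earlier (such as temperature) are normally sent with each request. Below is a minimal sketch of a request to the OpenAI-compatible `/v1/chat/completions` endpoint with explicit sampling settings. The helper names `build_chat_payload` and `send_chat`, the default values, and the base URL are assumptions for illustration, not values taken from this thread.

```python
import json
import urllib.request


def build_chat_payload(model: str, prompt: str,
                       temperature: float = 0.7,
                       max_tokens: int = 256) -> dict:
    """Build a chat-completions request body with explicit sampling settings.

    The defaults here are placeholders, not recommended values.
    """
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": temperature,
        "max_tokens": max_tokens,
    }


def send_chat(base_url: str, payload: dict, timeout: float = 60.0) -> str:
    """POST the payload to an OpenAI-compatible server and return the reply text."""
    request = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]


# Example (assumes a server is listening on port 80 of this machine):
# payload = build_chat_payload("QuixiAI/DeepSeek-V3-AWQ", "Who are you?", temperature=0.3)
# print(send_chat("http://localhost:80", payload))
```

Posting the exact payload alongside the launch command makes a report like this one much easier to reproduce.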

v2ray

Feb 25, 2025

You did not provide enough info lol, and every other person seems to be doing fine, closed.

v2ray changed discussion status to closed Feb 25, 2025

fridayl

Feb 26, 2025

I have the same poor performance.
Run command:
python3 -m vllm.entrypoints.openai.api_server --host 0.0.0.0 --port 9000 --max-model-len 65536 --max-num-batched-tokens 65536 --trust-remote-code --tensor-parallel-size 8 --gpu-memory-utilization 0.97 --dtype float16 --served-model-name deepseek-chat --model /DeepSeek-V3-awq