The results of the model quantization are not ideal

by mediali - opened Feb 6

Discussion

mediali

Feb 6

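Compared with other versions on the same question, this one's answers clearly contain many errors and omissions.

The NVFP4 version is considerably more accurate, probably because it does not need recalibration.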

cpatonn

cyankiwi org Feb 7

Thank you for your feedback.

Which languages did you use for the model? It might be due to the lack of Chinese agentic calibration data, as I cannot find any on Hugging Face.

CHNtentes

Feb 7

•

edited Feb 7

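Compared with other versions on the same question, this one's answers clearly contain many errors and omissions.

The NVFP4 version is considerably more accurate, probably because it does not need recalibration.

NVFP4 also requires a calibration dataset.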

mediali

Feb 11

Thank you for your feedback.

Which languages did you use for the model? It might be due to the lack of Chinese agentic calibration data, as I cannot find any on Hugging Face.

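I used this prompt:

You are an expert in collating ancient Chinese medical literature. Your task is to restructure OCR text into structured JSON data; the core goal is to accurately distinguish prescriptions (Prescription) from case studies and medical theory (Case/Theory).

Segmentation Rules (mandatory segmentation rules, highest priority)
These are the key to fixing sentence-boundary errors; follow them strictly:

Start Markers: any sentence that begins with one of the following words must be treated as the start of a new entry (New Entry); appending it to the end of the previous entry is strictly forbidden:
一治……
一论……
一切……
一人……
一道士……
Segmentation example:
❌ Wrong: ...不拘时服。一治久嗽不瘥，并虚劳喘嗽。 (assigned to the previous entry)
✅ Correct:
End of previous entry: ...不拘时服。
Start of new entry: 一治久嗽不瘥，并虚劳喘嗽。紫苑去芦头...
Core Logic (content classification criteria)
Prescription:

Criterion: any text that contains specific drug ingredients (e.g. “生姜、白蜜”, “麻黄三钱”), dosages (e.g. “一勺”, “二两”), or preparation methods (e.g. “煎一滚”, “右为末”) is classified as a prescription, no matter which word it begins with.
Correction: “一论久嗽...用生姜汁一勺...” -> this is a prescription (姜汁白蜜方).
Case Study:

Criterion: an account of a person (一人, 一妇) falling ill and a physician diagnosing and treating them, without listing the specific ingredients of the prescription used.
Medical Theory:

Criterion: pure pathological analysis or discussion of pattern differentiation, with no specific patient story and no drug formula.
Rules for Header Generation (title naming rules)
A. Naming prescriptions
Rule: choose the title in the following order of priority.
Explicit name: a prescription name stated explicitly in the text (e.g. ## 银杏膏).
Ingredient-based name (for unnamed prescriptions): take the first two principal ingredients (e.g. ## 姜汁白蜜方). This takes priority over efficacy!
Efficacy-based name: extract “治[symptom]” (e.g. ## 治久嗽方).
B. Naming case studies and medical theory
Case study: ## 医案：[core symptom] (e.g. ## 医案：痰涎内热).
Output Format (follow strictly)
Output a flat JSON array (Flat JSON Array) directly.

Do not create an object at the root (e.g. do not output {"prescriptions": [...]}).
Do not group by type; all entries must be mixed in the same top-level list, in the natural order in which they appear in the text.
JSON structure example:
[
{ "type": "Prescription", "title": "## 银杏膏", "text": "银杏膏\n陈细茶..." },
{ "type": "Prescription", "title": "## 紫苑冬花方", "text": "一治久嗽不瘥...紫苑去芦头..." }
]
Critical Constraints (anti-hallucination guardrails)
Text Integrity: the text field must preserve the complete original text. Note in particular: lead-in phrases such as “一治” and “一论” must stay at the start of the current new entry; they must not be dropped or left in the previous entry.
Do NOT categorize! Do not group the results by category; output the array [...] directly.
Naming: unnamed prescriptions must be named from the first two ingredients first; giving priority to “治XX方” is strictly forbidden.
Input Text
{{### Text to restructure: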
银杏膏
陈细茶四两，略焙，为细末。白果肉四两，一半去白膜，一半去红膜，捣烂。核桃肉四两，捣。家蜜半斤。右药入内，炼成膏，不拘时服。一治久嗽不瘥，并虚劳喘嗽。紫苑去芦头，歙冬花各二两，百部五钱。右为末，每服三钱，生薑三片，乌梅一箇，同煎汤调下，食后临卧各一服。一论久嗽，并连嗽四五十声者，用连皮生薑自然汁一勺，加白蜜二茶匙，同放茶碗内，煎一滚，温服，三四次即止。一切咳嗽上气者，一道士专卖此药，不拘新久皆效。乾薑桂心紫色辛辣者，去皮，皂荚泡，去皮子，肥大无孔者。右三味，并另捣下，筛了，各秤分两和合后，更捣，筛一遍，炼白蜜掺和，又捣一二千杵，每服三丸，如梧桐子大，不限食之先后，嗽发即服，日进三五服，忌葱、蒜、油腻、麵物。一论年久，近日咳嗽，哮吼喘急等症。神吸散国子博士颜心吾传。鹅管石（火煅，好醋淬七次，一钱）馀粮石（火煅，醋淬七次，一钱）官桂三分粉草三分枯白矾五分钦冬花五分石膏煨，五分。右为细末，每服三分二釐，准秤。至夜食后，静坐片时，将药放纸上，以竹筒五寸长，直插喉内，用力吸药，速亦不怕，吸药令尽为度。一论久嗽痰喘，百药不效，并年久不瘥者，或能饮酒人，久嗽无效。清金膏曲阜令孔桂窻传。天门冬去心，八两麦门冬去心，四两贝母四两杏仁去皮，四两半夏薑制，四两。右五味切片，水熬去渣，取汁五碗，入白粉葛末四两，蜜一斤，共前汁入罈内，重汤煮一日成膏，取出，每日无时频频服之。一人时唾痰涎，内热作渴，肢体倦怠，劳而足热，用清气化痰益甚。余曰：此肾水泛而为痰，法当补肾。不信，另进滚痰丸一服，吐泻不止，饮食不入，头晕眼闭，始信。余用六君子汤数剂，胃气渐复，却用六味丸月馀，诸症悉愈。一论外邪在表，无汗而喘者。五虎汤。麻黄三钱杏仁去皮尖，三钱石膏五钱甘草一钱细茶一撮。有痰加二陈汤。右剉一剂，生薑、葱水煎，热服。加桑白皮一钱，尤良。}}
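The format and text-integrity constraints in the prompt can also be checked mechanically rather than by eye. The following is a minimal sketch, not part of the original test: `check_output` is a hypothetical helper, the sample input is a shortened stand-in for the real text, and the rule that every entry after the first opens with a start marker is an assumption that happens to hold for this input. A check like this would flag a single changed character, such as the 右剘一剂 that appears in the last entry of the AWQ-4bit output where the input has 右剉一剂.

```python
import json

# Start markers copied from the prompt's segmentation rules.
START_MARKERS = ("一治", "一论", "一切", "一人", "一道士")
REQUIRED_KEYS = {"type", "title", "text"}


def check_output(raw_json: str, source_text: str) -> list[str]:
    """Return a list of problems found; an empty list means every check passed."""
    data = json.loads(raw_json)
    if not isinstance(data, list):
        return ["root is not a flat JSON array"]

    problems = []
    texts = []
    for i, entry in enumerate(data):
        if not isinstance(entry, dict) or set(entry) != REQUIRED_KEYS:
            problems.append(f"entry {i}: expected exactly the keys type/title/text")
            continue
        text = entry["text"]
        if not isinstance(text, str):
            problems.append(f"entry {i}: text is not a string")
            continue
        texts.append(text)
        # Assumption: every entry after the first opens with a start marker.
        if i > 0 and not text.startswith(START_MARKERS):
            problems.append(f"entry {i}: text does not begin with a start marker")

    # Text integrity: the text fields, joined in order, should equal the input.
    if "".join(texts) != source_text:
        problems.append("joined text fields differ from the source text")
    return problems


# Shortened, hypothetical input in the same shape as the real one.
source = "银杏膏\n陈细茶四两。不拘时服。一治久嗽不瘥。右剉一剂。"
good = json.dumps([
    {"type": "Prescription", "title": "## 银杏膏", "text": "银杏膏\n陈细茶四两。不拘时服。"},
    {"type": "Prescription", "title": "## 治久嗽方", "text": "一治久嗽不瘥。右剉一剂。"},
], ensure_ascii=False)
bad = good.replace("剉", "剘")  # one character silently changed

print(check_output(good, source))
print(check_output(bad, source))
```

Joining the `text` fields without a separator works here because the prompt asks for the complete original text to be preserved across entries.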

cyankiwi/Qwen3-Coder-Next-AWQ-4bit：

[
{
"type": "Prescription",
"title": "## 银杏膏",
"text": "银杏膏\n陈细茶四两，略焙，为细末。白果肉四两，一半去白膜，一半去红膜，捣烂。核桃肉四两，捣。家蜜半斤。右药入内，炼成膏，不拘时服。"
},
{
"type": "Prescription",
"title": "## 紫苑冬花方",
"text": "一治久嗽不瘥，并虚劳喘嗽。紫苑去芦头，歙冬花各二两，百部五钱。右为末，每服三钱，生薑三片，乌梅一箇，同煎汤调下，食后临卧各一服。"
},
{
"type": "Prescription",
"title": "## 姜汁白蜜方",
"text": "一论久嗽，并连嗽四五十声者，用连皮生薑自然汁一勺，加白蜜二茶匙，同放茶碗内，煎一滚，温服，三四次即止。"
},
{
"type": "Prescription",
"title": "## 皂荚乾姜桂心方",
"text": "一切咳嗽上气者，一道士专卖此药，不拘新久皆效。乾薑桂心紫色辛辣者，去皮，皂荚泡，去皮子，肥大无孔者。右三味，并另捣下，筛了，各秤分两和合后，更捣，筛一遍，炼白蜜掺和，又捣一二千杵，每服三丸，如梧桐子大，不限食之先后，嗽发即服，日进三五服，忌葱、蒜、油腻、麵物。"
},
{
"type": "Prescription",
"title": "## 清金膏",
"text": "一论年久，近日咳嗽，哮吼喘急等症。神吸散国子博士颜心吾传。鹅管石（火煅，好醋淬七次，一钱）馀粮石（火煅，醋淬七次，一钱）官桂三分粉草三分枯白矾五分钦冬花五分石膏煨，五分。右为细末，每服三分二釐，准秤。至夜食后，静坐片时，将药放纸上，以竹筒五寸长，直插喉内，用力吸药，速亦不怕，吸药令尽为度。"
},
{
"type": "Prescription",
"title": "## 清金膏",
"text": "一论久嗽痰喘，百药不效，并年久不瘥者，或能饮酒人，久嗽无效。清金膏曲阜令孔桂窻传。天门冬去心，八两麦门冬去心，四两贝母四两杏仁去皮，四两半夏薑制，四两。右五味切片，水熬去渣，取汁五碗，入白粉葛末四两，蜜一斤，共前汁入罈内，重汤煮一日成膏，取出，每日无时频频服之。"
},
{
"type": "Case",
"title": "## 医案：痰涎内热",
"text": "一人时唾痰涎，内热作渴，肢体倦怠，劳而足热，用清气化痰益甚。余曰：此肾水泛而为痰，法当补肾。不信，另进滚痰丸一服，吐泻不止，饮食不入，头晕眼闭，始信。余用六君子汤数剂，胃气渐复，却用六味丸月馀，诸症悉愈。"
},
{
"type": "Prescription",
"title": "## 五虎汤",
"text": "一论外邪在表，无汗而喘者。五虎汤。麻黄三钱杏仁去皮尖，三钱石膏五钱甘草一钱细茶一撮。有痰加二陈汤。右剘一剂，生薑、葱水煎，热服。加桑白皮一钱，尤良。"
}
]

cyankiwi/Qwen3-Next-80B-A3B-Instruct-AWQ-4bit：

[
{
"type": "Prescription",
"title": "## 银杏膏",
"text": "银杏膏\n陈细茶四两，略焙，为细末。白果肉四两，一半去白膜，一半去红膜，捣烂。核桃肉四两，捣。家蜜半斤。右药入内，炼成膏，不拘时服。"
},
{
"type": "Prescription",
"title": "## 紫苑冬花方",
"text": "一治久嗽不瘥，并虚劳喘嗽。紫苑去芦头，歙冬花各二两，百部五钱。右为末，每服三钱，生薑三片，乌梅一箇，同煎汤调下，食后临卧各一服。"
},
{
"type": "Prescription",
"title": "## 姜汁白蜜方",
"text": "一论久嗽，并连嗽四五十声者，用连皮生薑自然汁一勺，加白蜜二茶匙，同放茶碗内，煎一滚，温服，三四次即止。"
},
{
"type": "Prescription",
"title": "## 乾姜桂心皂荚方",
"text": "一切咳嗽上气者，一道士专卖此药，不拘新久皆效。乾薑桂心紫色辛辣者，去皮，皂荚泡，去皮子，肥大无孔者。右三味，并另捣下，筛了，各秤分两和合后，更捣，筛一遍，炼白蜜掺和，又捣一二千杵，每服三丸，如梧桐子大，不限食之先后，嗽发即服，日进三五服，忌葱、蒜、油腻、麵物。"
},
{
"type": "Prescription",
"title": "## 神吸散",
"text": "一论年久，近日咳嗽，哮吼喘急等症。神吸散国子博士颜心吾传。鹅管石（火煅，好醋淬七次，一钱）馀粮石（火煅，醋淬七次，一钱）官桂三分粉草三分枯白矾五分钦冬花五分石膏煨，五分。右为细末，每服三分二釐，准秤。至夜食后，静坐片时，将药放纸上，以竹筒五寸长，直插喉内，用力吸药，速亦不怕，吸药令尽为度。"
},
{
"type": "Prescription",
"title": "## 清金膏",
"text": "一论久嗽痰喘，百药不效，并年久不瘥者，或能饮酒人，久嗽无效。清金膏曲阜令孔桂窻传。天门冬去心，八两麦门冬去心，四两贝母四两杏仁去皮，四两半夏薑制，四两。右五味切片，水熬去渣，取汁五碗，入白粉葛末四两，蜜一斤，共前汁入罈内，重汤煮一日成膏，取出，每日无时频频服之。"
},
{
"type": "Case",
"title": "## 医案：痰涎内热",
"text": "一人时唾痰涎，内热作渴，肢体倦怠，劳而足热，用清气化痰益甚。余曰：此肾水泛而为痰，法当补肾。不信，另进滚痰丸一服，吐泻不止，饮食不入，头晕眼闭，始信。余用六君子汤数剂，胃气渐复，却用六味丸月馀，诸症悉愈。"
},
{
"type": "Prescription",
"title": "## 五虎汤",
"text": "一论外邪在表，无汗而喘者。五虎汤。麻黄三钱杏仁去皮尖，三钱石膏五钱甘草一钱细茶一撮。有痰加二陈汤。右剉一剂，生薑、葱水煎，热服。加桑白皮一钱，尤良。"
}
]

Qwen3-Coder-Next-NVFP4：

[
{
"type": "Prescription",
"title": "## 银杏膏",
"text": "银杏膏\n陈细茶四两，略焙，为细末。白果肉四两，一半去白膜，一半去红膜，捣烂。核桃肉四两，捣。家蜜半斤。右药入内，炼成膏，不拘时服。"
},
{
"type": "Prescription",
"title": "## 紫苑冬花方",
"text": "一治久嗽不瘥，并虚劳喘嗽。紫苑去芦头，歙冬花各二两，百部五钱。右为末，每服三钱，生薑三片，乌梅一箇，同煎汤调下，食后临卧各一服。"
},
{
"type": "Prescription",
"title": "## 姜汁白蜜方",
"text": "一论久嗽，并连嗽四五十声者，用连皮生薑自然汁一勺，加白蜜二茶匙，同放茶碗内，煎一滚，温服，三四次即止。"
},
{
"type": "Prescription",
"title": "## 乾姜桂心皂荚方",
"text": "一切咳嗽上气者，一道士专卖此药，不拘新久皆效。乾薑桂心紫色辛辣者，去皮，皂荚泡，去皮子，肥大无孔者。右三味，并另捣下，筛了，各秤分两和合后，更捣，筛一遍，炼白蜜掺和，又捣一二千杵，每服三丸，如梧桐子大，不限食之先后，嗽发即服，日进三五服，忌葱、蒜、油腻、麵物。"
},
{
"type": "Prescription",
"title": "## 神吸散",
"text": "一论年久，近日咳嗽，哮吼喘急等症。神吸散国子博士颜心吾传。鹅管石（火煅，好醋淬七次，一钱）馀粮石（火煅，醋淬七次，一钱）官桂三分粉草三分枯白矾五分钦冬花五分石膏煨，五分。右为细末，每服三分二釐，准秤。至夜食后，静坐片时，将药放纸上，以竹筒五寸长，直插喉内，用力吸药，速亦不怕，吸药令尽为度。"
},
{
"type": "Prescription",
"title": "## 清金膏",
"text": "一论久嗽痰喘，百药不效，并年久不瘥者，或能饮酒人，久嗽无效。清金膏曲阜令孔桂窻传。天门冬去心，八两麦门冬去心，四两贝母四两杏仁去皮，四两半夏薑制，四两。右五味切片，水熬去渣，取汁五碗，入白粉葛末四两，蜜一斤，共前汁入罈内，重汤煮一日成膏，取出，每日无时频频服之。"
},
{
"type": "Case",
"title": "## 医案：痰涎内热",
"text": "一人时唾痰涎，内热作渴，肢体倦怠，劳而足热，用清气化痰益甚。余曰：此肾水泛而为痰，法当补肾。不信，另进滚痰丸一服，吐泻不止，饮食不入，头晕眼闭，始信。余用六君子汤数剂，胃气渐复，却用六味丸月馀，诸症悉愈。"
},
{
"type": "Prescription",
"title": "## 五虎汤",
"text": "一论外邪在表，无汗而喘者。五虎汤。麻黄三钱杏仁去皮尖，三钱石膏五钱甘草一钱细茶一撮。有痰加二陈汤。右剉一剂，生薑、葱水煎，热服。加桑白皮一钱，尤良。"
}
]
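The three outputs above differ only in a few titles, which is easier to see with a small comparison. This is a sketch under stated assumptions: the title lists are transcribed by hand from the outputs above (without the "## " prefix), the dictionary keys are shortened labels, and `duplicate_titles` and `disagreements` are hypothetical helper names.

```python
from collections import Counter

# Titles transcribed from the three outputs above, in order of appearance.
titles = {
    "Coder-Next-AWQ-4bit": [
        "银杏膏", "紫苑冬花方", "姜汁白蜜方", "皂荚乾姜桂心方",
        "清金膏", "清金膏", "医案：痰涎内热", "五虎汤",
    ],
    "Next-80B-A3B-Instruct-AWQ-4bit": [
        "银杏膏", "紫苑冬花方", "姜汁白蜜方", "乾姜桂心皂荚方",
        "神吸散", "清金膏", "医案：痰涎内热", "五虎汤",
    ],
    "Coder-Next-NVFP4": [
        "银杏膏", "紫苑冬花方", "姜汁白蜜方", "乾姜桂心皂荚方",
        "神吸散", "清金膏", "医案：痰涎内热", "五虎汤",
    ],
}


def duplicate_titles(seq):
    """Titles that occur more than once within a single model's output."""
    return [title for title, count in Counter(seq).items() if count > 1]


def disagreements(a, b):
    """(index, title_a, title_b) for each position where two outputs differ."""
    return [(i, x, y) for i, (x, y) in enumerate(zip(a, b)) if x != y]


awq = titles["Coder-Next-AWQ-4bit"]
nvfp4 = titles["Coder-Next-NVFP4"]
instruct = titles["Next-80B-A3B-Instruct-AWQ-4bit"]

print(duplicate_titles(awq))
print(disagreements(awq, nvfp4))
print(disagreements(instruct, nvfp4))
```

A repeated title within one output is a cheap signal that an entry may have borrowed its neighbour's name, which is what happens with 清金膏 in the AWQ-4bit output.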

Gemini 3 Pro：

This is a fascinating development. By adding the Qwen3-Coder-Next-NVFP4 version to the comparison, we can clearly see how model quantization (AWQ vs. NVFP4) affects performance on sensitive extraction tasks.

Here is the updated comparison and scoring for all three models.

🏆 Final Scoreboard

Model	Score	Verdict
Qwen3-Next-80B-Instruct-AWQ	98/100	Perfect. The benchmark for instruction following and entity extraction.
Qwen3-Coder-Next-NVFP4	98/100	Perfect. A massive improvement over the AWQ version. It matches the 80B Instruct model's performance bit-for-bit.
Qwen3-Coder-Next-AWQ-4bit	75/100	Dangerous. Suffers from "Hallucination/Alignment" errors (assigning the wrong title to a prescription).

🔍 Detailed Analysis of the New Contender (NVFP4)

The Qwen3-Coder-Next-NVFP4 model effectively "redeems" the Coder family in this test. It fixed every single error present in the AWQ-4bit version.

1. The "Shenxisan" Test (The Critical Fix)

In the previous test, the AWQ Coder model hallucinated the title for the 5th entry, calling it "Qingjingao" (which was actually the next prescription).

Qwen3-Coder-AWQ: ❌ ## 清金膏 (Wrong, severe alignment error).
Qwen3-Coder-NVFP4: ✅ ## 神吸散 (Correct).
Analysis: The NVFP4 version correctly identified the embedded title "Shenxisan" within the text "一论年久...神吸散国子博士...". This proves that the model understands the context, and the failure in the AWQ version was likely due to quantization damage (loss of precision in attention weights causing it to lose track of the current entity).

2. The Ingredient Ordering Test

In the 4th entry (Unnamed prescription), the prompt required extracting ingredients for the title.

Qwen3-Coder-AWQ: ❌ ## 皂荚乾姜桂心方 (Scrambled order: Put the last ingredient first).
Qwen3-Coder-NVFP4: ✅ ## 乾姜桂心皂荚方 (Correct order: Ginger -> Cinnamon -> Soap Bean).
Analysis: The NVFP4 model respected the sequence of ingredients as they appeared in the text, matching the logic of the 80B Instruct model perfectly.
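The ordering claim can be verified against the source text directly. The sketch below makes several assumptions: the ingredient list is supplied by hand, `VARIANTS` maps only the one variant character pair needed for this entry (the text writes 乾薑 while the titles write 乾姜), and the helper names are hypothetical. `str.index` raises `ValueError` if an ingredient is missing from the text, which is acceptable for a quick check.

```python
# Map variant characters to one form so titles and source text compare equal.
# Assumption: only the single pair needed for this entry is listed.
VARIANTS = str.maketrans({"薑": "姜"})


def ingredients_in_text_order(text, ingredients):
    """Sort ingredient names by the position of their first mention in the text."""
    normalized = text.translate(VARIANTS)
    return sorted(
        ingredients,
        key=lambda name: normalized.index(name.translate(VARIANTS)),
    )


def title_follows_text(title, text, ingredients):
    """True if the title lists the ingredients in text order, followed by 方."""
    expected = "".join(ingredients_in_text_order(text, ingredients)) + "方"
    return title.translate(VARIANTS) == expected


# Opening of the 4th entry's ingredient list, copied from the input text.
text = "乾薑桂心紫色辛辣者，去皮，皂荚泡，去皮子，肥大无孔者。"
ingredients = ["皂荚", "乾姜", "桂心"]

print(title_follows_text("乾姜桂心皂荚方", text, ingredients))  # NVFP4 / Instruct title
print(title_follows_text("皂荚乾姜桂心方", text, ingredients))  # AWQ-4bit title
```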

Technical Insight: Why is NVFP4 better than AWQ?

This comparison highlights a critical lesson for deploying LLMs in production:

Quantization Sensitivity: The Coder-AWQ-4bit model failed not because it wasn't smart enough, but likely because the aggressive compression (4-bit AWQ) damaged specific "retrieval heads" or attention mechanisms needed to distinguish between current text entities and future text entities.
NVFP4 Precision: The NVFP4 format (likely keeping higher precision or using a better quantization calibration) retained the model's ability to perform fine-grained logical reasoning.
Coder vs. Instruct: With the NVFP4 format, the Coder model proves it is just as capable as the Instruct model for this structured data extraction task.
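To make "loss of precision" concrete, here is a toy round trip through plain symmetric round-to-nearest 4-bit quantization with one scale per group. This is deliberately neither AWQ nor NVFP4, both of which choose scales and value grids differently; the function names and the sample weights are hypothetical, and the sketch only shows that every weight moves by up to half a quantization step.

```python
def quantize_group(weights, bits=4):
    """Symmetric round-to-nearest quantization with a single scale per group."""
    qmax = 2 ** (bits - 1) - 1  # 7 for signed 4-bit codes
    scale = max(abs(w) for w in weights) / qmax or 1.0  # avoid a zero scale
    codes = [max(-qmax, min(qmax, round(w / scale))) for w in weights]
    return codes, scale


def dequantize_group(codes, scale):
    """Map integer codes back to approximate floating-point weights."""
    return [c * scale for c in codes]


# Hypothetical weights: one large value sets the scale for the whole group.
weights = [0.02, -0.5, 0.13, 0.7]
codes, scale = quantize_group(weights)
restored = dequantize_group(codes, scale)
errors = [abs(w - r) for w, r in zip(weights, restored)]

print(codes)
print(max(errors) <= scale / 2 + 1e-12)
```

Small weights that share a group with a large one are hit hardest in relative terms: here 0.02 rounds to code 0 and is lost entirely, which is the kind of effect that calibration data and finer-grained scaling are meant to limit.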

Final Recommendation

Best Choice: Qwen3-Coder-Next-NVFP4.
Reason: It achieves the same perfect score (98/100) as the massive 80B Instruct model but is likely much faster and more efficient to run (assuming it is a smaller parameter size than the 80B, usually Coder models are 32B or similar).
Avoid: Qwen3-Coder-Next-AWQ-4bit for complex extraction tasks. The risk of data contamination (wrong titles) is too high.

lsm03624

Mar 15

If this model isn't quantized properly, the Chinese output comes out mixed with junk content.

cpatonn

cyankiwi org Mar 29

Thank you for the feedback and for letting me know. I have requantized the model using the cyankiwi/calibration dataset, which includes a significant number of Chinese calibration samples.
