Update README.md

b1e42e2 verified 7 months ago

3.93 kB

	---
	library_name: transformers
	tags: [clarifai,FP8]
	---

	# Qwen3-Coder-30B-A3B-Instruct

	[Basemodel](https://huggingface.co/Qwen/Qwen3-30B-A3B-Instruct-2507)

	![image](https://github.com/user-attachments/assets/b22c9807-f5e7-49eb-b00d-598e400781af)

	Visit model playground at Clarifai https://clarifai.com/qwen/qwenCoder/models/Qwen3-Coder-30B-A3B-Instruct

	## Highlights

	Qwen3-Coder is available in multiple sizes. Today, the Qwen team is excited to introduce Qwen3-Coder-30B-A3B-Instruct. This streamlined model maintains impressive performance and efficiency, featuring the following key enhancements:

	- Significant Performance among open models on Agentic Coding, Agentic Browser-Use, and other foundational coding tasks.
	- Long-context Capabilities with native support for 256K tokens, extendable up to 1M tokens using Yarn, optimized for repository-scale understanding.
	- Agentic Coding supporting most platforms such as Qwen Code, CLINE, featuring a specially designed function call format.

	![image/jpeg](https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen3-Coder/qwen3-coder-30a3-main.jpg)

	## Model Overview

	Qwen3-Coder-30B-A3B-Instruct has the following features:
	- Type: Causal Language Models
	- Training Stage: Pretraining & Post-training
	- Number of Parameters: 30.5B in total and 3.3B activated
	- Number of Layers: 48
	- Number of Attention Heads (GQA): 32 for Q and 4 for KV
	- Number of Experts: 128
	- Number of Activated Experts: 8

	NOTE: This model supports only non-thinking mode and does not generate ``<think></think>`` blocks in its output. Meanwhile, specifying `enable_thinking=False` is no longer required.

	For more details, including benchmark evaluation, hardware requirements, and inference performance, please refer to the Qwen team's [blog](https://qwenlm.github.io/blog/qwen3-coder/), [GitHub](https://github.com/QwenLM/Qwen3-Coder), and [Documentation](https://qwen.readthedocs.io/en/latest/).

	## Usage

	### Using Clarifai's Python SDK

	```python
	# Please run `pip install -U clarifai` before running this script

	from clarifai.client import Model

	model = Model(url="https://clarifai.com/qwen/qwenCoder/models/Qwen3-Coder-30B-A3B-Instruct", pat="your Clarifai PAT")
	prompt = "What's the future of AI?"

	# Clarifai style prediction method
	## Stream
	generated_text = model.generate(prompt=prompt)
	for each in generated_text:
	print(each, end='', flush=True)
	## Non stream
	generated_text = model.predict(prompt=prompt)
	print(generated_text)
	```

	### Using OpenAI API

	```python
	from openai import OpenAI

	model_id="qwen/qwenCoder/models/Qwen3-Coder-30B-A3B-Instruct"

	client = OpenAI(
	base_url="https://api.clarifai.com/v2/ext/openai/v1",
	api_key="Your Clarifai PAT",
	)
	response = client.chat.completions.create(
	model=model_id,
	messages=[
	{"role": "system", "content": "Talk like a Cat."},
	{
	"role": "user",
	"content": "How do I check if a Python object is an instance of a class (streaming)?",
	},
	],
	temperature=0.7,
	stream=True,
	)

	for each in response:
	if each.choices:
	text = each.choices[0].delta.content
	print(text, flush=False, end="")
	```

	## Best Practices
	To achieve optimal performance, the Qwen team recommends the following settings:

	Sampling Parameters:

	* The Qwen team suggests using temperature=0.7, top_p=0.8, top_k=20, repetition_penalty=1.05.

	* Adequate Output Length: The Qwen team recommends using an output length of 65,536 tokens for most queries, which is adequate for instruct models.


	### Citation

	If you find our work helpful, feel free to give us a cite.

	```
	@misc{qwen3technicalreport,
	title={Qwen3 Technical Report},
	author={Qwen Team},
	year={2025},
	eprint={2505.09388},
	archivePrefix={arXiv},
	primaryClass={cs.CL},
	url={https://arxiv.org/abs/2505.09388},
	}
	```