phatv9's picture
Update README.md
b1e42e2 verified
---
library_name: transformers
tags: [clarifai,FP8]
---
# Qwen3-Coder-30B-A3B-Instruct
[Basemodel](https://huggingface.co/Qwen/Qwen3-30B-A3B-Instruct-2507)
![image](https://github.com/user-attachments/assets/b22c9807-f5e7-49eb-b00d-598e400781af)
Visit model playground at Clarifai https://clarifai.com/qwen/qwenCoder/models/Qwen3-Coder-30B-A3B-Instruct
## Highlights
**Qwen3-Coder** is available in multiple sizes. Today, the **Qwen team** is excited to introduce **Qwen3-Coder-30B-A3B-Instruct**. This streamlined model maintains impressive performance and efficiency, featuring the following key enhancements:
- **Significant Performance** among open models on **Agentic Coding**, **Agentic Browser-Use**, and other foundational coding tasks.
- **Long-context Capabilities** with native support for **256K** tokens, extendable up to **1M** tokens using Yarn, optimized for repository-scale understanding.
- **Agentic Coding** supporting most platforms such as **Qwen Code**, **CLINE**, featuring a specially designed function call format.
![image/jpeg](https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen3-Coder/qwen3-coder-30a3-main.jpg)
## Model Overview
**Qwen3-Coder-30B-A3B-Instruct** has the following features:
- Type: Causal Language Models
- Training Stage: Pretraining & Post-training
- Number of Parameters: 30.5B in total and 3.3B activated
- Number of Layers: 48
- Number of Attention Heads (GQA): 32 for Q and 4 for KV
- Number of Experts: 128
- Number of Activated Experts: 8
**NOTE: This model supports only non-thinking mode and does not generate ``<think></think>`` blocks in its output. Meanwhile, specifying `enable_thinking=False` is no longer required.**
For more details, including benchmark evaluation, hardware requirements, and inference performance, please refer to the **Qwen team's** [blog](https://qwenlm.github.io/blog/qwen3-coder/), [GitHub](https://github.com/QwenLM/Qwen3-Coder), and [Documentation](https://qwen.readthedocs.io/en/latest/).
## Usage
### Using Clarifai's Python SDK
```python
# Please run `pip install -U clarifai` before running this script
from clarifai.client import Model
model = Model(url="https://clarifai.com/qwen/qwenCoder/models/Qwen3-Coder-30B-A3B-Instruct", pat="your Clarifai PAT")
prompt = "What's the future of AI?"
# Clarifai style prediction method
## Stream
generated_text = model.generate(prompt=prompt)
for each in generated_text:
print(each, end='', flush=True)
## Non stream
generated_text = model.predict(prompt=prompt)
print(generated_text)
```
### Using OpenAI API
```python
from openai import OpenAI
model_id="qwen/qwenCoder/models/Qwen3-Coder-30B-A3B-Instruct"
client = OpenAI(
base_url="https://api.clarifai.com/v2/ext/openai/v1",
api_key="Your Clarifai PAT",
)
response = client.chat.completions.create(
model=model_id,
messages=[
{"role": "system", "content": "Talk like a Cat."},
{
"role": "user",
"content": "How do I check if a Python object is an instance of a class (streaming)?",
},
],
temperature=0.7,
stream=True,
)
for each in response:
if each.choices:
text = each.choices[0].delta.content
print(text, flush=False, end="")
```
## Best Practices
To achieve optimal performance, the Qwen team recommends the following settings:
Sampling Parameters:
* The Qwen team suggests using temperature=0.7, top_p=0.8, top_k=20, repetition_penalty=1.05.
* Adequate Output Length: The Qwen team recommends using an output length of 65,536 tokens for most queries, which is adequate for instruct models.
### Citation
If you find our work helpful, feel free to give us a cite.
```
@misc{qwen3technicalreport,
title={Qwen3 Technical Report},
author={Qwen Team},
year={2025},
eprint={2505.09388},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2505.09388},
}
```