---
library_name: transformers
tags: [clarifai, FP8]
---

# Qwen3-Coder-30B-A3B-Instruct

[Base model](https://huggingface.co/Qwen/Qwen3-30B-A3B-Instruct-2507)

![image](https://github.com/user-attachments/assets/b22c9807-f5e7-49eb-b00d-598e400781af)

Visit the model playground at [Clarifai](https://clarifai.com/qwen/qwenCoder/models/Qwen3-Coder-30B-A3B-Instruct).

## Highlights

**Qwen3-Coder** is available in multiple sizes. Today, the **Qwen team** is excited to introduce **Qwen3-Coder-30B-A3B-Instruct**. This streamlined model maintains impressive performance and efficiency, featuring the following key enhancements:

- **Significant performance** among open models on **agentic coding**, **agentic browser use**, and other foundational coding tasks.
- **Long-context capabilities** with native support for **256K** tokens, extendable up to **1M** tokens using YaRN, optimized for repository-scale understanding.
- **Agentic coding** support for most platforms, such as **Qwen Code** and **CLINE**, featuring a specially designed function call format.

![image/jpeg](https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen3-Coder/qwen3-coder-30a3-main.jpg)

## Model Overview

**Qwen3-Coder-30B-A3B-Instruct** has the following features:

- Type: Causal language model
- Training Stage: Pretraining & post-training
- Number of Parameters: 30.5B in total, 3.3B activated
- Number of Layers: 48
- Number of Attention Heads (GQA): 32 for Q and 4 for KV
- Number of Experts: 128
- Number of Activated Experts: 8

**NOTE: This model supports only non-thinking mode and does not generate `<think></think>` blocks in its output. Specifying `enable_thinking=False` is no longer required.**

For more details, including benchmark evaluation, hardware requirements, and inference performance, please refer to the **Qwen team's** [blog](https://qwenlm.github.io/blog/qwen3-coder/), [GitHub](https://github.com/QwenLM/Qwen3-Coder), and [documentation](https://qwen.readthedocs.io/en/latest/).
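The specially designed function call format noted in the highlights can be exercised through any OpenAI-compatible client using a standard `tools` schema. The sketch below is a hedged illustration, not part of the model card: the `read_file` tool and its parameters are hypothetical, while the endpoint URL and model id come from the usage examples on this page.

```python
# Hedged sketch: an OpenAI-style tool schema for agentic function calling.
# The tool name `read_file` and its parameters are hypothetical examples.

READ_FILE_TOOL = {
    "type": "function",
    "function": {
        "name": "read_file",  # hypothetical tool, for illustration only
        "description": "Read a UTF-8 text file and return its contents.",
        "parameters": {
            "type": "object",
            "properties": {
                "path": {"type": "string", "description": "File path to read"},
            },
            "required": ["path"],
        },
    },
}


def build_tool_request(model_id: str, user_prompt: str) -> dict:
    """Assemble chat.completions kwargs with the tool attached."""
    return {
        "model": model_id,
        "messages": [{"role": "user", "content": user_prompt}],
        "tools": [READ_FILE_TOOL],
    }


# Usage against Clarifai's OpenAI-compatible endpoint (requires a PAT):
# from openai import OpenAI
# client = OpenAI(base_url="https://api.clarifai.com/v2/ext/openai/v1",
#                 api_key="YOUR_CLARIFAI_PAT")
# resp = client.chat.completions.create(
#     **build_tool_request("qwen/qwenCoder/models/Qwen3-Coder-30B-A3B-Instruct",
#                          "Show me the first line of README.md"))
# The model may then answer with resp.choices[0].message.tool_calls rather than text.
```

When the model decides to call the tool, the client is expected to execute it and send the result back in a follow-up `tool` role message, per the standard OpenAI tool-calling loop.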
## Usage

### Using Clarifai's Python SDK

```python
# Install the SDK first: pip install -U clarifai
from clarifai.client import Model

model = Model(
    url="https://clarifai.com/qwen/qwenCoder/models/Qwen3-Coder-30B-A3B-Instruct",
    pat="YOUR_CLARIFAI_PAT",
)

prompt = "What's the future of AI?"

# Streaming prediction
for chunk in model.generate(prompt=prompt):
    print(chunk, end="", flush=True)

# Non-streaming prediction
print(model.predict(prompt=prompt))
```

### Using the OpenAI API

```python
from openai import OpenAI

model_id = "qwen/qwenCoder/models/Qwen3-Coder-30B-A3B-Instruct"

client = OpenAI(
    base_url="https://api.clarifai.com/v2/ext/openai/v1",
    api_key="YOUR_CLARIFAI_PAT",
)

response = client.chat.completions.create(
    model=model_id,
    messages=[
        {"role": "system", "content": "Talk like a cat."},
        {
            "role": "user",
            "content": "How do I check if a Python object is an instance of a class (streaming)?",
        },
    ],
    temperature=0.7,
    stream=True,
)

# Stream the response; the final chunk's delta content may be None
for chunk in response:
    if chunk.choices and chunk.choices[0].delta.content is not None:
        print(chunk.choices[0].delta.content, end="", flush=True)
```

## Best Practices

To achieve optimal performance, the Qwen team recommends the following settings:

- Sampling parameters: `temperature=0.7`, `top_p=0.8`, `top_k=20`, `repetition_penalty=1.05`.
- Adequate output length: an output length of 65,536 tokens is adequate for most queries with instruct models.

### Citation

If you find this work helpful, please consider citing it:

```bibtex
@misc{qwen3technicalreport,
      title={Qwen3 Technical Report},
      author={Qwen Team},
      year={2025},
      eprint={2505.09388},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2505.09388},
}
```
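The recommended sampling settings from Best Practices can be collected into request kwargs for the OpenAI-compatible endpoint. A minimal sketch follows; note that `temperature`, `top_p`, and `max_tokens` are standard OpenAI parameters, while `top_k` and `repetition_penalty` are not, so they are passed via `extra_body` here — whether the Clarifai endpoint honors them is an assumption.

```python
# Hedged sketch: the Qwen team's recommended sampling settings, arranged
# for an OpenAI-compatible chat.completions request. Passing top_k and
# repetition_penalty via extra_body is an assumption about the backend.

RECOMMENDED = {
    "temperature": 0.7,
    "top_p": 0.8,
    "max_tokens": 65536,  # adequate output length for most queries
    "extra_body": {
        "top_k": 20,
        "repetition_penalty": 1.05,
    },
}


def completion_kwargs(model_id: str, prompt: str) -> dict:
    """Merge the recommended sampling settings into chat.completions kwargs."""
    return {
        "model": model_id,
        "messages": [{"role": "user", "content": prompt}],
        **RECOMMENDED,
    }


# Usage (requires a Clarifai PAT):
# from openai import OpenAI
# client = OpenAI(base_url="https://api.clarifai.com/v2/ext/openai/v1",
#                 api_key="YOUR_CLARIFAI_PAT")
# resp = client.chat.completions.create(**completion_kwargs(
#     "qwen/qwenCoder/models/Qwen3-Coder-30B-A3B-Instruct",
#     "Write quicksort in Python."))
```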