---
library_name: transformers
tags: [clarifai, FP8]
---

# Qwen3-Coder-30B-A3B-Instruct

[Base model](https://huggingface.co/Qwen/Qwen3-30B-A3B-Instruct-2507)

![image](https://github.com/user-attachments/assets/b22c9807-f5e7-49eb-b00d-598e400781af)

Visit the model playground at [Clarifai](https://clarifai.com/qwen/qwenCoder/models/Qwen3-Coder-30B-A3B-Instruct).

## Highlights

**Qwen3-Coder** is available in multiple sizes. Today, the **Qwen team** is excited to introduce **Qwen3-Coder-30B-A3B-Instruct**. This streamlined model maintains impressive performance and efficiency, featuring the following key enhancements:

- **Significant performance** among open models on **agentic coding**, **agentic browser use**, and other foundational coding tasks.
- **Long-context capabilities** with native support for **256K** tokens, extendable up to **1M** tokens using YaRN, optimized for repository-scale understanding.
- **Agentic coding** support for most platforms, such as **Qwen Code** and **CLINE**, featuring a specially designed function call format.

![image/jpeg](https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen3-Coder/qwen3-coder-30a3-main.jpg)

## Model Overview

**Qwen3-Coder-30B-A3B-Instruct** has the following features:

- Type: Causal language model
- Training Stage: Pretraining & post-training
- Number of Parameters: 30.5B in total, 3.3B activated
- Number of Layers: 48
- Number of Attention Heads (GQA): 32 for Q and 4 for KV
- Number of Experts: 128
- Number of Activated Experts: 8

**NOTE: This model supports only non-thinking mode and does not generate `<think></think>` blocks in its output. Specifying `enable_thinking=False` is no longer required.**

For more details, including benchmark evaluation, hardware requirements, and inference performance, please refer to the **Qwen team's** [blog](https://qwenlm.github.io/blog/qwen3-coder/), [GitHub](https://github.com/QwenLM/Qwen3-Coder), and [documentation](https://qwen.readthedocs.io/en/latest/).
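The specially designed function call format noted in the highlights can be exercised through any OpenAI-compatible client using a standard `tools` schema. The sketch below is a hedged illustration, not part of the model card: the `read_file` tool and its parameters are hypothetical, while the endpoint URL and model id come from the usage examples on this page.

```python
# Hedged sketch: an OpenAI-style tool schema for agentic function calling.
# The tool name `read_file` and its parameters are hypothetical examples.

READ_FILE_TOOL = {
    "type": "function",
    "function": {
        "name": "read_file",  # hypothetical tool, for illustration only
        "description": "Read a UTF-8 text file and return its contents.",
        "parameters": {
            "type": "object",
            "properties": {
                "path": {"type": "string", "description": "File path to read"},
            },
            "required": ["path"],
        },
    },
}


def build_tool_request(model_id: str, user_prompt: str) -> dict:
    """Assemble chat.completions kwargs with the tool attached."""
    return {
        "model": model_id,
        "messages": [{"role": "user", "content": user_prompt}],
        "tools": [READ_FILE_TOOL],
    }


# Usage against Clarifai's OpenAI-compatible endpoint (requires a PAT):
# from openai import OpenAI
# client = OpenAI(base_url="https://api.clarifai.com/v2/ext/openai/v1",
#                 api_key="YOUR_CLARIFAI_PAT")
# resp = client.chat.completions.create(
#     **build_tool_request("qwen/qwenCoder/models/Qwen3-Coder-30B-A3B-Instruct",
#                          "Show me the first line of README.md"))
# The model may then answer with resp.choices[0].message.tool_calls rather than text.
```

When the model decides to call the tool, the client is expected to execute it and send the result back in a follow-up `tool` role message, per the standard OpenAI tool-calling loop.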
## Usage

### Using Clarifai's Python SDK

```python
# Install the SDK first: pip install -U clarifai
from clarifai.client import Model

model = Model(
    url="https://clarifai.com/qwen/qwenCoder/models/Qwen3-Coder-30B-A3B-Instruct",
    pat="YOUR_CLARIFAI_PAT",
)

prompt = "What's the future of AI?"

# Streaming prediction
for chunk in model.generate(prompt=prompt):
    print(chunk, end="", flush=True)

# Non-streaming prediction
print(model.predict(prompt=prompt))
```

### Using the OpenAI API

```python
from openai import OpenAI

model_id = "qwen/qwenCoder/models/Qwen3-Coder-30B-A3B-Instruct"

client = OpenAI(
    base_url="https://api.clarifai.com/v2/ext/openai/v1",
    api_key="YOUR_CLARIFAI_PAT",
)

response = client.chat.completions.create(
    model=model_id,
    messages=[
        {"role": "system", "content": "Talk like a cat."},
        {
            "role": "user",
            "content": "How do I check if a Python object is an instance of a class (streaming)?",
        },
    ],
    temperature=0.7,
    stream=True,
)

# Stream the response; the final chunk's delta content may be None
for chunk in response:
    if chunk.choices and chunk.choices[0].delta.content is not None:
        print(chunk.choices[0].delta.content, end="", flush=True)
```

## Best Practices

To achieve optimal performance, the Qwen team recommends the following settings:

- Sampling parameters: `temperature=0.7`, `top_p=0.8`, `top_k=20`, `repetition_penalty=1.05`.
- Adequate output length: an output length of 65,536 tokens is adequate for most queries with instruct models.

### Citation

If you find this work helpful, please consider citing it:

```bibtex
@misc{qwen3technicalreport,
      title={Qwen3 Technical Report},
      author={Qwen Team},
      year={2025},
      eprint={2505.09388},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2505.09388},
}
```
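The recommended sampling settings from Best Practices can be collected into request kwargs for the OpenAI-compatible endpoint. A minimal sketch follows; note that `temperature`, `top_p`, and `max_tokens` are standard OpenAI parameters, while `top_k` and `repetition_penalty` are not, so they are passed via `extra_body` here — whether the Clarifai endpoint honors them is an assumption.

```python
# Hedged sketch: the Qwen team's recommended sampling settings, arranged
# for an OpenAI-compatible chat.completions request. Passing top_k and
# repetition_penalty via extra_body is an assumption about the backend.

RECOMMENDED = {
    "temperature": 0.7,
    "top_p": 0.8,
    "max_tokens": 65536,  # adequate output length for most queries
    "extra_body": {
        "top_k": 20,
        "repetition_penalty": 1.05,
    },
}


def completion_kwargs(model_id: str, prompt: str) -> dict:
    """Merge the recommended sampling settings into chat.completions kwargs."""
    return {
        "model": model_id,
        "messages": [{"role": "user", "content": prompt}],
        **RECOMMENDED,
    }


# Usage (requires a Clarifai PAT):
# from openai import OpenAI
# client = OpenAI(base_url="https://api.clarifai.com/v2/ext/openai/v1",
#                 api_key="YOUR_CLARIFAI_PAT")
# resp = client.chat.completions.create(**completion_kwargs(
#     "qwen/qwenCoder/models/Qwen3-Coder-30B-A3B-Instruct",
#     "Write quicksort in Python."))
```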