Ali-Yaser committed on
Commit bb6a013 · verified · 1 Parent(s): 30ed583

Update README.md

Files changed (1):
  1. README.md +79 -18

README.md CHANGED
@@ -11,8 +11,11 @@ language:
  - en
  ---
 
- # Llama3.3-CodeZ-1 🚀
 
  <div align="center">
 
  [![Model Size](https://img.shields.io/badge/Model%20Size-8B-red)](https://huggingface.co/Ali-Yaser/Qwen3-R1-8B)
@@ -23,30 +26,88 @@ language:
 
  ## Model Description
 
- **Llama3.3-CodeZ-1** is a specialized code-focused fine-tuned version of Llama 3.3 70B Instruct, optimized for programming and software development tasks. This model has been trained to excel at code generation, debugging, code explanation, and various programming-related tasks across multiple programming languages.
 
- Built on top of Meta's powerful Llama 3.3 70B base model, CodeZ-1 combines the strong reasoning capabilities of the foundation model with enhanced code understanding and generation abilities.
 
- ## 🎯 Key Features
-
- - **Multi-Language Support**: Proficient in Python, JavaScript, Java, C++, Go, Rust, and many more programming languages
- - **Code Generation**: Generate clean, efficient, and well-documented code from natural language descriptions
- - **Code Explanation**: Understand and explain complex code snippets
- - **Debugging Assistance**: Identify and fix bugs in code
- - **Code Optimization**: Suggest improvements and optimizations
- - **Documentation**: Generate comprehensive code documentation and comments
 
  ## 📊 Model Details
 
  - **Developed by:** Ali-Yaser
- - **Model type:** Causal Language Model (Fine-tuned)
- - **Base Model:** unsloth/llama-3.3-70b-instruct
- - **Model Size:** 70B parameters
- - **License:** Llama 3.3 Community License
- - **Language(s):** Primarily English
- - **Finetuned from:** Meta Llama 3.3 70B Instruct
 
  ## 🚀 Quick Start
 
  ### Installation
- ```bash
  - en
  ---
 
+ [<img src="https://i.imgur.com/vo0dm9p.jpeg" width="710"/>]()
+
+ # Qwen3-R1 8B 🚀
+
  <div align="center">
 
  [![Model Size](https://img.shields.io/badge/Model%20Size-8B-red)](https://huggingface.co/Ali-Yaser/Qwen3-R1-8B)
 
 
  ## Model Description
 
+ **Qwen3-R1 Series** is a specialized math- and reasoning-focused fine-tuned version of Qwen3-8B, optimized for math and hard reasoning tasks.
 
  ## 📊 Model Details
 
  - **Developed by:** Ali-Yaser
+ - **Model type:** GRPO thinker
+ - **Base Model:** Qwen/Qwen3-8B
+ - **Model Size:** 8B parameters
+ - **License:** Apache 2.0
+ - **Language(s):** English
+ - **Finetuned from:** Qwen3-8B
 
  ## 🚀 Quick Start
 
  ### Installation
+ The code for Qwen3 is included in the latest Hugging Face `transformers`, and we advise you to use the latest version of `transformers`.
+
+ With `transformers<4.51.0`, you will encounter the following error:
+ ```
+ KeyError: 'qwen3'
+ ```
+
+ The following code snippet illustrates how to use the model to generate content from given inputs.
+ ```python
+ from transformers import AutoModelForCausalLM, AutoTokenizer
+
+ model_name = "Qwen/Qwen3-8B"
+
+ # load the tokenizer and the model
+ tokenizer = AutoTokenizer.from_pretrained(model_name)
+ model = AutoModelForCausalLM.from_pretrained(
+     model_name,
+     torch_dtype="auto",
+     device_map="auto"
+ )
+
+ # prepare the model input
+ prompt = "Give me a short introduction to large language model."
+ messages = [
+     {"role": "user", "content": prompt}
+ ]
+ text = tokenizer.apply_chat_template(
+     messages,
+     tokenize=False,
+     add_generation_prompt=True,
+     enable_thinking=True  # Switches between thinking and non-thinking modes. Default is True.
+ )
+ model_inputs = tokenizer([text], return_tensors="pt").to(model.device)
+
+ # conduct text completion
+ generated_ids = model.generate(
+     **model_inputs,
+     max_new_tokens=32768
+ )
+ output_ids = generated_ids[0][len(model_inputs.input_ids[0]):].tolist()
+
+ # parsing thinking content
+ try:
+     # rindex finding 151668 (</think>)
+     index = len(output_ids) - output_ids[::-1].index(151668)
+ except ValueError:
+     index = 0
+
+ thinking_content = tokenizer.decode(output_ids[:index], skip_special_tokens=True).strip("\n")
+ content = tokenizer.decode(output_ids[index:], skip_special_tokens=True).strip("\n")
+
+ print("thinking content:", thinking_content)
+ print("content:", content)
+ ```
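If the `KeyError: 'qwen3'` mentioned above appears, a quick way to confirm whether the installed `transformers` is new enough is a small version check. This is an editorial sketch, not part of the committed README; the `supports_qwen3` helper is hypothetical, and the 4.51.0 threshold comes from the error note above.

```python
def supports_qwen3(transformers_version: str) -> bool:
    # Qwen3 model code requires transformers >= 4.51.0; older versions
    # raise KeyError: 'qwen3' when resolving the model type.
    major, minor = (int(x) for x in transformers_version.split(".")[:2])
    return (major, minor) >= (4, 51)

print(supports_qwen3("4.50.3"))  # False -> upgrade transformers
print(supports_qwen3("4.51.0"))  # True
```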
+
+ For deployment, you can use `sglang>=0.4.6.post1` or `vllm>=0.8.5` to create an OpenAI-compatible API endpoint:
+ - SGLang:
+ ```shell
+ python -m sglang.launch_server --model-path Qwen/Qwen3-8B --reasoning-parser qwen3
+ ```
+ - vLLM:
+ ```shell
+ vllm serve Qwen/Qwen3-8B --enable-reasoning --reasoning-parser deepseek_r1
+ ```
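Once a server is running via one of the commands above, it can be queried like any OpenAI-compatible endpoint. The sketch below (editorial, not part of the commit) only builds the request body; the localhost URL and port 8000 are assumptions based on both servers' defaults, so adjust them to your deployment.

```python
import json

# Build a Chat Completions request body for the served model.
# The URL is an assumption (vLLM and SGLang listen on port 8000 by
# default); change it to match your deployment.
url = "http://localhost:8000/v1/chat/completions"
payload = {
    "model": "Qwen/Qwen3-8B",
    "messages": [{"role": "user", "content": "Solve: 12 * 13 = ?"}],
    "max_tokens": 1024,
}
body = json.dumps(payload)
print(body)
# Send with e.g. urllib.request or the `openai` client pointed at `url`.
```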
+
+ For local use, applications such as Ollama, LMStudio, MLX-LM, llama.cpp, and KTransformers also support Qwen3.
+
+ ## Switching Between Thinking and Non-Thinking Mode