Ali-Yaser committed on
Commit bb6a013 · verified · 1 Parent(s): 30ed583

Update README.md

Files changed (1):
  1. README.md +79 -18

README.md CHANGED
@@ -11,8 +11,11 @@ language:
  - en
  ---
 
- # Llama3.3-CodeZ-1 🚀
 
  <div align="center">
 
  [![Model Size](https://img.shields.io/badge/Model%20Size-8B-red)](https://huggingface.co/Ali-Yaser/Qwen3-R1-8B)
@@ -23,30 +26,88 @@ language:
 
  ## Model Description
 
- **Llama3.3-CodeZ-1** is a specialized code-focused fine-tuned version of Llama 3.3 70B Instruct, optimized for programming and software development tasks. This model has been trained to excel at code generation, debugging, code explanation, and various programming-related tasks across multiple programming languages.
 
- Built on top of Meta's powerful Llama 3.3 70B base model, CodeZ-1 combines the strong reasoning capabilities of the foundation model with enhanced code understanding and generation abilities.
 
- ## 🎯 Key Features
-
- - **Multi-Language Support**: Proficient in Python, JavaScript, Java, C++, Go, Rust, and many more programming languages
- - **Code Generation**: Generate clean, efficient, and well-documented code from natural language descriptions
- - **Code Explanation**: Understand and explain complex code snippets
- - **Debugging Assistance**: Identify and fix bugs in code
- - **Code Optimization**: Suggest improvements and optimizations
- - **Documentation**: Generate comprehensive code documentation and comments
 
  ## 📊 Model Details
 
  - **Developed by:** Ali-Yaser
- - **Model type:** Causal Language Model (Fine-tuned)
- - **Base Model:** unsloth/llama-3.3-70b-instruct
- - **Model Size:** 70B parameters
- - **License:** Llama 3.3 Community License
- - **Language(s):** Primarily English
- - **Finetuned from:** Meta Llama 3.3 70B Instruct
 
  ## 🚀 Quick Start
 
  ### Installation
- ```bash
  - en
  ---
 
+ [<img src="https://i.imgur.com/vo0dm9p.jpeg" width="710"/>]()
+
+ # Qwen3-R1 8B 🚀
+
  <div align="center">
 
  [![Model Size](https://img.shields.io/badge/Model%20Size-8B-red)](https://huggingface.co/Ali-Yaser/Qwen3-R1-8B)
 
 
  ## Model Description
 
+ **Qwen3-R1 Series** is a specialized math- and reasoning-focused fine-tuned version of Qwen3-8B, optimized for math and hard reasoning tasks.
 
  ## 📊 Model Details
 
  - **Developed by:** Ali-Yaser
+ - **Model type:** GRPO thinker
+ - **Base Model:** Qwen/Qwen3-8B
+ - **Model Size:** 8B parameters
+ - **License:** Apache 2.0
+ - **Language(s):** English
+ - **Finetuned from:** Qwen3-8B
 
  ## 🚀 Quick Start
 
  ### Installation
+ The code for Qwen3 is included in the latest Hugging Face `transformers`, and we advise you to use the latest version of `transformers`.
+
+ With `transformers<4.51.0`, you will encounter the following error:
+ ```
+ KeyError: 'qwen3'
+ ```
+
+ The following code snippet illustrates how to use the model to generate content from given inputs.
+ ```python
+ from transformers import AutoModelForCausalLM, AutoTokenizer
+
+ model_name = "Qwen/Qwen3-8B"
+
+ # load the tokenizer and the model
+ tokenizer = AutoTokenizer.from_pretrained(model_name)
+ model = AutoModelForCausalLM.from_pretrained(
+     model_name,
+     torch_dtype="auto",
+     device_map="auto"
+ )
+
+ # prepare the model input
+ prompt = "Give me a short introduction to large language model."
+ messages = [
+     {"role": "user", "content": prompt}
+ ]
+ text = tokenizer.apply_chat_template(
+     messages,
+     tokenize=False,
+     add_generation_prompt=True,
+     enable_thinking=True  # Switches between thinking and non-thinking modes. Default is True.
+ )
+ model_inputs = tokenizer([text], return_tensors="pt").to(model.device)
+
+ # conduct text completion
+ generated_ids = model.generate(
+     **model_inputs,
+     max_new_tokens=32768
+ )
+ output_ids = generated_ids[0][len(model_inputs.input_ids[0]):].tolist()
+
+ # parsing thinking content
+ try:
+     # rindex finding 151668 (</think>)
+     index = len(output_ids) - output_ids[::-1].index(151668)
+ except ValueError:
+     index = 0
+
+ thinking_content = tokenizer.decode(output_ids[:index], skip_special_tokens=True).strip("\n")
+ content = tokenizer.decode(output_ids[index:], skip_special_tokens=True).strip("\n")
+
+ print("thinking content:", thinking_content)
+ print("content:", content)
+ ```
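If the `KeyError: 'qwen3'` mentioned above appears, a quick way to confirm whether the installed `transformers` is new enough is a small version check. This is an editorial sketch, not part of the committed README; the `supports_qwen3` helper is hypothetical, and the 4.51.0 threshold comes from the error note above.

```python
def supports_qwen3(transformers_version: str) -> bool:
    # Qwen3 model code requires transformers >= 4.51.0; older versions
    # raise KeyError: 'qwen3' when resolving the model type.
    major, minor = (int(x) for x in transformers_version.split(".")[:2])
    return (major, minor) >= (4, 51)

print(supports_qwen3("4.50.3"))  # False -> upgrade transformers
print(supports_qwen3("4.51.0"))  # True
```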
+
+ For deployment, you can use `sglang>=0.4.6.post1` or `vllm>=0.8.5` to create an OpenAI-compatible API endpoint:
+ - SGLang:
+ ```shell
+ python -m sglang.launch_server --model-path Qwen/Qwen3-8B --reasoning-parser qwen3
+ ```
+ - vLLM:
+ ```shell
+ vllm serve Qwen/Qwen3-8B --enable-reasoning --reasoning-parser deepseek_r1
+ ```
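Once a server is running via one of the commands above, it can be queried like any OpenAI-compatible endpoint. The sketch below (editorial, not part of the commit) only builds the request body; the localhost URL and port 8000 are assumptions based on both servers' defaults, so adjust them to your deployment.

```python
import json

# Build a Chat Completions request body for the served model.
# The URL is an assumption (vLLM and SGLang listen on port 8000 by
# default); change it to match your deployment.
url = "http://localhost:8000/v1/chat/completions"
payload = {
    "model": "Qwen/Qwen3-8B",
    "messages": [{"role": "user", "content": "Solve: 12 * 13 = ?"}],
    "max_tokens": 1024,
}
body = json.dumps(payload)
print(body)
# Send with e.g. urllib.request or the `openai` client pointed at `url`.
```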
+
+ For local use, applications such as Ollama, LMStudio, MLX-LM, llama.cpp, and KTransformers also support Qwen3.
+
+ ## Switching Between Thinking and Non-Thinking Mode