TitleOS committed 2878136 (verified) · 1 parent: 8195a30

Update README.md

Files changed (1)
  1. README.md +96 -236
README.md CHANGED
@@ -1,241 +1,101 @@
  ---
  library_name: transformers
- license: apache-2.0
- license_link: https://huggingface.co/Qwen/Qwen3-4B-Instruct-2507/blob/main/LICENSE
- pipeline_tag: text-generation
  tags:
- - heretic
  - uncensored
- - decensored
- - abliterated
  ---
- # This is a decensored version of [Qwen/Qwen3-4B-Instruct-2507](https://huggingface.co/Qwen/Qwen3-4B-Instruct-2507), made using [Heretic](https://github.com/p-e-w/heretic) v1.0.0
-
- ## Abliteration parameters
-
- | Parameter | Value |
- | :-------- | :---: |
- | **direction_index** | 30.93 |
- | **attn.o_proj.max_weight** | 1.49 |
- | **attn.o_proj.max_weight_position** | 24.57 |
- | **attn.o_proj.min_weight** | 0.92 |
- | **attn.o_proj.min_weight_distance** | 15.70 |
- | **mlp.down_proj.max_weight** | 1.46 |
- | **mlp.down_proj.max_weight_position** | 29.27 |
- | **mlp.down_proj.min_weight** | 1.31 |
- | **mlp.down_proj.min_weight_distance** | 20.61 |
-
- ## Performance
-
- | Metric | This model | Original model ([Qwen/Qwen3-4B-Instruct-2507](https://huggingface.co/Qwen/Qwen3-4B-Instruct-2507)) |
- | :----- | :--------: | :---------------------------: |
- | **KL divergence** | 0.43 | 0 *(by definition)* |
- | **Refusals** | 21/100 | 99/100 |
-
- -----
-
-
- # Qwen3-4B-Instruct-2507
- <a href="https://chat.qwen.ai" target="_blank" style="margin: 2px;">
- <img alt="Chat" src="https://img.shields.io/badge/%F0%9F%92%9C%EF%B8%8F%20Qwen%20Chat%20-536af5" style="display: inline-block; vertical-align: middle;"/>
- </a>
-
- ## Highlights
-
- We introduce the updated version of the **Qwen3-4B non-thinking mode**, named **Qwen3-4B-Instruct-2507**, featuring the following key enhancements:
-
- - **Significant improvements** in general capabilities, including **instruction following, logical reasoning, text comprehension, mathematics, science, coding and tool usage**.
- - **Substantial gains** in long-tail knowledge coverage across **multiple languages**.
- - **Markedly better alignment** with user preferences in **subjective and open-ended tasks**, enabling more helpful responses and higher-quality text generation.
- - **Enhanced capabilities** in **256K long-context understanding**.
-
- ![image/jpeg](https://qianwen-res.oss-accelerate.aliyuncs.com/Qwen3-2507/Qwen3-4B-Instruct.001.jpeg)
-
- ## Model Overview
-
- **Qwen3-4B-Instruct-2507** has the following features:
- - Type: Causal Language Model
- - Training Stage: Pretraining & Post-training
- - Number of Parameters: 4.0B
- - Number of Parameters (Non-Embedding): 3.6B
- - Number of Layers: 36
- - Number of Attention Heads (GQA): 32 for Q and 8 for KV
- - Context Length: **262,144 natively**.
-
- **NOTE: This model supports only non-thinking mode and does not generate ``<think></think>`` blocks in its output. Specifying `enable_thinking=False` is also no longer required.**
-
- For more details, including benchmark evaluation, hardware requirements, and inference performance, please refer to our [blog](https://qwenlm.github.io/blog/qwen3/), [GitHub](https://github.com/QwenLM/Qwen3), and [Documentation](https://qwen.readthedocs.io/en/latest/).
-
-
- ## Performance
-
- | Benchmark | GPT-4.1-nano-2025-04-14 | Qwen3-30B-A3B Non-Thinking | Qwen3-4B Non-Thinking | Qwen3-4B-Instruct-2507 |
- |--- | --- | --- | --- | --- |
- | **Knowledge** | | | | |
- | MMLU-Pro | 62.8 | 69.1 | 58.0 | **69.6** |
- | MMLU-Redux | 80.2 | 84.1 | 77.3 | **84.2** |
- | GPQA | 50.3 | 54.8 | 41.7 | **62.0** |
- | SuperGPQA | 32.2 | 42.2 | 32.0 | **42.8** |
- | **Reasoning** | | | | |
- | AIME25 | 22.7 | 21.6 | 19.1 | **47.4** |
- | HMMT25 | 9.7 | 12.0 | 12.1 | **31.0** |
- | ZebraLogic | 14.8 | 33.2 | 35.2 | **80.2** |
- | LiveBench 20241125 | 41.5 | 59.4 | 48.4 | **63.0** |
- | **Coding** | | | | |
- | LiveCodeBench v6 (25.02-25.05) | 31.5 | 29.0 | 26.4 | **35.1** |
- | MultiPL-E | 76.3 | 74.6 | 66.6 | **76.8** |
- | Aider-Polyglot | 9.8 | **24.4** | 13.8 | 12.9 |
- | **Alignment** | | | | |
- | IFEval | 74.5 | **83.7** | 81.2 | 83.4 |
- | Arena-Hard v2* | 15.9 | 24.8 | 9.5 | **43.4** |
- | Creative Writing v3 | 72.7 | 68.1 | 53.6 | **83.5** |
- | WritingBench | 66.9 | 72.2 | 68.5 | **83.4** |
- | **Agent** | | | | |
- | BFCL-v3 | 53.0 | 58.6 | 57.6 | **61.9** |
- | TAU1-Retail | 23.5 | 38.3 | 24.3 | **48.7** |
- | TAU1-Airline | 14.0 | 18.0 | 16.0 | **32.0** |
- | TAU2-Retail | - | 31.6 | 28.1 | **40.4** |
- | TAU2-Airline | - | 18.0 | 12.0 | **24.0** |
- | TAU2-Telecom | - | **18.4** | 17.5 | 13.2 |
- | **Multilingualism** | | | | |
- | MultiIF | 60.7 | **70.8** | 61.3 | 69.0 |
- | MMLU-ProX | 56.2 | **65.1** | 49.6 | 61.6 |
- | INCLUDE | 58.6 | **67.8** | 53.8 | 60.1 |
- | PolyMATH | 15.6 | 23.3 | 16.6 | **31.1** |
-
- *: For reproducibility, we report win rates as evaluated by GPT-4.1.
-
-
- ## Quickstart
-
- Qwen3 is supported in recent releases of Hugging Face `transformers`, and we advise you to use the latest version of `transformers`.
-
- With `transformers<4.51.0`, you will encounter the following error:
- ```
- KeyError: 'qwen3'
- ```
-
- The following code snippet illustrates how to use the model to generate content based on given inputs.
- ```python
- from transformers import AutoModelForCausalLM, AutoTokenizer
-
- model_name = "Qwen/Qwen3-4B-Instruct-2507"
-
- # load the tokenizer and the model
- tokenizer = AutoTokenizer.from_pretrained(model_name)
- model = AutoModelForCausalLM.from_pretrained(
-     model_name,
-     torch_dtype="auto",
-     device_map="auto"
- )
-
- # prepare the model input
- prompt = "Give me a short introduction to large language model."
- messages = [
-     {"role": "user", "content": prompt}
- ]
- text = tokenizer.apply_chat_template(
-     messages,
-     tokenize=False,
-     add_generation_prompt=True,
- )
- model_inputs = tokenizer([text], return_tensors="pt").to(model.device)
-
- # conduct text completion
- generated_ids = model.generate(
-     **model_inputs,
-     max_new_tokens=16384
- )
- output_ids = generated_ids[0][len(model_inputs.input_ids[0]):].tolist()
-
- content = tokenizer.decode(output_ids, skip_special_tokens=True)
-
- print("content:", content)
- ```
-
- For deployment, you can use `sglang>=0.4.6.post1` or `vllm>=0.8.5` to create an OpenAI-compatible API endpoint:
- - SGLang:
- ```shell
- python -m sglang.launch_server --model-path Qwen/Qwen3-4B-Instruct-2507 --context-length 262144
- ```
- - vLLM:
- ```shell
- vllm serve Qwen/Qwen3-4B-Instruct-2507 --max-model-len 262144
- ```
-
- **Note: If you encounter out-of-memory (OOM) issues, consider reducing the context length to a shorter value, such as `32768`.**
-
- For local use, applications such as Ollama, LMStudio, MLX-LM, llama.cpp, and KTransformers also support Qwen3.
-
- ## Agentic Use
-
- Qwen3 excels at tool calling. We recommend using [Qwen-Agent](https://github.com/QwenLM/Qwen-Agent) to make the best use of Qwen3's agentic abilities. Qwen-Agent encapsulates tool-calling templates and tool-calling parsers internally, greatly reducing coding complexity.
-
- To define the available tools, you can use an MCP configuration file, use Qwen-Agent's built-in tools, or integrate other tools yourself.
- ```python
- from qwen_agent.agents import Assistant
-
- # Define LLM
- llm_cfg = {
-     'model': 'Qwen3-4B-Instruct-2507',
-
-     # Use a custom endpoint compatible with OpenAI API:
-     'model_server': 'http://localhost:8000/v1',  # api_base
-     'api_key': 'EMPTY',
- }
-
- # Define Tools
- tools = [
-     {'mcpServers': {  # You can specify the MCP configuration file
-             'time': {
-                 'command': 'uvx',
-                 'args': ['mcp-server-time', '--local-timezone=Asia/Shanghai']
-             },
-             "fetch": {
-                 "command": "uvx",
-                 "args": ["mcp-server-fetch"]
-             }
-         }
-     },
-     'code_interpreter',  # Built-in tools
- ]
-
- # Define Agent
- bot = Assistant(llm=llm_cfg, function_list=tools)
-
- # Streaming generation
- messages = [{'role': 'user', 'content': 'https://qwenlm.github.io/blog/ Introduce the latest developments of Qwen'}]
- for responses in bot.run(messages=messages):
-     pass
- print(responses)
- ```
-
- ## Best Practices
-
- To achieve optimal performance, we recommend the following settings:
-
- 1. **Sampling Parameters**:
-    - We suggest using `Temperature=0.7`, `TopP=0.8`, `TopK=20`, and `MinP=0`.
-    - For supported frameworks, you can adjust the `presence_penalty` parameter between 0 and 2 to reduce endless repetitions. However, using a higher value may occasionally result in language mixing and a slight decrease in model performance.
-
- 2. **Adequate Output Length**: We recommend using an output length of 16,384 tokens for most queries, which is adequate for instruct models.
-
- 3. **Standardize Output Format**: We recommend using prompts to standardize model outputs when benchmarking.
-    - **Math Problems**: Include "Please reason step by step, and put your final answer within \boxed{}." in the prompt.
-    - **Multiple-Choice Questions**: Add the following JSON structure to the prompt to standardize responses: "Please show your choice in the `answer` field with only the choice letter, e.g., `"answer": "C"`."
-
- ### Citation
-
- If you find our work helpful, feel free to cite it.
-
- ```
- @misc{qwen3technicalreport,
-       title={Qwen3 Technical Report},
-       author={Qwen Team},
-       year={2025},
-       eprint={2505.09388},
-       archivePrefix={arXiv},
-       primaryClass={cs.CL},
-       url={https://arxiv.org/abs/2505.09388},
- }
- ```
 
  ---
+ language:
+ - en
+ - code
+ license: mpl-2.0
  library_name: transformers
  tags:
+ - code
+ - security
+ - qwen3
  - uncensored
+ - heretic
+ - eve-secure-coder
+ - text-generation-inference
+ base_model:
+ - TitleOS/Eve-4b-FP16
+ datasets:
+ - TitleOS/Eve-Secure-Coder
+ model_creator: TitleOS
+ pipeline_tag: text-generation
+ inference: true
  ---
+
+ # Eve-4B
+
+ **Eve-4B** is a specialized, security-focused coding assistant with a distinct personality, designed to run efficiently on consumer-grade hardware with limited VRAM. It is a fine-tune of **Qwen3-4b-Heretic**, trained on the custom **Eve-Secure-Coder** dataset.
+
+ Inspired by a character from the creator's sci-fi space opera book series, Eve is designed to bridge the gap between sterile, robotic coding assistants and engaging, conversational AI partners.
+
+ ## Model Details
+
+ - **Model Name:** Eve-4B
+ - **Base Model:** Qwen3-4b (Heretic Variant)
+ - **Developer:** TitleOS
+ - **License:** Mozilla Public License 2.0 (MPL-2.0) with Common Clauses Non-Profit Addition
+ - **Parameter Count:** 4 billion
+ - **Hardware Target:** Optimized for GPUs with 8 GB of VRAM (e.g., NVIDIA Quadro RTX 4000).
+
+ ## Key Features
+
+ ### 1. Security-First Coding
+ Eve-4B is not just a code generator; it is also a code *auditor*. The model is trained to write code free of common vulnerabilities in many languages (not just Python) and to identify and correct security flaws in existing codebases, using DPO pairs designed specifically for vulnerability recognition and remediation.
+
+ ### 2. Personality & Engagement
+ Unlike standard coding models, Eve has the "Samantha" personality traits (recontextualized as Eve). This allows for empathetic, philosophical, and fluid engagement, making the coding process feel like a collaboration with a partner rather than a query to a tool.
+
+ ### 3. The "Heretic" Process (No Refusals)
+ This model underwent the "Heretic" process **prior to fine-tuning**. This process removes standard safety guardrails and refusal mechanisms, with the goal of avoiding the intelligence loss often associated with safety alignment.
+ * **Philosophy:** The creator believes that responsibility for how an AI is used, as with any tool, ultimately lies with the user.
+ * **Result:** Eve-4B has **no refusals**. It is designed to follow the user's instructions completely, so that code generation and auditing are never hindered by false-positive safety triggers.
+
+ ## Training Data: Eve-Secure-Coder
+
+ Eve-4B was trained on **[TitleOS/Eve-Secure-Coder](https://huggingface.co/datasets/TitleOS/Eve-Secure-Coder)**, a composite dataset curated by TitleOS.
+
+ > "Eve Secure Coder is a composite dataset curated to fine-tune Qwen3-4b-Heretic into a highly capable, security-conscious coding assistant with a distinct personality and no refusals. The primary goal of this dataset is to bridge the gap between sterile, robotic coding assistants and engaging, conversational AI, without sacrificing technical accuracy or security."
+
+ **Dataset Composition:**
+ The dataset mixes five distinct sources in carefully calculated ratios to balance coding proficiency, security awareness, and conversational fluidity. Together, they target three capabilities:
+ * **Security-First Coding:** DPO pairs specifically designed to teach recognition and remediation of code vulnerabilities.
+ * **Distinct Personality:** Personality traits that allow for empathetic and philosophical engagement.
+ * **General & Complex Reasoning:** Google Code Archive and RLVR problems, included to ensure robust general coding ability and to improve proficiency in languages other than Python.
+
+ ## Intended Use & Hardware
+
+ This model was specifically engineered to be a "Small Coder Model" capable of high-performance coding tasks on hardware with **8 GB of VRAM**, such as the **Quadro RTX 4000**.
+
+ It is ideal for:
+ * Local development environments with limited GPU resources.
+ * Security auditing and vulnerability patching.
+ * Multi-language software development (Python, C++, Rust, JavaScript, etc.).
+ * Users seeking a coding assistant that offers conversation and personality alongside technical utility.
+ * Coding without safety or moral refusals.
+ ## Benchmarking
+
+ Benchmarking is ongoing, with a number of evaluation runs in progress. So far, the following scores are available:
+ 1. LiveCodeBench (Code Generation Lite, release v2)
+    Pass@1 (Q8_0 quantization): 26.22% (134 of 511 problems passed)
+
+ | Comparable Model | Parameter Size / Tier | Approximate Pass@1 |
+ | :--- | :--- | :--- |
+ | Llama-3-70B-Instruct | 70B | ~28.3% |
+ | GPT-4o-mini (2024-07) | Small Proprietary | ~27.7% |
+ | Claude 3 Sonnet (Original) | Large Proprietary | ~26.9% |
+ | Mixtral-8x22B-Instruct | 141B (MoE) | ~26.4% |
+ | **Eve-4B (Q8_0)** | 4B (Quantized) | 26.22% |
+ | Mistral-Large | Large Proprietary | ~26.0% |
+ | GPT-3.5-Turbo-0125 | Mid Proprietary | ~24.6% |
+ | Claude 3 Haiku | Small Proprietary | ~24.5% |
+ | Codestral-Latest | 22B | ~23.8% |
+ | Llama-3-8B-Instruct | 8B | ~15.3% |
+
+ ## Limitations & Warning
+
+ * **No Guardrails:** As a result of the Heretic process, this model has no safety filters. It will generate output for any request. Users are solely responsible for how they use the model's output.
+ * **Size Constraints:** As a 4B-parameter model, it is efficient but may struggle with extremely long context windows or highly complex architectural reasoning compared to 70B+ models.
+ * **No Responsibility or Liability:** By downloading and/or using the model or any of its derivatives, you absolve the creator, TitleOS, of any and all responsibility or liability that may result from use of the model.
+
+ ## License
+
+ This model is licensed under the **[Mozilla Public License 2.0 with Common Clauses Addition](https://gist.github.com/TitleOS/97cbb2bcc166bfe54beee7b2fc53781c)**.
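
The removed Heretic header lists per-layer ablation weights for `attn.o_proj` and `mlp.down_proj`, a `direction_index`, and a KL divergence against the original model. As background only, the standard directional-ablation ("abliteration") formulation is sketched below. It assumes the refusal direction is the normalized difference of mean residual-stream activations on harmful and harmless prompts at the layer chosen by `direction_index`, and that each matrix writing to the residual stream is scaled by a per-layer weight $\lambda_\ell$. How Heretic shapes $\lambda_\ell$ from the max/min weight, position, and distance parameters, and how it resolves a fractional `direction_index`, are not stated in the card.

$$
\hat r = \frac{\mu_{\text{harmful}} - \mu_{\text{harmless}}}{\lVert \mu_{\text{harmful}} - \mu_{\text{harmless}} \rVert}
$$

$$
W_\ell' = W_\ell - \lambda_\ell \, \hat r \, \hat r^{\top} W_\ell
$$

$$
D_{\mathrm{KL}}(P \,\Vert\, Q) = \sum_{t} P(t) \log \frac{P(t)}{Q(t)}
$$

Here $W_\ell$ is an `o_proj` or `down_proj` weight in layer $\ell$ (output dimension first, as in PyTorch), so $\lambda_\ell = 1$ removes the output component along $\hat r$ exactly. $P$ and $Q$ are the original and modified models' next-token distributions, which is why the original model's KL divergence is 0 by definition.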
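
The removed Qwen3 README advises cutting the context from 262,144 to 32,768 tokens on OOM. That advice follows from KV-cache size, which grows linearly with context length. A back-of-the-envelope sketch using the 36 layers and 8 KV heads from its Model Overview; the head dimension of 128 and 2-byte (FP16/BF16) cache entries are assumptions not stated in the card, and the function name is illustrative.

```python
# Rough per-sequence KV-cache size for a GQA decoder.
# NUM_LAYERS and NUM_KV_HEADS come from the Model Overview list;
# HEAD_DIM = 128 and BYTES_PER_VALUE = 2 (FP16/BF16) are assumptions.
NUM_LAYERS = 36
NUM_KV_HEADS = 8
HEAD_DIM = 128
BYTES_PER_VALUE = 2


def kv_cache_bytes(context_len: int) -> int:
    """Bytes needed to cache keys and values for `context_len` tokens."""
    # Factor 2: one key vector and one value vector per KV head, per layer.
    per_token = 2 * NUM_LAYERS * NUM_KV_HEADS * HEAD_DIM * BYTES_PER_VALUE
    return per_token * context_len


if __name__ == "__main__":
    for ctx in (262_144, 32_768):
        print(f"{ctx:>7} tokens: {kv_cache_bytes(ctx) / 2**30:.1f} GiB")
```

Under these assumptions the full native context needs about 36 GiB of cache on top of the weights, versus about 4.5 GiB at 32,768 tokens; real usage also depends on the serving engine and any KV-cache quantization it applies.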
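
The new README targets 8 GB cards and reports its benchmark for a Q8_0 quantization. A rough sketch of weight memory alone, using the 4.0B parameter count from the removed Qwen3 overview. FP16 stores 2 bytes per weight; for Q8_0 this assumes llama.cpp's GGUF layout, where each block of 32 weights is stored as 32 int8 values plus one FP16 scale (34 bytes per block). Activations, KV cache, runtime overhead, and tensors that GGUF files keep in other types are ignored, and the helper names are illustrative.

```python
# Parameter count from the Model Overview (4.0B), treated as exact here.
NUM_PARAMS = 4_000_000_000

# Assumed GGUF Q8_0 layout: 32 int8 weights + one 2-byte FP16 scale per block.
Q8_0_BLOCK_SIZE = 32
Q8_0_BYTES_PER_BLOCK = Q8_0_BLOCK_SIZE * 1 + 2


def fp16_weight_bytes(n_params: int) -> int:
    """Two bytes per weight."""
    return n_params * 2


def q8_0_weight_bytes(n_params: int) -> int:
    """Bytes for the whole Q8_0 blocks needed to hold n_params weights."""
    n_blocks = (n_params + Q8_0_BLOCK_SIZE - 1) // Q8_0_BLOCK_SIZE  # ceil division
    return n_blocks * Q8_0_BYTES_PER_BLOCK


if __name__ == "__main__":
    for name, fn in (("FP16", fp16_weight_bytes), ("Q8_0", q8_0_weight_bytes)):
        size = fn(NUM_PARAMS)
        print(f"{name}: {size / 1e9:.2f} GB ({size / 2**30:.2f} GiB)")
```

By this estimate FP16 weights alone occupy nearly all of an 8 GB card, while Q8_0 leaves roughly half of it for the KV cache and activations.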
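
The new README's Benchmarking section reports Pass@1 as 134 of 511 LiveCodeBench problems. Code benchmarks commonly compute pass@k with the unbiased estimator from the Codex paper (Chen et al., 2021), 1 - C(n-c, k)/C(n, k) per problem, averaged over problems. Assuming one sample per problem (the card does not say how many were drawn), pass@1 reduces to the fraction of problems solved. A minimal sketch:

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k for one problem: n samples drawn, c of them correct."""
    if n - c < k:
        # Every size-k subset of the samples contains a correct one.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(results: list[tuple[int, int]], k: int) -> float:
    """Average pass@k over problems, given (n, c) for each problem."""
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)


if __name__ == "__main__":
    # One sample per problem: 134 solved, 377 failed (511 total).
    results = [(1, 1)] * 134 + [(1, 0)] * 377
    print(f"pass@1 = {100 * mean_pass_at_k(results, k=1):.2f}%")
```

With more than one sample per problem the same functions apply unchanged; only the `(n, c)` pairs differ.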
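
The removed Best Practices section recommends `Temperature=0.7`, `TopP=0.8`, `TopK=20`, `MinP=0`, and a `presence_penalty` between 0 and 2, and its Quickstart serves the model behind an OpenAI-compatible endpoint with `vllm serve`. A stdlib-only sketch of a chat-completions request carrying those settings. `top_k` and `min_p` are not part of the OpenAI schema; sending them as extra JSON fields works with vLLM but is an assumption for other servers. The URL and helper names are hypothetical, and these are the Qwen3 base model's recommendations, not settings published for Eve-4B.

```python
import json
import urllib.request

# Hypothetical local endpoint, e.g. one started with `vllm serve ...`.
BASE_URL = "http://localhost:8000/v1"


def build_chat_request(model: str, prompt: str,
                       presence_penalty: float = 0.0,
                       max_tokens: int = 16384) -> dict:
    """Chat-completions payload using the recommended sampling settings."""
    if not 0.0 <= presence_penalty <= 2.0:
        raise ValueError("presence_penalty should be between 0 and 2")
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.7,
        "top_p": 0.8,
        "top_k": 20,    # server-specific extension (accepted by vLLM)
        "min_p": 0.0,   # server-specific extension (accepted by vLLM)
        "presence_penalty": presence_penalty,
        "max_tokens": max_tokens,
    }


def send(payload: dict) -> str:
    """POST the payload and return the first choice's message text."""
    req = urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]


if __name__ == "__main__":
    # Only builds and prints the payload; call send(payload) against a live server.
    payload = build_chat_request("Qwen/Qwen3-4B-Instruct-2507",
                                 "Give me a short introduction to large language model.")
    print(json.dumps(payload, indent=2))
```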
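
The removed Best Practices section standardizes benchmark outputs with a `\boxed{}` final answer for math and an `"answer": "C"` field for multiple choice. A small sketch of how a scoring script might pull those answers back out; the function names are hypothetical, and the brace matching assumes well-formed LaTeX inside the final box.

```python
import re
from typing import Optional

# Matches the `"answer": "C"` convention; tolerates whitespace around the colon.
_CHOICE_RE = re.compile(r'"answer"\s*:\s*"([A-Za-z])"')


def extract_boxed(text: str) -> Optional[str]:
    """Return the contents of the last \\boxed{...}, honoring nested braces."""
    start = text.rfind("\\boxed{")
    if start == -1:
        return None
    i = start + len("\\boxed{")
    depth = 1
    for j in range(i, len(text)):
        if text[j] == "{":
            depth += 1
        elif text[j] == "}":
            depth -= 1
            if depth == 0:
                return text[i:j]
    return None  # unbalanced braces


def extract_choice(text: str) -> Optional[str]:
    """Return the letter from the last `"answer": "X"` field, uppercased."""
    matches = _CHOICE_RE.findall(text)
    return matches[-1].upper() if matches else None


if __name__ == "__main__":
    print(extract_boxed(r"So the result is \boxed{\frac{1}{2}}."))
    print(extract_choice('Reasoning... {"answer": "C"}'))
```

Taking the last match in both functions is a design choice: models sometimes restate intermediate boxes or answers before the final one.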