patf82 committed
Commit 7a7e168 · verified · 1 Parent(s): b76c028

Upload folder using huggingface_hub

.gitattributes CHANGED
@@ -33,3 +33,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
  *.zip filter=lfs diff=lfs merge=lfs -text
  *.zst filter=lfs diff=lfs merge=lfs -text
  *tfevents* filter=lfs diff=lfs merge=lfs -text
+ tokenizer.json filter=lfs diff=lfs merge=lfs -text
README.md ADDED
@@ -0,0 +1,118 @@
+ ---
+ language: en
+ tags:
+ - exllamav3
+ - quantized
+ - 5-bit
+ - reasoning
+ - coding
+ - qwen2
+ library_name: exllamav3
+ base_model: andrewzh/Absolute_Zero_Reasoner-Coder-14b
+ base_model_relation: quantized
+ ---
+
+ # Absolute_Zero_Reasoner-Coder-14b-5.0bpw-exl3
+
+ This is a 5-bit quantized version of [andrewzh/Absolute_Zero_Reasoner-Coder-14b](https://huggingface.co/andrewzh/Absolute_Zero_Reasoner-Coder-14b) using [ExLlamaV3](https://github.com/turboderp-org/exllamav3) v0.0.2.
+
+ ## Model Description
+
+ This model is a quantized version of Absolute_Zero_Reasoner-Coder-14b, which is built on the Qwen2.5-Coder-14B architecture and is designed for reasoning and coding tasks. For more details about the original model, please refer to the paper: [https://huggingface.co/papers/2505.03335](https://huggingface.co/papers/2505.03335).
+
+ The quantization reduces model size and memory requirements while aiming to preserve as much of the original performance as possible.
+
+ ## Quantization Methodology
+
+ The model was quantized using ExLlamaV3 v0.0.2 with the following parameters:
+
+ - **Quantization Method**: exl3 (ExLlamaV3)
+ - **Bits**: 5.0 (5-bit weight quantization)
+ - **Head Bits**: 6 (6-bit precision for the output/lm_head layer)
+ - **Calibration**:
+   - Rows: 100
+   - Columns: 2048
+ - **Out Scales**: auto
+
+ Rather than simple linear (round-to-nearest) quantization, the EXL3 format uses a calibration-based scheme (here, 100 rows of 2048 tokens each), which preserves more of the original model quality at low bit depths. The same parameters are recorded in the `quantization_config` block of this repository's `config.json`, as shown in the sketch below.
+
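+ A minimal sketch for reading those settings back from the shipped `config.json` (the directory path is a placeholder for wherever the model is downloaded):
+
+ ```python
+ import json
+ from pathlib import Path
+
+ # Placeholder path; point this at the downloaded model directory.
+ model_dir = Path("path/to/Absolute_Zero_Reasoner-Coder-14b-5.0bpw-exl3")
+
+ with open(model_dir / "config.json") as f:
+     config = json.load(f)
+
+ # The exl3 settings live under "quantization_config" in this repository's config.json.
+ qcfg = config["quantization_config"]
+ print(qcfg["quant_method"], qcfg["bits"], qcfg["head_bits"])  # exl3 5.0 6
+ print(qcfg["calibration"])                                    # {'rows': 100, 'cols': 2048}
+ ```
+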
+ ## Model Architecture
+
+ The model is based on the Qwen2 architecture with the following specifications (a rough size estimate derived from these numbers is sketched after the list):
+
+ - **Hidden Size**: 5120
+ - **Intermediate Size**: 13824
+ - **Number of Attention Heads**: 40
+ - **Number of Key-Value Heads**: 8
+ - **Number of Hidden Layers**: 48
+ - **Maximum Sequence Length**: 32768
+ - **Vocabulary Size**: 152064
+
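+ As a back-of-the-envelope check, these specifications imply roughly 14.8B parameters, and at about 5 bits per weight the expected on-disk size comes out close to the ~10.4 GB of the two safetensors shards in this repository. The sketch below assumes (these are assumptions, not statements from the quantizer) that the embedding table stays in 16-bit and the output head uses the 6-bit head setting:
+
+ ```python
+ # Rough parameter count and on-disk size estimate from the architecture specs above.
+ hidden, inter, layers, vocab = 5120, 13824, 48, 152064
+ heads, kv_heads = 40, 8
+ head_dim = hidden // heads  # 128
+
+ embed = vocab * hidden                          # input embedding table
+ lm_head = vocab * hidden                        # untied output head
+ attn = (2 * hidden * (heads * head_dim)         # q and o projections
+         + 2 * hidden * (kv_heads * head_dim))   # k and v projections (biases ignored)
+ mlp = 3 * hidden * inter                        # gate, up and down projections
+ per_layer = attn + mlp
+
+ params = embed + lm_head + layers * per_layer
+ print(f"~{params / 1e9:.1f}B parameters")       # ~14.8B
+
+ # Assumption: ~5 bpw transformer weights, 6 bpw output head, 16-bit embeddings.
+ size_gb = (layers * per_layer * 5 + lm_head * 6 + embed * 16) / 8 / 1e9
+ print(f"~{size_gb:.1f} GB expected on disk")    # ~10.4 GB, matching the two shards
+ ```
+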
+ ## How to Use
+
+ To use this quantized model you need the ExLlamaV3 library, installable via pip (or from source, following the instructions in the ExLlamaV3 repository):
+
+ ```bash
+ pip install exllamav3
+ ```
+
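+ The quantized weights can be fetched with `huggingface_hub` (the same library used to upload this folder). The repository id below is a placeholder; substitute this repository's actual id:
+
+ ```python
+ from huggingface_hub import snapshot_download
+
+ # Placeholder repo id; replace with this repository's actual id.
+ model_dir = snapshot_download(
+     repo_id="patf82/Absolute_Zero_Reasoner-Coder-14b-5.0bpw-exl3",
+     local_dir="Absolute_Zero_Reasoner-Coder-14b-5.0bpw-exl3",
+ )
+ print(model_dir)
+ ```
+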
+ Here's a basic example of loading the model and generating text. The class names follow the ExLlamaV3 quickstart; check the examples in the ExLlamaV3 repository for the exact API of the version you install:
60
+
61
+ ```python
62
+ from exllamav3 import ExLlamaV3, ExLlamaV3Config
63
+ from exllamav3.tokenizer import ExLlamaV3Tokenizer
64
+
65
+ # Set up model path
66
+ model_path = "path/to/Absolute_Zero_Reasoner-Coder-14b-5.0bpw-exl3"
67
+
68
+ # Load config and model
69
+ config = ExLlamaV3Config()
70
+ config.model_dir = model_path
71
+ config.prepare()
72
+
73
+ model = ExLlamaV3(config)
74
+ model.load()
75
+
76
+ # Load tokenizer
77
+ tokenizer = ExLlamaV3Tokenizer(config)
78
+
79
+ # Generate text
80
+ prompt = "Write a function to calculate the Fibonacci sequence in Python:"
81
+ input_ids = tokenizer.encode(prompt)
82
+ output = model.generate(
83
+ input_ids=input_ids,
84
+ max_new_tokens=200,
85
+ temperature=0.6,
86
+ top_p=0.9
87
+ )
88
+
89
+ print(tokenizer.decode(output))
90
+ ```
91
+
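+ Note that the chat template bundled in this repository's `tokenizer_config.json` simply joins message contents with newlines and does not insert `<|im_start|>`/`<|im_end|>` role tags, so prompts are effectively plain text. The tokenizer itself is a standard Qwen2 tokenizer and can be inspected with `transformers` even though the quantized weights cannot (the snippet below assumes `transformers` is installed and uses a placeholder path):
+
+ ```python
+ from transformers import AutoTokenizer
+
+ # Placeholder path; point this at the downloaded model directory.
+ tok = AutoTokenizer.from_pretrained("path/to/Absolute_Zero_Reasoner-Coder-14b-5.0bpw-exl3")
+
+ messages = [{"role": "user", "content": "Write a function to calculate the Fibonacci sequence in Python:"}]
+ # With this template, the rendered prompt is just the message contents joined by newlines.
+ print(tok.apply_chat_template(messages, tokenize=False))
+ ```
+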
+ ## Limitations
+
+ This quantized model has the following limitations:
+
+ 1. **Reduced Precision**: The 5-bit quantization may lead to some degradation in performance compared to the original model, particularly for complex reasoning tasks.
+
+ 2. **ExLlamaV3 Dependency**: This model can only be used with the ExLlamaV3 library and is not compatible with standard Hugging Face Transformers without conversion.
+
+ 3. **Inherited Limitations**: All limitations of the original model apply to this quantized version as well.
+
+ ## Citation
+
+ If you use this model in your research, please cite the original paper:
+
+ ```bibtex
+ @misc{zhao2025absolutezero,
+   title         = {Absolute Zero: Reinforced Self-play Reasoning with Zero Data},
+   author        = {Zhao, Andrew and others},
+   year          = {2025},
+   eprint        = {2505.03335},
+   archivePrefix = {arXiv},
+   url           = {https://huggingface.co/papers/2505.03335}
+ }
+ ```
+
+ ## Acknowledgements
+
+ - Original model: [andrewzh/Absolute_Zero_Reasoner-Coder-14b](https://huggingface.co/andrewzh/Absolute_Zero_Reasoner-Coder-14b)
+ - Quantization library: [ExLlamaV3](https://github.com/turboderp-org/exllamav3)
added_tokens.json ADDED
@@ -0,0 +1,24 @@
+ {
+   "</tool_call>": 151658,
+   "<tool_call>": 151657,
+   "<|box_end|>": 151649,
+   "<|box_start|>": 151648,
+   "<|endoftext|>": 151643,
+   "<|file_sep|>": 151664,
+   "<|fim_middle|>": 151660,
+   "<|fim_pad|>": 151662,
+   "<|fim_prefix|>": 151659,
+   "<|fim_suffix|>": 151661,
+   "<|im_end|>": 151645,
+   "<|im_start|>": 151644,
+   "<|image_pad|>": 151655,
+   "<|object_ref_end|>": 151647,
+   "<|object_ref_start|>": 151646,
+   "<|quad_end|>": 151651,
+   "<|quad_start|>": 151650,
+   "<|repo_name|>": 151663,
+   "<|video_pad|>": 151656,
+   "<|vision_end|>": 151653,
+   "<|vision_pad|>": 151654,
+   "<|vision_start|>": 151652
+ }
config.json ADDED
@@ -0,0 +1,40 @@
+ {
+   "_name_or_path": "/home/fit/huangg/WORK/zqc/reason_rl/checkpoints/code_io/code_io/code_io_full_v0_coder14b/test_answer/Qwen2.5-Coder-14B/answer_conditional/global_step_300/actor/huggingface",
+   "architectures": [
+     "Qwen2ForCausalLM"
+   ],
+   "attention_dropout": 0.0,
+   "eos_token_id": 151643,
+   "hidden_act": "silu",
+   "hidden_size": 5120,
+   "initializer_range": 0.02,
+   "intermediate_size": 13824,
+   "max_position_embeddings": 32768,
+   "max_window_layers": 48,
+   "model_type": "qwen2",
+   "num_attention_heads": 40,
+   "num_hidden_layers": 48,
+   "num_key_value_heads": 8,
+   "pad_token_id": 151643,
+   "rms_norm_eps": 1e-06,
+   "rope_scaling": null,
+   "rope_theta": 1000000.0,
+   "sliding_window": null,
+   "tie_word_embeddings": false,
+   "torch_dtype": "float32",
+   "transformers_version": "4.47.1",
+   "use_cache": true,
+   "use_sliding_window": false,
+   "vocab_size": 152064,
+   "quantization_config": {
+     "quant_method": "exl3",
+     "version": "0.0.2",
+     "bits": 5.0,
+     "head_bits": 6,
+     "calibration": {
+       "rows": 100,
+       "cols": 2048
+     },
+     "out_scales": "auto"
+   }
+ }
generation_config.json ADDED
@@ -0,0 +1,6 @@
+ {
+   "_from_model_config": true,
+   "eos_token_id": 151643,
+   "pad_token_id": 151643,
+   "transformers_version": "4.47.1"
+ }
merges.txt ADDED
The diff for this file is too large to render. See raw diff
 
model-00001-of-00002.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:56f0e5b4eb105989d0ae5d116ddcabd48a807e84d67fc9a05c515d7687850193
+ size 8447091368
model-00002-of-00002.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:1abb6a023687b0dd1166d319f45c9050169fcbb538c2af555603b00b299a5dca
+ size 1962240872
model.safetensors.index.json ADDED
The diff for this file is too large to render. See raw diff
 
quantization_config.json ADDED
The diff for this file is too large to render. See raw diff
 
special_tokens_map.json ADDED
@@ -0,0 +1,31 @@
+ {
+   "additional_special_tokens": [
+     "<|im_start|>",
+     "<|im_end|>",
+     "<|object_ref_start|>",
+     "<|object_ref_end|>",
+     "<|box_start|>",
+     "<|box_end|>",
+     "<|quad_start|>",
+     "<|quad_end|>",
+     "<|vision_start|>",
+     "<|vision_end|>",
+     "<|vision_pad|>",
+     "<|image_pad|>",
+     "<|video_pad|>"
+   ],
+   "eos_token": {
+     "content": "<|endoftext|>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "pad_token": {
+     "content": "<|endoftext|>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   }
+ }
tokenizer.json ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:9c5ae00e602b8860cbd784ba82a8aa14e8feecec692e7076590d014d7b7fdafa
+ size 11421896
tokenizer_config.json ADDED
@@ -0,0 +1 @@
+ {"add_bos_token": false, "add_prefix_space": false, "added_tokens_decoder": {"151643": {"content": "<|endoftext|>", "lstrip": false, "normalized": false, "rstrip": false, "single_word": false, "special": true}, "151644": {"content": "<|im_start|>", "lstrip": false, "normalized": false, "rstrip": false, "single_word": false, "special": true}, "151645": {"content": "<|im_end|>", "lstrip": false, "normalized": false, "rstrip": false, "single_word": false, "special": true}, "151646": {"content": "<|object_ref_start|>", "lstrip": false, "normalized": false, "rstrip": false, "single_word": false, "special": true}, "151647": {"content": "<|object_ref_end|>", "lstrip": false, "normalized": false, "rstrip": false, "single_word": false, "special": true}, "151648": {"content": "<|box_start|>", "lstrip": false, "normalized": false, "rstrip": false, "single_word": false, "special": true}, "151649": {"content": "<|box_end|>", "lstrip": false, "normalized": false, "rstrip": false, "single_word": false, "special": true}, "151650": {"content": "<|quad_start|>", "lstrip": false, "normalized": false, "rstrip": false, "single_word": false, "special": true}, "151651": {"content": "<|quad_end|>", "lstrip": false, "normalized": false, "rstrip": false, "single_word": false, "special": true}, "151652": {"content": "<|vision_start|>", "lstrip": false, "normalized": false, "rstrip": false, "single_word": false, "special": true}, "151653": {"content": "<|vision_end|>", "lstrip": false, "normalized": false, "rstrip": false, "single_word": false, "special": true}, "151654": {"content": "<|vision_pad|>", "lstrip": false, "normalized": false, "rstrip": false, "single_word": false, "special": true}, "151655": {"content": "<|image_pad|>", "lstrip": false, "normalized": false, "rstrip": false, "single_word": false, "special": true}, "151656": {"content": "<|video_pad|>", "lstrip": false, "normalized": false, "rstrip": false, "single_word": false, "special": true}, "151657": {"content": "<tool_call>", "lstrip": false, "normalized": false, "rstrip": false, "single_word": false, "special": false}, "151658": {"content": "</tool_call>", "lstrip": false, "normalized": false, "rstrip": false, "single_word": false, "special": false}, "151659": {"content": "<|fim_prefix|>", "lstrip": false, "normalized": false, "rstrip": false, "single_word": false, "special": false}, "151660": {"content": "<|fim_middle|>", "lstrip": false, "normalized": false, "rstrip": false, "single_word": false, "special": false}, "151661": {"content": "<|fim_suffix|>", "lstrip": false, "normalized": false, "rstrip": false, "single_word": false, "special": false}, "151662": {"content": "<|fim_pad|>", "lstrip": false, "normalized": false, "rstrip": false, "single_word": false, "special": false}, "151663": {"content": "<|repo_name|>", "lstrip": false, "normalized": false, "rstrip": false, "single_word": false, "special": false}, "151664": {"content": "<|file_sep|>", "lstrip": false, "normalized": false, "rstrip": false, "single_word": false, "special": false}}, "additional_special_tokens": ["<|im_start|>", "<|im_end|>", "<|object_ref_start|>", "<|object_ref_end|>", "<|box_start|>", "<|box_end|>", "<|quad_start|>", "<|quad_end|>", "<|vision_start|>", "<|vision_end|>", "<|vision_pad|>", "<|image_pad|>", "<|video_pad|>"], "bos_token": null, "chat_template": "{%- for message in messages -%}{{- '\n' if not loop.first -}}{{- message['content'] -}}{%- endfor -%}", "clean_up_tokenization_spaces": false, "eos_token": "<|endoftext|>", "errors": "replace", 
"extra_special_tokens": {}, "model_max_length": 32768, "pad_token": "<|endoftext|>", "split_special_tokens": false, "tokenizer_class": "Qwen2Tokenizer", "unk_token": null}
vocab.json ADDED
The diff for this file is too large to render. See raw diff