---
# Quantization Description

This model is quantized using *selective quantization* from the Qwen2.5-Coder-0.5B base model to increase its speed while preserving its capabilities in generating relevant and accurate responses related to Python programming.

The quantization method included *32-bit* quantization of the following layers:
- q_proj
- v_proj

…

The remaining layers were quantized to *q3_k_l*.
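The per-layer selection described above can be sketched as a simple name-based dispatch. This is an illustrative sketch only — the layer-name patterns and the `choose_format` helper are hypothetical stand-ins, not the actual GGUF conversion tooling:

```python
# Selective quantization sketch: attention projections named q_proj and
# v_proj keep full 32-bit precision; every other layer falls back to the
# low-bit q3_k_l format. Names below are illustrative.

FULL_PRECISION_LAYERS = {"q_proj", "v_proj"}

def choose_format(layer_name: str) -> str:
    """Return the quantization format for a layer, keyed on its leaf name."""
    leaf = layer_name.split(".")[-1]
    return "f32" if leaf in FULL_PRECISION_LAYERS else "q3_k_l"

layers = [
    "model.layers.0.self_attn.q_proj",
    "model.layers.0.self_attn.k_proj",
    "model.layers.0.self_attn.v_proj",
    "model.layers.0.mlp.gate_proj",
]
plan = {name: choose_format(name) for name in layers}
for name, fmt in plan.items():
    print(f"{name} -> {fmt}")
```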
---
# Model Architecture

    Qwen2ForCausalLM(
      (model): Qwen2Model(
        (embed_tokens): Embedding(151936, 896, padding_idx=151665)
        (layers): ModuleList(
          …
        )
        (rotary_emb): LlamaRotaryEmbedding()
      )
      (lm_head): Linear(in_features=896, out_features=151936, bias=False)
    )
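As a quick sanity check on the dimensions in the architecture dump above (this arithmetic is derived from the printed shapes, not stated in the model card): the embedding table and the `lm_head` both map between the 151,936-token vocabulary and the 896-dimensional hidden size.

```python
# Parameter counts implied by the printed shapes.
vocab_size = 151936   # rows of embed_tokens / out_features of lm_head
hidden_size = 896     # embedding dimension / in_features of lm_head

embed_params = vocab_size * hidden_size    # embed_tokens weight entries
lm_head_params = hidden_size * vocab_size  # lm_head weight entries (bias=False)

print(f"embed_tokens parameters: {embed_params:,}")   # 136,134,656
print(f"lm_head parameters:      {lm_head_params:,}") # 136,134,656
```

Each of these two matrices alone accounts for roughly 136M of the model's ~0.5B parameters, which is why the embedding and output layers dominate the file size at this scale.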
---
# Performance & Limitations