Update README.md

README.md CHANGED

````diff
@@ -5,12 +5,13 @@ language:
 tags:
 - uzbek
 - english
-- sft
 - chat
 - transformers
 pipeline_tag: text-generation
 library_name: transformers
 license: other
+base_model:
+- Qwen/Qwen3-4B
 ---
 
 # NeuronAI-Uzbek
@@ -26,12 +27,12 @@ NeuronAI-Uzbek is a Qwen3-family causal language model fine-tuned to be helpful
 - **Attention heads**: 32 (KV heads: 8)
 - **Vocab size**: 180,000
 - **Max position embeddings**: 40,960 (model config)
-- **Generation defaults**
+- **Generation defaults**
   - `temperature=0.6`
   - `top_p=0.95`
   - `top_k=20`
 
-Note:
+Note: This model is from the **Qwen3** family and is intended to be used with recent `transformers`.
 
 ## Training data (token counts)
 
@@ -42,7 +43,7 @@ This model was trained on a mixture of:
 
 Total: **2.0B tokens**.
 
-## Training process
+## Training process
 
 We trained NeuronAI-Uzbek in stages:
 
@@ -55,17 +56,6 @@ We trained NeuronAI-Uzbek in stages:
    - Continued training / adaptation on the mixed corpus (2.0B tokens total) to improve Uzbek capability while retaining English.
 
 3. **Supervised fine-tuning (SFT)**
-   - Final fine-tuning checkpoint is stored under `runs/honest_sft/final` during training and uploaded here.
-   - Key hyperparameters recovered from `training_args.bin`:
-     - **Epochs**: 1
-     - **Learning rate**: 5e-6
-     - **Scheduler**: cosine, **warmup_ratio**: 0.03
-     - **Optimizer**: `paged_adamw_8bit`
-     - **Per-device train batch size**: 2
-     - **Gradient accumulation**: 4
-     - **Gradient checkpointing**: enabled
-     - **Seed**: 42
-     - **bf16**: enabled
 
 4. **Export**
    - Exported weights to `safetensors` shards + index.
@@ -174,4 +164,4 @@ If you use this model, please cite the repository:
   howpublished = {\url{https://huggingface.co/NeuronUz/NeuronAI-Uzbek}},
   year = {2025}
 }
-```
+```
````
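
The generation defaults kept in the card (`temperature=0.6`, `top_p=0.95`, `top_k=20`) can be illustrated with a minimal sketch of temperature scaling plus top-k and top-p (nucleus) filtering over a logit vector. This is an illustrative stand-alone implementation, not code from the repository; in practice `transformers` applies these filters internally during `generate()`:

```python
import math

def filter_logits(logits, temperature=0.6, top_k=20, top_p=0.95):
    """Apply temperature, then top-k and top-p filtering to raw logits.

    Returns a probability distribution in which only tokens that survive
    both filters carry non-zero mass.
    """
    # Temperature < 1 sharpens the distribution; > 1 flattens it.
    scaled = [l / temperature for l in logits]

    # Numerically stable softmax over the scaled logits.
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Top-k: keep only the k most probable tokens.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    keep = set(order[:top_k])

    # Top-p: within the kept tokens, keep the smallest prefix whose
    # cumulative probability reaches top_p.
    cum, nucleus = 0.0, set()
    for i in order:
        if i not in keep:
            break
        nucleus.add(i)
        cum += probs[i]
        if cum >= top_p:
            break

    # Renormalise the surviving tokens so they sum to 1.
    mass = sum(probs[i] for i in nucleus)
    return [probs[i] / mass if i in nucleus else 0.0 for i in range(len(probs))]
```

With `top_p=0.95` the long tail of unlikely tokens is dropped before sampling, and `temperature=0.6` makes the surviving distribution sharper, which favours more deterministic completions.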