kmamaroziqov committed on
Commit 86b5842 · verified · 1 Parent(s): 0322d7f

Update README.md

Files changed (1):
  1. README.md +6 −16
README.md CHANGED
@@ -5,12 +5,13 @@ language:
 tags:
 - uzbek
 - english
-- sft
 - chat
 - transformers
 pipeline_tag: text-generation
 library_name: transformers
 license: other
+base_model:
+- Qwen/Qwen3-4B
 ---
 
 # NeuronAI-Uzbek
@@ -26,12 +27,12 @@ NeuronAI-Uzbek is a Qwen3-family causal language model fine-tuned to be helpful
 - **Attention heads**: 32 (KV heads: 8)
 - **Vocab size**: 180,000
 - **Max position embeddings**: 40,960 (model config)
-- **Generation defaults** (from `generation_config.json`)
+- **Generation defaults**
   - `temperature=0.6`
   - `top_p=0.95`
   - `top_k=20`
 
-Note: the original base checkpoint name was not saved in `config.json` (`_name_or_path` is `null`). This model is from the **Qwen3** family and is intended to be used with recent `transformers`.
+Note: This model is from the **Qwen3** family and is intended to be used with recent `transformers`.
 
 ## Training data (token counts)
 
@@ -42,7 +43,7 @@ This model was trained on a mixture of:
 
 Total: **2.0B tokens**.
 
-## Training process (high-level)
+## Training process
 
 We trained NeuronAI-Uzbek in stages:
 
@@ -55,17 +56,6 @@ We trained NeuronAI-Uzbek in stages:
   - Continued training / adaptation on the mixed corpus (2.0B tokens total) to improve Uzbek capability while retaining English.
 
 3. **Supervised fine-tuning (SFT)**
-  - Final fine-tuning checkpoint is stored under `runs/honest_sft/final` during training and uploaded here.
-  - Key hyperparameters recovered from `training_args.bin`:
-    - **Epochs**: 1
-    - **Learning rate**: 5e-6
-    - **Scheduler**: cosine, **warmup_ratio**: 0.03
-    - **Optimizer**: `paged_adamw_8bit`
-    - **Per-device train batch size**: 2
-    - **Gradient accumulation**: 4
-    - **Gradient checkpointing**: enabled
-    - **Seed**: 42
-    - **bf16**: enabled
 
 4. **Export**
   - Exported weights to `safetensors` shards + index.
@@ -174,4 +164,4 @@ If you use this model, please cite the repository:
   howpublished = {\url{https://huggingface.co/NeuronUz/NeuronAI-Uzbek}},
   year = {2025}
 }
-```
+```
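
The generation defaults kept by this commit (`temperature=0.6`, `top_p=0.95`, `top_k=20`) compose in a fixed order during decoding: temperature scaling, then top-k truncation, then top-p (nucleus) truncation, then sampling. A minimal pure-Python sketch of that pipeline — an illustration of the standard technique, not the model's actual (tensor-based) decoder:

```python
import math
import random

def sample_next_token(logits, temperature=0.6, top_p=0.95, top_k=20, rng=None):
    """Sample one token id from raw logits using temperature scaling,
    then top-k truncation, then top-p (nucleus) truncation.
    Pure-Python sketch; real decoders do this on GPU tensors."""
    rng = rng or random.Random(0)
    # 1) Temperature: values < 1 sharpen the distribution, > 1 flatten it.
    scaled = [l / temperature for l in logits]
    # 2) Numerically stable softmax over the full vocabulary.
    m = max(scaled)
    exps = [math.exp(l - m) for l in scaled]
    total = sum(exps)
    probs = [(i, e / total) for i, e in enumerate(exps)]
    # 3) Top-k: keep only the k most probable tokens.
    probs.sort(key=lambda t: t[1], reverse=True)
    probs = probs[:top_k]
    # 4) Top-p: keep the smallest prefix whose cumulative mass reaches top_p.
    kept, mass = [], 0.0
    for tok, p in probs:
        kept.append((tok, p))
        mass += p
        if mass >= top_p:
            break
    # 5) Renormalise the surviving tokens and draw one.
    norm = sum(p for _, p in kept)
    r = rng.random() * norm
    for tok, p in kept:
        r -= p
        if r <= 0:
            return tok
    return kept[-1][0]
```

With these defaults, a strongly dominant logit is effectively always chosen, while closely matched logits still yield varied samples — which is why low-temperature nucleus sampling is a common chat-model default.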