---
license: mit
---

# Nano Language Models - Tiny Model Series

Extremely small language models intended for testing, available in the following configurations.

| Model Name       | BlockSize | VocabSize | Layers | Embd | Q_Heads | KV_Heads | Hidden | NormEps | #Param |
|------------------|-----------|-----------|--------|------|---------|----------|--------|---------|--------|
| Psycho-230k-base | 512       | 4096      | 8      | 32   | 4       | 2        | 96     | 1e-5    | 229920 |
| Nano-230k-base   | 512       | 4096      | 8      | 32   | 4       | 2        | 96     | 1e-5    | 229920 |

The models use the [4096-token vocabulary](https://github.com/bd4sur/Nano/blob/master/tokenizer/tokenizer_4096.json) built in December 2025.

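
The `#Param` column can be reproduced from the other columns. The sketch below is not taken from the Nano source code: it assumes a Llama-style decoder (grouped-query attention, a three-matrix gated FFN, two bias-free norm vectors per layer, one final norm, and an output head that shares weights with the token embedding). The `ModelShape` and `count_params` names are hypothetical. Under those assumptions the total comes out to the 229920 listed in the table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelShape:
    """Hypothetical container for the columns of the table above."""
    vocab_size: int
    n_layer: int
    n_embd: int
    n_head: int       # Q_Heads
    n_kv_head: int    # KV_Heads
    n_hidden: int     # FFN hidden size


def count_params(s: ModelShape) -> int:
    """Count weights for an ASSUMED Llama-style layout with tied embeddings."""
    head_dim = s.n_embd // s.n_head
    kv_dim = s.n_kv_head * head_dim

    embedding = s.vocab_size * s.n_embd        # assumed shared with the output head
    attention = 2 * s.n_embd * s.n_embd        # Wq and Wo
    attention += 2 * s.n_embd * kv_dim         # Wk and Wv (grouped-query attention)
    ffn = 3 * s.n_embd * s.n_hidden            # gate, up, and down projections
    norms = 2 * s.n_embd                       # two norm weight vectors per layer
    per_layer = attention + ffn + norms
    final_norm = s.n_embd

    return embedding + s.n_layer * per_layer + final_norm


shape = ModelShape(vocab_size=4096, n_layer=8, n_embd=32,
                   n_head=4, n_kv_head=2, n_hidden=96)
print(count_params(shape))  # 229920
```

More than half of the total (131072 weights) sits in the embedding table, which is typical for models this small.
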
## Training Parameters

Psycho-230k-base

```
{
    "use_lora": false,
    "lora_rank": 8,
    "lora_alpha": 16,
    "lora_dropout": 0.0,

    "from_checkpoint": "",
    "save_checkpoint_to": "/home/bd4sur/ai/Nano/checkpoint",
    "dataset_path": [
        ["/home/bd4sur/ai/Nano/dataset_preprocessed/pt_train_0.base64", "/home/bd4sur/ai/Nano/dataset_preprocessed/pt_val_0.base64"]
    ],
    "tokenizer_path": "/home/bd4sur/ai/Nano/tokenizer/tokenizer_4096.json",

    "random_seed": 39,
    "batch_size": 256,
    "gradient_accumulation_steps": 1,
    "grad_clip": 1.0,

    "dropout": 0.0,

    "learning_rate": 5e-4,
    "weight_decay": 1e-1,
    "beta1": 0.9,
    "beta2": 0.95,

    "decay_lr": true,
    "warmup_iters": 500,
    "lr_decay_iters": 1e9,
    "min_lr": 6e-5,

    "eval_interval": 100,
    "log_interval": 10,
    "eval_iters": 5,

    "backend": "nccl",
    "device": "cuda",
    "sdp_kernel": "flash",
    "dtype": "bfloat16",
    "use_amp": true
}
```

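
For a sense of scale, the batch settings above can be turned into a tokens-per-iteration figure. This is a rough sketch: it assumes `batch_size` counts sequences of `BlockSize` tokens on a single process, which the source does not state. The `tokens_per_iteration` helper and its `world_size` parameter are hypothetical, not part of Nano.

```python
def tokens_per_iteration(batch_size: int, block_size: int,
                         grad_accum_steps: int, world_size: int = 1) -> int:
    """Tokens consumed by one optimizer step, assuming full-length sequences."""
    return batch_size * block_size * grad_accum_steps * world_size


# Values from the config above and the BlockSize column of the table.
tokens = tokens_per_iteration(batch_size=256, block_size=512, grad_accum_steps=1)
print(tokens)  # 131072
```
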
Nano-230k-base

```
{
    "use_lora": false,
    "lora_rank": 8,
    "lora_alpha": 16,
    "lora_dropout": 0.0,

    "from_checkpoint": "",
    "save_checkpoint_to": "/home/bd4sur/ai/Nano/checkpoint",
    "dataset_path": [
        ["/home/bd4sur/ai/Nano/dataset_preprocessed/pt_1Gtk_512_4096_train.base64", "/home/bd4sur/ai/Nano/dataset_preprocessed/pt_1Gtk_512_4096_valid.base64"]
    ],
    "tokenizer_path": "/home/bd4sur/ai/Nano/tokenizer/tokenizer_4096.json",

    "random_seed": 39,
    "batch_size": 256,
    "gradient_accumulation_steps": 1,
    "grad_clip": 1.0,

    "dropout": 0.0,

    "learning_rate": 5e-4,
    "weight_decay": 1e-1,
    "beta1": 0.9,
    "beta2": 0.95,

    "decay_lr": true,
    "warmup_iters": 500,
    "lr_decay_iters": 1e9,
    "min_lr": 6e-5,

    "eval_interval": 100,
    "log_interval": 10,
    "eval_iters": 5,

    "backend": "nccl",
    "device": "cuda",
    "sdp_kernel": "flash",
    "dtype": "bfloat16",
    "use_amp": true
}
```
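
The `decay_lr`, `warmup_iters`, `lr_decay_iters`, and `min_lr` fields suggest a warmup-then-decay learning-rate schedule. The sketch below assumes linear warmup followed by cosine decay, a common pairing for these field names. The source does not confirm that Nano uses this exact formula, and the `get_lr` function is hypothetical.

```python
import math


def get_lr(it: int,
           learning_rate: float = 5e-4,
           min_lr: float = 6e-5,
           warmup_iters: int = 500,
           lr_decay_iters: float = 1e9,
           decay_lr: bool = True) -> float:
    """Learning rate at iteration `it` under an ASSUMED warmup + cosine schedule."""
    if not decay_lr:
        return learning_rate
    if it < warmup_iters:
        # Linear warmup from 0 up to the peak learning rate.
        return learning_rate * it / warmup_iters
    if it > lr_decay_iters:
        # Past the decay horizon the rate stays at its floor.
        return min_lr
    # Cosine decay from learning_rate down to min_lr.
    decay_ratio = (it - warmup_iters) / (lr_decay_iters - warmup_iters)
    coeff = 0.5 * (1.0 + math.cos(math.pi * decay_ratio))
    return min_lr + coeff * (learning_rate - min_lr)


for step in (0, 250, 500, 10_000):
    print(step, get_lr(step))
```

If this assumption holds, `lr_decay_iters` of 1e9 is so far beyond any realistic run length that the cosine term barely moves: after the 500-iteration warmup, the rate stays essentially at 5e-4 and `min_lr` is never approached.
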