# Nano Language Models: Tiny Model Series

Extremely small language models intended for testing, available in the following configurations.
| Model Name | BlockSize | VocabSize | Layers | Embd | Q_Heads | KV_Heads | Hidden | NormEps | #Param |
|---|---|---|---|---|---|---|---|---|---|
| Psycho-230k-base | 512 | 4096 | 8 | 32 | 4 | 2 | 96 | 1e-5 | 229920 |
| Nano-230k-base | 512 | 4096 | 8 | 32 | 4 | 2 | 96 | 1e-5 | 229920 |
Both models use the 4096-entry vocabulary built in December 2025.
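The table's `#Param` figure can be reproduced by hand. The sketch below assumes a Llama-style decoder block (grouped-query attention, SwiGLU MLP, two RMSNorms per block, embedding matrix tied with the output head); the architecture details are an assumption, since the card only lists the figures, but under these assumptions the total comes out to exactly the table's 229,920.

```python
# Parameter count for the 230k models, assuming a Llama-style decoder:
# grouped-query attention (Q_Heads=4, KV_Heads=2), SwiGLU MLP, RMSNorm,
# and an embedding matrix tied with the output head. These architectural
# details are an assumption -- only the table's numbers are given.

def count_params(vocab=4096, embd=32, layers=8,
                 q_heads=4, kv_heads=2, hidden=96):
    head_dim = embd // q_heads                  # 8
    kv_dim = kv_heads * head_dim                # 16
    attn = 2 * embd * embd + 2 * embd * kv_dim  # Wq, Wo + Wk, Wv
    mlp = 3 * embd * hidden                     # gate, up, down (SwiGLU)
    norms = 2 * embd                            # two RMSNorms per block
    per_layer = attn + mlp + norms              # 12352
    embedding = vocab * embd                    # 131072, tied with lm_head
    final_norm = embd
    return embedding + layers * per_layer + final_norm

print(count_params())  # -> 229920, matching the table
```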
## Training Hyperparameters

The two configurations below are identical except for `dataset_path`.
### Psycho-230k-base

```json
{
    "use_lora": false,
    "lora_rank": 8,
    "lora_alpha": 16,
    "lora_dropout": 0.0,
    "from_checkpoint": "",
    "save_checkpoint_to": "/home/bd4sur/ai/Nano/checkpoint",
    "dataset_path": [
        ["/home/bd4sur/ai/Nano/dataset_preprocessed/pt_train_0.base64", "/home/bd4sur/ai/Nano/dataset_preprocessed/pt_val_0.base64"]
    ],
    "tokenizer_path": "/home/bd4sur/ai/Nano/tokenizer/tokenizer_4096.json",
    "random_seed": 39,
    "batch_size": 256,
    "gradient_accumulation_steps": 1,
    "grad_clip": 1.0,
    "dropout": 0.0,
    "learning_rate": 5e-4,
    "weight_decay": 1e-1,
    "beta1": 0.9,
    "beta2": 0.95,
    "decay_lr": true,
    "warmup_iters": 500,
    "lr_decay_iters": 1e9,
    "min_lr": 6e-5,
    "eval_interval": 100,
    "log_interval": 10,
    "eval_iters": 5,
    "backend": "nccl",
    "device": "cuda",
    "sdp_kernel": "flash",
    "dtype": "bfloat16",
    "use_amp": true
}
```
### Nano-230k-base

```json
{
    "use_lora": false,
    "lora_rank": 8,
    "lora_alpha": 16,
    "lora_dropout": 0.0,
    "from_checkpoint": "",
    "save_checkpoint_to": "/home/bd4sur/ai/Nano/checkpoint",
    "dataset_path": [
        ["/home/bd4sur/ai/Nano/dataset_preprocessed/pt_1Gtk_512_4096_train.base64", "/home/bd4sur/ai/Nano/dataset_preprocessed/pt_1Gtk_512_4096_valid.base64"]
    ],
    "tokenizer_path": "/home/bd4sur/ai/Nano/tokenizer/tokenizer_4096.json",
    "random_seed": 39,
    "batch_size": 256,
    "gradient_accumulation_steps": 1,
    "grad_clip": 1.0,
    "dropout": 0.0,
    "learning_rate": 5e-4,
    "weight_decay": 1e-1,
    "beta1": 0.9,
    "beta2": 0.95,
    "decay_lr": true,
    "warmup_iters": 500,
    "lr_decay_iters": 1e9,
    "min_lr": 6e-5,
    "eval_interval": 100,
    "log_interval": 10,
    "eval_iters": 5,
    "backend": "nccl",
    "device": "cuda",
    "sdp_kernel": "flash",
    "dtype": "bfloat16",
    "use_amp": true
}
```
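The `decay_lr` / `warmup_iters` / `lr_decay_iters` / `min_lr` fields suggest a linear-warmup plus cosine-decay schedule, as popularized by nanoGPT; whether Nano's trainer uses exactly this formula is an assumption. A minimal sketch with the values from the configs above:

```python
import math

# Warmup + cosine learning-rate schedule implied by the config fields
# above (learning_rate=5e-4, warmup_iters=500, lr_decay_iters=1e9,
# min_lr=6e-5). This is the common nanoGPT-style schedule; the exact
# formula used by Nano's trainer is an assumption.

LEARNING_RATE = 5e-4
WARMUP_ITERS = 500
LR_DECAY_ITERS = int(1e9)
MIN_LR = 6e-5

def get_lr(it):
    if it < WARMUP_ITERS:                      # linear warmup from 0
        return LEARNING_RATE * it / WARMUP_ITERS
    if it > LR_DECAY_ITERS:                    # after decay: hold at floor
        return MIN_LR
    ratio = (it - WARMUP_ITERS) / (LR_DECAY_ITERS - WARMUP_ITERS)
    coeff = 0.5 * (1.0 + math.cos(math.pi * ratio))  # decays 1 -> 0
    return MIN_LR + coeff * (LEARNING_RATE - MIN_LR)
```

Note that with `lr_decay_iters` set to 1e9, the rate stays close to the 5e-4 peak for any practical run length under this schedule. Also, with `batch_size` 256, `BlockSize` 512, and no gradient accumulation, each optimizer step processes 256 × 512 = 131,072 tokens.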