nanochat training report
Generated: 2025-12-17 09:42:24
Environment
Git Information
- Branch: recursive
- Commit: 427e3f4 (dirty)
- Message: skip identity conv gen if present
Hardware
- Platform: Linux
- CPUs: 128 cores (256 logical)
- Memory: 1511.5 GB
- GPUs: 8x NVIDIA H100 80GB HBM3
- GPU Memory: 632.8 GB total
- CUDA Version: 12.8
- Hourly Rate: $24.00/hour
Software
- Python: 3.10.12
- PyTorch: 2.8.0+cu128
Bloat
- Characters: 464,857
- Lines: 11,239
- Files: 55
- Tokens (approx): 116,214
- Dependencies (uv.lock lines): 2,254
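The approximate token count equals the character count divided by four and rounded down (464,857 // 4 = 116,214). The sketch below reproduces that figure. It assumes the report uses a rule of thumb of roughly 4 characters per token. The function and constant names are hypothetical.

```python
# Assumption: "Tokens (approx)" is derived from the character count using a
# ~4 characters-per-token heuristic with integer division.
CHARS_PER_TOKEN = 4  # hypothetical constant name


def approx_tokens(num_chars: int, chars_per_token: int = CHARS_PER_TOKEN) -> int:
    """Estimate a token count from a raw character count (rounded down)."""
    return num_chars // chars_per_token


print(f"{approx_tokens(464_857):,}")  # -> 116,214
```

This is only a rough estimate. The real count depends on the tokenizer and on what kind of text the repository contains.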
Run started: 2025-12-17 09:42:27
Midtraining
timestamp: 2025-12-17 11:15:09
- run: recursive-d20
- device_type:
- dtype: bfloat16
- num_iterations: -1
- max_seq_len: 2048
- device_batch_size: 4
- unembedding_lr: 0.0040
- embedding_lr: 0.2000
- matrix_lr: 0.0200
- init_lr_frac: 1.0000
- weight_decay: 0.0000
- eval_every: 150
- eval_tokens: 10,485,760
- total_batch_size: 524,288
- dry_run: 0
- Number of iterations: 810
- DDP world size: 8
- Minimum validation bpb: 0.4178
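The midtraining settings fit together through simple batch arithmetic. The sketch below assumes three things:

- `total_batch_size` counts tokens per optimizer step.
- Each of the 8 DDP ranks processes `device_batch_size` sequences of `max_seq_len` tokens per forward/backward pass.
- `eval_tokens` is split into passes of that same size.

The helper name is hypothetical.

```python
# Hypothetical helper reproducing the midtraining batch arithmetic.
# Assumptions: total_batch_size is tokens per optimizer step; every rank runs
# device_batch_size sequences of max_seq_len tokens per micro-step.


def grad_accum_steps(total_batch_size: int, device_batch_size: int,
                     max_seq_len: int, world_size: int) -> int:
    tokens_per_micro_step = device_batch_size * max_seq_len * world_size
    if total_batch_size % tokens_per_micro_step != 0:
        raise ValueError("total_batch_size must be a multiple of tokens per micro-step")
    return total_batch_size // tokens_per_micro_step


device_batch_size, max_seq_len, world_size = 4, 2048, 8
total_batch_size = 524_288
num_iterations = 810
eval_tokens = 10_485_760

tokens_per_micro_step = device_batch_size * max_seq_len * world_size
accum = grad_accum_steps(total_batch_size, device_batch_size, max_seq_len, world_size)
tokens_trained = num_iterations * total_batch_size  # assumes every step used a full batch
eval_steps = eval_tokens // tokens_per_micro_step

print(f"tokens per micro-step: {tokens_per_micro_step:,}")  # 65,536
print(f"grad accumulation steps: {accum}")                   # 8
print(f"tokens trained: {tokens_trained:,}")                  # 424,673,280
print(f"forward passes per evaluation: {eval_steps}")         # 160
```

Under these assumptions, each optimizer step accumulates gradients over 8 micro-steps. The 810 iterations then cover about 425M tokens.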
Chat SFT
timestamp: 2025-12-17 12:09:10
- run: recursive-d20
- source: mid
- device_type:
- dtype: bfloat16
- device_batch_size: 4
- num_epochs: 1
- num_iterations: -1
- target_examples_per_step: 32
- unembedding_lr: 0.0040
- embedding_lr: 0.2000
- matrix_lr: 0.0200
- weight_decay: 0.0000
- init_lr_frac: 0.0200
- eval_every: 100
- eval_steps: 100
- eval_metrics_every: 200
- eval_metrics_max_problems: 1024
- Training rows: 22,440
- Number of iterations: 701
- Training loss: 1.4988
- Validation loss: 1.0783
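The SFT iteration count follows from the number of training rows and the examples per step: 22,440 // 32 = 701. The sketch below checks this and derives the nominal starting learning rates. It assumes two things the SFT section does not state:

- The run used 8 DDP ranks, as in midtraining.
- Each starting rate is the base rate multiplied by `init_lr_frac`.

The sketch does not model any per-parameter rescaling the training script may apply.

```python
# Assumptions: 8 DDP ranks (as in midtraining); iterations computed as
# (training rows // examples per step) * epochs; initial LR = base LR * init_lr_frac.
device_batch_size = 4
world_size = 8
target_examples_per_step = 32
training_rows = 22_440
num_epochs = 1
init_lr_frac = 0.02
base_lrs = {"unembedding_lr": 0.004, "embedding_lr": 0.2, "matrix_lr": 0.02}

examples_per_micro_step = device_batch_size * world_size
if target_examples_per_step % examples_per_micro_step != 0:
    raise ValueError("target_examples_per_step must divide evenly across ranks")
grad_accum_steps = target_examples_per_step // examples_per_micro_step  # 1
num_iterations = (training_rows // target_examples_per_step) * num_epochs  # 701

# Nominal learning rates at step 0, scaled down by init_lr_frac.
initial_lrs = {name: lr * init_lr_frac for name, lr in base_lrs.items()}

print(grad_accum_steps, num_iterations)
for name, lr in initial_lrs.items():
    print(f"{name}: {lr:.5f}")
```

With 4 × 8 = 32 examples per micro-step, there is no gradient accumulation in SFT. The floor division drops the last 8 of the 22,440 rows.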
Summary
| Metric | BASE | MID | SFT | RL |
|--------|------|-----|-----|----|
Total wall clock time: 2h26m
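The hourly rate from the Hardware section and the total wall-clock time give a rough cost for the run. This minimal sketch assumes the $24.00/hour rate applied to the whole reported wall-clock time. The parsing helper is hypothetical.

```python
import re


def parse_wall_clock(text: str) -> float:
    """Parse a duration like '2h26m' into hours (hypothetical helper)."""
    match = re.fullmatch(r"(?:(\d+)h)?(?:(\d+)m)?", text)
    if match is None or not text:
        raise ValueError(f"unrecognized duration: {text!r}")
    hours = int(match.group(1) or 0)
    minutes = int(match.group(2) or 0)
    return hours + minutes / 60


hourly_rate = 24.00  # $/hour, from the Hardware section
hours = parse_wall_clock("2h26m")
cost = hours * hourly_rate
print(f"{hours:.2f} h x ${hourly_rate:.2f}/h = ${cost:.2f}")  # 2.43 h x $24.00/h = $58.40
```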