
nanochat training report

Generated: 2025-12-17 09:42:24

Environment

Git Information

  • Branch: recursive
  • Commit: 427e3f4 (dirty)
  • Message: skip identity conv gen if present

Hardware

  • Platform: Linux
  • CPUs: 128 cores (256 logical)
  • Memory: 1511.5 GB
  • GPUs: 8x NVIDIA H100 80GB HBM3
  • GPU Memory: 632.8 GB total
  • CUDA Version: 12.8
  • Hourly Rate: $24.00/hour

Software

  • Python: 3.10.12
  • PyTorch: 2.8.0+cu128

Bloat

  • Characters: 464,857
  • Lines: 11,239
  • Files: 55
  • Tokens (approx): 116,214
  • Dependencies (uv.lock lines): 2,254

Run started: 2025-12-17 09:42:27


Midtraining

timestamp: 2025-12-17 11:15:09

  • run: recursive-d20
  • device_type:
  • dtype: bfloat16
  • num_iterations: -1
  • max_seq_len: 2048
  • device_batch_size: 4
  • unembedding_lr: 0.0040
  • embedding_lr: 0.2000
  • matrix_lr: 0.0200
  • init_lr_frac: 1.0000
  • weight_decay: 0.0000
  • eval_every: 150
  • eval_tokens: 10,485,760
  • total_batch_size: 524,288
  • dry_run: 0
  • Number of iterations: 810
  • DDP world size: 8
  • Minimum validation bpb: 0.4178
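
The batch settings above imply a gradient accumulation factor and a total token count for midtraining. The sketch below works them out, assuming that total_batch_size is measured in tokens and that every DDP rank processes device_batch_size full-length sequences per forward/backward pass. The function name grad_accum_steps is illustrative and is not taken from the training scripts.

```python
# Values copied from the Midtraining section of this report.
total_batch_size = 524_288   # assumed to be tokens per optimizer step
device_batch_size = 4        # sequences per GPU per forward/backward pass
max_seq_len = 2048           # tokens per sequence
ddp_world_size = 8           # number of DDP ranks (one per GPU)
num_iterations = 810         # "Number of iterations" reported above


def grad_accum_steps(total_tokens_per_step, per_device_batch, seq_len, world_size):
    """Return how many forward/backward passes make up one optimizer step.

    Hypothetical helper: assumes the token budget divides evenly across passes.
    """
    tokens_per_pass = per_device_batch * seq_len * world_size
    if total_tokens_per_step % tokens_per_pass != 0:
        raise ValueError("total batch size is not a multiple of tokens per pass")
    return total_tokens_per_step // tokens_per_pass


accum = grad_accum_steps(total_batch_size, device_batch_size, max_seq_len, ddp_world_size)
total_tokens = num_iterations * total_batch_size

print(f"gradient accumulation steps: {accum}")      # 8
print(f"tokens seen in midtraining: {total_tokens:,}")  # 424,673,280
```

Under these assumptions each optimizer step combines 8 passes of 65,536 tokens, and the 810 iterations cover roughly 425 million tokens.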

Chat SFT

timestamp: 2025-12-17 12:09:10

  • run: recursive-d20
  • source: mid
  • device_type:
  • dtype: bfloat16
  • device_batch_size: 4
  • num_epochs: 1
  • num_iterations: -1
  • target_examples_per_step: 32
  • unembedding_lr: 0.0040
  • embedding_lr: 0.2000
  • matrix_lr: 0.0200
  • weight_decay: 0.0000
  • init_lr_frac: 0.0200
  • eval_every: 100
  • eval_steps: 100
  • eval_metrics_every: 200
  • eval_metrics_max_problems: 1024
  • Training rows: 22,440
  • Number of iterations: 701
  • Training loss: 1.4988
  • Validation loss: 1.0783
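
The reported iteration count is consistent with one epoch over the training rows at 32 examples per step, with the final partial batch dropped. The sketch below checks that arithmetic. The floor division is an inference from the reported numbers rather than confirmed behavior of the SFT script, and the world size of 8 is carried over from the Midtraining section because this section does not state it.

```python
# Values copied from the Chat SFT section of this report.
training_rows = 22_440
target_examples_per_step = 32
device_batch_size = 4
num_epochs = 1

# Assumption: same 8-GPU DDP setup as reported for midtraining.
ddp_world_size = 8

# Examples consumed by one forward/backward pass across all ranks.
examples_per_pass = device_batch_size * ddp_world_size

# Passes needed to reach the target examples per optimizer step.
accum_sft = target_examples_per_step // examples_per_pass

# Full steps in the run, assuming a trailing partial batch is dropped.
steps = (training_rows * num_epochs) // target_examples_per_step
leftover = (training_rows * num_epochs) % target_examples_per_step

print(f"gradient accumulation steps: {accum_sft}")  # 1
print(f"optimizer steps: {steps}")                  # 701
print(f"examples left over: {leftover}")            # 8
```

With these settings no gradient accumulation is needed, and the 8 leftover examples would account for the difference between 22,440 rows and 701 × 32 = 22,432 examples.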

Summary

Metric BASE MID SFT RL

Total wall clock time: 2h26m
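
The wall clock total can be cross-checked against the timestamps in this report and turned into an approximate cost using the hourly rate from the Hardware section. This is a rough sketch. It assumes each stage timestamp marks when that stage's report section was written, and that the node is billed at a flat rate for wall clock time only.

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S"

# Timestamps and rate copied from this report.
run_started = datetime.strptime("2025-12-17 09:42:27", FMT)
midtraining_stamp = datetime.strptime("2025-12-17 11:15:09", FMT)
sft_stamp = datetime.strptime("2025-12-17 12:09:10", FMT)
hourly_rate_usd = 24.00

# Elapsed time from run start to the last stage timestamp.
elapsed = sft_stamp - run_started
print(f"start -> midtraining stamp: {midtraining_stamp - run_started}")
print(f"start -> SFT stamp:         {elapsed}")

# Cost based on the reported total of 2h26m (146 minutes).
reported_minutes = 2 * 60 + 26
cost_usd = reported_minutes * hourly_rate_usd / 60
print(f"approximate cost: ${cost_usd:.2f}")  # $58.40
```

The gap between the run start and the SFT timestamp is 2 hours, 26 minutes and 43 seconds, which agrees with the reported 2h26m. At $24.00/hour that comes to about $58.40.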