
nanochat training report

Generated: 2025-12-17 09:42:24

Environment

Git Information

Branch: recursive
Commit: 427e3f4 (dirty)
Message: skip identity conv gen if present

Hardware

Platform: Linux
CPUs: 128 cores (256 logical)
Memory: 1511.5 GB
GPUs: 8x NVIDIA H100 80GB HBM3
GPU Memory: 632.8 GB total
CUDA Version: 12.8
Hourly Rate: $24.00/hour

Software

Python: 3.10.12
PyTorch: 2.8.0+cu128

Bloat

Characters: 464,857
Lines: 11,239
Files: 55
Tokens (approx): 116,214
Dependencies (uv.lock lines): 2,254
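
The approximate token count above appears to be a simple characters-per-token estimate rather than the output of a real tokenizer. The following minimal sketch checks that reading against the two reported figures; the divide-by-4 heuristic is an assumption inferred from the numbers, not something the report states.

```python
# Figures copied from the Bloat section of this report.
characters = 464_857
tokens_approx = 116_214

# Assumption: the token figure is a fixed characters-per-token heuristic,
# not a count produced by running a tokenizer over the codebase.
chars_per_token = characters / tokens_approx
print(f"{chars_per_token:.2f} characters per token")

# Integer division by 4 reproduces the reported token figure.
matches_heuristic = (characters // 4) == tokens_approx
print(matches_heuristic)
```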

Run started: 2025-12-17 09:42:27

Midtraining

timestamp: 2025-12-17 11:15:09

run: recursive-d20
device_type:
dtype: bfloat16
num_iterations: -1
max_seq_len: 2048
device_batch_size: 4
unembedding_lr: 0.0040
embedding_lr: 0.2000
matrix_lr: 0.0200
init_lr_frac: 1.0000
weight_decay: 0.0000
eval_every: 150
eval_tokens: 10,485,760
total_batch_size: 524,288
dry_run: 0
Number of iterations: 810
DDP world size: 8
Minimum validation bpb: 0.4178
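
The batch settings above imply a gradient accumulation factor and a total token budget that the report does not print. The sketch below derives them under two assumptions: `total_batch_size` is measured in tokens, and each of the 8 ranks processes `device_batch_size` sequences of `max_seq_len` tokens per micro-step. The variable names mirror the report's keys; the derived names are illustrative.

```python
# Values copied from the Midtraining section of this report.
max_seq_len = 2048
device_batch_size = 4
world_size = 8              # "DDP world size"
total_batch_size = 524_288  # assumed to be in tokens
num_iterations = 810        # "Number of iterations"
eval_tokens = 10_485_760

# Tokens processed across all ranks in one forward/backward pass.
tokens_per_micro_step = device_batch_size * max_seq_len * world_size

# The total batch must divide evenly into micro-steps for this reading to hold.
assert total_batch_size % tokens_per_micro_step == 0
grad_accum_steps = total_batch_size // tokens_per_micro_step

# Total tokens seen over the whole midtraining run.
tokens_trained = num_iterations * total_batch_size

# Micro-steps needed to cover the evaluation token budget.
eval_micro_steps = eval_tokens // tokens_per_micro_step

print(f"tokens per micro-step: {tokens_per_micro_step:,}")
print(f"gradient accumulation steps: {grad_accum_steps}")
print(f"tokens trained: {tokens_trained:,}")
print(f"eval micro-steps: {eval_micro_steps}")
```

Under these assumptions each optimizer step accumulates 8 micro-steps, and the 810 iterations cover roughly 425 million tokens.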

Chat SFT

timestamp: 2025-12-17 12:09:10

run: recursive-d20
source: mid
device_type:
dtype: bfloat16
device_batch_size: 4
num_epochs: 1
num_iterations: -1
target_examples_per_step: 32
unembedding_lr: 0.0040
embedding_lr: 0.2000
matrix_lr: 0.0200
weight_decay: 0.0000
init_lr_frac: 0.0200
eval_every: 100
eval_steps: 100
eval_metrics_every: 200
eval_metrics_max_problems: 1024
Training rows: 22,440
Number of iterations: 701
Training loss: 1.4988
Validation loss: 1.0783
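
The reported iteration count is consistent with one pass over the training rows at 32 examples per step. The sketch below reproduces it, assuming the same 8-GPU world size as the Midtraining stage (this section does not state it) and assuming that a trailing partial batch is dropped.

```python
# Values copied from the Chat SFT section of this report.
training_rows = 22_440
target_examples_per_step = 32
device_batch_size = 4
num_epochs = 1

# Assumption: same 8-rank setup as the Midtraining stage.
world_size = 8

# Examples processed across all ranks in one forward/backward pass.
examples_per_micro_step = device_batch_size * world_size
grad_accum_steps = target_examples_per_step // examples_per_micro_step

# Assumption: the final partial batch (8 leftover rows here) is dropped.
iterations = (training_rows // target_examples_per_step) * num_epochs
leftover_rows = training_rows % target_examples_per_step

print(f"gradient accumulation steps: {grad_accum_steps}")
print(f"iterations: {iterations}")
print(f"leftover rows: {leftover_rows}")
```

With 4 examples per device on 8 devices, one micro-step already holds 32 examples, so no gradient accumulation is needed at this setting.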

Summary

Metric	BASE	MID	SFT	RL

Total wall clock time: 2h26m
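
A rough cost for the run follows from the hourly rate in the Hardware section. The sketch below assumes the $24.00/hour rate covers the whole 8-GPU node and is billed for the full wall clock time. It also cross-checks the 2h26m figure against the "Run started" and Chat SFT timestamps, on the assumption that wall clock time is measured between those two points.

```python
from datetime import datetime

# Values copied from this report.
hourly_rate_usd = 24.00
wall_clock_minutes = 2 * 60 + 26  # "2h26m"

# Assumption: the rate applies to the whole node for the entire run.
estimated_cost_usd = hourly_rate_usd * wall_clock_minutes / 60
print(f"estimated cost: ${estimated_cost_usd:.2f}")  # estimated cost: $58.40

# Cross-check: elapsed time between run start and the last stage timestamp.
run_started = datetime.fromisoformat("2025-12-17 09:42:27")
sft_finished = datetime.fromisoformat("2025-12-17 12:09:10")
elapsed_minutes = int((sft_finished - run_started).total_seconds() // 60)
print(f"elapsed minutes from timestamps: {elapsed_minutes}")
```

The estimate covers only the stages recorded in this report, namely Midtraining and Chat SFT.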