JonathanMiddleton committed on
Commit 3a8fbd1 · verified · 1 Parent(s): e1b7a80

Delete files token_bytes.pt tokenizer.pkl report.md meta_000650.json with huggingface_hub

Files changed (4)
  1. meta_000650.json +0 -14
  2. report.md +0 -269
  3. token_bytes.pt +0 -3
  4. tokenizer.pkl +0 -3
meta_000650.json DELETED
@@ -1,14 +0,0 @@
- {
- "step": 650,
- "val_loss": 1.012525200843811,
- "mmlu_acc": 0.328125,
- "arc_easy_acc": 0.4296875,
- "model_config": {
- "sequence_len": 2048,
- "vocab_size": 65536,
- "n_layer": 20,
- "n_head": 10,
- "n_kv_head": 10,
- "n_embd": 1280
- }
- }
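The deleted meta_000650.json is plain JSON, so it can be inspected with the standard library alone. A minimal sketch, with the file contents embedded from the diff above rather than read from disk:

```python
import json

# Contents of the deleted meta_000650.json, reconstructed from the diff above.
meta = json.loads("""
{
  "step": 650,
  "val_loss": 1.012525200843811,
  "mmlu_acc": 0.328125,
  "arc_easy_acc": 0.4296875,
  "model_config": {
    "sequence_len": 2048,
    "vocab_size": 65536,
    "n_layer": 20,
    "n_head": 10,
    "n_kv_head": 10,
    "n_embd": 1280
  }
}
""")

cfg = meta["model_config"]
# The token embedding table alone is vocab_size * n_embd parameters.
embedding_params = cfg["vocab_size"] * cfg["n_embd"]
print(meta["step"], meta["val_loss"], embedding_params)  # 650 1.012525200843811 83886080
```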
report.md DELETED
@@ -1,269 +0,0 @@
- # nanochat training report
-
- Generated: 2025-10-17 16:50:29
-
- ## Environment
-
- ### Git Information
- - Branch: master
- - Commit: d6d86cb (dirty)
- - Message: update readme with a link to the CPU|MPS branch
-
- ### Hardware
- - Platform: Linux
- - CPUs: 104 cores (104 logical)
- - Memory: 1007.4 GB
- - GPUs: 8x NVIDIA H100 80GB HBM3
- - GPU Memory: 633.7 GB total
- - CUDA Version: 12.8
- - Hourly Rate: $24.00/hour
-
- ### Software
- - Python: 3.12.12
- - PyTorch: 2.9.0+cu128
-
-
- ### Bloat
- - Characters: 351,931
- - Lines: 8,552
- - Files: 43
- - Tokens (approx): 87,982
- - Dependencies (uv.lock lines): 2,004
-
- Run started: 2025-10-17 16:50:31
-
- ---
-
- ## Tokenizer training
- timestamp: 2025-10-17 16:51:35
-
- - max_chars: 2,000,000,000
- - doc_cap: 10,000
- - vocab_size: 65,536
- - train_time: 55.2927
- - num_special_tokens: 9
- - token_bytes_min: 1
- - token_bytes_max: 32
- - token_bytes_mean: 6.9197
- - token_bytes_std: 2.8748
-
-
- ## Tokenizer evaluation
- timestamp: 2025-10-17 16:51:40
-
- ### Comparison with GPT-2
-
- | Text Type | Bytes | GPT-2 Tokens | GPT-2 Ratio | Ours Tokens | Ours Ratio | Relative Diff % |
- |-----------|-------|--------------|--------------|-------------|------------|-----------------|
- | news | 1819 | 404 | 4.50 | 375 | 4.85 | +7.2% |
- | korean | 893 | 745 | 1.20 | 712 | 1.25 | +4.4% |
- | code | 1259 | 576 | 2.19 | 492 | 2.56 | +14.6% |
- | math | 1834 | 936 | 1.96 | 966 | 1.90 | -3.2% |
- | science | 1112 | 260 | 4.28 | 228 | 4.88 | +12.3% |
- | fwe-train | 4208518 | 900364 | 4.67 | 856883 | 4.91 | +4.8% |
- | fwe-val | 4908443 | 1059062 | 4.63 | 1010352 | 4.86 | +4.6% |
-
- ### Comparison with GPT-4
-
- | Text Type | Bytes | GPT-4 Tokens | GPT-4 Ratio | Ours Tokens | Ours Ratio | Relative Diff % |
- |-----------|-------|--------------|--------------|-------------|------------|-----------------|
- | news | 1819 | 387 | 4.70 | 375 | 4.85 | +3.1% |
- | korean | 893 | 364 | 2.45 | 712 | 1.25 | -95.6% |
- | code | 1259 | 309 | 4.07 | 492 | 2.56 | -59.2% |
- | math | 1834 | 832 | 2.20 | 966 | 1.90 | -16.1% |
- | science | 1112 | 249 | 4.47 | 228 | 4.88 | +8.4% |
- | fwe-train | 4208518 | 874799 | 4.81 | 856883 | 4.91 | +2.0% |
- | fwe-val | 4908443 | 1029691 | 4.77 | 1010352 | 4.86 | +1.9% |
-
-
- ## Base model training
- timestamp: 2025-10-17 20:00:41
-
- - run: nanochat
- - depth: 20
- - max_seq_len: 2048
- - num_iterations: -1
- - target_flops: -1.0000
- - target_param_data_ratio: 20
- - device_batch_size: 32
- - total_batch_size: 524,288
- - embedding_lr: 0.2000
- - unembedding_lr: 0.0040
- - weight_decay: 0.0000
- - matrix_lr: 0.0200
- - grad_clip: 1.0000
- - eval_every: 250
- - eval_tokens: 10,485,760
- - core_metric_every: 2000
- - core_metric_max_per_task: 500
- - sample_every: 2000
- - model_tag:
- - Number of parameters: 560,988,160
- - Number of FLOPs per token: 3.491758e+09
- - Calculated number of iterations: 21,400
- - Number of training tokens: 11,219,763,200
- - Tokens : Params ratio: 20.0000
- - DDP world size: 8
- - warmup_ratio: 0.0000
- - warmdown_ratio: 0.2000
- - final_lr_frac: 0.0000
- - Minimum validation bpb: 0.8118
- - Final validation bpb: 0.8118
- - CORE metric estimate: 0.2232
- - MFU %: 47.92%
- - Total training flops: 3.917670e+19
- - Total training time: 173.10m
- - Peak memory usage: 75422.02MiB
-
-
- ## Base model loss
- timestamp: 2025-10-17 20:01:31
-
- - train bpb: 0.8146
- - val bpb: 0.8120
- - sample 0: <|bos|>The capital of France is Paris. Paris is the capital of France. Paris is the capital of France.
- - sample 1: <|bos|>The chemical symbol of gold is Au. The atomic number of gold is 79. The atomic mass of gold
- - sample 2: <|bos|>If yesterday was Friday, then tomorrow will be Tuesday. The day after tomorrow will be Tuesday, and the day after that will
- - sample 3: <|bos|>The opposite of hot is cold. The opposite of cold is hot. The opposite of hot is cold.
- - sample 4: <|bos|>The planets of the solar system are: Mercury, Venus, Earth, Mars, Jupiter, Saturn, Uranus, Neptune,
- - sample 5: <|bos|>My favorite color is red. I love it because it is so bright and it is so easy to
- - sample 6: <|bos|>If 5*x + 3 = 13, then x is the greatest common divisor of 5 and 3. 13 is the
-
-
- ## Base model evaluation
- timestamp: 2025-10-17 20:04:41
-
- - Model: base_model (step 21400)
- - CORE metric: 0.2152
- - hellaswag_zeroshot: 0.2618
- - jeopardy: 0.1181
- - bigbench_qa_wikidata: 0.5281
- - arc_easy: 0.5241
- - arc_challenge: 0.1183
- - copa: 0.2000
- - commonsense_qa: 0.1697
- - piqa: 0.3776
- - openbook_qa: 0.1413
- - lambada_openai: 0.3699
- - hellaswag: 0.2624
- - winograd: 0.3260
- - winogrande: 0.0624
- - bigbench_dyck_languages: 0.0990
- - agi_eval_lsat_ar: 0.0978
- - bigbench_cs_algorithms: 0.4265
- - bigbench_operators: 0.1667
- - bigbench_repeat_copy_logic: 0.0000
- - squad: 0.2375
- - coqa: 0.1950
- - boolq: -0.1259
- - bigbench_language_identification: 0.1774
-
-
- ## Midtraining
- timestamp: 2025-10-17 20:13:13
-
- - run: nanochat
- - dtype: bfloat16
- - max_seq_len: 2048
- - device_batch_size: 32
- - unembedding_lr: 0.0040
- - embedding_lr: 0.2000
- - matrix_lr: 0.0200
- - init_lr_frac: 1.0000
- - weight_decay: 0.0000
- - eval_every: 150
- - eval_tokens: 10,485,760
- - total_batch_size: 524,288
- - dry_run: 0
- - Number of iterations: 765
- - DDP world size: 8
- - Minimum validation bpb: 0.3952
-
-
- ## Chat evaluation mid
- timestamp: 2025-10-17 20:18:21
-
- - source: mid
- - task_name: None
- - dtype: bfloat16
- - temperature: 0.0000
- - max_new_tokens: 512
- - num_samples: 1
- - top_k: 50
- - batch_size: 8
- - model_tag: None
- - step: None
- - max_problems: None
- - ARC-Easy: 0.4116
- - ARC-Challenge: 0.3012
- - MMLU: 0.3284
- - GSM8K: 0.0417
- - HumanEval: 0.0305
- - ChatCORE metric: 0.0921
-
-
- ## Chat SFT
- timestamp: 2025-10-17 20:20:44
-
- - run: nanochat
- - source: mid
- - dtype: bfloat16
- - device_batch_size: 4
- - num_epochs: 1
- - max_iterations: -1
- - target_examples_per_step: 32
- - unembedding_lr: 0.0040
- - embedding_lr: 0.2000
- - matrix_lr: 0.0200
- - weight_decay: 0.0000
- - init_lr_frac: 0.0200
- - eval_every: 100
- - eval_steps: 100
- - eval_metrics_every: 200
- - Training rows: 20,843
- - Number of iterations: 651
- - Training loss: 1.1113
- - Validation loss: 1.0125
-
-
- ## Chat evaluation sft
- timestamp: 2025-10-17 20:25:29
-
- - source: sft
- - task_name: None
- - dtype: bfloat16
- - temperature: 0.0000
- - max_new_tokens: 512
- - num_samples: 1
- - top_k: 50
- - batch_size: 8
- - model_tag: None
- - step: None
- - max_problems: None
- - ARC-Easy: 0.4360
- - ARC-Challenge: 0.3012
- - MMLU: 0.3322
- - GSM8K: 0.0576
- - HumanEval: 0.0305
- - ChatCORE metric: 0.1028
-
-
- ## Summary
-
- - Characters: 351,931
- - Lines: 8,552
- - Files: 43
- - Tokens (approx): 87,982
- - Dependencies (uv.lock lines): 2,004
-
- | Metric | BASE | MID | SFT | RL |
- |-----------------|----------|----------|----------|----------|
- | CORE | 0.2152 | - | - | - |
- | ARC-Challenge | - | 0.3012 | 0.3012 | - |
- | ARC-Easy | - | 0.4116 | 0.4360 | - |
- | GSM8K | - | 0.0417 | 0.0576 | - |
- | HumanEval | - | 0.0305 | 0.0305 | - |
- | MMLU | - | 0.3284 | 0.3322 | - |
- | ChatCORE | - | 0.0921 | 0.1028 | - |
-
- Total wall clock time: 3h34m
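As a sanity check on the deleted report's arithmetic: the training token count, iteration count, total FLOPs, and approximate run cost all follow from a handful of numbers quoted above. A minimal sketch (the cost estimate is my own multiplication of the quoted wall-clock time by the quoted hourly rate, not a figure stated in the report):

```python
# Numbers copied from the deleted report.md above.
n_params = 560_988_160         # "Number of parameters"
ratio = 20                     # "target_param_data_ratio" (tokens : params)
total_batch_size = 524_288     # tokens per optimizer step
flops_per_token = 3.491758e9   # "Number of FLOPs per token"
wall_hours = 3 + 34 / 60       # "Total wall clock time: 3h34m"
hourly_rate = 24.00            # "Hourly Rate: $24.00/hour"

train_tokens = n_params * ratio                 # 11,219,763,200 tokens
iterations = train_tokens // total_batch_size   # 21,400 steps (divides exactly)
total_flops = flops_per_token * train_tokens    # ~3.9177e19, matches the report
cost = wall_hours * hourly_rate                 # ~$85.60 for the whole run

print(iterations, train_tokens, f"{total_flops:.4e}", f"${cost:.2f}")
```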
token_bytes.pt DELETED
@@ -1,3 +0,0 @@
- version https://git-lfs.github.com/spec/v1
- oid sha256:e280877820a90174f3b47bf797b67b9026cd859b7d6d5b7f78e64bcdaca126b4
- size 263721
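token_bytes.pt (and tokenizer.pkl below) were stored via Git LFS, so the diff shows only the three-line pointer file, not the binary payload. A pointer is just "key value" lines; a minimal sketch of parsing one:

```python
# Git LFS pointer text as it appears in the diff above.
pointer_text = """\
version https://git-lfs.github.com/spec/v1
oid sha256:e280877820a90174f3b47bf797b67b9026cd859b7d6d5b7f78e64bcdaca126b4
size 263721
"""

# Each line is "key value"; split on the first space only,
# since the value itself may contain no further structure we care about.
pointer = dict(line.split(" ", 1) for line in pointer_text.splitlines())

algo, digest = pointer["oid"].split(":", 1)
print(algo, len(digest), int(pointer["size"]))  # sha256 64 263721
```

The `size` field is the byte length of the real object, so the actual token_bytes.pt was about 258 KiB before deletion.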
tokenizer.pkl DELETED
@@ -1,3 +0,0 @@
- version https://git-lfs.github.com/spec/v1
- oid sha256:33f28610ffd37a57d6631f8d7bd91929bd877ae3f4a87dcbdff00b07f6bd7cc3
- size 846092
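One note on reading the deleted report's tokenizer comparison tables: the "Relative Diff %" column is consistent with token savings relative to the baseline tokenizer, i.e. `(baseline_tokens - ours_tokens) / baseline_tokens`, rather than a ratio of the compression ratios. That interpretation is my inference from the rows, not a formula stated in the report; a quick check against three quoted rows:

```python
def relative_diff_pct(baseline_tokens: int, ours_tokens: int) -> float:
    """Token savings relative to the baseline tokenizer, as a percentage."""
    return round((baseline_tokens - ours_tokens) / baseline_tokens * 100, 1)

# Rows copied from the report's tables: (baseline tokens, ours tokens).
print(relative_diff_pct(404, 375))  # GPT-2, news   -> 7.2, matches "+7.2%"
print(relative_diff_pct(745, 712))  # GPT-2, korean -> 4.4, matches "+4.4%"
print(relative_diff_pct(364, 712))  # GPT-4, korean -> -95.6, matches "-95.6%"
```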