Upload logs/train_output.log with huggingface_hub
logs/train_output.log ADDED (+61 -0)
@@ -0,0 +1,61 @@
+============================================================
+CogNet-1B Ultra-Fast Training V2 - MAXIMUM SPEED
+============================================================
+Device: cuda:0
+Distributed: False (world_size=1)
+Model: 350m
+BF16: True
+Compile: False
+Compile step: False
+CUDA prefetch: False
+Seq warmup: False
+Async checkpoint: False
+8-bit optimizer: True
+TF32 enabled: True
+HF repo: thefinalboss/CogNet-1B
+HF token: SET
+============================================================
+Loaded tokenizer from /root/cognet-1b/tokenizer_v3.json (vocab=136)
+Skipping data preparation (--skip-data-prep)
+Loading data from: /root/cognet-1b/data_1b/train_merged.pt
+
+Building CogNet-350M (optimized)...
+Total parameters: 304,232,960 (0.30B)
+8-bit AdamW (bitsandbytes) enabled - 50% less VRAM for optimizer states
+Mixed precision: BF16
+
+Starting: step 0 -> 100000
+Batch=4 x GradAccum=8 x GPUs=1 = Effective 32
+SeqLen=512, LR=1e-05-0.0003
+TF32=ON, Gradient checkpointing=True
+Graceful shutdown: SIGTERM/SIGINT will save checkpoint
+
+[BENCH] A 10-step benchmark will measure the actual speed...
+
+============================================================
+BENCHMARK - Measuring actual performance
+============================================================
+Warmup: 3 steps
+Measurement: 10 steps
+Config: batch=4, grad_accum=8, seq_len=512
+Warmup complete - starting measurement...
+
+╔══════════════════════════════════════════════════════╗
+║                  BENCHMARK RESULTS                   ║
+╠══════════════════════════════════════════════════════╣
+║  0.10 steps/sec (optimizer steps)                    ║
+║  1581 tokens/sec                                     ║
+║  103.62 sec for 10 steps                             ║
+║  3.2 GB VRAM used                                    ║
+╠══════════════════════════════════════════════════════╣
+║  Estimated time for the remaining 100,000 steps      ║
+║  ~ 287.8 hours (12.0 days)                           ║
+╚══════════════════════════════════════════════════════╝
+============================================================
+
+Benchmark saved: /root/cognet-1b/checkpoints_1b/benchmark_results.json
+Step 0/100000 | Loss: 3.3116 | PPL: 27.4 | LR: 0.00e+00 | Grad: 2.75 | VRAM: 3.2GB | 1378 tok/s | 0.1 step/s | ETA: 12.0d
+Step 10/100000 | Loss: 3.2792 | PPL: 26.6 | LR: 1.50e-06 | Grad: 2.48 | VRAM: 3.2GB | 1583 tok/s | 0.1 step/s | ETA: 12.0d
+Step 20/100000 | Loss: 3.2696 | PPL: 26.3 | LR: 3.00e-06 | Grad: 1.62 | VRAM: 3.2GB | 1585 tok/s | 0.1 step/s | ETA: 12.0d
+Step 30/100000 | Loss: 3.2555 | PPL: 25.9 | LR: 4.50e-06 | Grad: 0.64 | VRAM: 3.2GB | 1568 tok/s | 0.1 step/s | ETA: 12.0d
+Step 40/100000 | Loss: 3.2414 | PPL: 25.6 | LR: 6.00e-06 | Grad: 0.81 | VRAM: 3.2GB | 1590 tok/s | 0.1 step/s | ETA: 12.0d
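The figures in the benchmark box and the step lines can be cross-checked with a little arithmetic. The sketch below is not taken from the training script: the function names are hypothetical, and the 2000-step linear warmup is an assumption inferred from the logged learning rates (1.50e-06 at step 10, rising by 1.50e-06 every 10 steps toward the 0.0003 peak). It only recomputes values that already appear in the log.

```python
import math


def benchmark_summary(batch_size, grad_accum, n_gpus, seq_len,
                      elapsed_sec, measured_steps, remaining_steps):
    """Recompute throughput and ETA from the logged benchmark inputs."""
    effective_batch = batch_size * grad_accum * n_gpus   # sequences per optimizer step
    tokens_per_step = effective_batch * seq_len          # tokens per optimizer step
    sec_per_step = elapsed_sec / measured_steps
    eta_hours = remaining_steps * sec_per_step / 3600.0
    return {
        "effective_batch": effective_batch,
        "steps_per_sec": 1.0 / sec_per_step,
        "tokens_per_sec": tokens_per_step / sec_per_step,
        "eta_hours": eta_hours,
        "eta_days": eta_hours / 24.0,
    }


def perplexity(loss):
    """Perplexity is the exponential of the mean cross-entropy loss (in nats)."""
    return math.exp(loss)


def linear_warmup_lr(step, peak_lr=3e-4, warmup_steps=2000):
    """Linear warmup from 0 to peak_lr.

    warmup_steps=2000 is an ASSUMPTION inferred from the logged LR values,
    not a setting confirmed by the log.
    """
    return peak_lr * min(step, warmup_steps) / warmup_steps


# Inputs copied from the log: batch=4, grad_accum=8, 1 GPU, seq_len=512,
# 103.62 s for 10 measured steps, 100,000 steps remaining.
summary = benchmark_summary(4, 8, 1, 512, 103.62, 10, 100_000)

print(f"effective batch : {summary['effective_batch']}")
print(f"steps/sec       : {summary['steps_per_sec']:.2f}")
print(f"tokens/sec      : {summary['tokens_per_sec']:.0f}")
print(f"ETA             : {summary['eta_hours']:.1f} h ({summary['eta_days']:.1f} days)")
print(f"PPL at loss 3.3116: {perplexity(3.3116):.1f}")
print(f"LR at step 10 (assumed schedule): {linear_warmup_lr(10):.2e}")
```

Under these inputs the recomputed numbers agree with the logged ones (effective batch 32, about 1581 tokens/sec, about 287.8 hours), which suggests the ETA is a plain extrapolation of the 10-step benchmark rather than a running average.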