One-line summary: Lane-G-ref · substrate = PyTorch-CUDA · 7.25B-param byte-level GPT REFERENCE rung (bounded-budget, NOT converged).
What this checkpoint is and how it was produced.
This is the 7.25B-parameter rung of the anima Lane-G reference ladder (85.6M → 3.149B → 7.25B). It is a byte-level (V=256) decoder-only GPT (Llama-7B-ish shape adapted to a byte vocabulary), trained from scratch with PyTorch + CUDA AMP/bf16 + gradient checkpointing + 8-bit AdamW on a single H100 80GB.
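To make "byte-level (V=256)" concrete, here is a minimal sketch of byte tokenization. It is not taken from the repo's trainer or corpus-prep script; the helper names `encode_bytes` and `decode_bytes` are hypothetical and only illustrate the idea that every UTF-8 byte is one token ID in the range 0..255.

```python
# Illustrative only: hypothetical helpers, not the repo's prep_corpus_7b.py.

def encode_bytes(text: str) -> list[int]:
    """Map text to token IDs by taking its UTF-8 bytes (each ID is 0..255)."""
    return list(text.encode("utf-8"))


def decode_bytes(ids: list[int]) -> str:
    """Inverse mapping; invalid byte sequences become U+FFFD instead of raising."""
    return bytes(ids).decode("utf-8", errors="replace")


sample = "héllo 世界"
ids = encode_bytes(sample)

# Non-ASCII characters expand to several tokens: "é" is 2 bytes, each CJK
# character here is 3 bytes, so 8 characters become 13 tokens.
print(len(sample), len(ids))
print(decode_bytes(ids) == sample)
```

A consequence of this scheme is that no tokenizer file is needed and any language in the corpus is representable, at the cost of longer sequences for non-ASCII text.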
- dancinlab/clm-backbone-5lang-sample (same 5-lang c4 backbone as the 85.6M PUBLIC ref and the 3.149B ref), flattened to a UTF-8 byte stream; 6,553,600 tokens seen
- clm_ref_pytorch_cuda_7b.py (PyTorch-CUDA, CUDA-required), included in this repo
- lane-g/d768-cuda-fire · Lane-G-ref ladder 85.6M (dancinlab/clm-v1-ref-pytorch-cuda) → 3.149B (dancinlab/clm-v1-ref-pytorch-cuda-3b) → 7.25B (this repo)

Metrics (clm_ref_7b_train.log.json):
| metric | value | verdict |
|---|---|---|
| descent | val_CE 5.360630989 → 2.412078857 (F_CLM_REF_7B_DESCENT=1) | 🟢 PASS |
| GPU util | PEAK 100.0% · MEAN 99.1788990825688% (n=436) | 🟢 PASS (≫20%) |
| throughput | 7406.1 tok/s final · 6,553,600 tok seen | - |
| mem peak | 46,025 MiB | - |
| power mean | 651.3842201834855 W | - |
| wall | 884.9 s | - |
Closure = PASS (descent 🟢 AND util 🟢) → this reference rung is PUBLIC. It is still NOT converged (bounded 400 steps); do not deploy.
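The closure rule and a few derived quantities can be sketched from the table values alone. This is a reading of the table, not the repo's own check: the exact definition behind `F_CLM_REF_7B_DESCENT` is not given, so the sketch assumes "descent" means the final val_CE is below the initial one, and it treats "≫20%" as a simple floor of 20%. It also assumes val_CE is measured in nats.

```python
import math

# Values copied from the metrics table above.
val_ce_start = 5.360630989
val_ce_final = 2.412078857
gpu_util_mean = 99.1788990825688   # percent
tokens_seen = 6_553_600
steps = 400                        # bounded budget stated in the card
block = 512                        # context length from the config table
wall_s = 884.9
power_mean_w = 651.3842201834855

UTIL_FLOOR = 20.0  # ASSUMPTION: "≫20%" read as a plain 20% floor

# ASSUMPTION: descent passes when validation loss ends below where it began.
descent_pass = val_ce_final < val_ce_start
util_pass = gpu_util_mean > UTIL_FLOOR
closure_pass = descent_pass and util_pass

# Derived quantities (arithmetic only, not logged values).
tokens_per_step = tokens_seen // steps
seqs_per_step = tokens_per_step // block        # if every sequence fills the block
uniform_ce = math.log(256)                      # loss of a uniform guess over 256 bytes
bits_per_byte = val_ce_final / math.log(2)      # ASSUMPTION: val_CE is in nats
energy_kwh = power_mean_w * wall_s / 3.6e6      # mean power times wall time

print(closure_pass, tokens_per_step, seqs_per_step)
print(round(uniform_ce, 3), round(bits_per_byte, 2), round(energy_kwh, 3))
```

Under those assumptions the initial val_CE sits just below the uniform baseline ln(256) ≈ 5.545, which is what an untrained byte model should show, and the final value is still far from what a converged model would reach.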
Concrete reproducible tests this checkpoint passes.
Hardware / software / data dependencies.
| field | value |
|---|---|
| vocab | 256 (byte-level) |
| d_model | 4096 |
| n_layer | 36 |
| n_head | 32 (head_dim 128) |
| block | 512 |
| params | 7,252,828,160 (7.25B) |
| dtype | bf16 (master weights + grads) |
| optimizer | bitsandbytes AdamW8bit (8-bit states) |
| grad_ckpt | true |
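The parameter total in the table can be reproduced from the other fields under one particular set of architectural assumptions. The sketch below assumes a GPT-2-style block (biases on all linear layers, 4x MLP expansion, LayerNorm with weight and bias), a learned positional embedding of length `block`, and an output head tied to the token embedding. None of these details are stated in the card; they are simply one accounting that lands on the reported figure. The memory estimate at the end is similarly rough and ignores activations, quantization metadata, and allocator overhead.

```python
# Config values from the table above.
vocab, d_model, n_layer, block = 256, 4096, 36, 512


def count_params(vocab: int, d_model: int, n_layer: int, block: int,
                 mlp_ratio: int = 4) -> int:
    """Parameter count under the ASSUMED GPT-2-style layout described above."""
    d = d_model
    attn = (d * 3 * d + 3 * d) + (d * d + d)            # fused QKV + output proj
    mlp = (d * mlp_ratio * d + mlp_ratio * d) \
        + (mlp_ratio * d * d + d)                       # up-projection + down-projection
    norms = 2 * (2 * d)                                 # two LayerNorms per block
    per_layer = attn + mlp + norms
    embeddings = vocab * d + block * d                  # token + learned positional
    final_norm = 2 * d
    # Output head assumed tied to the token embedding, so it adds nothing.
    return n_layer * per_layer + embeddings + final_norm


total = count_params(vocab, d_model, n_layer, block)
print(f"{total:,}")  # 7,252,828,160

# Rough static memory: bf16 weights (2 B) + bf16 grads (2 B)
# + two 8-bit AdamW states (1 B each) = 6 bytes per parameter.
static_mib = total * 6 / 2**20
print(round(static_mib))
```

Under these assumptions the static footprint comes to roughly 41.5k MiB, which is consistent with the reported 46,025 MiB peak once activations and overhead are added on top.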
Honest limitations (raw#10).
- a_train_flame_forge. This torch trainer is an a_completeness_over_cheap optional baseline/reference, never the primary, and never claimed as the forge artifact.
- (a_lane_akida_gpu_split). This is a pure GPU (Lane-G) reference; AKIDA on-chip (Lane A) results are tracked separately and must never be blended into one verdict.
- (a_scale_honest_scope): a single 400-step rung is not a convergence or generalization claim; it is the 7.25B point of the 85.6M → 3.149B → 7.25B reference ladder.

- The 85.6M (dancinlab/clm-v1-ref-pytorch-cuda) and 3.149B (dancinlab/clm-v1-ref-pytorch-cuda-3b) ladder rungs are directly comparable scale points (only d_model / n_layer / n_head differ).
- The trainer (clm_ref_pytorch_cuda_7b.py) + corpus-prep (prep_corpus_7b.py) are included so the rung is fully reproducible.

- clm_ref_pytorch_cuda_7b.pt: model state_dict + config (bf16); sha256 38ef2ed55b47b670fa915bba0c2827782799a9070ba087210cd44db1fddb4d41, 14,505,817,922 bytes
- clm_ref_7b_train.log.json: full training curve + util/throughput
- clm_ref_pytorch_cuda_7b.py: the trainer (PyTorch-CUDA, CUDA-required)
- prep_corpus_7b.py: corpus prep (5-lang c4 backbone → byte stream)
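Since the card publishes a sha256 and byte count for the checkpoint, a downloaded copy can be checked before loading. The following is a minimal sketch using only the Python standard library; the function names `sha256_of` and `verify_checkpoint` are hypothetical, and the local path is whatever location the file was downloaded to.

```python
import hashlib
from pathlib import Path

# Values copied from the file list above.
EXPECTED_SHA256 = "38ef2ed55b47b670fa915bba0c2827782799a9070ba087210cd44db1fddb4d41"
EXPECTED_BYTES = 14_505_817_922


def sha256_of(path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in 1 MiB chunks so a ~14.5 GB checkpoint never sits in RAM."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_checkpoint(path,
                      expected_sha256: str = EXPECTED_SHA256,
                      expected_bytes: int = EXPECTED_BYTES) -> bool:
    """Return True only if both the size and the sha256 match the published values."""
    p = Path(path)
    if p.stat().st_size != expected_bytes:   # cheap check first
        return False
    return sha256_of(p) == expected_sha256
```

Usage would look like `verify_checkpoint("clm_ref_pytorch_cuda_7b.pt")`. The size comparison runs first because it is instantaneous, while hashing the full file takes noticeably longer.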