datasysdev committed on
Commit eaa45c8 · verified · 1 Parent(s): 3316bf4

Upload README.md with huggingface_hub

Files changed (1)
  1. README.md +16 -0
README.md CHANGED
@@ -118,3 +118,19 @@ Based on:
  ## Source Code
 
  [unixsysdev/helm (h200-optimizations branch)](https://github.com/unixsysdev/helm/tree/h200-optimizations)
+
+ ## Roadmap: 1.37B Pretraining
+
+ The 130M checkpoints in this repo are seeds for the **1.37B HELM-D** model (L24W1536A24), upscaled via Network Morphism:
+
+ 1. **Width 384→1536**: Zero-pad Lorentz spatial dims (manifold constraint preserved exactly)
+ 2. **Depth 6→24 layers**: Interleaved cloning, repeating the full 6-layer pipeline 4× with residual scaling
+ 3. **All linear weights**: Top-left corner placement in expanded matrices, remainder N(0, 0.001)
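
The width and weight expansion steps listed above can be sketched on toy sizes as follows. This is a minimal illustration, not the code from the linked repository: the function names are hypothetical, the time coordinate is assumed to sit at index 0, the curvature is assumed to be -1, and `0.001` is assumed to be a standard deviation (the README does not say whether it is a standard deviation or a variance).

```python
import math
import random


def lorentz_norm_sq(x):
    """Lorentzian self inner product <x, x>_L, time coordinate assumed at index 0."""
    return -x[0] * x[0] + sum(v * v for v in x[1:])


def zero_pad_spatial(x, new_spatial_dim):
    """Append zeros to the spatial part of a hyperboloid point (hypothetical helper)."""
    extra = new_spatial_dim - (len(x) - 1)
    if extra < 0:
        raise ValueError("new_spatial_dim is smaller than the current spatial width")
    return list(x) + [0.0] * extra


def expand_linear(weight, new_rows, new_cols, std=0.001, seed=0):
    """Copy `weight` into the top-left corner of a larger matrix.

    Every other entry is drawn from N(0, std); treating 0.001 as a standard
    deviation is an assumption.
    """
    old_rows, old_cols = len(weight), len(weight[0])
    if new_rows < old_rows or new_cols < old_cols:
        raise ValueError("expanded matrix must be at least as large as the original")
    rng = random.Random(seed)
    return [
        [
            weight[r][c] if r < old_rows and c < old_cols else rng.gauss(0.0, std)
            for c in range(new_cols)
        ]
        for r in range(new_rows)
    ]


# Toy sizes standing in for the 384 -> 1536 expansion.
spatial = [0.3, -1.2, 0.5]
point = [math.sqrt(1.0 + sum(v * v for v in spatial))] + spatial  # lies on <x, x>_L = -1
padded = zero_pad_spatial(point, 12)

small_w = [[1.0, 2.0], [3.0, 4.0]]
big_w = expand_linear(small_w, 4, 4)
```

Because the appended coordinates are exactly zero, they add nothing to the spatial norm, so `lorentz_norm_sq(padded)` equals `lorentz_norm_sq(point)` and the padded point stays on the same hyperboloid.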
+
+ The 1.37B model is currently training on **2B tokens from FineWeb-Edu** on a single NVIDIA H200.
+
+ ### Next Steps
+
+ - **KL divergence distillation** from Qwen3-30B using Nebius SWE-agent trajectories (80K agentic tool-use sequences)
+ - **Context extension** to 128K via NTK-RoPE scaling
+ - **Fine-tuning** on agentic coding trajectories for downstream tool-use tasks
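
For the context-extension item above, one common form of NTK-aware RoPE scaling enlarges the rotary base so that the lowest rotary frequency is stretched by the context scale factor while the highest is left unchanged. The sketch below illustrates that formula only. It is not taken from the HELM repository, and the head dimension (64, assuming 1536 / 24 heads), the base of 10000, and the original context length of 4096 are all assumptions chosen for illustration.

```python
def rope_inv_freq(head_dim, base=10000.0):
    """Standard RoPE inverse frequencies, one per pair of channels."""
    return [base ** (-2.0 * i / head_dim) for i in range(head_dim // 2)]


def ntk_scaled_base(base, scale, head_dim):
    """NTK-aware base adjustment: base * scale ** (d / (d - 2))."""
    return base * scale ** (head_dim / (head_dim - 2))


HEAD_DIM = 64          # assumption: 1536 width / 24 heads
ORIGINAL_CTX = 4096    # hypothetical pretraining context length
TARGET_CTX = 131072    # 128K target from the roadmap

scale = TARGET_CTX / ORIGINAL_CTX
new_base = ntk_scaled_base(10000.0, scale, HEAD_DIM)

orig_freqs = rope_inv_freq(HEAD_DIM)
scaled_freqs = rope_inv_freq(HEAD_DIM, new_base)
```

With this choice of exponent, `scaled_freqs[0]` is still 1.0 (local position resolution is kept) and `scaled_freqs[-1]` is `orig_freqs[-1] / scale`, so the longest wavelength grows in proportion to the context extension.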