OpenTransformer committed on
Commit
7fdbcab
·
0 Parent(s):

Super-squash branch 'main' using huggingface_hub

.gitattributes ADDED
@@ -0,0 +1,35 @@
+ *.7z filter=lfs diff=lfs merge=lfs -text
+ *.arrow filter=lfs diff=lfs merge=lfs -text
+ *.bin filter=lfs diff=lfs merge=lfs -text
+ *.bz2 filter=lfs diff=lfs merge=lfs -text
+ *.ckpt filter=lfs diff=lfs merge=lfs -text
+ *.ftz filter=lfs diff=lfs merge=lfs -text
+ *.gz filter=lfs diff=lfs merge=lfs -text
+ *.h5 filter=lfs diff=lfs merge=lfs -text
+ *.joblib filter=lfs diff=lfs merge=lfs -text
+ *.lfs.* filter=lfs diff=lfs merge=lfs -text
+ *.mlmodel filter=lfs diff=lfs merge=lfs -text
+ *.model filter=lfs diff=lfs merge=lfs -text
+ *.msgpack filter=lfs diff=lfs merge=lfs -text
+ *.npy filter=lfs diff=lfs merge=lfs -text
+ *.npz filter=lfs diff=lfs merge=lfs -text
+ *.onnx filter=lfs diff=lfs merge=lfs -text
+ *.ot filter=lfs diff=lfs merge=lfs -text
+ *.parquet filter=lfs diff=lfs merge=lfs -text
+ *.pb filter=lfs diff=lfs merge=lfs -text
+ *.pickle filter=lfs diff=lfs merge=lfs -text
+ *.pkl filter=lfs diff=lfs merge=lfs -text
+ *.pt filter=lfs diff=lfs merge=lfs -text
+ *.pth filter=lfs diff=lfs merge=lfs -text
+ *.rar filter=lfs diff=lfs merge=lfs -text
+ *.safetensors filter=lfs diff=lfs merge=lfs -text
+ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
+ *.tar.* filter=lfs diff=lfs merge=lfs -text
+ *.tar filter=lfs diff=lfs merge=lfs -text
+ *.tflite filter=lfs diff=lfs merge=lfs -text
+ *.tgz filter=lfs diff=lfs merge=lfs -text
+ *.wasm filter=lfs diff=lfs merge=lfs -text
+ *.xz filter=lfs diff=lfs merge=lfs -text
+ *.zip filter=lfs diff=lfs merge=lfs -text
+ *.zst filter=lfs diff=lfs merge=lfs -text
+ *tfevents* filter=lfs diff=lfs merge=lfs -text
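
To see which files in this commit the `.gitattributes` rules route through Git LFS, the following is a rough sketch rather than a reimplementation of Git's matcher. It assumes only slash-free patterns (which Git compares against the basename) and deliberately skips `saved_model/**/*`; the helper name `is_lfs_tracked` and the shortened pattern list are illustrative. `git check-attr filter -- <path>` remains the authoritative check.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

# A subset of the slash-free patterns from the .gitattributes above.
# Assumption: patterns containing "/" (saved_model/**/*) are not handled here.
LFS_PATTERNS = ["*.bin", "*.pt", "*.pth", "*.safetensors", "*.tar.*", "*tfevents*"]


def is_lfs_tracked(path: str, patterns=LFS_PATTERNS) -> bool:
    """Approximate check: match the file's basename against each pattern."""
    name = PurePosixPath(path).name
    return any(fnmatchcase(name, pattern) for pattern in patterns)


if __name__ == "__main__":
    for candidate in (
        "distributed/master_latest.pt",
        "single_node/pretrain_delta_step00053702.sha256",
        "README.md",
    ):
        print(candidate, is_lfs_tracked(candidate))
```

Under this approximation the `.pt` checkpoints are LFS-tracked, while `README.md` and the `.sha256` sidecar stay as ordinary Git blobs, which is consistent with the diffs in this commit.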
README.md ADDED
@@ -0,0 +1,25 @@
+ # AGILLM-3.5
+
+ 698M-param decoder LM (d=1024, 24 layers, 16 heads, rank=128, expansion 2.0×), DeepSeek-V3.2
+ tokenizer, AR + SAT (speculative) heads, trained with **DiffusionBlocks** (block-wise EDM
+ denoising, 8 blocks). Forked from the AGILLM-3 ~step-51081 base.
+
+ ## Checkpoints
+
+ ### `distributed/` — the live distributed model
+ `master_round244.pt` — trained **block-disaggregated across 4 Hetzner nodes** (GETH +
+ MCP + PRIME + COMMUNIST-WEB). GETH coordinates and trains blocks 0,1,3,5,7 locally; MCP/PRIME/
+ COMMUNIST-WEB train blocks 2/4/6 over the private network. Each round exports block slices,
+ trains them independently, and merges them back. **739 merged block-updates** at snapshot.
+ Single full file per snapshot (each round is a block merge, not a delta).
+
+ ### `single_node/` — single-node dblock lineage (full + delta)
+ - `pretrain_step00053908.pt` — full checkpoint (7.3 GB)
+ - `pretrain_delta_step00053702.pt` (+ `.sha256`) — delta checkpoint (2.8 GB)
+
+ Checkpoint dict keys: `core` (backbone), `ar`, `sat` (heads), `cfg`, embedded
+ `tokenizer_json`, plus `disagg_updates` (merge provenance) on the distributed master.
+
+ ## Inference
+ Load with the AGILLM nB300 code (`infer --mode ar|sat`); the tokenizer round-trips from the
+ embedded `tokenizer_json`.
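
The README lists the checkpoint dict keys but not their value types, so the following is only a sketch of how one might sanity-check a loaded checkpoint against that list. The helper names `describe_checkpoint` and `load_checkpoint` are hypothetical and not part of the AGILLM code. Whether these files load under `weights_only=True` is an assumption; if `cfg` holds custom Python objects, that call may raise, and falling back to `weights_only=False` is only reasonable for files you trust.

```python
# Keys named in the README; "disagg_updates" is documented only for the
# distributed master checkpoint.
EXPECTED_KEYS = ("core", "ar", "sat", "cfg", "tokenizer_json")
PROVENANCE_KEY = "disagg_updates"


def describe_checkpoint(ckpt: dict) -> dict:
    """Report which documented keys are missing and whether provenance is present."""
    known = set(EXPECTED_KEYS) | {PROVENANCE_KEY}
    return {
        "missing": [key for key in EXPECTED_KEYS if key not in ckpt],
        "has_merge_provenance": PROVENANCE_KEY in ckpt,
        "undocumented": sorted(set(ckpt) - known),
    }


def load_checkpoint(path: str) -> dict:
    """Hypothetical loader; torch is imported lazily so the checker works without it."""
    import torch

    # Assumption: the file is a plain dict of tensors and primitives.
    return torch.load(path, map_location="cpu", weights_only=True)


if __name__ == "__main__":
    report = describe_checkpoint(load_checkpoint("distributed/master_latest.pt"))
    print(report)
```

Keeping the key check separate from the loader means it can also be run on a dict obtained some other way, for example one produced by the project's own loading code.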
distributed/master_latest.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:6fde4b297b70b8aa6ed240268214843a9ff6f2767ca1f3c9a8b68723e8a5563e
+ size 6569221865
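
What the repository stores for each `.pt` file is a three-line Git LFS pointer like the one above, not the weights themselves. As a small illustration, the sketch below parses such a pointer into its fields; the function name `parse_lfs_pointer` is made up for this example, and the parser assumes exactly the `key value` line layout shown in this diff.

```python
def parse_lfs_pointer(text: str) -> dict:
    """Split a Git LFS pointer file into version, hash algorithm, digest, and size."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),  # size of the real object, in bytes
    }


# Pointer contents copied from the distributed/master_latest.pt diff above.
POINTER = """version https://git-lfs.github.com/spec/v1
oid sha256:6fde4b297b70b8aa6ed240268214843a9ff6f2767ca1f3c9a8b68723e8a5563e
size 6569221865
"""

info = parse_lfs_pointer(POINTER)
size_gb = info["size"] / 1e9  # decimal gigabytes, the unit the README sizes appear to use

if __name__ == "__main__":
    print(info["algo"], info["digest"][:12], f"{size_gb:.2f} GB")
```

The `size` field gives the download size before fetching anything: about 6.57 GB for this file.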
single_node/pretrain_delta_step00053702.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:32a9390cfaeb1908eef98c75ca49babd8eda4eba2116c233f9023dd82a077eca
+ size 2793593611
single_node/pretrain_delta_step00053702.sha256 ADDED
@@ -0,0 +1 @@
+ 32a9390cfaeb1908eef98c75ca49babd8eda4eba2116c233f9023dd82a077eca pretrain_delta_step00053702.pt
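
The digest in this sidecar is identical to the `oid sha256:` value in the LFS pointer for `pretrain_delta_step00053702.pt`, so either can be used to confirm a download. A minimal verification sketch follows, assuming the sidecar uses the `<digest> <filename>` layout shown above and sits next to the checkpoint; `sha256_of` and `verify_sidecar` are illustrative names, not part of the repository.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so multi-gigabyte checkpoints need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_sidecar(sidecar_path) -> bool:
    """Compare the digest recorded in a .sha256 sidecar with the file it names."""
    sidecar = Path(sidecar_path)
    expected, name = sidecar.read_text().split(maxsplit=1)
    # sha256sum may prefix the name with "*" in binary mode; tolerate that.
    target = sidecar.parent / name.strip().lstrip("*")
    return sha256_of(target) == expected.lower()


if __name__ == "__main__":
    print(verify_sidecar("single_node/pretrain_delta_step00053702.sha256"))
```

If the sidecar is in the standard `sha256sum` format, running `sha256sum -c pretrain_delta_step00053702.sha256` inside `single_node/` should perform the same check.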
single_node/pretrain_step00053908.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:c207fe74dedd0e132c87453b55dbf0777a97249250b9105496c406ad99f43932
+ size 7328309140