---
license: apache-2.0
base_model: meta-llama/Llama-3.1-8B
tags:
- sequence-compression
- kv-cache
- long-context
- efficiency
metrics:
- perplexity
---

![cell_vs_llama](https://cdn-uploads.huggingface.co/production/uploads/6891bed1f76477f415c0eaa6/yA9h2Pjb3ysk8M27Eg91d.png)

# IronCell — Mark 1: Technical Brief

**GitHub Repository:** [gaoang1111/IronMan](https://github.com/gaoang1111/IronMan)
**Checkpoints:** [HuggingFace - IronCell-Mark-1](https://huggingface.co/ddddamn/IronCell-Mark-1)
**Training Logs:** [WandB Overview](https://wandb.ai/gaoang001111-none/IronMan/overview)

---

## Core Efficiency Metrics

| Metric | Value / Performance |
| :--- | :--- |
| **VRAM Footprint** | **Reduced by 93.75%** (requirement down to 6.25% of the baseline) |
| **Logic Integrity (PPL)** | **11.20** (FineWeb, zero-overlap) |
| **Baseline (Llama 3.1 8B)** | 7.40 PPL |

> **The Verdict:** A marginal increase in perplexity, exchanged for context capacity that would otherwise be impossible on consumer-grade GPUs.
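
The headline arithmetic is straightforward: a 16:1 compression keeps 1/16 = 6.25% of the sequence entries, i.e. a 93.75% reduction. Below is a back-of-envelope check, assuming the savings apply to the KV cache and using Llama-3.1-8B's published cache shape (32 layers, 8 KV heads, head dim 128) in fp16; the 128k-token context is illustrative, not a figure from this brief:

```python
# Back-of-envelope check of the headline numbers, assuming the saving comes
# from caching one compressed entry per 16 raw tokens (the 16:1 ratio).
# Llama-3.1-8B cache shape: 32 layers, 8 KV heads, head dim 128, fp16.
LAYERS, KV_HEADS, HEAD_DIM, BYTES_FP16 = 32, 8, 128, 2
kv_per_token = 2 * LAYERS * KV_HEADS * HEAD_DIM * BYTES_FP16  # K and V: 128 KiB

ctx = 131_072                                  # illustrative 128k-token context
baseline_gib = kv_per_token * ctx / 2**30      # 16.0 GiB
compressed_gib = baseline_gib / 16             # 1.0 GiB at 16:1
print(f"{baseline_gib:.1f} GiB -> {compressed_gib:.2f} GiB "
      f"(a {1 - 1/16:.2%} reduction, i.e. down to {1/16:.2%})")
```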

---

## Cellular Differentiation Theory

The project views a pre-trained LLM as a powerful but rigid "state machine" and treats the homologous base (Llama 3.1 8B) as a "stem cell". Through induced functional differentiation, the model is split into collaborating units (a minimal sketch of the data flow follows the list):

* **Compressor (`cmp`):** Specialized in distilling raw text chunks into dense semantic latent vectors.
* **Generator (`gen`):** A causal language model trained to reconstruct and reason over these compressed vectors.
* **Projector (`proj`):** A linear mapping that translates compressor hidden states into the generator's hidden space.
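
The brief does not specify the modules' interfaces, so the following is a minimal sketch assuming both `cmp` and `gen` are Llama-style stacks sharing a hidden width of 4096, with one latent vector taken per 16-token chunk; the class name, shapes, and pooling choice are all illustrative, not the repo's actual implementation:

```python
import torch
import torch.nn as nn

HIDDEN = 4096   # Llama 3.1 8B hidden size
RATIO = 16      # 16:1 sequence compression

class IronCellSketch(nn.Module):
    """Hypothetical wiring of the three units; the real modules live in the repo."""
    def __init__(self, cmp_model: nn.Module, gen_model: nn.Module):
        super().__init__()
        self.cmp = cmp_model                     # compressor over raw text chunks
        self.gen = gen_model                     # causal LM over latent vectors
        self.proj = nn.Linear(HIDDEN, HIDDEN)    # cmp space -> gen space

    def compress(self, chunk_ids: torch.Tensor) -> torch.Tensor:
        # One dense latent per RATIO-token chunk: here the hidden state at the
        # chunk's final position stands in for its summary vector (assumption).
        h = self.cmp(chunk_ids).last_hidden_state   # (batch, RATIO, HIDDEN)
        return self.proj(h[:, -1, :])               # (batch, HIDDEN)
```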

---

## Zipper Layout (Masked Parallel Training)

To achieve **16:1** sequence compression, IronCell utilizes a "control chain + raw chunks" layout (a sketch of the mask follows the list):

1. **Structural Chain:** Formatted as `[<bos>][<soc>] V-1 [<eoc>] V0 [<eoc>] V1 [<eoc>] ... [Raw_Token chunks]`
2. **Zipper (Staircase) Mask:** A custom attention mask ensures each raw segment attends only to its permitted control tokens, maintaining causal integrity without information leakage.
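
The exact permission rule is not given in this brief, so the sketch below encodes one plausible reading of the staircase: a causal control chain, causal attention within each raw chunk, and chunk *i* limited to the first *i + 1* control positions:

```python
import torch

def staircase_mask(n_ctrl: int, n_chunks: int, chunk_len: int) -> torch.Tensor:
    """Boolean attention mask (True = may attend) sketching one plausible
    'zipper' rule: the control chain is causal over itself; raw chunk i sees
    control positions 0..i plus a causal window over itself, and never sees
    other raw chunks. The exact permission rule is defined in the repo."""
    n = n_ctrl + n_chunks * chunk_len
    mask = torch.zeros(n, n, dtype=torch.bool)
    mask[:n_ctrl, :n_ctrl] = torch.tril(torch.ones(n_ctrl, n_ctrl, dtype=torch.bool))
    for i in range(n_chunks):
        s = n_ctrl + i * chunk_len
        mask[s:s + chunk_len, :min(i + 1, n_ctrl)] = True   # permitted control tokens
        mask[s:s + chunk_len, s:s + chunk_len] = torch.tril(
            torch.ones(chunk_len, chunk_len, dtype=torch.bool))
    return mask
```

For example, `staircase_mask(4, 4, 16)` builds the mask for four 16-token raw chunks zipped against a four-token control chain.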

---

## Training & Reproducibility

The entire differentiation process is reproducible in an afternoon (**~5 hours**) on an **8×A800** node.

### Phase 1: Alignment
* **Objective:** Only the projector and the embeddings of the new special tokens are trained; everything else stays frozen (a freezing sketch follows).
* **Performance:** Aligns the compressed signal: loss drops from 12.8 to 4.12 in ~20 steps.
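
A minimal sketch of the Phase 1 selection, assuming HF-style modules; the gradient hook that restricts embedding updates to the new special-token rows is an illustrative trick, not necessarily the repo's mechanism:

```python
import torch

def freeze_for_alignment(model, projector, new_token_ids):
    """Phase-1 sketch (assumed HF-style modules; the actual selection logic
    lives in the training code): train only the projector plus the embedding
    rows of the newly added special tokens."""
    for p in model.parameters():
        p.requires_grad = False                    # freeze the base model
    for p in projector.parameters():
        p.requires_grad = True                     # train the cmp -> gen projector
    emb = model.get_input_embeddings()
    emb.weight.requires_grad = True
    def keep_new_rows(grad):                       # zero gradients for old vocab rows
        masked = torch.zeros_like(grad)
        masked[new_token_ids] = grad[new_token_ids]
        return masked
    emb.weight.register_hook(keep_new_rows)
```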

### Phase 2: Differentiation
* **Objective:** The full model weights are unfrozen and trained with **L2 regularization** (sketched below).
* **Performance:** Eval loss declines steadily from 2.72 to 2.41.
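
The brief does not say whether "L2 regularization" means plain weight decay or an explicit penalty toward the pre-trained base; the sketch below shows the latter reading, which matches the stem-cell framing (`l2_to_base`, `base_state`, and `lam` are all hypothetical names):

```python
import torch

def l2_to_base(model, base_state, lam=1e-4):
    """One reading of Phase 2's 'L2 regularization' (plain weight decay is
    the other): penalize drift from the pre-trained weights so the
    differentiated units stay near their 'stem cell' origin.
    `base_state` is a frozen copy of the initial state dict."""
    penalty = torch.zeros((), device=next(model.parameters()).device)
    for name, p in model.named_parameters():
        if p.requires_grad:
            penalty = penalty + (p - base_state[name]).pow(2).sum()
    return lam * penalty

# per step: loss = task_loss + l2_to_base(model, base_state)
```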

---

## Data Specifications

* **Source:** FineWeb-Edu (HuggingFace).
* **Scale:** Phase 2 uses 10,000 samples.
* **Length:** Individual documents range from 10k to 30k characters.
* **Protocol:** A **zero-overlap** sampling strategy was maintained for the first 150 training steps (a reconstruction sketch follows the list).
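
A hypothetical reconstruction of this pipeline with the `datasets` streaming API; the dataset id and the `text`/`id` field names are FineWeb-Edu's public ones, but the filter logic itself is an assumption:

```python
from datasets import load_dataset

# Stream FineWeb-Edu, keep 10k-30k-character documents, and never revisit
# one (zero overlap). Filter logic is an assumed reconstruction.
stream = load_dataset("HuggingFaceFW/fineweb-edu", split="train", streaming=True)

seen, samples = set(), []
for ex in stream:
    if not (10_000 <= len(ex["text"]) <= 30_000):
        continue                      # enforce the 10k-30k character window
    if ex["id"] in seen:
        continue                      # zero-overlap: each document used once
    seen.add(ex["id"])
    samples.append(ex["text"])
    if len(samples) == 10_000:        # Phase-2 scale
        break
```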