hunterbown committed
Commit f0f16e7 · verified · 1 Parent(s): 1521fa2

docs: publish clean model card (validation + control telemetry)

Files changed (1):
  1. README.md +32 -16

README.md CHANGED
@@ -20,27 +20,40 @@ inference: false

  # Shannon Control Unit (SCU) — Cruise Control for LLM Training

- [![License: Apache 2.0](https://img.shields.io/badge/License-Apache%202.0-blue.svg)](https://opensource.org/licenses/Apache-2.0)
  [![Patent Pending](https://img.shields.io/badge/Patent-Pending-orange.svg)](https://shannonlabs.dev)
  [![Hugging Face](https://img.shields.io/badge/%F0%9F%A4%97-Models-yellow)](https://huggingface.co/hunterbown/shannon-control-unit)
- [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/hmbown/shannon-control-unit/blob/main/notebooks/SCU_Demo.ipynb)
  [![Website](https://img.shields.io/badge/Website-shannonlabs.dev-green)](https://shannonlabs.dev)

  **Like cruise control maintains your speed regardless of hills, SCU maintains optimal regularization regardless of data complexity.**

  Set your target information ratio \( S^* \), and our PI controller automatically adjusts \( \lambda \) to maintain it throughout training. No manual hyperparameter tuning required.

  **Validated Results:**
- - **Llama-3.2-1B:** Base 3.920 BPT → SCU 3.676 BPT (15.6% lower perplexity, 6.2% lower BPT)
- - **🎯 Llama-3.2-3B:** Base 1.8295 BPT → SCU 1.6351 BPT (10.6% lower BPT)
- - **Production ready:** Seeking partnerships for 7B+ scale validation

  ## Available Models

- - **Main directory**: Llama-3.2-1B SCU adapter (validated, S=1.0%)
- - **1b-scu/**: Same as main (Llama-3.2-1B SCU, S=1.0%, λ adaptive)
- - **3b-scu/**: Llama-3.2-3B SCU adapter (S=2.88%, λ=2.61)
- - **3b-fixed/**: Llama-3.2-3B fixed λ=0.5 (S=3.35%)

  ![Validation: Base vs SCU](assets/figures/validation_delta.png)

@@ -74,7 +87,7 @@ from transformers import AutoModelForCausalLM, AutoTokenizer
  from peft import PeftModel
  import torch

- # For 1B model (recommended - validated with 6.2% improvement)
  base_id = "meta-llama/Llama-3.2-1B"  # accept terms on HF first
  base = AutoModelForCausalLM.from_pretrained(base_id, device_map="auto", torch_dtype=torch.float16 if torch.cuda.is_available() else torch.float32)
  tok = AutoTokenizer.from_pretrained(base_id)

@@ -89,28 +102,31 @@ model = PeftModel.from_pretrained(base, "hunterbown/shannon-control-unit")
  # model = PeftModel.from_pretrained(base, "hunterbown/shannon-control-unit", subfolder="3b-scu")
  ```

- **Demo notebook:** [Open in Colab](https://huggingface.co/hunterbown/shannon-control-unit/blob/main/notebooks/SCU_Demo.ipynb) (hosted on HuggingFace)

  ---

  ## How It Works (Cruise Control Analogy)

  Just like cruise control in your car:
- - **You set the target:** Choose your information ratio $S^*$ (typically 1.0%)
  - **SCU maintains it automatically:** PI controller adjusts $\lambda$ in real-time
  - **No manual intervention:** Works across data distribution shifts and training dynamics

  **Technical Details:**
  - **Control variable:** $S=\frac{\text{ParamBPT}}{\text{DataBPT}+\text{ParamBPT}}$
- - **Control law:** $\lambda \leftarrow \lambda \cdot \exp(-(K_p\,\text{error}+K_i\,I))$
  - **Result:** Automatic regularization without hyperparameter sweeps

  ---

  ## Licensing & IP

- * **Adapters/models:** Meta **Llama 3.2** Community License
- * **SCU training code:** **Apache-2.0**
- * **IP status:** U.S. **patent pending** (provisional filed September 2025)

  > Repro tips: block size 1024, batch 1, grad-accum 4, gradient checkpointing on, `use_cache=False`.
 
  # Shannon Control Unit (SCU) — Cruise Control for LLM Training

  [![Patent Pending](https://img.shields.io/badge/Patent-Pending-orange.svg)](https://shannonlabs.dev)
  [![Hugging Face](https://img.shields.io/badge/%F0%9F%A4%97-Models-yellow)](https://huggingface.co/hunterbown/shannon-control-unit)
+ [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/Hmbown/shannon-control-unit/blob/main/notebooks/SCU_Demo.ipynb)
  [![Website](https://img.shields.io/badge/Website-shannonlabs.dev-green)](https://shannonlabs.dev)

+ **Model Weights:** Llama 3.2 Community License | **Code:** Apache-2.0 ([GitHub](https://github.com/Hmbown/shannon-control-unit))
+
  **Like cruise control maintains your speed regardless of hills, SCU maintains optimal regularization regardless of data complexity.**

  Set your target information ratio \( S^* \), and our PI controller automatically adjusts \( \lambda \) to maintain it throughout training. No manual hyperparameter tuning required.

  **Validated Results:**
+
+ | Model | Metric | Baseline | SCU | Improvement |
+ |-------|--------|----------|-----|-------------|
+ | **Llama-3.2-1B** | BPT | 3.920 | 3.676 | **-6.2%** |
+ | | Perplexity | 15.14 | 12.78 | **-15.6%** |
+ | **Llama-3.2-3B** 🎯 | BPT | 1.830 | 1.635 | **-10.6%** |
+ | | Perplexity | 3.56 | 3.11 | **-12.6%** |
+
+ **Status:** Validated at 1B/3B scales | Seeking partners for 7B+ external validation
+
+ [View validation artifacts](./3b_validation_results.json) | [Evaluation protocol](./scripts/eval_bpt.py)
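The BPT and perplexity rows in the table are two views of the same measurement, since perplexity per token equals $2^{\text{BPT}}$. A quick sanity check against the 1B row:

```python
# Perplexity per token = 2 ** BPT, so the -6.2% BPT drop and the
# -15.6% perplexity drop in the 1B row describe the same result.
base_bpt, scu_bpt = 3.920, 3.676
base_ppl, scu_ppl = 2 ** base_bpt, 2 ** scu_bpt

print(round(base_ppl, 2), round(scu_ppl, 2))     # 15.14 12.78
print(round(100 * (scu_bpt / base_bpt - 1), 1))  # -6.2 (% BPT)
print(round(100 * (scu_ppl / base_ppl - 1), 1))  # -15.6 (% perplexity)
```

Note that a fixed relative BPT improvement translates into a larger relative perplexity improvement because of the exponential relationship.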

  ## Available Models

+ | Directory | Model | S* Target | λ Control | Notes |
+ |-----------|-------|-----------|-----------|-------|
+ | **main** | Llama-3.2-1B | 1.0% | Adaptive PI | Primary validated model |
+ | **1b-scu/** | Llama-3.2-1B | 1.0% | Adaptive PI | Same as main |
+ | **3b-scu/** | Llama-3.2-3B | 2.88% | Adaptive (λ=2.61) | Best 3B performance |
+ | **3b-fixed/** | Llama-3.2-3B | 3.35% | Fixed λ=0.5 | Ablation study |
+
+ **Note:** The HuggingFace UI shows only the root 1B model. Load the 3B models in code via the `subfolder="3b-scu"` parameter.

  ![Validation: Base vs SCU](assets/figures/validation_delta.png)

@@ -74,7 +87,7 @@ from transformers import AutoModelForCausalLM, AutoTokenizer
  from peft import PeftModel
  import torch

+ # For 1B model (validated with 6.2% BPT improvement)
  base_id = "meta-llama/Llama-3.2-1B"  # accept terms on HF first
  base = AutoModelForCausalLM.from_pretrained(base_id, device_map="auto", torch_dtype=torch.float16 if torch.cuda.is_available() else torch.float32)
  tok = AutoTokenizer.from_pretrained(base_id)

@@ -89,28 +102,31 @@ model = PeftModel.from_pretrained(base, "hunterbown/shannon-control-unit")
  # model = PeftModel.from_pretrained(base, "hunterbown/shannon-control-unit", subfolder="3b-scu")
  ```

+ **Demo notebook:** [Open in Colab](https://colab.research.google.com/github/Hmbown/shannon-control-unit/blob/main/notebooks/SCU_Demo.ipynb)

  ---

  ## How It Works (Cruise Control Analogy)

  Just like cruise control in your car:
+ - **You set the target:** Choose your information ratio $S^*$
  - **SCU maintains it automatically:** PI controller adjusts $\lambda$ in real-time
  - **No manual intervention:** Works across data distribution shifts and training dynamics

  **Technical Details:**
  - **Control variable:** $S=\frac{\text{ParamBPT}}{\text{DataBPT}+\text{ParamBPT}}$
+ - **Control law:** $\lambda \leftarrow \lambda \cdot \exp(-(K_p \cdot \text{error} + K_i \cdot I))$
  - **Result:** Automatic regularization without hyperparameter sweeps

+ **Key Research Question:**
+ Optimal $S^*$ scaling laws are still being discovered. We found 1.0% works for 1B models and 2.88% for 3B models. The relationship between model size, training data, and optimal $S^*$ is an active area of research.
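The control law above fits in a few lines. The sketch below is a minimal illustration, assuming the error convention $\text{error} = S^* - S$; the function name, argument names, and $K_p$/$K_i$ values are illustrative placeholders, not the repository's tuned API:

```python
import math

def scu_step(lam, data_bpt, param_bpt, s_target, integral, kp=1.0, ki=0.1):
    """One PI update of the regularization weight lambda.

    Computes S = ParamBPT / (DataBPT + ParamBPT), then applies the
    multiplicative law  lambda <- lambda * exp(-(Kp*error + Ki*I)).
    Gains kp/ki are illustrative, not the tuned values.
    """
    s = param_bpt / (data_bpt + param_bpt)  # information ratio S
    error = s_target - s                    # assumed sign convention
    integral += error                       # accumulated error term I
    lam *= math.exp(-(kp * error + ki * integral))
    return lam, integral
```

With this sign convention, overshooting the target (S above $S^*$) raises $\lambda$ to regularize harder, and undershooting lowers it, so S is steered back toward $S^*$ without manual sweeps.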
+
  ---

  ## Licensing & IP

+ * **Model weights:** Meta Llama 3.2 Community License (inherited from base model)
+ * **SCU training code:** Apache-2.0 License ([GitHub repository](https://github.com/Hmbown/shannon-control-unit))
+ * **IP status:** U.S. patent pending (provisional filed September 2025)

  > Repro tips: block size 1024, batch 1, grad-accum 4, gradient checkpointing on, `use_cache=False`.
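Those repro tips map onto a training config roughly like the following. This is an assumed mapping using common Hugging Face Trainer field names, not the authors' actual script:

```python
# Sketch of the repro settings above (field names follow common
# Hugging Face Trainer conventions; the exact script may differ).
repro = {
    "block_size": 1024,                 # packing/sequence length
    "per_device_train_batch_size": 1,   # "batch 1"
    "gradient_accumulation_steps": 4,   # "grad-accum 4"
    "gradient_checkpointing": True,     # trade compute for memory
}

# Effective batch size seen by the optimizer:
effective_batch = (repro["per_device_train_batch_size"]
                   * repro["gradient_accumulation_steps"])
print(effective_batch)  # 4

# During training, also disable the KV cache on the model, e.g.:
# model.config.use_cache = False
```

The `use_cache=False` setting matters because the generation-time KV cache is incompatible with gradient checkpointing during training.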