docs: publish clean model card (validation + control telemetry)
README.md
# Shannon Control Unit (SCU) — Cruise Control for LLM Training

[](https://shannonlabs.dev)
[](https://huggingface.co/hunterbown/shannon-control-unit)
[](https://colab.research.google.com/github/Hmbown/shannon-control-unit/blob/main/notebooks/SCU_Demo.ipynb)
[](https://shannonlabs.dev)

**Model Weights:** Llama 3.2 Community License | **Code:** Apache-2.0 ([GitHub](https://github.com/Hmbown/shannon-control-unit))

**Like cruise control maintains your speed regardless of hills, SCU maintains optimal regularization regardless of data complexity.**

Set your target information ratio \( S^* \), and our PI controller automatically adjusts \( \lambda \) to maintain it throughout training. No manual hyperparameter tuning required.

**Validated Results:**

| Model | Metric | Baseline | SCU | Improvement |
|-------|--------|----------|-----|-------------|
| **Llama-3.2-1B** | BPT | 3.920 | 3.676 | **-6.2%** |
| | Perplexity | 15.14 | 12.78 | **-15.6%** |
| **Llama-3.2-3B** 🎯 | BPT | 1.830 | 1.635 | **-10.6%** |
| | Perplexity | 3.56 | 3.11 | **-12.6%** |

**Status:** Validated at 1B/3B scales | Seeking partners for 7B+ external validation

[View validation artifacts](./3b_validation_results.json) | [Evaluation protocol](./scripts/eval_bpt.py)
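The BPT and perplexity rows are consistent by construction: perplexity is 2 raised to the bits-per-token. A quick sanity check on the 1B numbers from the table:

```python
# Perplexity = 2 ** BPT, so each perplexity row follows from its BPT row.
baseline_ppl = 2 ** 3.920   # Llama-3.2-1B baseline BPT
scu_ppl = 2 ** 3.676        # Llama-3.2-1B SCU BPT

print(round(baseline_ppl, 2))  # 15.14
print(round(scu_ppl, 2))       # 12.78

# The -6.2% BPT drop compounds into the larger -15.6% perplexity drop:
improvement = 100 * (scu_ppl / baseline_ppl - 1)
print(round(improvement, 1))   # -15.6
```

The same arithmetic reproduces the 3B row (2^1.830 ≈ 3.56, 2^1.635 ≈ 3.11).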

## Available Models

| Directory | Model | S* Target | λ Control | Notes |
|-----------|-------|-----------|-----------|-------|
| **main** | Llama-3.2-1B | 1.0% | Adaptive PI | Primary validated model |
| **1b-scu/** | Llama-3.2-1B | 1.0% | Adaptive PI | Same as main |
| **3b-scu/** | Llama-3.2-3B | 2.88% | Adaptive (λ=2.61) | Best 3B performance |
| **3b-fixed/** | Llama-3.2-3B | 3.35% | Fixed λ=0.5 | Ablation study |

**Note:** The Hugging Face UI shows only the root 1B model; load the 3B models in code via the `subfolder="3b-scu"` parameter.

![Training dynamics](assets/figures/training_dynamics.png)

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
import torch

# For 1B model (validated with 6.2% BPT improvement)
base_id = "meta-llama/Llama-3.2-1B"  # accept terms on HF first
base = AutoModelForCausalLM.from_pretrained(
    base_id,
    device_map="auto",
    torch_dtype=torch.float16 if torch.cuda.is_available() else torch.float32,
)
tok = AutoTokenizer.from_pretrained(base_id)

model = PeftModel.from_pretrained(base, "hunterbown/shannon-control-unit")

# For the 3B model, load from the subfolder instead:
# model = PeftModel.from_pretrained(base, "hunterbown/shannon-control-unit", subfolder="3b-scu")
```

**Demo notebook:** [Open in Colab](https://colab.research.google.com/github/Hmbown/shannon-control-unit/blob/main/notebooks/SCU_Demo.ipynb)

---

## How It Works (Cruise Control Analogy)

Just like cruise control in your car:
- **You set the target:** Choose your information ratio $S^*$
- **SCU maintains it automatically:** PI controller adjusts $\lambda$ in real time
- **No manual intervention:** Works across data distribution shifts and training dynamics

**Technical Details:**
- **Control variable:** $S=\frac{\text{ParamBPT}}{\text{DataBPT}+\text{ParamBPT}}$
- **Control law:** $\lambda \leftarrow \lambda \cdot \exp(-(K_p \cdot \text{error} + K_i \cdot I))$
- **Result:** Automatic regularization without hyperparameter sweeps
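The control law above can be sketched in a few lines. This is an illustrative toy, not the released training code: the gains, the error sign convention (error = $S^* - S$), and the stand-in "plant" mapping $\lambda$ to ParamBPT are all assumptions.

```python
import math

S_TARGET = 0.01        # 1.0% information-ratio target (the 1B setting)
KP, KI = 20.0, 0.1     # illustrative PI gains, not the released values

def information_ratio(data_bpt, param_bpt):
    # S = ParamBPT / (DataBPT + ParamBPT)
    return param_bpt / (data_bpt + param_bpt)

def pi_step(lam, s, integral):
    # lambda <- lambda * exp(-(Kp*error + Ki*I)); error sign is an assumption
    error = S_TARGET - s
    integral += error
    return lam * math.exp(-(KP * error + KI * integral)), integral

def toy_param_bpt(lam):
    # Stand-in plant: stronger regularization -> fewer parameter bits
    return 0.5 / (1.0 + lam)

lam, integral, data_bpt = 1.0, 0.0, 3.6
for _ in range(200):
    s = information_ratio(data_bpt, toy_param_bpt(lam))
    lam, integral = pi_step(lam, s, integral)
# After the loop, S has been steered close to S_TARGET.
```

The multiplicative update keeps $\lambda$ positive, which is why the exponential form is used rather than an additive PI step.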

**Key Research Question:**
Optimal $S^*$ scaling laws are still being discovered. We found 1.0% works for 1B models and 2.88% for 3B models. The relationship between model size, training data, and optimal $S^*$ is an active area of research.

---

## Licensing & IP

* **Model weights:** Meta Llama 3.2 Community License (inherited from base model)
* **SCU training code:** Apache-2.0 License ([GitHub repository](https://github.com/Hmbown/shannon-control-unit))
* **IP status:** U.S. patent pending (provisional filed September 2025)

> Repro tips: block size 1024, batch 1, grad-accum 4, gradient checkpointing on, `use_cache=False`.
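Those tips map onto standard `transformers` Trainer settings. A minimal sketch, assuming the Hugging Face `Trainer` is used (the output path is illustrative, and dataset packing is left to you):

```python
from transformers import TrainingArguments

# Matches the repro tips: batch 1, grad-accum 4, gradient checkpointing on.
args = TrainingArguments(
    output_dir="scu-repro",          # illustrative path
    per_device_train_batch_size=1,
    gradient_accumulation_steps=4,
    gradient_checkpointing=True,
)

# Block size 1024: pack/tokenize the corpus into 1024-token sequences.
# use_cache=False: disable the KV cache on the model before training, e.g.
#   model.config.use_cache = False
```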