|
|
--- |
|
|
license: llama3.2 |
|
|
library_name: peft |
|
|
pipeline_tag: text-generation |
|
|
base_model: |
|
|
- meta-llama/Llama-3.2-1B |
|
|
- meta-llama/Llama-3.2-3B |
|
|
tags: |
|
|
- lora |
|
|
- peft |
|
|
- control-theory |
|
|
- regularization |
|
|
- information-theory |
|
|
- llama |
|
|
- adapter |
|
|
language: |
|
|
- en |
|
|
inference: false |
|
|
--- |
|
|
|
|
|
# Shannon Control Unit (SCU): Information-Theoretic Regularization via PI Control |
|
|
|
|
|
[](https://shannonlabs.dev) |
|
|
[](https://huggingface.co/hunterbown/shannon-control-unit) |
|
|
[](LICENSE) |
|
|
|
|
|
**Abstract** |
|
|
|
|
|
Shannon Control Unit (SCU) applies closed-loop control to large-scale language model training. Treating regularization strength ($\lambda$) as an actuator and the Minimum Description Length (MDL) information ratio ($S$) as the controlled variable, SCU uses a proportional-integral (PI) controller to maintain a target ($S^*$) throughout optimization. This feedback stabilizes model complexity without manual hyperparameter sweeps. On Llama 3.2 (1B, 3B) fine-tuning, SCU improves bits-per-token by 6-12% over tuned fixed-$\lambda$ baselines while preserving training stability. |
|
|
|
|
|
--- |
|
|
|
|
|
## 1. Problem Statement |
|
|
|
|
|
Conventional regularization (weight decay, dropout) is scheduled open-loop. The effective tendency to overfit varies over the course of training, so static or hand-tuned schedules either under-penalize (memorization) or over-penalize (underfitting). A feedback mechanism that measures the model’s instantaneous information balance and adjusts $\lambda$ accordingly is required. |
|
|
|
|
|
## 2. Methodology |
|
|
|
|
|
SCU couples information theory with PI control. We monitor the MDL-derived information ratio |
|
|
|
|
|
$$ S(t) = \frac{\text{ParamBPT}(t)}{\text{DataBPT}(t) + \text{ParamBPT}(t)} $$ |
|
|
|
|
|
where DataBPT is the bits-per-token of the loss and ParamBPT is the bits-per-token of the parameter update. The control objective is $S(t) \rightarrow S^*$. Let $e(t) = S(t) - S^*$. With plant gain $\partial S / \partial \lambda < 0$, the PI law updates the regularization strength as |
|
|
|
|
|
$$ \lambda_{t+1} = \lambda_t \cdot \exp\left( - (K_p \cdot e(t) + K_i \cdot \sum_{\tau \le t} e(\tau)) \right) $$ |
|
|
|
|
|
optionally with deadband and integral clamping for anti-windup. Updates are applied at gradient-accumulation boundaries to maintain stability. |
|
|
|
|
|
## 3. Results |
|
|
|
|
|
We validated SCU by fine-tuning Llama 3.2 models on a subset of WikiText-103. The results show significant improvements in compression efficiency (Bits Per Token) and Perplexity compared to an optimally tuned cross-entropy baseline. |
|
|
|
|
|
| Model | Metric | Baseline (Cross-Entropy) | SCU (PI Control) | Improvement | |
|
|
|-------|--------|--------------------------|------------------|-------------| |
|
|
| **Llama-3.2-1B** | BPT | 3.920 | **3.676** | **-6.2%** | |
|
|
| | Perplexity | 15.14 | **12.78** | **-15.6%** | |
|
|
| **Llama-3.2-3B** | BPT | 1.830 | **1.635** | **-10.6%** | |
|
|
| | Perplexity | 3.56 | **3.11** | **-12.6%** | |
|
|
|
|
|
*Note: Validation performed on Llama 3.2 LoRA adapters. Baseline represents the best-performing fixed-$\lambda$ configuration found via grid search.* |
|
|
|
|
|
## 4. Related & Concurrent Work |
|
|
|
|
|
The application of control theory to LLM training is an emerging and promising field. |
|
|
|
|
|
### 4.1 Independent Convergence: EntroPIC |
|
|
Recent independent work, **EntroPIC** (arXiv:2511.15248), applies PI control to stabilize policy entropy in reinforcement learning. This convergence indicates that control-theoretic feedback is effective for stabilizing training dynamics. SCU targets the MDL information ratio during supervised pretraining/fine-tuning, whereas EntroPIC targets policy entropy in RL; the objectives are complementary and suggest a broader control lens on neural training. |
|
|
|
|
|
## 5. Future Directions |
|
|
|
|
|
Our ongoing research focuses on: |
|
|
* **Scaling Laws for $S^*$:** Deriving the optimal target $S^*$ from first principles based on model size ($N$) and dataset size ($D$), removing the need for a target setpoint entirely. |
|
|
* **Full-Parameter Training:** Extending validation beyond LoRA to full model pretraining. |
|
|
* **Unified Control:** Investigating if regulating Information Ratio implicitly stabilizes entropy (unifying SCU and EntroPIC findings). |
|
|
|
|
|
## 6. Usage |
|
|
|
|
|
### Installation |
|
|
```bash |
|
|
git clone https://github.com/Shannon-Labs/shannon-control-unit.git |
|
|
cd shannon-control-unit |
|
|
pip install -r requirements.txt |
|
|
``` |
|
|
|
|
|
### Quick Start (Inference) |
|
|
```python |
|
|
from transformers import AutoModelForCausalLM, AutoTokenizer |
|
|
from peft import PeftModel |
|
|
import torch |
|
|
|
|
|
# Load Base Model |
|
|
base_id = "meta-llama/Llama-3.2-3B" |
|
|
base = AutoModelForCausalLM.from_pretrained(base_id, device_map="auto", torch_dtype=torch.float16) |
|
|
|
|
|
# Load SCU Adapter |
|
|
model = PeftModel.from_pretrained(base, "hunterbown/shannon-control-unit", subfolder="3b-scu") |
|
|
``` |
|
|
|
|
|
For reproduction scripts and training details, see [`examples/`](./examples/) and [`scripts/`](./scripts/). |
|
|
|
|
|
## 7. Citation |
|
|
|
|
|
If you use SCU in your research, please cite: |
|
|
|
|
|
```bibtex |
|
|
@misc{bown2025scu, |
|
|
author = {Bown, Hunter}, |
|
|
title = {Shannon Control Unit: Information-Theoretic Regularization via PI Control}, |
|
|
year = {2025}, |
|
|
publisher = {GitHub}, |
|
|
journal = {GitHub repository}, |
|
|
howpublished = {\url{https://github.com/Shannon-Labs/shannon-control-unit}} |
|
|
} |
|
|
``` |
|
|
|
|
|
## 8. License |
|
|
|
|
|
This repository is dual-licensed: |
|
|
|
|
|
* **Research & Open Source:** [AGPL-3.0](LICENSE). Free for academic and open-source use. |
|
|
* **Commercial:** Proprietary licenses available for closed-source applications. Contact `hunter@shannonlabs.dev`. |
|
|
|
|
|
**Intellectual Property:** The SCU methodology is subject to a U.S. Provisional Patent (Filed September 2025). |
|
|
|