|
|
--- |
|
|
license: apache-2.0 |
|
|
language: |
|
|
- en |
|
|
library_name: peft |
|
|
pipeline_tag: text-generation |
|
|
tags: |
|
|
- repetition-suppression |
|
|
- decode-time-intervention |
|
|
- llama |
|
|
- lora |
|
|
- research |
|
|
- degeneration |
|
|
- cf-hot |
|
|
base_model: LoganResearch/ARC-Base-8B |
|
|
model-index: |
|
|
- name: Adaptive-Repetition-Controller |
|
|
results: |
|
|
- task: |
|
|
type: text-generation |
|
|
metrics: |
|
|
- name: Repetition Reduction |
|
|
type: custom |
|
|
value: 48.4% |
|
|
- name: Risk Separation |
|
|
type: custom |
|
|
value: 125x |
|
|
- name: F1 Score |
|
|
type: f1 |
|
|
value: 0.99 |
|
|
--- |
|
|
|
|
|
<div align="center"> |
|
|
|
|
|
# ⚡ Adaptive Repetition Controller |
|
|
|
|
|
### *CF-HoT 125x — Learned Decode-Time Intervention* |
|
|
|
|
|
|
|
|
|
|
*A learned system that predicts and prevents repetitive degeneration in language models.* |
|
|
|
|
|
[Base Model](https://huggingface.co/LoganResearch/ARC-Base-8B) | [GitHub](https://github.com/Loganwins/HolonomyTransformer) | Paper (forthcoming)
|
|
|
|
|
</div> |
|
|
|
|
|
--- |
|
|
|
|
|
## 🎯 The Problem |
|
|
|
|
|
Autoregressive language models suffer from **repetitive degeneration** — the tendency to fall into loops, repeat phrases, or get stuck on patterns during long-form generation. |
|
|
|
|
|
Standard solutions apply **uniform penalties** to repeated tokens. But repetition isn't always bad, and uniform penalties can't distinguish between: |
|
|
- Natural repetition (articles, pronouns, common words) |
|
|
- Problematic repetition (loops, stuck patterns, degeneration) |
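For contrast, here is a minimal sketch of the uniform approach (in the spirit of the standard CTRL-style `repetition_penalty`; exact formulas vary by implementation). Every previously seen token is penalized identically, so "the" is hit just as hard as a token stuck in a loop:

```python
import torch

def uniform_repetition_penalty(
    logits: torch.Tensor, prev_ids: list[int], penalty: float = 1.2
) -> torch.Tensor:
    """Divide positive logits (multiply negative ones) by a fixed penalty
    for every token already generated — no notion of *why* it repeated."""
    out = logits.clone()
    for tid in set(prev_ids):
        val = out[tid]
        out[tid] = val / penalty if val > 0 else val * penalty
    return out

logits = torch.tensor([2.0, -1.0, 0.5])
penalized = uniform_repetition_penalty(logits, prev_ids=[0, 1])
# token 0: 2.0 / 1.2 ≈ 1.67; token 1: -1.0 * 1.2 = -1.2; token 2 untouched
```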
|
|
|
|
|
## 💡 The Solution |
|
|
|
|
|
The **Adaptive Repetition Controller** learns to **predict** when repetition is about to become problematic, then applies **targeted intervention** only when needed. |
|
|
|
|
|
<div align="center"> |
|
|
|
|
|
``` |
|
|
╔═══════════════════════════════════════════════════════════════╗ |
|
|
║ GENERATION PIPELINE ║ |
|
|
╠═══════════════════════════════════════════════════════════════╣ |
|
|
║ ║ |
|
|
║ Input ──▶ Base Model ──▶ Hidden States (32 layers) ║ |
|
|
║ │ ║ |
|
|
║ ▼ ║ |
|
|
║ ┌─────────────────┐ ║ |
|
|
║ │ Risk Predictor │ ║ |
|
|
║ │ (50K params) │ ║ |
|
|
║ └────────┬────────┘ ║ |
|
|
║ │ ║ |
|
|
║ ▼ ║ |
|
|
║ risk = 0.95 (HIGH) ║ |
|
|
║ │ ║ |
|
|
║ ▼ ║ |
|
|
║ logits[recent_tokens] -= penalty ║ |
|
|
║ │ ║ |
|
|
║ ▼ ║ |
|
|
║ Sample next token ║ |
|
|
║ ║ |
|
|
╚═══════════════════════════════════════════════════════════════╝ |
|
|
``` |
|
|
|
|
|
</div> |
|
|
|
|
|
--- |
|
|
|
|
|
## 📊 Results |
|
|
|
|
|
### Risk Prediction Performance |
|
|
|
|
|
The system achieves **125x separation** between tokens that will repeat and those that won't: |
|
|
|
|
|
| Metric | Value | |
|
|
|--------|-------| |
|
|
| **F1 Score** | 0.99+ | |
|
|
| **Risk @ Repeating Tokens** | 0.998 | |
|
|
| **Risk @ Non-Repeating Tokens** | 0.008 | |
|
|
| **Separation Factor** | **125x** | |
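The separation factor is simply the ratio of mean predicted risk on tokens that go on to repeat versus those that do not. A minimal sketch with toy numbers mirroring the table (this is illustrative, not the project's evaluation code):

```python
import torch

def separation_factor(risk: torch.Tensor, will_repeat: torch.Tensor) -> float:
    """Mean risk at repeating tokens divided by mean risk at non-repeating ones."""
    pos = risk[will_repeat.bool()].mean()
    neg = risk[~will_repeat.bool()].mean()
    return (pos / neg).item()

risk = torch.tensor([0.998, 0.997, 0.008, 0.008])
labels = torch.tensor([1, 1, 0, 0])
print(separation_factor(risk, labels))  # ≈ 124.7 with these toy numbers
```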
|
|
|
|
|
### Generation Quality |
|
|
|
|
|
| Metric | Baseline | With CF-HoT | Change | |
|
|
|--------|----------|-------------|--------| |
|
|
| Repetition Rate | 33.9% | 17.5% | **↓ 48.4%** | |
|
|
| Distinct-2 (diversity) | 0.836 | 0.976 | **↑ 16.7%** | |
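Distinct-2 is the fraction of bigrams in the generated text that are unique. A minimal reference implementation of this standard metric (not the project's evaluation script):

```python
def distinct_n(tokens: list[str], n: int = 2) -> float:
    """Unique n-grams / total n-grams: higher means more diverse output."""
    ngrams = [tuple(tokens[i : i + n]) for i in range(len(tokens) - n + 1)]
    return len(set(ngrams)) / max(len(ngrams), 1)

looped = "the cat sat the cat sat the cat sat".split()
varied = "the cat sat on a warm red mat today".split()
print(distinct_n(looped))  # 0.375 — only 3 of 8 bigrams are unique
print(distinct_n(varied))  # 1.0 — all 8 bigrams are unique
```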
|
|
|
|
|
### Comparison to Standard Methods |
|
|
|
|
|
| Method | Adaptive | Learned | Repetition Reduction | |
|
|
|--------|----------|---------|---------------------| |
|
|
| HuggingFace `repetition_penalty` | ❌ | ❌ | ~20-30% | |
|
|
| OpenAI `frequency_penalty` | ❌ | ❌ | ~25-35% | |
|
|
| Contrastive Decoding | ❌ | ❌ | ~30-40% | |
|
|
| **CF-HoT (this)** | ✅ | ✅ | **48.4%** | |
|
|
|
|
|
--- |
|
|
|
|
|
## 🏗️ Architecture |
|
|
|
|
|
The risk predictor is remarkably small — only **~50,000 parameters** (0.0006% of the base model): |
|
|
|
|
|
```python
RiskPredictor(
    # Per-layer feature extraction from each transformer layer
    fiber_projs = nn.ModuleList(
        [nn.Linear(4096, 16) for _ in range(32)]  # 32 layers, d_fiber = 16
    ),

    # Learn which layers matter most (softmax-normalized)
    layer_weights = nn.Parameter(torch.zeros(32)),

    # Predict repetition risk from the aggregated features
    predictor = nn.Sequential(
        nn.Linear(16, 64),
        nn.GELU(),
        nn.Linear(64, 64),
        nn.GELU(),
        nn.Linear(64, 1),  # risk logit
    ),
)
```
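For readers who want to run the schematic, here is a self-contained PyTorch sketch. The aggregation order (project each layer's last-position state, then take a softmax-weighted sum over layers) is an assumption; the released `risk_predictor.pt` may differ in detail:

```python
import torch
import torch.nn as nn

class RiskPredictor(nn.Module):
    """Sketch of the risk predictor; aggregation details are assumptions."""

    def __init__(self, n_layers: int = 32, d_model: int = 4096,
                 d_fiber: int = 16, d_control: int = 64):
        super().__init__()
        # One small projection per transformer layer
        self.fiber_projs = nn.ModuleList(
            [nn.Linear(d_model, d_fiber) for _ in range(n_layers)]
        )
        # Learned layer-importance weights, softmax-normalized in forward()
        self.layer_weights = nn.Parameter(torch.zeros(n_layers))
        self.predictor = nn.Sequential(
            nn.Linear(d_fiber, d_control), nn.GELU(),
            nn.Linear(d_control, d_control), nn.GELU(),
            nn.Linear(d_control, 1),  # risk logit
        )

    def forward(self, hidden_states: tuple[torch.Tensor, ...]) -> torch.Tensor:
        # hidden_states: one [batch, seq, d_model] tensor per layer
        feats = torch.stack(
            [proj(h[:, -1, :]) for proj, h in zip(self.fiber_projs, hidden_states)]
        )  # [n_layers, batch, d_fiber]
        w = torch.softmax(self.layer_weights, dim=0)
        pooled = (w[:, None, None] * feats).sum(dim=0)  # [batch, d_fiber]
        return self.predictor(pooled).squeeze(-1)  # risk logit per batch item

# Shape check with small dimensions
rp = RiskPredictor(n_layers=4, d_model=8)
hs = tuple(torch.randn(2, 5, 8) for _ in range(4))
out = rp(hs)  # shape: [2]
```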
|
|
|
|
|
### Why It Works |
|
|
|
|
|
1. **Hidden states contain predictive signal** — The model "knows" it's about to repeat before it happens |
|
|
2. **Different layers encode different information** — Learned aggregation finds the most predictive layers |
|
|
3. **Decode-time intervention preserves base model** — No modification to attention patterns or learned representations |
|
|
|
|
|
--- |
|
|
|
|
|
## 🚀 Quick Start |
|
|
|
|
|
### Installation |
|
|
|
|
|
```bash |
|
|
pip install transformers peft accelerate torch |
|
|
``` |
|
|
|
|
|
### Loading the Models |
|
|
|
|
|
```python |
|
|
import torch |
|
|
from transformers import AutoModelForCausalLM, AutoTokenizer |
|
|
from peft import PeftModel
from huggingface_hub import hf_hub_download
|
|
|
|
|
# Load base model |
|
|
base_model = AutoModelForCausalLM.from_pretrained( |
|
|
"LoganResearch/ARC-Base-8B", |
|
|
torch_dtype=torch.bfloat16, |
|
|
device_map="auto" |
|
|
) |
|
|
|
|
|
# Load tokenizer |
|
|
tokenizer = AutoTokenizer.from_pretrained("LoganResearch/ARC-Base-8B") |
|
|
|
|
|
# Load CF-HoT adapter |
|
|
model = PeftModel.from_pretrained( |
|
|
base_model, |
|
|
"LoganResearch/Adaptive-Repetition-Controller" |
|
|
) |
|
|
|
|
|
# Load risk predictor |
|
|
risk_predictor = torch.load(
    hf_hub_download("LoganResearch/Adaptive-Repetition-Controller", "risk_predictor.pt"),
    map_location=model.device,
)
risk_predictor.eval()
|
|
``` |
|
|
|
|
|
### Generation with CF-HoT Intervention |
|
|
|
|
|
```python |
|
|
def generate_with_cfhot( |
|
|
prompt: str, |
|
|
max_tokens: int = 512, |
|
|
penalty_scale: float = 3.0, |
|
|
threshold: float = 0.1, |
|
|
temperature: float = 0.8, |
|
|
rep_window: int = 32, |
|
|
): |
|
|
"""Generate text with adaptive repetition suppression.""" |
|
|
|
|
|
input_ids = tokenizer.encode(prompt, return_tensors="pt").to(model.device) |
|
|
|
|
|
for _ in range(max_tokens): |
|
|
with torch.no_grad(): |
|
|
# Forward pass with hidden states |
|
|
outputs = model(input_ids, output_hidden_states=True) |
|
|
logits = outputs.logits[:, -1, :] |
|
|
hidden_states = outputs.hidden_states |
|
|
|
|
|
# Predict repetition risk |
|
|
risk = risk_predictor(hidden_states).sigmoid().item() |
|
|
|
|
|
# Apply adaptive penalty if risk is high |
|
|
if risk > threshold: |
|
|
recent_tokens = input_ids[0, -rep_window:].tolist() |
|
|
penalty = risk * penalty_scale |
|
|
for token_id in set(recent_tokens): |
|
|
logits[0, token_id] -= penalty |
|
|
|
|
|
# Sample next token |
|
|
probs = torch.softmax(logits / temperature, dim=-1) |
|
|
next_token = torch.multinomial(probs, num_samples=1) |
|
|
|
|
|
# Append and check for EOS |
|
|
input_ids = torch.cat([input_ids, next_token], dim=-1) |
|
|
if next_token.item() == tokenizer.eos_token_id: |
|
|
break |
|
|
|
|
|
return tokenizer.decode(input_ids[0], skip_special_tokens=True) |
|
|
|
|
|
# Example usage |
|
|
response = generate_with_cfhot( |
|
|
"Write a detailed essay on the nature of consciousness:", |
|
|
max_tokens=1000, |
|
|
penalty_scale=4.0, |
|
|
) |
|
|
print(response) |
|
|
``` |
|
|
|
|
|
--- |
|
|
|
|
|
## 📁 Files |
|
|
|
|
|
| File | Size | Description | |
|
|
|------|------|-------------| |
|
|
| `risk_predictor.pt` | 8.4 MB | Trained risk prediction network | |
|
|
| `adapter_model.safetensors` | 218 MB | LoRA adapter weights | |
|
|
| `adapter_config.json` | 1 KB | PEFT adapter configuration | |
|
|
|
|
|
--- |
|
|
|
|
|
## ⚙️ Training Details |
|
|
|
|
|
### Dataset & Objective |
|
|
|
|
|
- **Dataset:** WikiText-2 |
|
|
- **Task:** Binary classification — "Will this token appear in the next 32 tokens?" |
|
|
- **Loss:** BCEWithLogitsLoss with dynamic class balancing |
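A sketch of how the labels and balanced loss could be constructed; `repeat_labels` and the per-batch `pos_weight` scheme are illustrative assumptions, not the project's exact training code:

```python
import torch

def repeat_labels(token_ids: list[int], rep_window: int = 32) -> torch.Tensor:
    """Label[t] = 1 if token_ids[t] reappears within the next rep_window tokens."""
    labels = []
    for t, tok in enumerate(token_ids):
        future = token_ids[t + 1 : t + 1 + rep_window]
        labels.append(1.0 if tok in future else 0.0)
    return torch.tensor(labels)

labels = repeat_labels([5, 7, 5, 9, 7, 3], rep_window=3)  # [1, 1, 0, 0, 0, 0]

# One way to balance classes dynamically: weight positives by the
# observed negative/positive ratio in the batch.
pos = labels.sum()
pos_weight = (labels.numel() - pos) / pos.clamp(min=1.0)
loss_fn = torch.nn.BCEWithLogitsLoss(pos_weight=pos_weight)
```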
|
|
|
|
|
### Hyperparameters |
|
|
|
|
|
| Parameter | Value | |
|
|
|-----------|-------| |
|
|
| `d_fiber` | 16 | |
|
|
| `d_control` | 64 | |
|
|
| `rep_window` | 32 | |
|
|
| `lr_predictor` | 1e-4 | |
|
|
| `lr_lora` | 2e-5 | |
|
|
| `batch_size` | 4 | |
|
|
| `gradient_accumulation` | 8 | |
|
|
| `optimal_checkpoint` | Step 5000 | |
|
|
|
|
|
### Training Progression |
|
|
|
|
|
| Step | F1 | Risk @ Reps | Risk @ Non-Reps | Separation | |
|
|
|------|-----|-------------|-----------------|------------| |
|
|
| 3000 | 0.96 | 0.946 | 0.076 | 12x | |
|
|
| 4000 | 0.99 | 0.997 | 0.014 | 71x | |
|
|
| **5000** | **0.99+** | **0.998** | **0.008** | **125x** ⭐ | |
|
|
| 6000 | 0.99+ | 0.999 | 0.021 | 48x | |
|
|
|
|
|
*Step 5000 is optimal — further training reduces separation due to overfitting.* |
|
|
|
|
|
--- |
|
|
|
|
|
## 🔬 Research Context |
|
|
|
|
|
### The Journey |
|
|
|
|
|
This system emerged from research into geometric approaches to semantic consistency. The original theory proposed using **fiber bundles and holonomy** to detect inconsistency in transformer representations. |
|
|
|
|
|
**What we tried:** |
|
|
1. ❌ Multiplicative attention gating — destroyed signal |
|
|
2. ❌ Log-space score modification — gates collapsed to uniform |
|
|
3. ❌ Normalized gating — NaN at inference |
|
|
4. ❌ Causal EMA — training/inference mismatch |
|
|
5. ❌ Extended training — complete collapse |
|
|
|
|
|
**What worked:** |
|
|
- ✅ Supervised risk prediction on explicit labels |
|
|
- ✅ Decode-time intervention (no attention modification) |
|
|
- ✅ Adaptive penalty based on predicted risk |
|
|
|
|
|
### What This Is (and Isn't) |
|
|
|
|
|
<table> |
|
|
<tr> |
|
|
<td width="50%"> |
|
|
|
|
|
#### ✅ What It IS |
|
|
- Learned repetition penalty |
|
|
- Decode-time intervention |
|
|
- ~50K parameter predictor |
|
|
- 48% repetition reduction |
|
|
- Proof that hidden states predict degeneration |
|
|
|
|
|
</td> |
|
|
<td width="50%"> |
|
|
|
|
|
#### ❌ What It's NOT |
|
|
- Full Lie Holonomy Transformer |
|
|
- Attention modification |
|
|
- Geometric computation |
|
|
- Validation of fiber bundle theory |
|
|
|
|
|
</td> |
|
|
</tr> |
|
|
</table> |
|
|
|
|
|
--- |
|
|
|
|
|
## 📚 Citation |
|
|
|
|
|
```bibtex |
|
|
@misc{napolitano2026arc, |
|
|
author = {Napolitano, Logan Matthew}, |
|
|
title = {Adaptive Repetition Controller: Learned Decode-Time Intervention |
|
|
for Repetition Suppression}, |
|
|
year = {2026}, |
|
|
publisher = {Hugging Face}, |
|
|
howpublished = {\url{https://huggingface.co/LoganResearch/Adaptive-Repetition-Controller}}, |
|
|
} |
|
|
``` |
|
|
|
|
|
--- |
|
|
|
|
|
## 🔗 Links |
|
|
|
|
|
| Resource | Link | |
|
|
|----------|------| |
|
|
| **Base Model** | [LoganResearch/ARC-Base-8B](https://huggingface.co/LoganResearch/ARC-Base-8B) | |
|
|
| **Source Code** | [GitHub: HolonomyTransformer](https://github.com/Loganwins/HolonomyTransformer) | |
|
|
| **Paper** | *"The Übermensch Who Cannot Loop"* (forthcoming) | |
|
|
| **Author** | [Logan Matthew Napolitano](https://github.com/Loganwins) | |
|
|
|
|
|
--- |
|
|
|
|
|
<div align="center"> |
|
|
|
|
|
**The Übermensch who cannot loop is forced to CREATE.** |
|
|
|
|
|
--- |
|
|
|
|
|
*Built with determination by [Logan Matthew Napolitano](https://github.com/Loganwins)* |
|
|
|
|
|
</div> |