LoganResearch's picture
Update README.md
4b937cf verified
---
license: apache-2.0
language:
- en
library_name: peft
pipeline_tag: text-generation
tags:
- repetition-suppression
- decode-time-intervention
- llama
- lora
- research
- degeneration
- cf-hot
base_model: LoganResearch/ARC-Base-8B
model-index:
- name: Adaptive-Repetition-Controller
results:
- task:
type: text-generation
metrics:
- name: Repetition Reduction
type: custom
value: 48.4%
- name: Risk Separation
type: custom
value: 125x
- name: F1 Score
type: f1
value: 0.99
---
<div align="center">
# ⚡ Adaptive Repetition Controller
### *CF-HoT 125x — Learned Decode-Time Intervention*
[![Separation](https://img.shields.io/badge/Risk_Separation-125x-brightgreen?style=for-the-badge)](.)
[![Reduction](https://img.shields.io/badge/Repetition_Reduction-48.4%25-blue?style=for-the-badge)](.)
[![F1](https://img.shields.io/badge/F1_Score-0.99+-purple?style=for-the-badge)](.)
[![Params](https://img.shields.io/badge/Predictor-50K_params-orange?style=for-the-badge)](.)
*A learned system that predicts and prevents repetitive degeneration in language models.*
[Base Model](https://huggingface.co/LoganResearch/ARC-Base-8B) | [GitHub](https://github.com/Loganwins/HolonomyTransformer) | [Paper (forthcoming)]()
</div>
---
## 🎯 The Problem
Autoregressive language models suffer from **repetitive degeneration** — the tendency to fall into loops, repeat phrases, or get stuck on patterns during long-form generation.
Standard solutions apply **uniform penalties** to repeated tokens. But repetition isn't always bad, and uniform penalties can't distinguish between:
- Natural repetition (articles, pronouns, common words)
- Problematic repetition (loops, stuck patterns, degeneration)
## 💡 The Solution
The **Adaptive Repetition Controller** learns to **predict** when repetition is about to become problematic, then applies **targeted intervention** only when needed.
<div align="center">
```
╔═══════════════════════════════════════════════════════════════╗
║ GENERATION PIPELINE ║
╠═══════════════════════════════════════════════════════════════╣
║ ║
║ Input ──▶ Base Model ──▶ Hidden States (32 layers) ║
║ │ ║
║ ▼ ║
║ ┌─────────────────┐ ║
║ │ Risk Predictor │ ║
║ │ (50K params) │ ║
║ └────────┬────────┘ ║
║ │ ║
║ ▼ ║
║ risk = 0.95 (HIGH) ║
║ │ ║
║ ▼ ║
║ logits[recent_tokens] -= penalty ║
║ │ ║
║ ▼ ║
║ Sample next token ║
║ ║
╚═══════════════════════════════════════════════════════════════╝
```
</div>
---
## 📊 Results
### Risk Prediction Performance
The system achieves **125x separation** between tokens that will repeat and those that won't:
| Metric | Value |
|--------|-------|
| **F1 Score** | 0.99+ |
| **Risk @ Repeating Tokens** | 0.998 |
| **Risk @ Non-Repeating Tokens** | 0.008 |
| **Separation Factor** | **125x** |
### Generation Quality
| Metric | Baseline | With CF-HoT | Change |
|--------|----------|-------------|--------|
| Repetition Rate | 33.9% | 17.5% | **↓ 48.4%** |
| Distinct-2 (diversity) | 0.836 | 0.976 | **↑ 16.7%** |
### Comparison to Standard Methods
| Method | Adaptive | Learned | Repetition Reduction |
|--------|----------|---------|---------------------|
| HuggingFace `repetition_penalty` | ❌ | ❌ | ~20-30% |
| OpenAI `frequency_penalty` | ❌ | ❌ | ~25-35% |
| Contrastive Decoding | ❌ | ❌ | ~30-40% |
| **CF-HoT (this)** | ✅ | ✅ | **48.4%** |
---
## 🏗️ Architecture
The risk predictor is remarkably small — only **~50,000 parameters** (0.0006% of the base model):
```python
RiskPredictor(
# Extract features from each transformer layer
fiber_projs = ModuleList([
Linear(4096 → 16) for _ in range(32) # 32 layers
]),
# Learn which layers matter most
layer_weights = Parameter(shape=[32]), # Softmax-normalized
# Predict repetition risk
predictor = Sequential(
Linear(16 → 64),
GELU(),
Linear(64 → 64),
GELU(),
Linear(64 → 1), # Risk logit
)
)
```
### Why It Works
1. **Hidden states contain predictive signal** — The model "knows" it's about to repeat before it happens
2. **Different layers encode different information** — Learned aggregation finds the most predictive layers
3. **Decode-time intervention preserves base model** — No modification to attention patterns or learned representations
---
## 🚀 Quick Start
### Installation
```bash
pip install transformers peft accelerate torch
```
### Loading the Models
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
# Load base model
base_model = AutoModelForCausalLM.from_pretrained(
"LoganResearch/ARC-Base-8B",
torch_dtype=torch.bfloat16,
device_map="auto"
)
# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained("LoganResearch/ARC-Base-8B")
# Load CF-HoT adapter
model = PeftModel.from_pretrained(
base_model,
"LoganResearch/Adaptive-Repetition-Controller"
)
# Load risk predictor
risk_predictor = torch.load(
hf_hub_download("LoganResearch/Adaptive-Repetition-Controller", "risk_predictor.pt")
)
```
### Generation with CF-HoT Intervention
```python
def generate_with_cfhot(
prompt: str,
max_tokens: int = 512,
penalty_scale: float = 3.0,
threshold: float = 0.1,
temperature: float = 0.8,
rep_window: int = 32,
):
"""Generate text with adaptive repetition suppression."""
input_ids = tokenizer.encode(prompt, return_tensors="pt").to(model.device)
for _ in range(max_tokens):
with torch.no_grad():
# Forward pass with hidden states
outputs = model(input_ids, output_hidden_states=True)
logits = outputs.logits[:, -1, :]
hidden_states = outputs.hidden_states
# Predict repetition risk
risk = risk_predictor(hidden_states).sigmoid().item()
# Apply adaptive penalty if risk is high
if risk > threshold:
recent_tokens = input_ids[0, -rep_window:].tolist()
penalty = risk * penalty_scale
for token_id in set(recent_tokens):
logits[0, token_id] -= penalty
# Sample next token
probs = torch.softmax(logits / temperature, dim=-1)
next_token = torch.multinomial(probs, num_samples=1)
# Append and check for EOS
input_ids = torch.cat([input_ids, next_token], dim=-1)
if next_token.item() == tokenizer.eos_token_id:
break
return tokenizer.decode(input_ids[0], skip_special_tokens=True)
# Example usage
response = generate_with_cfhot(
"Write a detailed essay on the nature of consciousness:",
max_tokens=1000,
penalty_scale=4.0,
)
print(response)
```
---
## 📁 Files
| File | Size | Description |
|------|------|-------------|
| `risk_predictor.pt` | 8.4 MB | Trained risk prediction network |
| `adapter_model.safetensors` | 218 MB | LoRA adapter weights |
| `adapter_config.json` | 1 KB | PEFT adapter configuration |
---
## ⚙️ Training Details
### Dataset & Objective
- **Dataset:** WikiText-2
- **Task:** Binary classification — "Will this token appear in the next 32 tokens?"
- **Loss:** BCEWithLogitsLoss with dynamic class balancing
### Hyperparameters
| Parameter | Value |
|-----------|-------|
| `d_fiber` | 16 |
| `d_control` | 64 |
| `rep_window` | 32 |
| `lr_predictor` | 1e-4 |
| `lr_lora` | 2e-5 |
| `batch_size` | 4 |
| `gradient_accumulation` | 8 |
| `optimal_checkpoint` | Step 5000 |
### Training Progression
| Step | F1 | Risk @ Reps | Risk @ Non-Reps | Separation |
|------|-----|-------------|-----------------|------------|
| 3000 | 0.96 | 0.946 | 0.076 | 12x |
| 4000 | 0.99 | 0.997 | 0.014 | 71x |
| **5000** | **0.99+** | **0.998** | **0.008** | **125x** ⭐ |
| 6000 | 0.99+ | 0.999 | 0.021 | 48x |
*Step 5000 is optimal — further training reduces separation due to overfitting.*
---
## 🔬 Research Context
### The Journey
This system emerged from research into geometric approaches to semantic consistency. The original theory proposed using **fiber bundles and holonomy** to detect inconsistency in transformer representations.
**What we tried:**
1. ❌ Multiplicative attention gating — destroyed signal
2. ❌ Log-space score modification — gates collapsed to uniform
3. ❌ Normalized gating — NaN at inference
4. ❌ Causal EMA — training/inference mismatch
5. ❌ Extended training — complete collapse
**What worked:**
- ✅ Supervised risk prediction on explicit labels
- ✅ Decode-time intervention (no attention modification)
- ✅ Adaptive penalty based on predicted risk
### What This Is (and Isn't)
<table>
<tr>
<td width="50%">
#### ✅ What It IS
- Learned repetition penalty
- Decode-time intervention
- ~50K parameter predictor
- 48% repetition reduction
- Proof that hidden states predict degeneration
</td>
<td width="50%">
#### ❌ What It's NOT
- Full Lie Holonomy Transformer
- Attention modification
- Geometric computation
- Validation of fiber bundle theory
</td>
</tr>
</table>
---
## 📚 Citation
```bibtex
@misc{napolitano2026arc,
author = {Napolitano, Logan Matthew},
title = {Adaptive Repetition Controller: Learned Decode-Time Intervention
for Repetition Suppression},
year = {2026},
publisher = {Hugging Face},
howpublished = {\url{https://huggingface.co/LoganResearch/Adaptive-Repetition-Controller}},
}
```
---
## 🔗 Links
| Resource | Link |
|----------|------|
| **Base Model** | [LoganResearch/ARC-Base-8B](https://huggingface.co/LoganResearch/ARC-Base-8B) |
| **Source Code** | [GitHub: HolonomyTransformer](https://github.com/Loganwins/HolonomyTransformer) |
| **Paper** | *"The Übermensch Who Cannot Loop"* (forthcoming) |
| **Author** | [Logan Matthew Napolitano](https://github.com/Loganwins) |
---
<div align="center">
**The Übermensch who cannot loop is forced to CREATE.**
---
*Built with determination by [Logan Matthew Napolitano](https://github.com/Loganwins)*
</div>