---
license: apache-2.0
language:
- en
library_name: peft
pipeline_tag: text-generation
tags:
- repetition-suppression
- decode-time-intervention
- llama
- lora
- research
- degeneration
- cf-hot
base_model: LoganResearch/ARC-Base-8B
model-index:
- name: Adaptive-Repetition-Controller
  results:
  - task:
      type: text-generation
    metrics:
    - name: Repetition Reduction
      type: custom
      value: 48.4%
    - name: Risk Separation
      type: custom
      value: 125x
    - name: F1 Score
      type: f1
      value: 0.99
---
# ⚡ Adaptive Repetition Controller
### *CF-HoT 125x: Learned Decode-Time Intervention*
*A learned system that predicts and prevents repetitive degeneration in language models.*
[Base Model](https://huggingface.co/LoganResearch/ARC-Base-8B) | [GitHub](https://github.com/Loganwins/HolonomyTransformer) | Paper (forthcoming)
---
## 🎯 The Problem
Autoregressive language models suffer from **repetitive degeneration**: the tendency to fall into loops, repeat phrases, or get stuck on patterns during long-form generation.
Standard solutions apply **uniform penalties** to repeated tokens. But repetition isn't always bad, and uniform penalties can't distinguish between:
- Natural repetition (articles, pronouns, common words)
- Problematic repetition (loops, stuck patterns, degeneration)
## 💡 The Solution
The **Adaptive Repetition Controller** learns to **predict** when repetition is about to become problematic, then applies **targeted intervention** only when needed.
```
╔═══════════════════════════════════════════════════════════════╗
║                      GENERATION PIPELINE                      ║
╠═══════════════════════════════════════════════════════════════╣
║                                                               ║
║  Input ──▶ Base Model ──▶ Hidden States (32 layers)           ║
║                                       │                       ║
║                                       ▼                       ║
║                              ┌─────────────────┐              ║
║                              │ Risk Predictor  │              ║
║                              │  (50K params)   │              ║
║                              └────────┬────────┘              ║
║                                       │                       ║
║                                       ▼                       ║
║                              risk = 0.95 (HIGH)               ║
║                                       │                       ║
║                                       ▼                       ║
║                       logits[recent_tokens] -= penalty        ║
║                                       │                       ║
║                                       ▼                       ║
║                               Sample next token               ║
║                                                               ║
╚═══════════════════════════════════════════════════════════════╝
```
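The gating step in the pipeline can be illustrated on a toy logit vector without loading any model. This is a minimal sketch in plain Python: the function name `apply_adaptive_penalty` and the toy values are illustrative assumptions, while the default `penalty_scale` and `threshold` mirror the Quick Start example further down.

```python
def apply_adaptive_penalty(logits, recent_tokens, risk,
                           penalty_scale=3.0, threshold=0.1):
    """Return a copy of `logits` with recent tokens penalized when risk is high."""
    adjusted = list(logits)
    if risk > threshold:
        # The penalty grows with the predicted risk instead of being uniform
        penalty = risk * penalty_scale
        for token_id in set(recent_tokens):
            adjusted[token_id] -= penalty
    return adjusted


logits = [2.0, 1.0, 0.5, 0.0]   # toy vocabulary of four tokens
recent = [0, 0, 2]              # token ids seen in the recent window

low = apply_adaptive_penalty(logits, recent, risk=0.05)   # below threshold
high = apply_adaptive_penalty(logits, recent, risk=0.95)  # above threshold

print("low risk :", low)
print("high risk:", high)
```

With low risk the logits pass through unchanged; with high risk only the tokens in the recent window are pushed down, and each is penalized once no matter how often it occurred.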
---
## Results
### Risk Prediction Performance
The system achieves **125x separation** between tokens that will repeat and those that won't:
| Metric | Value |
|--------|-------|
| **F1 Score** | 0.99+ |
| **Risk @ Repeating Tokens** | 0.998 |
| **Risk @ Non-Repeating Tokens** | 0.008 |
| **Separation Factor** | **125x** |
### Generation Quality
| Metric | Baseline | With CF-HoT | Change |
|--------|----------|-------------|--------|
| Repetition Rate | 33.9% | 17.5% | **↓ 48.4%** |
| Distinct-2 (diversity) | 0.836 | 0.976 | **↑ 16.7%** |
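For readers who want to compute comparable numbers on their own outputs, here is a small sketch of the two metrics and of the relative change reported in the last column. Distinct-2 is the usual ratio of unique bigrams to total bigrams. The card does not define "Repetition Rate", so the `repetition_rate` function below (fraction of tokens already seen in the preceding 32 tokens) is an assumption and may differ from the author's measurement.

```python
def distinct_n(tokens, n=2):
    """Unique n-grams divided by total n-grams."""
    ngrams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    return len(set(ngrams)) / len(ngrams) if ngrams else 0.0


def repetition_rate(tokens, window=32):
    """Assumed definition: share of tokens that occurred in the previous `window` tokens."""
    if not tokens:
        return 0.0
    repeats = sum(
        1 for i, tok in enumerate(tokens) if tok in tokens[max(0, i - window):i]
    )
    return repeats / len(tokens)


def relative_change(baseline, new):
    """Signed change relative to the baseline value."""
    return (new - baseline) / baseline


tokens = "the cat sat on the mat the cat sat".split()
print("distinct-2:", distinct_n(tokens))
print("repetition rate:", repetition_rate(tokens))

# Relative changes for the values reported in the table above
print(f"repetition: {relative_change(33.9, 17.5):+.1%}")
print(f"distinct-2: {relative_change(0.836, 0.976):+.1%}")
```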
### Comparison to Standard Methods
| Method | Adaptive | Learned | Repetition Reduction |
|--------|----------|---------|---------------------|
| HuggingFace `repetition_penalty` | ❌ | ❌ | ~20-30% |
| OpenAI `frequency_penalty` | ❌ | ❌ | ~25-35% |
| Contrastive Decoding | ❌ | ❌ | ~30-40% |
| **CF-HoT (this)** | ✅ | ✅ | **48.4%** |
---
## 🏗️ Architecture
The risk predictor is remarkably small, at only **~50,000 parameters** (0.0006% of the base model):
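```python
import torch
import torch.nn as nn


class RiskPredictor(nn.Module):
    """Layer structure only; the forward pass is not shown in this card."""

    def __init__(self, d_model=4096, n_layers=32, d_fiber=16, d_control=64):
        super().__init__()
        # Extract features from each transformer layer
        self.fiber_projs = nn.ModuleList(
            [nn.Linear(d_model, d_fiber) for _ in range(n_layers)]  # 32 layers
        )
        # Learn which layers matter most (softmax-normalized when applied);
        # the zero initialization here is an assumption
        self.layer_weights = nn.Parameter(torch.zeros(n_layers))
        # Predict repetition risk
        self.predictor = nn.Sequential(
            nn.Linear(d_fiber, d_control),
            nn.GELU(),
            nn.Linear(d_control, d_control),
            nn.GELU(),
            nn.Linear(d_control, 1),  # Risk logit
        )
```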
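As a sanity check, the parameter count of the layers listed above can be tallied by hand. The sketch below assumes every `Linear` layer carries a bias term and that weights are stored as float32; neither is stated in the card, and `linear_params` is a helper introduced here for illustration.

```python
def linear_params(in_features, out_features, bias=True):
    """Parameter count of a dense layer (weights plus optional bias)."""
    return in_features * out_features + (out_features if bias else 0)


D_MODEL, N_LAYERS, D_FIBER, D_CONTROL = 4096, 32, 16, 64

fiber = N_LAYERS * linear_params(D_MODEL, D_FIBER)   # per-layer projections
layer_weights = N_LAYERS                             # one scalar per layer
head = (
    linear_params(D_FIBER, D_CONTROL)
    + linear_params(D_CONTROL, D_CONTROL)
    + linear_params(D_CONTROL, 1)
)
total = fiber + layer_weights + head

print(f"fiber projections: {fiber:,}")
print(f"layer weights:     {layer_weights:,}")
print(f"predictor head:    {head:,}")
print(f"total:             {total:,}")
print(f"float32 size:      {total * 4 / 1e6:.1f} MB")
```

Under these assumptions the listed layers sum to about 2.1M parameters, almost all of them in the 32 projection layers, and the float32 size is consistent with the 8.4 MB `risk_predictor.pt` listed under Files. The ~50K figure quoted above may therefore count only part of the network or refer to a different configuration; the card does not say.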
### Why It Works
1. **Hidden states contain predictive signal**: the model "knows" it is about to repeat before it happens
2. **Different layers encode different information**: learned aggregation finds the most predictive layers
3. **Decode-time intervention preserves the base model**: no modification to attention patterns or learned representations
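The learned aggregation in point 2 can be pictured as a softmax-weighted sum of per-layer feature vectors. The sketch below uses plain Python lists and toy dimensions; `aggregate_layers` and the example values are illustrative assumptions, not the author's implementation.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def aggregate_layers(layer_features, layer_logits):
    """Weighted sum of per-layer feature vectors using softmax(layer_logits)."""
    weights = softmax(layer_logits)
    dim = len(layer_features[0])
    return [
        sum(w * feats[d] for w, feats in zip(weights, layer_features))
        for d in range(dim)
    ]


# Toy example: 3 layers, 2-dimensional features per layer
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

uniform = aggregate_layers(features, [0.0, 0.0, 0.0])   # all layers equal
peaked = aggregate_layers(features, [0.0, 0.0, 10.0])   # last layer dominates

print("uniform weights:", uniform)
print("peaked weights: ", peaked)
```

When one layer weight is much larger than the rest, the aggregate is close to that layer's features, which is how training can single out the most predictive layers.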
---
## Quick Start
### Installation
```bash
pip install transformers peft accelerate torch
```
### Loading the Models
```python
import torch
from huggingface_hub import hf_hub_download
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Load base model
base_model = AutoModelForCausalLM.from_pretrained(
    "LoganResearch/ARC-Base-8B",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained("LoganResearch/ARC-Base-8B")

# Load CF-HoT adapter
model = PeftModel.from_pretrained(
    base_model,
    "LoganResearch/Adaptive-Repetition-Controller",
)

# Load risk predictor (download from the Hub, then deserialize).
# If the file holds a pickled module rather than a state dict,
# PyTorch >= 2.6 also needs weights_only=False here.
risk_predictor = torch.load(
    hf_hub_download("LoganResearch/Adaptive-Repetition-Controller", "risk_predictor.pt"),
    map_location=model.device,
)
```
### Generation with CF-HoT Intervention
```python
def generate_with_cfhot(
    prompt: str,
    max_tokens: int = 512,
    penalty_scale: float = 3.0,
    threshold: float = 0.1,
    temperature: float = 0.8,
    rep_window: int = 32,
):
    """Generate text with adaptive repetition suppression."""
    input_ids = tokenizer.encode(prompt, return_tensors="pt").to(model.device)

    for _ in range(max_tokens):
        with torch.no_grad():
            # Forward pass with hidden states
            outputs = model(input_ids, output_hidden_states=True)
            # Work in float32 for the penalty and sampling steps
            logits = outputs.logits[:, -1, :].float()
            hidden_states = outputs.hidden_states

            # Predict repetition risk
            risk = risk_predictor(hidden_states).sigmoid().item()

        # Apply adaptive penalty if risk is high
        if risk > threshold:
            recent_tokens = input_ids[0, -rep_window:].tolist()
            penalty = risk * penalty_scale
            for token_id in set(recent_tokens):
                logits[0, token_id] -= penalty

        # Sample next token
        probs = torch.softmax(logits / temperature, dim=-1)
        next_token = torch.multinomial(probs, num_samples=1)

        # Append and check for EOS
        input_ids = torch.cat([input_ids, next_token], dim=-1)
        if next_token.item() == tokenizer.eos_token_id:
            break

    return tokenizer.decode(input_ids[0], skip_special_tokens=True)


# Example usage
response = generate_with_cfhot(
    "Write a detailed essay on the nature of consciousness:",
    max_tokens=1000,
    penalty_scale=4.0,
)
print(response)
```
---
## Files
| File | Size | Description |
|------|------|-------------|
| `risk_predictor.pt` | 8.4 MB | Trained risk prediction network |
| `adapter_model.safetensors` | 218 MB | LoRA adapter weights |
| `adapter_config.json` | 1 KB | PEFT adapter configuration |
---
## ⚙️ Training Details
### Dataset & Objective
- **Dataset:** WikiText-2
- **Task:** Binary classification: "Will this token appear in the next 32 tokens?"
- **Loss:** BCEWithLogitsLoss with dynamic class balancing
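The labeling rule and the class balancing can be sketched in a few lines. Two assumptions are made here: "this token" is read as the token at the current position, and "dynamic class balancing" is read as a negative-to-positive ratio recomputed per batch, which could be passed as `pos_weight` to `torch.nn.BCEWithLogitsLoss`. The helper names are hypothetical.

```python
def future_repeat_labels(tokens, window=32):
    """labels[i] is 1 if tokens[i] occurs again within the next `window` tokens."""
    return [
        1 if tokens[i] in tokens[i + 1:i + 1 + window] else 0
        for i in range(len(tokens))
    ]


def positive_class_weight(labels):
    """Ratio of negatives to positives, used to up-weight the rarer class."""
    pos = sum(labels)
    neg = len(labels) - pos
    return neg / pos if pos else 1.0


tokens = [5, 7, 5, 9, 7, 3]          # toy token ids

short = future_repeat_labels(tokens, window=2)
full = future_repeat_labels(tokens, window=32)

print("window=2 :", short, "pos_weight =", positive_class_weight(short))
print("window=32:", full, "pos_weight =", positive_class_weight(full))
```

A wider window turns more positions into positives, so the balancing weight changes with both the window and the batch content.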
### Hyperparameters
| Parameter | Value |
|-----------|-------|
| `d_fiber` | 16 |
| `d_control` | 64 |
| `rep_window` | 32 |
| `lr_predictor` | 1e-4 |
| `lr_lora` | 2e-5 |
| `batch_size` | 4 |
| `gradient_accumulation` | 8 |
| `optimal_checkpoint` | Step 5000 |
### Training Progression
| Step | F1 | Risk @ Reps | Risk @ Non-Reps | Separation |
|------|-----|-------------|-----------------|------------|
| 3000 | 0.96 | 0.946 | 0.076 | 12x |
| 4000 | 0.99 | 0.997 | 0.014 | 71x |
| **5000** | **0.99+** | **0.998** | **0.008** | **125x** |
| 6000 | 0.99+ | 0.999 | 0.021 | 48x |
*Step 5000 is optimal; further training reduces separation due to overfitting.*
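The separation column is the ratio of the mean risk on repeating tokens to the mean risk on non-repeating tokens. This reading is inferred from the table values, and the short sketch below reproduces the column and the checkpoint choice from them.

```python
# (risk at repeating tokens, risk at non-repeating tokens) per training step,
# copied from the table above
checkpoints = {
    3000: (0.946, 0.076),
    4000: (0.997, 0.014),
    5000: (0.998, 0.008),
    6000: (0.999, 0.021),
}


def separation(risk_reps, risk_non_reps):
    """How many times higher the risk is on repeating tokens."""
    return risk_reps / risk_non_reps


ratios = {step: separation(*risks) for step, risks in checkpoints.items()}
best_step = max(ratios, key=ratios.get)

for step, ratio in ratios.items():
    print(f"step {step}: {ratio:.0f}x")
print("best checkpoint:", best_step)
```

The risk on repeating tokens barely moves between steps 4000 and 6000, so the ratio is driven almost entirely by the risk assigned to non-repeating tokens.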
---
## 🔬 Research Context
### The Journey
This system emerged from research into geometric approaches to semantic consistency. The original theory proposed using **fiber bundles and holonomy** to detect inconsistency in transformer representations.
**What we tried:**
1. ❌ Multiplicative attention gating: destroyed signal
2. ❌ Log-space score modification: gates collapsed to uniform
3. ❌ Normalized gating: NaN at inference
4. ❌ Causal EMA: training/inference mismatch
5. ❌ Extended training: complete collapse
**What worked:**
- ✅ Supervised risk prediction on explicit labels
- ✅ Decode-time intervention (no attention modification)
- ✅ Adaptive penalty based on predicted risk
### What This Is (and Isn't)
#### ✅ What It IS
- Learned repetition penalty
- Decode-time intervention
- ~50K parameter predictor
- 48% repetition reduction
- Proof that hidden states predict degeneration

#### ❌ What It's NOT
- Full Lie Holonomy Transformer
- Attention modification
- Geometric computation
- Validation of fiber bundle theory
---
## Citation
```bibtex
@misc{napolitano2026arc,
  author = {Napolitano, Logan Matthew},
  title = {Adaptive Repetition Controller: Learned Decode-Time Intervention
           for Repetition Suppression},
  year = {2026},
  publisher = {Hugging Face},
  howpublished = {\url{https://huggingface.co/LoganResearch/Adaptive-Repetition-Controller}},
}
```
---
## Links
| Resource | Link |
|----------|------|
| **Base Model** | [LoganResearch/ARC-Base-8B](https://huggingface.co/LoganResearch/ARC-Base-8B) |
| **Source Code** | [GitHub: HolonomyTransformer](https://github.com/Loganwins/HolonomyTransformer) |
| **Paper** | *"The Übermensch Who Cannot Loop"* (forthcoming) |
| **Author** | [Logan Matthew Napolitano](https://github.com/Loganwins) |
---
**The Übermensch who cannot loop is forced to CREATE.**
---
*Built with determination by [Logan Matthew Napolitano](https://github.com/Loganwins)*