---
license: apache-2.0
language:
- en
library_name: peft
pipeline_tag: text-generation
tags:
- repetition-suppression
- decode-time-intervention
- llama
- lora
- research
- degeneration
- cf-hot
base_model: LoganResearch/ARC-Base-8B
model-index:
- name: Adaptive-Repetition-Controller
  results:
  - task:
      type: text-generation
    metrics:
    - name: Repetition Reduction
      type: custom
      value: 48.4%
    - name: Risk Separation
      type: custom
      value: 125x
    - name: F1 Score
      type: f1
      value: 0.99
---

<div align="center">

# ⚡ Adaptive Repetition Controller

### *CF-HoT 125x: Learned Decode-Time Intervention*

[![Separation](https://img.shields.io/badge/Risk_Separation-125x-brightgreen?style=for-the-badge)](.)
[![Reduction](https://img.shields.io/badge/Repetition_Reduction-48.4%25-blue?style=for-the-badge)](.)
[![F1](https://img.shields.io/badge/F1_Score-0.99+-purple?style=for-the-badge)](.)
[![Params](https://img.shields.io/badge/Predictor-50K_params-orange?style=for-the-badge)](.)

*A learned system that predicts and prevents repetitive degeneration in language models.*

[Base Model](https://huggingface.co/LoganResearch/ARC-Base-8B) | [GitHub](https://github.com/Loganwins/HolonomyTransformer) | [Paper (forthcoming)]()

</div>

---

## 🎯 The Problem

Autoregressive language models suffer from **repetitive degeneration**: the tendency to fall into loops, repeat phrases, or get stuck on patterns during long-form generation.

Standard solutions apply **uniform penalties** to repeated tokens. But repetition isn't always bad, and uniform penalties can't distinguish between:
- Natural repetition (articles, pronouns, common words)
- Problematic repetition (loops, stuck patterns, degeneration)
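
For comparison, here is how a fixed, uniform penalty is typically applied with the standard `transformers` generation API (a minimal sketch; the model name and prompt are placeholders, not part of this project):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder model; any causal LM behaves the same way here.
tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

inputs = tokenizer("Write a detailed essay on the nature of consciousness:", return_tensors="pt")

# One fixed penalty (> 1.0) is applied to every previously seen token,
# whether repeating it would be natural or a sign of degeneration.
outputs = model.generate(
    **inputs,
    max_new_tokens=100,
    do_sample=True,
    repetition_penalty=1.2,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```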

## 💡 The Solution

The **Adaptive Repetition Controller** learns to **predict** when repetition is about to become problematic, then applies **targeted intervention** only when needed.

<div align="center">

```
                  GENERATION PIPELINE
                  ═══════════════════

Input  ──▶  Base Model  ──▶  Hidden States (32 layers)
                                   │
                                   ▼
                          ┌─────────────────┐
                          │ Risk Predictor  │
                          │   (50K params)  │
                          └────────┬────────┘
                                   │
                                   ▼
                          risk = 0.95 (HIGH)
                                   │
                                   ▼
                   logits[recent_tokens] -= penalty
                                   │
                                   ▼
                          Sample next token
```

</div>

---

## 📊 Results

### Risk Prediction Performance

The system achieves **125x separation** between tokens that will repeat and those that won't:

| Metric | Value |
|--------|-------|
| **F1 Score** | 0.99+ |
| **Risk @ Repeating Tokens** | 0.998 |
| **Risk @ Non-Repeating Tokens** | 0.008 |
| **Separation Factor** | **125x** |
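
Here the separation factor is the ratio of the mean predicted risk at repeating tokens to the mean predicted risk at non-repeating tokens: 0.998 / 0.008 ≈ 125, which is where the headline 125x comes from.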

### Generation Quality

| Metric | Baseline | With CF-HoT | Change |
|--------|----------|-------------|--------|
| Repetition Rate | 33.9% | 17.5% | **↓ 48.4%** |
| Distinct-2 (diversity) | 0.836 | 0.976 | **↑ 16.7%** |
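
For reference, both metrics can be computed from token lists along these lines (a minimal sketch; the exact tokenization and evaluation script are not included here, so the precise definitions are assumptions):

```python
def distinct_2(tokens: list[str]) -> float:
    """Fraction of unique bigrams; higher means more diverse output."""
    bigrams = list(zip(tokens, tokens[1:]))
    return len(set(bigrams)) / max(len(bigrams), 1)

def repetition_rate(tokens: list[str], window: int = 32) -> float:
    """Fraction of tokens that already appeared in the previous `window` tokens."""
    repeats = sum(1 for i, tok in enumerate(tokens) if tok in tokens[max(0, i - window):i])
    return repeats / max(len(tokens), 1)

sample = "the cat sat on the mat the cat sat on the mat".split()
print(f"distinct-2: {distinct_2(sample):.3f}, repetition rate: {repetition_rate(sample):.3f}")
```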

### Comparison to Standard Methods

| Method | Adaptive | Learned | Repetition Reduction |
|--------|----------|---------|---------------------|
| HuggingFace `repetition_penalty` | ❌ | ❌ | ~20-30% |
| OpenAI `frequency_penalty` | ❌ | ❌ | ~25-35% |
| Contrastive Decoding | ❌ | ❌ | ~30-40% |
| **CF-HoT (this)** | ✅ | ✅ | **48.4%** |

---

## 🏗️ Architecture

The risk predictor is remarkably small, only **~50,000 parameters** (0.0006% of the base model):

```python
import torch
import torch.nn as nn

class RiskPredictor(nn.Module):
    def __init__(self, d_model: int = 4096, n_layers: int = 32,
                 d_fiber: int = 16, d_control: int = 64):
        super().__init__()
        # Extract a small feature vector from each transformer layer
        self.fiber_projs = nn.ModuleList(
            [nn.Linear(d_model, d_fiber) for _ in range(n_layers)]  # 32 layers
        )
        # Learn which layers matter most (softmax-normalized when used)
        self.layer_weights = nn.Parameter(torch.zeros(n_layers))  # init is illustrative
        # Predict repetition risk from the aggregated features
        self.predictor = nn.Sequential(
            nn.Linear(d_fiber, d_control),
            nn.GELU(),
            nn.Linear(d_control, d_control),
            nn.GELU(),
            nn.Linear(d_control, 1),  # risk logit
        )
```
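
The card does not spell out the forward pass. One plausible way to wire the pieces together is a `forward` method added to the `RiskPredictor` above, shown purely as an illustrative sketch (using the last token's hidden state and a softmax-weighted layer mix are assumptions, chosen to be consistent with how `risk_predictor(hidden_states)` is called in the generation code below):

```python
    def forward(self, hidden_states: tuple[torch.Tensor, ...]) -> torch.Tensor:
        # hidden_states: per-layer tensors of shape (batch, seq, d_model),
        # as returned by the base model with output_hidden_states=True.
        layers = hidden_states[-len(self.fiber_projs):]           # keep the 32 block outputs
        feats = torch.stack(
            [proj(h[:, -1, :]) for proj, h in zip(self.fiber_projs, layers)],
            dim=1,
        )                                                         # (batch, n_layers, d_fiber)
        weights = torch.softmax(self.layer_weights, dim=0)        # learned layer mix
        pooled = (weights[None, :, None] * feats).sum(dim=1)      # (batch, d_fiber)
        return self.predictor(pooled).squeeze(-1)                 # risk logit
```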

### Why It Works

1. **Hidden states contain predictive signal:** the model "knows" it is about to repeat before it happens
2. **Different layers encode different information:** learned aggregation finds the most predictive layers
3. **Decode-time intervention preserves the base model:** no modification to attention patterns or learned representations

---

## 🚀 Quick Start

### Installation

```bash
pip install transformers peft accelerate torch huggingface_hub
```

### Loading the Models

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
from huggingface_hub import hf_hub_download

# Load base model
base_model = AutoModelForCausalLM.from_pretrained(
    "LoganResearch/ARC-Base-8B",
    torch_dtype=torch.bfloat16,
    device_map="auto"
)

# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained("LoganResearch/ARC-Base-8B")

# Load CF-HoT adapter
model = PeftModel.from_pretrained(
    base_model,
    "LoganResearch/Adaptive-Repetition-Controller"
)

# Load risk predictor
risk_predictor = torch.load(
    hf_hub_download("LoganResearch/Adaptive-Repetition-Controller", "risk_predictor.pt")
)
```

### Generation with CF-HoT Intervention

```python
def generate_with_cfhot(
    prompt: str,
    max_tokens: int = 512,
    penalty_scale: float = 3.0,
    threshold: float = 0.1,
    temperature: float = 0.8,
    rep_window: int = 32,
):
    """Generate text with adaptive repetition suppression."""
    
    input_ids = tokenizer.encode(prompt, return_tensors="pt").to(model.device)
    
    for _ in range(max_tokens):
        with torch.no_grad():
            # Forward pass with hidden states
            outputs = model(input_ids, output_hidden_states=True)
            logits = outputs.logits[:, -1, :]
            hidden_states = outputs.hidden_states
            
            # Predict repetition risk
            risk = risk_predictor(hidden_states).sigmoid().item()
            
            # Apply adaptive penalty if risk is high
            if risk > threshold:
                recent_tokens = input_ids[0, -rep_window:].tolist()
                penalty = risk * penalty_scale
                for token_id in set(recent_tokens):
                    logits[0, token_id] -= penalty
            
            # Sample next token
            probs = torch.softmax(logits / temperature, dim=-1)
            next_token = torch.multinomial(probs, num_samples=1)
            
            # Append and check for EOS
            input_ids = torch.cat([input_ids, next_token], dim=-1)
            if next_token.item() == tokenizer.eos_token_id:
                break
    
    return tokenizer.decode(input_ids[0], skip_special_tokens=True)

# Example usage
response = generate_with_cfhot(
    "Write a detailed essay on the nature of consciousness:",
    max_tokens=1000,
    penalty_scale=4.0,
)
print(response)
```

---

## 📁 Files

| File | Size | Description |
|------|------|-------------|
| `risk_predictor.pt` | 8.4 MB | Trained risk prediction network |
| `adapter_model.safetensors` | 218 MB | LoRA adapter weights |
| `adapter_config.json` | 1 KB | PEFT adapter configuration |

---

## ⚙️ Training Details

### Dataset & Objective

- **Dataset:** WikiText-2
- **Task:** Binary classification ("Will this token appear again within the next 32 tokens?")
- **Loss:** `BCEWithLogitsLoss` with dynamic class balancing (a label-construction sketch follows below)
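
A minimal sketch of how such labels and the balanced loss could be constructed (the exact labeling and balancing scheme used in training is not documented here, so treat the details as assumptions):

```python
import torch
import torch.nn as nn

def repetition_labels(input_ids: torch.Tensor, window: int = 32) -> torch.Tensor:
    """label[t] = 1 if token t appears again within the next `window` tokens."""
    seq_len = input_ids.shape[0]
    labels = torch.zeros(seq_len)
    for t in range(seq_len):
        future = input_ids[t + 1 : t + 1 + window]
        labels[t] = float((future == input_ids[t]).any()) if len(future) else 0.0
    return labels

# Dynamic class balancing: up-weight the (rarer) positive class per batch.
labels = repetition_labels(torch.tensor([5, 7, 5, 9, 9, 3]))
pos_weight = (labels.numel() - labels.sum()) / labels.sum().clamp(min=1.0)
criterion = nn.BCEWithLogitsLoss(pos_weight=pos_weight)
```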

### Hyperparameters

| Parameter | Value |
|-----------|-------|
| `d_fiber` | 16 |
| `d_control` | 64 |
| `rep_window` | 32 |
| `lr_predictor` | 1e-4 |
| `lr_lora` | 2e-5 |
| `batch_size` | 4 |
| `gradient_accumulation` | 8 |
| `optimal_checkpoint` | Step 5000 |
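
*(With `batch_size` 4 and `gradient_accumulation` 8, the effective batch size is 32 sequences per optimizer step.)*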

### Training Progression

| Step | F1 | Risk @ Reps | Risk @ Non-Reps | Separation |
|------|-----|-------------|-----------------|------------|
| 3000 | 0.96 | 0.946 | 0.076 | 12x |
| 4000 | 0.99 | 0.997 | 0.014 | 71x |
| **5000** | **0.99+** | **0.998** | **0.008** | **125x** ⭐ |
| 6000 | 0.99+ | 0.999 | 0.021 | 48x |

*Step 5000 is the optimal checkpoint; further training reduces separation due to overfitting.*

---

## 🔬 Research Context

### The Journey

This system emerged from research into geometric approaches to semantic consistency. The original theory proposed using **fiber bundles and holonomy** to detect inconsistency in transformer representations.

**What we tried:**
1. ❌ Multiplicative attention gating: destroyed signal
2. ❌ Log-space score modification: gates collapsed to uniform
3. ❌ Normalized gating: NaN at inference
4. ❌ Causal EMA: training/inference mismatch
5. ❌ Extended training: complete collapse

**What worked:**
- ✅ Supervised risk prediction on explicit labels
- ✅ Decode-time intervention (no attention modification)
- ✅ Adaptive penalty based on predicted risk

### What This Is (and Isn't)

<table>
<tr>
<td width="50%">

#### ✅ What It IS
- Learned repetition penalty
- Decode-time intervention
- ~50K parameter predictor
- 48% repetition reduction
- Proof that hidden states predict degeneration

</td>
<td width="50%">

#### ❌ What It's NOT
- Full Lie Holonomy Transformer
- Attention modification
- Geometric computation
- Validation of fiber bundle theory

</td>
</tr>
</table>

---

## 📚 Citation

```bibtex
@misc{napolitano2026arc,
  author = {Napolitano, Logan Matthew},
  title = {Adaptive Repetition Controller: Learned Decode-Time Intervention 
           for Repetition Suppression},
  year = {2026},
  publisher = {Hugging Face},
  howpublished = {\url{https://huggingface.co/LoganResearch/Adaptive-Repetition-Controller}},
}
```

---

## 🔗 Links

| Resource | Link |
|----------|------|
| **Base Model** | [LoganResearch/ARC-Base-8B](https://huggingface.co/LoganResearch/ARC-Base-8B) |
| **Source Code** | [GitHub: HolonomyTransformer](https://github.com/Loganwins/HolonomyTransformer) |
| **Paper** | *"The Übermensch Who Cannot Loop"* (forthcoming) |
| **Author** | [Logan Matthew Napolitano](https://github.com/Loganwins) |

---

<div align="center">

**The Übermensch who cannot loop is forced to CREATE.**

---

*Built with determination by [Logan Matthew Napolitano](https://github.com/Loganwins)*

</div>