Update README.md

4b937cf verified 15 days ago

11.7 kB

	---
	license: apache-2.0
	language:
	- en
	library_name: peft
	pipeline_tag: text-generation
	tags:
	- repetition-suppression
	- decode-time-intervention
	- llama
	- lora
	- research
	- degeneration
	- cf-hot
	base_model: LoganResearch/ARC-Base-8B
	model-index:
	- name: Adaptive-Repetition-Controller
	results:
	- task:
	type: text-generation
	metrics:
	- name: Repetition Reduction
	type: custom
	value: 48.4%
	- name: Risk Separation
	type: custom
	value: 125x
	- name: F1 Score
	type: f1
	value: 0.99
	---

	<div align="center">

	# ⚡ Adaptive Repetition Controller

	### CF-HoT 125x — Learned Decode-Time Intervention

	[![Separation](https://img.shields.io/badge/Risk_Separation-125x-brightgreen?style=for-the-badge)](.)
	[![Reduction](https://img.shields.io/badge/Repetition_Reduction-48.4%25-blue?style=for-the-badge)](.)
	[![F1](https://img.shields.io/badge/F1_Score-0.99+-purple?style=for-the-badge)](.)
	[![Params](https://img.shields.io/badge/Predictor-50K_params-orange?style=for-the-badge)](.)

	A learned system that predicts and prevents repetitive degeneration in language models.

	[Base Model](https://huggingface.co/LoganResearch/ARC-Base-8B) \| [GitHub](https://github.com/Loganwins/HolonomyTransformer) \| [Paper (forthcoming)]()

	</div>

	---

	## 🎯 The Problem

	Autoregressive language models suffer from repetitive degeneration — the tendency to fall into loops, repeat phrases, or get stuck on patterns during long-form generation.

	Standard solutions apply uniform penalties to repeated tokens. But repetition isn't always bad, and uniform penalties can't distinguish between:
	- Natural repetition (articles, pronouns, common words)
	- Problematic repetition (loops, stuck patterns, degeneration)

	## 💡 The Solution

	The Adaptive Repetition Controller learns to predict when repetition is about to become problematic, then applies targeted intervention only when needed.

	<div align="center">

	```
	╔═══════════════════════════════════════════════════════════════╗
	║ GENERATION PIPELINE ║
	╠═══════════════════════════════════════════════════════════════╣
	║ ║
	║ Input ──▶ Base Model ──▶ Hidden States (32 layers) ║
	║ │ ║
	║ ▼ ║
	║ ┌─────────────────┐ ║
	║ │ Risk Predictor │ ║
	║ │ (50K params) │ ║
	║ └────────┬────────┘ ║
	║ │ ║
	║ ▼ ║
	║ risk = 0.95 (HIGH) ║
	║ │ ║
	║ ▼ ║
	║ logits[recent_tokens] -= penalty ║
	║ │ ║
	║ ▼ ║
	║ Sample next token ║
	║ ║
	╚═══════════════════════════════════════════════════════════════╝
	```

	</div>

	---

	## 📊 Results

	### Risk Prediction Performance

	The system achieves 125x separation between tokens that will repeat and those that won't:

	\| Metric \| Value \|
	\|--------\|-------\|
	\| F1 Score \| 0.99+ \|
	\| Risk @ Repeating Tokens \| 0.998 \|
	\| Risk @ Non-Repeating Tokens \| 0.008 \|
	\| Separation Factor \| 125x \|

	### Generation Quality

	\| Metric \| Baseline \| With CF-HoT \| Change \|
	\|--------\|----------\|-------------\|--------\|
	\| Repetition Rate \| 33.9% \| 17.5% \| ↓ 48.4% \|
	\| Distinct-2 (diversity) \| 0.836 \| 0.976 \| ↑ 16.7% \|

	### Comparison to Standard Methods

	\| Method \| Adaptive \| Learned \| Repetition Reduction \|
	\|--------\|----------\|---------\|---------------------\|
	\| HuggingFace `repetition_penalty` \| ❌ \| ❌ \| ~20-30% \|
	\| OpenAI `frequency_penalty` \| ❌ \| ❌ \| ~25-35% \|
	\| Contrastive Decoding \| ❌ \| ❌ \| ~30-40% \|
	\| CF-HoT (this) \| ✅ \| ✅ \| 48.4% \|

	---

	## 🏗️ Architecture

	The risk predictor is remarkably small — only ~50,000 parameters (0.0006% of the base model):

	```python
	RiskPredictor(
	# Extract features from each transformer layer
	fiber_projs = ModuleList([
	Linear(4096 → 16) for _ in range(32) # 32 layers
	]),

	# Learn which layers matter most
	layer_weights = Parameter(shape=[32]), # Softmax-normalized

	# Predict repetition risk
	predictor = Sequential(
	Linear(16 → 64),
	GELU(),
	Linear(64 → 64),
	GELU(),
	Linear(64 → 1), # Risk logit
	)
	)
	```

	### Why It Works

	1. Hidden states contain predictive signal — The model "knows" it's about to repeat before it happens
	2. Different layers encode different information — Learned aggregation finds the most predictive layers
	3. Decode-time intervention preserves base model — No modification to attention patterns or learned representations

	---

	## 🚀 Quick Start

	### Installation

	```bash
	pip install transformers peft accelerate torch
	```

	### Loading the Models

	```python
	import torch
	from transformers import AutoModelForCausalLM, AutoTokenizer
	from peft import PeftModel

	# Load base model
	base_model = AutoModelForCausalLM.from_pretrained(
	"LoganResearch/ARC-Base-8B",
	torch_dtype=torch.bfloat16,
	device_map="auto"
	)

	# Load tokenizer
	tokenizer = AutoTokenizer.from_pretrained("LoganResearch/ARC-Base-8B")

	# Load CF-HoT adapter
	model = PeftModel.from_pretrained(
	base_model,
	"LoganResearch/Adaptive-Repetition-Controller"
	)

	# Load risk predictor
	risk_predictor = torch.load(
	hf_hub_download("LoganResearch/Adaptive-Repetition-Controller", "risk_predictor.pt")
	)
	```

	### Generation with CF-HoT Intervention

	```python
	def generate_with_cfhot(
	prompt: str,
	max_tokens: int = 512,
	penalty_scale: float = 3.0,
	threshold: float = 0.1,
	temperature: float = 0.8,
	rep_window: int = 32,
	):
	"""Generate text with adaptive repetition suppression."""

	input_ids = tokenizer.encode(prompt, return_tensors="pt").to(model.device)

	for _ in range(max_tokens):
	with torch.no_grad():
	# Forward pass with hidden states
	outputs = model(input_ids, output_hidden_states=True)
	logits = outputs.logits[:, -1, :]
	hidden_states = outputs.hidden_states

	# Predict repetition risk
	risk = risk_predictor(hidden_states).sigmoid().item()

	# Apply adaptive penalty if risk is high
	if risk > threshold:
	recent_tokens = input_ids[0, -rep_window:].tolist()
	penalty = risk * penalty_scale
	for token_id in set(recent_tokens):
	logits[0, token_id] -= penalty

	# Sample next token
	probs = torch.softmax(logits / temperature, dim=-1)
	next_token = torch.multinomial(probs, num_samples=1)

	# Append and check for EOS
	input_ids = torch.cat([input_ids, next_token], dim=-1)
	if next_token.item() == tokenizer.eos_token_id:
	break

	return tokenizer.decode(input_ids[0], skip_special_tokens=True)

	# Example usage
	response = generate_with_cfhot(
	"Write a detailed essay on the nature of consciousness:",
	max_tokens=1000,
	penalty_scale=4.0,
	)
	print(response)
	```

	---

	## 📁 Files

	\| File \| Size \| Description \|
	\|------\|------\|-------------\|
	\| `risk_predictor.pt` \| 8.4 MB \| Trained risk prediction network \|
	\| `adapter_model.safetensors` \| 218 MB \| LoRA adapter weights \|
	\| `adapter_config.json` \| 1 KB \| PEFT adapter configuration \|

	---

	## ⚙️ Training Details

	### Dataset & Objective

	- Dataset: WikiText-2
	- Task: Binary classification — "Will this token appear in the next 32 tokens?"
	- Loss: BCEWithLogitsLoss with dynamic class balancing

	### Hyperparameters

	\| Parameter \| Value \|
	\|-----------\|-------\|
	\| `d_fiber` \| 16 \|
	\| `d_control` \| 64 \|
	\| `rep_window` \| 32 \|
	\| `lr_predictor` \| 1e-4 \|
	\| `lr_lora` \| 2e-5 \|
	\| `batch_size` \| 4 \|
	\| `gradient_accumulation` \| 8 \|
	\| `optimal_checkpoint` \| Step 5000 \|

	### Training Progression

	\| Step \| F1 \| Risk @ Reps \| Risk @ Non-Reps \| Separation \|
	\|------\|-----\|-------------\|-----------------\|------------\|
	\| 3000 \| 0.96 \| 0.946 \| 0.076 \| 12x \|
	\| 4000 \| 0.99 \| 0.997 \| 0.014 \| 71x \|
	\| 5000 \| 0.99+ \| 0.998 \| 0.008 \| 125x ⭐ \|
	\| 6000 \| 0.99+ \| 0.999 \| 0.021 \| 48x \|

	Step 5000 is optimal — further training reduces separation due to overfitting.

	---

	## 🔬 Research Context

	### The Journey

	This system emerged from research into geometric approaches to semantic consistency. The original theory proposed using fiber bundles and holonomy to detect inconsistency in transformer representations.

	What we tried:
	1. ❌ Multiplicative attention gating — destroyed signal
	2. ❌ Log-space score modification — gates collapsed to uniform
	3. ❌ Normalized gating — NaN at inference
	4. ❌ Causal EMA — training/inference mismatch
	5. ❌ Extended training — complete collapse

	What worked:
	- ✅ Supervised risk prediction on explicit labels
	- ✅ Decode-time intervention (no attention modification)
	- ✅ Adaptive penalty based on predicted risk

	### What This Is (and Isn't)

	<table>
	<tr>
	<td width="50%">

	#### ✅ What It IS
	- Learned repetition penalty
	- Decode-time intervention
	- ~50K parameter predictor
	- 48% repetition reduction
	- Proof that hidden states predict degeneration

	</td>
	<td width="50%">

	#### ❌ What It's NOT
	- Full Lie Holonomy Transformer
	- Attention modification
	- Geometric computation
	- Validation of fiber bundle theory

	</td>
	</tr>
	</table>

	---

	## 📚 Citation

	```bibtex
	@misc{napolitano2026arc,
	author = {Napolitano, Logan Matthew},
	title = {Adaptive Repetition Controller: Learned Decode-Time Intervention
	for Repetition Suppression},
	year = {2026},
	publisher = {Hugging Face},
	howpublished = {\url{https://huggingface.co/LoganResearch/Adaptive-Repetition-Controller}},
	}
	```

	---

	## 🔗 Links

	\| Resource \| Link \|
	\|----------\|------\|
	\| Base Model \| [LoganResearch/ARC-Base-8B](https://huggingface.co/LoganResearch/ARC-Base-8B) \|
	\| Source Code \| [GitHub: HolonomyTransformer](https://github.com/Loganwins/HolonomyTransformer) \|
	\| Paper \| "The Übermensch Who Cannot Loop" (forthcoming) \|
	\| Author \| [Logan Matthew Napolitano](https://github.com/Loganwins) \|

	---

	<div align="center">

	The Übermensch who cannot loop is forced to CREATE.

	---

	Built with determination by [Logan Matthew Napolitano](https://github.com/Loganwins)

	</div>