---
license: apache-2.0
language:
- en
library_name: peft
pipeline_tag: text-generation
tags:
- repetition-suppression
- decode-time-intervention
- llama
- lora
- research
- degeneration
- cf-hot
base_model: LoganResearch/ARC-Base-8B
model-index:
- name: Adaptive-Repetition-Controller
  results:
  - task:
      type: text-generation
    metrics:
    - name: Repetition Reduction
      type: custom
      value: 48.4%
    - name: Risk Separation
      type: custom
      value: 125x
    - name: F1 Score
      type: f1
      value: 0.99
---
# ⚑ Adaptive Repetition Controller

### *CF-HoT 125x – Learned Decode-Time Intervention*

[![Separation](https://img.shields.io/badge/Risk_Separation-125x-brightgreen?style=for-the-badge)](.)
[![Reduction](https://img.shields.io/badge/Repetition_Reduction-48.4%25-blue?style=for-the-badge)](.)
[![F1](https://img.shields.io/badge/F1_Score-0.99+-purple?style=for-the-badge)](.)
[![Params](https://img.shields.io/badge/Predictor-50K_params-orange?style=for-the-badge)](.)

*A learned system that predicts and prevents repetitive degeneration in language models.*

[Base Model](https://huggingface.co/LoganResearch/ARC-Base-8B) | [GitHub](https://github.com/Loganwins/HolonomyTransformer) | [Paper (forthcoming)]()
---

## 🎯 The Problem

Autoregressive language models suffer from **repetitive degeneration**: the tendency to fall into loops, repeat phrases, or get stuck on patterns during long-form generation.

Standard solutions apply **uniform penalties** to repeated tokens. But repetition isn't always bad, and uniform penalties can't distinguish between:

- Natural repetition (articles, pronouns, common words)
- Problematic repetition (loops, stuck patterns, degeneration)

## 💡 The Solution

The **Adaptive Repetition Controller** learns to **predict** when repetition is about to become problematic, then applies **targeted intervention** only when needed.
```
                    GENERATION PIPELINE

 Input ──▶ Base Model ──▶ Hidden States (32 layers)
                             │
                             ▼
                    ┌─────────────────┐
                    │  Risk Predictor │
                    │   (50K params)  │
                    └────────┬────────┘
                             │
                             ▼
                    risk = 0.95 (HIGH)
                             │
                             ▼
             logits[recent_tokens] -= penalty
                             │
                             ▼
                     Sample next token
```
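The decisive step in this pipeline is the final logit adjustment: unlike a uniform `repetition_penalty`, the penalty is scaled by the predicted risk and applied only when that risk crosses a threshold. Below is a minimal sketch of that step in isolation; the function and argument names are illustrative and not part of the released code.

```python
import torch

def apply_adaptive_penalty(
    logits: torch.Tensor,    # [vocab_size] logits for the next token
    recent_ids: list[int],   # token ids seen in the recent window
    risk: float,             # predicted repetition risk in [0, 1]
    threshold: float = 0.1,
    penalty_scale: float = 3.0,
) -> torch.Tensor:
    """Subtract a risk-scaled penalty from recently seen tokens.

    Illustrative sketch: low-risk steps pass through untouched,
    high-risk steps suppress recent tokens proportionally to the risk.
    """
    if risk <= threshold:
        return logits  # no intervention for low-risk steps
    penalized = logits.clone()
    penalized[list(set(recent_ids))] -= risk * penalty_scale
    return penalized
```

With `risk` near zero the logits pass through unchanged, so ordinary, benign repetition (articles, pronouns, common words) is never penalized.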
---

## 📊 Results

### Risk Prediction Performance

The system achieves **125x separation** between tokens that will repeat and those that won't (risk at repeating tokens divided by risk at non-repeating tokens: 0.998 / 0.008 ≈ 125):

| Metric | Value |
|--------|-------|
| **F1 Score** | 0.99+ |
| **Risk @ Repeating Tokens** | 0.998 |
| **Risk @ Non-Repeating Tokens** | 0.008 |
| **Separation Factor** | **125x** |

### Generation Quality

| Metric | Baseline | With CF-HoT | Change |
|--------|----------|-------------|--------|
| Repetition Rate | 33.9% | 17.5% | **↓ 48.4%** |
| Distinct-2 (diversity) | 0.836 | 0.976 | **↑ 16.7%** |

### Comparison to Standard Methods

| Method | Adaptive | Learned | Repetition Reduction |
|--------|----------|---------|----------------------|
| HuggingFace `repetition_penalty` | ❌ | ❌ | ~20-30% |
| OpenAI `frequency_penalty` | ❌ | ❌ | ~25-35% |
| Contrastive Decoding | ❌ | ❌ | ~30-40% |
| **CF-HoT (this)** | ✅ | ✅ | **48.4%** |

---

## 🏗️ Architecture

The risk predictor is remarkably small: only **~50,000 parameters** (0.0006% of the base model):

```python
RiskPredictor(
    # Extract features from each transformer layer
    fiber_projs = ModuleList([
        Linear(4096 → 16) for _ in range(32)  # 32 layers
    ]),

    # Learn which layers matter most
    layer_weights = Parameter(shape=[32]),  # Softmax-normalized

    # Predict repetition risk
    predictor = Sequential(
        Linear(16 → 64), GELU(),
        Linear(64 → 64), GELU(),
        Linear(64 → 1),  # Risk logit
    )
)
```

### Why It Works

1. **Hidden states contain predictive signal** – the model "knows" it's about to repeat before it happens
2. **Different layers encode different information** – learned aggregation finds the most predictive layers
3. **Decode-time intervention preserves the base model** – no modification to attention patterns or learned representations
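For readers who prefer working code to pseudo-code, here is a minimal runnable PyTorch sketch of the same architecture. It is an illustration under stated assumptions (32 decoder layers, hidden size 4096, last-token features, skipping the embedding output); the released `risk_predictor.pt` may organise its weights and its forward interface differently.

```python
import torch
import torch.nn as nn

class RiskPredictorSketch(nn.Module):
    """Minimal re-implementation of the architecture sketched above."""

    def __init__(self, n_layers: int = 32, d_model: int = 4096,
                 d_fiber: int = 16, d_control: int = 64):
        super().__init__()
        # One small projection per transformer layer
        self.fiber_projs = nn.ModuleList(
            [nn.Linear(d_model, d_fiber) for _ in range(n_layers)]
        )
        # Learned importance of each layer (softmax-normalised in forward)
        self.layer_weights = nn.Parameter(torch.zeros(n_layers))
        # Small MLP mapping the aggregated feature to a risk logit
        self.predictor = nn.Sequential(
            nn.Linear(d_fiber, d_control), nn.GELU(),
            nn.Linear(d_control, d_control), nn.GELU(),
            nn.Linear(d_control, 1),
        )

    def forward(self, hidden_states) -> torch.Tensor:
        # hidden_states: tuple of [batch, seq, d_model] tensors, one per layer.
        # Assumption: use the last position of each layer and skip the
        # embedding output (index 0) so 32 layer outputs remain.
        feats = torch.stack(
            [proj(h[:, -1, :])
             for proj, h in zip(self.fiber_projs, hidden_states[1:])],
            dim=1,
        )  # [batch, n_layers, d_fiber]
        weights = torch.softmax(self.layer_weights, dim=0)       # [n_layers]
        pooled = (weights.view(1, -1, 1) * feats).sum(dim=1)     # [batch, d_fiber]
        return self.predictor(pooled).squeeze(-1)                # risk logit
```

In the Quick Start below, the predictor's output is turned into a probability with a sigmoid and used to gate the logit penalty.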
---

## 🚀 Quick Start

### Installation

```bash
pip install transformers peft accelerate torch
```

### Loading the Models

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
from huggingface_hub import hf_hub_download

# Load base model
base_model = AutoModelForCausalLM.from_pretrained(
    "LoganResearch/ARC-Base-8B",
    torch_dtype=torch.bfloat16,
    device_map="auto"
)

# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained("LoganResearch/ARC-Base-8B")

# Load CF-HoT adapter
model = PeftModel.from_pretrained(
    base_model,
    "LoganResearch/Adaptive-Repetition-Controller"
)

# Load risk predictor
risk_predictor = torch.load(
    hf_hub_download("LoganResearch/Adaptive-Repetition-Controller", "risk_predictor.pt")
)
```

### Generation with CF-HoT Intervention

```python
def generate_with_cfhot(
    prompt: str,
    max_tokens: int = 512,
    penalty_scale: float = 3.0,
    threshold: float = 0.1,
    temperature: float = 0.8,
    rep_window: int = 32,
):
    """Generate text with adaptive repetition suppression."""
    input_ids = tokenizer.encode(prompt, return_tensors="pt").to(model.device)

    for _ in range(max_tokens):
        with torch.no_grad():
            # Forward pass with hidden states
            outputs = model(input_ids, output_hidden_states=True)
            logits = outputs.logits[:, -1, :]
            hidden_states = outputs.hidden_states

            # Predict repetition risk
            risk = risk_predictor(hidden_states).sigmoid().item()

        # Apply adaptive penalty if risk is high
        if risk > threshold:
            recent_tokens = input_ids[0, -rep_window:].tolist()
            penalty = risk * penalty_scale
            for token_id in set(recent_tokens):
                logits[0, token_id] -= penalty

        # Sample next token
        probs = torch.softmax(logits / temperature, dim=-1)
        next_token = torch.multinomial(probs, num_samples=1)

        # Append and check for EOS
        input_ids = torch.cat([input_ids, next_token], dim=-1)
        if next_token.item() == tokenizer.eos_token_id:
            break

    return tokenizer.decode(input_ids[0], skip_special_tokens=True)


# Example usage
response = generate_with_cfhot(
    "Write a detailed essay on the nature of consciousness:",
    max_tokens=1000,
    penalty_scale=4.0,
)
print(response)
```

---

## 📁 Files

| File | Size | Description |
|------|------|-------------|
| `risk_predictor.pt` | 8.4 MB | Trained risk prediction network |
| `adapter_model.safetensors` | 218 MB | LoRA adapter weights |
| `adapter_config.json` | 1 KB | PEFT adapter configuration |

---

## ⚙️ Training Details

### Dataset & Objective

- **Dataset:** WikiText-2
- **Task:** Binary classification – "Will this token appear in the next 32 tokens?"
- **Loss:** BCEWithLogitsLoss with dynamic class balancing

### Hyperparameters

| Parameter | Value |
|-----------|-------|
| `d_fiber` | 16 |
| `d_control` | 64 |
| `rep_window` | 32 |
| `lr_predictor` | 1e-4 |
| `lr_lora` | 2e-5 |
| `batch_size` | 4 |
| `gradient_accumulation` | 8 |
| `optimal_checkpoint` | Step 5000 |

### Training Progression

| Step | F1 | Risk @ Reps | Risk @ Non-Reps | Separation |
|------|-----|-------------|-----------------|------------|
| 3000 | 0.96 | 0.946 | 0.076 | 12x |
| 4000 | 0.99 | 0.997 | 0.014 | 71x |
| **5000** | **0.99+** | **0.998** | **0.008** | **125x** ⭐ |
| 6000 | 0.99+ | 0.999 | 0.021 | 48x |

*Step 5000 is optimal – further training reduces separation due to overfitting.*
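The training script is not included in this repository. As an illustration of the stated objective only, the sketch below builds a binary label per position ("does this token reappear within the next `rep_window` tokens?") and trains it with `BCEWithLogitsLoss`, using `pos_weight` as a stand-in for the dynamic class balancing; every name here is hypothetical.

```python
import torch
import torch.nn as nn

def repetition_labels(token_ids: torch.Tensor, rep_window: int = 32) -> torch.Tensor:
    """Binary target per position: 1.0 if the token at position t
    appears again anywhere in positions t+1 .. t+rep_window."""
    seq_len = token_ids.shape[0]
    labels = torch.zeros(seq_len)
    for t in range(seq_len):
        window = token_ids[t + 1 : t + 1 + rep_window]
        labels[t] = float((window == token_ids[t]).any())
    return labels

def balanced_bce(risk_logits: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
    """BCE with a per-batch positive weight (a simple form of class balancing)."""
    pos = labels.sum().clamp(min=1.0)
    neg = (labels.numel() - labels.sum()).clamp(min=1.0)
    loss_fn = nn.BCEWithLogitsLoss(pos_weight=neg / pos)
    return loss_fn(risk_logits, labels)
```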
---

## 🔬 Research Context

### The Journey

This system emerged from research into geometric approaches to semantic consistency. The original theory proposed using **fiber bundles and holonomy** to detect inconsistency in transformer representations.

**What we tried:**

1. ❌ Multiplicative attention gating – destroyed signal
2. ❌ Log-space score modification – gates collapsed to uniform
3. ❌ Normalized gating – NaN at inference
4. ❌ Causal EMA – training/inference mismatch
5. ❌ Extended training – complete collapse

**What worked:**

- ✅ Supervised risk prediction on explicit labels
- ✅ Decode-time intervention (no attention modification)
- ✅ Adaptive penalty based on predicted risk

### What This Is (and Isn't)

#### ✅ What It IS

- Learned repetition penalty
- Decode-time intervention
- ~50K parameter predictor
- 48% repetition reduction
- Proof that hidden states predict degeneration

#### ❌ What It's NOT

- Full Lie Holonomy Transformer
- Attention modification
- Geometric computation
- Validation of fiber bundle theory
---

## 📚 Citation

```bibtex
@misc{napolitano2026arc,
  author       = {Napolitano, Logan Matthew},
  title        = {Adaptive Repetition Controller: Learned Decode-Time Intervention for Repetition Suppression},
  year         = {2026},
  publisher    = {Hugging Face},
  howpublished = {\url{https://huggingface.co/LoganResearch/Adaptive-Repetition-Controller}},
}
```

---

## 🔗 Links

| Resource | Link |
|----------|------|
| **Base Model** | [LoganResearch/ARC-Base-8B](https://huggingface.co/LoganResearch/ARC-Base-8B) |
| **Source Code** | [GitHub: HolonomyTransformer](https://github.com/Loganwins/HolonomyTransformer) |
| **Paper** | *"The Übermensch Who Cannot Loop"* (forthcoming) |
| **Author** | [Logan Matthew Napolitano](https://github.com/Loganwins) |

---
**The Übermensch who cannot loop is forced to CREATE.**

---

*Built with determination by [Logan Matthew Napolitano](https://github.com/Loganwins)*