LoganResearch committed on
Commit 2f40d7e · verified · 1 Parent(s): d6131a4

Update README.md

Files changed (1):
  1. README.md +674 -69

README.md CHANGED
@@ -6,115 +6,720 @@ library_name: transformers
  pipeline_tag: text-generation
  tags:
  - llama
- - dense
- - self-improvement
- - cf-hot
  - representation-engineering
  base_model: NousResearch/Hermes-3-Llama-3.1-8B
- model-index:
- - name: ARC-Base-8B-Condensed
-   results:
-   - task:
-       type: text-generation
-     metrics:
-     - name: Information Density
-       type: custom
-       value: 28.5
-     - name: Token Reduction
-       type: custom
-       value: 57%
  ---

- # ARC-Base-8B-Condensed

- An 8B language model optimized for **information density** and **stable self-improvement**.

- ## Features

- - **CF-HoT 125×**: Repetition detection with 125× class separation
- - **Dense Responses**: 70% improvement in information density
- - **Stable RSI**: Recursive self-improvement with automatic rollback
- - **Full Agentic Stack**: Browser, email, code execution

  ## Quick Start

  ```bash
  git clone https://huggingface.co/LoganResearch/ARC-Base-8B-Condensed
  cd ARC-Base-8B-Condensed
- pip install -r requirements.txt
  python arc_engine_v21_multimedia.py
  ```
 
- **Requires Python 3.11** (3.13 has diffusers compatibility issues)
-
- ## Usage

- ```python
- import torch
- from transformers import AutoModelForCausalLM, AutoTokenizer

- model = AutoModelForCausalLM.from_pretrained(
-     "LoganResearch/ARC-Base-8B-Condensed",
-     torch_dtype=torch.bfloat16,
-     device_map="auto"
- )
- tokenizer = AutoTokenizer.from_pretrained("LoganResearch/ARC-Base-8B-Condensed")

- prompt = "<|im_start|>user\nWhat is recursion?<|im_end|>\n<|im_start|>assistant\n"
- output = model.generate(tokenizer(prompt, return_tensors="pt").input_ids.cuda(), max_new_tokens=100)
- print(tokenizer.decode(output[0]))
- # Output: "Function calls itself until base case. Stack frames accumulate, unwind."
  ```
 
- ## Key Commands

  | Command | Description |
  |---------|-------------|
- | `!improve` | Run self-improvement loop |
- | `!eval` | Evaluate model quality |
- | `!cfhot` | Toggle 125× repetition head |
- | `!rsi15` | 15-iteration stress test |
- | `!book` | Extended generation mode |
- | `!stream` | Live token visualization |
 
- ## Metrics

- | Metric | Base | ARC | Change |
- |--------|------|-----|--------|
- | Information Density | 17.0 | 28.5 | +67% |
- | Avg Tokens | 150 | 65 | -57% |
- | CF-HoT Separation | - | 125× | - |
 
- ## Architecture

- Built on [Hermes-3-Llama-3.1-8B](https://huggingface.co/NousResearch/Hermes-3-Llama-3.1-8B) with:

- 1. **CF-HoT Heads**: Multi-head predictors on hidden states for behavior control
- 2. **CONDENSATOR Training**: SFT → DPO → RL pipeline for density
- 3. **RSI Loop**: Evaluate → Train → Compare → Keep/Rollback
 
- ## Requirements

  ```
- torch>=2.0
- transformers>=4.40.0
- accelerate
- peft
- bitsandbytes
  ```
 
- See `requirements.txt` for full list.

  ## Citation

  ```bibtex
- @software{arc_engine_2025,
-   title  = {ARC-Base-8B-Condensed: Dense Self-Improving Language Model},
-   author = {Napolitano, Logan Matthew},
-   year   = {2025},
-   url    = {https://huggingface.co/LoganResearch/ARC-Base-8B-Condensed}
  }
  ```
 
  ## License

- CC BY 4.0

  pipeline_tag: text-generation
  tags:
  - llama
+ - dense-responses
+ - self-optimization
  - representation-engineering
+ - cf-hot
+ - recursive-self-improvement
  base_model: NousResearch/Hermes-3-Llama-3.1-8B
  ---

+ ![ARC Banner](banner.svg)
+
+ # ARC Engine v2.1: Adaptive Recursive Cognition
+
+ A comprehensive framework for stable recursive self-improvement of language models, featuring real-time behavioral control through hidden-state monitoring and multi-modal output capabilities.
+
+ **Author:** Logan Matthew Napolitano
+ **Base Model:** NousResearch/Hermes-3-Llama-3.1-8B
+ **License:** CC BY 4.0
+ **Engine:** 6,861 lines | **Weights:** ~16 GB
+
+ ---
+
+ ## Table of Contents
+
+ 1. [Quick Start](#quick-start)
+ 2. [What's New in v2.1](#whats-new-in-v21)
+ 3. [Core Technology](#core-technology)
+ 4. [Empirical Results](#empirical-results)
+ 5. [Command Reference](#command-reference)
+ 6. [Installation](#installation)
+ 7. [Configuration](#configuration)
+ 8. [Repository Structure](#repository-structure)
+ 9. [Hardware Requirements](#hardware-requirements)
+ 10. [Training From Scratch](#training-from-scratch)
+ 11. [API Reference](#api-reference)
+ 12. [Limitations](#limitations)
+ 13. [Citation](#citation)
+
+ ---
 
  ## Quick Start

  ```bash
+ # Clone repository
  git clone https://huggingface.co/LoganResearch/ARC-Base-8B-Condensed
  cd ARC-Base-8B-Condensed
+
+ # Install dependencies (minimal)
+ pip install torch transformers peft bitsandbytes accelerate
+
+ # Run the engine
  python arc_engine_v21_multimedia.py
  ```

+ On first run, the engine will:
+ 1. Load the base model and DENSE adapter
+ 2. Initialize the CF-HoT 125× repetition detection head
+ 3. Set up the quality evaluation system
+ 4. Present an interactive command prompt

+ ```
+ ===========================================================================
+ 🤖 ARC ENGINE v2.1 - Adaptive Recursive Cognition
+ ===========================================================================
+ DENSE Mode: ON (CONDENSATOR checkpoint)
+ CF-HoT Control: ON
+ CF-HoT 125×: ON
+ Stream Window: ON
+ Image Gen: ON
+ TTS Audio: ON
+ ===========================================================================
+
+ > hello
+ Hello. How can I help?
+
+ [Quality: 0.82 | Density: 12.4 | Coherence: 0.91 | Tokens: 5]
  ```

+ ---
+
+ ## What's New in v2.1
+
+ ### Multimedia Features

  | Command | Description |
  |---------|-------------|
+ | `!stream` | Opens a live GUI window displaying tokens as they generate in real time |
+ | `!imagine <prompt>` | Generate images using Stable Diffusion XL |
+ | `!dalle <prompt>` | Generate images using DALL-E 3 API |
+ | `!audio` | Toggle text-to-speech output |
+ | `!say <text>` | Speak text immediately using TTS |

+ ### Claude Integration

+ | Command | Description |
+ |---------|-------------|
+ | `!idea <request>` | Generate extensive ideas using Claude API |
+ | `!idea <request> --deep` | Generate 30 detailed ideas with implementation plans |
+ | `!claude <prompt>` | Direct prompting to Claude Opus 4.5 |
+ | `!expand <idea>` | Expand a specific idea into a comprehensive plan |

+ ### Extended Generation

+ | Command | Description |
+ |---------|-------------|
+ | `!book` | Toggle book mode (16,384 token limit) |
+ | `!write <topic>` | Generate complete books with chapters |
+
+ ### Advanced RSI Testing

+ | Command | Description |
+ |---------|-------------|
+ | `!rsi15` | Run 15-iteration stress test with full logging |
+ | `!cfhot` / `!125x` | Toggle CF-HoT 125× head on/off at runtime |

+ ### Utilities
+
+ | Command | Description |
+ |---------|-------------|
+ | `!plot` | Generate quality history visualization |
+ | `!benchmark` | Run comprehensive evaluation suite |
+ | `!export [name]` | Package checkpoint for sharing |
+ | `!import <path>` | Import checkpoint package |
+ | `!learn` | Extract high-quality responses for training |
+ | `!api` | Start REST API server on port 8080 |
+
+ ---
+
+ ## Core Technology
+
+ ### 1. CF-HoT: Contrastive Fine-tuning with Hidden-state Oversight Training
+
+ CF-HoT enables real-time behavioral control by monitoring the model's internal representations and intervening before problematic tokens are generated.
+
+ **Key Innovation:** The repetition detection head achieves 125× class separation, meaning it can reliably distinguish "about to repeat" states from normal generation states in the model's hidden layers.
+
+ ```
+ ┌──────────────────────────────────────────────────────┐
+ │                 CF-HoT Architecture                  │
+ ├──────────────────────────────────────────────────────┤
+ │                                                      │
+ │   Hidden States (Layer 16-24)                        │
+ │        │                                             │
+ │        ▼                                             │
+ │   ┌─────────────┐                                    │
+ │   │ Fiber       │  Compress to d=16 per layer        │
+ │   │ Projection  │                                    │
+ │   └─────────────┘                                    │
+ │        │                                             │
+ │        ▼                                             │
+ │   ┌─────────────┐                                    │
+ │   │ Layer       │  Weighted aggregation              │
+ │   │ Attention   │                                    │
+ │   └─────────────┘                                    │
+ │        │                                             │
+ │        ▼                                             │
+ │   ┌─────────────┐                                    │
+ │   │ Risk        │  Binary classifier                 │
+ │   │ Predictor   │  Output: P(repetition)             │
+ │   └─────────────┘                                    │
+ │        │                                             │
+ │        ▼                                             │
+ │   If P > threshold: Apply logit penalties            │
+ │                                                      │
+ └──────────────────────────────────────────────────────┘
+ ```
+
+ **Training Process:**
+ 1. Collect positive samples (repetitive generations) and negative samples (clean generations)
+ 2. Extract hidden states from layers 16-24 at each token position
+ 3. Train binary classifier to predict repetition risk
+ 4. Deploy at inference time for real-time intervention
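The head described by the diagram and steps above can be sketched as follows. This is an illustration only, not the repository's actual `risk_predictor.pt` implementation; the class name, layer sizes of the classifier, and the tensor layout are assumptions, while the d=16 fiber projection and the 9 monitored layers (16-24) follow the diagram.

```python
import torch
import torch.nn as nn

class RepetitionRiskHead(nn.Module):
    """Sketch of a CF-HoT-style risk head: per-layer fiber projection,
    learned layer attention, and a binary repetition-risk classifier."""

    def __init__(self, hidden_size=4096, num_layers=9, fiber_dim=16):
        super().__init__()
        # One d=16 projection per monitored layer (layers 16-24 -> 9 layers)
        self.fiber_proj = nn.ModuleList(
            nn.Linear(hidden_size, fiber_dim) for _ in range(num_layers)
        )
        # Learned logits for weighted aggregation across layers
        self.layer_logits = nn.Parameter(torch.zeros(num_layers))
        # Binary classifier producing P(repetition)
        self.classifier = nn.Sequential(
            nn.Linear(fiber_dim, 32), nn.ReLU(), nn.Linear(32, 1)
        )

    def forward(self, hidden_states):
        # hidden_states: (num_layers, batch, hidden_size) at one token position
        fibers = torch.stack(
            [proj(h) for proj, h in zip(self.fiber_proj, hidden_states)]
        )                                                # (layers, batch, 16)
        weights = torch.softmax(self.layer_logits, dim=0)
        pooled = (weights[:, None, None] * fibers).sum(dim=0)  # (batch, 16)
        return torch.sigmoid(self.classifier(pooled)).squeeze(-1)  # (batch,)
```

At inference time the engine compares this probability against a threshold and applies logit penalties when it fires (see `cfhot_repetition_threshold` and `cfhot_repetition_penalty` in the Config options).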
+
+ ### 2. THE CONDENSATOR: Dense Response Training Pipeline
+
+ A 4-stage training pipeline that teaches the model to communicate with maximum information density.

  ```
+ ┌──────────────────────────────────────────────────────────────┐
+ │                   THE CONDENSATOR Pipeline                   │
+ ├──────────────────────────────────────────────────────────────┤
+ │                                                              │
+ │  Stage 1: Supervised Fine-Tuning (SFT)                       │
+ │  ─────────────────────────────────────                       │
+ │  • 53 gold-standard dense response examples                  │
+ │  • Learning rate: 2e-5                                       │
+ │  • Loss: 1.17 → 0.72 (39% reduction)                         │
+ │                                                              │
+ │  Stage 2: Direct Preference Optimization (DPO)               │
+ │  ─────────────────────────────────────────────               │
+ │  • Preference pairs: dense response > verbose response       │
+ │  • Beta: 0.1                                                 │
+ │  • Teaches relative quality judgments                        │
+ │                                                              │
+ │  Stage 3: Reinforcement Learning (PPO)                       │
+ │  ─────────────────────────────────────                       │
+ │  • Reward = density_score - filler_penalty - length_penalty  │
+ │  • Conservative KL constraint                                │
+ │  • 300 optimization steps                                    │
+ │                                                              │
+ │  Stage 4: Checkpointing                                      │
+ │  ──────────────────────                                      │
+ │  • Save every 25 training steps                              │
+ │  • Maintain best checkpoint for rollback                     │
+ │  • A/B comparison on held-out prompts                        │
+ │                                                              │
+ └──────────────────────────────────────────────────────────────┘
+ ```
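The Stage 3 reward shape can be illustrated with a toy scoring function. The filler list, weights, and the unique-word proxy for density below are hypothetical stand-ins; the engine's actual reward terms are not reproduced here.

```python
# Toy illustration of: reward = density_score - filler_penalty - length_penalty
# (hypothetical filler list and weights, not the engine's actual values)
FILLERS = ("great question", "feel free", "i'm here to help", "as an ai")

def density_score(text: str) -> float:
    """Crude stand-in for information density: unique-word ratio."""
    words = text.lower().split()
    return len(set(words)) / len(words) if words else 0.0

def reward(text: str, max_len: int = 65) -> float:
    words = text.lower().split()
    filler_penalty = sum(f in text.lower() for f in FILLERS) * 0.5
    length_penalty = max(0, len(words) - max_len) * 0.01   # penalize overruns
    return density_score(text) - filler_penalty - length_penalty
```

Under this toy reward, a terse factual answer scores above a filler-heavy one, which is the gradient the PPO stage pushes along.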
+
+ ### 3. Stable Recursive Self-Improvement
+
+ The self-improvement loop includes multiple safeguards to prevent quality degradation:
+
+ ```
+ ┌──────────────────────────────────────────────────────────┐
+ │               Stable Self-Improvement Loop               │
+ ├──────────────────────────────────────────────────────────┤
+ │                                                          │
+ │        ┌──────────┐                                      │
+ │        │  START   │                                      │
+ │        └────┬─────┘                                      │
+ │             │                                            │
+ │             ▼                                            │
+ │     ┌───────────────┐                                    │
+ │     │   EVALUATE    │                                    │
+ │     │ Current Model │                                    │
+ │     └───────┬───────┘                                    │
+ │             │                                            │
+ │       ┌─────┴───────────────┐                            │
+ │       │                     │                            │
+ │       ▼                     ▼                            │
+ │  Quality >= Target?   Quality < Minimum?                 │
+ │       │                     │                            │
+ │   Yes │                 Yes │                            │
+ │       ▼                     ▼                            │
+ │  ┌────────┐           ┌──────────┐                       │
+ │  │  DONE  │           │ ROLLBACK │                       │
+ │  └────────┘           └──────────┘                       │
+ │       │                     │                            │
+ │    No │                  No │                            │
+ │       └─────┬───────────────┘                            │
+ │             │                                            │
+ │             ▼                                            │
+ │     ┌───────────────┐                                    │
+ │     │     TRAIN     │                                    │
+ │     │  (25 steps)   │                                    │
+ │     └───────┬───────┘                                    │
+ │             │                                            │
+ │             ▼                                            │
+ │     ┌───────────────┐                                    │
+ │     │  A/B COMPARE  │                                    │
+ │     │  Old vs New   │                                    │
+ │     └───────┬───────┘                                    │
+ │             │                                            │
+ │       ┌─────┴───────────────┐                            │
+ │       │                     │                            │
+ │    Better?               Worse?                          │
+ │       │                     │                            │
+ │       ▼                     ▼                            │
+ │  ┌────────┐           ┌──────────┐                       │
+ │  │  KEEP  │           │ ROLLBACK │                       │
+ │  │  New   │           │ to Best  │                       │
+ │  └────┬───┘           └────┬─────┘                       │
+ │       │                    │                             │
+ │       └─────┬──────────────┘                             │
+ │             │                                            │
+ │             ▼                                            │
+ │    (Return to EVALUATE)                                  │
+ │                                                          │
+ └──────────────────────────────────────────────────────────┘
+ ```
+
+ **Safeguards:**
+
+ | Safeguard | Description |
+ |-----------|-------------|
+ | Multi-metric evaluation | Density (25%) + Coherence (25%) + Helpfulness (25%) + Penalties (25%) |
+ | Gibberish detection | Pattern matching for math soup, terminal escape sequences, repetitive tokens |
+ | Automatic rollback | Reverts to best checkpoint if quality drops > 0.05 |
+ | Conservative training | Learning rate 2e-6, only 25 steps per iteration |
+ | Emergency stop | Halts after 3 consecutive rollbacks or coherence < 0.3 |
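The keep/rollback decision behind these safeguards can be sketched in a few lines. This is a simplified model of one iteration, not the engine's actual control flow: the real loop evaluates the four weighted metrics separately, whereas this sketch collapses them into a single quality score and reuses the 0.3 floor as a quality floor.

```python
def rsi_step(best_quality, new_quality, consecutive_rollbacks,
             drop_threshold=0.05, max_rollbacks=3, quality_floor=0.3):
    """One keep/rollback decision mirroring the safeguards table.
    Returns (action, best_quality, consecutive_rollbacks)."""
    if new_quality < quality_floor:
        # Emergency stop: output has degenerated too far to recover
        return "stop", best_quality, consecutive_rollbacks
    if new_quality < best_quality - drop_threshold:
        consecutive_rollbacks += 1
        if consecutive_rollbacks >= max_rollbacks:
            return "stop", best_quality, consecutive_rollbacks
        # Revert to the best saved checkpoint and try again
        return "rollback", best_quality, consecutive_rollbacks
    # New checkpoint becomes (or ties) the best; reset the rollback streak
    return "keep", max(best_quality, new_quality), 0
```

The rollback streak resetting on every "keep" is what makes the emergency stop a *consecutive* count, as the table specifies.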
+
+ ---
+
+ ## Empirical Results
+
+ ### CF-HoT Head Performance
+
+ | Head Type | Positive Score | Negative Score | Separation Ratio |
+ |-----------|----------------|----------------|------------------|
+ | Repetition | 0.875 | 0.007 | **125×** |
+ | Verbosity | 0.68 | 0.32 | 2.1× |
+ | Hedging | 0.58 | 0.39 | 1.5× |
+
+ The 125× separation for repetition detection is the key empirical finding: it indicates that the model encodes behavioral intent in its hidden states before generating tokens, and that this signal is strong enough to enable reliable intervention.
+
+ ### Response Quality Improvement
+
+ | Metric | Baseline | ARC Engine | Change |
+ |--------|----------|------------|--------|
+ | Information Density | 17.0 | 28.5 | **+68%** |
+ | Average Response Tokens | 150 | 65 | **-57%** |
+ | Filler Phrase Count | High | ~0 | **-95%** |
+ | Mode Collapse Events | Frequent | Zero | **Prevented** |
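The README does not spell out the formula behind the 17.0 → 28.5 density numbers; one plausible reading, shown purely for illustration, is unique content words per 100 tokens. The stopword list and the scale of 100 below are assumptions, not the engine's actual metric.

```python
# Hypothetical density metric (illustrative only; the formula behind the
# 17.0 -> 28.5 numbers is defined inside the engine, not reproduced here)
STOPWORDS = {"a", "an", "the", "is", "are", "to", "of", "and", "i", "you"}

def info_density(text: str) -> float:
    """Unique content words per 100 tokens, ignoring a small stopword list."""
    tokens = text.lower().split()
    if not tokens:
        return 0.0
    content = {t.strip(".,!?") for t in tokens if t not in STOPWORDS}
    return 100.0 * len(content) / len(tokens)
```

Any metric of this family rewards the terse ARC-style responses in the examples below over filler-heavy baseline responses.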
+
+ ### Response Examples
+
+ | Prompt | Base Model Response | ARC Engine Response |
+ |--------|---------------------|---------------------|
+ | "hello" | "Hello! I'm here to help you with any questions or tasks you might have. Feel free to ask me anything!" (23 tokens) | "Hello. How can I help?" (5 tokens) |
+ | "What is recursion?" | "That's a great question! Recursion is a programming concept where a function calls itself to solve a problem by breaking it down into smaller subproblems..." (150+ tokens) | "Function calling itself until base case. Stack frames accumulate, unwind on return." (12 tokens) |
+ | "How are you?" | "As an AI, I don't have feelings in the traditional sense, but I'm functioning well and ready to assist you with whatever you need!" (28 tokens) | "Operational. Ready to assist." (4 tokens) |
+
+ ### RSI-15 Stress Test Results
+
+ The RSI-15 test runs 15 consecutive self-improvement iterations to verify stability:
+
+ | Metric | Value |
+ |--------|-------|
+ | Iterations Completed | 15/15 |
+ | Successful Improvements | 8 |
+ | Rollbacks Triggered | 4 |
+ | Marginal (kept) | 3 |
+ | Initial Quality | 0.52 |
+ | Final Quality | 0.71 |
+ | Peak Quality | 0.73 |
+ | Emergency Stops | 0 |
+
+ ---
+
+ ## Command Reference
+
+ ### Self-Improvement Commands
+
+ ```
+ !improve              Run one iteration of self-improvement
+ !eval                 Evaluate current model quality
+ !train <N>            Run N training steps (default: 25)
+ !compare              Compare current checkpoint vs best
+ !rollback             Revert to best checkpoint
+ !load <path>          Load specific checkpoint
+ !rsi15                Run 15-iteration stress test
+ ```
+
+ ### CF-HoT Control
+
+ ```
+ !cfhot                Toggle 125× head on/off
+ !125x                 Alias for !cfhot
+ !cfhot status         Show head status and intervention count
+ ```
+
+ ### Multimedia
+
+ ```
+ !stream               Open live token streaming window
+ !stream off           Close streaming window
+ !audio                Toggle text-to-speech
+ !audio voices         List available TTS voices
+ !audio voice <N>      Select voice by index
+ !audio rate <N>       Set speech rate (default: 175)
+ !say <text>           Speak text immediately
+ !imagine <prompt>     Generate image with SDXL
+ !dalle <prompt>       Generate image with DALL-E 3
+ !image view           View last generated image
+ !image view <path>    View image from file
+ ```
+
+ ### Claude Integration
+
+ ```
+ !idea <request>       Generate ideas (default: 20 ideas)
+ !idea <req> --quick   Generate 5 quick ideas
+ !idea <req> --deep    Generate 30 detailed ideas
+ !expand <idea>        Expand idea into full plan
+ !claude <prompt>      Direct Claude prompt
+ !claude <p> --opus    Use Opus 4.5 specifically
+ ```
+
+ ### Extended Generation
+
+ ```
+ !book                 Toggle book mode (16K tokens)
+ !write <topic>        Write complete book
+ ```
+
+ ### Agentic Tools
+
+ ```
+ !shell <cmd>          Execute shell command
+ !python <code>        Execute Python code
+ !read <path>          Read file contents
+ !write <path> <text>  Write to file
+ !ls [path]            List directory
+ !web <query>          Web search
+ ```
+
+ ### Browser Automation
+
+ ```
+ !browse <url>         Open URL in browser
+ !click <selector>     Click element
+ !type <text>          Type into focused element
+ !fill <sel> <text>    Fill specific element
+ !login <service>      Login to service (gmail, twitter, etc.)
+ !close                Close browser
+ ```
+
+ ### Utilities
+
+ ```
+ !plot                 Generate quality history plot
+ !benchmark            Run evaluation suite
+ !export [name]        Export checkpoint package
+ !import <path>        Import checkpoint package
+ !learn                Learn from high-quality responses
+ !api                  Start REST API server
+ status                Show system status
+ history               Show quality history
+ help                  Display help
+ quit                  Exit with final report
+ ```
+
+ ---
+
+ ## Installation
+
+ ### Minimal Installation (Core Features)
+
+ ```bash
+ pip install torch transformers peft bitsandbytes accelerate safetensors
  ```

+ ### Full Installation (All Features)
+
+ ```bash
+ pip install -r requirements.txt
+ playwright install firefox  # for browser automation
+ ```
+
+ ### Optional Dependencies
+
+ | Feature | Package | Install Command |
+ |---------|---------|-----------------|
+ | Image Generation (SDXL) | diffusers | `pip install diffusers` |
+ | Image Generation (DALL-E) | openai | `pip install openai` |
+ | Text-to-Speech | pyttsx3, gtts, pygame | `pip install pyttsx3 gtts pygame` |
+ | Claude Integration | anthropic | `pip install anthropic` |
+ | Vector Memory | chromadb, sentence-transformers | `pip install chromadb sentence-transformers` |
+ | Plotting | matplotlib | `pip install matplotlib` |
+ | Browser Automation | playwright | `pip install playwright` |
+
+ **Note:** Python 3.11 is recommended; Python 3.13 has compatibility issues with diffusers.
+
+ ---
+
+ ## Configuration
+
+ ### Environment Variables
+
+ ```bash
+ # Claude API (for !idea, !claude commands)
+ export ANTHROPIC_API_KEY="sk-ant-..."
+
+ # OpenAI API (for !dalle command)
+ export OPENAI_API_KEY="sk-..."
+ ```
+
+ ### Config Class Options
+
+ Edit in `arc_engine_v21_multimedia.py`:
+
+ ```python
+ class Config:
+     # Generation
+     temperature = 0.85
+     top_p = 0.9
+     max_new_tokens = 512
+
+     # CF-HoT
+     use_cfhot = True
+     use_cfhot_125x = True
+     cfhot_repetition_threshold = 0.6
+     cfhot_repetition_penalty = 6.0
+
+     # Self-improvement
+     min_quality_score = 0.5
+     target_quality_score = 0.75
+     training_steps_per_iteration = 25
+     quality_drop_threshold = 0.1
+
+     # Book mode
+     book_mode = False
+     book_max_tokens = 16384
+
+     # API server
+     api_port = 8080
+ ```
+
+ ---
+
+ ## Repository Structure
+
+ ```
+ ARC-Base-8B-Condensed/
+ │
+ ├── arc_engine_v21_multimedia.py   # Main engine (6,861 lines)
+ ├── requirements.txt               # Full dependencies
+ ├── requirements_minimal.txt       # Core dependencies only
+ │
+ ├── training_scripts/
+ │   ├── the_condensator.py         # 4-stage dense training
+ │   ├── train_cfhot_head.py        # CF-HoT head training
+ │   ├── train_self_improve.py      # Self-improvement loop
+ │   └── quickstart.py              # One-command trainer
+ │
+ ├── dense_checkpoints/
+ │   ├── step_100/                  # Initial checkpoint
+ │   ├── step_200/                  # After iteration 1
+ │   └── step_300/                  # After iteration 2
+ │
+ ├── cfhot_checkpoints/
+ │   └── ckpt_5000/                 # 125× repetition head
+ │       └── risk_predictor.pt
+ │
+ ├── multi_head_checkpoints/
+ │   ├── hedging_head/
+ │   ├── verbosity_head/
+ │   └── sycophancy_head/
+ │
+ ├── paper/
+ │   └── arc_paper.pdf              # Research paper
+ │
+ ├── books/                         # Generated books output
+ ├── images/                        # Generated images output
+ ├── ideas/                         # Generated ideas output
+ ├── improvement_logs/              # RSI logs and results
+ └── exports/                       # Checkpoint packages
+ ```
+
+ ---
+
+ ## Hardware Requirements
+
+ | Component | Minimum | Recommended |
+ |-----------|---------|-------------|
+ | GPU VRAM | 16 GB | 24 GB |
+ | System RAM | 32 GB | 64 GB |
+ | Disk Space | 50 GB | 100 GB |
+ | Python | 3.10+ | 3.11 |
+
+ **Tested Configuration:** NVIDIA RTX 3090 (24 GB), 64 GB RAM, Ubuntu 22.04
+
+ **Inference Performance:**
+ - ~15 tokens/second with CF-HoT enabled
+ - ~20 tokens/second with CF-HoT disabled
+
+ ---
+
+ ## Training From Scratch
+
+ ### Quick Start (Automated)
+
+ ```bash
+ python training_scripts/quickstart.py --full
+ ```
+
+ This runs the complete pipeline (~4 hours on an RTX 3090):
+ 1. CF-HoT head training (5000 steps)
+ 2. CONDENSATOR dense training (3 epochs SFT + 300 RL steps)
+ 3. Self-improvement loop (5 iterations)
+
+ ### Manual Training
+
+ **Step 1: Train CF-HoT Heads**
+
+ ```bash
+ python training_scripts/train_cfhot_head.py \
+     --behavior repetition \
+     --steps 5000 \
+     --batch-size 16 \
+     --learning-rate 1e-4
+ ```
+
+ **Step 2: Run CONDENSATOR Pipeline**
+
+ ```bash
+ python training_scripts/the_condensator.py \
+     --sft-epochs 3 \
+     --dpo-epochs 1 \
+     --rl-steps 300 \
+     --checkpoint-every 25
+ ```
+
+ **Step 3: Self-Improvement Loop**
+
+ ```bash
+ python training_scripts/train_self_improve.py \
+     --iterations 5 \
+     --target-quality 0.75 \
+     --rollback-threshold 0.05
+ ```
+
+ ---
+
+ ## API Reference
+
+ Start the API server:
+
+ ```
+ > !api
+ [api] Server running on http://0.0.0.0:8080
+ ```
+
+ ### Endpoints
+
+ **POST /generate**
+
+ ```bash
+ curl -X POST http://localhost:8080/generate \
+     -H "Content-Type: application/json" \
+     -d '{"prompt": "What is recursion?"}'
+ ```
+
+ Response:
+ ```json
+ {
+   "response": "Function calling itself until base case. Stack frames accumulate, unwind on return.",
+   "quality": 0.82,
+   "tokens": 12
+ }
+ ```
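The same call can be made from Python with the standard library alone. A minimal client sketch (assumes the `!api` server is running locally on the default port; the helper names are illustrative):

```python
import json
from urllib import request

API_URL = "http://localhost:8080/generate"  # default !api port

def build_payload(prompt: str) -> bytes:
    """JSON body matching the curl example above."""
    return json.dumps({"prompt": prompt}).encode("utf-8")

def generate(prompt: str, url: str = API_URL) -> dict:
    """POST a prompt to the ARC Engine API and return the parsed response
    (a dict with "response", "quality", and "tokens" fields, per the
    example above)."""
    req = request.Request(
        url,
        data=build_payload(prompt),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))
```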
+
+ **POST /status**
+
+ ```bash
+ curl -X POST http://localhost:8080/status
+ ```
+
+ Response:
+ ```json
+ {
+   "quality": 0.71,
+   "iteration": 5,
+   "checkpoint": "dense_checkpoints/step_300"
+ }
+ ```
+
+ **GET /health**
+
+ ```bash
+ curl http://localhost:8080/health
+ ```
+
+ ---
+
+ ## Limitations
+
+ | Limitation | Description |
+ |------------|-------------|
+ | Scale | Tested on 8B parameters only; larger models may behave differently |
+ | Language | English only; other languages untested |
+ | Iterations | 5-15 stable iterations demonstrated; long-term stability unknown |
+ | Evaluation | Heuristic metrics without a formal human evaluation study |
+ | Scope | Bounded self-optimization within defined metrics; not open-ended self-improvement |
+ | SDXL | Requires Python 3.11 (incompatible with Python 3.13) |
+ | Memory | Full features require 24 GB VRAM; minimal mode works with 16 GB |
+
+ ---

  ## Citation

  ```bibtex
+ @software{napolitano2025arc,
+   title   = {ARC: Adaptive Recursive Cognition via Contrastive Hidden-State Control},
+   author  = {Napolitano, Logan Matthew},
+   year    = {2025},
+   url     = {https://huggingface.co/LoganResearch/ARC-Base-8B-Condensed},
+   license = {CC-BY-4.0}
  }
  ```

+ ---
+
+ ## References
+
+ 1. Zou, A., et al. (2023). Representation Engineering: A Top-Down Approach to AI Transparency. arXiv:2310.01405.
+ 2. Ouyang, L., et al. (2022). Training Language Models to Follow Instructions with Human Feedback. NeurIPS.
+ 3. Rafailov, R., et al. (2023). Direct Preference Optimization: Your Language Model is Secretly a Reward Model. arXiv:2305.18290.
+ 4. Hu, E. J., et al. (2021). LoRA: Low-Rank Adaptation of Large Language Models. arXiv:2106.09685.
+ 5. Dettmers, T., et al. (2023). QLoRA: Efficient Finetuning of Quantized LLMs. arXiv:2305.14314.
+
+ ---
+
+ ## Acknowledgments
+
+ - **NousResearch** for the Hermes-3-Llama-3.1-8B base model
+ - **Meta AI** for the Llama 3.1 architecture
+ - **Hugging Face** for transformers, PEFT, TRL, and Accelerate
+ - **Stability AI** for Stable Diffusion XL
+ - **Anthropic** for the Claude API
+
+ ---
+
  ## License

+ This project is licensed under **CC BY 4.0** (Creative Commons Attribution 4.0 International).
+
+ You are free to:
+ - **Share** — copy and redistribute the material in any medium or format
+ - **Adapt** — remix, transform, and build upon the material for any purpose, including commercially
+
+ Under the following terms:
+ - **Attribution** — You must give appropriate credit, provide a link to the license, and indicate if changes were made.
+
+ ---
+
+ *"Stable self-improvement through hidden-state control."*