
ATLES Scratchpad System - Internal Thinking Workspace

Overview

The Scratchpad System gives ATLES an internal "thinking space" where it can draft, critique, and revise a response before sending it to the user. The goal is higher-quality responses: the user sees only the final text, never the intermediate drafts.

Key Features:

  • Internal thinking (invisible to user)
  • Multi-stage response generation: Draft → Critique → Revise → Send
  • Self-critique capability
  • Automatic archival for analysis and debugging
  • Configurable thinking depth

Note: Unlike ATLAS (which has its own trainable model), ATLES uses external models like Qwen. The scratchpad here is purely for improving response quality, not for training data generation.

How It Works

User: "What is the capital of France?"
    ↓
┌─────────────────────────────────────┐
│  INTERNAL THINKING (user can't see) │
├─────────────────────────────────────┤
│  Stage 1: Draft                     │
│  "Paris is the capital."            │
│                                     │
│  Stage 2: Self-Critique             │
│  "Too brief, add context"           │
│                                     │
│  Stage 3: Revision                  │
│  "Paris is the capital and largest  │
│   city of France, located on the    │
│   Seine River..."                   │
│                                     │
│  Stage 4: Final Check               │
│  ✓ Complete ✓ Accurate ✓ Clear      │
└─────────────────────────────────────┘
    ↓
User sees: "Paris is the capital and largest city of France..."
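
The flow above can be sketched as a small loop. This is a minimal illustration, not the ATLES implementation: `think_then_respond`, `fake_model`, the prompt wording, and the convention that the critic answers "OK" when no changes are needed are all assumptions made for the example.

```python
from typing import Callable, Dict, List

# Hypothetical stand-in for a model call: takes a prompt, returns text.
ModelFn = Callable[[str], str]


def think_then_respond(prompt: str, model: ModelFn, max_revisions: int = 2) -> Dict:
    """Run draft -> critique -> revise and return the final text plus all stages."""
    stages: List[Dict] = []

    draft = model(f"Draft an answer to: {prompt}")
    stages.append({"stage": "initial", "text": draft})

    current = draft
    for i in range(1, max_revisions + 1):
        critique = model(f"Critique this answer: {current}")
        stages.append({"stage": "critique", "text": critique})
        # Assumed convention: the critic replies "OK" when nothing needs fixing.
        if critique.strip().upper() == "OK":
            break
        current = model(f"Revise using this critique: {critique}\nAnswer: {current}")
        stages.append({"stage": f"revision_{i}", "text": current})

    stages.append({"stage": "final", "text": current})
    return {"final": current, "stages": stages}


def fake_model(prompt: str) -> str:
    """Canned replies so the sketch runs without a real model."""
    if prompt.startswith("Draft"):
        return "Paris is the capital."
    if prompt.startswith("Critique"):
        return "OK" if "Seine" in prompt else "Too brief, add context"
    return "Paris is the capital and largest city of France, located on the Seine River."


result = think_then_respond("What is the capital of France?", fake_model)
print(result["final"])  # only this text would be shown to the user
```

Only `result["final"]` would reach the user; `result["stages"]` is what a scratchpad would record.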

Quick Start

Using the Thinking Client

from atles import create_thinking_constitutional_client

# Create client with thinking enabled
client = create_thinking_constitutional_client()

# Generate response (with internal thinking)
response = client.generate("llama3.2", "Explain quantum computing")

# User only sees the polished final response
print(response)

# Check thinking stats
stats = client.get_thinking_stats()
print(f"Thoughts recorded: {stats['num_thoughts']}")

Configuration

Edit config/scratchpad_config.yaml:

scratchpad:
  enabled: true  # Enable/disable thinking
  mode: "every_response"  # "every_response" (always think), "complex_only", or "manual"
  
  thinking:
    max_revisions: 2  # How many times to revise
    critique_enabled: true  # Enable self-critique
    self_check_enabled: true  # Final check before sending

Storage Structure

atles_memory/scratchpad/
├── active/
│   └── session_20251107_143000.jsonl  # Current session thoughts
├── archive/
│   └── 2025-11-07/
│       ├── session_001.jsonl  # Archived sessions
│       ├── key_thoughts.jsonl  # Important patterns
│       └── summary.txt  # Human-readable summary
└── scratchpad.log  # System logs
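
As a rough illustration of the naming scheme in the tree above, the paths could be derived from a timestamp as follows. The helper names `session_path` and `archive_dir` are hypothetical and are not part of the ATLES API; only the directory and file name patterns come from the layout shown.

```python
from datetime import datetime
from pathlib import Path

# Root taken from the tree above; the helper names below are hypothetical.
SCRATCHPAD_ROOT = Path("atles_memory") / "scratchpad"


def session_path(started: datetime, root: Path = SCRATCHPAD_ROOT) -> Path:
    """Return active/session_YYYYMMDD_HHMMSS.jsonl for a session start time."""
    return root / "active" / started.strftime("session_%Y%m%d_%H%M%S.jsonl")


def archive_dir(day: datetime, root: Path = SCRATCHPAD_ROOT) -> Path:
    """Return archive/YYYY-MM-DD for the given day."""
    return root / "archive" / day.strftime("%Y-%m-%d")


start = datetime(2025, 11, 7, 14, 30, 0)
print(session_path(start).as_posix())
print(archive_dir(start).as_posix())
```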

Thought Format

Each thought is stored as structured JSON:

{
  "timestamp": "2025-11-07T14:30:00",
  "user_input": "What is quantum computing?",
  "thought_stages": {
    "initial": {
      "timestamp": "2025-11-07T14:30:01",
      "data": {
        "text": "Quantum computing uses quantum bits...",
        "confidence": 0.7
      }
    },
    "critique": {
      "timestamp": "2025-11-07T14:30:02",
      "data": {
        "text": "Too technical, needs simpler explanation",
        "needs_revision": true,
        "issues": ["too technical", "no examples"]
      }
    },
    "revision_1": {
      "timestamp": "2025-11-07T14:30:03",
      "data": {
        "text": "Quantum computing is a new type of computing...",
        "improvements": ["simpler language", "added examples"]
      }
    },
    "final": {
      "timestamp": "2025-11-07T14:30:04",
      "data": {
        "text": "Quantum computing...",
        "ready": true
      }
    }
  },
  "is_key_thought": false,
  "metadata": {
    "response_time": 4.2,
    "num_stages": 4
  }
}
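
To make the format concrete, here is a small sketch that builds a record with the same top-level fields and round-trips it through a JSONL file (one JSON object per line). The function names are illustrative assumptions and do not describe how the `Scratchpad` class is actually implemented.

```python
import json
import tempfile
from datetime import datetime
from pathlib import Path
from typing import Dict, List


def new_thought(user_input: str, now: datetime) -> Dict:
    """Create an empty thought record with the top-level fields shown above."""
    return {
        "timestamp": now.isoformat(timespec="seconds"),
        "user_input": user_input,
        "thought_stages": {},
        "is_key_thought": False,
        "metadata": {},
    }


def add_stage(thought: Dict, stage: str, data: Dict, now: datetime) -> None:
    """Attach one stage (e.g. "initial", "critique") with its own timestamp."""
    thought["thought_stages"][stage] = {
        "timestamp": now.isoformat(timespec="seconds"),
        "data": data,
    }
    thought["metadata"]["num_stages"] = len(thought["thought_stages"])


def append_jsonl(path: Path, thought: Dict) -> None:
    """Append the thought as a single JSON line."""
    with path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(thought) + "\n")


def read_jsonl(path: Path) -> List[Dict]:
    """Read every thought back, one JSON object per non-empty line."""
    with path.open(encoding="utf-8") as fh:
        return [json.loads(line) for line in fh if line.strip()]


t0 = datetime(2025, 11, 7, 14, 30, 0)
thought = new_thought("What is quantum computing?", t0)
add_stage(thought, "initial",
          {"text": "Quantum computing uses quantum bits...", "confidence": 0.7}, t0)

with tempfile.TemporaryDirectory() as tmp:
    session_file = Path(tmp) / "session.jsonl"
    append_jsonl(session_file, thought)
    loaded = read_jsonl(session_file)

print(loaded[0]["timestamp"], loaded[0]["metadata"]["num_stages"])
```

Appending one line per thought keeps writes cheap and lets a reader process a session file incrementally.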

Key Thoughts

Not all thoughts are equally important. The system identifies "key thoughts":

Types of Key Thoughts

  1. Multiple Revisions - Response needed 2+ revisions (indicates complexity)
  2. User Correction - User corrected ATLES (important learning opportunity)
  3. Novel Solution - Creative/unexpected approach
  4. Error Recovery - ATLES caught and fixed its own error
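
The first two criteria are mechanical enough to sketch in code. The example below is an assumption-laden illustration: it counts stages named `revision_N` in a thought record and takes the user-correction signal as a flag. The reason label `"multiple_revisions"` is hypothetical. Novel solutions and error recovery require judgment and are left out.

```python
from typing import Dict, List


def count_revisions(thought: Dict) -> int:
    """Count stages named revision_1, revision_2, ... in a thought record."""
    stages = thought.get("thought_stages", {})
    return sum(1 for name in stages if name.startswith("revision_"))


def key_thought_reasons(thought: Dict, user_corrected: bool = False) -> List[str]:
    """Return the reasons (possibly none) for treating a thought as key."""
    reasons = []
    if count_revisions(thought) >= 2:
        reasons.append("multiple_revisions")  # hypothetical label
    if user_corrected:
        reasons.append("user_correction")
    return reasons


example = {
    "thought_stages": {"initial": {}, "critique": {}, "revision_1": {}, "revision_2": {}, "final": {}}
}
print(key_thought_reasons(example))
print(key_thought_reasons({"thought_stages": {"initial": {}, "final": {}}}, user_corrected=True))
```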

Marking Key Thoughts

# Manually mark when user corrects ATLES
client.mark_user_correction("user_correction")

Daily Archival

The system automatically archives thoughts for analysis:

from atles import ScratchpadArchiver

archiver = ScratchpadArchiver()

# Archive yesterday's sessions
stats = archiver.archive_daily()
print(f"Archived {stats['sessions_archived']} sessions")
print(f"Found {stats['key_thoughts']} key thoughts")

# Get overall stats
stats = archiver.get_archive_stats()
print(f"Total dates: {stats['total_dates']}")
print(f"Total key thoughts: {stats['total_key_thoughts']}")

Performance Impact

Response Time

  • Without thinking: ~1-2 seconds
  • With thinking: ~3-6 seconds
  • Trade-off: Slower but higher quality responses

Storage

  • Per thought: ~1-5 KB
  • Per session: ~100-500 KB (roughly 100 thoughts at 1-5 KB each)
  • 30-day archive: ~30-150 MB

Configuration Options

Thinking Modes

  1. every_response (recommended) - Always think before responding
  2. complex_only - Only for complex queries (faster for simple questions)
  3. manual - Only when explicitly triggered
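
One way to picture the three modes is as a single dispatch function. This is a sketch only: `should_think` and `is_complex` are hypothetical names, and the word-count test is a placeholder heuristic chosen for the example, not the rule ATLES uses to decide what counts as complex.

```python
VALID_MODES = ("every_response", "complex_only", "manual")


def is_complex(prompt: str, min_words: int = 4) -> bool:
    """Placeholder heuristic (an assumption): prompts with few words are simple."""
    return len(prompt.split()) >= min_words


def should_think(mode: str, prompt: str, manual_trigger: bool = False) -> bool:
    """Decide whether to run the scratchpad stages for this prompt."""
    if mode not in VALID_MODES:
        raise ValueError(f"unknown scratchpad mode: {mode!r}")
    if mode == "every_response":
        return True
    if mode == "complex_only":
        return is_complex(prompt)
    return manual_trigger  # "manual": think only when explicitly requested


print(should_think("complex_only", "Hi"))
print(should_think("complex_only", "Explain how neural networks learn"))
```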

Thinking Depth

thinking:
  max_revisions: 2  # 0 = no revision, 1-3 = increasing quality
  critique_enabled: true  # Self-critique stage
  self_check_enabled: true  # Final check stage
  min_confidence: 0.8  # Skip revision if confidence > 0.8
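
A hedged sketch of how these settings might interact when deciding whether to run another revision pass. `ThinkingConfig` and `needs_revision` are hypothetical names, and the assumption that revisions are triggered only by the critique stage is ours rather than something the configuration states.

```python
from dataclasses import dataclass


@dataclass
class ThinkingConfig:
    """Mirrors the keys under `thinking:`; the class itself is hypothetical."""
    max_revisions: int = 2
    critique_enabled: bool = True
    self_check_enabled: bool = True
    min_confidence: float = 0.8


def needs_revision(cfg: ThinkingConfig, confidence: float,
                   revisions_done: int, critique_flagged: bool) -> bool:
    """Return True if another revision pass should run."""
    if revisions_done >= cfg.max_revisions:
        return False  # revision budget used up (0 disables revision entirely)
    if confidence > cfg.min_confidence:
        return False  # confident enough, skip revision
    # Assumption: revisions are driven by the critique stage, so with
    # critique disabled nothing can request one.
    return cfg.critique_enabled and critique_flagged


cfg = ThinkingConfig()
print(needs_revision(cfg, confidence=0.7, revisions_done=0, critique_flagged=True))
print(needs_revision(cfg, confidence=0.9, revisions_done=0, critique_flagged=True))
print(needs_revision(cfg, confidence=0.7, revisions_done=2, critique_flagged=True))
```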

Archival Settings

archival:
  frequency: "daily"  # Archive old sessions
  keep_days: 30  # Keep last 30 days
  extract_key_thoughts: true  # Extract important patterns
  create_summaries: true  # Human-readable summaries
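
To illustrate what `keep_days` implies for date-named archive directories, here is a minimal retention sketch. `prune_archive` is a hypothetical helper, not the `ScratchpadArchiver` implementation, and the choice to delete only directories strictly older than the cutoff is an assumption.

```python
import shutil
import tempfile
from datetime import date, datetime, timedelta
from pathlib import Path
from typing import List


def prune_archive(archive_root: Path, keep_days: int, today: date) -> List[str]:
    """Delete YYYY-MM-DD directories older than keep_days; return their names."""
    cutoff = today - timedelta(days=keep_days)
    removed = []
    for entry in sorted(archive_root.iterdir()):
        if not entry.is_dir():
            continue
        try:
            day = datetime.strptime(entry.name, "%Y-%m-%d").date()
        except ValueError:
            continue  # skip anything that is not a date-named directory
        if day < cutoff:
            shutil.rmtree(entry)
            removed.append(entry.name)
    return removed


# Demonstrate on a throwaway directory rather than a real archive.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    for name in ("2025-10-01", "2025-11-06", "notes"):
        (root / name).mkdir()
    removed = prune_archive(root, keep_days=30, today=date(2025, 11, 7))
    remaining = sorted(p.name for p in root.iterdir())

print(removed, remaining)
```

Passing `today` explicitly keeps the function deterministic and easy to test.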

Use Cases

1. Improved Response Quality

ATLES can catch and fix errors before the user sees them.

2. Debugging

Review archived thoughts to understand why ATLES responded a certain way.

3. System Improvement

Analyze key thoughts to identify patterns and improvement areas.

4. User Feedback Analysis

Track user corrections to understand where ATLES needs improvement.

API Reference

ThinkingConstitutionalClient

class ThinkingConstitutionalClient(LightweightConstitutionalClient):
    def generate(self, model: str, prompt: str, **kwargs) -> str:
        """Generate a response with internal thinking."""

    def mark_user_correction(self, reason: str = "user_correction"):
        """Mark the last thought as key because the user corrected it."""

    def get_thinking_stats(self) -> Dict:
        """Return statistics about the thinking process."""

Scratchpad

class Scratchpad:
    def __init__(self, session_dir, archive_dir): ...
    def start_thought(self, user_input: str): ...
    def write_thought(self, stage: str, data: Dict): ...
    def mark_key_thought(self, reason: str): ...
    def finalize_thought(self): ...
    def read_thoughts(self) -> List[Dict]: ...
    def get_key_thoughts(self) -> List[Dict]: ...
    def get_session_stats(self) -> Dict: ...

ScratchpadArchiver

class ScratchpadArchiver:
    def __init__(self, session_dir, archive_dir, keep_days): ...
    def archive_daily(self, date: Optional[str] = None) -> Dict: ...
    def extract_key_thoughts(self, date_dir: Path) -> List[Dict]: ...
    def get_archive_stats(self) -> Dict: ...

Troubleshooting

Issue: Thinking is too slow

Solution:

  • Set mode: "complex_only" to skip thinking for simple requests
  • Reduce max_revisions from 2 to 1
  • Disable critique_enabled for faster responses

Issue: Not seeing improvement in responses

Solution:

  • Increase max_revisions to 2 or 3
  • Ensure critique_enabled: true
  • Check logs to verify thinking is actually happening

Issue: Storage growing too large

Solution:

  • Reduce keep_days from 30 to 7 or 14
  • Enable compress_archives: true once it is available (planned feature, not yet implemented)
  • Manually delete old archives

Examples

Example 1: Simple Question

User: "Hi"
Thinking: SKIPPED (too simple)
Response: "Hello! How can I help you today?"

Example 2: Complex Question

User: "Explain how neural networks learn"

Internal Thinking:
1. Draft: "Neural networks learn using backpropagation..."
2. Critique: "Too technical, needs simpler explanation"
3. Revision: "Neural networks learn by adjusting connections..."
4. Final: "Neural networks learn similarly to how humans do..."

User Sees: "Neural networks learn similarly to how humans do..."

Example 3: User Correction

User: "What's 2+2?"
ATLES: "5"
User: "No, it's 4"

# Mark as key thought
client.mark_user_correction("user_correction")
# This helps identify calculation errors for future improvements

Future Enhancements

  1. Adaptive Thinking - Automatically adjust thinking depth based on question complexity
  2. Learning from Corrections - Use user corrections to improve future responses
  3. Parallel Thinking - Generate multiple draft responses and choose the best
  4. Confidence Scoring - Better estimate of response quality

Summary

The Scratchpad System gives ATLES the ability to "think before it speaks", resulting in:

  • ✅ Higher quality responses
  • ✅ Fewer errors
  • ✅ Better user experience
  • ✅ Improved debugging capabilities
  • ✅ Data for system improvements

Trade-off: Responses take roughly 3-6 seconds instead of 1-2 seconds, in exchange for higher quality.