# ATLES Scratchpad System - Internal Thinking Workspace

## Overview

The Scratchpad System gives ATLES an internal "thinking space" where it can draft, critique, and revise a response **before** sending it to the user. The goal is higher response quality: the user sees only the final, polished answer, never the intermediate drafts.

**Key Features:**
- Internal thinking (invisible to user)
- Multi-stage response generation: Draft → Critique → Revise → Send
- Self-critique capability
- Automatic archival for analysis and debugging
- Configurable thinking depth

**Note:** Unlike ATLAS (which has its own trainable model), ATLES uses external models like Qwen. The scratchpad here is purely for improving response quality, not for training data generation.

## How It Works

```
User: "What is the capital of France?"
    ↓
┌─────────────────────────────────────┐
│  INTERNAL THINKING (user can't see) │
├─────────────────────────────────────┤
│  Stage 1: Draft                     │
│  "Paris is the capital."            │
│                                     │
│  Stage 2: Self-Critique             │
│  "Too brief, add context"           │
│                                     │
│  Stage 3: Revision                  │
│  "Paris is the capital and largest  │
│   city of France, located on the    │
│   Seine River..."                   │
│                                     │
│  Stage 4: Final Check               │
│  ✓ Complete ✓ Accurate ✓ Clear     │
└─────────────────────────────────────┘
    ↓
User sees: "Paris is the capital and largest city of France..."
```
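
The flow in the diagram can be sketched as a small loop. This is a minimal illustration under stated assumptions, not the ATLES implementation: `think_then_respond`, the `Critique` type, and the three stand-in functions are hypothetical names, and the stand-ins replace real model calls.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Critique:
    """Result of a self-critique pass (hypothetical structure)."""
    needs_revision: bool
    issues: List[str] = field(default_factory=list)


def think_then_respond(
    prompt: str,
    draft_fn: Callable[[str], str],
    critique_fn: Callable[[str, str], Critique],
    revise_fn: Callable[[str, str, Critique], str],
    max_revisions: int = 2,
) -> str:
    """Draft, critique, and revise; return only the final text."""
    text = draft_fn(prompt)                       # Stage 1: draft
    for _ in range(max_revisions):
        critique = critique_fn(prompt, text)      # Stage 2: self-critique
        if not critique.needs_revision:
            break
        text = revise_fn(prompt, text, critique)  # Stage 3: revision
    return text  # Stage 4 (final check) is left out of this sketch


# Stand-ins for model calls, for illustration only.
def fake_draft(prompt: str) -> str:
    return "Paris is the capital."


def fake_critique(prompt: str, text: str) -> Critique:
    too_brief = len(text.split()) < 6
    return Critique(needs_revision=too_brief, issues=["too brief"] if too_brief else [])


def fake_revise(prompt: str, text: str, critique: Critique) -> str:
    return "Paris is the capital and largest city of France."


answer = think_then_respond(
    "What is the capital of France?", fake_draft, fake_critique, fake_revise
)
print(answer)
```

Only the return value reaches the user; the draft and critique stay inside the function, which is the property the scratchpad relies on.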

## Quick Start

### Using the Thinking Client

```python
from atles import create_thinking_constitutional_client

# Create client with thinking enabled
client = create_thinking_constitutional_client()

# Generate response (with internal thinking)
response = client.generate("llama3.2", "Explain quantum computing")

# User only sees the polished final response
print(response)

# Check thinking stats
stats = client.get_thinking_stats()
print(f"Thoughts recorded: {stats['num_thoughts']}")
```

### Configuration

Edit `config/scratchpad_config.yaml`:

```yaml
scratchpad:
  enabled: true  # Enable/disable thinking
  mode: "every_response"  # Always think; alternatives: "complex_only", "manual"
  
  thinking:
    max_revisions: 2  # How many times to revise
    critique_enabled: true  # Enable self-critique
    self_check_enabled: true  # Final check before sending
```

## Storage Structure

```
atles_memory/scratchpad/
├── active/
│   └── session_20251107_143000.jsonl  # Current session thoughts
├── archive/
│   └── 2025-11-07/
│       ├── session_001.jsonl  # Archived sessions
│       ├── key_thoughts.jsonl  # Important patterns
│       └── summary.txt  # Human-readable summary
└── scratchpad.log  # System logs
```
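
To make the layout concrete, here is a small sketch of how a session file under `active/` could be named and appended to, one JSON object per line. The helper names (`session_path`, `append_thought`, `read_session`) are assumptions for illustration and are not part of the ATLES API.

```python
import json
import tempfile
from datetime import datetime
from pathlib import Path


def session_path(root: Path, started: datetime) -> Path:
    """Build active/session_YYYYMMDD_HHMMSS.jsonl under the scratchpad root."""
    return root / "active" / f"session_{started:%Y%m%d_%H%M%S}.jsonl"


def append_thought(path: Path, thought: dict) -> None:
    """Append one thought as a single JSON line, creating folders as needed."""
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(thought, ensure_ascii=False) + "\n")


def read_session(path: Path) -> list:
    """Read every thought back from a session file, skipping blank lines."""
    with path.open(encoding="utf-8") as fh:
        return [json.loads(line) for line in fh if line.strip()]


# Demo in a temporary directory so no real atles_memory/ folder is touched.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp) / "atles_memory" / "scratchpad"
    path = session_path(root, datetime(2025, 11, 7, 14, 30, 0))
    append_thought(path, {"user_input": "Hi", "is_key_thought": False})
    append_thought(path, {"user_input": "What is quantum computing?", "is_key_thought": False})
    session_name = path.name
    loaded = read_session(path)

print(session_name, len(loaded))
```

Appending line by line means a session file can be read back even if the process stops partway through a session.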

## Thought Format

Each thought is stored as structured JSON:

```json
{
  "timestamp": "2025-11-07T14:30:00",
  "user_input": "What is quantum computing?",
  "thought_stages": {
    "initial": {
      "timestamp": "2025-11-07T14:30:01",
      "data": {
        "text": "Quantum computing uses quantum bits...",
        "confidence": 0.7
      }
    },
    "critique": {
      "timestamp": "2025-11-07T14:30:02",
      "data": {
        "text": "Too technical, needs simpler explanation",
        "needs_revision": true,
        "issues": ["too technical", "no examples"]
      }
    },
    "revision_1": {
      "timestamp": "2025-11-07T14:30:03",
      "data": {
        "text": "Quantum computing is a new type of computing...",
        "improvements": ["simpler language", "added examples"]
      }
    },
    "final": {
      "timestamp": "2025-11-07T14:30:04",
      "data": {
        "text": "Quantum computing...",
        "ready": true
      }
    }
  },
  "is_key_thought": false,
  "metadata": {
    "response_time": 4.2,
    "num_stages": 4
  }
}
```
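
A record in this shape can be inspected with the standard `json` module. The sketch below assumes only the field names shown above; the helper functions are hypothetical and the record is abbreviated.

```python
import json
from typing import List

# Abbreviated copy of the record above (timestamps and metadata omitted).
record = json.loads("""
{
  "user_input": "What is quantum computing?",
  "thought_stages": {
    "initial":    {"data": {"text": "Quantum computing uses quantum bits...", "confidence": 0.7}},
    "critique":   {"data": {"text": "Too technical, needs simpler explanation",
                            "needs_revision": true,
                            "issues": ["too technical", "no examples"]}},
    "revision_1": {"data": {"text": "Quantum computing is a new type of computing...",
                            "improvements": ["simpler language", "added examples"]}},
    "final":      {"data": {"text": "Quantum computing...", "ready": true}}
  },
  "is_key_thought": false
}
""")


def final_text(thought: dict) -> str:
    """Return the text that would be shown to the user."""
    return thought["thought_stages"]["final"]["data"]["text"]


def revision_stages(thought: dict) -> List[str]:
    """Names of all revision stages, in the order they were recorded."""
    return [name for name in thought["thought_stages"] if name.startswith("revision_")]


def critique_issues(thought: dict) -> List[str]:
    """Issues raised by the critique stage, or an empty list if there was none."""
    critique = thought["thought_stages"].get("critique", {})
    return critique.get("data", {}).get("issues", [])


print(final_text(record))
print(revision_stages(record))
print(critique_issues(record))
```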

## Key Thoughts

Not all thoughts are equally important. The system identifies "key thoughts":

### Types of Key Thoughts

1. **Multiple Revisions** - Response needed 2+ revisions (indicates complexity)
2. **User Correction** - User corrected ATLES (important learning opportunity)
3. **Novel Solution** - Creative/unexpected approach
4. **Error Recovery** - ATLES caught and fixed its own error
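
Of these four types, only the first can be read directly off a stored thought; the others need an outside signal. The sketch below shows one way the "2+ revisions" rule and a user-correction flag might be combined. The function names and the `"multiple_revisions"` label are assumptions, not values confirmed by ATLES.

```python
from typing import Optional


def count_revisions(thought: dict) -> int:
    """Count stages named revision_1, revision_2, ... in a thought record."""
    stages = thought.get("thought_stages", {})
    return sum(1 for name in stages if name.startswith("revision_"))


def key_thought_reason(thought: dict, user_corrected: bool = False) -> Optional[str]:
    """Return a reason string if the thought looks like a key thought, else None."""
    if user_corrected:
        return "user_correction"
    if count_revisions(thought) >= 2:
        return "multiple_revisions"  # hypothetical label
    return None


simple = {"thought_stages": {"initial": {}, "final": {}}}
complex_one = {"thought_stages": {"initial": {}, "revision_1": {}, "revision_2": {}, "final": {}}}

print(key_thought_reason(simple))
print(key_thought_reason(complex_one))
print(key_thought_reason(simple, user_corrected=True))
```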

### Marking Key Thoughts

```python
# Manually mark when user corrects ATLES
client.mark_user_correction("user_correction")
```

## Daily Archival

The system archives thoughts daily for later analysis. The archiver can also be called directly to run an archival pass or inspect archive statistics:

```python
from atles import ScratchpadArchiver

archiver = ScratchpadArchiver()

# Archive yesterday's sessions
stats = archiver.archive_daily()
print(f"Archived {stats['sessions_archived']} sessions")
print(f"Found {stats['key_thoughts']} key thoughts")

# Get overall stats
stats = archiver.get_archive_stats()
print(f"Total dates: {stats['total_dates']}")
print(f"Total key thoughts: {stats['total_key_thoughts']}")
```

## Performance Impact

### Response Time
- Without thinking: ~1-2 seconds
- With thinking: ~3-6 seconds
- Trade-off: Slower but higher quality responses

### Storage
- Per thought: ~1-5 KB
- Per session: ~100-500 KB (about 100 thoughts at 1-5 KB each)
- 30-day archive: ~30-150 MB

## Configuration Options

### Thinking Modes

1. **every_response** (recommended) - Always think before responding
2. **complex_only** - Only for complex queries (faster for simple questions)
3. **manual** - Only when explicitly triggered
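
The three modes amount to a single decision made before each response. A minimal sketch follows, assuming a word-count heuristic for "complex"; how ATLES actually judges complexity is not documented here, so `looks_complex` and its threshold are placeholders.

```python
def looks_complex(prompt: str, min_words: int = 4) -> bool:
    """Placeholder heuristic: treat prompts with several words as complex."""
    return len(prompt.split()) >= min_words


def should_think(mode: str, prompt: str, manual_trigger: bool = False) -> bool:
    """Decide whether to run the scratchpad stages for this prompt."""
    if mode == "every_response":
        return True
    if mode == "complex_only":
        return looks_complex(prompt)
    if mode == "manual":
        return manual_trigger
    raise ValueError(f"unknown scratchpad mode: {mode!r}")


print(should_think("every_response", "Hi"))
print(should_think("complex_only", "Hi"))
print(should_think("complex_only", "Explain how neural networks learn"))
print(should_think("manual", "Explain how neural networks learn"))
```

Raising on an unknown mode, rather than silently falling back, makes a typo in the config file visible immediately.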

### Thinking Depth

```yaml
thinking:
  max_revisions: 2  # 0 = no revision, 1-3 = increasing quality
  critique_enabled: true  # Self-critique stage
  self_check_enabled: true  # Final check stage
  min_confidence: 0.8  # Skip revision if confidence > 0.8
```
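
One plausible reading of how these settings interact is sketched below: a revision runs only while the revision budget is not used up and the draft's confidence does not exceed `min_confidence`. The `ThinkingConfig` class and `should_revise` function are hypothetical, and treating `critique_enabled: false` as "never revise" is an assumption.

```python
from dataclasses import dataclass


@dataclass
class ThinkingConfig:
    """Mirror of the `thinking:` keys above (hypothetical container)."""
    max_revisions: int = 2
    critique_enabled: bool = True
    self_check_enabled: bool = True
    min_confidence: float = 0.8


def should_revise(cfg: ThinkingConfig, confidence: float, revisions_done: int) -> bool:
    """Return True if another revision pass should run."""
    if not cfg.critique_enabled:
        return False  # assumption: no critique means nothing to revise against
    if revisions_done >= cfg.max_revisions:
        return False  # revision budget exhausted
    return confidence <= cfg.min_confidence  # skip when confidence > min_confidence


cfg = ThinkingConfig()
print(should_revise(cfg, confidence=0.7, revisions_done=0))
print(should_revise(cfg, confidence=0.9, revisions_done=0))
print(should_revise(cfg, confidence=0.7, revisions_done=2))
```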

### Archival Settings

```yaml
archival:
  frequency: "daily"  # Archive old sessions
  keep_days: 30  # Keep last 30 days
  extract_key_thoughts: true  # Extract important patterns
  create_summaries: true  # Human-readable summaries
```
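
The effect of `keep_days` can be illustrated with a short retention sketch over date-named archive folders. This is not the `ScratchpadArchiver` implementation: `expired_archive_dirs` is a hypothetical helper, and keeping the folder that falls exactly on the cutoff date is an assumption.

```python
import shutil
import tempfile
from datetime import date, timedelta
from pathlib import Path
from typing import List


def expired_archive_dirs(archive_root: Path, today: date, keep_days: int = 30) -> List[Path]:
    """Return archive folders named YYYY-MM-DD that are older than keep_days."""
    cutoff = today - timedelta(days=keep_days)
    expired = []
    for child in sorted(archive_root.iterdir()):
        if not child.is_dir():
            continue
        try:
            day = date.fromisoformat(child.name)
        except ValueError:
            continue  # ignore folders that are not date-named
        if day < cutoff:
            expired.append(child)
    return expired


# Demo in a temporary directory with a fixed "today" so the result is stable.
with tempfile.TemporaryDirectory() as tmp:
    archive = Path(tmp) / "archive"
    for name in ("2025-10-01", "2025-11-01", "2025-11-07", "notes"):
        (archive / name).mkdir(parents=True)
    expired = [p.name for p in expired_archive_dirs(archive, date(2025, 11, 7), keep_days=30)]
    for name in expired:
        shutil.rmtree(archive / name)
    remaining = sorted(p.name for p in archive.iterdir())

print(expired)
print(remaining)
```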

## Use Cases

### 1. Improved Response Quality
ATLES can catch and fix errors before the user sees them.

### 2. Debugging
Review archived thoughts to understand why ATLES responded a certain way.

### 3. System Improvement
Analyze key thoughts to identify patterns and improvement areas.

### 4. User Feedback Analysis
Track user corrections to understand where ATLES needs improvement.

## API Reference

### ThinkingConstitutionalClient

```python
from typing import Dict

class ThinkingConstitutionalClient(LightweightConstitutionalClient):
    def generate(self, model: str, prompt: str, **kwargs) -> str:
        """Generate response with internal thinking"""

    def mark_user_correction(self, reason: str = "user_correction"):
        """Mark last thought as key due to user correction"""

    def get_thinking_stats(self) -> Dict:
        """Get statistics about thinking process"""
```

### Scratchpad

```python
from typing import Dict, List

class Scratchpad:
    def __init__(self, session_dir, archive_dir): ...
    def start_thought(self, user_input: str): ...
    def write_thought(self, stage: str, data: Dict): ...
    def mark_key_thought(self, reason: str): ...
    def finalize_thought(self): ...
    def read_thoughts(self) -> List[Dict]: ...
    def get_key_thoughts(self) -> List[Dict]: ...
    def get_session_stats(self) -> Dict: ...
```

### ScratchpadArchiver

```python
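from pathlib import Path
from typing import Dict, List, Optional

class ScratchpadArchiver:
    def __init__(self, session_dir, archive_dir, keep_days): ...
    def archive_daily(self, date: Optional[str] = None) -> Dict: ...
    def extract_key_thoughts(self, date_dir: Path) -> List[Dict]: ...
    def get_archive_stats(self) -> Dict: ...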
```

## Troubleshooting

### Issue: Thinking is too slow

**Solution:** 
- Set `mode: "complex_only"` to skip thinking for simple requests
- Reduce `max_revisions` from 2 to 1
- Disable `critique_enabled` for faster responses

### Issue: Not seeing improvement in responses

**Solution:**
- Increase `max_revisions` to 2 or 3
- Ensure `critique_enabled: true`
- Check logs to verify thinking is actually happening

### Issue: Storage growing too large

**Solution:**
- Reduce `keep_days` from 30 to 7 or 14
- Enable `compress_archives: true` once it is available (planned feature, not yet implemented)
- Manually delete old archives

## Examples

### Example 1: Simple Question
```
User: "Hi"
Thinking: SKIPPED (too simple; applies when mode is "complex_only")
Response: "Hello! How can I help you today?"
```

### Example 2: Complex Question
```
User: "Explain how neural networks learn"

Internal Thinking:
1. Draft: "Neural networks learn using backpropagation..."
2. Critique: "Too technical, needs simpler explanation"
3. Revision: "Neural networks learn by adjusting connections..."
4. Final: "Neural networks learn in a way similar to how humans do..."

User Sees: "Neural networks learn in a way similar to how humans do..."
```

### Example 3: User Correction
```
User: "What's 2+2?"
ATLES: "5"
User: "No, it's 4"

# Mark as key thought
client.mark_user_correction("user_correction")
# This helps identify calculation errors for future improvements
```

## Future Enhancements

1. **Adaptive Thinking** - Automatically adjust thinking depth based on question complexity
2. **Learning from Corrections** - Use user corrections to improve future responses
3. **Parallel Thinking** - Generate multiple draft responses and choose the best
4. **Confidence Scoring** - Better estimate of response quality

## Summary

The Scratchpad System gives ATLES the ability to "think before it speaks", resulting in:
- ✅ Higher quality responses
- ✅ Fewer errors
- ✅ Better user experience
- ✅ Improved debugging capabilities
- ✅ Data for system improvements

**Trade-off:** Responses are slower (roughly 3-6 seconds instead of 1-2) in exchange for higher quality.