	# ATLES Scratchpad System - Internal Thinking Workspace

	## Overview

	The Scratchpad System gives ATLES an internal "thinking space" where it can draft, critique, and revise a response before sending it to the user. Only the final version is returned, so response quality improves without the user ever seeing the intermediate drafts.

	Key Features:
	- Internal thinking (invisible to user)
	- Multi-stage response generation: Draft → Critique → Revise → Send
	- Self-critique capability
	- Automatic archival for analysis and debugging
	- Configurable thinking depth

	Note: Unlike ATLAS (which has its own trainable model), ATLES uses external models like Qwen. The scratchpad here is purely for improving response quality, not for training data generation.

	## How It Works

	```
	User: "What is the capital of France?"
	                   ↓
	┌─────────────────────────────────────┐
	│ INTERNAL THINKING (user can't see)  │
	├─────────────────────────────────────┤
	│ Stage 1: Draft                      │
	│   "Paris is the capital."           │
	│                                     │
	│ Stage 2: Self-Critique              │
	│   "Too brief, add context"          │
	│                                     │
	│ Stage 3: Revision                   │
	│   "Paris is the capital and largest │
	│    city of France, located on the   │
	│    Seine River..."                  │
	│                                     │
	│ Stage 4: Final Check                │
	│   ✓ Complete ✓ Accurate ✓ Clear     │
	└─────────────────────────────────────┘
	                   ↓
	User sees: "Paris is the capital and largest city of France..."
	```
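
The four stages above can be sketched as a small loop. This is a minimal illustration, not the ATLES implementation: `think_then_respond`, `Critique`, and the three callables are hypothetical names standing in for calls to the underlying model, and the final check is reduced to returning the last text.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Critique:
    """Result of the self-critique stage (hypothetical structure)."""
    needs_revision: bool
    issues: List[str] = field(default_factory=list)


def think_then_respond(
    prompt: str,
    draft_fn: Callable[[str], str],
    critique_fn: Callable[[str, str], Critique],
    revise_fn: Callable[[str, str, Critique], str],
    max_revisions: int = 2,
) -> str:
    """Draft, critique, and revise internally; return only the final text."""
    text = draft_fn(prompt)                       # Stage 1: draft
    for _ in range(max_revisions):
        critique = critique_fn(prompt, text)      # Stage 2: self-critique
        if not critique.needs_revision:
            break
        text = revise_fn(prompt, text, critique)  # Stage 3: revision
    return text                                   # Stage 4: only this reaches the user


# Toy stand-ins for model calls, used only to exercise the loop.
def toy_draft(prompt: str) -> str:
    return "Paris is the capital."


def toy_critique(prompt: str, text: str) -> Critique:
    too_brief = len(text.split()) < 6
    return Critique(needs_revision=too_brief, issues=["too brief"] if too_brief else [])


def toy_revise(prompt: str, text: str, critique: Critique) -> str:
    return "Paris is the capital and largest city of France."


answer = think_then_respond(
    "What is the capital of France?", toy_draft, toy_critique, toy_revise
)
print(answer)
```

Passing the model calls in as functions keeps the loop independent of any particular backend, which is convenient when the model is external.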

	## Quick Start

	### Using the Thinking Client

	```python
	from atles import create_thinking_constitutional_client

	# Create client with thinking enabled
	client = create_thinking_constitutional_client()

	# Generate response (with internal thinking)
	response = client.generate("llama3.2", "Explain quantum computing")

	# User only sees the polished final response
	print(response)

	# Check thinking stats
	stats = client.get_thinking_stats()
	print(f"Thoughts recorded: {stats['num_thoughts']}")
	```

	### Configuration

	Edit `config/scratchpad_config.yaml`:

	```yaml
	scratchpad:
	  enabled: true                 # Enable/disable thinking
	  mode: "every_response"        # "every_response", "complex_only", or "manual"

	thinking:
	  max_revisions: 2              # How many times to revise
	  critique_enabled: true        # Enable self-critique
	  self_check_enabled: true      # Final check before sending
	```

	## Storage Structure

	```
	atles_memory/scratchpad/
	├── active/
	│   └── session_20251107_143000.jsonl   # Current session thoughts
	├── archive/
	│   └── 2025-11-07/
	│       ├── session_001.jsonl           # Archived sessions
	│       ├── key_thoughts.jsonl          # Important patterns
	│       └── summary.txt                 # Human-readable summary
	└── scratchpad.log                      # System logs
	```
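
A sketch of how paths in this layout could be derived with `pathlib`. The helper names are hypothetical, and the filename pattern is inferred from the example tree above rather than taken from the ATLES source.

```python
from datetime import datetime
from pathlib import Path

ROOT = Path("atles_memory") / "scratchpad"


def active_session_path(started: datetime, root: Path = ROOT) -> Path:
    """Path of the JSONL file for a session started at `started`."""
    return root / "active" / f"session_{started:%Y%m%d_%H%M%S}.jsonl"


def archive_dir_for(day: datetime, root: Path = ROOT) -> Path:
    """Directory holding archived sessions for one calendar day."""
    return root / "archive" / f"{day:%Y-%m-%d}"


start = datetime(2025, 11, 7, 14, 30, 0)
print(active_session_path(start).as_posix())
print(archive_dir_for(start).as_posix())
```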

	## Thought Format

	Each thought is stored as structured JSON:

	```json
	{
	  "timestamp": "2025-11-07T14:30:00",
	  "user_input": "What is quantum computing?",
	  "thought_stages": {
	    "initial": {
	      "timestamp": "2025-11-07T14:30:01",
	      "data": {
	        "text": "Quantum computing uses quantum bits...",
	        "confidence": 0.7
	      }
	    },
	    "critique": {
	      "timestamp": "2025-11-07T14:30:02",
	      "data": {
	        "text": "Too technical, needs simpler explanation",
	        "needs_revision": true,
	        "issues": ["too technical", "no examples"]
	      }
	    },
	    "revision_1": {
	      "timestamp": "2025-11-07T14:30:03",
	      "data": {
	        "text": "Quantum computing is a new type of computing...",
	        "improvements": ["simpler language", "added examples"]
	      }
	    },
	    "final": {
	      "timestamp": "2025-11-07T14:30:04",
	      "data": {
	        "text": "Quantum computing...",
	        "ready": true
	      }
	    }
	  },
	  "is_key_thought": false,
	  "metadata": {
	    "response_time": 4.2,
	    "num_stages": 4
	  }
	}
	```
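
Assuming the `.jsonl` session files hold one such JSON object per line, a thought can be inspected with the standard library alone. The sketch below relies only on the field names shown above; `summarize_thought` is a hypothetical helper, not part of ATLES.

```python
import json
from typing import Any, Dict

# One line of a session file, abbreviated from the example above.
line = json.dumps({
    "timestamp": "2025-11-07T14:30:00",
    "user_input": "What is quantum computing?",
    "thought_stages": {
        "initial": {"data": {"text": "Quantum computing uses quantum bits...", "confidence": 0.7}},
        "critique": {"data": {"text": "Too technical", "needs_revision": True}},
        "revision_1": {"data": {"text": "Quantum computing is a new type of computing..."}},
        "final": {"data": {"text": "Quantum computing...", "ready": True}},
    },
    "is_key_thought": False,
})


def summarize_thought(raw: str) -> Dict[str, Any]:
    """Parse one JSONL line and pull out the fields most useful for debugging."""
    thought = json.loads(raw)
    stages = thought["thought_stages"]
    return {
        "user_input": thought["user_input"],
        "stages": list(stages),
        "num_revisions": sum(1 for name in stages if name.startswith("revision_")),
        "final_text": stages.get("final", {}).get("data", {}).get("text"),
    }


summary = summarize_thought(line)
print(summary["stages"])
print(summary["num_revisions"], summary["final_text"])
```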

	## Key Thoughts

	Not all thoughts are equally important. The system identifies "key thoughts":

	### Types of Key Thoughts

	1. Multiple Revisions - Response needed 2+ revisions (indicates complexity)
	2. User Correction - User corrected ATLES (important learning opportunity)
	3. Novel Solution - Creative/unexpected approach
	4. Error Recovery - ATLES caught and fixed its own error
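
The first two criteria lend themselves to a mechanical check, sketched below. `key_thought_reasons` and the `"multiple_revisions"` label are hypothetical (only `"user_correction"` appears in this document), and detecting novel solutions or error recovery would need additional signals, so those are left out.

```python
from typing import Any, Dict, List


def key_thought_reasons(thought: Dict[str, Any], user_corrected: bool = False) -> List[str]:
    """Return the reasons, if any, for treating a thought as a key thought."""
    reasons: List[str] = []
    stages = thought.get("thought_stages", {})
    num_revisions = sum(1 for name in stages if name.startswith("revision_"))
    if num_revisions >= 2:       # criterion 1: response needed 2+ revisions
        reasons.append("multiple_revisions")
    if user_corrected:           # criterion 2: the user corrected ATLES
        reasons.append("user_correction")
    return reasons


revised_twice = {
    "thought_stages": {"initial": {}, "critique": {}, "revision_1": {}, "revision_2": {}, "final": {}}
}
print(key_thought_reasons(revised_twice))
print(key_thought_reasons({"thought_stages": {"initial": {}, "final": {}}}, user_corrected=True))
```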

	### Marking Key Thoughts

	```python
	# Manually mark when user corrects ATLES
	client.mark_user_correction("user_correction")
	```

	## Daily Archival

	The system automatically archives thoughts for analysis:

	```python
	from atles import ScratchpadArchiver

	archiver = ScratchpadArchiver()

	# Archive yesterday's sessions
	stats = archiver.archive_daily()
	print(f"Archived {stats['sessions_archived']} sessions")
	print(f"Found {stats['key_thoughts']} key thoughts")

	# Get overall stats
	stats = archiver.get_archive_stats()
	print(f"Total dates: {stats['total_dates']}")
	print(f"Total key thoughts: {stats['total_key_thoughts']}")
	```
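
Conceptually, daily archival moves finished session files out of `active/` into a dated folder under `archive/`. The sketch below illustrates that idea with the standard library only. It is not the `ScratchpadArchiver` implementation: the function name, the filename-based date matching, and the shape of the returned dictionary are assumptions.

```python
import shutil
import tempfile
from pathlib import Path
from typing import Dict


def archive_sessions_for(day: str, active_dir: Path, archive_dir: Path) -> Dict[str, int]:
    """Move active session files for `day` (YYYY-MM-DD) into archive/<day>/."""
    target = archive_dir / day
    target.mkdir(parents=True, exist_ok=True)
    compact = day.replace("-", "")  # 2025-11-07 -> 20251107, as used in filenames
    moved = 0
    for path in sorted(active_dir.glob(f"session_{compact}_*.jsonl")):
        shutil.move(str(path), str(target / path.name))
        moved += 1
    return {"sessions_archived": moved}


# Demonstration in a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    active = Path(tmp) / "active"
    archive = Path(tmp) / "archive"
    active.mkdir()
    (active / "session_20251107_143000.jsonl").write_text("{}\n")
    (active / "session_20251108_090000.jsonl").write_text("{}\n")
    demo_stats = archive_sessions_for("2025-11-07", active, archive)
    remaining = sorted(p.name for p in active.iterdir())
    archived = sorted(p.name for p in (archive / "2025-11-07").iterdir())
    print(demo_stats, remaining, archived)
```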

	## Performance Impact

	### Response Time
	- Without thinking: ~1-2 seconds
	- With thinking: ~3-6 seconds
	- Trade-off: Slower but higher quality responses

	### Storage
	- Per thought: ~1-5 KB
	- Per session: ~100 KB-5 MB (100-1000 thoughts at 1-5 KB each)
	- 30-day archive: ~30-150 MB

	## Configuration Options

	### Thinking Modes

	1. every_response (recommended) - Always think before responding
	2. complex_only - Only for complex queries (faster for simple questions)
	3. manual - Only when explicitly triggered
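
One way to picture the three modes is as a gate in front of the thinking pipeline. The sketch below is illustrative only: `should_think` is a hypothetical function, and the word-count heuristic for "complex" is a placeholder, since this document does not say how ATLES judges complexity.

```python
def looks_complex(prompt: str, min_words: int = 4) -> bool:
    """Placeholder heuristic (assumption): longer prompts count as complex."""
    return len(prompt.split()) >= min_words


def should_think(mode: str, prompt: str, manually_triggered: bool = False) -> bool:
    """Decide whether to run the scratchpad stages for this prompt."""
    if mode == "every_response":
        return True
    if mode == "complex_only":
        return looks_complex(prompt)
    if mode == "manual":
        return manually_triggered
    raise ValueError(f"unknown scratchpad mode: {mode!r}")


print(should_think("every_response", "Hi"))
print(should_think("complex_only", "Hi"))
print(should_think("complex_only", "Explain how neural networks learn"))
print(should_think("manual", "Hi", manually_triggered=True))
```

Raising on an unknown mode, rather than silently falling back, makes a typo in the config file visible immediately.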

	### Thinking Depth

	```yaml
	thinking:
	  max_revisions: 2              # 0 = no revision, 1-3 = increasing quality
	  critique_enabled: true        # Self-critique stage
	  self_check_enabled: true      # Final check stage
	  min_confidence: 0.8           # Skip revision if confidence > 0.8
	```
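
Read together, these settings suggest a simple rule for how many revision passes to allow. The following sketch encodes one plausible reading; `ThinkingConfig` and `revision_budget` are hypothetical names, and the exact interaction between `critique_enabled` and `min_confidence` in ATLES may differ.

```python
from dataclasses import dataclass


@dataclass
class ThinkingConfig:
    """Mirror of the `thinking:` block above."""
    max_revisions: int = 2
    critique_enabled: bool = True
    self_check_enabled: bool = True
    min_confidence: float = 0.8


def revision_budget(config: ThinkingConfig, draft_confidence: float) -> int:
    """Number of revision passes to allow for a draft with the given confidence."""
    if not config.critique_enabled:
        return 0  # assumption: without a critique, nothing prompts a revision
    if draft_confidence > config.min_confidence:
        return 0  # confident enough, so revision is skipped
    return config.max_revisions


cfg = ThinkingConfig()
print(revision_budget(cfg, draft_confidence=0.7))
print(revision_budget(cfg, draft_confidence=0.9))
print(revision_budget(ThinkingConfig(critique_enabled=False), draft_confidence=0.5))
```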

	### Archival Settings

	```yaml
	archival:
	  frequency: "daily"            # Archive old sessions
	  keep_days: 30                 # Keep last 30 days
	  extract_key_thoughts: true    # Extract important patterns
	  create_summaries: true        # Human-readable summaries
	```
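
The `keep_days` setting implies a retention sweep over the dated archive folders. A minimal sketch of such a sweep follows, assuming folders are named `YYYY-MM-DD` as in the storage layout. `expired_archive_dirs` is a hypothetical helper, and it only lists candidates rather than deleting them.

```python
import tempfile
from datetime import date, timedelta
from pathlib import Path
from typing import List


def expired_archive_dirs(archive_dir: Path, keep_days: int, today: date) -> List[Path]:
    """Return dated archive folders older than `keep_days` days before `today`."""
    cutoff = today - timedelta(days=keep_days)
    expired: List[Path] = []
    for child in sorted(archive_dir.iterdir()):
        if not child.is_dir():
            continue
        try:
            folder_date = date.fromisoformat(child.name)
        except ValueError:
            continue  # ignore folders whose names are not dates
        if folder_date < cutoff:
            expired.append(child)
    return expired


# Demonstration in a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    archive_root = Path(tmp)
    for name in ("2025-09-01", "2025-11-01", "2025-11-07", "notes"):
        (archive_root / name).mkdir()
    old = expired_archive_dirs(archive_root, keep_days=30, today=date(2025, 11, 8))
    expired_names = [p.name for p in old]
    print(expired_names)
```

Separating "find what is expired" from "delete it" makes the sweep easy to dry-run before removing anything.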

	## Use Cases

	### 1. Improved Response Quality
	ATLES can catch and fix errors before the user sees them.

	### 2. Debugging
	Review archived thoughts to understand why ATLES responded a certain way.

	### 3. System Improvement
	Analyze key thoughts to identify patterns and improvement areas.

	### 4. User Feedback Analysis
	Track user corrections to understand where ATLES needs improvement.

	## API Reference

	### ThinkingConstitutionalClient

	```python
	class ThinkingConstitutionalClient(LightweightConstitutionalClient):
	    def generate(self, model: str, prompt: str, **kwargs) -> str:
	        """Generate a response with internal thinking."""

	    def mark_user_correction(self, reason: str = "user_correction"):
	        """Mark the last thought as key due to a user correction."""

	    def get_thinking_stats(self) -> Dict:
	        """Get statistics about the thinking process."""
	```

	### Scratchpad

	```python
	class Scratchpad:
	    def __init__(self, session_dir, archive_dir): ...
	    def start_thought(self, user_input: str): ...
	    def write_thought(self, stage: str, data: Dict): ...
	    def mark_key_thought(self, reason: str): ...
	    def finalize_thought(self): ...
	    def read_thoughts(self) -> List[Dict]: ...
	    def get_key_thoughts(self) -> List[Dict]: ...
	    def get_session_stats(self) -> Dict: ...
	```

	### ScratchpadArchiver

	```python
	class ScratchpadArchiver:
	    def __init__(self, session_dir, archive_dir, keep_days): ...
	    def archive_daily(self, date: Optional[str] = None) -> Dict: ...
	    def extract_key_thoughts(self, date_dir: Path) -> List[Dict]: ...
	    def get_archive_stats(self) -> Dict: ...
	```

	## Troubleshooting

	### Issue: Thinking is too slow

	Solution:
	- Set `mode: "complex_only"` to skip thinking for simple requests
	- Reduce `max_revisions` from 2 to 1
	- Disable `critique_enabled` for faster responses

	### Issue: Not seeing improvement in responses

	Solution:
	- Increase `max_revisions` to 2 or 3
	- Ensure `critique_enabled: true`
	- Check logs to verify thinking is actually happening

	### Issue: Storage growing too large

	Solution:
	- Reduce `keep_days` from 30 to 7 or 14
	- Enable `compress_archives: true` once it is available (planned feature, not yet implemented)
	- Manually delete old archives

	## Examples

	### Example 1: Simple Question
	```
	User: "Hi"
	Thinking: SKIPPED (too simple; applies when mode is "complex_only")
	Response: "Hello! How can I help you today?"
	```

	### Example 2: Complex Question
	```
	User: "Explain how neural networks learn"

	Internal Thinking:
	1. Draft: "Neural networks learn using backpropagation..."
	2. Critique: "Too technical, needs simpler explanation"
	3. Revision: "Neural networks learn by adjusting connections..."
	4. Final: "Neural networks learn similar to how humans do..."

	User Sees: "Neural networks learn similar to how humans do..."
	```

	### Example 3: User Correction
	```
	User: "What's 2+2?"
	ATLES: "5"
	User: "No, it's 4"

	# Mark as key thought
	client.mark_user_correction("user_correction")
	# This helps identify calculation errors for future improvements
	```

	## Future Enhancements

	1. Adaptive Thinking - Automatically adjust thinking depth based on question complexity
	2. Learning from Corrections - Use user corrections to improve future responses
	3. Parallel Thinking - Generate multiple draft responses and choose the best
	4. Confidence Scoring - Better estimate of response quality

	## Summary

	The Scratchpad System gives ATLES the ability to "think before it speaks", resulting in:
	- ✅ Higher quality responses
	- ✅ Fewer errors
	- ✅ Better user experience
	- ✅ Improved debugging capabilities
	- ✅ Data for system improvements

	Trade-off: responses take roughly 3-6 seconds instead of 1-2 seconds, in exchange for higher quality.