ATLES Scratchpad System - Internal Thinking Workspace
Overview
The Scratchpad System gives ATLES an internal "thinking space" where it can draft, critique, and revise responses before sending them to the user. The aim is higher response quality, while the messy draft stages stay hidden from the user.
Key Features:
- Internal thinking (invisible to user)
- Multi-stage response generation: Draft → Critique → Revise → Send
- Self-critique capability
- Automatic archival for analysis and debugging
- Configurable thinking depth
Note: Unlike ATLAS (which has its own trainable model), ATLES uses external models like Qwen. The scratchpad here is purely for improving response quality, not for training data generation.
How It Works
User: "What is the capital of France?"
                  ↓
┌─────────────────────────────────────┐
│ INTERNAL THINKING (user can't see)  │
├─────────────────────────────────────┤
│ Stage 1: Draft                      │
│ "Paris is the capital."             │
│                                     │
│ Stage 2: Self-Critique              │
│ "Too brief, add context"            │
│                                     │
│ Stage 3: Revision                   │
│ "Paris is the capital and largest   │
│ city of France, located on the      │
│ Seine River..."                     │
│                                     │
│ Stage 4: Final Check                │
│ ✓ Complete ✓ Accurate ✓ Clear       │
└─────────────────────────────────────┘
                  ↓
User sees: "Paris is the capital and largest city of France..."
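The four stages above can be sketched as a small control loop. This is a minimal illustration, not the ATLES implementation: the model calls are passed in as plain callables (`draft`, `critique`, `revise`, `final_check` are hypothetical names), and the toy stand-ins at the bottom simply reproduce the France example.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class Critique:
    needs_revision: bool
    issues: List[str] = field(default_factory=list)


def think_then_answer(
    prompt: str,
    draft: Callable[[str], str],
    critique: Callable[[str, str], Critique],
    revise: Callable[[str, str, Critique], str],
    final_check: Callable[[str], bool],
    max_revisions: int = 2,
) -> Tuple[str, Dict[str, str]]:
    """Draft, then critique/revise up to max_revisions times, then run a final check.

    Returns (text shown to the user, hidden stage log).
    """
    stages: Dict[str, str] = {"initial": draft(prompt)}
    text = stages["initial"]
    for i in range(1, max_revisions + 1):
        review = critique(prompt, text)
        stages[f"critique_{i}"] = ", ".join(review.issues) or "ok"
        if not review.needs_revision:
            break  # the critique is satisfied; stop revising early
        text = revise(prompt, text, review)
        stages[f"revision_{i}"] = text
    stages["final_check"] = "passed" if final_check(text) else "failed"
    return text, stages


# Toy stand-ins for model calls (hypothetical), mirroring the diagram above.
def toy_critique(prompt: str, text: str) -> Critique:
    too_brief = len(text) < 40
    return Critique(needs_revision=too_brief, issues=["too brief"] if too_brief else [])


answer, log = think_then_answer(
    "What is the capital of France?",
    draft=lambda p: "Paris is the capital.",
    critique=toy_critique,
    revise=lambda p, t, c: "Paris is the capital and largest city of France.",
    final_check=lambda t: t.endswith("."),
)
print(answer)
```

Only `answer` would be returned to the user; `log` is the kind of material the scratchpad keeps internally.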
Quick Start
Using the Thinking Client
from atles import create_thinking_constitutional_client
# Create client with thinking enabled
client = create_thinking_constitutional_client()
# Generate response (with internal thinking)
response = client.generate("llama3.2", "Explain quantum computing")
# User only sees the polished final response
print(response)
# Check thinking stats
stats = client.get_thinking_stats()
print(f"Thoughts recorded: {stats['num_thoughts']}")
Configuration
Edit config/scratchpad_config.yaml:
scratchpad:
  enabled: true                # Enable/disable thinking
  mode: "every_response"       # Always think; alternatives: "complex_only", "manual"
  thinking:
    max_revisions: 2           # How many times to revise
    critique_enabled: true     # Enable self-critique
    self_check_enabled: true   # Final check before sending
Storage Structure
atles_memory/scratchpad/
├── active/
│   └── session_20251107_143000.jsonl   # Current session thoughts
├── archive/
│   └── 2025-11-07/
│       ├── session_001.jsonl           # Archived sessions
│       ├── key_thoughts.jsonl          # Important patterns
│       └── summary.txt                 # Human-readable summary
└── scratchpad.log                      # System logs
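The file names in the tree suggest a timestamp-based naming scheme. The helpers below are hypothetical and only reproduce the two patterns visible above (`session_YYYYMMDD_HHMMSS.jsonl` for active sessions, `YYYY-MM-DD/` for archive folders); the real system may build paths differently.

```python
from datetime import datetime
from pathlib import Path

# Hypothetical helpers; root path and naming patterns are inferred from the tree above.
SCRATCHPAD_ROOT = Path("atles_memory/scratchpad")


def active_session_path(started: datetime, root: Path = SCRATCHPAD_ROOT) -> Path:
    """JSONL file that collects the thoughts of a session started at `started`."""
    return root / "active" / f"session_{started:%Y%m%d_%H%M%S}.jsonl"


def archive_day_dir(day: datetime, root: Path = SCRATCHPAD_ROOT) -> Path:
    """Folder holding one day's archived sessions, key thoughts and summary."""
    return root / "archive" / f"{day:%Y-%m-%d}"


started = datetime(2025, 11, 7, 14, 30, 0)
print(active_session_path(started).as_posix())
print(archive_day_dir(started).as_posix())
```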
Thought Format
Each thought is stored as structured JSON:
{
  "timestamp": "2025-11-07T14:30:00",
  "user_input": "What is quantum computing?",
  "thought_stages": {
    "initial": {
      "timestamp": "2025-11-07T14:30:01",
      "data": {
        "text": "Quantum computing uses quantum bits...",
        "confidence": 0.7
      }
    },
    "critique": {
      "timestamp": "2025-11-07T14:30:02",
      "data": {
        "text": "Too technical, needs simpler explanation",
        "needs_revision": true,
        "issues": ["too technical", "no examples"]
      }
    },
    "revision_1": {
      "timestamp": "2025-11-07T14:30:03",
      "data": {
        "text": "Quantum computing is a new type of computing...",
        "improvements": ["simpler language", "added examples"]
      }
    },
    "final": {
      "timestamp": "2025-11-07T14:30:04",
      "data": {
        "text": "Quantum computing...",
        "ready": true
      }
    }
  },
  "is_key_thought": false,
  "metadata": {
    "response_time": 4.2,
    "num_stages": 4
  }
}
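Because each record is one line of a `.jsonl` file, it can be inspected with the standard `json` module. The sketch below assumes only the field names shown in the record above; `summarize_thought` is a hypothetical helper, not part of the ATLES API.

```python
import json
from typing import Any, Dict


def summarize_thought(line: str) -> Dict[str, Any]:
    """Pull the useful fields out of one JSONL thought record."""
    record = json.loads(line)
    stages = record["thought_stages"]
    final = stages.get("final", {}).get("data", {})
    return {
        "question": record["user_input"],
        "final_text": final.get("text"),
        "num_revisions": sum(1 for name in stages if name.startswith("revision_")),
        "needed_revision": stages.get("critique", {}).get("data", {}).get("needs_revision", False),
        "is_key_thought": record.get("is_key_thought", False),
    }


sample = json.dumps({
    "timestamp": "2025-11-07T14:30:00",
    "user_input": "What is quantum computing?",
    "thought_stages": {
        "critique": {"timestamp": "2025-11-07T14:30:02", "data": {"needs_revision": True}},
        "revision_1": {"timestamp": "2025-11-07T14:30:03", "data": {"text": "Quantum computing is..."}},
        "final": {"timestamp": "2025-11-07T14:30:04", "data": {"text": "Quantum computing...", "ready": True}},
    },
    "is_key_thought": False,
})
print(summarize_thought(sample))
```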
Key Thoughts
Not all thoughts are equally important. The system identifies "key thoughts":
Types of Key Thoughts
- Multiple Revisions - Response needed 2+ revisions (indicates complexity)
- User Correction - User corrected ATLES (important learning opportunity)
- Novel Solution - Creative/unexpected approach
- Error Recovery - ATLES caught and fixed its own error
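Two of these types can be detected mechanically from the stored record. The function below is a hypothetical sketch of that check, assuming the record layout from "Thought Format" and the reason string `"user_correction"` used by `mark_user_correction`; the other reason label is invented for illustration.

```python
from typing import Any, Dict, Optional


def key_thought_reason(record: Dict[str, Any], user_corrected: bool = False) -> Optional[str]:
    """Return why a thought counts as 'key', or None if it is routine.

    Only the two mechanically checkable types are covered; spotting a novel
    solution or a self-caught error needs judgment (e.g. from the critique
    stage) and is left out of this sketch.
    """
    if user_corrected:
        return "user_correction"
    stages = record.get("thought_stages", {})
    num_revisions = sum(1 for name in stages if name.startswith("revision_"))
    if num_revisions >= 2:
        return "multiple_revisions"  # hypothetical label
    return None
```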
Marking Key Thoughts
# Manually mark when user corrects ATLES
client.mark_user_correction("user_correction")
Daily Archival
The system automatically archives thoughts for analysis:
from atles import ScratchpadArchiver
archiver = ScratchpadArchiver()
# Archive yesterday's sessions
stats = archiver.archive_daily()
print(f"Archived {stats['sessions_archived']} sessions")
print(f"Found {stats['key_thoughts']} key thoughts")
# Get overall stats
stats = archiver.get_archive_stats()
print(f"Total dates: {stats['total_dates']}")
print(f"Total key thoughts: {stats['total_key_thoughts']}")
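For intuition about what daily archival involves, here is a minimal sketch using only the standard library. It is not `ScratchpadArchiver`'s actual logic: `archive_day` is a hypothetical function that moves one day's active session files into `archive/YYYY-MM-DD/` and appends flagged records to `key_thoughts.jsonl`. Unlike the tree above, it keeps the original session file names and does not write `summary.txt`.

```python
import json
import shutil
from datetime import date
from pathlib import Path
from typing import Dict, List


def archive_day(day: date, root: Path) -> Dict[str, int]:
    """Move one day's active sessions into archive/YYYY-MM-DD/ and collect key thoughts."""
    active = root / "active"
    day_dir = root / "archive" / day.isoformat()
    day_dir.mkdir(parents=True, exist_ok=True)
    key_thoughts: List[dict] = []
    sessions = 0
    for path in sorted(active.glob(f"session_{day:%Y%m%d}_*.jsonl")):
        with path.open(encoding="utf-8") as fh:
            for line in fh:
                if line.strip():
                    record = json.loads(line)
                    if record.get("is_key_thought"):
                        key_thoughts.append(record)
        shutil.move(str(path), str(day_dir / path.name))
        sessions += 1
    with (day_dir / "key_thoughts.jsonl").open("a", encoding="utf-8") as out:
        for record in key_thoughts:
            out.write(json.dumps(record) + "\n")
    # Same stat keys as the archiver example above.
    return {"sessions_archived": sessions, "key_thoughts": len(key_thoughts)}
```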
Performance Impact
Response Time
- Without thinking: ~1-2 seconds
- With thinking: ~3-6 seconds
- Trade-off: slower, but higher-quality, responses
Storage
- Per thought: ~1-5 KB
- Per session: ~100-500 KB (100-1000 thoughts)
- 30-day archive: ~30-150 MB
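These figures multiply out from the per-thought size. The helper below is a back-of-the-envelope sketch (the function name and the sessions-per-day parameter are assumptions): about 100 thoughts at 1-5 KB each gives the 100-500 KB per-session range, and the 30-day total depends on how many such sessions run per day.

```python
def estimate_storage_kb(kb_per_thought: float, thoughts_per_session: int,
                        sessions_per_day: int = 1, days: int = 1) -> float:
    """Back-of-the-envelope storage estimate: each thought is one JSONL line on disk."""
    return kb_per_thought * thoughts_per_session * sessions_per_day * days


# ~100 thoughts at 1-5 KB each gives the per-session range.
low, high = estimate_storage_kb(1, 100), estimate_storage_kb(5, 100)
print(f"per session: {low:.0f}-{high:.0f} KB")

# Over 30 days, the total scales with how many such sessions run per day.
for sessions_per_day in (1, 10):
    lo_mb = estimate_storage_kb(1, 100, sessions_per_day, 30) / 1000
    hi_mb = estimate_storage_kb(5, 100, sessions_per_day, 30) / 1000
    print(f"{sessions_per_day} session(s)/day for 30 days: {lo_mb:.0f}-{hi_mb:.0f} MB")
```

With one session per day the 30-day archive comes to about 3-15 MB; the ~30-150 MB figure corresponds to roughly ten such sessions per day.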
Configuration Options
Thinking Modes
- every_response (recommended) - Always think before responding
- complex_only - Only for complex queries (faster for simple questions)
- manual - Only when explicitly triggered
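The three modes amount to a single gate in front of the thinking cycle. In this sketch, `should_think` and `looks_complex` are hypothetical, and the complexity test is a deliberately crude stand-in (word count plus a few cue words); how ATLES actually classifies "complex" queries is not specified here.

```python
COMPLEX_CUES = {"explain", "why", "how", "compare", "analyze"}


def looks_complex(prompt: str) -> bool:
    """Hypothetical heuristic: longer prompts or ones asking for an explanation."""
    words = [w.strip("?.,!").lower() for w in prompt.split()]
    return len(words) >= 6 or any(w in COMPLEX_CUES for w in words)


def should_think(mode: str, prompt: str, explicitly_requested: bool = False) -> bool:
    """Decide whether to run the draft/critique/revise cycle for this prompt."""
    if mode == "every_response":
        return True
    if mode == "complex_only":
        return looks_complex(prompt)
    if mode == "manual":
        return explicitly_requested
    raise ValueError(f"unknown thinking mode: {mode!r}")
```

Under `complex_only`, a greeting such as "Hi" skips thinking while "Explain how neural networks learn" goes through the full cycle.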
Thinking Depth
thinking:
  max_revisions: 2           # 0 = no revision, 1-3 = increasing quality
  critique_enabled: true     # Self-critique stage
  self_check_enabled: true   # Final check stage
  min_confidence: 0.8        # Skip revision if confidence > 0.8
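One plausible reading of how these settings combine when deciding on another revision pass is sketched below. The dataclass mirrors the YAML keys, but `should_revise` is hypothetical, and the choice that disabling critique also disables revision is an assumption, not documented behavior.

```python
from dataclasses import dataclass


@dataclass
class ThinkingConfig:
    # Field names and defaults mirror the YAML keys above.
    max_revisions: int = 2
    critique_enabled: bool = True
    self_check_enabled: bool = True
    min_confidence: float = 0.8


def should_revise(cfg: ThinkingConfig, confidence: float,
                  critique_wants_revision: bool, revisions_done: int) -> bool:
    """Decide whether to run one more revision pass."""
    if not cfg.critique_enabled:
        return False  # assumption: with no critique, nothing asks for a revision
    if revisions_done >= cfg.max_revisions:
        return False  # revision budget used up (max_revisions: 0 disables revision)
    if confidence > cfg.min_confidence:
        return False  # draft is already confident enough
    return critique_wants_revision
```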
Archival Settings
archival:
  frequency: "daily"           # Archive old sessions
  keep_days: 30                # Keep last 30 days
  extract_key_thoughts: true   # Extract important patterns
  create_summaries: true       # Human-readable summaries
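Retention with keep_days reduces to a date comparison on the `YYYY-MM-DD` archive folder names. This hypothetical helper only lists expired folders (deleting them, e.g. with `shutil.rmtree`, is left to the caller) and treats a folder exactly keep_days old as still kept; the real boundary rule may differ.

```python
from datetime import date, timedelta
from typing import Iterable, List


def expired_archive_dirs(dir_names: Iterable[str], today: date, keep_days: int = 30) -> List[str]:
    """Return archive folder names (YYYY-MM-DD) older than keep_days."""
    cutoff = today - timedelta(days=keep_days)
    expired = []
    for name in dir_names:
        try:
            day = date.fromisoformat(name)
        except ValueError:
            continue  # skip anything that isn't a YYYY-MM-DD folder
        if day < cutoff:
            expired.append(name)
    return sorted(expired)
```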
Use Cases
1. Improved Response Quality
ATLES can catch and fix errors before the user sees them.
2. Debugging
Review archived thoughts to understand why ATLES responded a certain way.
3. System Improvement
Analyze key thoughts to identify patterns and improvement areas.
4. User Feedback Analysis
Track user corrections to understand where ATLES needs improvement.
API Reference
ThinkingConstitutionalClient
from typing import Dict

class ThinkingConstitutionalClient(LightweightConstitutionalClient):
    def generate(self, model: str, prompt: str, **kwargs) -> str:
        """Generate a response with internal thinking."""
    def mark_user_correction(self, reason: str = "user_correction"):
        """Mark the last thought as key due to a user correction."""
    def get_thinking_stats(self) -> Dict:
        """Get statistics about the thinking process."""
Scratchpad
from typing import Dict, List

class Scratchpad:
    def __init__(self, session_dir, archive_dir): ...
    def start_thought(self, user_input: str): ...
    def write_thought(self, stage: str, data: Dict): ...
    def mark_key_thought(self, reason: str): ...
    def finalize_thought(self): ...
    def read_thoughts(self) -> List[Dict]: ...
    def get_key_thoughts(self) -> List[Dict]: ...
    def get_session_stats(self) -> Dict: ...
ScratchpadArchiver
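from pathlib import Path
from typing import Dict, List, Optional

class ScratchpadArchiver:
    def __init__(self, session_dir, archive_dir, keep_days): ...
    def archive_daily(self, date: Optional[str] = None) -> Dict: ...
    def extract_key_thoughts(self, date_dir: Path) -> List[Dict]: ...
    def get_archive_stats(self) -> Dict: ...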
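To show how the Scratchpad methods fit together, here is a minimal file-backed sketch of the same interface. It is not the ATLES source: the class is named `MiniScratchpad` to make that clear, it writes one JSONL line per finalized thought using the record layout from "Thought Format", and it assumes `mark_key_thought` applies to the thought in progress or, if none, to the last finalized one (matching how `mark_user_correction` is described).

```python
import json
from datetime import datetime
from pathlib import Path
from typing import Any, Dict, List, Optional


class MiniScratchpad:
    """File-backed sketch of the Scratchpad interface: one JSONL line per finished thought."""

    def __init__(self, session_dir, archive_dir):
        self.session_dir = Path(session_dir)
        self.archive_dir = Path(archive_dir)  # unused here; archiving is the archiver's job
        self.session_dir.mkdir(parents=True, exist_ok=True)
        stamp = datetime.now().strftime("%Y%m%d_%H%M%S")
        self.session_file = self.session_dir / f"session_{stamp}.jsonl"
        self._current: Optional[Dict[str, Any]] = None

    @staticmethod
    def _now() -> str:
        return datetime.now().isoformat(timespec="seconds")

    def start_thought(self, user_input: str) -> None:
        self._current = {"timestamp": self._now(), "user_input": user_input,
                         "thought_stages": {}, "is_key_thought": False, "metadata": {}}

    def write_thought(self, stage: str, data: Dict) -> None:
        if self._current is None:
            raise RuntimeError("call start_thought() first")
        self._current["thought_stages"][stage] = {"timestamp": self._now(), "data": data}

    def mark_key_thought(self, reason: str) -> None:
        """Flag the thought in progress or, if none, the last finalized one."""
        if self._current is not None:
            target, thoughts = self._current, None
        else:
            thoughts = self.read_thoughts()
            if not thoughts:
                return
            target = thoughts[-1]
        target["is_key_thought"] = True
        target.setdefault("metadata", {})["key_reason"] = reason
        if thoughts is not None:  # rewrite the session file with the updated record
            self.session_file.write_text(
                "".join(json.dumps(t) + "\n" for t in thoughts), encoding="utf-8")

    def finalize_thought(self) -> None:
        if self._current is None:
            return
        self._current["metadata"]["num_stages"] = len(self._current["thought_stages"])
        with self.session_file.open("a", encoding="utf-8") as fh:
            fh.write(json.dumps(self._current) + "\n")
        self._current = None

    def read_thoughts(self) -> List[Dict]:
        if not self.session_file.exists():
            return []
        with self.session_file.open(encoding="utf-8") as fh:
            return [json.loads(line) for line in fh if line.strip()]

    def get_key_thoughts(self) -> List[Dict]:
        return [t for t in self.read_thoughts() if t.get("is_key_thought")]

    def get_session_stats(self) -> Dict:
        thoughts = self.read_thoughts()
        return {"num_thoughts": len(thoughts),
                "num_key_thoughts": sum(1 for t in thoughts if t.get("is_key_thought"))}
```

A typical sequence is start_thought, one write_thought per stage, then finalize_thought; a later user correction can still flag that thought through mark_key_thought.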
Troubleshooting
Issue: Thinking is too slow
Solution:
- Set mode: "complex_only" to skip thinking for simple requests
- Reduce max_revisions from 2 to 1
- Disable critique_enabled for faster responses
Issue: Not seeing improvement in responses
Solution:
- Increase max_revisions to 2 or 3
- Ensure critique_enabled: true
- Check the logs (scratchpad.log) to verify that thinking is actually happening
Issue: Storage growing too large
Solution:
- Reduce keep_days from 30 to 7 or 14
- Enable compress_archives: true (future feature)
- Manually delete old archives
Examples
Example 1: Simple Question
User: "Hi"
Thinking: SKIPPED (too simple; applies with mode: "complex_only")
Response: "Hello! How can I help you today?"
Example 2: Complex Question
User: "Explain how neural networks learn"
Internal Thinking:
1. Draft: "Neural networks learn using backpropagation..."
2. Critique: "Too technical, needs simpler explanation"
3. Revision: "Neural networks learn by adjusting connections..."
4. Final: "Neural networks learn similar to how humans do..."
User Sees: "Neural networks learn similar to how humans do..."
Example 3: User Correction
User: "What's 2+2?"
ATLES: "5"
User: "No, it's 4"
# Mark as key thought
client.mark_user_correction("user_correction")
# This helps identify calculation errors for future improvements
Future Enhancements
- Adaptive Thinking - Automatically adjust thinking depth based on question complexity
- Learning from Corrections - Use user corrections to improve future responses
- Parallel Thinking - Generate multiple draft responses and choose the best
- Confidence Scoring - Better estimates of response quality
Summary
The Scratchpad System gives ATLES the ability to "think before it speaks", resulting in:
- ✓ Higher-quality responses
- ✓ Fewer errors
- ✓ Better user experience
- ✓ Improved debugging capabilities
- ✓ Data for system improvements
Trade-off: responses take longer (roughly 3-6 seconds instead of 1-2) in exchange for higher quality.