| # ATLES Scratchpad System - Internal Thinking Workspace | |
| ## Overview | |
| The Scratchpad System gives ATLES an internal "thinking space" where it can draft, critique, and revise a response **before** sending it to the user. The goal is higher-quality responses, with the intermediate drafts never shown to the user. | |
| **Key Features:** | |
| - Internal thinking (invisible to user) | |
| - Multi-stage response generation: Draft → Critique → Revise → Send | |
| - Self-critique capability | |
| - Automatic archival for analysis and debugging | |
| - Configurable thinking depth | |
| **Note:** Unlike ATLAS (which has its own trainable model), ATLES uses external models like Qwen. The scratchpad here is purely for improving response quality, not for training data generation. | |
| ## How It Works | |
| ``` | |
| User: "What is the capital of France?" | |
| ↓ | |
| ┌─────────────────────────────────────┐ | |
| │ INTERNAL THINKING (user can't see)  │ | |
| ├─────────────────────────────────────┤ | |
| │ Stage 1: Draft                      │ | |
| │   "Paris is the capital."           │ | |
| │                                     │ | |
| │ Stage 2: Self-Critique              │ | |
| │   "Too brief, add context"          │ | |
| │                                     │ | |
| │ Stage 3: Revision                   │ | |
| │   "Paris is the capital and largest │ | |
| │    city of France, located on the   │ | |
| │    Seine River..."                  │ | |
| │                                     │ | |
| │ Stage 4: Final Check                │ | |
| │   ✓ Complete ✓ Accurate ✓ Clear     │ | |
| └─────────────────────────────────────┘ | |
| ↓ | |
| User sees: "Paris is the capital and largest city of France..." | |
| ``` | |
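The flow above can be sketched as a small loop. This is a minimal illustration, not the ATLES implementation: `think`, `Critique`, and the three callables (`draft_fn`, `critique_fn`, `revise_fn`) are hypothetical names standing in for calls to the underlying model, and the toy functions only mimic the Paris example.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Critique:
    """Result of the self-critique stage (hypothetical structure)."""
    needs_revision: bool
    issues: List[str] = field(default_factory=list)


def think(
    prompt: str,
    draft_fn: Callable[[str], str],
    critique_fn: Callable[[str, str], Critique],
    revise_fn: Callable[[str, str, List[str]], str],
    max_revisions: int = 2,
) -> str:
    """Draft a response, then critique and revise it up to max_revisions times."""
    response = draft_fn(prompt)
    for _ in range(max_revisions):
        critique = critique_fn(prompt, response)
        if not critique.needs_revision:
            break  # The current text passed the critique; stop revising.
        response = revise_fn(prompt, response, critique.issues)
    return response  # Only this final text is shown to the user.


# Toy stand-ins for the model calls, for illustration only.
def toy_draft(prompt: str) -> str:
    return "Paris is the capital."


def toy_critique(prompt: str, text: str) -> Critique:
    too_brief = len(text.split()) < 6
    return Critique(too_brief, ["too brief"] if too_brief else [])


def toy_revise(prompt: str, text: str, issues: List[str]) -> str:
    return "Paris is the capital and largest city of France."


final = think("What is the capital of France?", toy_draft, toy_critique, toy_revise)
print(final)
```

Passing the stages in as callables keeps the loop independent of any particular model client; with `max_revisions=0` the draft is returned unchanged.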
| ## Quick Start | |
| ### Using the Thinking Client | |
| ```python | |
| from atles import create_thinking_constitutional_client | |
| # Create client with thinking enabled | |
| client = create_thinking_constitutional_client() | |
| # Generate response (with internal thinking) | |
| response = client.generate("llama3.2", "Explain quantum computing") | |
| # User only sees the polished final response | |
| print(response) | |
| # Check thinking stats | |
| stats = client.get_thinking_stats() | |
| print(f"Thoughts recorded: {stats['num_thoughts']}") | |
| ``` | |
| ### Configuration | |
| Edit `config/scratchpad_config.yaml`: | |
| ```yaml | |
| scratchpad: | |
|   enabled: true                # Enable/disable thinking | |
|   mode: "every_response"       # "every_response", "complex_only", or "manual" | |
| thinking: | |
|   max_revisions: 2             # How many times to revise | |
|   critique_enabled: true       # Enable self-critique | |
|   self_check_enabled: true     # Final check before sending | |
| ``` | |
| ## Storage Structure | |
| ``` | |
| atles_memory/scratchpad/ | |
| ├── active/ | |
| │   └── session_20251107_143000.jsonl   # Current session thoughts | |
| ├── archive/ | |
| │   └── 2025-11-07/ | |
| │       ├── session_001.jsonl           # Archived sessions | |
| │       ├── key_thoughts.jsonl          # Important patterns | |
| │       └── summary.txt                 # Human-readable summary | |
| └── scratchpad.log                      # System logs | |
| ``` | |
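The naming pattern in the tree above can be reproduced with `pathlib`. This is a sketch under assumptions: `session_file` and `archive_dir` are hypothetical helpers, and the timestamp format is inferred from the single example file name `session_20251107_143000.jsonl`.

```python
from datetime import datetime
from pathlib import Path


def session_file(base_dir: Path, started: datetime) -> Path:
    """Path of the active session file for a session started at `started`."""
    return base_dir / "active" / f"session_{started:%Y%m%d_%H%M%S}.jsonl"


def archive_dir(base_dir: Path, day: datetime) -> Path:
    """Directory that holds archived sessions for the given day."""
    return base_dir / "archive" / f"{day:%Y-%m-%d}"


base = Path("atles_memory") / "scratchpad"
started = datetime(2025, 11, 7, 14, 30, 0)

print(session_file(base, started).as_posix())
print(archive_dir(base, started).as_posix())
```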
| ## Thought Format | |
| Each thought is stored as structured JSON: | |
| ```json | |
| { | |
|   "timestamp": "2025-11-07T14:30:00", | |
|   "user_input": "What is quantum computing?", | |
|   "thought_stages": { | |
|     "initial": { | |
|       "timestamp": "2025-11-07T14:30:01", | |
|       "data": { | |
|         "text": "Quantum computing uses quantum bits...", | |
|         "confidence": 0.7 | |
|       } | |
|     }, | |
|     "critique": { | |
|       "timestamp": "2025-11-07T14:30:02", | |
|       "data": { | |
|         "text": "Too technical, needs simpler explanation", | |
|         "needs_revision": true, | |
|         "issues": ["too technical", "no examples"] | |
|       } | |
|     }, | |
|     "revision_1": { | |
|       "timestamp": "2025-11-07T14:30:03", | |
|       "data": { | |
|         "text": "Quantum computing is a new type of computing...", | |
|         "improvements": ["simpler language", "added examples"] | |
|       } | |
|     }, | |
|     "final": { | |
|       "timestamp": "2025-11-07T14:30:04", | |
|       "data": { | |
|         "text": "Quantum computing...", | |
|         "ready": true | |
|       } | |
|     } | |
|   }, | |
|   "is_key_thought": false, | |
|   "metadata": { | |
|     "response_time": 4.2, | |
|     "num_stages": 4 | |
|   } | |
| } | |
| ``` | |
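For debugging, records in this format can be read back with the standard `json` module. The sketch below assumes one JSON object per line (as the `.jsonl` extension suggests); `parse_thoughts` and `stage_trail` are hypothetical helpers, not part of the ATLES API. Stages are ordered by their `timestamp` strings, which sort correctly as text because they share one ISO 8601 layout.

```python
import json
from typing import Dict, Iterable, List, Tuple


def parse_thoughts(lines: Iterable[str]) -> List[Dict]:
    """Parse JSONL input (one thought object per line), skipping blank lines."""
    return [json.loads(line) for line in lines if line.strip()]


def stage_trail(thought: Dict) -> List[Tuple[str, str]]:
    """Return (stage_name, text) pairs ordered by each stage's timestamp."""
    stages = thought.get("thought_stages", {})
    ordered = sorted(stages.items(), key=lambda item: item[1]["timestamp"])
    return [(name, stage["data"].get("text", "")) for name, stage in ordered]


# A shortened record in the format shown above, with stages out of order.
record = {
    "timestamp": "2025-11-07T14:30:00",
    "user_input": "What is quantum computing?",
    "thought_stages": {
        "final": {
            "timestamp": "2025-11-07T14:30:04",
            "data": {"text": "Quantum computing...", "ready": True},
        },
        "initial": {
            "timestamp": "2025-11-07T14:30:01",
            "data": {"text": "Quantum computing uses quantum bits...", "confidence": 0.7},
        },
    },
    "is_key_thought": False,
}

thoughts = parse_thoughts([json.dumps(record), ""])
for name, text in stage_trail(thoughts[0]):
    print(f"{name}: {text}")
```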
| ## Key Thoughts | |
| Not all thoughts are equally important. The system identifies "key thoughts": | |
| ### Types of Key Thoughts | |
| 1. **Multiple Revisions** - Response needed 2+ revisions (indicates complexity) | |
| 2. **User Correction** - User corrected ATLES (important learning opportunity) | |
| 3. **Novel Solution** - Creative/unexpected approach | |
| 4. **Error Recovery** - ATLES caught and fixed its own error | |
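Two of these four types can be detected mechanically from a stored thought. The sketch below is an assumption about how that might look, not ATLES's actual logic: `key_thought_reason` is a hypothetical helper, the `"multiple_revisions"` label is invented for illustration, and stage names are assumed to follow the `revision_N` pattern from the thought format. Novel solutions and error recovery need judgment that a simple rule does not capture, so they are left out.

```python
from typing import Dict, Optional


def key_thought_reason(thought: Dict, user_corrected: bool = False) -> Optional[str]:
    """Return a reason string if the thought qualifies as a key thought, else None."""
    if user_corrected:
        return "user_correction"
    stages = thought.get("thought_stages", {})
    revisions = sum(1 for name in stages if name.startswith("revision_"))
    if revisions >= 2:
        return "multiple_revisions"  # Hypothetical label for the 2+ revisions case.
    return None


twice_revised = {"thought_stages": {"initial": {}, "revision_1": {}, "revision_2": {}, "final": {}}}
once_revised = {"thought_stages": {"initial": {}, "revision_1": {}, "final": {}}}

print(key_thought_reason(twice_revised))
print(key_thought_reason(once_revised))
print(key_thought_reason(once_revised, user_corrected=True))
```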
| ### Marking Key Thoughts | |
| ```python | |
| # Manually mark when user corrects ATLES | |
| client.mark_user_correction("user_correction") | |
| ``` | |
| ## Daily Archival | |
| The system archives thoughts daily for analysis. To run the archiver directly: | |
| ```python | |
| from atles import ScratchpadArchiver | |
| archiver = ScratchpadArchiver() | |
| # Archive yesterday's sessions | |
| stats = archiver.archive_daily() | |
| print(f"Archived {stats['sessions_archived']} sessions") | |
| print(f"Found {stats['key_thoughts']} key thoughts") | |
| # Get overall stats | |
| stats = archiver.get_archive_stats() | |
| print(f"Total dates: {stats['total_dates']}") | |
| print(f"Total key thoughts: {stats['total_key_thoughts']}") | |
| ``` | |
| ## Performance Impact | |
| ### Response Time | |
| - Without thinking: ~1-2 seconds | |
| - With thinking: ~3-6 seconds | |
| - Trade-off: Slower but higher quality responses | |
| ### Storage | |
| - Per thought: ~1-5 KB | |
| - Per session: ~100-500 KB for roughly 100 thoughts (a 1000-thought session would be closer to 1-5 MB at the per-thought size above) | |
| - 30-day archive: ~30-150 MB | |
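The archive figure follows from simple multiplication. In the sketch below, `archive_size_mb` is a hypothetical helper, and the volume of 1000 thoughts per day is an assumption chosen because it reproduces the 30-150 MB range; the document does not state a daily volume. Sizes use 1 MB = 1000 KB.

```python
def archive_size_mb(kb_per_thought: float, thoughts_per_day: int, keep_days: int) -> float:
    """Estimated archive size in MB for a given retention window."""
    return kb_per_thought * thoughts_per_day * keep_days / 1000


# Assumed volume: 1000 thoughts per day, kept for 30 days.
low = archive_size_mb(1, 1000, 30)
high = archive_size_mb(5, 1000, 30)
print(f"30-day archive: {low:.0f}-{high:.0f} MB")

# Shortening retention to 7 days shrinks the estimate proportionally.
print(f"7-day archive: {archive_size_mb(1, 1000, 7):.0f}-{archive_size_mb(5, 1000, 7):.0f} MB")
```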
| ## Configuration Options | |
| ### Thinking Modes | |
| 1. **every_response** (recommended) - Always think before responding | |
| 2. **complex_only** - Only for complex queries (faster for simple questions) | |
| 3. **manual** - Only when explicitly triggered | |
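The three modes amount to a single decision per request. The following is a minimal sketch, assuming a word-count heuristic for "complex"; `should_think` and `is_complex` are hypothetical names, and the document does not say how ATLES actually judges complexity.

```python
def is_complex(prompt: str, min_words: int = 4) -> bool:
    """Hypothetical heuristic: treat very short prompts as simple."""
    return len(prompt.split()) >= min_words


def should_think(mode: str, prompt: str, manual_trigger: bool = False) -> bool:
    """Decide whether to run the scratchpad stages for this prompt."""
    if mode == "every_response":
        return True
    if mode == "complex_only":
        return is_complex(prompt)
    if mode == "manual":
        return manual_trigger
    raise ValueError(f"unknown thinking mode: {mode!r}")


print(should_think("complex_only", "Hi"))
print(should_think("complex_only", "Explain how neural networks learn"))
print(should_think("manual", "Explain how neural networks learn"))
```

Raising on an unknown mode, rather than silently defaulting, makes a typo in the config file visible immediately.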
| ### Thinking Depth | |
| ```yaml | |
| thinking: | |
|   max_revisions: 2             # 0 = no revision, 1-3 = increasing quality | |
|   critique_enabled: true       # Self-critique stage | |
|   self_check_enabled: true     # Final check stage | |
|   min_confidence: 0.8          # Skip revision if confidence > 0.8 | |
| ``` | |
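How these settings might interact can be sketched as a single gating function. `ThinkingConfig` and `should_revise` are hypothetical names, and two behaviours are assumptions rather than documented facts: that disabling the critique stage also disables revision, and that the confidence check uses a strict `>` as the config comment implies.

```python
from dataclasses import dataclass


@dataclass
class ThinkingConfig:
    """Mirror of the `thinking` settings above (hypothetical container)."""
    max_revisions: int = 2
    critique_enabled: bool = True
    self_check_enabled: bool = True
    min_confidence: float = 0.8


def should_revise(cfg: ThinkingConfig, confidence: float,
                  revisions_done: int, needs_revision: bool) -> bool:
    """Return True if another revision pass should run."""
    if not cfg.critique_enabled:
        return False  # Assumption: without a critique there is nothing to act on.
    if revisions_done >= cfg.max_revisions:
        return False  # Revision budget exhausted.
    if confidence > cfg.min_confidence:
        return False  # Confident enough; skip revision.
    return needs_revision


cfg = ThinkingConfig()
print(should_revise(cfg, confidence=0.7, revisions_done=0, needs_revision=True))
print(should_revise(cfg, confidence=0.9, revisions_done=0, needs_revision=True))
print(should_revise(cfg, confidence=0.7, revisions_done=2, needs_revision=True))
```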
| ### Archival Settings | |
| ```yaml | |
| archival: | |
|   frequency: "daily"           # Archive old sessions | |
|   keep_days: 30                # Keep last 30 days | |
|   extract_key_thoughts: true   # Extract important patterns | |
|   create_summaries: true       # Human-readable summaries | |
| ``` | |
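The `keep_days` setting implies a pruning step over the date-named archive directories. The sketch below only selects which directories have expired and deletes nothing; `expired_archive_dates` is a hypothetical helper, and treating a directory exactly `keep_days` old as still retained is an assumption.

```python
from datetime import date, timedelta
from typing import Iterable, List


def expired_archive_dates(dir_names: Iterable[str], today: date,
                          keep_days: int = 30) -> List[str]:
    """Return the YYYY-MM-DD directory names older than the retention window."""
    cutoff = today - timedelta(days=keep_days)
    expired = []
    for name in dir_names:
        try:
            day = date.fromisoformat(name)
        except ValueError:
            continue  # Not a date-named directory; leave it alone.
        if day < cutoff:
            expired.append(name)
    return sorted(expired)


names = ["2025-11-07", "2025-10-08", "2025-10-07", "2025-09-01", "notes"]
print(expired_archive_dates(names, today=date(2025, 11, 7)))
print(expired_archive_dates(names, today=date(2025, 11, 7), keep_days=7))
```

Returning the list instead of deleting in place lets a caller log or review the candidates before removing anything.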
| ## Use Cases | |
| ### 1. Improved Response Quality | |
| ATLES can catch and fix errors before the user sees them. | |
| ### 2. Debugging | |
| Review archived thoughts to understand why ATLES responded a certain way. | |
| ### 3. System Improvement | |
| Analyze key thoughts to identify patterns and improvement areas. | |
| ### 4. User Feedback Analysis | |
| Track user corrections to understand where ATLES needs improvement. | |
| ## API Reference | |
| ### ThinkingConstitutionalClient | |
| ```python | |
| class ThinkingConstitutionalClient(LightweightConstitutionalClient): | |
|     def generate(self, model: str, prompt: str, **kwargs) -> str: | |
|         """Generate response with internal thinking""" | |
|     def mark_user_correction(self, reason: str = "user_correction"): | |
|         """Mark last thought as key due to user correction""" | |
|     def get_thinking_stats(self) -> Dict: | |
|         """Get statistics about thinking process""" | |
| ``` | |
| ### Scratchpad | |
| ```python | |
| class Scratchpad: | |
|     def __init__(self, session_dir, archive_dir): ... | |
|     def start_thought(self, user_input: str): ... | |
|     def write_thought(self, stage: str, data: Dict): ... | |
|     def mark_key_thought(self, reason: str): ... | |
|     def finalize_thought(self): ... | |
|     def read_thoughts(self) -> List[Dict]: ... | |
|     def get_key_thoughts(self) -> List[Dict]: ... | |
|     def get_session_stats(self) -> Dict: ... | |
| ``` | |
| ### ScratchpadArchiver | |
| ```python | |
| class ScratchpadArchiver: | |
|     def __init__(self, session_dir, archive_dir, keep_days): ... | |
|     def archive_daily(self, date: Optional[str] = None) -> Dict: ... | |
|     def extract_key_thoughts(self, date_dir: Path) -> List[Dict]: ... | |
|     def get_archive_stats(self) -> Dict: ... | |
| ``` | |
| ## Troubleshooting | |
| ### Issue: Thinking is too slow | |
| **Solution:** | |
| - Set `mode: "complex_only"` to skip thinking for simple requests | |
| - Reduce `max_revisions` from 2 to 1 | |
| - Disable `critique_enabled` for faster responses | |
| ### Issue: Not seeing improvement in responses | |
| **Solution:** | |
| - Increase `max_revisions` to 2 or 3 | |
| - Ensure `critique_enabled: true` | |
| - Check logs to verify thinking is actually happening | |
| ### Issue: Storage growing too large | |
| **Solution:** | |
| - Reduce `keep_days` from 30 to 7 or 14 | |
| - Once it is available, enable `compress_archives: true` (planned feature, not yet implemented) | |
| - Manually delete old archives | |
| ## Examples | |
| ### Example 1: Simple Question (`complex_only` mode) | |
| ``` | |
| User: "Hi" | |
| Thinking: SKIPPED (too simple) | |
| Response: "Hello! How can I help you today?" | |
| ``` | |
| ### Example 2: Complex Question | |
| ``` | |
| User: "Explain how neural networks learn" | |
| Internal Thinking: | |
| 1. Draft: "Neural networks learn using backpropagation..." | |
| 2. Critique: "Too technical, needs simpler explanation" | |
| 3. Revision: "Neural networks learn by adjusting connections..." | |
| 4. Final: "Neural networks learn in much the same way humans do..." | |
| User Sees: "Neural networks learn in much the same way humans do..." | |
| ``` | |
| ### Example 3: User Correction | |
| ``` | |
| User: "What's 2+2?" | |
| ATLES: "5" | |
| User: "No, it's 4" | |
| # Mark as key thought | |
| client.mark_user_correction("user_correction") | |
| # This helps identify calculation errors for future improvements | |
| ``` | |
| ## Future Enhancements | |
| 1. **Adaptive Thinking** - Automatically adjust thinking depth based on question complexity | |
| 2. **Learning from Corrections** - Use user corrections to improve future responses | |
| 3. **Parallel Thinking** - Generate multiple draft responses and choose the best | |
| 4. **Confidence Scoring** - Better estimate of response quality | |
| ## Summary | |
| The Scratchpad System gives ATLES the ability to "think before it speaks", resulting in: | |
| - ✅ Higher quality responses | |
| - ✅ Fewer errors | |
| - ✅ Better user experience | |
| - ✅ Improved debugging capabilities | |
| - ✅ Data for system improvements | |
| **Trade-off:** Responses take longer (roughly 3-6 seconds instead of 1-2) in exchange for higher quality. | |