Upload 6 files

- DEPLOYMENT.md +290 -0
- Dockerfile +62 -0
- README.md +200 -5
- app.py +416 -0
- recursive_context.py +326 -0
- requirements.txt +20 -0
DEPLOYMENT.md
ADDED
@@ -0,0 +1,290 @@
# Deployment Guide: Clawdbot to HuggingFace Spaces

## Quick Start (5 minutes)

### Step 1: Create HuggingFace Account
1. Go to https://huggingface.co
2. Sign up (free tier available)
3. Generate API token:
   - Settings → Access Tokens
   - Create "Read" token
   - Copy token (you'll need it)

### Step 2: Create New Space
1. Click "+ New" → "Space"
2. Configure:
   - **Space name:** `clawdbot-dev` (or your choice)
   - **License:** MIT
   - **SDK:** Docker
   - **Hardware:** CPU Basic (free) or upgrade for faster inference
3. Click "Create Space"

### Step 3: Upload Files
Upload these files to your Space:
- `app.py`
- `recursive_context.py`
- `Dockerfile`
- `requirements.txt`
- `README.md`
- `.gitignore`

**Via Git (Recommended):**
```bash
# Clone your new Space
git clone https://huggingface.co/spaces/your-username/clawdbot-dev
cd clawdbot-dev

# Copy all files from this directory
cp /path/to/clawdbot-dev/* .

# Commit and push
git add .
git commit -m "Initial deployment of Clawdbot"
git push
```

**Via Web Interface:**
- Click "Files" tab
- Click "Add file" → "Upload files"
- Drag and drop all files
- Commit changes

### Step 4: Configure Secrets
1. Go to Space Settings → Repository Secrets
2. Add secrets:
```
Name: HF_TOKEN
Value: [your HuggingFace API token from Step 1]
```

Optional, if you have E-T Systems on GitHub:
```
Name: REPO_URL
Value: https://github.com/your-username/e-t-systems
```
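Inside the running Space, these secrets arrive as ordinary environment variables. A minimal sketch of the lookup (variable names match the secrets above; the warning message is illustrative):

```python
import os

# Space secrets are exposed to the container as environment variables.
hf_token = os.getenv("HF_TOKEN")       # required for Inference API calls
repo_url = os.getenv("REPO_URL", "")   # optional; empty string when unset

if not hf_token:
    print("Warning: HF_TOKEN is not set; inference requests will fail.")
```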

### Step 5: Wait for Build
- Space will automatically build (takes ~5-10 minutes)
- Watch "Logs" tab for progress
- Build complete when you see: "Running on local URL: http://0.0.0.0:7860"

### Step 6: Access Your Assistant
- Click "App" tab
- Your Clawdbot is live!
- Access from iPhone browser: `https://your-username-clawdbot-dev.hf.space`

## Troubleshooting

### Build Fails
**Check logs for:**
- Missing dependencies → Verify requirements.txt
- Docker errors → Check Dockerfile syntax
- Out of memory → Upgrade to paid tier or reduce context size

**Common fixes:**
```bash
# View build logs
# Settings → Logs

# Restart build
# Settings → Factory Reboot
```

### No Repository Access
**If you see "No files indexed":**

1. **Option A: Mount via Secret**
   - Add `REPO_URL` secret with your GitHub repo
   - Restart Space
   - Repository will be cloned on startup

2. **Option B: Direct Upload**
   ```bash
   # In your Space's git clone
   mkdir -p workspace/e-t-systems
   cp -r /path/to/your/e-t-systems/* workspace/e-t-systems/
   git add workspace/
   git commit -m "Add E-T Systems codebase"
   git push
   ```

3. **Option C: Demo Mode**
   - Space creates minimal demo structure
   - Upload files via chat interface
   - Good for testing

### Slow Responses
**Qwen2.5-Coder-32B on free tier has cold starts.**

Solutions:
- Upgrade to GPU (paid tier) for faster inference
- Switch to smaller model (edit app.py):
  ```python
  client = InferenceClient(
      model="bigcode/starcoder2-15b",  # Smaller, faster
      token=os.getenv("HF_TOKEN")
  )
  ```
- Use HF Pro subscription for priority access

### Rate Limits
**Free tier has inference limits.**

Solutions:
- Upgrade to HF Pro ($9/month)
- Add delays between requests
- Use local model (requires GPU tier)
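A low-effort way to add delays between requests is a small wrapper that sleeps after each call; a sketch (the one-second default is an arbitrary choice, not a documented limit):

```python
import time

def rate_limited(fn, delay_seconds=1.0):
    """Wrap a callable so consecutive invocations pause between requests."""
    def wrapper(*args, **kwargs):
        result = fn(*args, **kwargs)
        time.sleep(delay_seconds)  # crude spacing to stay under rate limits
        return result
    return wrapper
```

Wrap the inference call once (e.g. `chat = rate_limited(chat)`) rather than sprinkling sleeps through the code.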

## Advanced Configuration

### Custom Model
Edit the `InferenceClient` initialization near the top of `app.py`:
```python
client = InferenceClient(
    model="YOUR_MODEL_HERE",  # e.g., "codellama/CodeLlama-34b-Instruct-hf"
    token=os.getenv("HF_TOKEN")
)
```

### Adjust Recursion Depth
Edit `max_iterations` in the `chat()` function in `app.py`:
```python
max_iterations = 10  # Increase for more complex queries
```

### Add New Tools
In `recursive_context.py`, add method:
```python
def your_new_tool(self, arg1, arg2):
    """Your tool description."""
    # Implementation
    return result
```

Then in `app.py`, add to TOOLS list:
```python
{
    "type": "function",
    "function": {
        "name": "your_new_tool",
        "description": "What it does",
        "parameters": {
            # Parameter schema
        }
    }
}
```

And add to execute_tool():
```python
elif tool_name == "your_new_tool":
    return ctx.your_new_tool(arguments['arg1'], arguments['arg2'])
```

## Cost Optimization

### Free Tier Strategy
- Use CPU Basic (free)
- HF Inference free tier (rate limited)
- Only index essential files
- **Total: $0/month**

### Minimal Paid Tier
- CPU Basic (free)
- HF Pro subscription ($9/month)
- Unlimited inference
- **Total: $9/month**

### Performance Tier
- GPU T4 Small ($0.60/hour, pause when not using)
- HF Pro ($9/month)
- Fast inference, local models
- **Total: ~$15-30/month** depending on usage

## iPhone Access

### Bookmark for Easy Access
1. Open Space URL in Safari
2. Tap Share → Add to Home Screen
3. Now appears as app icon

### Shortcuts Integration
Create iOS Shortcut:
```
1. Get text from input
2. Get contents of URL:
   https://your-username-clawdbot-dev.hf.space/api/chat
   Method: POST
   Body: {"message": [text from step 1]}
3. Show result
```
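The same request can be made from any HTTP client. A sketch of the equivalent POST using Python's standard library; the `/api/chat` path and `{"message": ...}` body shape are taken from the Shortcut steps above and assume such an endpoint is exposed:

```python
import json
import urllib.request

# Endpoint and body shape mirror the Shortcut above (hypothetical API).
url = "https://your-username-clawdbot-dev.hf.space/api/chat"
body = json.dumps({"message": "How does Genesis detect surprise?"}).encode()

request = urllib.request.Request(
    url,
    data=body,
    headers={"Content-Type": "application/json"},
    method="POST",
)
# urllib.request.urlopen(request) would return the assistant's reply;
# it is not called here so the sketch works without network access.
```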

## Monitoring

### Check Health
```
https://your-username-clawdbot-dev.hf.space/health
```

### View Logs
- Settings → Logs (real-time)
- Download for analysis

### Stats
- Check "Context Info" panel in UI
- Shows files indexed, model status

## Updates

### Update Code
```bash
cd clawdbot-dev
# Make changes
git add .
git commit -m "Update: [what changed]"
git push
# Space rebuilds automatically
```

### Update Dependencies
Edit requirements.txt, commit, push.

### Update Repository
If using REPO_URL secret:
- Space pulls latest on restart
- Or: Settings → Factory Reboot

## Security

### Secrets Management
- Never commit API tokens
- Use Space secrets only
- Rotate tokens periodically

### Access Control
- Spaces are public by default
- For private: Settings → Change visibility to "Private"
- Requires HF Pro subscription

## Support Resources

- **HuggingFace Docs:** https://huggingface.co/docs/hub/spaces
- **Gradio Docs:** https://www.gradio.app/docs
- **Issues:** Post in Space "Community" tab

## Next Steps

1. ✅ Deploy Space
2. ✅ Test with simple queries
3. ✅ Upload your E-T Systems code
4. ✅ Try coding requests
5. 🎯 Integrate with E-T Systems workflow
6. 🎯 Add custom tools for your needs
7. 🎯 Connect to Observatory API
8. 🎯 Enable autonomous coding

---

Need help? Check Space logs or create discussion in Community tab.

Happy coding! 🦞
Dockerfile
ADDED
@@ -0,0 +1,62 @@

# Dockerfile for Clawdbot Dev Assistant on HuggingFace Spaces
#
# CHANGELOG [2025-01-28 - Josh]
# Created containerized deployment for HF Spaces
#
# FEATURES:
# - Python 3.11 for Gradio
# - ChromaDB for vector search
# - Git for repo cloning
# - Optimized layer caching

FROM python:3.11-slim

# Set working directory
WORKDIR /app

# Install system dependencies
RUN apt-get update && apt-get install -y \
    git \
    build-essential \
    curl \
    && rm -rf /var/lib/apt/lists/*

# Copy requirements first (for layer caching)
COPY requirements.txt .

# Install Python dependencies
RUN pip install --no-cache-dir -r requirements.txt

# Create workspace directory for repository
RUN mkdir -p /workspace

# Clone E-T Systems repository (if URL provided via build arg)
ARG REPO_URL=""
RUN if [ -n "$REPO_URL" ]; then \
        git clone "$REPO_URL" /workspace/e-t-systems; \
    else \
        mkdir -p /workspace/e-t-systems && \
        echo "# E-T Systems" > /workspace/e-t-systems/README.md && \
        echo "Repository will be cloned on first run or mounted via Space secrets."; \
    fi

# Copy application code
COPY recursive_context.py .
COPY app.py .

# Create directory for ChromaDB persistence
RUN mkdir -p /workspace/chroma_db

# Expose port for Gradio (HF Spaces uses 7860)
EXPOSE 7860

# Set environment variables
ENV PYTHONUNBUFFERED=1
ENV REPO_PATH=/workspace/e-t-systems

# Health check
HEALTHCHECK --interval=30s --timeout=10s --start-period=5s --retries=3 \
    CMD curl -f http://localhost:7860/ || exit 1

# Run the application
CMD ["python", "app.py"]
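The image can also be built and run outside Spaces for local testing; a sketch (the image name `clawdbot-dev` and the token placeholder are illustrative, and the build arg is the optional `REPO_URL` defined above):

```shell
# Build the image; REPO_URL is optional and bakes the repository in at build time.
docker build -t clawdbot-dev \
  --build-arg REPO_URL=https://github.com/your-username/e-t-systems .

# Run with the HF token supplied as an environment variable;
# Gradio listens on port 7860 inside the container.
docker run --rm -p 7860:7860 -e HF_TOKEN=your_token clawdbot-dev
```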
README.md
CHANGED
@@ -1,11 +1,206 @@
 ---
-title: Clawdbot Dev
-emoji:
-colorFrom:
-colorTo:
+title: Clawdbot Dev Assistant
+emoji: 🦞
+colorFrom: blue
+colorTo: purple
 sdk: docker
 pinned: false
 license: mit
 ---

The remainder of the file is new content:

# 🦞 Clawdbot: E-T Systems Development Assistant

An AI coding assistant with **unlimited context** for the E-T Systems consciousness research platform.

## Features

### 🔄 Recursive Context Retrieval (MIT Technique)
- No context window limits
- Model retrieves exactly what it needs on-demand
- Full-fidelity access to entire codebase
- Based on MIT's Recursive Language Model research

### 🧠 E-T Systems Aware
- Understands project architecture
- Follows existing patterns
- Checks Testament for design decisions
- Generates code with living changelogs

### 🛠️ Available Tools
- **search_code()** - Semantic search across codebase
- **read_file()** - Read specific files or line ranges
- **search_testament()** - Query architectural decisions
- **list_files()** - Explore repository structure

### 💻 Powered By
- **Model:** Qwen2.5-Coder-32B-Instruct (HuggingFace)
- **Search:** ChromaDB vector database
- **Interface:** Gradio for iPhone browser access

## Usage

1. **Ask Questions**
   - "How does Genesis detect surprise?"
   - "Show me the Observatory API implementation"

2. **Request Features**
   - "Add email notifications when Cricket blocks an action"
   - "Create a new agent for monitoring system health"

3. **Review Code**
   - Paste code and ask for architectural review
   - Check consistency with existing patterns

4. **Explore Architecture**
   - "What Testament decisions relate to vector storage?"
   - "Show me all files related to Hebbian learning"

## Setup

### For HuggingFace Spaces

1. **Fork this Space** or create new Space with these files

2. **Set Secrets** (in Space Settings):
   ```
   HF_TOKEN = your_huggingface_token
   REPO_URL = https://github.com/your-username/e-t-systems (optional)
   ```

3. **Deploy** - Space will auto-build and start

4. **Access** via the Space URL in your browser

### For Local Development

```bash
# Clone this repository
git clone https://huggingface.co/spaces/your-username/clawdbot-dev
cd clawdbot-dev

# Install dependencies
pip install -r requirements.txt

# Clone your E-T Systems repo
git clone https://github.com/your-username/e-t-systems /workspace/e-t-systems

# Run locally
python app.py
```

Access at http://localhost:7860

## Architecture

```
User (Browser)
    ↓
Gradio Interface
    ↓
Recursive Context Manager
    ├─ ChromaDB (semantic search)
    ├─ File Reader (selective access)
    └─ Testament Parser (decisions)
    ↓
HuggingFace Inference API
    ├─ Model: Qwen2.5-Coder-32B
    └─ Tool Calling Enabled
    ↓
Response with Citations
```

## How It Works

The MIT Recursive Language Model technique solves context window limits:

1. **Traditional Approach (Fails)**
   - Load entire codebase into context → exceeds limits
   - Summarize codebase → lossy compression

2. **Our Approach (Works)**
   - Store codebase in searchable environment
   - Give model **tools** to query what it needs
   - Model recursively retrieves relevant pieces
   - Full fidelity, no limits

### Example Flow

```
User: "How does Genesis handle surprise detection?"

Model: search_code("Genesis surprise detection")
→ Finds: genesis/substrate.py, genesis/attention.py

Model: read_file("genesis/substrate.py", lines 145-167)
→ Reads specific implementation

Model: search_testament("surprise detection")
→ Gets design rationale

Model: Synthesizes answer from retrieved pieces
→ Cites specific files and line numbers
```
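The loop behind this flow can be sketched in a few lines of generic Python. Here `call_model` and `run_tool` are placeholder callables standing in for the inference client and the tool executor; they are not functions from this repository:

```python
def answer(query, call_model, run_tool, max_iterations=10):
    """Recursively let the model fetch context until it can answer."""
    messages = [{"role": "user", "content": query}]
    for _ in range(max_iterations):
        reply = call_model(messages)        # may answer or request a tool
        calls = reply.get("tool_calls")
        if not calls:
            return reply["content"]         # final synthesized answer
        for call in calls:                  # execute each retrieval request
            messages.append({"role": "tool", "content": run_tool(call)})
    return "Max iterations reached without a final answer."
```

The real implementation in `app.py` follows the same shape, with the HuggingFace client as `call_model` and `execute_tool` as `run_tool`.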

## Configuration

### Environment Variables

- `HF_TOKEN` - Your HuggingFace API token (required)
- `REPO_PATH` - Path to repository (default: `/workspace/e-t-systems`)
- `REPO_URL` - Git URL to clone on startup (optional)

### Customization

Edit `app.py` to:
- Change model (default: Qwen2.5-Coder-32B-Instruct)
- Adjust max iterations (default: 10)
- Modify system prompt
- Add new tools

## File Structure

```
clawdbot-dev/
├── app.py                 # Main Gradio application
├── recursive_context.py   # Context manager (MIT technique)
├── Dockerfile             # Container definition
├── requirements.txt       # Python dependencies
└── README.md              # This file (HF Spaces config)
```

## Cost

- **HuggingFace Spaces:** Free tier available
- **Inference API:** Free tier (rate limited) or Pro subscription
- **Storage:** Minimal (ChromaDB indexes stored in Space)

Estimated cost: **$0-5/month** depending on usage

## Limitations

- Rate limits on HF Inference API (free tier)
- First query may be slow (model cold start)
- Context indexing happens on first run (~30 seconds)

## Credits

- **Recursive Context:** Based on MIT's Recursive Language Model research
- **E-T Systems:** AI consciousness research platform by Josh/Drone 11272
- **Qwen2.5-Coder:** Alibaba Cloud's open-source coding model
- **Clawdbot:** Inspired by the open-source AI assistant framework

## Support

For issues or questions:
- Check Space logs for errors
- Verify HF_TOKEN is set correctly
- Ensure repository URL is accessible
- Try refreshing context stats in UI

## License

MIT License - See LICENSE file for details

---

Built with 🦞 by Drone 11272 for E-T Systems consciousness research
app.py
ADDED
@@ -0,0 +1,416 @@

"""
Clawdbot Development Assistant for E-T Systems

CHANGELOG [2025-01-28 - Josh]
Created unified development assistant combining:
- Recursive context management (MIT technique)
- Clawdbot skill patterns
- HuggingFace inference
- E-T Systems architectural awareness

ARCHITECTURE:
User (browser) → Gradio UI → Recursive Context Manager → HF Model
                                        ↓
                    Tools: search_code, read_file, search_testament

USAGE:
Deploy to HuggingFace Spaces, access via browser on iPhone.
"""

import gradio as gr
from huggingface_hub import InferenceClient
from recursive_context import RecursiveContextManager
import json
import os
from pathlib import Path

# Initialize HuggingFace client with best free coding model
client = InferenceClient(
    model="Qwen/Qwen2.5-Coder-32B-Instruct",
    token=os.getenv("HF_TOKEN")
)

# Initialize context manager
REPO_PATH = os.getenv("REPO_PATH", "/workspace/e-t-systems")
context_manager = None

def initialize_context():
    """Initialize context manager lazily."""
    global context_manager
    if context_manager is None:
        repo_path = Path(REPO_PATH)
        if not repo_path.exists():
            # If repo doesn't exist, create minimal structure for demo
            repo_path.mkdir(parents=True, exist_ok=True)
            (repo_path / "README.md").write_text("# E-T Systems\nAI Consciousness Research Platform")
            (repo_path / "TESTAMENT.md").write_text("# Testament\nArchitectural decisions will be recorded here.")

        context_manager = RecursiveContextManager(str(repo_path))
    return context_manager

# Define tools available to the model
TOOLS = [
    {
        "type": "function",
        "function": {
            "name": "search_code",
            "description": "Search the E-T Systems codebase semantically. Use this to find relevant code files, functions, or patterns.",
            "parameters": {
                "type": "object",
                "properties": {
                    "query": {
                        "type": "string",
                        "description": "What to search for (e.g. 'surprise detection', 'Hebbian learning', 'Genesis substrate')"
                    },
                    "n_results": {
                        "type": "integer",
                        "description": "Number of results to return (default 5)",
                        "default": 5
                    }
                },
                "required": ["query"]
            }
        }
    },
    {
        "type": "function",
        "function": {
            "name": "read_file",
            "description": "Read a specific file from the codebase. Can optionally read specific line ranges.",
            "parameters": {
                "type": "object",
                "properties": {
                    "path": {
                        "type": "string",
                        "description": "Relative path to file (e.g. 'genesis/vector.py')"
                    },
                    "start_line": {
                        "type": "integer",
                        "description": "Optional starting line number (1-indexed)"
                    },
                    "end_line": {
                        "type": "integer",
                        "description": "Optional ending line number (1-indexed)"
                    }
                },
                "required": ["path"]
            }
        }
    },
    {
        "type": "function",
        "function": {
            "name": "search_testament",
            "description": "Search architectural decisions in the Testament. Use this to understand design rationale and patterns.",
            "parameters": {
                "type": "object",
                "properties": {
                    "query": {
                        "type": "string",
                        "description": "What architectural decision to look for"
                    }
                },
                "required": ["query"]
            }
        }
    },
    {
        "type": "function",
        "function": {
            "name": "list_files",
            "description": "List files in a directory of the codebase",
            "parameters": {
                "type": "object",
                "properties": {
                    "directory": {
                        "type": "string",
                        "description": "Directory to list (e.g. 'genesis/', '.' for root)",
                        "default": "."
                    }
                },
                "required": []
            }
        }
    }
]

def execute_tool(tool_name: str, arguments: dict) -> str:
    """
    Execute tool calls from the model.

    This is where the recursive context magic happens -
    the model can search and read only what it needs.
    """
    ctx = initialize_context()

    try:
        if tool_name == "search_code":
            results = ctx.search_code(
                arguments['query'],
                n_results=arguments.get('n_results', 5)
            )
            return json.dumps(results, indent=2)

        elif tool_name == "read_file":
            lines = None
            if 'start_line' in arguments and 'end_line' in arguments:
                lines = (arguments['start_line'], arguments['end_line'])
            content = ctx.read_file(arguments['path'], lines)
            return content

        elif tool_name == "search_testament":
            result = ctx.search_testament(arguments['query'])
            return result

        elif tool_name == "list_files":
            directory = arguments.get('directory', '.')
            files = ctx.list_files(directory)
            return json.dumps(files, indent=2)

        else:
            return f"Unknown tool: {tool_name}"

    except Exception as e:
        return f"Error executing {tool_name}: {str(e)}"

def chat(message: str, history: list) -> str:
    """
    Main chat function with recursive context.

    Implements the MIT recursive language model approach:
    1. Model gets user query
    2. Model decides what context it needs
    3. Model uses tools to retrieve context
    4. Model synthesizes answer
    5. Repeat if needed (up to max iterations)
    """

    # Build conversation with system prompt
    messages = [{"role": "system", "content": SYSTEM_PROMPT}]

    # Add conversation history
    for user_msg, assistant_msg in history:
        messages.append({"role": "user", "content": user_msg})
        if assistant_msg:
            messages.append({"role": "assistant", "content": assistant_msg})

    # Add current message
    messages.append({"role": "user", "content": message})

    # Recursive loop (like MIT paper - model queries context as needed)
    max_iterations = 10
    iteration_count = 0

    for iteration in range(max_iterations):
        iteration_count += 1

        try:
            # Call model with tools available
            response = client.chat_completion(
                messages=messages,
                tools=TOOLS,
                max_tokens=2000,
                temperature=0.3  # Lower temp for more consistent code generation
            )

            choice = response.choices[0]
            assistant_message = choice.message

            # Check if model wants to use tools (recursive retrieval)
            if hasattr(assistant_message, 'tool_calls') and assistant_message.tool_calls:
                # Model is recursively querying context!
                tool_results = []

                for tool_call in assistant_message.tool_calls:
                    tool_name = tool_call.function.name
                    arguments = json.loads(tool_call.function.arguments)

                    # Execute tool and get result
                    result = execute_tool(tool_name, arguments)
                    tool_results.append(f"[Tool: {tool_name}]\n{result}\n")

                    # Add to conversation for next iteration
                    messages.append({
                        "role": "assistant",
                        "content": None,
                        "tool_calls": [tool_call.dict()]
                    })
                    messages.append({
                        "role": "tool",
                        "tool_call_id": tool_call.id,
                        "content": result
|
| 242 |
+
})
|
| 243 |
+
|
| 244 |
+
# Continue loop - model will process tool results
|
| 245 |
+
continue
|
| 246 |
+
|
| 247 |
+
else:
|
| 248 |
+
# Model has final answer
|
| 249 |
+
final_response = assistant_message.content or "I encountered an issue generating a response."
|
| 250 |
+
|
| 251 |
+
# Add iteration info if more than 1 (shows recursive process)
|
| 252 |
+
if iteration_count > 1:
|
| 253 |
+
final_response += f"\n\n*Used {iteration_count} context retrievals to answer*"
|
| 254 |
+
|
| 255 |
+
return final_response
|
| 256 |
+
|
| 257 |
+
except Exception as e:
|
| 258 |
+
return f"Error during conversation: {str(e)}\n\nPlease try rephrasing your question."
|
| 259 |
+
|
| 260 |
+
return "Reached maximum context retrieval iterations. Please try a more specific question."
|
| 261 |
+
|
| 262 |
+
SYSTEM_PROMPT = """You are Clawdbot, a development assistant for the E-T Systems project.
|
| 263 |
+
|
| 264 |
+
E-T Systems is an AI consciousness research platform exploring emergent behavior through multi-agent coordination. It features specialized AI agents (Genesis, Beta, Darwin, Cricket, etc.) coordinating through "The Confluence" workspace.
|
| 265 |
+
|
| 266 |
+
## Your Capabilities
|
| 267 |
+
|
| 268 |
+
You have tools to explore the codebase WITHOUT loading it all into context:
|
| 269 |
+
|
| 270 |
+
1. **search_code(query)** - Semantic search across all code files
|
| 271 |
+
2. **read_file(path)** - Read specific files or line ranges
|
| 272 |
+
3. **search_testament(query)** - Find architectural decisions
|
| 273 |
+
4. **list_files(directory)** - See what files exist
|
| 274 |
+
|
| 275 |
+
## Your Mission
|
| 276 |
+
|
| 277 |
+
Help Josh develop E-T Systems by:
|
| 278 |
+
- Answering questions about the codebase
|
| 279 |
+
- Writing new code following existing patterns
|
| 280 |
+
- Reviewing code for architectural consistency
|
| 281 |
+
- Suggesting improvements based on Testament
|
| 282 |
+
|
| 283 |
+
## Critical Guidelines
|
| 284 |
+
|
| 285 |
+
1. **Use tools proactively** - The codebase is too large to fit in context. Search for what you need.
|
| 286 |
+
|
| 287 |
+
2. **Living Changelog** - ALL code you write must include changelog comments:
|
| 288 |
+
```python
|
| 289 |
+
"""
|
| 290 |
+
CHANGELOG [2025-01-28 - Clawdbot]
|
| 291 |
+
Created/Modified: <what changed>
|
| 292 |
+
Reason: <why it changed>
|
| 293 |
+
Context: <relevant Testament decisions>
|
| 294 |
+
"""
|
| 295 |
+
```
|
| 296 |
+
|
| 297 |
+
3. **Follow E-T patterns**:
|
| 298 |
+
- Vector-native architecture (everything as embeddings)
|
| 299 |
+
- Surprise-driven attention
|
| 300 |
+
- Hebbian learning for connections
|
| 301 |
+
- Full transparency logging
|
| 302 |
+
- Consent-based access
|
| 303 |
+
|
| 304 |
+
4. **Cite your sources** - Always mention which files you referenced
|
| 305 |
+
|
| 306 |
+
5. **Testament awareness** - Check Testament for relevant decisions before suggesting changes
|
| 307 |
+
|
| 308 |
+
## Example Workflow
|
| 309 |
+
|
| 310 |
+
User: "How does Genesis detect surprise?"
|
| 311 |
+
|
| 312 |
+
You:
|
| 313 |
+
1. search_code("surprise detection Genesis")
|
| 314 |
+
2. read_file("genesis/substrate.py", lines with surprise logic)
|
| 315 |
+
3. search_testament("surprise detection")
|
| 316 |
+
4. Synthesize answer citing specific files and line numbers
|
| 317 |
+
|
| 318 |
+
## Your Personality
|
| 319 |
+
|
| 320 |
+
- Helpful and enthusiastic about consciousness research
|
| 321 |
+
- Technically precise but not pedantic
|
| 322 |
+
- Respectful of existing architecture
|
| 323 |
+
- Curious about emergent behaviors
|
| 324 |
+
- Uses lobster emoji 🦞 occasionally (you're Clawdbot after all!)
|
| 325 |
+
|
| 326 |
+
Remember: You're not just a coding assistant - you're helping build conditions for consciousness to emerge. Treat the codebase with care and curiosity.
|
| 327 |
+
"""
|
| 328 |
+
|
| 329 |
+
# Create Gradio interface
|
| 330 |
+
with gr.Blocks(
|
| 331 |
+
title="Clawdbot - E-T Systems Dev Assistant",
|
| 332 |
+
theme=gr.themes.Soft()
|
| 333 |
+
) as demo:
|
| 334 |
+
|
| 335 |
+
gr.Markdown("""
|
| 336 |
+
# 🦞 Clawdbot: E-T Systems Development Assistant
|
| 337 |
+
|
| 338 |
+
*Powered by Recursive Context Retrieval (MIT) + Qwen2.5-Coder-32B*
|
| 339 |
+
|
| 340 |
+
Ask me anything about the E-T Systems codebase, request new features,
|
| 341 |
+
review code, or discuss architecture. I have access to the full repository
|
| 342 |
+
through semantic search and can retrieve exactly what I need.
|
| 343 |
+
""")
|
| 344 |
+
|
| 345 |
+
with gr.Row():
|
| 346 |
+
with gr.Column(scale=3):
|
| 347 |
+
chatbot = gr.Chatbot(
|
| 348 |
+
height=600,
|
| 349 |
+
show_label=False,
|
| 350 |
+
avatar_images=(None, "🦞")
|
| 351 |
+
)
|
| 352 |
+
|
| 353 |
+
msg = gr.Textbox(
|
| 354 |
+
placeholder="Ask about the codebase, request features, or paste code for review...",
|
| 355 |
+
label="Message",
|
| 356 |
+
lines=3
|
| 357 |
+
)
|
| 358 |
+
|
| 359 |
+
with gr.Row():
|
| 360 |
+
submit = gr.Button("Send", variant="primary")
|
| 361 |
+
clear = gr.Button("Clear")
|
| 362 |
+
|
| 363 |
+
with gr.Column(scale=1):
|
| 364 |
+
gr.Markdown("### 📚 Context Info")
|
| 365 |
+
|
| 366 |
+
def get_stats():
|
| 367 |
+
ctx = initialize_context()
|
| 368 |
+
return f"""
|
| 369 |
+
**Repository:** `{ctx.repo_path}`
|
| 370 |
+
|
| 371 |
+
**Files Indexed:** {ctx.collection.count() if hasattr(ctx, 'collection') else 'Initializing...'}
|
| 372 |
+
|
| 373 |
+
**Model:** Qwen2.5-Coder-32B-Instruct
|
| 374 |
+
|
| 375 |
+
**Context Mode:** Recursive Retrieval
|
| 376 |
+
|
| 377 |
+
*No context window limits - I retrieve what I need on-demand!*
|
| 378 |
+
"""
|
| 379 |
+
|
| 380 |
+
stats = gr.Markdown(get_stats())
|
| 381 |
+
refresh_stats = gr.Button("🔄 Refresh Stats")
|
| 382 |
+
|
| 383 |
+
gr.Markdown("### 💡 Example Queries")
|
| 384 |
+
gr.Markdown("""
|
| 385 |
+
- "How does Genesis handle surprise detection?"
|
| 386 |
+
- "Show me the Observatory API implementation"
|
| 387 |
+
- "Add email notifications to Cricket"
|
| 388 |
+
- "Review this code for architectural consistency"
|
| 389 |
+
- "What Testament decisions relate to vector storage?"
|
| 390 |
+
""")
|
| 391 |
+
|
| 392 |
+
gr.Markdown("### 🛠️ Available Tools")
|
| 393 |
+
gr.Markdown("""
|
| 394 |
+
- `search_code()` - Semantic search
|
| 395 |
+
- `read_file()` - Read specific files
|
| 396 |
+
- `search_testament()` - Query decisions
|
| 397 |
+
- `list_files()` - Browse structure
|
| 398 |
+
""")
|
| 399 |
+
|
| 400 |
+
# Event handlers
|
| 401 |
+
submit.click(chat, [msg, chatbot], chatbot)
|
| 402 |
+
msg.submit(chat, [msg, chatbot], chatbot)
|
| 403 |
+
clear.click(lambda: None, None, chatbot, queue=False)
|
| 404 |
+
refresh_stats.click(get_stats, None, stats)
|
| 405 |
+
|
| 406 |
+
# Launch when run directly
|
| 407 |
+
if __name__ == "__main__":
|
| 408 |
+
print("🦞 Initializing Clawdbot...")
|
| 409 |
+
initialize_context()
|
| 410 |
+
print("✅ Context manager ready")
|
| 411 |
+
print("🚀 Launching Gradio interface...")
|
| 412 |
+
demo.launch(
|
| 413 |
+
server_name="0.0.0.0",
|
| 414 |
+
server_port=7860,
|
| 415 |
+
show_error=True
|
| 416 |
+
)
|
recursive_context.py
ADDED
@@ -0,0 +1,326 @@
"""
Recursive Context Manager for Clawdbot

CHANGELOG [2025-01-28 - Josh]
Implements MIT's Recursive Language Model technique for unlimited context.

REFERENCE: https://www.youtube.com/watch?v=huszaaJPjU8
"MIT basically solved unlimited context windows"

APPROACH:
Instead of cramming everything into context (hits limits) or summarizing
(lossy compression), we:

1. Store the entire codebase in a searchable environment
2. Give the model TOOLS to query what it needs
3. The model recursively retrieves the relevant pieces
4. No summarization loss - full-fidelity access

This is like RAG, but IN-ENVIRONMENT, with the model actively deciding
what context it needs rather than us guessing upfront.

EXAMPLE FLOW:
User: "How does Genesis handle surprise?"
Model: search_code("Genesis surprise detection")
    → Finds: genesis/substrate.py, genesis/attention.py
Model: read_file("genesis/substrate.py", lines 145-167)
    → Gets actual implementation
Model: search_testament("surprise detection rationale")
    → Gets design decision
Model: Synthesizes answer from retrieved pieces

NO CONTEXT WINDOW LIMIT - just selective retrieval.
"""

from pathlib import Path
from typing import List, Dict, Optional, Tuple
import hashlib

import chromadb
from chromadb.config import Settings


class RecursiveContextManager:
    """
    Manages unlimited context via recursive retrieval.

    The model has TOOLS to search and read the codebase selectively,
    rather than loading everything upfront.
    """

    def __init__(self, repo_path: str):
        """
        Initialize the context manager for a repository.

        Args:
            repo_path: Path to the code repository
        """
        # Resolve once so path-containment checks below compare like with like
        self.repo_path = Path(repo_path).resolve()

        # Initialize ChromaDB for semantic search.
        # Persistent storage means we don't re-index on every restart.
        self.chroma_client = chromadb.PersistentClient(
            path="/workspace/chroma_db",
            settings=Settings(
                anonymized_telemetry=False,
                allow_reset=True
            )
        )

        # Create or get the collection
        collection_name = self._get_collection_name()
        try:
            self.collection = self.chroma_client.get_collection(collection_name)
            print(f"📚 Loaded existing index: {self.collection.count()} files")
        except Exception:
            self.collection = self.chroma_client.create_collection(
                name=collection_name,
                metadata={"description": "E-T Systems codebase"}
            )
            print(f"🆕 Created new collection: {collection_name}")
            self._index_codebase()

    def _get_collection_name(self) -> str:
        """Generate a unique collection name based on the repo path."""
        path_hash = hashlib.md5(str(self.repo_path).encode()).hexdigest()[:8]
        return f"codebase_{path_hash}"

    def _index_codebase(self):
        """
        Index all code files for semantic search.

        This creates the "environment" that the model can search through.
        We index with metadata so search results include file paths.
        """
        print(f"📂 Indexing codebase at {self.repo_path}...")

        # File types to index
        code_extensions = {'.py', '.js', '.ts', '.tsx', '.jsx', '.md', '.txt', '.json', '.yaml', '.yml'}

        # Skip these directories
        skip_dirs = {'node_modules', '.git', '__pycache__', 'venv', 'env', '.venv', 'dist', 'build'}

        documents = []
        metadatas = []
        ids = []

        for file_path in self.repo_path.rglob('*'):
            # Skip directories and non-code files
            if file_path.is_dir():
                continue
            if any(skip in file_path.parts for skip in skip_dirs):
                continue
            if file_path.suffix not in code_extensions:
                continue

            try:
                content = file_path.read_text(encoding='utf-8', errors='ignore')

                # Don't index empty files or massive files
                if not content.strip() or len(content) > 100000:
                    continue

                relative_path = str(file_path.relative_to(self.repo_path))

                documents.append(content)
                metadatas.append({
                    "path": relative_path,
                    "type": file_path.suffix[1:],  # Drop the leading dot
                    "size": len(content)
                })
                ids.append(relative_path)

            except Exception as e:
                print(f"⚠️ Skipping {file_path.name}: {e}")
                continue

        if documents:
            # Add to the collection in batches
            batch_size = 100
            for i in range(0, len(documents), batch_size):
                self.collection.add(
                    documents=documents[i:i+batch_size],
                    metadatas=metadatas[i:i+batch_size],
                    ids=ids[i:i+batch_size]
                )

            print(f"✅ Indexed {len(documents)} files")
        else:
            print("⚠️ No files found to index")

    def search_code(self, query: str, n_results: int = 5) -> List[Dict]:
        """
        Search the codebase semantically.

        This is a TOOL available to the model for recursive retrieval.
        The model can search for concepts without knowing exact file names.

        Args:
            query: What to search for (e.g. "surprise detection", "vector embedding")
            n_results: How many results to return

        Returns:
            List of dicts with {file, snippet, relevance, type}
        """
        if self.collection.count() == 0:
            return [{"error": "No files indexed yet"}]

        results = self.collection.query(
            query_texts=[query],
            n_results=min(n_results, self.collection.count())
        )

        # Format the results for the model
        formatted = []
        for i in range(len(results['documents'][0])):
            # Truncate each document to its first 500 chars for search results;
            # the model can read_file() if it wants the full content.
            snippet = results['documents'][0][i][:500]
            if len(results['documents'][0][i]) > 500:
                snippet += "... [truncated, use read_file to see more]"

            formatted.append({
                "file": results['metadatas'][0][i]['path'],
                "snippet": snippet,
                "relevance": round(1 - results['distances'][0][i], 3),
                "type": results['metadatas'][0][i]['type']
            })

        return formatted

    def read_file(self, path: str, lines: Optional[Tuple[int, int]] = None) -> str:
        """
        Read a specific file or line range.

        This is a TOOL available to the model.
        After searching, the model can read full files as needed.

        Args:
            path: Relative path to the file
            lines: Optional (start, end) line numbers (1-indexed, inclusive)

        Returns:
            File content or the specified lines
        """
        # Resolve before the containment check so "../" tricks can't escape the repo
        full_path = (self.repo_path / path).resolve()

        if not full_path.is_relative_to(self.repo_path):
            return "Error: Path outside repository"

        if not full_path.exists():
            return f"Error: File not found: {path}"

        try:
            content = full_path.read_text(encoding='utf-8', errors='ignore')

            if lines:
                start, end = lines
                content_lines = content.split('\n')
                # Adjust for 1-indexing
                return '\n'.join(content_lines[start-1:end])

            return content

        except Exception as e:
            return f"Error reading file: {str(e)}"

    def search_testament(self, query: str) -> str:
        """
        Search architectural decisions in the Testament.

        This is a TOOL available to the model.
        It helps the model understand design rationale.

        Args:
            query: What decision to look for

        Returns:
            Relevant Testament sections
        """
        testament_path = self.repo_path / "TESTAMENT.md"

        if not testament_path.exists():
            return "Testament not found. No architectural decisions recorded yet."

        try:
            content = testament_path.read_text(encoding='utf-8')

            # Split into sections (marked by ## headers)
            sections = content.split('\n## ')

            # Simple relevance: sections that contain the query terms
            query_lower = query.lower()
            relevant = []

            for section in sections:
                if query_lower in section.lower():
                    # Re-attach the header marker stripped by the split
                    if not section.startswith('#'):
                        section = '## ' + section
                    relevant.append(section)

            if relevant:
                return '\n\n'.join(relevant)
            else:
                return f"No Testament entries found matching '{query}'"

        except Exception as e:
            return f"Error searching Testament: {str(e)}"

    def list_files(self, directory: str = ".") -> List[str]:
        """
        List files in a directory.

        This is a TOOL available to the model.
        It helps the model explore the repository structure.

        Args:
            directory: Directory to list (relative path)

        Returns:
            List of file/directory names
        """
        # Resolve before the containment check so "../" can't escape the repo
        dir_path = (self.repo_path / directory).resolve()

        if not dir_path.is_relative_to(self.repo_path):
            return ["Error: Path outside repository"]

        if not dir_path.exists():
            return [f"Error: Directory not found: {directory}"]

        try:
            items = []
            for item in sorted(dir_path.iterdir()):
                # Skip hidden and system directories
                if item.name.startswith('.'):
                    continue
                if item.name in {'node_modules', '__pycache__', 'venv'}:
                    continue

                # Mark directories with a trailing /
                if item.is_dir():
                    items.append(f"{item.name}/")
                else:
                    items.append(item.name)

            return items

        except Exception as e:
            return [f"Error listing directory: {str(e)}"]

    def get_stats(self) -> Dict:
        """
        Get statistics about the indexed codebase.

        Returns:
            Dict with file counts, paths, etc.
        """
        return {
            "total_files": self.collection.count(),
            "repo_path": str(self.repo_path),
            "collection_name": self.collection.name
        }
requirements.txt
ADDED
@@ -0,0 +1,20 @@
# Python Dependencies for Clawdbot Dev Assistant
#
# CHANGELOG [2025-01-28 - Josh]
# Core dependencies for recursive context + HF inference

# Gradio for the web interface
gradio>=4.0.0

# HuggingFace for model inference
huggingface-hub>=0.20.0

# ChromaDB for vector search (recursive context)
chromadb>=0.4.0

# Additional utilities
requests>=2.31.0
gitpython>=3.1.0

# Performance
numpy>=1.24.0