# Autonomous Python Coding Agent
> **A production-grade, self-healing multi-agent pipeline that doesn't just generate Python code: it autonomously writes, validates, tests, secures, benchmarks, and reflects on its own output before shipping.**
---
## Live Demo
**[▶ Try it on Hugging Face Spaces](https://huggingface.co/spaces/krishpatel/autonomous-coding-agent)**
---
## Demo

---
## What makes this different from just using ChatGPT?
| Feature | ChatGPT / Basic Agent | This Agent |
|---|---|---|
| Code generation | ✅ | ✅ |
| Syntax validation | ❌ Run and hope | ✅ AST parse before running |
| Test cases | ❌ Manual | ✅ Auto-generated by agent |
| Stress testing | ❌ | ✅ 500+ random inputs via Hypothesis |
| Memory | ❌ Stateless | ✅ ChromaDB learns from past bugs |
| Security audit | ❌ | ✅ Detects `eval`, `exec`, hardcoded keys |
| Performance check | ❌ | ✅ Benchmarks 1000 runs, rejects slow code |
| Self-review | ❌ | ✅ Agent scores own confidence 1-10 |
| Self-healing | ❌ | ✅ Loops back and fixes failures automatically |
| Separate retry counters | ❌ | ✅ Per-node counters prevent pipeline blockage |
---
## Key Metrics
| Metric | Value |
|---|---|
| Pipeline nodes | 13 |
| Verification layers | 5 (AST → Tests → Hypothesis → Security → Complexity) |
| Max retries (debugger) | 3 |
| Max retries (security, complexity) | 2 each, independent counters |
| Hypothesis test cases | 500+ random inputs per run |
| Benchmark iterations | 1,000 runs |
| Performance threshold | < 5ms per call |
| Memory backend | ChromaDB vector similarity search |
| LLM | Llama 3.1 8B Instant via Groq |
| Avg pipeline runtime | ~20-40 seconds |
| Lines of code | ~600 across 5 files |
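
The benchmark figures in the table (1,000 runs, 5 ms threshold) could be enforced with a check along these lines. This is a minimal sketch using the standard library `timeit`; the function name `passes_benchmark` and its parameters are illustrative assumptions, not the project's actual code.

```python
import timeit
from typing import Callable


def passes_benchmark(
    func: Callable[[], object],
    runs: int = 1000,
    max_ms_per_call: float = 5.0,
) -> tuple[bool, float]:
    """Time `func` over `runs` calls and compare the mean to a threshold."""
    total_seconds = timeit.timeit(func, number=runs)
    ms_per_call = total_seconds / runs * 1000
    return ms_per_call < max_ms_per_call, ms_per_call


# Hypothetical candidate: sorting a small list should be far below 5 ms.
ok, ms = passes_benchmark(lambda: sorted(range(100)))
print(f"{'PASS' if ok else 'FAIL'}: {ms:.4f} ms per call")
```

Averaging over many runs smooths out scheduler noise, which matters when the threshold is only a few milliseconds.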
---
## Architecture: 13-Node Pipeline
```
User Input (Python Task)
          │
          ▼
     ┌─────────┐
     │ Planner │         ── Breaks task into blueprint
     └────┬────┘
          │
          ▼
      ┌───────┐
      │ Coder │          ── Writes code using plan + ChromaDB memory
      └───┬───┘
          │
          ▼
  ┌───────────────┐
  │ AST Validator │      ── Syntax + hallucinated imports + type hints
  └───────┬───────┘         (no execution needed, milliseconds)
          │
     Pass │ Fail ──► Debugger ──► back to AST
          ▼
  ┌────────────────┐
  │ Test Generator │     ── Auto-generates pytest-style test cases
  └───────┬────────┘
          │
          ▼
      ┌────────┐
      │ Tester │         ── Runs code + generated tests in sandbox
      └───┬────┘
          │
     Pass │ Fail ──► Debugger (max 3 retries)
          ▼
    ┌────────────┐
    │ Hypothesis │       ── 500+ random inputs, property-based testing
    └─────┬──────┘          (never blocks pipeline; informational only)
          │
          ▼
    ┌───────────┐
    │ Benchmark │        ── Runs 1000x, rejects if > 5ms/call
    └─────┬─────┘
          │
          ▼
    ┌──────────┐
    │ Security │         ── Detects eval/exec/hardcoded secrets
    └─────┬────┘            (own retry counter, max 2)
          │
          ▼
    ┌────────────┐
    │ Complexity │       ── Line count + nesting depth + LLM score/10
    └─────┬──────┘          (own retry counter, max 2)
          │
          ▼
 ┌─────────────────┐
 │ Self Reflection │     ── Agent scores own confidence 1-10
 └────────┬────────┘        Rewrites if confidence < 7
          │
          ▼
    ┌──────────┐
    │ Reviewer │         ── Polishes + docstrings + type hints
    └─────┬────┘
          │
          ▼
    ┌───────────┐
    │ Explainer │        ── Writes human-readable explanation
    └─────┬─────┘
          │
          ▼
        OUTPUT
Final Code + Explanation
```
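
To make the Security stage concrete, here is one way a static scan for `eval`/`exec` calls and hardcoded secrets might look. It is a sketch only: the `audit` function, the `DANGEROUS_CALLS` set, and the `SECRET_NAME` pattern are assumptions made for illustration, and the real node may use different rules.

```python
import ast
import re

DANGEROUS_CALLS = {"eval", "exec"}
# Assumed heuristic: variable names that look like credentials.
SECRET_NAME = re.compile(r"(api_?key|secret|token|password)", re.IGNORECASE)


def audit(source: str) -> list[str]:
    """Scan source code statically and return human-readable findings."""
    findings = []
    for node in ast.walk(ast.parse(source)):
        if (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Name)
            and node.func.id in DANGEROUS_CALLS
        ):
            findings.append(f"line {node.lineno}: call to {node.func.id}()")
        elif isinstance(node, ast.Assign):
            # Flag string literals assigned to credential-like names.
            is_string = isinstance(node.value, ast.Constant) and isinstance(
                node.value.value, str
            )
            for target in node.targets:
                if (
                    is_string
                    and isinstance(target, ast.Name)
                    and SECRET_NAME.search(target.id)
                ):
                    findings.append(
                        f"line {node.lineno}: hardcoded value in {target.id}"
                    )
    return findings


sample = 'API_KEY = "abc123"\nresult = eval("1 + 1")\n'
for finding in audit(sample):
    print(finding)
```

Because the scan walks the syntax tree instead of matching text, a comment or string that merely mentions `eval` does not trigger a finding.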
---
## Project Structure
```
autonomous-coding-agent/
├── app.py              ← Streamlit UI
├── main.py             ← Graph builder + entry point
├── state.py            ← Shared TypedDict state (whiteboard)
├── nodes.py            ← All 13 node functions + LLM + ChromaDB
├── edges.py            ← All 7 conditional route functions
├── requirements.txt    ← Dependencies
└── README.md
```
---
## Run Locally
### Prerequisites
- Python 3.11+
- Groq API key (free at [console.groq.com](https://console.groq.com))
### Step 1: Clone the repo
```bash
git clone https://github.com/krishpatel/autonomous-coding-agent.git
cd autonomous-coding-agent
```
### Step 2: Create virtual environment
```bash
python -m venv venv
# Mac/Linux
source venv/bin/activate
# Windows
venv\Scripts\activate
```
### Step 3: Install dependencies
```bash
pip install -r requirements.txt
```
### Step 4: Set your API key
```bash
# Mac/Linux
export GROQ_API_KEY=your_groq_api_key_here
# Windows
set GROQ_API_KEY=your_groq_api_key_here
```
Or create a `.env` file:
```bash
echo "GROQ_API_KEY=your_groq_api_key_here" > .env
```
### Step 5: Run CLI (no UI)
```bash
python main.py
```
### Step 6: Run Streamlit UI
```bash
streamlit run app.py
```
Open [http://localhost:8501](http://localhost:8501) in your browser.
---
## Run with Docker (optional)
```dockerfile
# Dockerfile
FROM python:3.11-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install -r requirements.txt
COPY . .
EXPOSE 8501
CMD ["streamlit", "run", "app.py", "--server.port=8501"]
```
```bash
# Build
docker build -t coding-agent .
# Run
docker run -e GROQ_API_KEY=your_key -p 8501:8501 coding-agent
```
---
## Deploy to Hugging Face Spaces
```bash
# Install HF CLI
pip install huggingface_hub
# Login
huggingface-cli login
# Create space and push
huggingface-cli repo create autonomous-coding-agent --type space --space_sdk streamlit
git remote add hf https://huggingface.co/spaces/YOUR_USERNAME/autonomous-coding-agent
git push hf main
```
Then add your secret in HF Spaces Settings:
```
GROQ_API_KEY = your_key_here
```
---
## Tech Stack
```
LangGraph    → Stateful multi-agent graph orchestration
Groq API     → LLM inference (Llama 3.1 8B Instant)
ChromaDB     → Vector database for bug fix memory
Hypothesis   → Property-based stress testing
Streamlit    → Production UI
subprocess   → Sandboxed isolated code execution
ast          → Static code analysis without execution
hashlib      → Deterministic ChromaDB IDs
importlib    → Real-time import hallucination detection
```
---
## Key Engineering Decisions
### Why LangGraph over plain LangChain?
LangGraph handles **cyclic workflows**: when tests fail, the agent loops back through the debugger and restarts verification from the AST check. LangChain's linear chains can't express this cleanly.
### Why AST validation before running?
Running broken code wastes subprocess time. AST parsing catches syntax errors in **milliseconds** without execution, like a proofreader checking spelling before printing.
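
As a rough illustration of that pre-flight check, the sketch below parses the source with `ast` and uses `importlib.util.find_spec` to flag imports that do not resolve in the current environment. The `validate` function and its message format are assumptions for this example, not the project's actual node.

```python
import ast
import importlib.util


def validate(source: str) -> list[str]:
    """Return a list of problems found without executing `source`."""
    try:
        tree = ast.parse(source)
    except SyntaxError as exc:
        return [f"syntax error on line {exc.lineno}: {exc.msg}"]

    problems = []
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            modules = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            modules = [node.module]  # skip relative imports
        else:
            continue
        for name in modules:
            top_level = name.split(".")[0]
            # find_spec returns None when no such top-level module exists.
            if importlib.util.find_spec(top_level) is None:
                problems.append(
                    f"line {node.lineno}: module '{top_level}' is not installed"
                )
    return problems


print(validate("import os\nimport not_a_real_module_xyz\n"))
print(validate("def broken(:\n    pass\n"))
```

Nothing in the candidate code runs during this step, so an infinite loop or a destructive call cannot do any harm yet.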
### Why Hypothesis for testing?
Hand-written tests only cover the cases you think of. Hypothesis **auto-generates 500+ random inputs** and verifies properties that should always hold, which catches edge cases a human is unlikely to write.
### Why separate retry counters per node?
With one shared counter, three security failures could exhaust the retry budget and kill the entire pipeline before the debugger got its attempts. Separate counters for security and complexity let each node fail independently without blocking the others.
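
A simplified picture of how independent budgets could drive routing is sketched below. The state keys `retries` and `security_retries` follow the names used in this README; the `PipelineState` class, the route function names, the returned node labels, and the behavior once a budget is spent are all assumptions for illustration.

```python
from typing import TypedDict

MAX_DEBUGGER_RETRIES = 3   # limits taken from the Key Metrics table
MAX_SECURITY_RETRIES = 2


class PipelineState(TypedDict):
    tests_passed: bool
    security_passed: bool
    retries: int             # debugger attempts used so far
    security_retries: int    # security attempts used so far, tracked separately


def route_after_tester(state: PipelineState) -> str:
    """Only the debugger's own counter decides whether it gets another try."""
    if state["tests_passed"]:
        return "hypothesis"
    return "debugger" if state["retries"] < MAX_DEBUGGER_RETRIES else "end"


def route_after_security(state: PipelineState) -> str:
    """Security failures spend security_retries, never the debugger budget."""
    if state["security_passed"]:
        return "complexity"
    # Assumption: once the budget is spent, the pipeline stops here.
    return "coder" if state["security_retries"] < MAX_SECURITY_RETRIES else "end"


state: PipelineState = {
    "tests_passed": False,
    "security_passed": False,
    "retries": 0,
    "security_retries": 2,
}
print(route_after_tester(state))    # debugger
print(route_after_security(state))  # end
```

Even with the security budget fully spent, the debugger route still sees zero retries used, which is the whole point of splitting the counters.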
### Why hashlib instead of Python's hash()?
Python's `hash()` is **randomized every session** for security (string hashes are salted per process unless `PYTHONHASHSEED` is fixed). Same error → different ChromaDB ID → agent can never retrieve past fixes. `hashlib.md5` is deterministic across all sessions.
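
A minimal sketch of the idea, assuming the ID is derived from the error message text (the helper name `memory_id` is hypothetical):

```python
import hashlib


def memory_id(error_message: str) -> str:
    """Derive a short ID that is identical in every Python process."""
    digest = hashlib.md5(error_message.encode("utf-8")).hexdigest()
    return digest[:8]


error = "IndexError: list index out of range"
print(memory_id(error))                       # stable across runs
print(memory_id(error) == memory_id(error))   # True
```

MD5 is used here only as a stable fingerprint, not for security. Truncating to 8 hex characters keeps IDs short at the cost of a small collision risk.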
### Why combined Reviewer + Explainer?
Two separate LLM calls for polishing and explaining wasted ~8 seconds. One combined call with structured output (`FINAL_CODE:` / `EXPLANATION:`) saves an entire API round trip.
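
Splitting such a combined reply could look roughly like this. The sketch assumes the model emits the `FINAL_CODE:` section before the `EXPLANATION:` section; the function name `split_response` is hypothetical.

```python
def split_response(text: str) -> tuple[str, str]:
    """Split one LLM reply into (final_code, explanation)."""
    _, has_code, rest = text.partition("FINAL_CODE:")
    code, has_explanation, explanation = rest.partition("EXPLANATION:")
    if not (has_code and has_explanation):
        raise ValueError("reply is missing a FINAL_CODE: or EXPLANATION: section")
    return code.strip(), explanation.strip()


reply = (
    "FINAL_CODE:\n"
    "def add(a, b):\n"
    "    return a + b\n"
    "EXPLANATION:\n"
    "Adds two numbers."
)
code, explanation = split_response(reply)
print(code)
print(explanation)
```

Raising on a malformed reply lets the caller retry instead of silently shipping an empty code section.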
---
## Real Bugs Found and Fixed
**Bug 1: False Positive in Tester**
`returncode == 0` doesn't mean the function was called. A file that only defines functions exits successfully but prints nothing. Fixed by checking that `stdout` is not empty after a successful run.
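
The fix can be sketched as follows. This version passes the source to a fresh interpreter with `-c` for brevity, which is an assumption; the helper name `run_and_check` and the timeout value are also illustrative.

```python
import subprocess
import sys


def run_and_check(source: str, timeout: float = 10.0) -> tuple[bool, str]:
    """Run `source` in a separate interpreter; require both exit 0 and output."""
    result = subprocess.run(
        [sys.executable, "-c", source],
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    if result.returncode != 0:
        return False, result.stderr.strip()
    if not result.stdout.strip():
        # Exit code 0 alone proves nothing: the code may only define functions.
        return False, "exited cleanly but printed nothing"
    return True, result.stdout.strip()


print(run_and_check("def f():\n    return 1\n"))  # defines, never calls
print(run_and_check("print(2 + 2)"))
```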
**Bug 2: ChromaDB Hash Randomization**
Python's `hash()` is session-randomized. Same bug → different ID every run → memory retrieval never works. Fixed with `hashlib.md5().hexdigest()[:8]` for deterministic cross-session IDs.
**Bug 3: Python 3.11 F-string Backslash**
Python 3.11 doesn't allow backslashes inside f-string expressions. The benchmark node embedded code inside f-strings. Fixed by using string concatenation instead.
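
The snippet below shows the shape of the workaround on a made-up example. The variable names and the generated script are hypothetical; only the technique (building the script by concatenation so that no backslash sits inside an f-string replacement field) reflects the fix described above.

```python
user_code = "def square(x):\n    return x * x"

# Python 3.11 and earlier reject a line like the following with a
# SyntaxError, because the replacement field contains a backslash:
#   script = f"{user_code.replace('\n', '\n    ')}"

# Plain concatenation keeps every backslash outside an f-string expression.
script = (
    "import timeit\n"
    + user_code
    + "\n"
    + "print(timeit.timeit(lambda: square(12), number=1000))\n"
)

# Compiling confirms the generated script is valid Python.
compile(script, "<benchmark>", "exec")
print(script)
```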
**Bug 4: Shared Retry Counter**
One `retries` counter shared across all nodes caused security/complexity failures to consume the debugger's retry budget. Fixed by adding `security_retries` and `complexity_retries` as independent counters.
---
## Environment Variables
| Variable | Required | Description |
|---|---|---|
| `GROQ_API_KEY` | ✅ Yes | Get free at console.groq.com |
| `GITHUB_TOKEN` | ❌ No | Only needed for AutoReview AI project |
---
## Resume Line
> **Autonomous Python Coding Agent** | LangGraph · Groq · ChromaDB · Streamlit
> Built a 13-node self-healing pipeline with 5-layer verification: AST validation, auto-generated tests, Hypothesis property testing (500+ random inputs), security audit, and self-reflection confidence scoring. ChromaDB vector memory enables cross-session bug fix learning. Deployed on Hugging Face Spaces.
---
## Author
**Krish Patel**, AI Engineer
[GitHub](https://github.com/krishpatel) · [LinkedIn](https://linkedin.com/in/krishpatel) · [Live Demo](https://huggingface.co/spaces/krishpatel/autonomous-coding-agent)
---
*Built as part of AI Engineer internship portfolio, Bangalore, 2026*