# 🤖 Autonomous Python Coding Agent

> **A production-grade, self-healing multi-agent pipeline that doesn't just generate Python code: it autonomously writes, validates, tests, secures, benchmarks, and reflects on its own output before shipping.**

[![Python](https://img.shields.io/badge/Python-3.11-blue?style=flat-square&logo=python)](https://python.org)
[![LangGraph](https://img.shields.io/badge/LangGraph-0.2.0-green?style=flat-square)](https://github.com/langchain-ai/langgraph)
[![Groq](https://img.shields.io/badge/Groq-Llama%203.1-orange?style=flat-square)](https://groq.com)
[![ChromaDB](https://img.shields.io/badge/ChromaDB-0.5.0-purple?style=flat-square)](https://chromadb.com)
[![Streamlit](https://img.shields.io/badge/Streamlit-1.35-red?style=flat-square)](https://streamlit.io)
[![License](https://img.shields.io/badge/License-MIT-lightgrey?style=flat-square)](LICENSE)
[![Live Demo](https://img.shields.io/badge/🤗%20Live%20Demo-HuggingFace-yellow?style=flat-square)](https://huggingface.co/spaces/krishpatel/autonomous-coding-agent)

---

## 🚀 Live Demo

**[▶ Try it on Hugging Face Spaces](https://huggingface.co/spaces/krishpatel/autonomous-coding-agent)**

---

## 📸 Demo

![Agent Demo](demo.gif)

---

## 🔥 What makes this different from just using ChatGPT?

| Feature | ChatGPT / Basic Agent | This Agent |
|---|---|---|
| Code generation | ✅ | ✅ |
| Syntax validation | ❌ Run and hope | ✅ AST parse before running |
| Test cases | ❌ Manual | ✅ Auto-generated by agent |
| Stress testing | ❌ | ✅ 500+ random inputs via Hypothesis |
| Memory | ❌ Stateless | ✅ ChromaDB learns from past bugs |
| Security audit | ❌ | ✅ Detects eval, exec, hardcoded keys |
| Performance check | ❌ | ✅ Benchmarks 1000 runs, rejects slow code |
| Self-review | ❌ | ✅ Agent scores own confidence 1-10 |
| Self-healing | ❌ | ✅ Loops back and fixes failures automatically |
| Separate retry counters | ❌ | ✅ Per-node counters prevent pipeline blockage |

---

## 📊 Key Metrics

| Metric | Value |
|---|---|
| Pipeline nodes | 13 |
| Verification layers | 5 (AST → Tests → Hypothesis → Security → Complexity) |
| Max retries (debugger) | 3 |
| Max retries (security, complexity) | 2 each (independent counters) |
| Hypothesis test cases | 500+ random inputs per run |
| Benchmark iterations | 1,000 runs |
| Performance threshold | < 5 ms per call |
| Memory backend | ChromaDB vector similarity search |
| LLM | Llama 3.1 8B Instant via Groq |
| Avg pipeline runtime | ~20–40 seconds |
| Lines of code | ~600 across 5 files |

---

## 🏗️ Architecture: 13-Node Pipeline

```
User Input (Python Task)
          │
          ▼
     ┌─────────┐
     │ Planner │      ── Breaks task into blueprint
     └────┬────┘
          │
          ▼
     ┌───────┐
     │ Coder │        ── Writes code using plan + ChromaDB memory
     └────┬──┘
          │
          ▼
   ┌───────────────┐
   │ AST Validator │  ── Syntax + hallucinated imports + type hints
   └──────┬────────┘     (no execution needed, takes milliseconds)
          │
     Pass │ Fail ──► Debugger ──► back to AST
          ▼
  ┌────────────────┐
  │ Test Generator │  ── Auto-generates pytest-style test cases
  └───────┬────────┘
          │
          ▼
      ┌────────┐
      │ Tester │      ── Runs code + generated tests in sandbox
      └───┬────┘
          │
     Pass │ Fail ──► Debugger (max 3 retries)
          ▼
    ┌────────────┐
    │ Hypothesis │    ── 500+ random inputs, property-based testing
    └─────┬──────┘       (never blocks pipeline, informational only)
          │
          ▼
    ┌───────────┐
    │ Benchmark │     ── Runs 1000x, rejects if > 5 ms/call
    └─────┬─────┘
          │
          ▼
    ┌──────────┐
    │ Security │      ── Detects eval/exec/hardcoded secrets
    └─────┬────┘         (own retry counter, max 2)
          │
          ▼
   ┌────────────┐
   │ Complexity │     ── Line count + nesting depth + LLM score/10
   └──────┬─────┘        (own retry counter, max 2)
          │
          ▼
 ┌─────────────────┐
 │ Self Reflection │  ── Agent scores own confidence 1-10
 └────────┬────────┘     Rewrites if confidence < 7
          │
          ▼
    ┌──────────┐
    │ Reviewer │      ── Polishes + docstrings + type hints
    └─────┬────┘
          │
          ▼
    ┌───────────┐
    │ Explainer │     ── Writes human-readable explanation
    └─────┬─────┘
          │
          ▼
       OUTPUT
Final Code + Explanation
```

---

## 📁 Project Structure

```
autonomous-coding-agent/
├── app.py            ← Streamlit UI
├── main.py           ← Graph builder + entry point
├── state.py          ← Shared TypedDict state (whiteboard)
├── nodes.py          ← All 13 node functions + LLM + ChromaDB
├── edges.py          ← All 7 conditional route functions
├── requirements.txt  ← Dependencies
└── README.md
```

---

## ⚡ Run Locally

### Prerequisites

- Python 3.11+
- Groq API key (free at [console.groq.com](https://console.groq.com))

### Step 1: Clone the repo

```bash
git clone https://github.com/krishpatel/autonomous-coding-agent.git
cd autonomous-coding-agent
```

### Step 2: Create virtual environment

```bash
python -m venv venv

# Mac/Linux
source venv/bin/activate

# Windows
venv\Scripts\activate
```

### Step 3: Install dependencies

```bash
pip install -r requirements.txt
```

### Step 4: Set your API key

```bash
# Mac/Linux
export GROQ_API_KEY=your_groq_api_key_here

# Windows
set GROQ_API_KEY=your_groq_api_key_here
```

Or create a `.env` file:

```bash
echo "GROQ_API_KEY=your_groq_api_key_here" > .env
```

### Step 5: Run CLI (no UI)

```bash
python main.py
```

### Step 6: Run Streamlit UI

```bash
streamlit run app.py
```

Open [http://localhost:8501](http://localhost:8501) in your browser.

---

## 🐳 Run with Docker (optional)

```dockerfile
# Dockerfile
FROM python:3.11-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install -r requirements.txt
COPY . .
EXPOSE 8501
CMD ["streamlit", "run", "app.py", "--server.port=8501"]
```

```bash
# Build
docker build -t coding-agent .

# Run
docker run -e GROQ_API_KEY=your_key -p 8501:8501 coding-agent
```

---

## 🌐 Deploy to Hugging Face Spaces

```bash
# Install HF CLI
pip install huggingface_hub

# Login
huggingface-cli login

# Create space and push
huggingface-cli repo create autonomous-coding-agent --type space --space_sdk streamlit
git remote add hf https://huggingface.co/spaces/YOUR_USERNAME/autonomous-coding-agent
git push hf main
```

Then add your secret in HF Spaces Settings:

```
GROQ_API_KEY = your_key_here
```

---

## 🛠️ Tech Stack

```
LangGraph   - Stateful multi-agent graph orchestration
Groq API    - LLM inference (Llama 3.1 8B Instant)
ChromaDB    - Vector database for bug fix memory
Hypothesis  - Property-based stress testing
Streamlit   - Production UI
subprocess  - Sandboxed, isolated code execution
ast         - Static code analysis without execution
hashlib     - Deterministic ChromaDB IDs
importlib   - Real-time import hallucination detection
```

---

## 💡 Key Engineering Decisions

### Why LangGraph over plain LangChain?

LangGraph handles **cyclic workflows**: when tests fail, the agent loops back through the debugger and restarts verification from the AST step. LangChain's linear chains can't express this cleanly.

### Why AST validation before running?

Running broken code wastes subprocess time. AST parsing catches syntax errors in **milliseconds** without executing anything, like a proofreader checking spelling before printing.

### Why Hypothesis for testing?

Hand-written tests only cover the cases you think of. Hypothesis **auto-generates 500+ random inputs** and verifies properties that should always hold, which catches edge cases a human is unlikely to write by hand.

### Why separate retry counters per node?

With one shared counter, three security failures exhausted the retry budget and killed the entire pipeline before the debugger got its attempts. Separate counters for security and complexity let each node fail independently without blocking the others.

### Why hashlib instead of Python's hash()?

Python's built-in `hash()` is **randomized every session** for strings (hash randomization, a security feature). Same error → different ChromaDB ID → the agent can never retrieve past fixes. `hashlib.md5` is deterministic across all sessions.

### Why combined Reviewer + Explainer?

Two separate LLM calls for polishing and explaining wasted ~8 seconds. One combined call with structured output (`FINAL_CODE:` / `EXPLANATION:`) saves an entire API round trip.

---

## 🐛 Real Bugs Found and Fixed

**Bug 1: False Positive in Tester**

`returncode == 0` doesn't mean the function was called. A file that only defines functions exits successfully but prints nothing. Fixed by checking that `stdout` is not empty after a successful run.

**Bug 2: ChromaDB Hash Randomization**

Python's `hash()` is session-randomized for strings. Same bug → different ID every run → memory retrieval never works. Fixed with `hashlib.md5().hexdigest()[:8]` for deterministic cross-session IDs.

**Bug 3: Python 3.11 F-string Backslash**

Before Python 3.12, backslashes are not allowed inside f-string expressions, so Python 3.11 rejects them. The benchmark node embedded code inside f-strings. Fixed by using string concatenation instead.

**Bug 4: Shared Retry Counter**

One `retries` counter shared across all nodes caused security/complexity failures to consume the debugger's retry budget. Fixed by adding `security_retries` and `complexity_retries` as independent counters.

---

## 🔑 Environment Variables

| Variable | Required | Description |
|---|---|---|
| `GROQ_API_KEY` | ✅ Yes | Get free at console.groq.com |
| `GITHUB_TOKEN` | ❌ No | Only needed for AutoReview AI project |

---

## 📝 Resume Line

> **Autonomous Python Coding Agent** | LangGraph · Groq · ChromaDB · Streamlit
>
> Built a 13-node self-healing pipeline with 5-layer verification: AST validation, auto-generated tests, Hypothesis property testing (500+ random inputs), security audit, and self-reflection confidence scoring. ChromaDB vector memory enables cross-session bug fix learning. Deployed on Hugging Face Spaces.

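
---

## 🧪 Sketch: Deterministic Memory IDs

To illustrate the fix described under Bug 2, here is a minimal sketch of deriving a stable ID from an error message with `hashlib.md5`. The helper name `memory_id` and the sample error string are assumptions made for illustration, not code taken from this repo; only the `hashlib.md5().hexdigest()[:8]` pattern comes from the text above. Because the ID depends only on the input bytes, the same error maps to the same ID in every Python process, which is what a lookup keyed on that ID needs.

```python
import hashlib


def memory_id(error_text: str, length: int = 8) -> str:
    """Return a short hex ID that is identical in every Python process.

    Hypothetical helper: the name and signature are illustrative only.
    """
    # md5 depends only on the bytes passed in, unlike the built-in hash(),
    # whose result for strings changes between interpreter sessions.
    digest = hashlib.md5(error_text.encode("utf-8")).hexdigest()
    return digest[:length]


if __name__ == "__main__":
    # Sample error message (assumed); rerunning the script prints the same ID.
    print(memory_id("NameError: name 'x' is not defined"))
```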

---

## 👨‍💻 Author

**Krish Patel**, AI Engineer

[GitHub](https://github.com/krishpatel) · [LinkedIn](https://linkedin.com/in/krishpatel) · [Live Demo](https://huggingface.co/spaces/krishpatel/autonomous-coding-agent)

---

*Built as part of AI Engineer internship portfolio, Bangalore, 2026*