# 🤖 Autonomous Python Coding Agent

> **A production-grade, self-healing multi-agent pipeline that doesn't just generate Python code: it autonomously writes, validates, tests, secures, benchmarks, and reflects on its own output before shipping.**

[Python](https://python.org) · [LangGraph](https://github.com/langchain-ai/langgraph) · [Groq](https://groq.com) · [ChromaDB](https://chromadb.com) · [Streamlit](https://streamlit.io) · [License](LICENSE) · [Hugging Face Space](https://huggingface.co/spaces/krishpatel/autonomous-coding-agent)

---

## 🚀 Live Demo

**[▶ Try it on Hugging Face Spaces](https://huggingface.co/spaces/krishpatel/autonomous-coding-agent)**

---

## 🔥 What makes this different from just using ChatGPT?

| Feature | ChatGPT / Basic Agent | This Agent |
|---|---|---|
| Code generation | ✅ | ✅ |
| Syntax validation | ❌ Run and hope | ✅ AST parse before running |
| Test cases | ❌ Manual | ✅ Auto-generated by agent |
| Stress testing | ❌ | ✅ 500+ random inputs via Hypothesis |
| Memory | ❌ Stateless | ✅ ChromaDB learns from past bugs |
| Security audit | ❌ | ✅ Detects `eval`, `exec`, hardcoded keys |
| Performance check | ❌ | ✅ Benchmarks 1000 runs, rejects slow code |
| Self-review | ❌ | ✅ Agent scores own confidence 1-10 |
| Self-healing | ❌ | ✅ Loops back and fixes failures automatically |
| Separate retry counters | ❌ | ✅ Per-node counters prevent pipeline blockage |

---

## 📊 Key Metrics

| Metric | Value |
|---|---|
| Pipeline nodes | 13 |
| Verification layers | 5 (AST → Tests → Hypothesis → Security → Complexity) |
| Max retries (debugger) | 3 |
| Max retries (security, complexity) | 2 each, independent counters |
| Hypothesis test cases | 500+ random inputs per run |
| Benchmark iterations | 1,000 runs |
| Performance threshold | < 5 ms per call |
| Memory backend | ChromaDB vector similarity search |
| LLM | Llama 3.1 8B Instant via Groq |
| Avg pipeline runtime | ~20–40 seconds |
| Lines of code | ~600 across 5 files |
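
The benchmark figures in the table (1,000 iterations, under 5 ms per call) can be checked with the standard library alone. The sketch below is an illustration, not the project's benchmark node: `benchmark` and `is_fast_enough` are hypothetical names, and it assumes `timeit` is an acceptable way to time the calls.

```python
import timeit

RUNS = 1_000          # benchmark iterations from the metrics table
THRESHOLD_MS = 5.0    # performance threshold per call


def benchmark(func, *args, runs: int = RUNS) -> float:
    """Return the average wall-clock time of func(*args) in milliseconds."""
    total_seconds = timeit.timeit(lambda: func(*args), number=runs)
    return total_seconds / runs * 1000


def is_fast_enough(avg_ms: float, threshold_ms: float = THRESHOLD_MS) -> bool:
    """Accept code only if its average call time is under the threshold."""
    return avg_ms < threshold_ms


if __name__ == "__main__":
    avg = benchmark(sorted, list(range(100)))
    print(f"avg {avg:.4f} ms/call, accepted: {is_fast_enough(avg)}")
```

Averaging over many runs smooths out scheduler noise, which matters when the threshold is only a few milliseconds.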

---

## 🏗️ Architecture: 13-Node Pipeline

```
User Input (Python Task)
          │
          ▼
     ┌─────────┐
     │ Planner │ ── Breaks task into blueprint
     └────┬────┘
          │
          ▼
      ┌───────┐
      │ Coder │ ── Writes code using plan + ChromaDB memory
      └───┬───┘
          │
          ▼
  ┌───────────────┐
  │ AST Validator │ ── Syntax + hallucinated imports + type hints
  └───────┬───────┘    (no execution needed, runs in milliseconds)
          │
     Pass │ Fail ──► Debugger ──► back to AST
          ▼
 ┌────────────────┐
 │ Test Generator │ ── Auto-generates pytest-style test cases
 └────────┬───────┘
          │
          ▼
     ┌────────┐
     │ Tester │ ── Runs code + generated tests in sandbox
     └────┬───┘
          │
     Pass │ Fail ──► Debugger (max 3 retries)
          ▼
   ┌────────────┐
   │ Hypothesis │ ── 500+ random inputs, property-based testing
   └──────┬─────┘    (never blocks pipeline, informational only)
          │
          ▼
    ┌───────────┐
    │ Benchmark │ ── Runs 1000x, rejects if > 5ms/call
    └─────┬─────┘
          │
          ▼
    ┌──────────┐
    │ Security │ ── Detects eval/exec/hardcoded secrets
    └─────┬────┘    (own retry counter, max 2)
          │
          ▼
   ┌────────────┐
   │ Complexity │ ── Line count + nesting depth + LLM score/10
   └──────┬─────┘    (own retry counter, max 2)
          │
          ▼
 ┌─────────────────┐
 │ Self Reflection │ ── Agent scores own confidence 1-10
 └────────┬────────┘    Rewrites if confidence < 7
          │
          ▼
    ┌──────────┐
    │ Reviewer │ ── Polishes + docstrings + type hints
    └─────┬────┘
          │
          ▼
    ┌───────────┐
    │ Explainer │ ── Writes human-readable explanation
    └─────┬─────┘
          │
          ▼
        OUTPUT
Final Code + Explanation
```
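
To illustrate how the Security node in the diagram could work without executing anything, the sketch below walks the AST and flags direct calls to `eval`/`exec` and string literals assigned to secret-looking names. The function name `audit_source` and the name pattern are assumptions made for this example, not taken from `nodes.py`.

```python
import ast
import re

DANGEROUS_CALLS = {"eval", "exec"}
# Assumed heuristic: variable names that usually hold credentials.
SECRET_NAME = re.compile(r"(api_?key|secret|token|password)", re.IGNORECASE)


def audit_source(source: str) -> list[str]:
    """Return a list of human-readable findings for the given source code."""
    findings = []
    for node in ast.walk(ast.parse(source)):
        # Direct calls such as eval(...) or exec(...)
        if (isinstance(node, ast.Call)
                and isinstance(node.func, ast.Name)
                and node.func.id in DANGEROUS_CALLS):
            findings.append(f"line {node.lineno}: call to {node.func.id}()")
        # Assignments like API_KEY = "..." with a string literal on the right
        elif (isinstance(node, ast.Assign)
                and isinstance(node.value, ast.Constant)
                and isinstance(node.value.value, str)):
            for target in node.targets:
                if isinstance(target, ast.Name) and SECRET_NAME.search(target.id):
                    findings.append(
                        f"line {node.lineno}: hardcoded secret in {target.id}"
                    )
    return findings
```

A non-empty result would be the signal to send the code back for a fix, using the Security node's own retry budget.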

---

## 📁 Project Structure

```
autonomous-coding-agent/
├── app.py              ← Streamlit UI
├── main.py             ← Graph builder + entry point
├── state.py            ← Shared TypedDict state (whiteboard)
├── nodes.py            ← All 13 node functions + LLM + ChromaDB
├── edges.py            ← All 7 conditional route functions
├── requirements.txt    ← Dependencies
└── README.md
```

---

## ⚡ Run Locally

### Prerequisites

- Python 3.11+
- Groq API key (free at [console.groq.com](https://console.groq.com))

### Step 1: Clone the repo

```bash
git clone https://github.com/krishpatel/autonomous-coding-agent.git
cd autonomous-coding-agent
```

### Step 2: Create a virtual environment

```bash
python -m venv venv
# Mac/Linux
source venv/bin/activate
# Windows
venv\Scripts\activate
```

### Step 3: Install dependencies

```bash
pip install -r requirements.txt
```

### Step 4: Set your API key

```bash
# Mac/Linux
export GROQ_API_KEY=your_groq_api_key_here
# Windows
set GROQ_API_KEY=your_groq_api_key_here
```

Or create a `.env` file:

```bash
echo "GROQ_API_KEY=your_groq_api_key_here" > .env
```

### Step 5: Run the CLI (no UI)

```bash
python main.py
```

### Step 6: Run the Streamlit UI

```bash
streamlit run app.py
```

Open [http://localhost:8501](http://localhost:8501) in your browser.

---

## 🐳 Run with Docker (optional)

```dockerfile
# Dockerfile
FROM python:3.11-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install -r requirements.txt
COPY . .
EXPOSE 8501
CMD ["streamlit", "run", "app.py", "--server.port=8501"]
```

```bash
# Build
docker build -t coding-agent .
# Run
docker run -e GROQ_API_KEY=your_key -p 8501:8501 coding-agent
```

---

## 🚀 Deploy to Hugging Face Spaces

```bash
# Install HF CLI
pip install huggingface_hub
# Login
huggingface-cli login
# Create space and push
huggingface-cli repo create autonomous-coding-agent --type space --space_sdk streamlit
git remote add hf https://huggingface.co/spaces/YOUR_USERNAME/autonomous-coding-agent
git push hf main
```

Then add your secret in the Space's Settings:

```
GROQ_API_KEY = your_key_here
```

---

## 🛠️ Tech Stack

```
LangGraph     → Stateful multi-agent graph orchestration
Groq API      → LLM inference (Llama 3.1 8B Instant)
ChromaDB      → Vector database for bug fix memory
Hypothesis    → Property-based stress testing
Streamlit     → Production UI
subprocess    → Sandboxed isolated code execution
ast           → Static code analysis without execution
hashlib       → Deterministic ChromaDB IDs
importlib     → Real-time import hallucination detection
```

---

## 💡 Key Engineering Decisions

### Why LangGraph over plain LangChain?

LangGraph handles **cyclic workflows**: when tests fail, the agent loops back through the debugger and restarts verification from the AST check. LangChain's linear chains can't express this loop cleanly.

### Why AST validation before running?

Running broken code wastes subprocess time. AST parsing catches syntax errors in **milliseconds** without execution, like a proofreader checking spelling before printing.
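
A minimal sketch of that pre-flight check, assuming the validator only needs a syntax check plus a lookup of imported top-level modules. `validate_source` is a hypothetical name, and the project's AST node may check more (type hints, for instance).

```python
import ast
import importlib.util


def validate_source(source: str) -> list[str]:
    """Check syntax and imports without executing the code."""
    try:
        tree = ast.parse(source)
    except SyntaxError as exc:
        return [f"syntax error on line {exc.lineno}: {exc.msg}"]

    errors = []
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            modules = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            modules = [node.module]
        else:
            continue
        for name in modules:
            # Only the top-level package is looked up, so nothing is imported.
            top_level = name.split(".")[0]
            if importlib.util.find_spec(top_level) is None:
                errors.append(f"line {node.lineno}: module '{top_level}' not found")
    return errors
```

An empty list means the code is safe to hand to the tester. Anything else can go straight to the debugger without spawning a subprocess.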

### Why Hypothesis for testing?

Hand-written tests only cover the cases you think of. Hypothesis **auto-generates 500+ random inputs** and verifies properties that should always hold, catching edge cases a human is unlikely to write.
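
As an example of what a property looks like, the sketch below tests a small stand-in function. `dedupe_sorted` and its three properties are invented for illustration. In the pipeline, the function under test would be the generated code.

```python
from hypothesis import given, settings, strategies as st


def dedupe_sorted(values: list[int]) -> list[int]:
    """Example function under test: sorted list without duplicates."""
    return sorted(set(values))


@settings(max_examples=500)
@given(st.lists(st.integers()))
def test_dedupe_sorted_properties(values):
    result = dedupe_sorted(values)
    assert result == sorted(result)          # output is ordered
    assert len(result) == len(set(result))   # no duplicates remain
    assert set(result) == set(values)        # no element lost or invented


if __name__ == "__main__":
    # Calling the decorated function runs all generated examples.
    test_dedupe_sorted_properties()
```

`max_examples=500` matches the 500 inputs per run quoted above. When a property fails, Hypothesis reports a minimized counterexample.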

### Why separate retry counters per node?

With a single shared counter, three security failures used up the whole retry budget and ended the pipeline before the debugger got its attempts. Separate counters for security and complexity let each node fail independently without blocking the others.
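
The idea can be sketched with plain routing functions over a shared state dictionary. The counter names `retries` and `security_retries` come from this README. The other field names, the node names returned, and the behavior once a budget is exhausted are assumptions for illustration, not the contents of `edges.py`.

```python
from typing import TypedDict

MAX_DEBUG_RETRIES = 3
MAX_SECURITY_RETRIES = 2


class AgentState(TypedDict):
    tests_passed: bool
    security_passed: bool
    retries: int            # debugger budget
    security_retries: int   # security budget, counted separately


def route_after_tester(state: AgentState) -> str:
    """Pick the next node after the tester runs."""
    if state["tests_passed"]:
        return "hypothesis"
    if state["retries"] < MAX_DEBUG_RETRIES:
        return "debugger"
    return "end"


def route_after_security(state: AgentState) -> str:
    """Security failures only consume the security budget."""
    if state["security_passed"]:
        return "complexity"
    if state["security_retries"] < MAX_SECURITY_RETRIES:
        return "debugger"
    return "complexity"  # assumption: stop retrying and move on
```

Because each router reads only its own counter, exhausting the security budget leaves the debugger's three attempts untouched.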

### Why hashlib instead of Python's hash()?

Python's built-in `hash()` is **salted per process** for strings (unless `PYTHONHASHSEED` is fixed), so the same error text maps to a different ChromaDB ID in every session and the agent can never retrieve past fixes. `hashlib.md5` is deterministic across all sessions.
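
A minimal sketch of a deterministic ID, using the 8-character MD5 prefix this README mentions. The function name `memory_id` and the choice to hash the raw error text are assumptions.

```python
import hashlib


def memory_id(error_text: str) -> str:
    """Return a short ID that is identical across processes and sessions."""
    digest = hashlib.md5(error_text.encode("utf-8")).hexdigest()
    return digest[:8]


if __name__ == "__main__":
    print(memory_id("NameError: name 'x' is not defined"))
```

Eight hex characters give 32 bits, which is enough for a small bug memory. A longer prefix would lower the collision risk if the collection grows large.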

### Why combined Reviewer + Explainer?

Two separate LLM calls for polishing and explaining wasted about 8 seconds. One combined call with structured output (`FINAL_CODE:` / `EXPLANATION:`) saves an entire API round trip.
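
Splitting such a combined reply takes only a few lines. The sketch below assumes the two markers each appear once, with the code first. `split_review` is a hypothetical helper, not a function from `nodes.py`.

```python
def split_review(response: str) -> tuple[str, str]:
    """Split one LLM reply into (final_code, explanation)."""
    code_marker, text_marker = "FINAL_CODE:", "EXPLANATION:"
    if code_marker not in response or text_marker not in response:
        raise ValueError("response is missing FINAL_CODE: or EXPLANATION:")
    # Everything after the first marker, then cut at the second marker.
    after_code = response.split(code_marker, 1)[1]
    code, explanation = after_code.split(text_marker, 1)
    return code.strip(), explanation.strip()
```

Raising on a malformed reply lets the caller retry the LLM call instead of shipping an empty explanation.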

---

## 🐛 Real Bugs Found and Fixed

**Bug 1: False Positive in Tester**

`returncode == 0` doesn't mean the function was called. A file that only defines functions exits successfully but prints nothing. Fixed by also checking that `stdout` is non-empty after a successful run.
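
A sketch of the stricter check, assuming the candidate code is written to a temporary file and run with the current interpreter. `run_candidate` and its messages are hypothetical.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def run_candidate(source: str, timeout: float = 10.0) -> tuple[bool, str]:
    """Run source in a separate interpreter; pass only on exit 0 AND output."""
    with tempfile.TemporaryDirectory() as tmp:
        script = Path(tmp) / "candidate.py"
        script.write_text(source, encoding="utf-8")
        proc = subprocess.run(
            [sys.executable, str(script)],
            capture_output=True, text=True, timeout=timeout,
        )
    if proc.returncode != 0:
        return False, proc.stderr
    if not proc.stdout.strip():
        # Exit code 0 with no output: the function was probably never called.
        return False, "exit code 0 but no output"
    return True, proc.stdout
```

A file containing only `def add(a, b): ...` now fails the check, while the same file followed by `print(add(2, 3))` passes.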

**Bug 2: ChromaDB Hash Randomization**

Python's `hash()` is randomized per session for strings. Same bug → different ID every run → memory retrieval never works. Fixed with `hashlib.md5().hexdigest()[:8]` for deterministic cross-session IDs.

**Bug 3: Python 3.11 F-string Backslash**

Python versions before 3.12 don't allow backslashes inside f-string expressions. The benchmark node embedded code inside f-strings. Fixed by using string concatenation instead.
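
The restriction and the workaround look roughly like this. The variable names are illustrative, and the commented-out line shows the failing pattern in general, not the project's exact code.

```python
lines = ["def f():", "    return 1"]

# Before Python 3.12 this is a SyntaxError, because the expression part
# of an f-string may not contain a backslash:
#     script = f"{'\n'.join(lines)}\nprint(f())"

# Portable alternative: build the pieces outside the f-string and concatenate.
body = "\n".join(lines)
script = body + "\n" + "print(f())"
print(script)
```

Moving the `"\n".join(...)` into its own variable also works inside an f-string, since the backslash then sits outside the braces.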

**Bug 4: Shared Retry Counter**

A single `retries` counter shared across all nodes let security and complexity failures consume the debugger's retry budget. Fixed by adding `security_retries` and `complexity_retries` as independent counters.

---

## 🔑 Environment Variables

| Variable | Required | Description |
|---|---|---|
| `GROQ_API_KEY` | ✅ Yes | Get free at console.groq.com |
| `GITHUB_TOKEN` | ❌ No | Only needed for AutoReview AI project |

---

## 📄 Resume Line

> **Autonomous Python Coding Agent** | LangGraph · Groq · ChromaDB · Streamlit
>
> Built a 13-node self-healing pipeline with 5-layer verification: AST validation, auto-generated tests, Hypothesis property testing (500+ random inputs), security audit, and self-reflection confidence scoring. ChromaDB vector memory enables cross-session bug fix learning. Deployed on Hugging Face Spaces.

---

## 👨‍💻 Author

**Krish Patel**, AI Engineer

[GitHub](https://github.com/krishpatel) · [LinkedIn](https://linkedin.com/in/krishpatel) · [Live Demo](https://huggingface.co/spaces/krishpatel/autonomous-coding-agent)

---

*Built as part of AI Engineer internship portfolio, Bangalore, 2026*