Spaces: Krishp1/Autonomous-Coding-Agent

Krishp1 committed 25 days ago

Commit 0eebcd6 (verified) · 1 parent: 337c80a

Upload 7 files

Files changed (7)

README-3.md +329 -0
app.py +271 -0
edges.py +57 -0
main.py +124 -0
nodes.py +532 -0
requirements.txt +6 -3
state.py +28 -0

README-3.md ADDED

	@@ -0,0 +1,329 @@

+# 🤖 Autonomous Python Coding Agent
+> **A production-grade, self-healing multi-agent pipeline that doesn't just generate Python code — it autonomously writes, validates, tests, secures, benchmarks, and reflects on its own output before shipping.**
+[![Python](https://img.shields.io/badge/Python-3.11-blue?style=flat-square&logo=python)](https://python.org)
+[![LangGraph](https://img.shields.io/badge/LangGraph-0.2.0-green?style=flat-square)](https://github.com/langchain-ai/langgraph)
+[![Groq](https://img.shields.io/badge/Groq-Llama%203.1-orange?style=flat-square)](https://groq.com)
+[![ChromaDB](https://img.shields.io/badge/ChromaDB-0.5.0-purple?style=flat-square)](https://chromadb.com)
+[![Streamlit](https://img.shields.io/badge/Streamlit-1.35-red?style=flat-square)](https://streamlit.io)
+[![License](https://img.shields.io/badge/License-MIT-lightgrey?style=flat-square)](LICENSE)
+[![Live Demo](https://img.shields.io/badge/🤗%20Live%20Demo-HuggingFace-yellow?style=flat-square)](https://huggingface.co/spaces/krishpatel/autonomous-coding-agent)
+---
+## 🚀 Live Demo
+**[▶ Try it on Hugging Face Spaces](https://huggingface.co/spaces/krishpatel/autonomous-coding-agent)**
+---
+## 📸 Demo
+![Agent Demo](demo.gif)
+---
+## 🔥 What makes this different from just using ChatGPT?
+| Feature | ChatGPT / Basic Agent | This Agent |
+|---|---|---|
+| Code generation | ✅ | ✅ |
+| Syntax validation | ❌ Run and hope | ✅ AST parse before running |
+| Test cases | ❌ Manual | ✅ Auto-generated by agent |
+| Stress testing | ❌ | ✅ 500+ random inputs via Hypothesis |
+| Memory | ❌ Stateless | ✅ ChromaDB learns from past bugs |
+| Security audit | ❌ | ✅ Detects eval, exec, hardcoded keys |
+| Performance check | ❌ | ✅ Benchmarks 1000 runs, rejects slow code |
+| Self-review | ❌ | ✅ Agent scores own confidence 1-10 |
+| Self-healing | ❌ | ✅ Loops back and fixes failures automatically |
+| Separate retry counters | ❌ | ✅ Per-node counters prevent pipeline blockage |
+---
+## 📊 Key Metrics
+| Metric | Value |
+|---|---|
+| Pipeline nodes | 13 |
+| Verification layers | 5 (AST → Tests → Hypothesis → Security → Complexity) |
+| Max retries (debugger) | 3 |
+| Max retries (security, complexity) | 2 each — independent counters |
+| Hypothesis test cases | 500+ random inputs per run |
+| Benchmark iterations | 1,000 runs |
+| Performance threshold | < 5ms per call |
+| Memory backend | ChromaDB vector similarity search |
+| LLM | Llama 3.1 8B Instant via Groq |
+| Avg pipeline runtime | ~20–40 seconds |
+| Lines of code | ~1,000 across 5 files |
+---
+## 🏗️ Architecture — 13-Node Pipeline
+```
+User Input (Python Task)
+         │
+         ▼
+    ┌─────────┐
+    │ Planner │ ── Breaks task into blueprint
+    └────┬────┘
+         │
+         ▼
+    ┌───────┐
+    │ Coder │ ── Writes code using plan + ChromaDB memory
+    └────┬──┘
+         │
+         ▼
+    ┌───────────────┐
+    │ AST Validator │ ── Syntax + hallucinated imports + type hints
+    └──────┬────────┘    (no execution needed — milliseconds)
+           │
+      Pass │   Fail ──► Debugger ──► back to AST
+           ▼
+┌────────────────┐
+│ Test Generator │ ── Auto-generates pytest-style test cases
+└───────┬────────┘
+        │
+        ▼
+    ┌────────┐
+    │ Tester │ ── Runs code + generated tests in sandbox
+    └───┬────┘
+        │
+   Pass │   Fail ──► Debugger (max 3 retries)
+        ▼
+┌────────────┐
+│ Hypothesis │ ── 500+ random inputs, property-based testing
+└─────┬──────┘    (never blocks pipeline — informational only)
+      │
+      ▼
+┌───────────┐
+│ Benchmark │ ── Runs 1000x, rejects if > 5ms/call
+└─────┬─────┘
+      │
+      ▼
+┌──────────┐
+│ Security │ ── Detects eval/exec/hardcoded secrets
+└─────┬────┘    (own retry counter — max 2)
+      │
+      ▼
+┌────────────┐
+│ Complexity │ ── Line count + nesting depth + LLM score/10
+└──────┬─────┘    (own retry counter — max 2)
+       │
+       ▼
+┌─────────────────┐
+│ Self Reflection │ ── Agent scores own confidence 1-10
+└────────┬────────┘    Rewrites if confidence < 7
+         │
+         ▼
+    ┌──────────┐
+    │ Reviewer │ ── Polishes + docstrings + type hints
+    └─────┬────┘
+          │
+          ▼
+    ┌──────────┐
+    │Explainer │ ── Writes human-readable explanation
+    └─────┬────┘
+          │
+          ▼
+       OUTPUT
+  Final Code + Explanation
+```
+---
+## 📁 Project Structure
+```
+autonomous-coding-agent/
+├── app.py              ← Streamlit UI
+├── main.py             ← Graph builder + entry point
+├── state.py            ← Shared TypedDict state (whiteboard)
+├── nodes.py            ← All 13 node functions + LLM + ChromaDB
+├── edges.py            ← All 7 conditional route functions
+├── requirements.txt    ← Dependencies
+└── README.md
+```
+---
+## ⚡ Run Locally
+### Prerequisites
+- Python 3.11+
+- Groq API key — get free at [console.groq.com](https://console.groq.com)
+### Step 1 — Clone the repo
+```bash
+git clone https://github.com/krishpatel/autonomous-coding-agent.git
+cd autonomous-coding-agent
+```
+### Step 2 — Create virtual environment
+```bash
+python -m venv venv
+# Mac/Linux
+source venv/bin/activate
+# Windows
+venv\Scripts\activate
+```
+### Step 3 — Install dependencies
+```bash
+pip install -r requirements.txt
+```
+### Step 4 — Set your API key
+```bash
+# Mac/Linux
+export GROQ_API_KEY=your_groq_api_key_here
+# Windows
+set GROQ_API_KEY=your_groq_api_key_here
+```
+Or create a `.env` file:
+```bash
+echo "GROQ_API_KEY=your_groq_api_key_here" > .env
+```
+### Step 5 — Run CLI (no UI)
+```bash
+python main.py
+```
+### Step 6 — Run Streamlit UI
+```bash
+streamlit run app.py
+```
+Open [http://localhost:8501](http://localhost:8501) in your browser.
+---
+## 🐳 Run with Docker (optional)
+```dockerfile
+# Dockerfile
+FROM python:3.11-slim
+WORKDIR /app
+COPY requirements.txt .
+RUN pip install -r requirements.txt
+COPY . .
+EXPOSE 8501
+CMD ["streamlit", "run", "app.py", "--server.port=8501"]
+```
+```bash
+# Build
+docker build -t coding-agent .
+# Run
+docker run -e GROQ_API_KEY=your_key -p 8501:8501 coding-agent
+```
+---
+## 🌐 Deploy to Hugging Face Spaces
+```bash
+# Install HF CLI
+pip install huggingface_hub
+# Login
+huggingface-cli login
+# Create space and push
+huggingface-cli repo create autonomous-coding-agent --type space --space_sdk streamlit
+git remote add hf https://huggingface.co/spaces/YOUR_USERNAME/autonomous-coding-agent
+git push hf main
+```
+Then add your secret in HF Spaces Settings:
+```
+GROQ_API_KEY = your_key_here
+```
+---
+## 🛠️ Tech Stack
+```
+LangGraph    — Stateful multi-agent graph orchestration
+Groq API     — LLM inference (Llama 3.1 8B Instant)
+ChromaDB     — Vector database for bug fix memory
+Hypothesis   — Property-based stress testing
+Streamlit    — Production UI
+subprocess   — Runs generated code in a separate process with a timeout
+ast          — Static code analysis without execution
+hashlib      — Deterministic ChromaDB IDs
+importlib    — Real-time import hallucination detection
+```
+---
+## 💡 Key Engineering Decisions
+### Why LangGraph over plain LangChain?
+LangGraph handles **cyclic workflows** — when tests fail, the agent loops back through the debugger and restarts verification from AST. LangChain's linear chains can't do this cleanly.
+### Why AST validation before running?
+Running broken code wastes subprocess time. AST parsing catches syntax errors in **milliseconds** without execution — like a proofreader checking spelling before printing.
+### Why Hypothesis for testing?
+Hand-written tests only cover the cases you think of. Hypothesis **auto-generates 500+ random inputs** and verifies properties that should always hold, which catches edge cases a human is unlikely to write.
+### Why separate retry counters per node?
+With one shared counter, three security failures exhausted the retry budget and ended the pipeline before the debugger got its attempts. Separate counters for security and complexity let each node fail independently without blocking the others.
+### Why hashlib instead of Python's hash()?
+Python's `hash()` of a string is **salted per process** for security (unless `PYTHONHASHSEED` is fixed). Same error → different ChromaDB ID → the agent can never retrieve past fixes. `hashlib.md5` is deterministic across sessions.
+### Why combined Reviewer + Explainer?
+Two separate LLM calls for polishing and explaining wasted ~8 seconds. One combined call with structured output (`FINAL_CODE:` / `EXPLANATION:`) saves an entire API round trip.
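A minimal sketch of the `hash()` versus `hashlib` point above. The helper name `fix_id` and the choice to key on the raw error text are assumptions for illustration; the commit's own ID scheme may differ in detail.

```python
import hashlib


def fix_id(error_text: str) -> str:
    """Return a short ID that is identical in every Python process.

    Hypothetical helper: md5 is used here only as a stable fingerprint,
    not for any security purpose.
    """
    digest = hashlib.md5(error_text.encode("utf-8")).hexdigest()
    return digest[:8]  # eight hex characters, as in the README's bug note


if __name__ == "__main__":
    message = "NameError: name 'x' is not defined"
    # The built-in hash() of a str is salted per process unless
    # PYTHONHASHSEED is fixed, so only the hashlib value is reproducible.
    print("hashlib id:", fix_id(message))
    print("built-in hash:", hash(message))
```

Truncating to eight hex characters keeps IDs short at the cost of a small collision risk, which is probably acceptable for a cache of past fixes.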
+---
+## 🐛 Real Bugs Found and Fixed
+**Bug 1 — False Positive in Tester**
+`returncode == 0` doesn't mean the function was called. A file that only defines functions exits successfully but prints nothing. Fixed by checking that `stdout` is not empty after a successful run.
+**Bug 2 — ChromaDB Hash Randomization**
+Python's `hash()` is session-randomized. Same bug → different ID every run → memory retrieval never works. Fixed with `hashlib.md5().hexdigest()[:8]` for deterministic cross-session IDs.
+**Bug 3 — Python 3.11 F-string Backslash**
+Python versions before 3.12 (including 3.11) don't allow backslashes inside f-string expressions. The benchmark node embedded code inside f-strings. Fixed by using string concatenation instead.
+**Bug 4 — Shared Retry Counter**
+One `retries` counter shared across all nodes caused security/complexity failures to consume the debugger's retry budget. Fixed by adding `security_retries` and `complexity_retries` as independent counters.
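Bug 1 can be reproduced outside the pipeline. The sketch below is a stand-in for the tester node, not a copy of it: `run_snippet` is a hypothetical name, and it uses `sys.executable` rather than a bare `python` so the child process runs under the same interpreter.

```python
import subprocess
import sys


def run_snippet(code: str, timeout: float = 10.0) -> tuple[bool, str]:
    """Run code in a child interpreter and report (passed, detail).

    A zero exit code alone is not treated as success: the snippet must
    also print something, otherwise nothing was actually exercised.
    """
    try:
        proc = subprocess.run(
            [sys.executable, "-c", code],
            capture_output=True, text=True, timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return False, "timed out"
    if proc.returncode != 0:
        return False, proc.stderr.strip()
    if not proc.stdout.strip():
        return False, "exit code 0 but no output"
    return True, proc.stdout.strip()


if __name__ == "__main__":
    # Defines a function but never calls it: exits 0, prints nothing.
    print(run_snippet("def f():\n    return 1\n"))
    print(run_snippet("print(2 + 2)"))
```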
+---
+## 🔑 Environment Variables
+| Variable | Required | Description |
+|---|---|---|
+| `GROQ_API_KEY` | ✅ Yes | Get free at console.groq.com |
+| `GITHUB_TOKEN` | ❌ No | Only needed for AutoReview AI project |
+---
+## 📝 Resume Line
+> **Autonomous Python Coding Agent** | LangGraph · Groq · ChromaDB · Streamlit
+> Built a 13-node self-healing pipeline with 5-layer verification — AST validation, auto-generated tests, Hypothesis property testing (500+ random inputs), security audit, and self-reflection confidence scoring. ChromaDB vector memory enables cross-session bug fix learning. Deployed on Hugging Face Spaces.
+---
+## 👨‍💻 Author
+**Krish Patel** — AI Engineer
+[GitHub](https://github.com/krishpatel) · [LinkedIn](https://linkedin.com/in/krishpatel) · [Live Demo](https://huggingface.co/spaces/krishpatel/autonomous-coding-agent)
+---
+*Built as part of AI Engineer internship portfolio — Bangalore, 2026*
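The benchmark step described in this README (1,000 runs, reject above 5 ms per call) could be approximated with the standard library's `timeit`. This is a sketch under assumptions: `mean_ms_per_call` and `is_fast_enough` are hypothetical names, and the actual benchmark node is not shown in this part of the diff.

```python
import timeit
from typing import Callable


def mean_ms_per_call(func: Callable[[], object], runs: int = 1000) -> float:
    """Average wall-clock time of one call to func, in milliseconds."""
    total_seconds = timeit.timeit(func, number=runs)
    return total_seconds / runs * 1000.0


def is_fast_enough(func: Callable[[], object], limit_ms: float = 5.0,
                   runs: int = 1000) -> bool:
    """True when the mean time per call is below limit_ms."""
    return mean_ms_per_call(func, runs) < limit_ms


if __name__ == "__main__":
    def primes_up_to_100() -> list:
        return [n for n in range(2, 101)
                if all(n % d for d in range(2, int(n ** 0.5) + 1))]

    print(f"{mean_ms_per_call(primes_up_to_100):.4f} ms per call")
    print("fast enough:", is_fast_enough(primes_up_to_100))
```

Averaging over many runs smooths out scheduler noise, but the result still depends on the machine, so a fixed 5 ms limit is best read as a rough guard rather than a precise budget.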

app.py ADDED

	@@ -0,0 +1,271 @@

+# app.py — Streamlit UI for Autonomous Python Coding Agent
+import streamlit as st
+import os
+import sys
+st.set_page_config(
+    page_title="Autonomous Python Coding Agent",
+    page_icon="🤖",
+    layout="wide"
+)
+st.markdown("""
+<style>
+    .node-card {
+        border-radius: 10px;
+        padding: 12px 16px;
+        margin: 5px 0;
+        border-left: 4px solid #444;
+        font-size: 14px;
+        background: #1e2130;
+    }
+    .node-pass { border-left-color: #00cc88; }
+    .node-fail { border-left-color: #ff4444; }
+    .node-skip { border-left-color: #ffaa00; }
+    .title-grad {
+        background: linear-gradient(90deg, #00cc88, #0088ff);
+        -webkit-background-clip: text;
+        -webkit-text-fill-color: transparent;
+        font-size: 2.2rem;
+        font-weight: 800;
+    }
+</style>
+""", unsafe_allow_html=True)
+# ── HEADER ────────────────────────────────
+st.markdown('<p class="title-grad">🤖 Autonomous Python Coding Agent</p>', unsafe_allow_html=True)
+st.markdown("**13-node LangGraph pipeline** · AST Validation · Property Testing · Security Audit · Self Reflection")
+st.divider()
+# ── SIDEBAR ───────────────────────────────
+with st.sidebar:
+    st.markdown("### ⚙️ Pipeline Nodes")
+    st.markdown("""
+1. 📋 Planner
+2. 💻 Coder
+3. 🌳 AST Validator
+4. 🧬 Test Generator
+5. 🧪 Tester
+6. 🎲 Hypothesis
+7. ⚡ Benchmarker
+8. 🔧 Debugger
+9. 🔒 Security Auditor
+10. 📊 Complexity Judge
+11. 🪞 Self Reflection
+12. ✨ Reviewer
+13. 📖 Explainer
+    """)
+    st.divider()
+    st.markdown("### 🧠 What makes this different?")
+    st.markdown("""
+- **AST parsing** catches bugs before running
+- **Auto-generated tests** — no manual writing
+- **Hypothesis** generates 500+ random inputs
+- **ChromaDB memory** learns from past fixes
+- **Self-reflection** — agent critiques itself
+- **Separate retry counters** per node
+    """)
+    st.divider()
+    st.markdown("Built with `LangGraph` · `Groq` · `ChromaDB`")
+# ── HELPERS ───────────────────────────────
+def node_card(icon, name, status, detail=""):
+    cls   = {"pass": "node-pass", "fail": "node-fail", "skip": "node-skip"}.get(status, "node-skip")
+    emoji = {"pass": "✅", "fail": "❌", "skip": "⏭️"}.get(status, "⏳")
+    detail_html = f"<span style='color:#888;font-size:12px'> — {detail}</span>" if detail else ""
+    st.markdown(
+        f'<div class="node-card {cls}">{emoji} <b>{icon} {name}</b>{detail_html}</div>',
+        unsafe_allow_html=True
+    )
+def initial_state(task):
+    return {
+        "task":               task,
+        "plan":               "",
+        "code":               "",
+        "test_result":        "",
+        "error":              "",
+        "fixed_code":         "",
+        "explanation":        "",
+        "review":             "",
+        "final_code":         "",
+        "retries":            0,
+        "security_retries":   0,
+        "complexity_retries": 0,
+        "passed":             False,
+        "is_secure":          False,
+        "is_simple":          False,
+        "ast_valid":          False,
+        "generated_tests":    "",
+        "hypothesis_result":  "",
+        "benchmark_ms":       0.0,
+        "reflection_ok":      False,
+        "reflection_notes":   "",
+        "confidence_score":   0,
+    }
+# ── EXAMPLES ──────────────────────────────
+examples = [
+    "Write a Python function to find all prime numbers up to n",
+    "Write a Python function to find the second largest number in a list",
+    "Write a Python function to check if a string is a palindrome",
+    "Write a Python function to flatten a nested list",
+    "Write a Python function to find factorial using recursion",
+]
+def set_example_task(example_text):
+    # This updates the memory safely before the page redraws
+    st.session_state["task_input"] = example_text
+# ── INPUT ─────────────────────────────────
+col1, col2 = st.columns([3, 1])
+with col1:
+    task = st.text_area(
+        "🎯 Enter your Python task:",
+        placeholder="e.g. Write a Python function to find all prime numbers up to n",
+        height=100,
+        key="task_input"
+    )
+with col2:
+    st.markdown("**💡 Try an example:**")
+    for ex in examples:
+        # Instead of an 'if' statement, we attach the helper function to 'on_click'
+        st.button(
+            ex[:38] + "…",
+            key=ex,
+            use_container_width=True,
+            on_click=set_example_task, # Calls our helper function
+            args=(ex,)                 # Hands the example text to the helper function
+        )
+if "selected_task" in st.session_state and not task:
+    task = st.session_state["selected_task"]
+run_btn = st.button("▶ Run Agent", type="primary", use_container_width=True, disabled=not bool(task))
+# ── RUN ───────────────────────────────────
+if run_btn and task:
+    st.divider()
+    st.markdown("### 🔄 Pipeline Running...")
+    try:
+        sys.path.insert(0, os.path.dirname(os.path.abspath(__file__)))
+        from main import graph
+    except Exception as e:
+        st.error(f"❌ Could not import agent: {e}")
+        st.stop()
+    col_status, col_task = st.columns([1, 2])
+    with col_status:
+        st.markdown("#### 📊 Node Status")
+    with col_task:
+        st.markdown("#### 💬 Task")
+        st.info(task)
+    with st.spinner("🤖 Agent working... (~20-40 seconds)"):
+        try:
+            result  = graph.invoke(initial_state(task), {"recursion_limit": 50})
+            success = True
+        except Exception as e:
+            st.error(f"❌ Agent error: {e}")
+            success = False
+            result  = {}
+    if success and result:
+        final_code = result.get("final_code") or result.get("code", "")
+        bms        = result.get("benchmark_ms", 0.0)
+        conf       = result.get("confidence_score", 0)
+        hyp        = result.get("hypothesis_result", "")
+        passed     = result.get("passed", False)
+        secure     = result.get("is_secure", False)
+        simple     = result.get("is_simple", False)
+        refl_ok    = result.get("reflection_ok", False)
+        retries    = result.get("retries", 0)
+        # Node status
+        with col_status:
+            node_card("📋", "Planner",        "pass", "blueprint ready")
+            node_card("💻", "Coder",          "pass", f"{len(final_code.splitlines())} lines")
+            node_card("🌳", "AST Validator",  "pass")
+            node_card("🧬", "Test Generator", "pass", "tests created")
+            node_card("🧪", "Tester",         "pass" if passed else "skip",
+                      "passed" if passed else f"retried {retries}x")
+            # Anything other than an explicit ✅ (a warning or an empty result) is shown as skipped
+            hyp_status = "pass" if "✅" in hyp else "skip"
+            node_card("🎲", "Hypothesis",     hyp_status, hyp[:35] if hyp else "skipped")
+            node_card("⚡", "Benchmarker",    "pass" if bms > 0 else "skip",
+                      f"{bms:.1f}ms" if bms > 0 else "skipped")
+            node_card("🔒", "Security",       "pass" if secure else "skip",
+                      "passed" if secure else "warnings found")
+            node_card("📊", "Complexity",     "pass" if simple else "skip",
+                      "passed" if simple else "warnings found")
+            node_card("🪞", "Self Reflection","pass" if refl_ok else "skip",
+                      f"{conf}/10" if conf > 0 else "7/10 default")
+            node_card("✨", "Reviewer",       "pass", "polished")
+            node_card("📖", "Explainer",      "pass", "docs written")
+        # Metrics
+        st.divider()
+        st.markdown("### 📈 Results Summary")
+        m1, m2, m3, m4, m5 = st.columns(5)
+        m1.metric("🔄 Retries",    retries)
+        m2.metric("🔒 Secure",     "✅" if secure else "⚠️")
+        m3.metric("📊 Simple",     "✅" if simple else "⚠️")
+        m4.metric("🪞 Confidence", f"{conf}/10" if conf > 0 else "7/10")
+        m5.metric("⚡ Speed",      f"{bms:.1f}ms" if bms > 0 else "Skipped")
+        # Final code
+        st.divider()
+        st.markdown("### 💻 Final Code")
+        st.code(final_code, language="python")
+        # Expandable sections
+        with st.expander("📋 View Plan"):
+            st.markdown(result.get("plan", ""))
+        with st.expander("🧪 View Test Output"):
+            test_out = result.get("test_result", "")
+            st.code(test_out if test_out else "No test output captured", language="text")
+        with st.expander("🎲 Hypothesis Result"):
+            hyp_val = result.get("hypothesis_result", "")
+            if "✅" in hyp_val:
+                st.success(hyp_val)
+            elif "⚠️" in hyp_val:
+                st.warning(hyp_val)
+            else:
+                st.info("Hypothesis testing was skipped for this run.")
+        with st.expander("🪞 Self Reflection Notes"):
+            notes = result.get("reflection_notes", "")
+            st.info(notes if notes else "Agent approved code on first reflection.")
+        # Explanation
+        st.divider()
+        st.markdown("### 📖 Explanation")
+        explanation = result.get("explanation", "")
+        if explanation:
+            st.markdown(explanation)
+        else:
+            st.info("See the final code above.")
+        st.success("✅ Agent completed successfully!")
+elif run_btn and not task:
+    st.warning("⚠️ Please enter a task first!")
+# ── FOOTER ────────────────────────────────
+st.divider()
+st.markdown(
+    "<center style='color:#555'>Built by Krish Patel &nbsp;·&nbsp; "
+    "LangGraph + Groq + ChromaDB &nbsp;·&nbsp; "
+    "<a href='https://github.com' style='color:#00cc88'>GitHub</a></center>",
+    unsafe_allow_html=True
+)

edges.py ADDED

	@@ -0,0 +1,57 @@

+# edges.py — Conditional edge routing for Autonomous Python Coding Agent
+from langgraph.graph import END
+from state import State
+def route_after_ast(state: State) -> str:
+    if state["ast_valid"]:
+        return "test_generator"
+    if state["retries"] >= 3:
+        return "__end__"
+    return "debugger"
+def route_after_test(state: State) -> str:
+    if state["passed"]:
+        return "hypothesis"
+    if state["retries"] >= 3:
+        return "hypothesis"  # move forward after 3 tries
+    return "debugger"
+def route_after_hypothesis(state: State) -> str:
+    return "benchmark"  # never blocks pipeline
+def route_after_benchmark(state: State) -> str:
+    error = state.get("error", "")
+    if "too slow" in error.lower() or "optimize" in error.lower():
+        if state["retries"] >= 3:
+            return "security"
+        return "debugger"
+    return "security"
+def route_after_security(state: State) -> str:
+    if state["is_secure"]:
+        return "complexity"
+    if state["security_retries"] >= 2:
+        return "complexity"  # give up, move forward
+    return "coder"
+def route_after_complexity(state: State) -> str:
+    if state["is_simple"]:
+        return "reflection"
+    if state["complexity_retries"] >= 2:
+        return "reflection"  # give up, move forward
+    return "coder"
+def route_after_reflection(state: State) -> str:
+    if state["reflection_ok"]:
+        return "reviewer"
+    if state["retries"] >= 3:
+        return "reviewer"  # ship after 3 tries
+    return "coder"
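Because each route function only reads the state dict, the retry logic can be checked without building the graph. The sketch below re-declares `route_after_security` against a plain `dict` so it runs without LangGraph installed. The driver `simulate_security_loop` is hypothetical, as is its assumption that the security node increments `security_retries` on each failed audit; that node is not shown here.

```python
def route_after_security(state: dict) -> str:
    """Same branching as the route above, typed against a plain dict."""
    if state["is_secure"]:
        return "complexity"
    if state["security_retries"] >= 2:
        return "complexity"  # budget spent, move forward anyway
    return "coder"


def simulate_security_loop(always_insecure: bool = True) -> list[str]:
    """Record each routing decision until the pipeline moves past security."""
    state = {"is_secure": not always_insecure, "security_retries": 0, "retries": 0}
    decisions = []
    while True:
        # Assumed node behaviour: a failed audit bumps only its own counter,
        # leaving the debugger's "retries" budget untouched.
        if not state["is_secure"]:
            state["security_retries"] += 1
        next_node = route_after_security(state)
        decisions.append(next_node)
        if next_node == "complexity":
            return decisions


if __name__ == "__main__":
    print(simulate_security_loop(always_insecure=True))
    print(simulate_security_loop(always_insecure=False))
```

Under these assumptions, code that never passes the audit is sent back to the coder once and then waved through, so the loop is bounded regardless of what the LLM produces.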

main.py ADDED

	@@ -0,0 +1,124 @@

+# main.py — Graph builder for Autonomous Python Coding Agent
+import os
+# Read the key from the environment only; never commit a fallback key to the repo.
+if not os.environ.get("GROQ_API_KEY"):
+    raise RuntimeError("GROQ_API_KEY is not set; export it or add it as a Space secret.")
+from langgraph.graph import StateGraph, END
+from state import State
+from nodes import (
+    planner, coder, ast_validator, test_generator,
+    tester, hypothesis_tester, performance_benchmarker,
+    debugger, security_auditor, complexity_judge,
+    self_reflection, reviewer, explainer
+)
+from edges import (
+    route_after_ast, route_after_test, route_after_hypothesis,
+    route_after_benchmark, route_after_security,
+    route_after_complexity, route_after_reflection
+)
+# ── BUILD GRAPH ───────────────────────────
+def build_graph():
+    builder = StateGraph(State)
+    # Add all 13 nodes
+    builder.add_node("planner",        planner)
+    builder.add_node("coder",          coder)
+    builder.add_node("ast_validator",  ast_validator)
+    builder.add_node("test_generator", test_generator)
+    builder.add_node("tester",         tester)
+    builder.add_node("hypothesis",     hypothesis_tester)
+    builder.add_node("benchmark",      performance_benchmarker)
+    builder.add_node("debugger",       debugger)
+    builder.add_node("security",       security_auditor)
+    builder.add_node("complexity",     complexity_judge)
+    builder.add_node("reflection",     self_reflection)
+    builder.add_node("reviewer",       reviewer)
+    builder.add_node("explainer",      explainer)
+    # Entry point
+    builder.set_entry_point("planner")
+    # Fixed edges
+    builder.add_edge("planner",        "coder")
+    builder.add_edge("coder",          "ast_validator")
+    builder.add_edge("test_generator", "tester")
+    builder.add_edge("debugger",       "ast_validator")
+    builder.add_edge("reviewer",       "explainer")
+    builder.add_edge("explainer",      END)
+    # Conditional edges
+    builder.add_conditional_edges("ast_validator", route_after_ast,
+        {"test_generator": "test_generator", "debugger": "debugger", "__end__": END})
+    builder.add_conditional_edges("tester", route_after_test,
+        {"hypothesis": "hypothesis", "debugger": "debugger"})
+    builder.add_conditional_edges("hypothesis", route_after_hypothesis,
+        {"benchmark": "benchmark"})
+    builder.add_conditional_edges("benchmark", route_after_benchmark,
+        {"security": "security", "debugger": "debugger"})
+    builder.add_conditional_edges("security", route_after_security,
+        {"complexity": "complexity", "coder": "coder"})
+    builder.add_conditional_edges("complexity", route_after_complexity,
+        {"reflection": "reflection", "coder": "coder"})
+    builder.add_conditional_edges("reflection", route_after_reflection,
+        {"reviewer": "reviewer", "coder": "coder"})
+    return builder.compile()
+# ── COMPILED GRAPH ────────────────────────
+graph = build_graph()
+# ── INITIAL STATE ─────────────────────────
+def get_initial_state(task: str) -> dict:
+    return {
+        "task":               task,
+        "plan":               "",
+        "code":               "",
+        "test_result":        "",
+        "error":              "",
+        "fixed_code":         "",
+        "explanation":        "",
+        "review":             "",
+        "final_code":         "",
+        "retries":            0,
+        "security_retries":   0,
+        "complexity_retries": 0,
+        "passed":             False,
+        "is_secure":          False,
+        "is_simple":          False,
+        "ast_valid":          False,
+        "generated_tests":    "",
+        "hypothesis_result":  "",
+        "benchmark_ms":       0.0,
+        "reflection_ok":      False,
+        "reflection_notes":   "",
+        "confidence_score":   0,
+    }
+# ── RUN ───────────────────────────────────
+if __name__ == "__main__":
+    tasks = [
+        "Write a Python function to find all prime numbers up to n",
+        "Write a Python function to check if a string is a palindrome",
+    ]
+    for task in tasks:
+        print(f"\n{'='*60}")
+        print(f"📋 Task: {task}")
+        print("="*60)
+        result = graph.invoke(get_initial_state(task), {"recursion_limit": 50})
+        print(f"\n{'='*60}")
+        print(f"💻 Final Code:\n{result['final_code'] or result['code']}")
+        print(f"\n📖 Explanation:\n{result['explanation']}")
+        bms  = result['benchmark_ms']
+        conf = result['confidence_score']
+        print(f"\n🧪 Tests:      {result['test_result'][:100]}")
+        print(f"🎲 Hypothesis: {result['hypothesis_result']}")
+        print(f"⚡ Speed:      {bms:.1f}ms" if bms > 0 else "⚡ Speed:      Skipped")
+        print(f"🪞 Confidence: {conf}/10" if conf > 0 else "🪞 Confidence: 7/10 (default)")
+        print(f"🔒 Secure:     {result['is_secure']}")
+        print(f"📊 Simple:     {result['is_simple']}")
+        print(f"🔄 Retries:    {result['retries']}")

nodes.py ADDED

	@@ -0,0 +1,532 @@

+# nodes.py — All 13 nodes for Autonomous Python Coding Agent
+import os
+import ast
+import subprocess
+import re
+import hashlib
+import importlib.util
+from langchain_groq import ChatGroq
+from langchain_core.messages import HumanMessage, SystemMessage
+import chromadb
+from state import State
+# ── LLM ──────────────────────────────────
+llm = ChatGroq(model="llama-3.1-8b-instant", temperature=0)
+# ── CHROMADB ─────────────────────────────
+chroma_client     = chromadb.Client()
+memory_collection = chroma_client.get_or_create_collection("bug_fixes")
+# ─────────────────────────────────────────
+# NODE 1 — PLANNER
+# ─────────────────────────────────────────
+def planner(state: State):
+    print("\n📋 Planner thinking...")
+    response = llm.invoke([
+        SystemMessage(content="You are a coding planner. Break tasks into clear steps."),
+        HumanMessage(content=f"""
+Break this coding task into clear steps:
+Task: {state['task']}
+Reply with:
+1. What the function should do
+2. Input and output format
+3. Edge cases to handle
+4. Test cases to verify
+""")
+    ])
+    print("Plan ready")
+    return {"plan": response.content}
+# ─────────────────────────────────────────
+# NODE 2 — CODER
+# ─────────────────────────────────────────
+def coder(state: State):
+    print("\n💻 Coder writing code...")
+    past_fixes = ""
+    if state["error"]:
+        try:
+            results = memory_collection.query(query_texts=[state["error"]], n_results=2)
+            if results["documents"][0]:
+                past_fixes = "\n".join(results["documents"][0])
+                print("🧠 Found past fixes in memory!")
+        except Exception:
+            pass
+    response = llm.invoke([
+        SystemMessage(content="""You are an expert Python developer.
+Write clean working Python code WITH type hints on every function.
+Return ONLY the code — no explanation, no markdown, no backticks."""),
+        HumanMessage(content=f"""
+Task: {state['task']}
+Plan to follow:
+{state['plan']}
+Previous error (fix this):
+{state['error'] if state['error'] else 'No errors yet — write fresh code'}
+Reflection notes:
+{state.get('reflection_notes', '') or 'None'}
+Past fixes from memory:
+{past_fixes if past_fixes else 'No past fixes available'}
+Rules:
+- Type hints on ALL functions
+- Docstring on every function
+- Keep it simple and readable
+- MUST include demo calls inside: if __name__ == '__main__': that print results
+Write complete working Python code only:
+""")
+    ])
+    code = response.content
+    code = re.sub(r"```python", "", code)
+    code = re.sub(r"```", "", code)
+    code = code.strip()
+    print(f"Code written ({len(code.splitlines())} lines)")
+    return {"code": code, "error": "", "fixed_code": "", "reflection_notes": ""}
+# ─────────────────────────────────────────
+# NODE 3 — AST VALIDATOR
+# ─────────────────────────────────────────
+def ast_validator(state: State):
+    print("\n🌳 AST Validator checking syntax...")
+    code = state["fixed_code"] if state["fixed_code"] else state["code"]
+    try:
+        tree = ast.parse(code)
+    except SyntaxError as e:
+        print(f"❌ Syntax error: {e}")
+        return {"ast_valid": False, "error": f"SyntaxError at line {e.lineno}: {e.msg}"}
+    for node in ast.walk(tree):
+        if isinstance(node, ast.Import):
+            for alias in node.names:
+                base = alias.name.split(".")[0]
+                if importlib.util.find_spec(base) is None:
+                    print(f"⚠️ Possibly hallucinated import: {base}")
+        elif isinstance(node, ast.ImportFrom):
+            if node.module:
+                base = node.module.split(".")[0]
+                if importlib.util.find_spec(base) is None:
+                    print(f"⚠️ Possibly hallucinated import: {base}")
+    missing = [n.name for n in ast.walk(tree)
+               if isinstance(n, ast.FunctionDef) and not n.returns and n.name != "__init__"]
+    if missing:
+        print(f"⚠️ Missing return hints: {missing}")
+    print("✅ AST passed!")
+    return {"ast_valid": True}
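The same checks can be tried on a snippet in isolation. This is a sketch, not the node itself: `check_source` is a hypothetical helper that returns its findings instead of printing them, and it only looks at absolute `import x` and `from x import y` statements.

```python
import ast
import importlib.util


def check_source(source: str) -> dict:
    """Statically inspect source without executing it."""
    try:
        tree = ast.parse(source)
    except SyntaxError as exc:
        return {"valid": False,
                "error": f"SyntaxError at line {exc.lineno}: {exc.msg}",
                "unknown_imports": [], "missing_return_hints": []}

    unknown, unhinted = [], []
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            names = [node.module]
        else:
            names = []
        for name in names:
            base = name.split(".")[0]
            # find_spec returns None when no installed module has this name.
            if importlib.util.find_spec(base) is None:
                unknown.append(base)
        if isinstance(node, ast.FunctionDef) and node.returns is None:
            unhinted.append(node.name)
    return {"valid": True, "error": "",
            "unknown_imports": unknown, "missing_return_hints": unhinted}


if __name__ == "__main__":
    sample = "import json\nimport surely_not_a_real_module_xyz\n\ndef f(x):\n    return x\n"
    print(check_source(sample))
```

Returning a dict rather than printing makes the findings easy to assert on in tests, and would let a caller decide whether an unknown import should fail validation or stay a warning.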
+# ─────────────────────────────────────────
+# NODE 4 — TEST GENERATOR
+# ─────────────────────────────────────────
+def test_generator(state: State):
+    print("\n🧬 Test Generator creating tests...")
+    code = state["fixed_code"] if state["fixed_code"] else state["code"]
+    response = llm.invoke([
+        SystemMessage(content="""You are a Python testing expert.
+Return ONLY runnable Python test code — no markdown, no backticks."""),
+        HumanMessage(content=f"""
+Generate test cases for this code:
+TASK: {state['task']}
+CODE:
+{code}
+Rules:
+- Copy ALL function definitions inline — do NOT import from files
+- Cover: normal cases, edge cases, large input
+- Call each test function at the bottom to run them
+- Do NOT use unittest or sys — just plain assert statements
+- Print "All tests passed!" at the end if successful
+Return ONLY runnable Python code:
+""")
+    ])
+    tests = response.content
+    tests = re.sub(r"```python", "", tests)
+    tests = re.sub(r"```", "", tests)
+    tests = tests.strip()
+    print(f"Generated {tests.count('def test_')} test functions")
+    return {"generated_tests": tests}
+# ─────────────────────────────────────────
+# NODE 5 — TESTER
+# ─────────────────────────────────────────
+def tester(state: State):
+    print("\n🧪 Tester running code...")
+    code = state["fixed_code"] if state["fixed_code"] else state["code"]
+    try:
+        result = subprocess.run(
+            ["python", "-c", code],
+            capture_output=True, text=True, timeout=10
+        )
+        if result.returncode == 0:
+            if not result.stdout.strip():
+                print("❌ No output produced")
+                return {
+                    "test_result": "",
+                    "error": "Code ran but produced no output. Add print statements in if __name__ == '__main__'.",
+                    "passed": False
+                }
+            print("✅ Code passed!")
+            test_output = ""
+            if state.get("generated_tests"):
+                try:
+                    test_run = subprocess.run(
+                        ["python", "-c", state["generated_tests"]],
+                        capture_output=True, text=True, timeout=15
+                    )
+                    if test_run.returncode == 0:
+                        test_output = "✅ All generated tests passed\n" + test_run.stdout
+                    else:
+                        test_output = f"⚠️ Some tests failed:\n{test_run.stderr[:200]}"
+                except Exception as e:
+                    test_output = f"Test run error: {e}"
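+            return {
+                "test_result": result.stdout + "\n" + test_output,
+                "error": "",
+                "passed": True,
+                # Promote the code that just passed so later nodes do not
+                # fall back to the original, possibly broken, state["code"].
+                "code": code,
+                "fixed_code": ""
+            }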
+        else:
+            print(f"❌ Failed: {result.stderr[:80]}")
+            return {"test_result": "", "error": result.stderr, "passed": False}
+    except subprocess.TimeoutExpired:
+        return {"test_result": "", "error": "Timed out after 10 seconds", "passed": False}
+    except Exception as e:
+        return {"test_result": "", "error": str(e), "passed": False}
+# ─────────────────────────────────────────
+# NODE 6 — HYPOTHESIS TESTER
+# ─────────────────────────────────────────
+def hypothesis_tester(state: State):
+    print("\n🎲 Hypothesis property-based testing...")
+    code = state["fixed_code"] if state["fixed_code"] else state["code"]
+    hypothesis_result = "Skipped"
+    try:
+        response = llm.invoke([
+            SystemMessage(content="""You are a Hypothesis testing expert.
+Return ONLY runnable Python code — no markdown, no backticks."""),
+            HumanMessage(content=f"""
+Write Hypothesis property tests for this code:
+TASK: {state['task']}
+CODE:
+{code}
+Rules:
+- Copy function definitions inline
+- Use: from hypothesis import given, settings, strategies as st
+- DO NOT use unittest or sys anywhere
+- Call test functions directly at the bottom
+- Keep to 2 simple property tests only
+Return ONLY complete runnable Python code:
+""")
+        ])
+        hyp_code = response.content
+        hyp_code = re.sub(r"```python", "", hyp_code)
+        hyp_code = re.sub(r"```", "", hyp_code)
+        hyp_code = hyp_code.strip()
+        result = subprocess.run(
+            ["python", "-c", hyp_code],
+            capture_output=True, text=True, timeout=30
+        )
+        if result.returncode == 0:
+            print("✅ Hypothesis passed!")
+            hypothesis_result = "✅ Property-based tests passed with random inputs"
+        else:
+            err = result.stderr[:200]
+            print(f"⚠️ Hypothesis edge case: {err[:80]}")
+            hypothesis_result = f"⚠️ Edge case found: {err}"
+    except subprocess.TimeoutExpired:
+        hypothesis_result = "⚠️ Timed out — possible infinite loop on edge input"
+    except Exception as e:
+        hypothesis_result = f"⚠️ Error: {str(e)[:100]}"
+    return {"hypothesis_result": hypothesis_result}
+# ─────────────────────────────────────────
+# NODE 7 — PERFORMANCE BENCHMARKER
+# ─────────────────────────────────────────
+def performance_benchmarker(state: State):
+    print("\n⚡ Benchmarking performance...")
+    code = state["fixed_code"] if state["fixed_code"] else state["code"]
+    # Embed the source as a repr() literal so quotes and backslashes survive intact.
+    benchmark_code = (
+        code + "\n\n"
+        "import timeit as _t, ast as _a\n"
+        "_tree = _a.parse(" + repr(code) + ")\n"
+        "_fns = [n.name for n in _a.walk(_tree) "
+        "if isinstance(n, _a.FunctionDef) and not n.name.startswith('_')]\n"
+        "if _fns:\n"
+        "    _f = _fns[0]\n"
+        "    _ran = False\n"
+        "    for _call in [_f+'(100)', _f+'(\"hello\")', _f+'([1,2,3,4,5])', _f+'(\"racecar\")', _f+'(10)']:\n"
+        "        try:\n"
+        "            _ms = _t.timeit(_call, globals=globals(), number=1000)*1000\n"
+        "            print('BENCHMARK:'+str(round(_ms,2))+'ms')\n"
+        "            _ran = True\n"
+        "            break\n"
+        "        except Exception: continue\n"
+        "    if not _ran: print('BENCHMARK:skipped')\n"
+        "else: print('BENCHMARK:skipped')\n"
+    )
+    try:
+        result = subprocess.run(
+            ["python", "-c", benchmark_code],
+            capture_output=True, text=True, timeout=20
+        )
+        output = result.stdout + result.stderr
+        match  = re.search(r"BENCHMARK:([\d.]+)ms", output)
+        if match:
+            ms = float(match.group(1))
+            print(f"⚡ {ms:.2f}ms per 1000 runs")
+            if ms > 5000:
+                return {
+                    "benchmark_ms": ms,
+                    "error": f"Too slow: {ms:.0f}ms. Optimize algorithm.",
+                    "passed": False
+                }
+            return {"benchmark_ms": ms}
+        return {"benchmark_ms": 0.0}
+    except Exception as e:
+        print(f"⚠️ Benchmark error: {e}")
+        return {"benchmark_ms": 0.0}
+# ─────────────────────────────────────────
+# NODE 8 — DEBUGGER
+# ─────────────────────────────────────────
+def debugger(state: State):
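+    print(f"\n🔧 Debugger fixing (attempt {state['retries']+1})...")
+    # Debug the latest attempt: the error came from fixed_code when one exists.
+    code = state["fixed_code"] if state["fixed_code"] else state["code"]
+    response = llm.invoke([
+        SystemMessage(content="""You are a Python debugger.
+Fix the exact error. Return ONLY fixed code — no markdown, no backticks."""),
+        HumanMessage(content=f"""
+CODE:
+{code}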
+ERROR:
+{state['error']}
+Return complete fixed Python code only:
+""")
+    ])
+    fixed = response.content
+    fixed = re.sub(r"```python", "", fixed)
+    fixed = re.sub(r"```", "", fixed)
+    fixed = fixed.strip()
+    try:
+        stable_id = hashlib.md5(state["error"].encode()).hexdigest()[:8]
+        memory_collection.add(
+            documents=[f"BUG: {state['error']}\nFIX: {fixed}"],
+            ids=[f"fix_{state['retries']}_{stable_id}"]
+        )
+        print("🧠 Stored in memory!")
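+    except Exception as e:
+        # Memory is best-effort; report the failure instead of hiding it.
+        print(f"⚠️ Could not store fix in memory: {e}")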
+    return {"fixed_code": fixed, "retries": state["retries"] + 1}
+# ─────────────────────────────────────────
+# NODE 9 — SECURITY AUDITOR
+# ─────────────────────────────────────────
+def security_auditor(state: State):
+    print("\n🔒 Security check...")
+    code = state["final_code"] if state["final_code"] else state["code"]
+    dangerous = [
+        ("eval(",        "Code execution via eval"),
+        ("exec(",        "Code execution via exec"),
+        ("os.system(",   "Shell injection risk"),
+        ("__import__(",  "Dynamic import risk"),
+        ("pickle.loads(","Deserialization attack"),
+        ("password =",   "Hardcoded credential"),
+        ("api_key =",    "Hardcoded API key"),
+    ]
+    found = [reason for pattern, reason in dangerous if pattern.lower() in code.lower()]
+    if found:
+        print(f"❌ Security issues: {found}")
+        return {
+            "is_secure": False,
+            "error": f"Security issues: {found}",
+            "security_retries": state["security_retries"] + 1
+        }
+    print("✅ Security passed!")
+    return {"is_secure": True}
+# ─────────────────────────────────────────
+# NODE 10 — COMPLEXITY JUDGE
+# ─────────────────────────────────────────
+def complexity_judge(state: State):
+    print("\n📊 Complexity check...")
+    code  = state["final_code"] if state["final_code"] else state["code"]
+    lines = code.split("\n")
+    issues = []
+    if len(lines) > 60:
+        issues.append(f"Too long: {len(lines)} lines")
+    max_indent = max(
+        (len(l) - len(l.lstrip()) for l in lines if l.strip()), default=0
+    )
+    if max_indent > 16:
+        issues.append("Too deeply nested")
+    try:
+        response = llm.invoke([
+            HumanMessage(f"Rate complexity 1-10:\n{code}\nReply ONLY a number 1-10.")
+        ])
+        score = int(re.search(r'\d+', response.content.strip()).group())
+    except Exception:
+        score = 5
+    print(f"Complexity: {score}/10")
+    if score > 7 or issues:
+        print(f"❌ Too complex: {issues}")
+        return {
+            "is_simple": False,
+            "error": f"Too complex (score {score}/10). Simplify.",
+            "complexity_retries": state["complexity_retries"] + 1
+        }
+    print("✅ Complexity passed!")
+    return {"is_simple": True}
+# ─────────────────────────────────────────
+# NODE 11 — SELF REFLECTION
+# ─────────────────────────────────────────
+def self_reflection(state: State):
+    print("\n🪞 Self Reflection...")
+    code = state["final_code"] if state["final_code"] else state["code"]
+    response = llm.invoke([
+        SystemMessage(content="""You are a senior Python engineer.
+Reply in EXACTLY this format:
+CONFIDENCE: <1-10>
+APPROVED: <YES or NO>
+ISSUES: <list or NONE>
+NOTES: <one sentence>"""),
+        HumanMessage(content=f"Review this code:\nTASK: {state['task']}\nCODE:\n{code}")
+    ])
+    reflection = response.content.strip()
+    lines_map  = {}
+    for line in reflection.splitlines():
+        if ":" in line:
+            key, _, val = line.partition(":")
+            lines_map[key.strip().upper()] = val.strip()
+    try:
+        confidence = int(re.search(r'\d+', lines_map.get("CONFIDENCE", "7")).group())
+    except Exception:
+        confidence = 7
+    try:
+        approved = "YES" in lines_map.get("APPROVED", "YES").upper()
+    except Exception:
+        approved = True
+    issues_text = lines_map.get("ISSUES", "NONE")
+    notes       = lines_map.get("NOTES", "Looks good")
+    has_issues  = issues_text.upper() not in ("NONE", "") and bool(issues_text.strip())
+    if not approved or (has_issues and confidence < 7):
+        print(f"❌ Reflection: confidence {confidence}/10")
+        return {
+            "reflection_ok":    False,
+            "reflection_notes": f"Issues: {issues_text}. {notes}",
+            "confidence_score": confidence,
+            "error": f"Reflection failed ({confidence}/10): {issues_text}"
+        }
+    print(f"✅ Reflection approved ({confidence}/10)")
+    return {
+        "reflection_ok":    True,
+        "reflection_notes": notes,
+        "confidence_score": confidence
+    }
+# ─────────────────────────────────────────
+# NODE 12 — REVIEWER
+# ─────────────────────────────────────────
+def reviewer(state: State):
+    print("\n✨ Reviewer polishing + explaining...")
+    code = state["fixed_code"] if state["fixed_code"] else state["code"]
+    response = llm.invoke([
+        SystemMessage(content="""You are a senior Python developer and teacher.
+Do TWO things and return in EXACTLY this format:
+FINAL_CODE:
+<complete polished code with docstrings and type hints>
+EXPLANATION:
+<simple explanation covering: what it does, how it works, time complexity, example usage>
+"""),
+        HumanMessage(content=f"Polish this code and explain it:\n{code}")
+    ])
+    content    = response.content
+    final_code = ""
+    explanation= ""
+    if "FINAL_CODE:" in content and "EXPLANATION:" in content:
+        parts      = content.split("EXPLANATION:", 1)
+        code_part  = parts[0].replace("FINAL_CODE:", "").strip()
+        code_part  = re.sub(r"```python", "", code_part)
+        code_part  = re.sub(r"```", "", code_part)
+        final_code  = code_part.strip()
+        explanation = parts[1].strip()
+    else:
+        final_code  = code
+        explanation = content.strip()
+    if not explanation:
+        explanation = "Code completed successfully. See final code above."
+    return {
+        "final_code":  final_code,
+        "explanation": explanation,
+        "review":      "Polished and explained"
+    }
+# ─────────────────────────────────────────
+# NODE 13 — EXPLAINER (passthrough)
+# ─────────────────────────────────────────
+def explainer(state: State):
+    if not state.get("explanation"):
+        return {"explanation": "Code completed successfully. See final code above."}
+    return {}
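
Review note on `security_auditor` in nodes.py: the substring scan flags any text containing `eval(`, so a helper named `my_eval(` or the pattern inside a string literal also trips it. Below is a minimal sketch of an AST-based alternative, assuming only the standard library. `DANGEROUS_CALLS`, `dotted_name`, and `find_dangerous_calls` are hypothetical names that do not appear in this commit. The sketch does not cover the `password =` / `api_key =` checks, and it will miss aliased imports such as `from os import system`.

```python
import ast

# Hypothetical table: maps a dotted call name to the reason it is flagged.
DANGEROUS_CALLS = {
    "eval": "Code execution via eval",
    "exec": "Code execution via exec",
    "__import__": "Dynamic import risk",
    "os.system": "Shell injection risk",
    "pickle.loads": "Deserialization attack",
}


def dotted_name(node: ast.AST) -> str:
    """Return 'a.b.c' for Name/Attribute chains, or '' for anything else."""
    parts = []
    while isinstance(node, ast.Attribute):
        parts.append(node.attr)
        node = node.value
    if isinstance(node, ast.Name):
        parts.append(node.id)
        return ".".join(reversed(parts))
    return ""


def find_dangerous_calls(code: str) -> list[str]:
    """List the reason for each flagged call that actually appears as a call."""
    found = []
    for node in ast.walk(ast.parse(code)):
        if isinstance(node, ast.Call):
            reason = DANGEROUS_CALLS.get(dotted_name(node.func))
            if reason and reason not in found:
                found.append(reason)
    return found
```

Because only `ast.Call` nodes are inspected, code that merely defines and calls `my_eval`, or that mentions `eval(` inside a string, produces an empty list, while a real `eval(...)` call is still reported.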

requirements.txt CHANGED

@@ -1,3 +1,6 @@
-altair
-pandas
-streamlit

+streamlit==1.35.0
+langgraph==0.2.0
+langchain-groq==0.1.6
+langchain-core==0.2.0
+chromadb==0.5.0
+hypothesis==6.100.0

state.py ADDED

@@ -0,0 +1,28 @@

+# state.py — Shared State for Autonomous Python Coding Agent
+from typing import TypedDict
+class State(TypedDict):
+    task:               str
+    plan:               str
+    code:               str
+    test_result:        str
+    error:              str
+    fixed_code:         str
+    explanation:        str
+    review:             str
+    final_code:         str
+    retries:            int
+    security_retries:   int
+    complexity_retries: int
+    passed:             bool
+    is_secure:          bool
+    is_simple:          bool
+    ast_valid:          bool
+    generated_tests:    str
+    hypothesis_result:  str
+    benchmark_ms:       float
+    reflection_ok:      bool
+    reflection_notes:   str
+    confidence_score:   int
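
Note on state.py: the nodes read keys such as `state["fixed_code"]`, `state["retries"]`, and `state["security_retries"]` with plain indexing, so the graph's starting state needs every field present or the first lookup raises `KeyError`. Here is a minimal sketch of one way to build that starting state. `initial_state` is a hypothetical helper that is not part of this commit, and main.py may already handle this differently.

```python
from typing import TypedDict, get_type_hints


class State(TypedDict):
    task:               str
    plan:               str
    code:               str
    test_result:        str
    error:              str
    fixed_code:         str
    explanation:        str
    review:             str
    final_code:         str
    retries:            int
    security_retries:   int
    complexity_retries: int
    passed:             bool
    is_secure:          bool
    is_simple:          bool
    ast_valid:          bool
    generated_tests:    str
    hypothesis_result:  str
    benchmark_ms:       float
    reflection_ok:      bool
    reflection_notes:   str
    confidence_score:   int


def initial_state(task: str) -> State:
    """Hypothetical helper: give every field its type's zero value, then set the task."""
    # str() -> "", int() -> 0, bool() -> False, float() -> 0.0
    values = {name: tp() for name, tp in get_type_hints(State).items()}
    values["task"] = task
    return values  # type: ignore[return-value]
```

Deriving the defaults from the type hints keeps the helper in step with `State` when fields are added, at the cost of assuming every field type has a sensible zero-argument constructor.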