--- title: AI Code Review Agent emoji: 🚀 sdk: docker app_port: 7860 pinned: false --- # 🔬 AI Code Review Agent > An autonomous LLM-powered agent that reviews GitHub Pull Requests — fetches diffs, analyzes code changes, and returns structured feedback with severity ratings and interactive visualizations. [![Python](https://img.shields.io/badge/Python-3.10+-blue.svg)](https://python.org) [![LangChain](https://img.shields.io/badge/LangChain-0.2.16-green.svg)](https://langchain.com) [![Streamlit](https://img.shields.io/badge/Streamlit-1.36.0-red.svg)](https://streamlit.io) [![HuggingFace](https://img.shields.io/badge/HuggingFace-Space-yellow.svg)](https://huggingface.co) --- ## 📌 What This Project Is This is **Project 3** of my AI/ML portfolio — an autonomous agent built with LangChain's tool-calling framework. Unlike a simple LLM wrapper that just sends code to a model, this is a true agent: it **decides which tools to call, in what order, and reasons over intermediate results** before producing a final review. The agent follows the **ReAct pattern** (Reason + Act): ``` User pastes PR URL ↓ Agent calls get_pr_metadata() → understands intent of PR ↓ Agent calls get_pr_diff() → fetches what changed ↓ Agent calls get_file_content() → gets surrounding context if needed ↓ LLM reasons over all collected data ↓ Structured review: issues, severity, fixes ``` No hardcoded tool sequence — the LLM decides autonomously. --- ## ✨ Features - **Autonomous agent loop** — LLM decides which GitHub API tools to call and when - **3 custom tools** — PR metadata, code diff, file content fetcher - **Interactive diff visualization** — bar chart showing additions/deletions per file - **Severity breakdown chart** — donut chart of High/Medium/Low issues - **Inline diff viewer** — color-coded added/removed lines per file - **Structured review cards** — each issue has location, severity, problem, and fix - **Multi-provider support** — works with Groq, Google Gemini, or OpenAI - **Session-persisted keys** — save API keys once per session, reuse across reviews - **Free-text model input** — type any model name, no dropdown restrictions --- ## 🧠 How The Agent Works (Technical) ### Tool-Calling Architecture The LLM does not execute code. It outputs **structured JSON** requesting a tool call: ```json { "tool": "get_pr_diff", "arguments": {"pr_url": "https://github.com/owner/repo/pull/123"} } ``` LangChain reads this output, executes the Python function, returns the result to the LLM as context, and the loop continues until the LLM decides it has enough information. ### The Three Tools | Tool | Purpose | |------|---------| | `get_pr_metadata` | Fetches PR title, author, description, branch info | | `get_pr_diff` | Fetches file-level diffs — what lines were added/removed | | `get_file_content` | Fetches full file content for deeper context | ### Why This Is Not Just a Wrapper A wrapper sends one prompt and gets one response. This agent: - Makes **multiple sequential API calls** based on intermediate findings - **Decides dynamically** whether to fetch file content based on what it sees in the diff - Produces **structured output** parsed into visual components — not just a text blob --- ## 🗂️ Project Structure ``` ai-code-review-agent/ ├── app.py # Streamlit app — production ready ├── app_colab.ipynb # Development notebook with all blocks ├── requirements.txt # Pinned dependencies └── README.md ``` --- ## ⚙️ Setup & Installation ### Local (VS Code) ```bash # Clone the repo git clone https://github.com/aneebnaqvi15/ai-code-review-agent cd ai-code-review-agent # Create virtual environment python -m venv venv venv\Scripts\activate # Windows source venv/bin/activate # Mac/Linux # Install dependencies pip install -r requirements.txt # Run the app streamlit run app.py ``` App opens at `http://localhost:8501` ### Google Colab Open `app_colab.ipynb` and run all cells. Uses ngrok for public URL tunneling. --- ## 🔑 API Keys Required | Key | Where to Get | Purpose | |-----|-------------|---------| | Groq API Key | [console.groq.com](https://console.groq.com) | LLM inference (free) | | Google Gemini Key | [aistudio.google.com](https://aistudio.google.com) | Alternative LLM (free tier) | | GitHub Token | GitHub → Settings → Developer Settings → PAT | Fetch PR data | **GitHub Token permissions needed:** `repo` (read only) --- ## 📦 Dependencies ``` langchain==0.2.16 langchain-google-genai==1.0.10 langchain-community==0.2.16 langchain-groq==0.1.10 google-generativeai==0.7.2 pygithub==2.3.0 pydantic==2.7.4 streamlit==1.36.0 plotly python-dotenv==1.0.1 ``` --- ## 📊 Example Output Given a real PR URL, the agent produces: **Metrics:** - Total issues found, broken down by severity **Charts:** - Bar chart: lines added vs deleted per file - Donut chart: High / Medium / Low issue distribution **Per-issue cards:** ``` ISSUE: Missing input validation FILE: auth.py SEVERITY: High PROBLEM: Function accepts raw user input without sanitization FIX: Add input validation using pydantic or manual type checks before processing ``` --- ## 🔭 What I Learned - **Agents vs wrappers** — the difference is dynamic tool selection, not just chaining prompts - **Tool definitions are prompts** — how you describe a tool directly affects whether the LLM calls it correctly - **Structured output matters** — getting the LLM to return parseable, consistent format requires careful prompt design - **Context window management** — large diffs need truncation strategy or the LLM loses coherence - **LangChain version pinning** — newer LangChain versions break tool-calling with older Gemini/Groq integrations --- ## 🔮 What I'd Build Next - [ ] Add evaluation metric: measure review quality against human-written reviews - [ ] Support for multi-file reasoning across a full repo - [ ] Webhook integration — auto-review every new PR on push - [ ] Fine-tuned reviewer model trained on accepted/rejected PR feedback --- ## 🗺️ Portfolio Context This is part of a 3-project AI/ML portfolio showing progression: | Project | Skill Demonstrated | |---------|-------------------| | [Banking77 Intent Classifier](https://github.com/aneebnaqvi15/banking77-intent-classifier) | Fine-tuning (DistilBERT + LoRA) | | [Multi-Doc RAG Assistant](#) | Retrieval systems (LangChain + ChromaDB) | | **AI Code Review Agent** (this) | Autonomous agents + tool-calling | --- ## 👨‍💻 Author **Aneeb Naqvi** — CS Graduate, Full-Stack & AI Engineer [![GitHub](https://img.shields.io/badge/GitHub-aneebnaqvi15-black)](https://github.com/aneebnaqvi15)