Spaces:
Sleeping
Sleeping
| title: AI Code Review Agent | |
| emoji: ๐ | |
| sdk: docker | |
| app_port: 7860 | |
| pinned: false | |
| # ๐ฌ AI Code Review Agent | |
| > An autonomous LLM-powered agent that reviews GitHub Pull Requests โ fetches diffs, analyzes code changes, and returns structured feedback with severity ratings and interactive visualizations. | |
| [](https://python.org) | |
| [](https://langchain.com) | |
| [](https://streamlit.io) | |
| [](https://huggingface.co) | |
| --- | |
| ## ๐ What This Project Is | |
| This is **Project 3** of my AI/ML portfolio โ an autonomous agent built with LangChain's tool-calling framework. Unlike a simple LLM wrapper that just sends code to a model, this is a true agent: it **decides which tools to call, in what order, and reasons over intermediate results** before producing a final review. | |
| The agent follows the **ReAct pattern** (Reason + Act): | |
| ``` | |
| User pastes PR URL | |
| โ | |
| Agent calls get_pr_metadata() โ understands intent of PR | |
| โ | |
| Agent calls get_pr_diff() โ fetches what changed | |
| โ | |
| Agent calls get_file_content() โ gets surrounding context if needed | |
| โ | |
| LLM reasons over all collected data | |
| โ | |
| Structured review: issues, severity, fixes | |
| ``` | |
| No hardcoded tool sequence โ the LLM decides autonomously. | |
| --- | |
| ## โจ Features | |
| - **Autonomous agent loop** โ LLM decides which GitHub API tools to call and when | |
| - **3 custom tools** โ PR metadata, code diff, file content fetcher | |
| - **Interactive diff visualization** โ bar chart showing additions/deletions per file | |
| - **Severity breakdown chart** โ donut chart of High/Medium/Low issues | |
| - **Inline diff viewer** โ color-coded added/removed lines per file | |
| - **Structured review cards** โ each issue has location, severity, problem, and fix | |
| - **Multi-provider support** โ works with Groq, Google Gemini, or OpenAI | |
| - **Session-persisted keys** โ save API keys once per session, reuse across reviews | |
| - **Free-text model input** โ type any model name, no dropdown restrictions | |
| --- | |
| ## ๐ง How The Agent Works (Technical) | |
| ### Tool-Calling Architecture | |
| The LLM does not execute code. It outputs **structured JSON** requesting a tool call: | |
| ```json | |
| { | |
| "tool": "get_pr_diff", | |
| "arguments": {"pr_url": "https://github.com/owner/repo/pull/123"} | |
| } | |
| ``` | |
| LangChain reads this output, executes the Python function, returns the result to the LLM as context, and the loop continues until the LLM decides it has enough information. | |
| ### The Three Tools | |
| | Tool | Purpose | | |
| |------|---------| | |
| | `get_pr_metadata` | Fetches PR title, author, description, branch info | | |
| | `get_pr_diff` | Fetches file-level diffs โ what lines were added/removed | | |
| | `get_file_content` | Fetches full file content for deeper context | | |
| ### Why This Is Not Just a Wrapper | |
| A wrapper sends one prompt and gets one response. This agent: | |
| - Makes **multiple sequential API calls** based on intermediate findings | |
| - **Decides dynamically** whether to fetch file content based on what it sees in the diff | |
| - Produces **structured output** parsed into visual components โ not just a text blob | |
| --- | |
| ## ๐๏ธ Project Structure | |
| ``` | |
| ai-code-review-agent/ | |
| โโโ app.py # Streamlit app โ production ready | |
| โโโ app_colab.ipynb # Development notebook with all blocks | |
| โโโ requirements.txt # Pinned dependencies | |
| โโโ README.md | |
| ``` | |
| --- | |
| ## โ๏ธ Setup & Installation | |
| ### Local (VS Code) | |
| ```bash | |
| # Clone the repo | |
| git clone https://github.com/aneebnaqvi15/ai-code-review-agent | |
| cd ai-code-review-agent | |
| # Create virtual environment | |
| python -m venv venv | |
| venv\Scripts\activate # Windows | |
| source venv/bin/activate # Mac/Linux | |
| # Install dependencies | |
| pip install -r requirements.txt | |
| # Run the app | |
| streamlit run app.py | |
| ``` | |
| App opens at `http://localhost:8501` | |
| ### Google Colab | |
| Open `app_colab.ipynb` and run all cells. Uses ngrok for public URL tunneling. | |
| --- | |
| ## ๐ API Keys Required | |
| | Key | Where to Get | Purpose | | |
| |-----|-------------|---------| | |
| | Groq API Key | [console.groq.com](https://console.groq.com) | LLM inference (free) | | |
| | Google Gemini Key | [aistudio.google.com](https://aistudio.google.com) | Alternative LLM (free tier) | | |
| | GitHub Token | GitHub โ Settings โ Developer Settings โ PAT | Fetch PR data | | |
| **GitHub Token permissions needed:** `repo` (read only) | |
| --- | |
| ## ๐ฆ Dependencies | |
| ``` | |
| langchain==0.2.16 | |
| langchain-google-genai==1.0.10 | |
| langchain-community==0.2.16 | |
| langchain-groq==0.1.10 | |
| google-generativeai==0.7.2 | |
| pygithub==2.3.0 | |
| pydantic==2.7.4 | |
| streamlit==1.36.0 | |
| plotly | |
| python-dotenv==1.0.1 | |
| ``` | |
| --- | |
| ## ๐ Example Output | |
| Given a real PR URL, the agent produces: | |
| **Metrics:** | |
| - Total issues found, broken down by severity | |
| **Charts:** | |
| - Bar chart: lines added vs deleted per file | |
| - Donut chart: High / Medium / Low issue distribution | |
| **Per-issue cards:** | |
| ``` | |
| ISSUE: Missing input validation | |
| FILE: auth.py | |
| SEVERITY: High | |
| PROBLEM: Function accepts raw user input without sanitization | |
| FIX: Add input validation using pydantic or manual type checks before processing | |
| ``` | |
| --- | |
| ## ๐ญ What I Learned | |
| - **Agents vs wrappers** โ the difference is dynamic tool selection, not just chaining prompts | |
| - **Tool definitions are prompts** โ how you describe a tool directly affects whether the LLM calls it correctly | |
| - **Structured output matters** โ getting the LLM to return parseable, consistent format requires careful prompt design | |
| - **Context window management** โ large diffs need truncation strategy or the LLM loses coherence | |
| - **LangChain version pinning** โ newer LangChain versions break tool-calling with older Gemini/Groq integrations | |
| --- | |
| ## ๐ฎ What I'd Build Next | |
| - [ ] Add evaluation metric: measure review quality against human-written reviews | |
| - [ ] Support for multi-file reasoning across a full repo | |
| - [ ] Webhook integration โ auto-review every new PR on push | |
| - [ ] Fine-tuned reviewer model trained on accepted/rejected PR feedback | |
| --- | |
| ## ๐บ๏ธ Portfolio Context | |
| This is part of a 3-project AI/ML portfolio showing progression: | |
| | Project | Skill Demonstrated | | |
| |---------|-------------------| | |
| | [Banking77 Intent Classifier](https://github.com/aneebnaqvi15/banking77-intent-classifier) | Fine-tuning (DistilBERT + LoRA) | | |
| | [Multi-Doc RAG Assistant](#) | Retrieval systems (LangChain + ChromaDB) | | |
| | **AI Code Review Agent** (this) | Autonomous agents + tool-calling | | |
| --- | |
| ## ๐จโ๐ป Author | |
| **Aneeb Naqvi** โ CS Graduate, Full-Stack & AI Engineer | |
| [](https://github.com/aneebnaqvi15) | |