---
title: AI Code Review Agent
emoji: 🚀
sdk: docker
app_port: 7860
pinned: false
---
# 🔬 AI Code Review Agent
> An autonomous LLM-powered agent that reviews GitHub Pull Requests: it fetches diffs, analyzes code changes, and returns structured feedback with severity ratings and interactive visualizations.
[![Python](https://img.shields.io/badge/Python-3.10+-blue.svg)](https://python.org)
[![LangChain](https://img.shields.io/badge/LangChain-0.2.16-green.svg)](https://langchain.com)
[![Streamlit](https://img.shields.io/badge/Streamlit-1.36.0-red.svg)](https://streamlit.io)
[![HuggingFace](https://img.shields.io/badge/HuggingFace-Space-yellow.svg)](https://huggingface.co)
---
## 📌 What This Project Is
This is **Project 3** of my AI/ML portfolio: an autonomous agent built with LangChain's tool-calling framework. Unlike a simple LLM wrapper that just sends code to a model, this is a true agent: it **decides which tools to call, in what order, and reasons over intermediate results** before producing a final review.
The agent follows the **ReAct pattern** (Reason + Act):
```
User pastes PR URL
↓
Agent calls get_pr_metadata() → understands intent of PR
↓
Agent calls get_pr_diff() → fetches what changed
↓
Agent calls get_file_content() → gets surrounding context if needed
↓
LLM reasons over all collected data
↓
Structured review: issues, severity, fixes
```
There is no hardcoded tool sequence; the LLM decides autonomously which tool to call next.
---
## ✨ Features
- **Autonomous agent loop**: the LLM decides which GitHub API tools to call and when
- **3 custom tools**: PR metadata, code diff, file content fetcher
- **Interactive diff visualization**: bar chart showing additions/deletions per file
- **Severity breakdown chart**: donut chart of High/Medium/Low issues
- **Inline diff viewer**: color-coded added/removed lines per file
- **Structured review cards**: each issue has location, severity, problem, and fix
- **Multi-provider support**: works with Groq, Google Gemini, or OpenAI
- **Session-persisted keys**: save API keys once per session, reuse across reviews
- **Free-text model input**: type any model name, no dropdown restrictions
---
## 🧠 How The Agent Works (Technical)
### Tool-Calling Architecture
The LLM does not execute code. It outputs **structured JSON** requesting a tool call:
```json
{
"tool": "get_pr_diff",
"arguments": {"pr_url": "https://github.com/owner/repo/pull/123"}
}
```
LangChain reads this output, executes the Python function, returns the result to the LLM as context, and the loop continues until the LLM decides it has enough information.
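The loop described above can be sketched in plain Python, without LangChain. This is an illustration rather than the code in `app.py`: `fake_llm`, `TOOLS`, and `run_agent` are hypothetical names, the tool bodies are stubs, and the `"final"` key used to signal completion is an assumption made for this sketch.

```python
import json

# Stub tools standing in for the real GitHub-backed functions (assumption).
def get_pr_metadata(pr_url: str) -> str:
    return f"title and description for {pr_url}"

def get_pr_diff(pr_url: str) -> str:
    return f"diff for {pr_url}"

# Registry mapping the tool name in the JSON request to a Python callable.
TOOLS = {"get_pr_metadata": get_pr_metadata, "get_pr_diff": get_pr_diff}

def fake_llm(history: list[str]) -> str:
    """Scripted stand-in for the model: chooses a step from what it has seen so far."""
    url = "https://github.com/owner/repo/pull/123"
    if len(history) == 0:
        return json.dumps({"tool": "get_pr_metadata", "arguments": {"pr_url": url}})
    if len(history) == 1:
        return json.dumps({"tool": "get_pr_diff", "arguments": {"pr_url": url}})
    return json.dumps({"final": f"review based on {len(history)} tool results"})

def run_agent(llm, max_steps: int = 5) -> str:
    """Ask the model for a step, run the requested tool, feed the result back."""
    history: list[str] = []
    for _ in range(max_steps):
        request = json.loads(llm(history))
        if "final" in request:           # the model decided it has enough context
            return request["final"]
        tool = TOOLS[request["tool"]]    # look up the requested function by name
        history.append(tool(**request["arguments"]))
    raise RuntimeError("agent did not finish within max_steps")

result = run_agent(fake_llm)
print(result)
```

The `max_steps` cap is a design choice for the sketch: it keeps a model that never emits a final answer from looping forever.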
### The Three Tools
| Tool | Purpose |
|------|---------|
| `get_pr_metadata` | Fetches PR title, author, description, branch info |
| `get_pr_diff` | Fetches file-level diffs: which lines were added or removed |
| `get_file_content` | Fetches full file content for deeper context |
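The tools are handed a PR URL (as in the JSON example above), so at some point the URL has to be split into owner, repository, and PR number before the GitHub API can be queried. A minimal sketch of that step follows; `parse_pr_url` and `PR_URL_RE` are hypothetical names, and the real tools may do this differently.

```python
import re

# Accepts https://github.com/<owner>/<repo>/pull/<number>, optionally followed
# by a sub-page such as /files or /commits.
PR_URL_RE = re.compile(r"^https://github\.com/([^/]+)/([^/]+)/pull/(\d+)(?:/.*)?$")

def parse_pr_url(pr_url: str) -> tuple[str, str, int]:
    """Split a GitHub pull request URL into (owner, repo, number)."""
    match = PR_URL_RE.match(pr_url.strip())
    if match is None:
        raise ValueError(f"not a GitHub pull request URL: {pr_url!r}")
    owner, repo, number = match.groups()
    return owner, repo, int(number)

owner, repo, number = parse_pr_url("https://github.com/owner/repo/pull/123")
print(owner, repo, number)
```

With PyGithub, the parsed values could then be used along the lines of `Github(token).get_repo(f"{owner}/{repo}").get_pull(number)`.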
### Why This Is Not Just a Wrapper
A wrapper sends one prompt and gets one response. This agent:
- Makes **multiple sequential API calls** based on intermediate findings
- **Decides dynamically** whether to fetch file content based on what it sees in the diff
- Produces **structured output** parsed into visual components, not just a text blob
---
## 🗂️ Project Structure
```
ai-code-review-agent/
├── app.py               # Streamlit app, production ready
├── app_colab.ipynb      # Development notebook with all blocks
├── requirements.txt     # Pinned dependencies
└── README.md
```
---
## ⚙️ Setup & Installation
### Local (VS Code)
```bash
# Clone the repo
git clone https://github.com/aneebnaqvi15/ai-code-review-agent
cd ai-code-review-agent
# Create a virtual environment
python -m venv venv
# Activate it (run only the line for your OS)
source venv/bin/activate   # macOS/Linux
# venv\Scripts\activate    # Windows
# Install dependencies
pip install -r requirements.txt
# Run the app
streamlit run app.py
```
The app opens at `http://localhost:8501` (Streamlit's default port).
### Google Colab
Open `app_colab.ipynb` and run all cells. The notebook uses ngrok to expose the app at a public URL.
---
## 🔑 API Keys Required
| Key | Where to Get | Purpose |
|-----|-------------|---------|
| Groq API Key | [console.groq.com](https://console.groq.com) | LLM inference (free) |
| Google Gemini Key | [aistudio.google.com](https://aistudio.google.com) | Alternative LLM (free tier) |
| GitHub Token | GitHub → Settings → Developer Settings → PAT | Fetch PR data |
**GitHub Token permissions needed:** `repo` (read only)
---
## 📦 Dependencies
```
langchain==0.2.16
langchain-google-genai==1.0.10
langchain-community==0.2.16
langchain-groq==0.1.10
google-generativeai==0.7.2
pygithub==2.3.0
pydantic==2.7.4
streamlit==1.36.0
plotly
python-dotenv==1.0.1
```
---
## 📊 Example Output
Given a real PR URL, the agent produces:
**Metrics:**
- Total issues found, broken down by severity
**Charts:**
- Bar chart: lines added vs deleted per file
- Donut chart: High / Medium / Low issue distribution
**Per-issue cards:**
```
ISSUE: Missing input validation
FILE: auth.py
SEVERITY: High
PROBLEM: Function accepts raw user input without sanitization
FIX: Add input validation using pydantic or manual type checks before processing
```
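Cards in this format can be turned into structured data with a small line-based parser, which is also enough to feed the severity chart. The sketch below assumes every card starts with an `ISSUE:` line and uses the five field names shown above; `parse_review` is a hypothetical helper, and the second sample card is made up for illustration.

```python
from collections import Counter

FIELDS = ("ISSUE", "FILE", "SEVERITY", "PROBLEM", "FIX")

def parse_review(text: str) -> list[dict[str, str]]:
    """Parse 'KEY: value' lines into one dict per issue card."""
    issues: list[dict[str, str]] = []
    for line in text.splitlines():
        key, sep, value = line.partition(":")  # split on the first colon only
        key = key.strip().upper()
        if not sep or key not in FIELDS:
            continue  # skip blank lines and anything that is not a known field
        if key == "ISSUE":
            issues.append({})  # an ISSUE line starts a new card
        if issues:
            issues[-1][key] = value.strip()
    return issues

sample = """\
ISSUE: Missing input validation
FILE: auth.py
SEVERITY: High
PROBLEM: Function accepts raw user input without sanitization
FIX: Add input validation using pydantic or manual type checks before processing

ISSUE: Unused import
FILE: utils.py
SEVERITY: Low
PROBLEM: The os module is imported but never used
FIX: Remove the import
"""

issues = parse_review(sample)
# Counts per severity level, e.g. as input for the donut chart.
severity_counts = Counter(issue.get("SEVERITY", "Unknown") for issue in issues)
print(len(issues), dict(severity_counts))
```

A line-based parser like this is forgiving about extra prose the model may emit between cards, since unknown lines are skipped.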
---
## 🔭 What I Learned
- **Agents vs wrappers**: the difference is dynamic tool selection, not just chaining prompts
- **Tool definitions are prompts**: how you describe a tool directly affects whether the LLM calls it correctly
- **Structured output matters**: getting the LLM to return a parseable, consistent format requires careful prompt design
- **Context window management**: large diffs need a truncation strategy or the LLM loses coherence
- **LangChain version pinning**: newer LangChain versions break tool-calling with older Gemini/Groq integrations
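One possible truncation strategy for the context-window point above is to give every changed file an equal share of a character budget. This is a sketch under assumptions, not necessarily what the app does: `truncate_diff` is a hypothetical helper, and characters are only a rough proxy for tokens.

```python
def truncate_diff(patches: dict[str, str], max_chars: int = 6000) -> str:
    """Join per-file patches, capping each file at an equal share of max_chars."""
    if not patches:
        return ""
    per_file = max(1, max_chars // len(patches))
    parts = []
    for filename, patch in patches.items():
        if len(patch) > per_file:
            omitted = len(patch) - per_file
            # Keep the head of the patch and tell the model that something is missing.
            patch = patch[:per_file] + f"\n... [{omitted} characters truncated]"
        parts.append(f"### {filename}\n{patch}")
    return "\n\n".join(parts)

patches = {"a.py": "+x\n" * 10, "b.py": "+y"}
short = truncate_diff(patches, max_chars=20)
print(short)
```

Splitting the budget per file keeps one very large file from crowding out every other file in the PR, and the explicit marker lets the model know the diff is incomplete.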
---
## 🔮 What I'd Build Next
- [ ] Add evaluation metric: measure review quality against human-written reviews
- [ ] Support for multi-file reasoning across a full repo
- [ ] Webhook integration: auto-review every new PR on push
- [ ] Fine-tuned reviewer model trained on accepted/rejected PR feedback
---
## 🗺️ Portfolio Context
This is part of a 3-project AI/ML portfolio showing progression:
| Project | Skill Demonstrated |
|---------|-------------------|
| [Banking77 Intent Classifier](https://github.com/aneebnaqvi15/banking77-intent-classifier) | Fine-tuning (DistilBERT + LoRA) |
| [Multi-Doc RAG Assistant](#) | Retrieval systems (LangChain + ChromaDB) |
| **AI Code Review Agent** (this) | Autonomous agents + tool-calling |
---
## 👨‍💻 Author
**Aneeb Naqvi**, CS Graduate, Full-Stack & AI Engineer
[![GitHub](https://img.shields.io/badge/GitHub-aneebnaqvi15-black)](https://github.com/aneebnaqvi15)