Spaces:
Sleeping
title: AI Code Review Agent
emoji: ๐
sdk: docker
app_port: 7860
pinned: false
๐ฌ AI Code Review Agent
An autonomous LLM-powered agent that reviews GitHub Pull Requests โ fetches diffs, analyzes code changes, and returns structured feedback with severity ratings and interactive visualizations.
๐ What This Project Is
This is Project 3 of my AI/ML portfolio โ an autonomous agent built with LangChain's tool-calling framework. Unlike a simple LLM wrapper that just sends code to a model, this is a true agent: it decides which tools to call, in what order, and reasons over intermediate results before producing a final review.
The agent follows the ReAct pattern (Reason + Act):
User pastes PR URL
โ
Agent calls get_pr_metadata() โ understands intent of PR
โ
Agent calls get_pr_diff() โ fetches what changed
โ
Agent calls get_file_content() โ gets surrounding context if needed
โ
LLM reasons over all collected data
โ
Structured review: issues, severity, fixes
No hardcoded tool sequence โ the LLM decides autonomously.
โจ Features
- Autonomous agent loop โ LLM decides which GitHub API tools to call and when
- 3 custom tools โ PR metadata, code diff, file content fetcher
- Interactive diff visualization โ bar chart showing additions/deletions per file
- Severity breakdown chart โ donut chart of High/Medium/Low issues
- Inline diff viewer โ color-coded added/removed lines per file
- Structured review cards โ each issue has location, severity, problem, and fix
- Multi-provider support โ works with Groq, Google Gemini, or OpenAI
- Session-persisted keys โ save API keys once per session, reuse across reviews
- Free-text model input โ type any model name, no dropdown restrictions
๐ง How The Agent Works (Technical)
Tool-Calling Architecture
The LLM does not execute code. It outputs structured JSON requesting a tool call:
{
"tool": "get_pr_diff",
"arguments": {"pr_url": "https://github.com/owner/repo/pull/123"}
}
LangChain reads this output, executes the Python function, returns the result to the LLM as context, and the loop continues until the LLM decides it has enough information.
The Three Tools
| Tool | Purpose |
|---|---|
get_pr_metadata |
Fetches PR title, author, description, branch info |
get_pr_diff |
Fetches file-level diffs โ what lines were added/removed |
get_file_content |
Fetches full file content for deeper context |
Why This Is Not Just a Wrapper
A wrapper sends one prompt and gets one response. This agent:
- Makes multiple sequential API calls based on intermediate findings
- Decides dynamically whether to fetch file content based on what it sees in the diff
- Produces structured output parsed into visual components โ not just a text blob
๐๏ธ Project Structure
ai-code-review-agent/
โโโ app.py # Streamlit app โ production ready
โโโ app_colab.ipynb # Development notebook with all blocks
โโโ requirements.txt # Pinned dependencies
โโโ README.md
โ๏ธ Setup & Installation
Local (VS Code)
# Clone the repo
git clone https://github.com/aneebnaqvi15/ai-code-review-agent
cd ai-code-review-agent
# Create virtual environment
python -m venv venv
venv\Scripts\activate # Windows
source venv/bin/activate # Mac/Linux
# Install dependencies
pip install -r requirements.txt
# Run the app
streamlit run app.py
App opens at http://localhost:8501
Google Colab
Open app_colab.ipynb and run all cells. Uses ngrok for public URL tunneling.
๐ API Keys Required
| Key | Where to Get | Purpose |
|---|---|---|
| Groq API Key | console.groq.com | LLM inference (free) |
| Google Gemini Key | aistudio.google.com | Alternative LLM (free tier) |
| GitHub Token | GitHub โ Settings โ Developer Settings โ PAT | Fetch PR data |
GitHub Token permissions needed: repo (read only)
๐ฆ Dependencies
langchain==0.2.16
langchain-google-genai==1.0.10
langchain-community==0.2.16
langchain-groq==0.1.10
google-generativeai==0.7.2
pygithub==2.3.0
pydantic==2.7.4
streamlit==1.36.0
plotly
python-dotenv==1.0.1
๐ Example Output
Given a real PR URL, the agent produces:
Metrics:
- Total issues found, broken down by severity
Charts:
- Bar chart: lines added vs deleted per file
- Donut chart: High / Medium / Low issue distribution
Per-issue cards:
ISSUE: Missing input validation
FILE: auth.py
SEVERITY: High
PROBLEM: Function accepts raw user input without sanitization
FIX: Add input validation using pydantic or manual type checks before processing
๐ญ What I Learned
- Agents vs wrappers โ the difference is dynamic tool selection, not just chaining prompts
- Tool definitions are prompts โ how you describe a tool directly affects whether the LLM calls it correctly
- Structured output matters โ getting the LLM to return parseable, consistent format requires careful prompt design
- Context window management โ large diffs need truncation strategy or the LLM loses coherence
- LangChain version pinning โ newer LangChain versions break tool-calling with older Gemini/Groq integrations
๐ฎ What I'd Build Next
- Add evaluation metric: measure review quality against human-written reviews
- Support for multi-file reasoning across a full repo
- Webhook integration โ auto-review every new PR on push
- Fine-tuned reviewer model trained on accepted/rejected PR feedback
๐บ๏ธ Portfolio Context
This is part of a 3-project AI/ML portfolio showing progression:
| Project | Skill Demonstrated |
|---|---|
| Banking77 Intent Classifier | Fine-tuning (DistilBERT + LoRA) |
| Multi-Doc RAG Assistant | Retrieval systems (LangChain + ChromaDB) |
| AI Code Review Agent (this) | Autonomous agents + tool-calling |
๐จโ๐ป Author
Aneeb Naqvi โ CS Graduate, Full-Stack & AI Engineer