title: AI Code Review Agent
emoji: 🚀
sdk: docker
app_port: 7860
pinned: false

🔬 AI Code Review Agent

An autonomous LLM-powered agent that reviews GitHub Pull Requests: it fetches diffs, analyzes code changes, and returns structured feedback with severity ratings and interactive visualizations.

Python LangChain Streamlit HuggingFace


📌 What This Project Is

This is Project 3 of my AI/ML portfolio, an autonomous agent built with LangChain's tool-calling framework. Unlike a simple LLM wrapper that just sends code to a model, this is a true agent: it decides which tools to call, in what order, and reasons over intermediate results before producing a final review.

The agent follows the ReAct pattern (Reason + Act):

User pastes PR URL
       ↓
Agent calls get_pr_metadata()   → understands intent of PR
       ↓
Agent calls get_pr_diff()       → fetches what changed
       ↓
Agent calls get_file_content()  → gets surrounding context if needed
       ↓
LLM reasons over all collected data
       ↓
Structured review: issues, severity, fixes

The sequence above is typical, not hardcoded: the LLM decides autonomously which tools to call.


✨ Features

  • Autonomous agent loop: the LLM decides which GitHub API tools to call and when
  • 3 custom tools: PR metadata, code diff, file content fetcher
  • Interactive diff visualization: bar chart showing additions/deletions per file
  • Severity breakdown chart: donut chart of High/Medium/Low issues
  • Inline diff viewer: color-coded added/removed lines per file
  • Structured review cards: each issue has location, severity, problem, and fix
  • Multi-provider support: works with Groq, Google Gemini, or OpenAI
  • Session-persisted keys: save API keys once per session, reuse across reviews
  • Free-text model input: type any model name, no dropdown restrictions
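
The per-file addition/deletion counts behind the bar chart and the inline diff viewer can be derived from patch text. The following is a minimal sketch, not the app's actual code: it assumes each file's patch contains only hunks (no `+++`/`---` file headers), and `count_patch_changes` is a hypothetical helper name.

```python
def count_patch_changes(patch: str) -> tuple[int, int]:
    """Return (additions, deletions) for one file's patch text.

    Assumption: the patch holds hunks only, so every line starting with
    "+" or "-" is a changed line rather than a file header.
    """
    additions = deletions = 0
    for line in patch.splitlines():
        if line.startswith("+"):
            additions += 1
        elif line.startswith("-"):
            deletions += 1
        # Lines starting with "@@" or a space are hunk headers or context.
    return additions, deletions


sample_patch = """@@ -1,3 +1,4 @@
 import os
-API_KEY = "hardcoded"
+API_KEY = os.environ["API_KEY"]
+TIMEOUT = 30
 print("ready")"""

print(count_patch_changes(sample_patch))  # (2, 1)
```

The same prefix check is enough to pick a color for each line in an inline viewer.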

🧠 How The Agent Works (Technical)

Tool-Calling Architecture

The LLM does not execute code. It outputs structured JSON requesting a tool call:

{
  "tool": "get_pr_diff",
  "arguments": {"pr_url": "https://github.com/owner/repo/pull/123"}
}

LangChain reads this output, executes the Python function, returns the result to the LLM as context, and the loop continues until the LLM decides it has enough information.
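
The dispatch loop described above can be sketched in plain Python. This is an illustration under stated assumptions, not LangChain's internals: the tools are stubs, `scripted_model` is a hypothetical stand-in for the LLM, and the `"final"` key used to end the loop is invented for this example.

```python
import json

# Stub tools standing in for the real GitHub-backed functions.
def get_pr_metadata(pr_url: str) -> str:
    return f"metadata for {pr_url}"

def get_pr_diff(pr_url: str) -> str:
    return f"diff for {pr_url}"

TOOLS = {"get_pr_metadata": get_pr_metadata, "get_pr_diff": get_pr_diff}


def scripted_model(history: list[str]) -> str:
    """Hypothetical LLM stand-in: requests each tool once, then finishes."""
    url = "https://github.com/owner/repo/pull/123"
    script = [
        json.dumps({"tool": "get_pr_metadata", "arguments": {"pr_url": url}}),
        json.dumps({"tool": "get_pr_diff", "arguments": {"pr_url": url}}),
        json.dumps({"final": "review complete"}),  # assumed stop signal
    ]
    return script[len(history)]


def run_agent(model, max_steps: int = 5) -> tuple[str, list[str]]:
    history: list[str] = []  # tool results fed back to the model as context
    for _ in range(max_steps):
        message = json.loads(model(history))
        if "final" in message:  # the model decided it has enough information
            return message["final"], history
        tool = TOOLS[message["tool"]]  # look up the requested function by name
        history.append(tool(**message["arguments"]))
    raise RuntimeError("agent did not finish within max_steps")


answer, observations = run_agent(scripted_model)
print(answer)
print(observations)
```

The `max_steps` cap is a design choice for the sketch: it keeps a model that never stops requesting tools from looping forever.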

The Three Tools

| Tool | Purpose |
| --- | --- |
| get_pr_metadata | Fetches PR title, author, description, branch info |
| get_pr_diff | Fetches file-level diffs: which lines were added/removed |
| get_file_content | Fetches full file content for deeper context |
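
Since the tool-call example passes a `pr_url` argument, each tool needs the owner, repository, and PR number from that URL before it can query GitHub. A minimal sketch of that step follows; `parse_pr_url` is a hypothetical helper, and whether the actual tools parse the URL this way is an assumption.

```python
import re

# Matches URLs of the form https://github.com/<owner>/<repo>/pull/<number>
PR_URL_PATTERN = re.compile(
    r"^https://github\.com/(?P<owner>[^/]+)/(?P<repo>[^/]+)/pull/(?P<number>\d+)/?$"
)


def parse_pr_url(pr_url: str) -> tuple[str, str, int]:
    """Split a pull request URL into (owner, repo, number)."""
    match = PR_URL_PATTERN.match(pr_url.strip())
    if match is None:
        raise ValueError(f"not a GitHub pull request URL: {pr_url!r}")
    return match["owner"], match["repo"], int(match["number"])


print(parse_pr_url("https://github.com/owner/repo/pull/123"))  # ('owner', 'repo', 123)
```

Rejecting anything that does not match, rather than guessing, gives the agent a clear error message to reason over when the user pastes a malformed link.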

Why This Is Not Just a Wrapper

A wrapper sends one prompt and gets one response. This agent:

  • Makes multiple sequential API calls based on intermediate findings
  • Decides dynamically whether to fetch file content based on what it sees in the diff
  • Produces structured output parsed into visual components, not just a text blob

๐Ÿ—‚๏ธ Project Structure

ai-code-review-agent/
โ”œโ”€โ”€ app.py                  # Streamlit app โ€” production ready
โ”œโ”€โ”€ app_colab.ipynb         # Development notebook with all blocks
โ”œโ”€โ”€ requirements.txt        # Pinned dependencies
โ””โ”€โ”€ README.md

โš™๏ธ Setup & Installation

Local (VS Code)

# Clone the repo
git clone https://github.com/aneebnaqvi15/ai-code-review-agent
cd ai-code-review-agent

# Create virtual environment
python -m venv venv
venv\Scripts\activate        # Windows
source venv/bin/activate     # Mac/Linux

# Install dependencies
pip install -r requirements.txt

# Run the app
streamlit run app.py

App opens at http://localhost:8501

Google Colab

Open app_colab.ipynb and run all cells. The notebook uses ngrok to expose the app through a public URL.


🔑 API Keys Required

| Key | Where to Get | Purpose |
| --- | --- | --- |
| Groq API Key | console.groq.com | LLM inference (free) |
| Google Gemini Key | aistudio.google.com | Alternative LLM (free tier) |
| GitHub Token | GitHub → Settings → Developer Settings → PAT | Fetch PR data |

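GitHub Token permissions needed: read access to the repositories you review. Note that the classic repo scope also grants write access; a fine-grained token with read-only Contents and Pull requests permissions is a narrower alternative.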


📦 Dependencies

langchain==0.2.16
langchain-google-genai==1.0.10
langchain-community==0.2.16
langchain-groq==0.1.10
google-generativeai==0.7.2
pygithub==2.3.0
pydantic==2.7.4
streamlit==1.36.0
plotly
python-dotenv==1.0.1

📊 Example Output

Given a real PR URL, the agent produces:

Metrics:

  • Total issues found, broken down by severity

Charts:

  • Bar chart: lines added vs deleted per file
  • Donut chart: High / Medium / Low issue distribution

Per-issue cards:

ISSUE: Missing input validation
FILE: auth.py
SEVERITY: High
PROBLEM: Function accepts raw user input without sanitization
FIX: Add input validation using pydantic or manual type checks before processing
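
Turning review text in this format into cards and a severity count can be sketched as below. This assumes the model emits exactly the `KEY: value` lines shown above, with each `ISSUE` line starting a new card; `parse_review` is a hypothetical helper, not the app's parser.

```python
from collections import Counter

FIELDS = ("ISSUE", "FILE", "SEVERITY", "PROBLEM", "FIX")


def parse_review(text: str) -> list[dict[str, str]]:
    """Parse 'KEY: value' lines into one dict per issue."""
    issues: list[dict[str, str]] = []
    for line in text.splitlines():
        key, sep, value = line.partition(":")  # split on the first colon only
        key = key.strip().upper()
        if not sep or key not in FIELDS:
            continue  # ignore lines outside the expected format
        if key == "ISSUE":
            issues.append({})  # each ISSUE line starts a new card
        if issues:
            issues[-1][key.lower()] = value.strip()
    return issues


review_text = """ISSUE: Missing input validation
FILE: auth.py
SEVERITY: High
PROBLEM: Function accepts raw user input without sanitization
FIX: Add input validation using pydantic or manual type checks before processing"""

cards = parse_review(review_text)
# Severity counts are the data a High/Medium/Low donut chart would plot.
severity_counts = Counter(card.get("severity", "Unknown") for card in cards)
print(cards[0]["file"], dict(severity_counts))  # auth.py {'High': 1}
```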

🔭 What I Learned

  • Agents vs wrappers: the difference is dynamic tool selection, not just chaining prompts
  • Tool definitions are prompts: how you describe a tool directly affects whether the LLM calls it correctly
  • Structured output matters: getting the LLM to return a parseable, consistent format requires careful prompt design
  • Context window management: large diffs need a truncation strategy or the LLM loses coherence
  • LangChain version pinning: newer LangChain versions can break tool-calling with older Gemini/Groq integrations
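
One possible truncation strategy for large diffs is to give every changed file an equal share of a character budget. The sketch below is an assumption for illustration, not the strategy this project uses: `truncate_diff` is a hypothetical name, and characters stand in as a rough proxy for tokens.

```python
def truncate_diff(files: dict[str, str], max_chars: int = 6000) -> str:
    """Join per-file patches, capping each file at an equal share of max_chars.

    The budget covers patch text only; the file headers and truncation
    markers add a small amount on top.
    """
    if not files:
        return ""
    per_file = max(max_chars // len(files), 1)
    parts = []
    for name, patch in files.items():
        if len(patch) > per_file:
            omitted = len(patch) - per_file
            # Keep the start of the patch and tell the model what was cut.
            patch = patch[:per_file] + f"\n[... {omitted} characters truncated ...]"
        parts.append(f"File: {name}\n{patch}")
    return "\n\n".join(parts)


demo_files = {"a.py": "+x\n" * 10, "b.py": "+y"}
result = truncate_diff(demo_files, max_chars=20)
print(result)
```

An equal split keeps one very large file from crowding out every other file, at the cost of cutting that file more aggressively; the explicit marker lets the model know the patch is incomplete.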

🔮 What I'd Build Next

  • Add an evaluation metric: measure review quality against human-written reviews
  • Support multi-file reasoning across a full repo
  • Webhook integration: auto-review every new PR on push
  • Fine-tuned reviewer model trained on accepted/rejected PR feedback

๐Ÿ—บ๏ธ Portfolio Context

This is part of a 3-project AI/ML portfolio showing progression:

Project Skill Demonstrated
Banking77 Intent Classifier Fine-tuning (DistilBERT + LoRA)
Multi-Doc RAG Assistant Retrieval systems (LangChain + ChromaDB)
AI Code Review Agent (this) Autonomous agents + tool-calling

๐Ÿ‘จโ€๐Ÿ’ป Author

Aneeb Naqvi โ€” CS Graduate, Full-Stack & AI Engineer

GitHub