---
title: AI-CodeLens
emoji: 🔍
colorFrom: indigo
colorTo: blue
sdk: docker
pinned: false
---

# 🔍 CodeLens — Agentic Code Intelligence

**CodeLens** is a **ReAct-style agentic RAG system** for understanding any public GitHub codebase. Instead of returning a single search result, it *plans* an investigation, *calls tools* to gather evidence, and *reads the code* step by step, much as a senior developer would.

---

## 🚀 Key Features

*   **🧠 Agentic Brain (Plan & Execute)**: An LLM planner breaks your query into a multi-step investigation plan. An execution agent then iteratively calls tools, accumulates context, and synthesizes a final answer.
*   **🔀 Hybrid RAG Search**: Combines **ChromaDB** semantic similarity with **grep-style** keyword matching to improve code retrieval accuracy.
*   **🔧 Tool-Calling Agent Loop**: The agent autonomously uses `search_code`, `open_file`, and `list_files` tools — deciding which to use at each step based on what it has found.
*   **🐛 Bug Detection Mode**: Include "bug", "error", "security", or "vulnerability" in your query to trigger a specialized **Security Engineer mode** that produces a full vulnerability/bug report with recommended fixes.
*   **🤖 LLM Re-Ranker**: Filters retrieved code chunks using the LLM itself, prioritizing actual logic over boilerplate and UI wrappers.
*   **💾 Persistent Caching**: Saves the vector store to disk, so reloading a previously indexed repo skips re-indexing.
*   **🌐 Multi-Language Support**: Python, JS/TS (TSX/JSX), Go, Rust, C++, Java, and more.
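
The README does not say how the semantic and keyword result lists are merged. One common option is reciprocal rank fusion, sketched below under that assumption. The function name, the `k` constant, and the chunk ids are illustrative and not taken from `rag_pipeline.py`.

```python
from collections import defaultdict
from typing import Dict, List, Sequence

def reciprocal_rank_fusion(rankings: Sequence[List[str]], k: int = 60) -> List[str]:
    """Merge several ranked lists of chunk ids into one list, best first."""
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            # A chunk earns more credit the higher it ranks in each list.
            scores[chunk_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by id so output is deterministic.
    return sorted(scores, key=lambda cid: (-scores[cid], cid))

# Hypothetical results from the two retrievers (ids are made up).
semantic_hits = ["auth.py:10", "db.py:4", "utils.py:88"]
keyword_hits = ["db.py:4", "routes.py:21", "auth.py:10"]
merged = reciprocal_rank_fusion([semantic_hits, keyword_hits])
```

Chunks that appear in both lists rise to the top, which is the behaviour a hybrid search is after. The merged list would then go to the LLM re-ranker.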

---

## 🏗️ Architecture

```
User Query
    ↓
Planner (LLM) → generates step-by-step investigation plan
    ↓
Execution Agent Loop:
    → Reason: decides which tool to use
    → Act: search_code / open_file / list_files
    → Observe: accumulates tool results as context
    ↓
Synthesizer (LLM) → final answer grounded in actual code
```

This is a **ReAct (Reason + Act)** pattern combined with **Plan & Execute**, two widely used agentic AI patterns.
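
The loop in the diagram can be sketched without an LLM, as below. This is a minimal illustration, not the code in `agent.py`: the in-memory `REPO`, the tool bodies, and the plan format (a list of `{"tool", "input"}` dicts) are all assumptions, and the "Reason" step is pre-decided by the plan where the real agent would ask the LLM.

```python
from typing import Callable, Dict, List

# Hypothetical in-memory "repo" standing in for a cloned codebase.
REPO: Dict[str, str] = {
    "auth.py": "def login(user, password):\n    return check(user, password)\n",
    "db.py": "def connect(url):\n    return Engine(url)\n",
}

def search_code(term: str) -> str:
    """Grep-like stand-in: return 'path:line: text' for every matching line."""
    hits = [
        f"{path}:{lineno}: {line.strip()}"
        for path, source in sorted(REPO.items())
        for lineno, line in enumerate(source.splitlines(), start=1)
        if term in line
    ]
    return "\n".join(hits) or "<no matches>"

def open_file(path: str) -> str:
    """Return the file contents, or a marker if the path is unknown."""
    return REPO.get(path, f"<{path} not found>")

def list_files(_: str = "") -> str:
    """List every path in the repo, one per line."""
    return "\n".join(sorted(REPO))

TOOLS: Dict[str, Callable[[str], str]] = {
    "search_code": search_code,
    "open_file": open_file,
    "list_files": list_files,
}

def run_agent(plan: List[dict], max_steps: int = 5) -> List[dict]:
    """Execute each planned step and accumulate observations as context."""
    context: List[dict] = []
    for step in plan[:max_steps]:
        tool = TOOLS[step["tool"]]          # Reason (an LLM decides this in the real agent)
        observation = tool(step["input"])   # Act
        context.append({**step, "observation": observation})  # Observe
    return context

trace = run_agent([
    {"tool": "search_code", "input": "login"},
    {"tool": "open_file", "input": "db.py"},
])
```

The accumulated `trace` is what a synthesizer prompt would receive. The `max_steps` cap bounds the loop if a plan runs long.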

---

## 🛠️ Technical Stack

*   **UI**: [Streamlit](https://streamlit.io/)
*   **Agent System**: Custom ReAct Agent (`planner.py` + `agent.py`)
*   **Tool System**: `tools.py` — `search_code`, `open_file`, `list_files`
*   **RAG Pipeline**: [LangChain](https://www.langchain.com/) with Hybrid Search
*   **Vector Database**: [ChromaDB](https://www.trychroma.com/)
*   **Embeddings**: `intfloat/e5-small-v2` or `all-MiniLM-L6-v2` (Local CPU)
*   **LLM Engine**: `LongCat-Flash-Lite` via OpenAI-compatible API
*   **Repo Loader**: `GitLoader` with multi-branch auto-detection (main/master)
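
The main/master auto-detection could work roughly as follows. This is a sketch under assumptions: `load_with_branch_fallback` and `fake_clone` are hypothetical names, and the `clone` callable stands in for whatever call the project makes to its `GitLoader`.

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")

def load_with_branch_fallback(
    clone: Callable[[str, str], T],
    repo_url: str,
    branches: Sequence[str] = ("main", "master"),
) -> Tuple[str, T]:
    """Try each branch in order and return the first one that loads."""
    errors: List[str] = []
    for branch in branches:
        try:
            return branch, clone(repo_url, branch)
        except Exception as exc:
            # Remember why this branch failed, then try the next one.
            errors.append(f"{branch}: {exc}")
    raise RuntimeError("No usable branch. Tried " + "; ".join(errors))

def fake_clone(repo_url: str, branch: str) -> List[str]:
    # Hypothetical stand-in for the real loader: only 'master' exists here.
    if branch != "master":
        raise ValueError("branch not found")
    return [f"{repo_url}@{branch}/README.md"]

branch, docs = load_with_branch_fallback(fake_clone, "https://example.com/repo.git")
```

Passing the loader in as a callable keeps the fallback logic testable without network access.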

---

## 📁 Project Structure

```
codelens/
├── app.py              # Streamlit UI with Quick Search + Agentic Search
├── rag_pipeline.py     # Hybrid RAG: ChromaDB + Grep search + LLM re-ranker
├── planner.py          # LLM-based query planner → JSON step list
├── agent.py            # ReAct execution loop + Bug Detection mode
├── tools.py            # search_code, open_file, list_files
├── utils_llm.py        # Shared LLM instance factory
└── requirements.txt
```
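
Since `planner.py` is described as producing a JSON step list, the LLM reply needs parsing before the agent can use it. The sketch below shows one defensive way to do that. The function name `parse_plan`, the step cap, and the single-step fallback are assumptions, not the project's actual behaviour.

```python
import json
import re
from typing import List

def parse_plan(raw: str, max_steps: int = 6) -> List[str]:
    """Extract a JSON array of step strings from an LLM reply."""
    # Models often wrap the JSON in extra prose, so grab the outermost [...] span.
    match = re.search(r"\[.*\]", raw, flags=re.DOTALL)
    if not match:
        return [raw.strip()]  # fall back to treating the reply as one step
    try:
        steps = json.loads(match.group(0))
    except json.JSONDecodeError:
        return [raw.strip()]
    # Drop empty entries and cap the plan length.
    cleaned = [str(step).strip() for step in steps if str(step).strip()]
    return cleaned[:max_steps]

reply = 'Plan: ["Find the login route", "Open auth.py", ""] Done.'
plan = parse_plan(reply)
```

Falling back to a single-step plan means a malformed reply degrades to a plain search instead of raising an error.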

---

## 🛰️ Deployment on Hugging Face Spaces

### 1. Configure Environment Secrets
Go to your HF Space **Settings** → **Variables and secrets**:
-   **Key**: `OPENAI_API_KEY`
-   **Value**: *Your LongCat/OpenAI API Key*
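
Hugging Face exposes Space secrets to the container as environment variables, so the app can read the key at startup. A minimal sketch of such a check follows; `get_api_key` is a hypothetical helper, not necessarily what `utils_llm.py` does.

```python
import os
from typing import Mapping

def get_api_key(env: Mapping[str, str] = os.environ) -> str:
    """Return the API key, or fail early with an actionable message."""
    key = env.get("OPENAI_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "OPENAI_API_KEY is not set. Add it under Settings -> "
            "Variables and secrets, or export it in your local shell."
        )
    return key
```

Failing at startup gives a clearer error than an authentication failure on the first LLM call.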

### 2. Dockerfile
The Space uses the Docker SDK (`sdk: docker` in the front matter). The included `Dockerfile` is already configured to run the Streamlit app on port 7860.

---

## 🛠️ Local Installation

1.  **Clone this repository**:
    ```bash
    git clone https://github.com/your-username/codelens.git
    cd codelens
    ```

2.  **Install dependencies**:
    ```bash
    pip install -r requirements.txt
    ```

3.  **Run the application**:
    ```bash
    streamlit run app.py
    ```

---

## 📖 How to Use

1.  **Enter a Public GitHub URL** in the sidebar. CodeLens clones and indexes the repo.
2.  **Wait for Indexing**: roughly 2 to 5 minutes per 1,000 chunks on a standard CPU. The index is cached, so subsequent loads of the same repo skip this step.
3.  **Quick Search** — Fast single-step hybrid RAG answer.
4.  **Agentic Search** — Multi-step investigation: the agent plans, searches, reads files, and synthesizes.
5.  **Bug Detection** — Include "bug", "error", "security", or "vulnerability" in your query for a full bug report.
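
The caching in step 2 implies some mapping from a repo URL to an on-disk index. One way to do that is sketched below. The `.codelens_cache` directory name and both helper names are assumptions for illustration only.

```python
import hashlib
from pathlib import Path

def cache_dir_for(repo_url: str, root: str = ".codelens_cache") -> Path:
    """Map a repo URL to a stable cache directory."""
    # Normalize so trivially different spellings of one URL share a cache entry.
    normalized = repo_url.strip().rstrip("/").lower()
    if normalized.endswith(".git"):
        normalized = normalized[:-4]
    digest = hashlib.sha256(normalized.encode("utf-8")).hexdigest()[:16]
    return Path(root) / digest

def is_indexed(repo_url: str, root: str = ".codelens_cache") -> bool:
    """True if a persisted index directory already exists for this repo."""
    return cache_dir_for(repo_url, root).is_dir()
```

An app could call `is_indexed` before cloning, and open the persisted vector store from `cache_dir_for(url)` when it returns true.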

### Example Queries
- *"Trace the full user authentication flow"*
- *"How is the database connected and what ORM is used?"*
- *"Find potential security bugs in the input validation"*
- *"What design patterns are used in this project?"*
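
Routing a query to Bug Detection mode could be as simple as a keyword check. The sketch below assumes case-insensitive, whole-word matching with plurals allowed. The actual matching rule in `agent.py` is not documented here, and `is_bug_query` is a hypothetical name.

```python
import re

# Trigger words from the README; plural forms are an added assumption.
_BUG_PATTERN = re.compile(
    r"\b(bugs?|errors?|security|vulnerabilit(?:y|ies))\b",
    flags=re.IGNORECASE,
)

def is_bug_query(query: str) -> bool:
    """Return True if the query should be routed to Bug Detection mode."""
    return _BUG_PATTERN.search(query) is not None

queries = [
    "Trace the full user authentication flow",
    "Find potential security bugs in the input validation",
    "What design patterns are used in this project?",
]
modes = [is_bug_query(q) for q in queries]
```

Whole-word matching keeps a query such as "debugging tips" out of bug mode, which a plain substring test would not.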

---

## 💬 Interview Summary

> *"I built a ReAct-style agentic system where an LLM planner decomposes queries into steps, then an execution agent iteratively calls tools — semantic search, file reading, directory listing — accumulating context before synthesizing a final answer. It's grounded by a hybrid RAG pipeline using ChromaDB and grep-based keyword search, with an LLM re-ranker to filter results."*

---

### 🐱 Built with ❤️ using LangChain, ChromaDB, and Streamlit
*Intelligent Code Understanding, powered by Agentic AI.*