---
title: AI-CodeLens
emoji: 🔍
colorFrom: indigo
colorTo: blue
sdk: docker
pinned: false
---

🔍 CodeLens — Agentic Code Intelligence

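CodeLens is a ReAct-style Agentic RAG system for understanding public GitHub codebases. Instead of returning a single list of search hits, it plans an investigation, calls tools to search and read the code, and synthesizes an answer from what it finds, much as an experienced developer would explore an unfamiliar repository.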

🚀 Key Features

🧠 Agentic Brain (Plan & Execute): An LLM planner breaks your query into a multi-step investigation plan. An execution agent then iteratively calls tools, accumulates context, and synthesizes a final answer.
🔀 Hybrid RAG Search: Combines ChromaDB semantic similarity with grep-style keyword matching, so retrieval catches both conceptually related code and exact identifier matches.
🔧 Tool-Calling Agent Loop: The agent autonomously uses search_code, open_file, and list_files tools — deciding which to use at each step based on what it has found.
🐛 Bug Detection Mode: Mention "bug", "error", or "security" in your query to trigger a specialized Security Engineer mode that produces a full vulnerability/bug report with recommended fixes.
🤖 LLM Re-Ranker: Filters retrieved code chunks using the LLM itself, prioritizing actual logic over boilerplate and UI wrappers.
💾 Persistent Caching: Saves the vector store to disk, so reloading a previously indexed repo skips the indexing step and is near-instant.
🌐 Full Stack Support: Python, JS/TS (TSX/JSX), Go, Rust, C++, Java, and more.
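The README does not say how the semantic and keyword result lists are merged. One common option is reciprocal rank fusion (RRF); the sketch below uses it as an assumed technique, not necessarily what rag_pipeline.py does. The function names, chunk IDs, and `k=60` are illustrative, and the semantic ranking is a hard-coded stand-in for a ChromaDB similarity query.

```python
import re
from collections import defaultdict


def keyword_search(chunks: dict[str, str], query: str, top_k: int = 5) -> list[str]:
    """Grep-style ranking: count case-insensitive whole-word hits of each query term."""
    terms = [t for t in re.findall(r"\w+", query.lower()) if len(t) > 2]
    scores: dict[str, int] = {}
    for chunk_id, text in chunks.items():
        lowered = text.lower()
        hits = sum(len(re.findall(rf"\b{re.escape(t)}\b", lowered)) for t in terms)
        if hits:
            scores[chunk_id] = hits
    # Highest hit count first; ties broken by chunk id for a stable order.
    return sorted(scores, key=lambda cid: (-scores[cid], cid))[:top_k]


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge ranked lists: each list adds 1 / (k + rank) to a chunk's fused score."""
    fused: defaultdict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            fused[chunk_id] += 1.0 / (k + rank)
    return sorted(fused, key=lambda cid: (-fused[cid], cid))


# Hypothetical chunks keyed by "file:chunk_index".
chunks = {
    "auth.py:1": "def login(user, password): return check_password(user, password)",
    "db.py:1": "engine = create_engine(DATABASE_URL)",
    "ui.py:1": "st.button('Login')",
}
semantic_hits = ["auth.py:1", "ui.py:1"]  # stand-in for a ChromaDB similarity search
keyword_hits = keyword_search(chunks, "login password check")
fused = reciprocal_rank_fusion([semantic_hits, keyword_hits])
print(fused)  # → ['auth.py:1', 'ui.py:1']
```

RRF only needs ranks, not raw scores, which sidesteps the problem that cosine similarities and keyword hit counts live on different scales.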

🏗️ Architecture

```
User Query
    ↓
Planner (LLM) → generates step-by-step investigation plan
    ↓
Execution Agent Loop:
    → Reason: decides which tool to use
    → Act: search_code / open_file / list_files
    → Observe: accumulates tool results as context
    ↓
Synthesizer (LLM) → final answer grounded in actual code
```

This combines the ReAct (Reason + Act) pattern with Plan & Execute, two foundational agentic AI patterns.
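The flow in the diagram can be sketched as a small driver loop. This is a minimal illustration, not the code in agent.py: `run_agent`, `AgentState`, and the callables passed in are hypothetical, and the LLM-backed planner, reasoner, and synthesizer are replaced by plain functions so the loop runs on its own. For simplicity it makes one tool call per plan step; a real ReAct agent may take several.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class AgentState:
    query: str
    plan: list[str]
    observations: list[str] = field(default_factory=list)


def run_agent(
    query: str,
    plan_fn: Callable[[str], list[str]],
    reason_fn: Callable[[AgentState, str], tuple[str, str]],
    tools: dict[str, Callable[[str], str]],
    synthesize_fn: Callable[[AgentState], str],
    max_steps: int = 8,
) -> str:
    state = AgentState(query=query, plan=plan_fn(query))  # Plan
    for step in state.plan[:max_steps]:  # cap steps so the loop always terminates
        # Reason: choose a tool and its argument, given the plan step and prior observations.
        tool_name, tool_arg = reason_fn(state, step)
        tool = tools.get(tool_name)
        if tool is None:
            state.observations.append(f"[{step}] unknown tool {tool_name!r}")
            continue
        result = tool(tool_arg)  # Act
        state.observations.append(f"[{step}] {tool_name}({tool_arg!r}) -> {result}")  # Observe
    return synthesize_fn(state)  # Synthesize from the accumulated context


# Stubbed tools and decision functions standing in for the LLM.
tools = {
    "list_files": lambda path: "app.py, agent.py, tools.py",
    "search_code": lambda q: f"2 chunks match {q!r}",
}
answer = run_agent(
    "Where is the agent loop?",
    plan_fn=lambda q: ["list the repo", "search for the loop"],
    reason_fn=lambda s, step: ("list_files", ".") if "list" in step else ("search_code", "agent loop"),
    tools=tools,
    synthesize_fn=lambda s: "\n".join(s.observations),
)
print(answer)
```

Passing the full `AgentState` to the reasoner is what lets each tool choice depend on what earlier steps found.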

🛠️ Technical Stack

UI: Streamlit
Agent System: Custom ReAct Agent (planner.py + agent.py)
Tool System: tools.py — search_code, open_file, list_files
RAG Pipeline: LangChain with Hybrid Search
Vector Database: ChromaDB
Embeddings: intfloat/e5-small-v2 or all-MiniLM-L6-v2 (Local CPU)
LLM Engine: LongCat-Flash-Lite via OpenAI-compatible API
Repo Loader: GitLoader with multi-branch auto-detection (main/master)
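To make the file tools concrete, here is a sketch of what `list_files` and `open_file` could look like over a cloned repo. The signatures, the suffix filter, and the truncation limit are assumptions rather than the actual tools.py API; `search_code` is omitted because it wraps the hybrid retriever. The key design point is that both tools refuse paths that resolve outside the repository root, since the path argument comes from the LLM.

```python
from pathlib import Path

# Assumed subset of source extensions to expose to the agent.
CODE_SUFFIXES = {".py", ".js", ".ts", ".tsx", ".jsx", ".go", ".rs", ".cpp", ".java"}


def _resolve_inside(repo_root: Path, rel_path: str) -> Path:
    """Resolve rel_path under repo_root, rejecting anything that escapes it (e.g. '../')."""
    root = repo_root.resolve()
    target = (root / rel_path).resolve()
    if target != root and root not in target.parents:
        raise ValueError(f"path escapes repository: {rel_path}")
    return target


def list_files(repo_root: str, subdir: str = ".") -> list[str]:
    """Return repo-relative POSIX paths of source files under subdir, skipping .git."""
    root = Path(repo_root).resolve()
    base = _resolve_inside(root, subdir)
    return sorted(
        p.relative_to(root).as_posix()
        for p in base.rglob("*")
        if p.is_file() and p.suffix in CODE_SUFFIXES and ".git" not in p.parts
    )


def open_file(repo_root: str, rel_path: str, max_chars: int = 8000) -> str:
    """Read a file as text, truncating long files so they fit in the LLM context."""
    target = _resolve_inside(Path(repo_root), rel_path)
    text = target.read_text(encoding="utf-8", errors="replace")
    return text if len(text) <= max_chars else text[:max_chars] + "\n... [truncated]"
```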

📁 Project Structure

```
codelens/
├── app.py              # Streamlit UI with Quick Search + Agentic Search
├── rag_pipeline.py     # Hybrid RAG: ChromaDB + Grep search + LLM re-ranker
├── planner.py          # LLM-based query planner → JSON step list
├── agent.py            # ReAct execution loop + Bug Detection mode
├── tools.py            # search_code, open_file, list_files
├── utils_llm.py        # Shared LLM instance factory
└── requirements.txt
```
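Because planner.py returns a JSON step list produced by an LLM, the raw reply may arrive wrapped in a code fence or surrounded by prose. A defensive parser along these lines is one way to handle that; `parse_plan` is hypothetical, and it assumes the planner is prompted to return a JSON array of step strings, falling back to a single step (the original query) when nothing usable is found.

```python
import json
import re


def parse_plan(raw: str, query: str, max_steps: int = 6) -> list[str]:
    """Extract a JSON array of step strings from raw LLM output.

    Assumes a reply shaped like ["List files", "Open app.py"], possibly inside
    a ```json fence. Falls back to [query] if no usable steps are found.
    """
    match = re.search(r"\[.*\]", raw, flags=re.DOTALL)  # first '[' to last ']'
    if match:
        try:
            steps = json.loads(match.group(0))
        except json.JSONDecodeError:
            steps = []
        if isinstance(steps, list):
            # Keep only non-empty strings; drop numbers, nulls, and whitespace.
            cleaned = [s.strip() for s in steps if isinstance(s, str) and s.strip()]
            if cleaned:
                return cleaned[:max_steps]
    return [query]
```

Falling back to the raw query keeps the agent usable (it degrades to a one-step search) instead of crashing when the planner's output is malformed.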

🛰️ Deployment on Hugging Face Spaces

1. Configure Environment Secrets

In your Hugging Face Space, go to Settings → Variables and secrets and add a new secret:

Key: OPENAI_API_KEY
Value: Your LongCat/OpenAI API Key
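Hugging Face Spaces exposes secrets to the running app as environment variables. A small helper like the hypothetical `get_api_key` below (not necessarily how utils_llm.py reads it) fails fast with a readable message if the secret is missing, instead of letting the first LLM call fail deep inside the agent loop.

```python
import os


def get_api_key(env_var: str = "OPENAI_API_KEY") -> str:
    """Read the API key from the environment, raising a clear error if it is unset."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(
            f"{env_var} is not set. On Hugging Face, add it under Settings > "
            "Variables and secrets; locally, export it in your shell."
        )
    return key
```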

2. Dockerfile

The Space uses the Docker SDK (`sdk: docker`), and the included Dockerfile is already configured to run the Streamlit app on port 7860.

🛠️ Local Installation

Clone this repository:

```
git clone https://github.com/your-username/codelens.git
cd codelens
```

Install dependencies:
```
pip install -r requirements.txt
```
Run the application:
```
streamlit run app.py
```

📖 How to Use

Enter a public GitHub URL in the sidebar: the bot will clone and index the repo.
Wait for Indexing: roughly 2-5 min for 1,000 chunks on a standard CPU. Later loads of the same repo use the cache.
Quick Search — Fast single-step hybrid RAG answer.
Agentic Search — Multi-step investigation: the agent plans, searches, reads files, and synthesizes.
Bug Detection — Include "bug", "error", "security", or "vulnerability" in your query for a full bug report.
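The keyword trigger for Bug Detection mode can be pictured as a simple router. This is a sketch under assumptions: `select_mode` and the mode names are hypothetical, and the keyword list is taken from the step above rather than from agent.py.

```python
# Keywords listed in "How to Use"; the real list in agent.py may differ.
BUG_KEYWORDS = ("bug", "error", "security", "vulnerability")


def select_mode(query: str) -> str:
    """Route a query to the bug-report prompt or the standard agent prompt."""
    lowered = query.lower()
    # Substring match, so plurals like "bugs" and "errors" also trigger the mode.
    return "bug_report" if any(k in lowered for k in BUG_KEYWORDS) else "standard"
```

Substring matching catches plurals, but it also fires on words like "debug"; whole-word regex matching would avoid that at the cost of missing "bugs" unless plurals are listed explicitly.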

Example Queries

"Trace the full user authentication flow"
"How is the database connected and what ORM is used?"
"Find potential security bugs in the input validation"
"What design patterns are used in this project?"

💬 Interview Summary

"I built a ReAct-style agentic system where an LLM planner decomposes queries into steps, then an execution agent iteratively calls tools — semantic search, file reading, directory listing — accumulating context before synthesizing a final answer. It's grounded by a hybrid RAG pipeline using ChromaDB and grep-based keyword search, with an LLM re-ranker to filter results."