---
title: AI-CodeLens
emoji: 🔍
colorFrom: indigo
colorTo: blue
sdk: docker
pinned: false
---

# 🔍 CodeLens: Agentic Code Intelligence

**CodeLens** is a **ReAct-style Agentic RAG system** for understanding any public GitHub codebase. It goes beyond simple search: it *thinks*, *plans*, and *investigates* codebases like a senior developer would.

---

## 🚀 Key Features

* **🧠 Agentic Brain (Plan & Execute)**: An LLM planner breaks your query into a multi-step investigation plan. An execution agent then iteratively calls tools, accumulates context, and synthesizes a final answer.
* **🔀 Hybrid RAG Search**: Combines **ChromaDB** semantic similarity with **Grep-style** keyword matching to improve code retrieval accuracy.
* **🔧 Tool-Calling Agent Loop**: The agent autonomously uses the `search_code`, `open_file`, and `list_files` tools, deciding which to call at each step based on what it has found so far.
* **🐛 Bug Detection Mode**: Mention "bug", "error", or "security" in your query to trigger a specialized **Security Engineer mode** that produces a full vulnerability/bug report with recommended fixes.
* **🤖 LLM Re-Ranker**: Filters retrieved code chunks using the LLM itself, prioritizing actual logic over boilerplate and UI wrappers.
* **💾 Persistent Caching**: Saves vector store state to disk, so reloading a previously indexed repo skips re-indexing.
* **🌐 Full Stack Support**: Python, JS/TS (TSX/JSX), Go, Rust, C++, Java, and more.

---

## 🏗️ Architecture

```
User Query
    ↓
Planner (LLM) → generates step-by-step investigation plan
    ↓
Execution Agent Loop:
    → Reason:  decides which tool to use
    → Act:     search_code / open_file / list_files
    → Observe: accumulates tool results as context
    ↓
Synthesizer (LLM) → final answer grounded in actual code
```

This is a **ReAct (Reason + Act)** pattern combined with **Plan & Execute**, two widely used agentic AI patterns.

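
The flow above can be sketched in a few lines of Python. This is a minimal illustration and not the code in `planner.py` or `agent.py`: the tool names come from this README, but their signatures, the in-memory `REPO`, and the `plan`, `reason`, and `run_agent` helpers are hypothetical stand-ins, with scripted functions where the real system would call an LLM.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical in-memory "repository" standing in for a cloned GitHub repo.
REPO: Dict[str, str] = {
    "auth/login.py": "def login(user, password):\n    return check_password(user, password)\n",
    "auth/hash.py": "def check_password(user, password):\n    return True\n",
}


def list_files(_: str = "") -> str:
    """List every file path in the repo, one per line."""
    return "\n".join(sorted(REPO))


def search_code(query: str) -> str:
    """Grep-style search: return paths whose source contains the query."""
    hits = [path for path, src in sorted(REPO.items()) if query in src]
    return "\n".join(hits) or "no matches"


def open_file(path: str) -> str:
    """Return the full source of one file."""
    return REPO.get(path, f"file not found: {path}")


TOOLS: Dict[str, Callable[[str], str]] = {
    "list_files": list_files,
    "search_code": search_code,
    "open_file": open_file,
}


def plan(query: str) -> List[str]:
    """Stand-in for the LLM planner: returns a fixed investigation plan."""
    return ["list the files", "search for login", "open the matching file"]


def reason(step: str, observations: List[str]) -> Tuple[str, str]:
    """Stand-in for the Reason step: pick a tool and its argument."""
    if step.startswith("list"):
        return "list_files", ""
    if step.startswith("search"):
        return "search_code", "login"
    # Open the first path reported by the previous (search) observation.
    return "open_file", observations[-1].splitlines()[0]


def run_agent(query: str, max_steps: int = 5) -> str:
    """Plan, then loop Reason -> Act -> Observe, then synthesize."""
    observations: List[str] = []
    for step in plan(query)[:max_steps]:
        tool, arg = reason(step, observations)  # Reason
        result = TOOLS[tool](arg)               # Act
        observations.append(result)             # Observe
    # Stand-in for the LLM synthesizer: concatenate the gathered context.
    return f"Q: {query}\n" + "\n---\n".join(observations)


if __name__ == "__main__":
    print(run_agent("How does login work?"))
```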

---

## 🛠️ Technical Stack

* **UI**: [Streamlit](https://streamlit.io/)
* **Agent System**: Custom ReAct Agent (`planner.py` + `agent.py`)
* **Tool System**: `tools.py` (`search_code`, `open_file`, `list_files`)
* **RAG Pipeline**: [LangChain](https://www.langchain.com/) with Hybrid Search
* **Vector Database**: [ChromaDB](https://www.trychroma.com/)
* **Embeddings**: `intfloat/e5-small-v2` or `all-MiniLM-L6-v2` (run locally on CPU)
* **LLM Engine**: `LongCat-Flash-Lite` via an OpenAI-compatible API
* **Repo Loader**: `GitLoader` with automatic default-branch detection (main/master)

---

## 📁 Project Structure

```
codelens/
├── app.py            # Streamlit UI with Quick Search + Agentic Search
├── rag_pipeline.py   # Hybrid RAG: ChromaDB + Grep search + LLM re-ranker
├── planner.py        # LLM-based query planner → JSON step list
├── agent.py          # ReAct execution loop + Bug Detection mode
├── tools.py          # search_code, open_file, list_files
├── utils_llm.py      # Shared LLM instance factory
└── requirements.txt
```

---

## 🛰️ Deployment on Hugging Face Spaces

### 1. Configure Environment Secrets

Go to your HF Space **Settings** → **Variables and secrets** and add:

- **Key**: `OPENAI_API_KEY`
- **Value**: *Your LongCat/OpenAI API Key*

### 2. Dockerfile

The Space uses the Docker SDK. The `Dockerfile` is already configured to run the Streamlit app on port 7860.

---

## 🛠️ Local Installation

1. **Clone this repository**:
   ```bash
   git clone https://github.com/your-username/codelens.git
   cd codelens
   ```
2. **Install dependencies**:
   ```bash
   pip install -r requirements.txt
   ```
3. **Run the application**:
   ```bash
   streamlit run app.py
   ```

---

## 📖 How to Use

1. **Enter a Public GitHub URL** in the sidebar. CodeLens clones and indexes the repo.
2. **Wait for Indexing**: roughly 2-5 minutes for about 1,000 chunks on a standard CPU. The index is cached for subsequent loads.
3. **Quick Search**: a fast, single-step hybrid RAG answer.
4. **Agentic Search**: a multi-step investigation in which the agent plans, searches, reads files, and synthesizes an answer.
5. **Bug Detection**: include "bug", "error", "security", or "vulnerability" in your query to get a full bug report.

### Example Queries

- *"Trace the full user authentication flow"*
- *"How is the database connected and what ORM is used?"*
- *"Find potential security bugs in the input validation"*
- *"What design patterns are used in this project?"*

---

## 💬 Interview Summary

> *"I built a ReAct-style agentic system where an LLM planner decomposes queries into steps, then an execution agent iteratively calls tools (semantic search, file reading, directory listing), accumulating context before synthesizing a final answer. It's grounded by a hybrid RAG pipeline using ChromaDB and grep-based keyword search, with an LLM re-ranker to filter results."*

---

### 🐱 Built with ❤️ using LangChain, ChromaDB, and Streamlit

*Intelligent Code Understanding, powered by Agentic AI.*