---
title: AI-CodeLens
colorFrom: indigo
colorTo: blue
sdk: docker
pinned: false
---
# CodeLens: Agentic Code Intelligence

**CodeLens** is a **ReAct-style Agentic RAG system** for understanding any public GitHub codebase. It goes beyond simple search: it *thinks*, *plans*, and *investigates* a codebase the way a senior developer would.

---
## Key Features

* **Agentic Brain (Plan & Execute)**: An LLM planner breaks your query into a multi-step investigation plan. An execution agent then iteratively calls tools, accumulates context, and synthesizes a final answer.
* **Hybrid RAG Search**: Combines **ChromaDB** semantic similarity with **Grep-style** keyword matching, so retrieval finds both conceptually related code and exact identifier matches.
* **Tool-Calling Agent Loop**: The agent autonomously calls the `search_code`, `open_file`, and `list_files` tools, deciding at each step which one to use based on what it has found so far.
* **Bug Detection Mode**: Include a keyword such as "bug", "error", or "security" in your query to trigger a specialized **Security Engineer mode** that produces a full bug and vulnerability report with recommended fixes.
* **LLM Re-Ranker**: Uses the LLM itself to filter retrieved code chunks, prioritizing actual logic over boilerplate and UI wrappers.
* **Persistent Caching**: Saves the vector store to disk, so reloading a previously indexed repo skips re-indexing.
* **Full Stack Support**: Python, JS/TS (TSX/JSX), Go, Rust, C++, Java, and more.
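
The README does not say how the semantic and keyword result lists are merged. A common choice is Reciprocal Rank Fusion (RRF), sketched below under that assumption: a chunk's fused score is the sum of `1 / (k + rank)` over every list it appears in, with `k = 60` as the usual constant. The in-memory `chunks` dict and the hard-coded `semantic_hits` stand in for ChromaDB, and `grep_search` / `rrf_merge` are hypothetical names, not functions from `rag_pipeline.py`.

```python
import re
from collections import defaultdict


def grep_search(query: str, chunks: dict[str, str], k: int = 5) -> list[str]:
    """Keyword leg: rank chunk ids by how often the query's words occur in them."""
    terms = [t.lower() for t in re.findall(r"\w+", query)]
    scored = []
    for chunk_id, text in chunks.items():
        lowered = text.lower()
        hits = sum(lowered.count(term) for term in terms)
        if hits:
            scored.append((hits, chunk_id))
    # Most hits first; ties broken by id so results are deterministic.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [chunk_id for _, chunk_id in scored[:k]]


def rrf_merge(rankings: list[list[str]], k: int = 60, top_n: int = 5) -> list[str]:
    """Reciprocal Rank Fusion: score(d) = sum over rankings of 1 / (k + rank(d))."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] += 1.0 / (k + rank)
    return sorted(scores, key=lambda cid: (-scores[cid], cid))[:top_n]


chunks = {
    "auth.py:login": "def login(user, password): return check_password(password)",
    "db.py:connect": "def connect(url): return create_engine(url)",
    "ui.py:button": "render_button('login')",
}
# Stand-in for the ChromaDB similarity results (hypothetical ordering).
semantic_hits = ["db.py:connect", "auth.py:login"]
keyword_hits = grep_search("login password", chunks)
print(rrf_merge([semantic_hits, keyword_hits]))
# ['auth.py:login', 'db.py:connect', 'ui.py:button']
```

Fusing by rank avoids calibrating cosine similarities against keyword hit counts, which live on different scales. The chunk found by both legs rises to the top; in the pipeline described here, the LLM re-ranker would then filter this fused list.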
---

## Architecture

```
User Query
    ↓
Planner (LLM) → generates a step-by-step investigation plan
    ↓
Execution Agent Loop:
    ├─ Reason: decides which tool to use
    ├─ Act: search_code / open_file / list_files
    └─ Observe: accumulates tool results as context
    ↓
Synthesizer (LLM) → final answer grounded in actual code
```

This is a **ReAct (Reason + Act)** pattern combined with **Plan & Execute**, two of the most widely used agentic AI patterns.
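
The loop can be sketched in plain Python. This is a minimal, hypothetical version, not `agent.py`: `plan`, `choose_tool`, and `synthesize` would be LLM calls in the real system but are injected as plain functions here so the control flow runs offline, and the sketch assumes one tool call per plan step with a hard `max_steps` cap.

```python
from typing import Callable

Tool = Callable[[str], str]


def run_agent(
    query: str,
    plan: Callable[[str], list[str]],
    choose_tool: Callable[[str, str], tuple[str, str]],
    synthesize: Callable[[str, list[str]], str],
    tools: dict[str, Tool],
    max_steps: int = 8,
) -> str:
    """Plan once, run Reason -> Act -> Observe per step, then synthesize."""
    observations: list[str] = []
    for step in plan(query)[:max_steps]:
        # Reason: pick a tool and its argument from the step plus prior context.
        tool_name, tool_arg = choose_tool(step, "\n".join(observations))
        # Act: call the tool; an unknown name becomes an observation, not a crash.
        tool = tools.get(tool_name)
        result = tool(tool_arg) if tool else f"unknown tool: {tool_name}"
        # Observe: keep the result as context for later steps and the final answer.
        observations.append(f"[{tool_name}({tool_arg})] {result}")
    return synthesize(query, observations)


# Scripted stand-ins for the LLM calls and tools, so the loop runs offline.
demo_tools: dict[str, Tool] = {
    "list_files": lambda path: "app.py agent.py tools.py",
    "search_code": lambda q: f"match for {q!r} in agent.py",
    "open_file": lambda path: f"<contents of {path}>",
}
answer = run_agent(
    "How does the agent loop work?",
    plan=lambda q: ["list the files", "search for the loop", "open agent.py"],
    choose_tool=lambda step, context: (
        ("list_files", ".") if "list" in step
        else ("search_code", "loop") if "search" in step
        else ("open_file", "agent.py")
    ),
    synthesize=lambda q, obs: f"{len(obs)} observations: " + " | ".join(obs),
    tools=demo_tools,
)
print(answer)
```

Passing the accumulated observations into `choose_tool` is what lets each step's decision depend on what earlier steps found.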
---

## Technical Stack

* **UI**: [Streamlit](https://streamlit.io/)
* **Agent System**: Custom ReAct agent (`planner.py` + `agent.py`)
* **Tool System**: `tools.py` (`search_code`, `open_file`, `list_files`)
* **RAG Pipeline**: [LangChain](https://www.langchain.com/) with hybrid search
* **Vector Database**: [ChromaDB](https://www.trychroma.com/)
* **Embeddings**: `intfloat/e5-small-v2` or `all-MiniLM-L6-v2` (local, CPU)
* **LLM Engine**: `LongCat-Flash-Lite` via an OpenAI-compatible API
* **Repo Loader**: `GitLoader` with multi-branch auto-detection (`main`/`master`)
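
A stdlib-only sketch of two of the read-only tools, assuming each takes and returns plain strings and must stay inside the cloned repository. The `Toolbox` class, the `max_chars` truncation, and the path check are illustrative assumptions rather than the contents of `tools.py`. `search_code` is left out because the summary at the end of this README describes it as semantic search, presumably backed by the hybrid retriever.

```python
import tempfile
from pathlib import Path


class Toolbox:
    """Hypothetical read-only file tools scoped to one cloned repository."""

    def __init__(self, repo_root: str, max_chars: int = 4000) -> None:
        self.root = Path(repo_root).resolve()
        self.max_chars = max_chars

    def _safe_path(self, rel_path: str) -> Path:
        # Resolve the path and refuse anything that escapes the repo root.
        path = (self.root / rel_path).resolve()
        if path != self.root and self.root not in path.parents:
            raise ValueError(f"path outside repository: {rel_path}")
        return path

    def list_files(self, rel_dir: str = ".") -> str:
        """Newline-separated repo-relative file paths, skipping .git internals."""
        files = []
        for p in self._safe_path(rel_dir).rglob("*"):
            rel = p.relative_to(self.root)
            if p.is_file() and ".git" not in rel.parts:
                files.append(rel.as_posix())
        return "\n".join(sorted(files))

    def open_file(self, rel_path: str) -> str:
        """File text, truncated so one large file cannot flood the LLM context."""
        text = self._safe_path(rel_path).read_text(encoding="utf-8", errors="replace")
        return text[: self.max_chars]


# Tiny throwaway repo to exercise the tools.
with tempfile.TemporaryDirectory() as repo:
    Path(repo, "pkg").mkdir()
    Path(repo, "pkg", "a.py").write_text("print('hi')\n", encoding="utf-8")
    Path(repo, "README.md").write_text("# demo project\n", encoding="utf-8")
    toolbox = Toolbox(repo, max_chars=6)
    listing = toolbox.list_files()
    preview = toolbox.open_file("README.md")
    try:
        toolbox.open_file("../outside.txt")
        blocked = False
    except ValueError:
        blocked = True

print(listing)
print(preview)
print(blocked)
```

Since the agent chooses file paths itself, rejecting `../` escapes and capping file size are cheap safeguards against reading outside the repo or overflowing the prompt.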
---

## Project Structure

```
codelens/
├── app.py           # Streamlit UI with Quick Search + Agentic Search
├── rag_pipeline.py  # Hybrid RAG: ChromaDB + Grep search + LLM re-ranker
├── planner.py       # LLM-based query planner → JSON step list
├── agent.py         # ReAct execution loop + Bug Detection mode
├── tools.py         # search_code, open_file, list_files
├── utils_llm.py     # Shared LLM instance factory
└── requirements.txt
```
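
A planner that emits a JSON step list is only useful if its output parses reliably, and models often wrap JSON in a Markdown fence or surround it with prose. The sketch below assumes a hypothetical schema (either a bare JSON list of step strings or `{"steps": [...]}`); `parse_plan` and its one-step fallback are illustrative, not the project's `planner.py`.

```python
import json
import re

# A Markdown code fence (three backticks, optional "json" tag) around the payload.
FENCE_RE = re.compile(r"`{3}(?:json)?\s*(.*?)`{3}", re.DOTALL)


def parse_plan(raw: str, query: str, max_steps: int = 6) -> list[str]:
    """Parse a planner reply into step strings (hypothetical schema).

    Accepts a bare JSON list or an object like {"steps": [...]}, optionally
    fenced. Falls back to a one-step plan containing the original query.
    """
    match = FENCE_RE.search(raw)
    text = match.group(1) if match else raw
    try:
        data = json.loads(text)
    except json.JSONDecodeError:
        # Models sometimes wrap the list in prose; try the outermost [...] span.
        bracket = re.search(r"\[.*\]", text, re.DOTALL)
        try:
            data = json.loads(bracket.group(0)) if bracket else None
        except json.JSONDecodeError:
            data = None
    if isinstance(data, dict):
        data = data.get("steps")
    steps = []
    if isinstance(data, list):
        steps = [s.strip() for s in data if isinstance(s, str) and s.strip()]
    return steps[:max_steps] or [query]


print(parse_plan('Plan: ["find login route", "open auth.py"]', "login flow"))
# ['find login route', 'open auth.py']
```

Falling back to the raw query keeps the agent running even when the planner's reply is unusable, at the cost of a less structured investigation.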
---

## Deployment on Hugging Face Spaces

### 1. Configure Environment Secrets

Go to your HF Space **Settings** → **Variables and secrets** and add:

- **Key**: `OPENAI_API_KEY`
- **Value**: *Your LongCat/OpenAI API key*
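
Hugging Face exposes Space secrets to the running app as environment variables, so the key can be read with `os.environ`. Below is a minimal sketch of a cached settings factory in the spirit of `utils_llm.py`, assuming `OPENAI_API_KEY` is the only required variable. `LLMSettings`, `get_llm_settings`, and the `LLM_MODEL` override are hypothetical, and the real factory would more likely cache the LLM client itself.

```python
import os
from dataclasses import dataclass
from functools import lru_cache


@dataclass(frozen=True)
class LLMSettings:
    api_key: str
    model: str


@lru_cache(maxsize=1)
def get_llm_settings() -> LLMSettings:
    """Build settings once per process and share them (hypothetical factory)."""
    api_key = os.environ.get("OPENAI_API_KEY", "").strip()
    if not api_key:
        raise RuntimeError(
            "OPENAI_API_KEY is not set; add it under Settings → Variables and secrets "
            "on the Space, or export it locally before running streamlit."
        )
    # LLM_MODEL is a hypothetical override; the default is the model named in this README.
    model = os.environ.get("LLM_MODEL", "LongCat-Flash-Lite")
    return LLMSettings(api_key=api_key, model=model)
```

With `functools.lru_cache`, every caller shares one object; call `get_llm_settings.cache_clear()` after changing the environment, for example in tests. Failing fast with a clear message is friendlier than an authentication error surfacing mid-query.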
### 2. Dockerfile

The Space uses the Docker SDK. The `Dockerfile` is already configured to run the Streamlit app on port 7860.

---

## Local Installation

1. **Clone this repository**:
   ```bash
   git clone https://github.com/your-username/codelens.git
   cd codelens
   ```
2. **Install dependencies**:
   ```bash
   pip install -r requirements.txt
   ```
3. **Run the application** (with `OPENAI_API_KEY` set in your environment, as on the Space):
   ```bash
   streamlit run app.py
   ```
---

## How to Use

1. **Enter a public GitHub URL** in the sidebar; the bot clones and indexes the repo.
2. **Wait for indexing**: roughly 2-5 minutes for 1,000 chunks on a standard CPU. The index is cached, so later loads of the same repo are much faster.
3. **Quick Search**: a fast, single-step hybrid RAG answer.
4. **Agentic Search**: a multi-step investigation in which the agent plans, searches, reads files, and synthesizes an answer.
5. **Bug Detection**: include "bug", "error", "security", or "vulnerability" in your query to get a full bug report.
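
The exact trigger logic in `agent.py` is not documented, so the keyword set below is an assumption taken from step 5. This minimal sketch uses whole-word matching with rough plural handling; `select_mode`, `_singular`, and the mode names `"bug_report"` and `"explain"` are hypothetical.

```python
import re

# Assumed trigger words, taken from the How to Use list.
BUG_KEYWORDS = {"bug", "error", "security", "vulnerability"}


def _singular(word: str) -> str:
    """Very rough plural stripping so "bugs" or "vulnerabilities" still match."""
    if word.endswith("ies"):
        return word[:-3] + "y"
    if word.endswith("s"):
        return word[:-1]
    return word


def select_mode(query: str) -> str:
    """Return "bug_report" if the query contains a trigger word, else "explain"."""
    words = re.findall(r"[a-z]+", query.lower())
    # Whole-word matching: "debugger" or "terrorism" do not trigger the mode.
    normalized = set(words) | {_singular(w) for w in words}
    return "bug_report" if normalized & BUG_KEYWORDS else "explain"


print(select_mode("Find potential security bugs in the input validation"))  # bug_report
print(select_mode("Trace the full user authentication flow"))  # explain
```

Whole-word matching keeps a question about a debugger from switching modes; a plain substring check would trigger on "debugger" because it contains "bug".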
### Example Queries

- *"Trace the full user authentication flow"*
- *"How is the database connected and what ORM is used?"*
- *"Find potential security bugs in the input validation"*
- *"What design patterns are used in this project?"*

---

## Interview Summary

> *"I built a ReAct-style agentic system where an LLM planner decomposes queries into steps, then an execution agent iteratively calls tools (semantic search, file reading, directory listing), accumulating context before synthesizing a final answer. It's grounded by a hybrid RAG pipeline using ChromaDB and grep-based keyword search, with an LLM re-ranker to filter results."*

---

### Built with LangChain, ChromaDB, and Streamlit

*Intelligent Code Understanding, powered by Agentic AI.*