Space: Chirag20/RepoQA-RAG

Chirag20 committed on Apr 19

Commit 050d655

1 Parent(s): 72241b9

deployment_v1

Files changed (14)

Dockerfile +12 -0
LICENSE +201 -0
__pycache__/api.cpython-310.pyc +0 -0
__pycache__/embed_store.cpython-310.pyc +0 -0
__pycache__/ingest.cpython-310.pyc +0 -0
__pycache__/main.cpython-310.pyc +0 -0
__pycache__/query.cpython-310.pyc +0 -0
api.py +67 -0
embed_store.py +42 -0
ingest.py +199 -0
learnings.txt +4 -0
main.py +36 -0
query.py +273 -0
requirements.txt +189 -0

Dockerfile ADDED

	@@ -0,0 +1,12 @@

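+FROM python:3.11-slim
+# GitPython (imported by ingest.py) shells out to the git executable, which the slim image does not ship.
+RUN apt-get update && apt-get install -y --no-install-recommends git && rm -rf /var/lib/apt/lists/*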
+WORKDIR /app
+COPY requirements.txt .
+RUN pip install --no-cache-dir -r requirements.txt
+COPY . .
+EXPOSE 7860
+CMD ["uvicorn", "api:app", "--host", "0.0.0.0", "--port", "7860"]

LICENSE ADDED

	@@ -0,0 +1,201 @@

+                                 Apache License
+                           Version 2.0, January 2004
+                        http://www.apache.org/licenses/
+   TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
+   1. Definitions.
+      "License" shall mean the terms and conditions for use, reproduction,
+      and distribution as defined by Sections 1 through 9 of this document.
+      "Licensor" shall mean the copyright owner or entity authorized by
+      the copyright owner that is granting the License.
+      "Legal Entity" shall mean the union of the acting entity and all
+      other entities that control, are controlled by, or are under common
+      control with that entity. For the purposes of this definition,
+      "control" means (i) the power, direct or indirect, to cause the
+      direction or management of such entity, whether by contract or
+      otherwise, or (ii) ownership of fifty percent (50%) or more of the
+      outstanding shares, or (iii) beneficial ownership of such entity.
+      "You" (or "Your") shall mean an individual or Legal Entity
+      exercising permissions granted by this License.
+      "Source" form shall mean the preferred form for making modifications,
+      including but not limited to software source code, documentation
+      source, and configuration files.
+      "Object" form shall mean any form resulting from mechanical
+      transformation or translation of a Source form, including but
+      not limited to compiled object code, generated documentation,
+      and conversions to other media types.
+      "Work" shall mean the work of authorship, whether in Source or
+      Object form, made available under the License, as indicated by a
+      copyright notice that is included in or attached to the work
+      (an example is provided in the Appendix below).
+      "Derivative Works" shall mean any work, whether in Source or Object
+      form, that is based on (or derived from) the Work and for which the
+      editorial revisions, annotations, elaborations, or other modifications
+      represent, as a whole, an original work of authorship. For the purposes
+      of this License, Derivative Works shall not include works that remain
+      separable from, or merely link (or bind by name) to the interfaces of,
+      the Work and Derivative Works thereof.
+      "Contribution" shall mean any work of authorship, including
+      the original version of the Work and any modifications or additions
+      to that Work or Derivative Works thereof, that is intentionally
+      submitted to Licensor for inclusion in the Work by the copyright owner
+      or by an individual or Legal Entity authorized to submit on behalf of
+      the copyright owner. For the purposes of this definition, "submitted"
+      means any form of electronic, verbal, or written communication sent
+      to the Licensor or its representatives, including but not limited to
+      communication on electronic mailing lists, source code control systems,
+      and issue tracking systems that are managed by, or on behalf of, the
+      Licensor for the purpose of discussing and improving the Work, but
+      excluding communication that is conspicuously marked or otherwise
+      designated in writing by the copyright owner as "Not a Contribution."
+      "Contributor" shall mean Licensor and any individual or Legal Entity
+      on behalf of whom a Contribution has been received by Licensor and
+      subsequently incorporated within the Work.
+   2. Grant of Copyright License. Subject to the terms and conditions of
+      this License, each Contributor hereby grants to You a perpetual,
+      worldwide, non-exclusive, no-charge, royalty-free, irrevocable
+      copyright license to reproduce, prepare Derivative Works of,
+      publicly display, publicly perform, sublicense, and distribute the
+      Work and such Derivative Works in Source or Object form.
+   3. Grant of Patent License. Subject to the terms and conditions of
+      this License, each Contributor hereby grants to You a perpetual,
+      worldwide, non-exclusive, no-charge, royalty-free, irrevocable
+      (except as stated in this section) patent license to make, have made,
+      use, offer to sell, sell, import, and otherwise transfer the Work,
+      where such license applies only to those patent claims licensable
+      by such Contributor that are necessarily infringed by their
+      Contribution(s) alone or by combination of their Contribution(s)
+      with the Work to which such Contribution(s) was submitted. If You
+      institute patent litigation against any entity (including a
+      cross-claim or counterclaim in a lawsuit) alleging that the Work
+      or a Contribution incorporated within the Work constitutes direct
+      or contributory patent infringement, then any patent licenses
+      granted to You under this License for that Work shall terminate
+      as of the date such litigation is filed.
+   4. Redistribution. You may reproduce and distribute copies of the
+      Work or Derivative Works thereof in any medium, with or without
+      modifications, and in Source or Object form, provided that You
+      meet the following conditions:
+      (a) You must give any other recipients of the Work or
+          Derivative Works a copy of this License; and
+      (b) You must cause any modified files to carry prominent notices
+          stating that You changed the files; and
+      (c) You must retain, in the Source form of any Derivative Works
+          that You distribute, all copyright, patent, trademark, and
+          attribution notices from the Source form of the Work,
+          excluding those notices that do not pertain to any part of
+          the Derivative Works; and
+      (d) If the Work includes a "NOTICE" text file as part of its
+          distribution, then any Derivative Works that You distribute must
+          include a readable copy of the attribution notices contained
+          within such NOTICE file, excluding those notices that do not
+          pertain to any part of the Derivative Works, in at least one
+          of the following places: within a NOTICE text file distributed
+          as part of the Derivative Works; within the Source form or
+          documentation, if provided along with the Derivative Works; or,
+          within a display generated by the Derivative Works, if and
+          wherever such third-party notices normally appear. The contents
+          of the NOTICE file are for informational purposes only and
+          do not modify the License. You may add Your own attribution
+          notices within Derivative Works that You distribute, alongside
+          or as an addendum to the NOTICE text from the Work, provided
+          that such additional attribution notices cannot be construed
+          as modifying the License.
+      You may add Your own copyright statement to Your modifications and
+      may provide additional or different license terms and conditions
+      for use, reproduction, or distribution of Your modifications, or
+      for any such Derivative Works as a whole, provided Your use,
+      reproduction, and distribution of the Work otherwise complies with
+      the conditions stated in this License.
+   5. Submission of Contributions. Unless You explicitly state otherwise,
+      any Contribution intentionally submitted for inclusion in the Work
+      by You to the Licensor shall be under the terms and conditions of
+      this License, without any additional terms or conditions.
+      Notwithstanding the above, nothing herein shall supersede or modify
+      the terms of any separate license agreement you may have executed
+      with Licensor regarding such Contributions.
+   6. Trademarks. This License does not grant permission to use the trade
+      names, trademarks, service marks, or product names of the Licensor,
+      except as required for reasonable and customary use in describing the
+      origin of the Work and reproducing the content of the NOTICE file.
+   7. Disclaimer of Warranty. Unless required by applicable law or
+      agreed to in writing, Licensor provides the Work (and each
+      Contributor provides its Contributions) on an "AS IS" BASIS,
+      WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
+      implied, including, without limitation, any warranties or conditions
+      of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
+      PARTICULAR PURPOSE. You are solely responsible for determining the
+      appropriateness of using or redistributing the Work and assume any
+      risks associated with Your exercise of permissions under this License.
+   8. Limitation of Liability. In no event and under no legal theory,
+      whether in tort (including negligence), contract, or otherwise,
+      unless required by applicable law (such as deliberate and grossly
+      negligent acts) or agreed to in writing, shall any Contributor be
+      liable to You for damages, including any direct, indirect, special,
+      incidental, or consequential damages of any character arising as a
+      result of this License or out of the use or inability to use the
+      Work (including but not limited to damages for loss of goodwill,
+      work stoppage, computer failure or malfunction, or any and all
+      other commercial damages or losses), even if such Contributor
+      has been advised of the possibility of such damages.
+   9. Accepting Warranty or Additional Liability. While redistributing
+      the Work or Derivative Works thereof, You may choose to offer,
+      and charge a fee for, acceptance of support, warranty, indemnity,
+      or other liability obligations and/or rights consistent with this
+      License. However, in accepting such obligations, You may act only
+      on Your own behalf and on Your sole responsibility, not on behalf
+      of any other Contributor, and only if You agree to indemnify,
+      defend, and hold each Contributor harmless for any liability
+      incurred by, or claims asserted against, such Contributor by reason
+      of your accepting any such warranty or additional liability.
+   END OF TERMS AND CONDITIONS
+   APPENDIX: How to apply the Apache License to your work.
+      To apply the Apache License to your work, attach the following
+      boilerplate notice, with the fields enclosed by brackets "[]"
+      replaced with your own identifying information. (Don't include
+      the brackets!)  The text should be enclosed in the appropriate
+      comment syntax for the file format. We also recommend that a
+      file or class name and description of purpose be included on the
+      same "printed page" as the copyright notice for easier
+      identification within third-party archives.
+   Copyright [yyyy] [name of copyright owner]
+   Licensed under the Apache License, Version 2.0 (the "License");
+   you may not use this file except in compliance with the License.
+   You may obtain a copy of the License at
+       http://www.apache.org/licenses/LICENSE-2.0
+   Unless required by applicable law or agreed to in writing, software
+   distributed under the License is distributed on an "AS IS" BASIS,
+   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+   See the License for the specific language governing permissions and
+   limitations under the License.

__pycache__/api.cpython-310.pyc ADDED

Binary file (1.72 kB).

__pycache__/embed_store.cpython-310.pyc ADDED

Binary file (1.28 kB).

__pycache__/ingest.cpython-310.pyc ADDED

Binary file (4.87 kB).

__pycache__/main.cpython-310.pyc ADDED

Binary file (1.03 kB).

__pycache__/query.cpython-310.pyc ADDED

Binary file (6.93 kB).

api.py ADDED

	@@ -0,0 +1,67 @@

+from fastapi import FastAPI, HTTPException
+from pydantic import BaseModel
+from ingest import ingest_repository
+from query import (
+    VECTORSTORE_CACHE,
+    MEMORY_CACHE,
+    initialize_repo_caches,
+    ask_question,
+)
+app = FastAPI(title="RAG Backend", version="1.0.0")
+class LoadRepoRequest(BaseModel):
+    repo_url: str
+class AskRequest(BaseModel):
+    repo_name: str
+    question: str
+@app.post("/load_repo")
+def load_repo(payload: LoadRepoRequest):
+    repo_url = payload.repo_url.strip()
+    if not repo_url:
+        raise HTTPException(status_code=400, detail="repo_url is required")
+    repo_name = ingest_repository(repo_url)
+    initialize_repo_caches(repo_name)
+    print("AFTER LOAD:", VECTORSTORE_CACHE.keys(), MEMORY_CACHE.keys())
+    return {
+        "status": "success",
+        "repo": repo_name,
+    }
+@app.post("/ask")
+def ask(payload: AskRequest):
+    repo_name = payload.repo_name.strip()
+    question = payload.question.strip()
+    if not repo_name:
+        raise HTTPException(status_code=400, detail="repo_name is required")
+    if not question:
+        raise HTTPException(status_code=400, detail="question is required")
+    if repo_name not in VECTORSTORE_CACHE or repo_name not in MEMORY_CACHE:
+        raise HTTPException(status_code=400, detail="repo not loaded")
+    answer, docs = ask_question(question, repo_name)
+    sources = []
+    seen = set()
+    for doc in docs:
+        path = doc.metadata.get("path")
+        if path and path not in seen:
+            seen.add(path)
+            sources.append(path)
+    return {
+        "answer": answer,
+        "sources": sources,
+    }
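
The two endpoints in api.py accept JSON bodies matching LoadRepoRequest and AskRequest. A minimal client-side sketch using only the standard library is shown here; the base URL is an assumption (port 7860 comes from the Dockerfile, the host is a guess for a local run), and build_post and the example repository URL are hypothetical names used for illustration.

```python
import json
import urllib.request

# Assumption: the container from the Dockerfile is running locally on its exposed port.
BASE_URL = "http://localhost:7860"


def build_post(endpoint, payload, base_url=BASE_URL):
    """Build (but do not send) a JSON POST request for one of the API endpoints."""
    return urllib.request.Request(
        url=f"{base_url}{endpoint}",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical repository URL; /load_repo derives the repo name from its last path segment.
load_req = build_post("/load_repo", {"repo_url": "https://github.com/example/example-repo"})
ask_req = build_post(
    "/ask",
    {"repo_name": "example_repo", "question": "What does this repository do?"},
)

# To send a request against a running server:
#   with urllib.request.urlopen(ask_req) as resp:
#       print(json.load(resp))  # the handler returns the keys "answer" and "sources"
```

The /ask handler rejects a repo_name that has not been loaded in the current process, so a client would call /load_repo first and reuse the "repo" value from its response.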

embed_store.py ADDED

	@@ -0,0 +1,42 @@

+from langchain_community.embeddings import HuggingFaceEmbeddings
+from langchain_community.vectorstores import Qdrant
+from qdrant_client import QdrantClient
+def get_embeddings():
+    return HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")
+def store_embeddings(chunks, embeddings):
+    client = QdrantClient(url="http://localhost:6333")
+    collection_name = "repo_docs"
+    # create collection manually (safe + explicit)
+    client.recreate_collection(
+        collection_name=collection_name,
+        vectors_config={
+            "size": 384,  # MiniLM embedding size
+            "distance": "Cosine"
+        }
+    )
+    texts = [c["content"] for c in chunks]
+    metadatas = [
+        {
+            "path": c["path"],
+            "type": c["type"],
+            "file_name": c["file_name"]
+        }
+        for c in chunks
+    ]
+    vectorstore = Qdrant(
+        client=client,
+        collection_name=collection_name,
+        embeddings=embeddings,
+    )
+    vectorstore.add_texts(texts, metadatas=metadatas)
+    return vectorstore
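
embed_store.py connects to a fixed http://localhost:6333, while ingest.py and query.py in the same commit read QDRANT_URL and QDRANT_API_KEY from the environment. One way to keep the modules consistent is a small shared helper, sketched below. The name qdrant_settings is hypothetical, and falling back to localhost when QDRANT_URL is unset is an assumption, not something the commit specifies.

```python
import os


def qdrant_settings(env=None):
    """Return keyword arguments for a Qdrant client, read from the environment.

    `env` is any mapping; it defaults to os.environ. The localhost fallback is an
    assumption chosen to match the URL hard-coded in embed_store.py.
    """
    if env is None:
        env = os.environ
    url = env.get("QDRANT_URL") or "http://localhost:6333"
    api_key = env.get("QDRANT_API_KEY") or None
    return {"url": url, "api_key": api_key}


local = qdrant_settings({})
hosted = qdrant_settings(
    {"QDRANT_URL": "https://qdrant.example.invalid", "QDRANT_API_KEY": "secret"}
)
```

The returned dictionary could then be unpacked into the client constructor, for example QdrantClient(**qdrant_settings()), which matches the url and api_key keywords already used in ingest.py.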

ingest.py ADDED

	@@ -0,0 +1,199 @@

+import os
+from git import Repo
+from qdrant_client import QdrantClient
+from qdrant_client.models import Distance, VectorParams
+from langchain_qdrant import QdrantVectorStore
+from langchain_text_splitters import RecursiveCharacterTextSplitter
+from langchain_community.embeddings import HuggingFaceEmbeddings
+def get_repo_name(repo_url):
+    cleaned = repo_url.rstrip("/")
+    name = cleaned.split("/")[-1]
+    if name.endswith(".git"):
+        name = name[:-4]
+    return name.replace("-", "_").replace(".", "_")
+def clone_repo(repo_url, local_path="cloned_repo"):
+    if os.path.exists(local_path):
+        return local_path
+    Repo.clone_from(repo_url, local_path)
+    return local_path
+def load_code_files(repo_path):
+    code_files = []
+    code_ext = (".py", ".js", ".ts", ".java", ".cpp", ".c", ".go", ".rs")
+    doc_ext = (".md", ".rst", ".txt")
+    config_ext = (".json", ".yaml", ".yml", ".toml", ".ini", ".env")
+    special_files = ("Dockerfile", "Makefile")
+    skip_dirs = (
+        ".git",
+        "node_modules",
+        "__pycache__",
+        "dist",
+        "build",
+        "venv",
+        ".venv",
+        "env",
+        ".env",
+        "site-packages",
+        ".idea",
+        ".vscode",
+        "coverage",
+        ".pytest_cache",
+        "logs",
+    )
+    lock_files = {"package-lock.json", "yarn.lock", "poetry.lock", "pipfile.lock"}
+    main_code_hints = ("/src/", "/core/", "/app/", "/service/","/services/","/lib/","/models/")
+    def _is_comment_or_whitespace_only(content):
+        comment_prefixes = ("#", "//", "/*", "*", "*/", "--", "<!--", "-->")
+        for line in content.splitlines():
+            stripped = line.strip()
+            if not stripped:
+                continue
+            if stripped.startswith(comment_prefixes):
+                continue
+            return False
+        return True
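+    for root, dirs, files in os.walk(repo_path):
+        # Prune skipped directories by exact name so os.walk never descends into
+        # them; substring matching would also drop paths such as "blogs" or "builder".
+        dirs[:] = [d for d in dirs if d.lower() not in skip_dirs]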
+        for file in files:
+            file_lower = file.lower()
+            full_path = os.path.join(root, file)
+            normalized_path = full_path.replace("\\", "/").lower()
+            if file_lower in lock_files:
+                continue
+            if file_lower.endswith((".min.js", ".bundle.js")):
+                continue
+            try:
+                if os.path.getsize(full_path) > 300 * 1024:
+                    continue
+            except Exception:
+                continue
+            if file in special_files:
+                file_type = "config"
+            elif file_lower.endswith(code_ext):
+                file_type = "code"
+            elif file_lower.endswith(doc_ext):
+                file_type = "docs"
+            elif file_lower.endswith(config_ext):
+                file_type = "config"
+            else:
+                continue
+            try:
+                with open(full_path, "r", encoding="utf-8", errors="ignore") as f:
+                    content = f.read()
+                if not content.strip():
+                    continue
+                if file_lower == "__init__.py" and len(content) < 200:
+                    continue
+                if file_type != "docs" and _is_comment_or_whitespace_only(content):
+                    continue
+                is_main_code = any(hint in normalized_path for hint in main_code_hints)
+                code_files.append(
+                    {
+                        "content": content,
+                        "path": full_path,
+                        "priority": file_type,
+                        "file_name": os.path.basename(full_path),
+                        "is_main_code": is_main_code,
+                    }
+                )
+            except Exception:
+                continue
+    return code_files
+def chunk_files(code_files):
+    splitter = RecursiveCharacterTextSplitter(
+        chunk_size=800,
+        chunk_overlap=100,
+    )
+    documents = []
+    for file in code_files:
+        chunks = splitter.split_text(file["content"])
+        for chunk in chunks:
+            documents.append(
+                {
+                    "content": chunk,
+                    "path": file["path"],
+                    "file_name": os.path.basename(file["path"]),
+                    "type": file["priority"],
+                }
+            )
+    return documents
+def get_embeddings_model():
+    return HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")
+def _repo_collection_name(repo_name):
+    return f"repo_docs_{repo_name}"
+def store_embeddings(chunks, repo_name):
+    collection_name = _repo_collection_name(repo_name)
+    client = QdrantClient(url=os.getenv("QDRANT_URL"), api_key=os.getenv("QDRANT_API_KEY"))
+    client.recreate_collection(
+        collection_name=collection_name,
+        vectors_config=VectorParams(size=384, distance=Distance.COSINE),
+    )
+    vectorstore = QdrantVectorStore(
+        client=client,
+        collection_name=collection_name,
+        embedding=get_embeddings_model(),
+    )
+    texts = [c["content"] for c in chunks]
+    metadatas = [
+        {
+            "path": c["path"],
+            "type": c["type"],
+            "file_name": c["file_name"],
+        }
+        for c in chunks
+    ]
+    if texts:
+        vectorstore.add_texts(texts, metadatas=metadatas)
+def ingest_repository(repo_url, base_dir="cloned_repo"):
+    repo_name = get_repo_name(repo_url)
+    local_path = os.path.join(base_dir, repo_name)
+    path = clone_repo(repo_url, local_path=local_path)
+    files = load_code_files(path)
+    chunks = chunk_files(files)
+    store_embeddings(chunks, repo_name)
+    return repo_name
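
The collection that store_embeddings recreates is keyed only by the value get_repo_name returns. The sketch below copies get_repo_name and _repo_collection_name from the diff (the latter renamed without the underscore) and runs them on hypothetical URLs to show which inputs end up sharing a collection.

```python
def get_repo_name(repo_url):
    # Copied from ingest.py: last path segment, ".git" stripped, "-" and "." mapped to "_".
    cleaned = repo_url.rstrip("/")
    name = cleaned.split("/")[-1]
    if name.endswith(".git"):
        name = name[:-4]
    return name.replace("-", "_").replace(".", "_")


def repo_collection_name(repo_name):
    # Same format string as _repo_collection_name in ingest.py and query.py.
    return f"repo_docs_{repo_name}"


# Hypothetical URLs, used only to illustrate the mapping.
urls = [
    "https://github.com/alice/my-repo.git",
    "https://github.com/alice/my.repo/",
    "https://github.com/bob/my_repo",
]
collections = [repo_collection_name(get_repo_name(u)) for u in urls]
for url, collection in zip(urls, collections):
    print(f"{url} -> {collection}")
```

All three URLs map to repo_docs_my_repo. Because store_embeddings calls recreate_collection, ingesting a second repository with the same derived name would replace the first one's vectors; including the owner segment in the name would be one way to avoid that.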

learnings.txt ADDED

	@@ -0,0 +1,4 @@

+os
+json operations
+files
+argparse

main.py ADDED

	@@ -0,0 +1,36 @@

+import os
+from ingest import (
+    clone_repo,
+    load_code_files,
+    chunk_files,
+    get_repo_name,
+    store_embeddings,
+)
+from query import ask_question
+def ingest_repository(repo_url):
+    repo_name = get_repo_name(repo_url)
+    print("Cloning repository...")
+    # Clone into a per-repo directory so an earlier clone of another repo is not reused.
+    path = clone_repo(repo_url, local_path=os.path.join("cloned_repo", repo_name))
+    print("Loading code files...")
+    files = load_code_files(path)
+    print("Chunking files...")
+    chunks = chunk_files(files)
+    print("Generating and storing embeddings...")
+    # ingest.store_embeddings writes to the per-repo collection that query.py reads.
+    store_embeddings(chunks, repo_name)
+    print("Repository ingestion completed.")
+    return repo_name
+if __name__ == "__main__":
+    repo_url = input("Repository URL: ").strip()
+    if not repo_url:
+        raise ValueError("Repository URL cannot be empty.")
+    repo_name = ingest_repository(repo_url)
+    while True:
+        question = input(">> ").strip()
+        if question.lower() == "exit":
+            break
+        if not question:
+            continue
+        # ask_question needs the repo name to pick the matching collections.
+        answer, _ = ask_question(question, repo_name)
+        print("\nAnswer:\n", answer)

query.py ADDED

	@@ -0,0 +1,273 @@

+import time
+import os
+from qdrant_client import QdrantClient
+from qdrant_client.models import Distance, VectorParams
+from langchain_qdrant import QdrantVectorStore
+from langchain_groq import ChatGroq
+from langchain_community.embeddings import HuggingFaceEmbeddings
+from dotenv import load_dotenv
+load_dotenv()
+VECTORSTORE_CACHE = {}
+MEMORY_CACHE = {}
+def _repo_collection_name(repo_name):
+    return f"repo_docs_{repo_name}"
+def _memory_collection_name(repo_name):
+    return f"memory_{repo_name}"
+def get_embeddings_model():
+    return HuggingFaceEmbeddings(
+        model_name="sentence-transformers/all-MiniLM-L6-v2"
+    )
+def get_llm():
+    groq_api_key = os.getenv("GROQ_API_KEY")
+    if not groq_api_key:
+        raise ValueError("GROQ_API_KEY is not set")
+    return ChatGroq(
+        model="llama-3.1-8b-instant",
+        temperature=0,
+        api_key=groq_api_key,
+    )
+def _invoke_text(llm, prompt):
+    result = llm.invoke(prompt)
+    if isinstance(result, str):
+        return result
+    content = getattr(result, "content", "")
+    if isinstance(content, list):
+        parts = []
+        for item in content:
+            if isinstance(item, str):
+                parts.append(item)
+            elif isinstance(item, dict):
+                text = item.get("text")
+                if text:
+                    parts.append(text)
+        return "".join(parts)
+    return str(content)
+def _get_client():
+    return QdrantClient(url=os.getenv("QDRANT_URL"), api_key=os.getenv("QDRANT_API_KEY"))
+def _ensure_collection(client, collection_name):
+    if not client.collection_exists(collection_name):
+        client.create_collection(
+            collection_name=collection_name,
+            vectors_config=VectorParams(size=384, distance=Distance.COSINE),
+        )
+def get_vectorstore(repo_name):
+    if repo_name in VECTORSTORE_CACHE:
+        return VECTORSTORE_CACHE[repo_name]
+    client = _get_client()
+    embeddings = get_embeddings_model()
+    collection_name = _repo_collection_name(repo_name)
+    _ensure_collection(client, collection_name)
+    vectorstore = QdrantVectorStore(
+        client=client,
+        collection_name=collection_name,
+        embedding=embeddings,
+    )
+    VECTORSTORE_CACHE[repo_name] = vectorstore
+    return vectorstore
+def get_memory_vectorstore(repo_name):
+    if repo_name in MEMORY_CACHE:
+        return MEMORY_CACHE[repo_name]
+    client = _get_client()
+    embeddings = get_embeddings_model()
+    collection_name = _memory_collection_name(repo_name)
+    _ensure_collection(client, collection_name)
+    memory_store = QdrantVectorStore(
+        client=client,
+        collection_name=collection_name,
+        embedding=embeddings,
+    )
+    MEMORY_CACHE[repo_name] = memory_store
+    return memory_store
+def initialize_repo_caches(repo_name):
+    get_vectorstore(repo_name)
+    get_memory_vectorstore(repo_name)
+def store_memory(query, response, repo_name):
+    if len(query.strip()) <= 10:
+        return
+    memory_text = f"User: {query}\nAssistant: {response}"
+    memory_store = get_memory_vectorstore(repo_name)
+    memory_store.add_texts(
+        [memory_text],
+        metadatas=[
+            {
+                "type": "memory",
+                "timestamp": time.time(),
+            }
+        ],
+    )
+def get_retriever(vectorstore):
+    return vectorstore.as_retriever(
+        search_type="mmr",
+        search_kwargs={"k": 6, "fetch_k": 24},
+    )
+def _get_overview_retriever(vectorstore):
+    return vectorstore.as_retriever(
+        search_type="mmr",
+        search_kwargs={"k": 10, "fetch_k": 40},
+    )
+def _looks_code_intent(query):
+    q = query.lower()
+    code_signals = [
+        "function", "method", "class", "module", "file", "implementation", "logic",
+        "algorithm", "predict", "prediction", "how does", "how is", "where is", "call",
+        "returns", "parameter", "bug", "error", "traceback", "stack", "refactor"
+    ]
+    return any(signal in q for signal in code_signals)
+def _looks_overview_intent(query):
+    q = query.lower().strip()
+    overview_signals = [
+        "what does this repository do",
+        "what does this repo do",
+        "what is this repository",
+        "what is this repo",
+        "repository summary",
+        "repo summary",
+        "overview",
+        "high level",
+        "purpose of",
+    ]
+    return any(signal in q for signal in overview_signals)
+def _select_diverse_docs(docs, max_docs=8, max_per_path=2):
+    selected = []
+    per_path = {}
+    for doc in docs:
+        path = doc.metadata.get("path", "")
+        count = per_path.get(path, 0)
+        if count >= max_per_path:
+            continue
+        selected.append(doc)
+        per_path[path] = count + 1
+        if len(selected) >= max_docs:
+            break
+    return selected or docs[:max_docs]
+def _rewrite_query(question, conversation_chunks, llm):
+    if not conversation_chunks:
+        return question
+    memory_context = "\n\n".join(conversation_chunks)
+    rewrite_prompt = f"""
+Rewrite the user question into a standalone retrieval query.
+Use relevant details from prior conversation only when needed to resolve references.
+Keep technical names, filenames, class names, and function names unchanged.
+Return only the rewritten query.
+Relevant Past Conversation:
+{memory_context}
+Original Question:
+{question}
+"""
+    rewritten = _invoke_text(llm, rewrite_prompt).strip()
+    if not rewritten:
+        return question
+    rewritten = rewritten.replace("\n", " ").strip('"\' ')
+    return rewritten or question
+def ask_question(query, repo_name):
+    vectorstore = get_vectorstore(repo_name)
+    llm = get_llm()
+    memory_store = get_memory_vectorstore(repo_name)
+    memory_retriever = memory_store.as_retriever(search_kwargs={"k": 3})
+    memory_docs = memory_retriever.invoke(query)
+    conversation_chunks = [d.page_content for d in memory_docs]
+    rewritten_query = _rewrite_query(query, conversation_chunks, llm)
+    is_overview_query = _looks_overview_intent(query) or _looks_overview_intent(rewritten_query)
+    retriever = _get_overview_retriever(vectorstore) if is_overview_query else get_retriever(vectorstore)
+    repo_docs = retriever.invoke(rewritten_query)
+    repo_docs = _select_diverse_docs(repo_docs, max_docs=10 if is_overview_query else 8)
+    if (not is_overview_query) and (_looks_code_intent(query) or _looks_code_intent(rewritten_query)):
+        code_docs = [d for d in repo_docs if d.metadata.get("type") == "code"]
+        if code_docs:
+            repo_docs = _select_diverse_docs(code_docs, max_docs=8)
+    conversation_context = "\n\n".join([d.page_content for d in memory_docs]) or "None"
+    code_context = "\n\n".join([doc.page_content for doc in repo_docs])
+    context = (
+        f"Relevant Past Conversation:\n{conversation_context}\n\n"
+        f"Relevant Code Context:\n{code_context}\n\n"
+        f"Question:\n{query}"
+    )
+    prompt = f"""
+You are a senior software engineer.
+Use:
+* Relevant Past Conversation to resolve references like "that function"
+* Relevant Code Context for factual answers
+If exact answer is missing, infer logically from code and mention it is an inference.
+Be concise and technical.
+Context:
+{context}
+"""
+    response = _invoke_text(llm, prompt)
+    store_memory(query, response, repo_name)
+    return response, repo_docs
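
Both retrievers in query.py request search_type="mmr" with a k and a larger fetch_k. Maximal marginal relevance first fetches fetch_k candidates by similarity, then picks k of them one at a time, trading relevance to the query against similarity to what has already been picked. The pure-Python sketch below illustrates the general technique only; it is not the code LangChain or Qdrant run, and the vectors, the helper names, and lambda_mult=0.5 are illustrative assumptions.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def mmr_select(query_vec, candidate_vecs, k=2, lambda_mult=0.5):
    """Return indices of up to k candidates chosen by maximal marginal relevance."""
    selected = []
    remaining = list(range(len(candidate_vecs)))
    while remaining and len(selected) < k:

        def score(i):
            relevance = cosine(query_vec, candidate_vecs[i])
            # Redundancy is the highest similarity to anything already selected.
            redundancy = max(
                (cosine(candidate_vecs[i], candidate_vecs[j]) for j in selected),
                default=0.0,
            )
            return lambda_mult * relevance - (1 - lambda_mult) * redundancy

        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


query = [1.0, 1.0]
candidates = [
    [2.0, 1.0],  # most relevant
    [2.1, 1.0],  # nearly identical to the first candidate
    [1.0, 3.0],  # less relevant, but points in a different direction
]
picked = mmr_select(query, candidates, k=2)
by_relevance = sorted(range(3), key=lambda i: cosine(query, candidates[i]), reverse=True)[:2]
print(picked, by_relevance)
```

Plain top-2 by similarity returns the two near-duplicates (indices 0 and 1), whereas the MMR pass returns indices 0 and 2. This is the same goal _select_diverse_docs pursues afterwards at the file level by capping chunks per path.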

requirements.txt ADDED

	@@ -0,0 +1,189 @@

+anyio==4.13.0
+apturl==0.5.2
+argon2-cffi==25.1.0
+argon2-cffi-bindings==25.1.0
+arrow==1.4.0
+asttokens==3.0.1
+async-lru==2.3.0
+attrs==26.1.0
+babel==2.18.0
+bcrypt==3.2.0
+beautifulsoup4==4.14.3
+bleach==6.3.0
+blinker==1.4
+Brlapi==0.8.3
+certifi==2026.2.25
+cffi==2.0.0
+chardet==4.0.0
+charset-normalizer==3.4.7
+click==8.0.3
+colorama==0.4.4
+comm==0.2.3
+command-not-found==0.3
+contourpy==1.3.2
+cryptography==3.4.8
+cupshelpers==1.0
+cycler==0.12.1
+dbus-python==1.2.18
+debugpy==1.8.20
+decorator==5.2.1
+defer==1.0.6
+defusedxml==0.7.1
+distro==1.7.0
+distro-info==1.1+ubuntu0.2
+duplicity==0.8.21
+exceptiongroup==1.3.1
+executing==2.2.1
+fasteners==0.14.1
+fastjsonschema==2.21.2
+filelock==3.25.2
+fonttools==4.62.1
+fqdn==1.5.1
+fsspec==2026.2.0
+future==0.18.2
+h11==0.16.0
+httpcore==1.0.9
+httplib2==0.20.2
+httpx==0.28.1
+idna==3.3
+importlib-metadata==4.6.4
+ipykernel==7.2.0
+ipython==8.39.0
+isoduration==20.11.0
+jedi==0.19.2
+jeepney==0.7.1
+Jinja2==3.1.6
+joblib==1.5.3
+json5==0.14.0
+jsonpointer==3.1.1
+jsonschema==4.26.0
+jsonschema-specifications==2025.9.1
+jupyter-events==0.12.0
+jupyter-lsp==2.3.1
+jupyter_client==8.8.0
+jupyter_core==5.9.1
+jupyter_server==2.17.0
+jupyter_server_terminals==0.5.4
+jupyterlab==4.5.6
+jupyterlab_pygments==0.3.0
+jupyterlab_server==2.28.0
+keyring==23.5.0
+kiwisolver==1.5.0
+language-selector==0.1
+lark==1.3.1
+launchpadlib==1.10.16
+lazr.restfulclient==0.14.4
+lazr.uri==1.0.6
+lockfile==0.12.2
+louis==3.20.0
+macaroonbakery==1.3.1
+Mako==1.1.3
+MarkupSafe==2.0.1
+matplotlib==3.10.8
+matplotlib-inline==0.2.1
+mistune==3.2.0
+monotonic==1.6
+more-itertools==8.10.0
+mpmath==1.3.0
+nbclient==0.10.4
+nbconvert==7.17.1
+nbformat==5.10.4
+nest-asyncio==1.6.0
+netifaces==0.11.0
+networkx==3.4.2
+notebook==7.5.5
+notebook_shim==0.2.4
+numpy==2.2.6
+nvidia-cublas-cu12==12.1.3.1
+nvidia-cuda-cupti-cu12==12.1.105
+nvidia-cuda-nvrtc-cu12==12.1.105
+nvidia-cuda-runtime-cu12==12.1.105
+nvidia-cudnn-cu12==9.1.0.70
+nvidia-cufft-cu12==11.0.2.54
+nvidia-curand-cu12==10.3.2.106
+nvidia-cusolver-cu12==11.4.5.107
+nvidia-cusparse-cu12==12.1.0.106
+nvidia-nccl-cu12==2.21.5
+nvidia-nvjitlink-cu12==12.9.86
+nvidia-nvtx-cu12==12.1.105
+oauthlib==3.2.0
+olefile==0.46
+overrides==7.7.0
+packaging==26.0
+pandas==2.3.3
+pandocfilters==1.5.1
+paramiko==2.9.3
+parso==0.8.6
+pexpect==4.8.0
+Pillow==9.0.1
+platformdirs==4.9.6
+prometheus_client==0.25.0
+prompt_toolkit==3.0.52
+protobuf==3.12.4
+psutil==7.2.2
+ptyprocess==0.7.0
+pure_eval==0.2.3
+pycairo==1.20.1
+pycparser==3.0
+pycups==2.0.1
+Pygments==2.20.0
+PyGObject==3.42.1
+PyJWT==2.3.0
+pymacaroons==0.13.0
+PyNaCl==1.5.0
+pyparsing==3.3.2
+pyRFC3339==1.1
+python-apt==2.4.0+ubuntu4.1
+python-dateutil==2.9.0.post0
+python-debian==0.1.43+ubuntu1.1
+python-json-logger==4.1.0
+pytz==2022.1
+pyxdg==0.27
+PyYAML==5.4.1
+pyzmq==27.1.0
+referencing==0.37.0
+reportlab==3.6.8
+requests==2.33.1
+rfc3339-validator==0.1.4
+rfc3986-validator==0.1.1
+rfc3987-syntax==1.1.0
+rpds-py==0.30.0
+scikit-learn==1.7.2
+scipy==1.15.3
+screen-resolution-extra==0.0.0
+SecretStorage==3.3.1
+Send2Trash==2.1.0
+six==1.16.0
+soupsieve==2.8.3
+stack-data==0.6.3
+sympy==1.13.1
+systemd-python==234
+terminado==0.18.1
+threadpoolctl==3.6.0
+tinycss2==1.4.0
+tomli==2.4.1
+torch==2.5.1+cu121
+torchaudio==2.5.1+cu121
+torchvision==0.20.1+cu121
+tornado==6.5.5
+tqdm==4.67.3
+traitlets==5.14.3
+triton==3.1.0
+typing_extensions==4.15.0
+tzdata==2026.1
+ubuntu-drivers-common==0.0.0
+ubuntu-pro-client==8001
+ufw==0.36.1
+unattended-upgrades==0.1
+uri-template==1.3.0
+urllib3==1.26.5
+usb-creator==0.3.7
+uv==0.11.3
+wadllib==1.3.6
+wcwidth==0.6.0
+webcolors==25.10.0
+webencodings==0.5.1
+websocket-client==1.9.0
+xdg==5
+xkit==0.0.0
+zipp==1.0.0