Spaces: kerdosdotio/Custom-LLM-Chat

Bhaskar Ram committed on Feb 25

Commit

55953aa

0 Parent(s):

Fix Gradio 6.x compatibility errors

Files changed (8)

README.md +115 -0
app.py +191 -0
rag/__init__.py +0 -0
rag/chain.py +71 -0
rag/document_loader.py +62 -0
rag/embedder.py +81 -0
rag/retriever.py +37 -0
requirements.txt +7 -0

README.md ADDED

	@@ -0,0 +1,115 @@

+---
+title: Enterprise Document Q&A (RAG)
+emoji: 🏢
+colorFrom: blue
+colorTo: indigo
+sdk: gradio
+sdk_version: "6.6.0"
+app_file: app.py
+pinned: false
+license: mit
+tags:
+  - rag
+  - document-qa
+  - enterprise
+  - llama
+  - langchain
+  - faiss
+  - gradio
+  - nlp
+  - question-answering
+---
+# 🏢 Enterprise Document Q&A — RAG System
+> **Upload your company documents. Ask questions. Get answers — strictly from your data.**
+
+A production-ready **Retrieval-Augmented Generation (RAG)** system built for businesses, enterprises, and private-sector organizations. Powered by **Llama 3** and **FAISS**, it lets your teams query internal documents through a clean chat interface. A strict system prompt tells the model to answer only from the retrieved document context and to say so when the answer is not there.
+
+---
+## ✨ Features
+| Feature                       | Details                                                    |
+| ----------------------------- | ---------------------------------------------------------- |
+| 📄 **Multi-format ingestion** | PDF, DOCX, TXT, MD, CSV                                    |
+| 🧠 **Open-source LLM**        | `meta-llama/Llama-3.1-8B-Instruct` via HF Inference API    |
+| 🔒 **Strictly grounded**      | Answers only from your uploaded documents                  |
+| 📦 **Multi-document**         | Upload and query across multiple files simultaneously      |
+| 💬 **Multi-turn chat**        | Maintains conversation context across questions            |
+| ⚡ **Fast**                   | CPU-friendly embeddings (`all-MiniLM-L6-v2` + FAISS)       |
+| 🔑 **Secure**                 | Files processed in-session only — never stored permanently |
+---
+## 🚀 How to Use
+### On Hugging Face Spaces
+1. Upload your documents (PDF, DOCX, TXT, MD, CSV) using the left panel
+2. Click **Index Documents**
+3. Enter your [Hugging Face API token](https://huggingface.co/settings/tokens) _(Write access required for Llama 3)_
+4. Ask questions in the chat!
+### Self-Hosted / Local
+```bash
+git clone https://huggingface.co/spaces/kerdosdotio/Custom-LLM-Chat
+cd Custom-LLM-Chat
+pip install -r requirements.txt
+HF_TOKEN=hf_your_token python app.py
+```
+---
+## 🏗️ Architecture
+```
+User Uploads Files
+      ↓
+Document Parser (PDF / DOCX / TXT)
+      ↓
+Text Chunking (512 chars, 64 overlap)
+      ↓
+Embeddings (all-MiniLM-L6-v2)
+      ↓
+FAISS Vector Index (in-memory)
+      ↓
+User Question → Similarity Search → Top-K Chunks
+      ↓
+Llama 3.1 8B — answers ONLY from retrieved chunks
+      ↓
+Response + Source Citations
+```
+---
+## 🔧 Tech Stack
+- **UI**: [Gradio](https://gradio.app)
+- **LLM**: `meta-llama/Llama-3.1-8B-Instruct`
+- **Embeddings**: `sentence-transformers/all-MiniLM-L6-v2`
+- **Vector Store**: [FAISS](https://github.com/facebookresearch/faiss)
+- **Document Parsing**: PyMuPDF, python-docx
+---
+## 💼 Use Cases
+- **Customer Support**: Index your product manuals, FAQs, and policies
+- **HR & Legal**: Query employee handbooks, contracts, and compliance docs
+- **Sales Enablement**: Search product specs, case studies, and pricing docs
+- **IT Helpdesk**: Query runbooks, troubleshooting guides, and SOPs
+---
+## 🔐 Privacy
+- Uploaded documents are **processed in-memory** and **not stored** after your session ends
+- For persistent storage or on-premise deployment, clone and self-host this repository
+---
+## 📄 License
+MIT License — free for commercial and private use.
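
The "Similarity Search → Top-K Chunks" step in the architecture diagram can be pictured as an exhaustive nearest-neighbour scan, which is conceptually what a flat FAISS index does. The sketch below is illustrative only: `squared_l2`, `top_k_chunks`, and the hand-made 2-D vectors are hypothetical stand-ins, not code from this repository, and real `all-MiniLM-L6-v2` embeddings have 384 dimensions.

```python
import heapq


def squared_l2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def top_k_chunks(query_vec, chunk_vecs, chunks, k):
    """Score every chunk against the query and return the k nearest, closest first."""
    scored = ((squared_l2(query_vec, v), i) for i, v in enumerate(chunk_vecs))
    nearest = heapq.nsmallest(min(k, len(chunks)), scored)
    return [{"text": chunks[i], "score": d} for d, i in nearest]


# Toy data: three "chunks" with made-up 2-D embeddings (assumption for illustration).
chunks = ["refund policy", "shipping times", "support contacts"]
chunk_vecs = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]

results = top_k_chunks([0.9, 0.1], chunk_vecs, chunks, k=2)
print([r["text"] for r in results])  # ['refund policy', 'support contacts']
```

Lower scores mean closer matches, which is why the scan keeps the smallest distances rather than the largest.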

app.py ADDED

	@@ -0,0 +1,191 @@

+"""
+app.py — Enterprise Document Q&A (RAG)
+Powered by Llama 3 + FAISS + Sentence Transformers
+Hosted on Hugging Face Spaces
+"""
+import os
+import gradio as gr
+from rag.document_loader import load_documents
+from rag.embedder import build_index, add_to_index
+from rag.retriever import retrieve
+from rag.chain import answer
+# ─────────────────────────────────────────────
+# State helpers
+# ─────────────────────────────────────────────
+def get_hf_token(user_token: str) -> str:
+    """Prefer user-supplied token; fall back to Space secret."""
+    t = user_token.strip() if user_token else ""
+    return t or os.environ.get("HF_TOKEN", "")
+# ─────────────────────────────────────────────
+# Gradio handlers
+# ─────────────────────────────────────────────
+def process_files(files, current_index, status_box):
+    """Parse uploaded files and build / extend the FAISS index."""
+    if not files:
+        return current_index, "⚠️ No files uploaded."
+    file_paths = [f.name for f in files] if hasattr(files[0], "name") else files
+    docs = load_documents(file_paths)
+    if not docs:
+        return current_index, "❌ Could not extract text from the uploaded files. Please upload PDF, DOCX, TXT, MD, or CSV files."
+    try:
+        if current_index is None:
+            idx = build_index(docs)
+        else:
+            idx = add_to_index(current_index, docs)
+    except Exception as e:
+        return current_index, f"❌ Failed to build index: {e}"
+    sources = sorted({d["source"] for d in docs})  # sorted so the status message is deterministic
+    total_chunks = idx.index.ntotal
+    msg = (
+        f"✅ Indexed {len(docs)} file(s): {', '.join(sources)}\n"
+        f"📦 Total chunks in knowledge base: {total_chunks}"
+    )
+    return idx, msg
+def chat(user_message, history, vector_index, hf_token_input, top_k):
+    """Main chat handler — retrieves context and calls the LLM."""
+    if not user_message.strip():
+        return history, ""
+    hf_token = get_hf_token(hf_token_input)
+    if not hf_token:
+        history = history + [(user_message, "⚠️ Please provide a Hugging Face API token to use the chat.")]
+        return history, ""
+    if vector_index is None:
+        history = history + [(user_message, "⚠️ Please upload at least one document first.")]
+        return history, ""
+    try:
+        chunks = retrieve(user_message, vector_index, top_k=int(top_k))
+        bot_reply = answer(user_message, chunks, hf_token, chat_history=history)
+    except Exception as e:
+        bot_reply = f"❌ Error: {e}"
+    history = history + [(user_message, bot_reply)]
+    return history, ""
+def reset_all():
+    """Clear index and chat."""
+    return None, [], "🗑️ Knowledge base and chat cleared.", ""
+# ─────────────────────────────────────────────
+# UI
+# ─────────────────────────────────────────────
+CSS = """
+#title { text-align: center; }
+#subtitle { text-align: center; color: #666; margin-bottom: 8px; }
+.upload-box { border: 2px dashed #4f8ef7 !important; border-radius: 12px !important; }
+#status-box { font-size: 0.9em; }
+footer { display: none !important; }
+"""
+with gr.Blocks(title="Enterprise Doc Q&A") as demo:
+    # ── Header ───────────────────────────────
+    gr.Markdown("# 🏢 Enterprise Document Q&A", elem_id="title")
+    gr.Markdown(
+        "Upload your company documents (PDF, DOCX, TXT) and ask questions. "
+        "The AI answers **only from your data** — never from outside knowledge.",
+        elem_id="subtitle",
+    )
+    # ── Shared state ─────────────────────────
+    vector_index = gr.State(None)
+    with gr.Row():
+        # ── Left panel: Upload + config ──────
+        with gr.Column(scale=1, min_width=300):
+            gr.Markdown("### 📂 Upload Documents")
+            file_upload = gr.File(
+                file_count="multiple",
+                file_types=[".pdf", ".docx", ".txt", ".md", ".csv"],
+                label="Drag & drop or click to upload",
+                elem_classes=["upload-box"],
+            )
+            index_btn = gr.Button("📥 Index Documents", variant="primary")
+            status_box = gr.Textbox(
+                label="Status",
+                interactive=False,
+                lines=3,
+                elem_id="status-box",
+            )
+            gr.Markdown("### ⚙️ Settings")
+            hf_token_input = gr.Textbox(
+                label="Hugging Face Token (optional if Space secret is set)",
+                placeholder="hf_...",
+                type="password",
+                value="",
+            )
+            top_k_slider = gr.Slider(
+                minimum=1, maximum=10, value=5, step=1,
+                label="Chunks to retrieve (top-K)",
+            )
+            reset_btn = gr.Button("🗑️ Clear All", variant="stop")
+        # ── Right panel: Chat ─────────────────
+        with gr.Column(scale=2):
+            gr.Markdown("### 💬 Ask Questions")
+            chatbot = gr.Chatbot(height=460, show_label=False)
+            with gr.Row():
+                user_input = gr.Textbox(
+                    placeholder="Ask a question about your documents...",
+                    show_label=False,
+                    scale=5,
+                    container=False,
+                )
+                send_btn = gr.Button("Send ▶", variant="primary", scale=1)
+    # ── Examples ─────────────────────────────
+    gr.Examples(
+        examples=[
+            ["What is the refund policy?"],
+            ["Summarize the key points of this document."],
+            ["What are the terms of service?"],
+            ["Who is the contact person for support?"],
+        ],
+        inputs=user_input,
+    )
+    # ── Event wiring ──────────────────────────
+    index_btn.click(
+        fn=process_files,
+        inputs=[file_upload, vector_index, status_box],
+        outputs=[vector_index, status_box],
+    )
+    send_btn.click(
+        fn=chat,
+        inputs=[user_input, chatbot, vector_index, hf_token_input, top_k_slider],
+        outputs=[chatbot, user_input],
+    )
+    user_input.submit(
+        fn=chat,
+        inputs=[user_input, chatbot, vector_index, hf_token_input, top_k_slider],
+        outputs=[chatbot, user_input],
+    )
+    reset_btn.click(
+        fn=reset_all,
+        inputs=[],
+        outputs=[vector_index, chatbot, status_box, user_input],
+    )
+if __name__ == "__main__":
+    demo.launch(show_api=False, css=CSS, theme=gr.themes.Soft())
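
The handlers above build `history` as a list of `(user, bot)` tuples, and `rag/chain.py` unpacks it the same way. Some Gradio versions instead hand the Chatbot value to the handler as a list of `{"role", "content"}` dicts; whether that applies to the version pinned here has not been verified. If it does, a pair of adapters along these lines could bridge the two shapes. Both function names are hypothetical, and the sketch assumes every `content` value is a plain string.

```python
def pairs_to_messages(pairs):
    """Convert [(user, bot), ...] into role/content dicts, skipping empty sides."""
    messages = []
    for user_msg, bot_msg in pairs:
        if user_msg:
            messages.append({"role": "user", "content": user_msg})
        if bot_msg:
            messages.append({"role": "assistant", "content": bot_msg})
    return messages


def messages_to_pairs(messages):
    """Group role/content dicts back into (user, bot) pairs; a missing side is None."""
    pairs = []
    pending_user = None
    for msg in messages:
        if msg["role"] == "user":
            # Two user turns in a row: close the earlier one without a reply.
            if pending_user is not None:
                pairs.append((pending_user, None))
            pending_user = msg["content"]
        elif msg["role"] == "assistant":
            pairs.append((pending_user, msg["content"]))
            pending_user = None
    if pending_user is not None:
        pairs.append((pending_user, None))
    return pairs


history = [("What is the refund policy?", "30 days [Source: policy.pdf]")]
as_messages = pairs_to_messages(history)
print(as_messages[0]["role"], as_messages[1]["role"])  # user assistant
```

With adapters like these, `chat` could convert the incoming history once before passing it to `answer`, leaving `rag/chain.py` unchanged.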

rag/__init__.py ADDED

File without changes

rag/chain.py ADDED

	@@ -0,0 +1,71 @@

+"""
+chain.py
+Calls the LLM via HF Inference API with a strict RAG prompt.
+Only answers from the retrieved context — never from general knowledge.
+"""
+from __future__ import annotations
+from huggingface_hub import InferenceClient
+SYSTEM_PROMPT = """You are an enterprise document assistant. Your ONLY job is to answer questions using the provided document context below.
+STRICT RULES:
+1. Answer ONLY using information explicitly found in the provided context.
+2. Do NOT use any outside knowledge or assumptions.
+3. If the answer is not found in the context, respond EXACTLY with: "I don't have that information in the uploaded documents."
+4. Always cite the source document name(s) in your answer using [Source: <filename>].
+5. Be concise and professional.
+Context from uploaded documents:
+---
+{context}
+---
+"""
+LLM_MODEL = "meta-llama/Llama-3.1-8B-Instruct"
+MAX_NEW_TOKENS = 1024
+TEMPERATURE = 0.1   # Low temperature for factual, grounded responses
+def build_context(chunks: list[dict]) -> str:
+    """Format retrieved chunks into a readable context block."""
+    parts = []
+    for i, chunk in enumerate(chunks, 1):
+        parts.append(f"[{i}] (Source: {chunk['source']})\n{chunk['text']}")
+    return "\n\n".join(parts)
+def answer(
+    query: str,
+    context_chunks: list[dict],
+    hf_token: str,
+    chat_history: list[tuple[str, str]] | None = None,
+) -> str:
+    """
+    Call Llama 3 via HF Inference API to answer the query
+    grounded strictly in context_chunks.
+    """
+    if not context_chunks:
+        return "I don't have that information in the uploaded documents."
+    context = build_context(context_chunks)
+    system_msg = SYSTEM_PROMPT.format(context=context)
+    # Build message history for multi-turn conversation
+    messages = [{"role": "system", "content": system_msg}]
+    if chat_history:
+        for user_msg, bot_msg in chat_history[-4:]:  # keep last 4 turns for context
+            if user_msg:
+                messages.append({"role": "user", "content": user_msg})
+            if bot_msg:
+                messages.append({"role": "assistant", "content": bot_msg})
+    messages.append({"role": "user", "content": query})
+    client = InferenceClient(token=hf_token)
+    response = client.chat_completion(
+        model=LLM_MODEL,
+        messages=messages,
+        max_tokens=MAX_NEW_TOKENS,
+        temperature=TEMPERATURE,
+    )
+    # Guard against a missing message body so .strip() is never called on None.
+    return (response.choices[0].message.content or "").strip()
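
Rule 4 of `SYSTEM_PROMPT` asks the model to cite files as `[Source: <filename>]`, but nothing in this file checks that the cited names were actually retrieved. A post-check could look roughly like the sketch below. `cited_sources` and `unknown_citations` are hypothetical helpers, not part of this commit, and the comma splitting assumes filenames themselves contain no commas.

```python
import re

# Matches the citation format requested in the system prompt: [Source: <filename>]
CITATION_RE = re.compile(r"\[Source:\s*([^\]]+)\]")


def cited_sources(reply):
    """Return the set of filenames cited as [Source: ...] in a model reply."""
    names = set()
    for group in CITATION_RE.findall(reply):
        # One tag may list several files separated by commas.
        names.update(part.strip() for part in group.split(",") if part.strip())
    return names


def unknown_citations(reply, chunks):
    """Cited filenames that match none of the retrieved chunks' sources."""
    retrieved = {c["source"] for c in chunks}
    return cited_sources(reply) - retrieved


chunks = [{"source": "handbook.pdf", "text": "Annual leave is 20 days."}]
reply = "Annual leave is 20 days [Source: handbook.pdf]. See also [Source: policy.docx]."
print(unknown_citations(reply, chunks))  # {'policy.docx'}
```

A non-empty result is only a signal that the reply cites something outside the retrieved context; it says nothing about whether the cited text supports the claim.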

rag/document_loader.py ADDED

	@@ -0,0 +1,62 @@

+"""
+document_loader.py
+Parses uploaded files (PDF, DOCX, TXT/MD/CSV) into plain text.
+"""
+from pathlib import Path
+def load_documents(file_paths: list[str]) -> list[dict]:
+    """
+    Given a list of file paths, parse each into a dict:
+      { "source": filename, "text": full text content }
+    Supports: .pdf, .docx, .txt, .md, .csv (CSV is read as raw text)
+    """
+    docs = []
+    for path in file_paths:
+        if path is None:
+            continue
+        ext = Path(path).suffix.lower()
+        name = Path(path).name
+        try:
+            if ext == ".pdf":
+                text = _load_pdf(path)
+            elif ext == ".docx":
+                text = _load_docx(path)
+            elif ext in (".txt", ".md", ".csv"):
+                text = _load_text(path)
+            else:
+                print(f"[Loader] Unsupported file type: {ext} — skipping {name}")
+                continue
+            if text.strip():
+                docs.append({"source": name, "text": text})
+            else:
+                print(f"[Loader] Empty content from {name} — skipping")
+        except Exception as e:
+            print(f"[Loader] Failed to load {name}: {e}")
+    return docs
+def _load_pdf(path: str) -> str:
+    import fitz  # PyMuPDF
+    # The context manager closes the document even if text extraction raises.
+    with fitz.open(path) as doc:
+        pages = [page.get_text("text") for page in doc]
+    return "\n".join(pages)
+def _load_docx(path: str) -> str:
+    from docx import Document
+    doc = Document(path)
+    paragraphs = [p.text for p in doc.paragraphs if p.text.strip()]
+    return "\n".join(paragraphs)
+def _load_text(path: str) -> str:
+    with open(path, "r", encoding="utf-8", errors="ignore") as f:
+        return f.read()
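
`load_documents` routes `.csv` files through `_load_text`, so rows are indexed as raw comma-separated lines and a chunk cut from the middle of a file loses its header. One possible alternative, sketched here under that observation, is to render each row as `header: value` pairs before chunking. `csv_to_text` is a hypothetical helper and is not part of this commit.

```python
import csv
import io


def csv_to_text(raw):
    """Render each CSV row as 'header: value' pairs on a single line."""
    reader = csv.DictReader(io.StringIO(raw))
    lines = []
    for row in reader:
        # Skip empty cells and any overflow columns that have no header (key is None).
        pairs = [f"{key}: {value}" for key, value in row.items() if key and value]
        if pairs:
            lines.append("; ".join(pairs))
    return "\n".join(lines)


sample = "sku,price,stock\nA-100,19.99,4\nB-200,5.00,\n"
print(csv_to_text(sample))
# sku: A-100; price: 19.99; stock: 4
# sku: B-200; price: 5.00
```

The trade-off is longer text per row, which means fewer rows fit in each 512-character chunk.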

rag/embedder.py ADDED

	@@ -0,0 +1,81 @@

+"""
+embedder.py
+Chunks raw text documents and builds an in-memory FAISS vector index.
+"""
+from __future__ import annotations
+import numpy as np
+from dataclasses import dataclass, field
+CHUNK_SIZE = 512        # characters
+CHUNK_OVERLAP = 64      # characters
+EMBEDDING_MODEL = "sentence-transformers/all-MiniLM-L6-v2"
+@dataclass
+class VectorIndex:
+    """Holds chunks, their embeddings, and the FAISS index."""
+    chunks: list[dict] = field(default_factory=list)   # {"source", "text"}
+    index: object = None                                # faiss.IndexFlatL2
+    embedder: object = None                             # SentenceTransformer
+def _chunk_text(source: str, text: str) -> list[dict]:
+    """Split text into overlapping chunks."""
+    chunks = []
+    start = 0
+    while start < len(text):
+        end = start + CHUNK_SIZE
+        chunk_text = text[start:end]
+        if chunk_text.strip():
+            chunks.append({"source": source, "text": chunk_text})
+        start += CHUNK_SIZE - CHUNK_OVERLAP
+    return chunks
+def build_index(docs: list[dict]) -> VectorIndex:
+    """
+    Takes list of {"source", "text"} dicts.
+    Returns a VectorIndex with embeddings stored in FAISS.
+    """
+    import faiss
+    from sentence_transformers import SentenceTransformer
+    # Chunk all documents
+    all_chunks = []
+    for doc in docs:
+        all_chunks.extend(_chunk_text(doc["source"], doc["text"]))
+    if not all_chunks:
+        raise ValueError("No text chunks could be extracted from the uploaded files.")
+    print(f"[Embedder] Embedding {len(all_chunks)} chunks...")
+    model = SentenceTransformer(EMBEDDING_MODEL)
+    texts = [c["text"] for c in all_chunks]
+    embeddings = model.encode(texts, show_progress_bar=False, batch_size=32)
+    embeddings = np.array(embeddings, dtype="float32")
+    dim = embeddings.shape[1]
+    index = faiss.IndexFlatL2(dim)
+    index.add(embeddings)
+    print(f"[Embedder] Index built: {index.ntotal} vectors, dim={dim}")
+    return VectorIndex(chunks=all_chunks, index=index, embedder=model)
+def add_to_index(vector_index: VectorIndex, docs: list[dict]) -> VectorIndex:
+    """Incrementally add new docs to an existing index."""
+    new_chunks = []
+    for doc in docs:
+        new_chunks.extend(_chunk_text(doc["source"], doc["text"]))
+    # Nothing to embed: leave the index untouched rather than pass an empty array to FAISS.
+    if not new_chunks:
+        return vector_index
+    texts = [c["text"] for c in new_chunks]
+    embeddings = vector_index.embedder.encode(texts, show_progress_bar=False, batch_size=32)
+    embeddings = np.array(embeddings, dtype="float32")
+    vector_index.index.add(embeddings)
+    vector_index.chunks.extend(new_chunks)
+    return vector_index
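
With `CHUNK_SIZE = 512` and `CHUNK_OVERLAP = 64`, `_chunk_text` advances 448 characters per step. The sketch below reproduces only that offset arithmetic so the chunk boundaries can be inspected. `chunk_spans` is a hypothetical helper, and unlike `_chunk_text` it does not skip whitespace-only chunks.

```python
CHUNK_SIZE = 512
CHUNK_OVERLAP = 64


def chunk_spans(n_chars, size=CHUNK_SIZE, overlap=CHUNK_OVERLAP):
    """Return (start, end) character offsets for each chunk of an n_chars-long text."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be non-negative and smaller than size")
    step = size - overlap  # 448 with the defaults
    return [(start, min(start + size, n_chars)) for start in range(0, n_chars, step)]


spans = chunk_spans(1000)
print(spans)  # [(0, 512), (448, 960), (896, 1000)]
```

Consecutive chunks share 64 characters (offsets 448 to 512 in the example). One side effect of this scheme is that the final chunk can be fully contained in the previous one: a 960-character text yields a last span of `(896, 960)`, which the span `(448, 960)` already covers.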

rag/retriever.py ADDED

	@@ -0,0 +1,37 @@

+"""
+retriever.py
+Performs similarity search against the FAISS index.
+"""
+from __future__ import annotations
+import numpy as np
+from rag.embedder import VectorIndex
+DEFAULT_TOP_K = 5
+def retrieve(query: str, vector_index: VectorIndex, top_k: int = DEFAULT_TOP_K) -> list[dict]:
+    """
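+    Embed the query and return the top_k most similar chunks, nearest first.
+    Each result: {"source": str, "text": str, "score": float}
+    "score" is the squared L2 distance reported by faiss.IndexFlatL2, so lower means more similar.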
+    """
+    if vector_index is None or vector_index.index is None:
+        return []
+    query_embedding = vector_index.embedder.encode([query], show_progress_bar=False)
+    query_embedding = np.array(query_embedding, dtype="float32")
+    n_results = min(top_k, vector_index.index.ntotal)
+    distances, indices = vector_index.index.search(query_embedding, n_results)
+    results = []
+    for dist, idx in zip(distances[0], indices[0]):
+        if idx == -1:
+            continue
+        chunk = vector_index.chunks[idx]
+        results.append({
+            "source": chunk["source"],
+            "text": chunk["text"],
+            "score": float(dist),
+        })
+    return results
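
The `score` returned here is a squared L2 distance, which is harder to read than a cosine similarity. If the embeddings are unit-length, the two are related by ‖a − b‖² = 2 − 2·cos(a, b). Whether the configured embedding model emits normalized vectors is an assumption here, not something this code checks. The sketch below demonstrates the conversion on toy vectors; all function names are hypothetical.

```python
import math


def normalize(v):
    """Scale a non-zero vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def squared_l2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def cosine_from_squared_l2(dist):
    """Valid only for unit-length vectors: cos = 1 - dist / 2."""
    return 1.0 - dist / 2.0


a = normalize([3.0, 4.0])  # [0.6, 0.8]
b = normalize([4.0, 3.0])  # [0.8, 0.6]
dist = squared_l2(a, b)
cos_direct = sum(x * y for x, y in zip(a, b))
print(round(dist, 6), round(cosine_from_squared_l2(dist), 6), round(cos_direct, 6))
```

For these vectors the converted value and the directly computed cosine agree, so a similarity threshold could be applied to `score` without re-embedding anything.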

requirements.txt ADDED

	@@ -0,0 +1,7 @@

+gradio>=6.6.0
+sentence-transformers>=2.7.0
+faiss-cpu>=1.7.4
+PyMuPDF>=1.24.0
+python-docx>=1.1.0
+huggingface-hub>=0.23.0
+numpy>=1.24.0