Anvit25/LLM_chatbot2

mandarmgd-03 committed on Sep 29, 2025

Commit c1fc6b8 (verified) · 1 Parent(s): edbb0e4

Create methodology.md (#3)

- Create methodology.md (9a7878e409dd451821a0d11123eff2d82e4efca7)

Co-authored-by: Mandar Garud <mandarmgd-03@users.noreply.huggingface.co>

Files changed (1)

methodology.md +59 -0

methodology.md ADDED

	@@ -0,0 +1,59 @@

+## Methodology
+This project follows a standard **RAG (Retrieval-Augmented Generation)** workflow with conversational memory:
+1. **Document Ingestion**
+   - Load a fixed manual from `temp_docs/samsung_manual.txt` using `TextLoader` with UTF-8 to avoid encoding issues.
+   - If the file is missing, initialization fails early with a clear error.
+2. **Preprocessing & Chunking**
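+   - Split the document with `RecursiveCharacterTextSplitter` (`chunk_size=1000`, `chunk_overlap=200`, both measured in characters by default). The overlap protects recall for text that straddles a chunk boundary; the chunk size trades passage granularity against the number of vectors to index and search.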
+3. **Embedding**
+   - Convert each chunk to a dense vector using `sentence-transformers/all-MiniLM-L6-v2` via `HuggingFaceEmbeddings`.
+   - This small, fast model offers a good latency/quality trade-off for semantic search.
+4. **Vector Store (Persistence)**
+   - Store embeddings in **ChromaDB** (`persist_directory=chroma_db`).
+   - On startup:
+     - If `chroma_db/` is missing or empty → build the index from the document and persist it.
+     - If `chroma_db/` exists and already holds a persisted index → load it directly (fast startup).
+5. **Retriever**
+   - Expose the vector store as a retriever with `k=2` to fetch the two most relevant chunks per query.
+6. **LLM Generation**
+   - Use `google/flan-t5-base` through a Hugging Face `pipeline("text2text-generation")`:
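+     - `max_length=512`, `temperature=0.1`, `top_p=0.95`, `repetition_penalty=1.2`.
+     - `temperature` and `top_p` only take effect when sampling is enabled (`do_sample=True`); otherwise decoding is deterministic and these two values are ignored.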
+   - The LLM receives the user question plus retrieved context and generates a grounded answer.
+7. **Conversational Orchestration**
+   - Wrap everything with `ConversationalRetrievalChain` to:
+     - Retrieve relevant chunks for each turn.
+     - Generate answers conditioned on both **context** and **chat history**.
+8. **Memory**
+   - Maintain multi-turn context using `ConversationBufferMemory(return_messages=True)`, enabling follow-ups like “and what about the warranty?” without the user restating earlier details.
+9. **UI Layer (Gradio)**
+   - `gr.Blocks()` app with:
+     - Status banner showing whether the DB was built or loaded.
+     - `gr.Chatbot` for messages and a `gr.Textbox` + `gr.Button` for input.
+   - The `submit` event calls a wrapper function that:
+     - Appends the user message to `chat_history`.
+     - Invokes the chain and appends the assistant’s answer.
+10. **Operational Notes**
+    - **Force re-indexing**: delete `chroma_db/` and restart.
+    - **Swap documents**: replace `temp_docs/samsung_manual.txt` (keep plain text for best results).
+    - **Model changes**: update `MODEL_NAME_EMBEDDINGS` or `MODEL_ID_LLM` in `app.py`.
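Steps 2 and 4 above can be sketched with the standard library alone. The snippet below is a minimal illustration, not the code in `app.py`: `chunk_text` is a hypothetical fixed-size sliding window that only approximates `RecursiveCharacterTextSplitter` (which additionally prefers to break on separators such as paragraph and line breaks), and `index_action` is a hypothetical helper that assumes a missing or empty `chroma_db/` means the index must be built.

```python
from pathlib import Path


def chunk_text(text: str, chunk_size: int = 1000, chunk_overlap: int = 200) -> list[str]:
    """Split text into fixed-size character windows that overlap.

    Simplified stand-in for RecursiveCharacterTextSplitter: each chunk starts
    (chunk_size - chunk_overlap) characters after the previous one.
    """
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current window already reaches the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks


def index_action(persist_dir: str = "chroma_db") -> str:
    """Return "load" if persist_dir exists and is non-empty, else "build".

    Assumption: an empty or missing directory means no index was persisted.
    """
    path = Path(persist_dir)
    if path.is_dir() and any(path.iterdir()):
        return "load"
    return "build"
```

With the defaults, a 2,500-character document yields three chunks of 1000, 1000, and 900 characters, each sharing 200 characters with its neighbour. Deleting `chroma_db/` makes `index_action` return `"build"` again, which matches the force re-indexing note above.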
+### Quality & Evaluation (Lightweight)
+- **Grounding check**: ask questions whose answers are known to be in the manual and verify the response cites the right details.
+- **Follow-up coherence**: ask a sequence of related questions and confirm that later answers use context from earlier turns.
+- **Latency tracking**: note first-run time (indexing) vs. warm start (loading persisted DB).
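The latency check can be made repeatable with a small timing wrapper. This is a sketch only: `timed` is a hypothetical helper, and the function being timed here is a stand-in, since the name of the initialization function in `app.py` is not given above.

```python
import time
from typing import Any, Callable


def timed(fn: Callable[..., Any], *args: Any, **kwargs: Any) -> tuple[Any, float]:
    """Call fn(*args, **kwargs) and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start


if __name__ == "__main__":
    # Stand-in workload; in practice, time the app's own startup function once
    # with chroma_db/ deleted (cold) and once with it present (warm).
    total, seconds = timed(sum, range(1_000_000))
    print(f"result={total}, elapsed={seconds:.4f}s")
```

Recording both numbers in the same run log makes it easy to see how much of the first-run time is indexing rather than model loading.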
+### Limitations
+- Works best with **clean, textual manuals**; PDFs should be converted to text first.
+- `flan-t5-base` is a compact model (roughly 250M parameters), so answers can be terse or miss detail; for higher-quality answers, switch to a larger model (ideally with a GPU).
+- Retrieval uses `k=2`; adjust if answers miss context or include irrelevant details.
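To see what changing `k` does, the toy example below ranks hand-written vectors by cosine similarity and returns the top `k`. It is an illustration under assumptions: the vectors and chunk names are made up, `top_k` is a hypothetical helper, and the distance metric the actual vector store uses may not be cosine similarity.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec: list[float], chunk_vecs: dict[str, list[float]], k: int = 2) -> list[str]:
    """Return the names of the k chunks most similar to the query, best first."""
    ranked = sorted(
        chunk_vecs,
        key=lambda name: cosine(query_vec, chunk_vecs[name]),
        reverse=True,
    )
    return ranked[:k]


# Made-up 3-dimensional "embeddings" for three chunks of a manual.
chunks = {
    "warranty": [1.0, 0.0, 0.1],
    "battery": [0.0, 1.0, 0.0],
    "returns": [0.8, 0.1, 0.5],
}
query = [1.0, 0.0, 0.0]

print(top_k(query, chunks, k=2))  # ['warranty', 'returns']
print(top_k(query, chunks, k=3))  # ['warranty', 'returns', 'battery']
```

Raising `k` from 2 to 3 pulls in the unrelated `battery` chunk, which is the kind of irrelevant context the limitation above warns about; lowering it to 1 would drop `returns`, which might hold part of the answer.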