LLM_chatbot2 / methodology.md
mandarmgd-03's picture
Create methodology.md
9a7878e verified
|
raw
history blame
3.16 kB
## Methodology
This project follows a standard **RAG (Retrieval-Augmented Generation)** workflow with conversational memory:
1. **Document Ingestion**
- Load a fixed manual from `temp_docs/samsung_manual.txt` using `TextLoader` with UTF-8 to avoid encoding issues.
- If the file is missing, initialization fails early with a clear error.
2. **Preprocessing & Chunking**
- Split the document with `RecursiveCharacterTextSplitter` (`chunk_size=1000`, `chunk_overlap=200`) to balance recall (overlap) and retrieval speed (chunk size).
3. **Embedding**
- Convert each chunk to a dense vector using `sentence-transformers/all-MiniLM-L6-v2` via `HuggingFaceEmbeddings`.
- This small, fast model offers a good latency/quality trade-off for semantic search.
4. **Vector Store (Persistence)**
- Store embeddings in **ChromaDB** (`persist_directory=chroma_db`).
- On startup:
- If `chroma_db/` is empty → build the index from the document and persist it.
- If `chroma_db/` exists → load the persisted index directly (fast startup).
5. **Retriever**
- Expose the vector store as a retriever with `k=2` to fetch the two most relevant chunks per query.
6. **LLM Generation**
- Use `google/flan-t5-base` through a Hugging Face `pipeline("text2text-generation")`:
- `max_length=512`, `temperature=0.1`, `top_p=0.95`, `repetition_penalty=1.2`.
- The LLM receives the user question plus retrieved context and generates a grounded answer.
7. **Conversational Orchestration**
- Wrap everything with `ConversationalRetrievalChain` to:
- Retrieve relevant chunks for each turn.
- Generate answers conditioned on both **context** and **chat history**.
8. **Memory**
- Maintain multi-turn context using `ConversationBufferMemory (return_messages=True)`, enabling follow-ups like “and what about the warranty?” without repeating details.
9. **UI Layer (Gradio)**
- `gr.Blocks()` app with:
- Status banner showing whether the DB was built or loaded.
- `gr.Chatbot` for messages and a `Textbox` + `Button` for input.
- `submit` event calls a wrapper that:
- Appends the user message to `chat_history`.
- Invokes the chain and appends the assistant’s answer.
10. **Operational Notes**
- **Force re-indexing**: delete `chroma_db/` and restart.
- **Swap documents**: replace `temp_docs/samsung_manual.txt` (keep plain text for best results).
- **Model changes**: update `MODEL_NAME_EMBEDDINGS` or `MODEL_ID_LLM` in `app.py`.
### Quality & Evaluation (Lightweight)
- **Grounding check**: ask questions whose answers are known to be in the manual and verify the response cites the right details.
- **Follow-up coherence**: ask a sequence of related questions to ensure memory works.
- **Latency tracking**: note first-run time (indexing) vs. warm start (loading persisted DB).
### Limitations
- Works best with **clean, textual manuals**; PDFs should be converted to text first.
- `flan-t5-base` is compact; for higher fidelity, upgrade to a stronger model (with GPU if available).
- Retrieval uses `k=2`; adjust if answers miss context or include irrelevant details.