Spaces:

Anvit25
/

LLM_chatbot2

Runtime error

App Files Files Community

LLM_chatbot2 / methodology.md

Anvit25

Create methodology.md (#3)

c1fc6b8 verified 8 months ago

preview code

raw

history blame contribute delete

3.16 kB

	## Methodology

	This project follows a standard RAG (Retrieval-Augmented Generation) workflow with conversational memory:

	1. Document Ingestion
	- Load a fixed manual from `temp_docs/samsung_manual.txt` using `TextLoader` with UTF-8 to avoid encoding issues.
	- If the file is missing, initialization fails early with a clear error.

	2. Preprocessing & Chunking
	- Split the document with `RecursiveCharacterTextSplitter` (`chunk_size=1000`, `chunk_overlap=200`) to balance recall (overlap) and retrieval speed (chunk size).

	3. Embedding
	- Convert each chunk to a dense vector using `sentence-transformers/all-MiniLM-L6-v2` via `HuggingFaceEmbeddings`.
	- This small, fast model offers a good latency/quality trade-off for semantic search.

	4. Vector Store (Persistence)
	- Store embeddings in ChromaDB (`persist_directory=chroma_db`).
	- On startup:
	- If `chroma_db/` is empty → build the index from the document and persist it.
	- If `chroma_db/` exists → load the persisted index directly (fast startup).

	5. Retriever
	- Expose the vector store as a retriever with `k=2` to fetch the two most relevant chunks per query.

	6. LLM Generation
	- Use `google/flan-t5-base` through a Hugging Face `pipeline("text2text-generation")`:
	- `max_length=512`, `temperature=0.1`, `top_p=0.95`, `repetition_penalty=1.2`.
	- The LLM receives the user question plus retrieved context and generates a grounded answer.

	7. Conversational Orchestration
	- Wrap everything with `ConversationalRetrievalChain` to:
	- Retrieve relevant chunks for each turn.
	- Generate answers conditioned on both context and chat history.

	8. Memory
	- Maintain multi-turn context using `ConversationBufferMemory (return_messages=True)`, enabling follow-ups like “and what about the warranty?” without repeating details.

	9. UI Layer (Gradio)
	- `gr.Blocks()` app with:
	- Status banner showing whether the DB was built or loaded.
	- `gr.Chatbot` for messages and a `Textbox` + `Button` for input.
	- `submit` event calls a wrapper that:
	- Appends the user message to `chat_history`.
	- Invokes the chain and appends the assistant’s answer.

	10. Operational Notes
	- Force re-indexing: delete `chroma_db/` and restart.
	- Swap documents: replace `temp_docs/samsung_manual.txt` (keep plain text for best results).
	- Model changes: update `MODEL_NAME_EMBEDDINGS` or `MODEL_ID_LLM` in `app.py`.

	### Quality & Evaluation (Lightweight)
	- Grounding check: ask questions whose answers are known to be in the manual and verify the response cites the right details.
	- Follow-up coherence: ask a sequence of related questions to ensure memory works.
	- Latency tracking: note first-run time (indexing) vs. warm start (loading persisted DB).

	### Limitations
	- Works best with clean, textual manuals; PDFs should be converted to text first.
	- `flan-t5-base` is compact; for higher fidelity, upgrade to a stronger model (with GPU if available).
	- Retrieval uses `k=2`; adjust if answers miss context or include irrelevant details.