Methodology

This project follows a standard RAG (Retrieval-Augmented Generation) workflow with conversational memory:

Document Ingestion
- Load a fixed manual from temp_docs/samsung_manual.txt using TextLoader with UTF-8 to avoid encoding issues.
- If the file is missing, initialization fails early with a clear error.
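
The fail-early behaviour can be sketched with the standard library alone. This is a minimal stand-in for the TextLoader call, not the project's actual code; the function name `load_manual` and the error message are hypothetical.

```python
from pathlib import Path

# Path stated in the methodology; everything else here is illustrative.
MANUAL_PATH = Path("temp_docs/samsung_manual.txt")


def load_manual(path: Path = MANUAL_PATH) -> str:
    """Return the manual's text, failing early if the file is absent."""
    if not path.is_file():
        # Raise before any model or index is touched so the cause is obvious.
        raise FileNotFoundError(f"Manual not found at {path}; add the file and restart.")
    # An explicit encoding avoids depending on the platform default.
    return path.read_text(encoding="utf-8")
```
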
Preprocessing & Chunking
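- Split the document with RecursiveCharacterTextSplitter (chunk_size=1000, chunk_overlap=200). The 200-character overlap keeps text that straddles a chunk boundary retrievable from either side (better recall), while the 1000-character chunk size keeps each retrieved passage short and focused.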
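
To see what the overlap does, here is a simplified fixed-window splitter. It is not RecursiveCharacterTextSplitter, which additionally prefers to break at paragraph and sentence separators; the function name `split_with_overlap` is hypothetical, and only the two parameter values come from the text above.

```python
def split_with_overlap(text: str, chunk_size: int = 1000, chunk_overlap: int = 200) -> list[str]:
    """Cut text into windows of chunk_size characters that share chunk_overlap characters."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be non-negative and smaller than chunk_size")
    step = chunk_size - chunk_overlap  # each window starts this far after the previous one
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks
```

With the default values, a 2600-character document yields three chunks starting at offsets 0, 800, and 1600, so the last 200 characters of each chunk reappear at the start of the next.
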
Embedding
- Convert each chunk to a dense vector using sentence-transformers/all-MiniLM-L6-v2 via HuggingFaceEmbeddings.
- This small, fast model (384-dimensional embeddings) offers a good latency/quality trade-off for semantic search.
Vector Store (Persistence)
- Store embeddings in ChromaDB (persist_directory=chroma_db).
- On startup:
  - If chroma_db/ is missing or empty → build the index from the document and persist it.
  - If chroma_db/ already contains a persisted index → load it directly (fast startup).
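
The startup decision can be expressed as a small check on the persistence directory. This is a sketch, not the project's code: the helper name `index_action` is hypothetical, and treating a missing directory the same as an empty one is an assumption.

```python
from pathlib import Path


def index_action(persist_dir: Path) -> str:
    """Return "build" when no persisted index is present, otherwise "load"."""
    # Assumption: a missing directory is handled like an empty one.
    if not persist_dir.is_dir() or not any(persist_dir.iterdir()):
        return "build"
    return "load"
```

The returned value could also drive the status banner in the UI, since it records whether the DB was built or loaded.
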
Retriever
- Expose the vector store as a retriever with k=2 to fetch the two most relevant chunks per query.
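
Conceptually, a k=2 retriever ranks every chunk vector against the query vector and keeps the two best. The sketch below uses cosine similarity on toy vectors for illustration; the metric the vector store actually applies depends on its configuration, and the function names are hypothetical.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec: list[float], chunk_vecs: list[list[float]], k: int = 2) -> list[int]:
    """Return the indices of the k chunks most similar to the query, best first."""
    ranked = sorted(
        range(len(chunk_vecs)),
        key=lambda i: cosine(query_vec, chunk_vecs[i]),
        reverse=True,
    )
    return ranked[:k]
```
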
LLM Generation
- Use google/flan-t5-base through a Hugging Face pipeline("text2text-generation"):
  - max_length=512, temperature=0.1, top_p=0.95, repetition_penalty=1.2.
- The LLM receives the user question plus retrieved context and generates a grounded answer.
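
How the question and retrieved context might be combined into one input string is sketched below. The prompt wording and the name `build_prompt` are hypothetical (the chain supplies its own template); only the generation settings are taken from the list above.

```python
# Generation settings as listed in the methodology.
GENERATION_KWARGS = {
    "max_length": 512,
    "temperature": 0.1,
    "top_p": 0.95,
    "repetition_penalty": 1.2,
}


def build_prompt(question: str, context_chunks: list[str]) -> str:
    """Join the retrieved chunks and the question into a single model input."""
    context = "\n\n".join(chunk.strip() for chunk in context_chunks)
    # Hypothetical template: instruct the model to stay within the context.
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )
```
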
Conversational Orchestration
- Wrap everything with ConversationalRetrievalChain to:
  - Retrieve relevant chunks for each turn.
  - Generate answers conditioned on both context and chat history.
Memory
- Maintain multi-turn context using ConversationBufferMemory (return_messages=True), enabling follow-ups like “and what about the warranty?” without repeating details.
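
The idea behind buffer memory can be shown with a few lines of plain Python. This is a simplified stand-in, not ConversationBufferMemory itself: the class and function names are hypothetical, and prepending the history verbatim is an assumption made to keep the sketch short.

```python
class BufferMemory:
    """Keeps every (question, answer) pair of the conversation in order."""

    def __init__(self) -> None:
        self.turns: list[tuple[str, str]] = []

    def save(self, question: str, answer: str) -> None:
        self.turns.append((question, answer))

    def as_text(self) -> str:
        return "\n".join(f"User: {q}\nAssistant: {a}" for q, a in self.turns)


def with_history(question: str, memory: BufferMemory) -> str:
    """Prefix a follow-up with earlier turns so references can be resolved."""
    history = memory.as_text()
    if not history:
        return question  # first turn: nothing to prepend
    return f"{history}\nUser: {question}"
```

Because the buffer grows with every turn, long conversations will eventually exceed a small model's input budget; trimming old turns is one way to handle that.
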
UI Layer (Gradio)
- gr.Blocks() app with:
  - Status banner showing whether the DB was built or loaded.
  - gr.Chatbot for messages and a Textbox + Button for input.
- submit event calls a wrapper that:
  - Appends the user message to chat_history.
  - Invokes the chain and appends the assistant’s answer.
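
The submit wrapper described above might look like the following. It is a sketch under assumptions: `chain` is any callable that takes `{"question": ...}` and returns a dict with an `"answer"` key, the history is a list of `[user, assistant]` pairs, and returning an empty string to clear the textbox is a common Gradio pattern rather than something the source states.

```python
from typing import Callable


def respond(message: str, chat_history: list, chain: Callable[[dict], dict]):
    """Append the user turn, call the chain, then fill in the assistant's answer."""
    chat_history.append([message, None])      # user message shown immediately
    result = chain({"question": message})     # assumed input/output keys
    chat_history[-1][1] = result["answer"]    # complete the pair with the reply
    return "", chat_history                   # "" clears the input box
```
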
Operational Notes
- Force re-indexing: delete chroma_db/ and restart.
- Swap documents: replace temp_docs/samsung_manual.txt (keep plain text for best results).
- Model changes: update MODEL_NAME_EMBEDDINGS or MODEL_ID_LLM in app.py.
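
Forcing a re-index amounts to removing the persistence directory before the next start. A minimal helper, assuming the default directory name from above (the function name `force_reindex` is hypothetical):

```python
import shutil
from pathlib import Path


def force_reindex(persist_dir: Path = Path("chroma_db")) -> bool:
    """Delete the persisted index; return True if a directory was removed."""
    if persist_dir.is_dir():
        shutil.rmtree(persist_dir)  # the index is rebuilt on the next startup
        return True
    return False
```
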

Quality & Evaluation (Lightweight)

- Grounding check: ask questions whose answers are known to be in the manual and verify that the response cites the right details.
- Follow-up coherence: ask a sequence of related questions to confirm that memory works.
- Latency tracking: note first-run time (indexing) vs. warm start (loading the persisted DB).
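
These checks can be run with a small harness. The sketch below assumes `answer_fn` is any function that maps a question to an answer string, and that a grounded answer can be approximated by the presence of expected keywords; both the helper name `run_checks` and the keyword heuristic are illustrative, not part of the project.

```python
import time
from typing import Callable


def run_checks(answer_fn: Callable[[str], str], cases: list[tuple[str, list[str]]]) -> list[dict]:
    """Ask each question, time the call, and report which expected keywords are missing."""
    results = []
    for question, expected_keywords in cases:
        start = time.perf_counter()
        answer = answer_fn(question)
        elapsed = time.perf_counter() - start
        # Case-insensitive substring match as a rough grounding signal.
        missing = [kw for kw in expected_keywords if kw.lower() not in answer.lower()]
        results.append(
            {"question": question, "passed": not missing, "missing": missing, "seconds": elapsed}
        )
    return results
```

Passing the questions of one conversation in order also exercises follow-up coherence, provided `answer_fn` keeps its memory between calls.
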

Limitations

- Works best with clean, textual manuals; PDFs should be converted to text first.
- flan-t5-base is compact; for higher fidelity, upgrade to a stronger model (with GPU if available).
- Retrieval uses k=2; increase k if answers miss context, or decrease it if they include irrelevant details.