
Methodology

This project follows a standard RAG (Retrieval-Augmented Generation) workflow with conversational memory:

  1. Document Ingestion

    • Load a fixed manual from temp_docs/samsung_manual.txt using TextLoader with UTF-8 to avoid encoding issues.
    • If the file is missing, initialization fails early with a clear error.
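The fail-early behaviour described above can be sketched without LangChain. This is a minimal stand-in, not the project's code: `load_manual` and `MANUAL_PATH` are hypothetical names, and the real app reads the file through TextLoader instead of `Path.read_text`.

```python
from pathlib import Path

# Path taken from the text above; the constant name is an assumption.
MANUAL_PATH = Path("temp_docs/samsung_manual.txt")


def load_manual(path: Path = MANUAL_PATH) -> str:
    """Return the manual's text, failing early if the file is absent."""
    if not path.is_file():
        raise FileNotFoundError(
            f"Manual not found at '{path}'. Place the text file there and restart."
        )
    # Explicit UTF-8 avoids depending on the platform's default encoding.
    return path.read_text(encoding="utf-8")
```

Raising before any embedding or model loading starts keeps the error close to its cause, so a missing file does not surface later as an empty index.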
  2. Preprocessing & Chunking

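    • Split the document with RecursiveCharacterTextSplitter (chunk_size=1000, chunk_overlap=200, both measured in characters). The overlap preserves context that straddles a chunk boundary, which helps recall; the chunk size keeps each retrieved passage short enough to fit in the LLM prompt.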
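To make the size/overlap arithmetic concrete, here is a simplified fixed-window splitter. It is not RecursiveCharacterTextSplitter, which prefers to break on separators such as paragraphs, lines, and spaces and therefore often yields chunks shorter than chunk_size. The function name `split_with_overlap` is hypothetical.

```python
def split_with_overlap(
    text: str, chunk_size: int = 1000, chunk_overlap: int = 200
) -> list[str]:
    """Cut text into fixed-size character windows that overlap."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - chunk_overlap  # 800 characters with the defaults
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks
```

With the defaults, each new chunk starts 800 characters after the previous one, so consecutive chunks share 200 characters.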
  3. Embedding

    • Convert each chunk to a dense vector using sentence-transformers/all-MiniLM-L6-v2 via HuggingFaceEmbeddings.
    • This small, fast model offers a good latency/quality trade-off for semantic search.
  4. Vector Store (Persistence)

    • Store embeddings in ChromaDB (persist_directory=chroma_db).
    • On startup:
      • If chroma_db/ is missing or empty → build the index from the document and persist it.
      • If chroma_db/ already contains a persisted index → load it directly (fast startup).
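The startup decision can be sketched as a small directory check. This is an assumption about how the check might be written, not the project's actual code; `has_persisted_index` and `startup_action` are hypothetical names, and "non-empty directory" is used here as a proxy for "index was persisted".

```python
from pathlib import Path


def has_persisted_index(persist_dir: str = "chroma_db") -> bool:
    """True if the directory exists and contains at least one entry."""
    path = Path(persist_dir)
    return path.is_dir() and any(path.iterdir())


def startup_action(persist_dir: str = "chroma_db") -> str:
    """Return 'load' for a warm start, 'build' when indexing is needed."""
    return "load" if has_persisted_index(persist_dir) else "build"
```

The returned string could also drive a status message in the UI, so the user can tell a cold start from a warm one.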
  5. Retriever

    • Expose the vector store as a retriever with k=2 to fetch the two most relevant chunks per query.
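What "two most relevant chunks" means can be illustrated with a toy nearest-neighbour search. The sketch below ranks by cosine similarity over tiny hand-written vectors; that metric is an illustrative assumption, since the text does not state which distance the Chroma collection uses, and real all-MiniLM-L6-v2 embeddings have 384 dimensions. The names `cosine` and `top_k` are hypothetical.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec: list[float], chunk_vecs: list[list[float]], k: int = 2) -> list[int]:
    """Return the indices of the k chunks most similar to the query."""
    ranked = sorted(
        range(len(chunk_vecs)),
        key=lambda i: cosine(query_vec, chunk_vecs[i]),
        reverse=True,
    )
    return ranked[:k]
```

A vector store does the same ranking with an index structure instead of a full sort, which is what keeps retrieval fast as the number of chunks grows.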
  6. LLM Generation

    • Use google/flan-t5-base through a Hugging Face pipeline("text2text-generation"):
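      • max_length=512, temperature=0.1, top_p=0.95, repetition_penalty=1.2. Note that temperature and top_p only take effect when sampling is enabled (do_sample=True); with greedy decoding they are ignored.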
    • The LLM receives the user question plus retrieved context and generates a grounded answer.
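How the question and retrieved context might be combined into one model input is sketched below. The template wording is hypothetical; the chain used in this project applies its own prompt, which is not shown in this document.

```python
def build_prompt(question: str, chunks: list[str]) -> str:
    """Join retrieved chunks and the question into a single prompt string."""
    context = "\n\n".join(chunks)  # one blank line between chunks
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )
```

Placing the context before the question and instructing the model to use only that context is a common way to keep answers grounded in the retrieved text.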
  7. Conversational Orchestration

    • Wrap everything with ConversationalRetrievalChain to:
      • Retrieve relevant chunks for each turn.
      • Generate answers conditioned on both context and chat history.
  8. Memory

    • Maintain multi-turn context using ConversationBufferMemory (return_messages=True), enabling follow-ups like “and what about the warranty?” without repeating details.
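The idea behind a buffer memory can be shown with a small stand-in class. This is not ConversationBufferMemory itself; `BufferMemory` and its methods are hypothetical and only mirror the behaviour described above: every turn is kept verbatim and replayed to the model.

```python
class BufferMemory:
    """Keep every turn verbatim, in order (no summarising or truncation)."""

    def __init__(self) -> None:
        self.messages: list[tuple[str, str]] = []  # (role, text) pairs

    def save_turn(self, question: str, answer: str) -> None:
        """Record one user question and the assistant's reply."""
        self.messages.append(("human", question))
        self.messages.append(("ai", answer))

    def as_text(self) -> str:
        """Render the history as plain text for inclusion in a prompt."""
        return "\n".join(f"{role}: {text}" for role, text in self.messages)
```

Because nothing is dropped, the history grows with every turn; with a small model such as flan-t5-base, long conversations can crowd out the retrieved context.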
  9. UI Layer (Gradio)

    • gr.Blocks() app with:
      • Status banner showing whether the DB was built or loaded.
      • gr.Chatbot for messages and a Textbox + Button for input.
    • submit event calls a wrapper that:
      • Appends the user message to chat_history.
      • Invokes the chain and appends the assistant’s answer.
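The submit wrapper can be sketched as a plain function, which also makes it testable without launching Gradio. The name `respond`, the `(user, assistant)` tuple format for chat_history, and the `{"question": ...}` / `{"answer": ...}` keys are assumptions here; check them against the chain and Chatbot versions actually in use.

```python
from typing import Callable


def respond(
    message: str,
    chat_history: list[tuple[str, str]],
    chain: Callable[[dict], dict],
) -> tuple[str, list[tuple[str, str]]]:
    """Run one chat turn and return (cleared textbox value, updated history)."""
    result = chain({"question": message})  # assumed input key
    answer = result["answer"]  # assumed output key
    # Build a new list rather than mutating the caller's history in place.
    updated = chat_history + [(message, answer)]
    return "", updated
```

Returning an empty string as the first value lets the event handler clear the Textbox after each submission.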
  10. Operational Notes

    • Force re-indexing: delete chroma_db/ and restart.
    • Swap documents: replace temp_docs/samsung_manual.txt (keep plain text for best results), then delete chroma_db/ so the index is rebuilt from the new file instead of loading the old one.
    • Model changes: update MODEL_NAME_EMBEDDINGS or MODEL_ID_LLM in app.py.

Quality & Evaluation (Lightweight)

  • Grounding check: ask questions whose answers are known to be in the manual and verify the response cites the right details.
  • Follow-up coherence: ask a sequence of related questions to ensure memory works.
  • Latency tracking: note first-run time (indexing) vs. warm start (loading persisted DB).
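The grounding and latency checks above can be combined in a small harness. This is a sketch under stated assumptions: `evaluate` is a hypothetical helper, `answer_fn` stands for whatever callable wraps the chain, and substring matching is only a rough proxy for a grounded answer.

```python
import time
from typing import Callable


def evaluate(
    answer_fn: Callable[[str], str],
    cases: list[tuple[str, list[str]]],
) -> list[dict]:
    """Run each (question, expected_terms) case and record pass/fail plus latency."""
    results = []
    for question, expected_terms in cases:
        start = time.perf_counter()
        answer = answer_fn(question)
        elapsed = time.perf_counter() - start
        # Case-insensitive check that every expected term appears in the answer.
        missing = [t for t in expected_terms if t.lower() not in answer.lower()]
        results.append(
            {
                "question": question,
                "grounded": not missing,
                "missing": missing,
                "seconds": elapsed,
            }
        )
    return results
```

Running the same cases once after deleting chroma_db/ and once on a warm start gives a like-for-like comparison of first-run and warm-start latency.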

Limitations

  • Works best with clean, textual manuals; PDFs should be converted to text first.
  • flan-t5-base is compact; for higher fidelity, upgrade to a stronger model (with GPU if available).
  • Retrieval uses k=2; adjust if answers miss context or include irrelevant details.