Spaces:
Runtime error
Runtime error
Methodology
This project follows a standard RAG (Retrieval-Augmented Generation) workflow with conversational memory:
Document Ingestion
- Load a fixed manual from
temp_docs/samsung_manual.txtusingTextLoaderwith UTF-8 to avoid encoding issues. - If the file is missing, initialization fails early with a clear error.
- Load a fixed manual from
Preprocessing & Chunking
- Split the document with
RecursiveCharacterTextSplitter(chunk_size=1000,chunk_overlap=200) to balance recall (overlap) and retrieval speed (chunk size).
- Split the document with
Embedding
- Convert each chunk to a dense vector using
sentence-transformers/all-MiniLM-L6-v2viaHuggingFaceEmbeddings. - This small, fast model offers a good latency/quality trade-off for semantic search.
- Convert each chunk to a dense vector using
Vector Store (Persistence)
- Store embeddings in ChromaDB (
persist_directory=chroma_db). - On startup:
- If
chroma_db/is empty → build the index from the document and persist it. - If
chroma_db/exists → load the persisted index directly (fast startup).
- If
- Store embeddings in ChromaDB (
Retriever
- Expose the vector store as a retriever with
k=2to fetch the two most relevant chunks per query.
- Expose the vector store as a retriever with
LLM Generation
- Use
google/flan-t5-basethrough a Hugging Facepipeline("text2text-generation"):max_length=512,temperature=0.1,top_p=0.95,repetition_penalty=1.2.
- The LLM receives the user question plus retrieved context and generates a grounded answer.
- Use
Conversational Orchestration
- Wrap everything with
ConversationalRetrievalChainto:- Retrieve relevant chunks for each turn.
- Generate answers conditioned on both context and chat history.
- Wrap everything with
Memory
- Maintain multi-turn context using
ConversationBufferMemory (return_messages=True), enabling follow-ups like “and what about the warranty?” without repeating details.
- Maintain multi-turn context using
UI Layer (Gradio)
gr.Blocks()app with:- Status banner showing whether the DB was built or loaded.
gr.Chatbotfor messages and aTextbox+Buttonfor input.
submitevent calls a wrapper that:- Appends the user message to
chat_history. - Invokes the chain and appends the assistant’s answer.
- Appends the user message to
Operational Notes
- Force re-indexing: delete
chroma_db/and restart. - Swap documents: replace
temp_docs/samsung_manual.txt(keep plain text for best results). - Model changes: update
MODEL_NAME_EMBEDDINGSorMODEL_ID_LLMinapp.py.
- Force re-indexing: delete
Quality & Evaluation (Lightweight)
- Grounding check: ask questions whose answers are known to be in the manual and verify the response cites the right details.
- Follow-up coherence: ask a sequence of related questions to ensure memory works.
- Latency tracking: note first-run time (indexing) vs. warm start (loading persisted DB).
Limitations
- Works best with clean, textual manuals; PDFs should be converted to text first.
flan-t5-baseis compact; for higher fidelity, upgrade to a stronger model (with GPU if available).- Retrieval uses
k=2; adjust if answers miss context or include irrelevant details.