title: Teacher LLM
colorFrom: blue
colorTo: purple
sdk: docker
pinned: false
Socrates is a retrieval-augmented generation (RAG) chatbot created to help you learn from and interact with your documents. Upload a PDF to start a conversation, generate practice quizzes, and supplement the answers with optional real-time web searches.
Built with LangChain and powered by Google's Gemini 1.5 Flash, this tool retrieves context through multi-query expansion and Reciprocal Rank Fusion over a FAISS index, so that answers stay grounded in the uploaded document.
Key Features
Interactive Document Chat: Ask questions in natural language and get answers sourced directly from your document.
Practice Quiz Generation: Automatically create Multiple-Choice Questions (MCQs) from any topic within the document to test your knowledge.
Hybrid Search: Augment document-based answers with real-time online search results from DuckDuckGo.
Efficient & Fast: Retrieval runs on a FAISS index with HNSW, so relevant context is found quickly.
Advanced Retrieval Strategy
To ensure high-quality responses, this project implements a multi-layered retrieval strategy:
Vectorization: Documents are embedded with Hugging Face Sentence Transformers models and stored in a FAISS vector store.
HNSW Algorithm: The FAISS index uses the Hierarchical Navigable Small World (HNSW) algorithm, a graph-based approximate nearest-neighbor method, for fast similarity searches.
Multi-Query Generation: Your single question is expanded into multiple related queries to retrieve a broader and more relevant set of initial documents.
Reciprocal Rank Fusion (RRF): The ranked results from the multi-query search are merged with RRF, which scores each document by summing 1 / (k + rank) across the result lists (k is a smoothing constant), so documents that rank highly for several queries rise to the top.
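The fusion step can be illustrated with a minimal, self-contained sketch. It is not the project's actual code: the function name reciprocal_rank_fusion, the chunk IDs, and the constant k = 60 (a commonly used default) are assumptions made for illustration, and the three result lists stand in for what a retriever might return for three rewrites of one question.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence


def reciprocal_rank_fusion(
    rankings: Sequence[Sequence[Hashable]], k: int = 60
) -> List[Hashable]:
    """Fuse several ranked lists of document IDs into one ranking.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. The default k = 60 is an assumption.
    """
    scores: Dict[Hashable, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties keep first-seen order.
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical retrieval results for three rewrites of a single question.
results_per_query = [
    ["chunk_a", "chunk_b", "chunk_c"],
    ["chunk_b", "chunk_a", "chunk_d"],
    ["chunk_b", "chunk_c", "chunk_a"],
]

fused = reciprocal_rank_fusion(results_per_query)
print(fused)  # ['chunk_b', 'chunk_a', 'chunk_c', 'chunk_d']
```

Here chunk_b comes first because it ranks first for two of the three queries, while chunk_d, which only one query retrieves, falls to the bottom. RRF uses ranks rather than raw similarity scores, so lists produced by different queries can be merged without normalizing their scores.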
Tech Stack
Frameworks: LangChain, Streamlit
LLM: Google Gemini 1.5 Flash, Ollama (configurable)
Vector Database: FAISS (with HNSW)
Embeddings: Hugging Face Sentence Transformers
Tools: DuckDuckGo Search API
Deployment: Docker, Hugging Face Spaces
Evaluation
The system was evaluated on key RAG metrics to ensure reliability and accuracy.
Metric: Score
Faithfulness: 0.95
Context Recall: 0.68
Factual Correctness: 0.52
Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference