---
title: Bioethics RAG
emoji: 🧬
colorFrom: pink
colorTo: purple
sdk: streamlit
sdk_version: 1.49.1
app_file: streamlit_app.py
pinned: false
---

# Bioethics AI Assistant

A retrieval-augmented generation (RAG) system that provides intelligent answers to bioethics questions by searching through academic papers and generating contextually-aware responses.

## Features

- **Semantic Search**: FAISS vector store with OpenAI embeddings for accurate document retrieval
- **Confidence-based Citations**: Automatic citation generation with confidence levels based on similarity scores
- **Streaming Responses**: Real-time answer generation with interactive UI
- **Document Processing**: Automated PDF text extraction, cleaning, and chunking
- **Conversation History**: Context-aware responses that consider previous exchanges

## Architecture

- **Workflow**: User Query → Embedding → Vector Search → Context Assembly → LLM → Cited Response
- **Document Processing**: PyMuPDF and PyPDF2 for text extraction and metadata parsing
- **Vector Store**: FAISS with cosine similarity search on normalized embeddings
- **Language Model**: OpenAI GPT-4o-mini with streaming support
- **Frontend**: Streamlit with custom CSS for chat interface

## Technical Implementation

### Document Processing Pipeline

- PDF text extraction with page markers and metadata inference
- Text cleaning (whitespace normalization, header/footer removal)
- Semantic chunking with configurable overlap for context preservation
- Automated metadata extraction (title, authors, publication year)

### Retrieval System

- 3072-dimensional embeddings using OpenAI's `text-embedding-3-large`
- L2-normalized vectors for cosine similarity computation
- Confidence thresholds for citation reliability (high: 0.8+, medium: 0.65+, low: 0.5+)
- Thread-safe operations with persistent storage

### Response Generation

- Context assembly from retrieved chunks with confidence-based grouping
- Conversation history integration (last 4 exchanges)
- Citation formatting based on similarity confidence levels
- Streaming response delivery with real-time UI updates

## Installation

```bash
git clone
cd bioethics-rag
pip install -r requirements.txt
echo "OPENAI_API_KEY=your_key_here" > .env
streamlit run streamlit_app.py
```

Built for demonstration of RAG system design, vector search implementation, and conversational AI interfaces.
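
As a rough illustration of the Retrieval System notes, the sketch below shows how L2-normalized vectors turn cosine similarity into a plain inner product, and how a score could be mapped to the documented confidence levels (high: 0.8+, medium: 0.65+, low: 0.5+). It is a minimal, dependency-free sketch and not the project's actual code: the function names (`l2_normalize`, `cosine_similarity`, `confidence_level`, `rank_chunks`) are hypothetical, the toy 3-dimensional vectors stand in for real 3072-dimensional embeddings, and dropping chunks that score below 0.5 is an assumption about how the lowest threshold is used.

```python
import math

# Thresholds taken from the "Retrieval System" notes in this README.
CONFIDENCE_THRESHOLDS = [("high", 0.8), ("medium", 0.65), ("low", 0.5)]


def l2_normalize(vec):
    """Scale a vector to unit length; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        return list(vec)
    return [x / norm for x in vec]


def cosine_similarity(a, b):
    """Cosine similarity computed as the inner product of unit vectors."""
    a_unit, b_unit = l2_normalize(a), l2_normalize(b)
    return sum(x * y for x, y in zip(a_unit, b_unit))


def confidence_level(score):
    """Map a similarity score to a label, or None below the lowest threshold."""
    for label, minimum in CONFIDENCE_THRESHOLDS:
        if score >= minimum:
            return label
    return None


def rank_chunks(query_vec, chunks):
    """Score every chunk and return (chunk_id, score, label) tuples,
    best first. Chunks below the 'low' threshold are dropped (assumption)."""
    scored = []
    for chunk_id, vec in chunks.items():
        score = cosine_similarity(query_vec, vec)
        label = confidence_level(score)
        if label is not None:
            scored.append((chunk_id, score, label))
    return sorted(scored, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Toy vectors; real embeddings would come from the embedding model.
    query = [1.0, 0.0, 0.0]
    chunks = {
        "chunk_a": [2.0, 0.0, 0.0],  # same direction as the query
        "chunk_b": [1.0, 1.0, 0.0],  # 45 degrees away from the query
        "chunk_c": [0.0, 1.0, 0.0],  # orthogonal to the query
    }
    for chunk_id, score, label in rank_chunks(query, chunks):
        print(f"{chunk_id}: {score:.3f} ({label})")
```

With these toy inputs, `chunk_a` scores 1.0 (high), `chunk_b` scores about 0.707 (medium), and `chunk_c` scores 0.0 and is filtered out. In a FAISS-backed setup the same effect is usually achieved by normalizing embeddings before adding them to an inner-product index, so the scores returned by the search are already cosine similarities.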