Spaces:
Sleeping
Sleeping
A newer version of the Streamlit SDK is available:
1.55.0
metadata
title: Bioethics RAG
emoji: 🧬
colorFrom: pink
colorTo: purple
sdk: streamlit
sdk_version: 1.49.1
app_file: streamlit_app.py
pinned: false
Bioethics AI Assistant
A retrieval-augmented generation (RAG) system that provides intelligent answers to bioethics questions by searching through academic papers and generating contextually-aware responses.
Features
- Semantic Search: FAISS vector store with OpenAI embeddings for accurate document retrieval
- Confidence-based Citations: Automatic citation generation with confidence levels based on similarity scores
- Streaming Responses: Real-time answer generation with interactive UI
- Document Processing: Automated PDF text extraction, cleaning, and chunking
- Conversation History: Context-aware responses that consider previous exchanges
Architecture
- Workflow: User Query → Embedding → Vector Search → Context Assembly → LLM → Cited Response
- Document Processing: PyMuPDF and PyPDF2 for text extraction and metadata parsing
- Vector Store: FAISS with cosine similarity search on normalized embeddings
- Language Model: OpenAI GPT-4o-mini with streaming support
- Frontend: Streamlit with custom CSS for chat interface
Technical Implementation
Document Processing Pipeline
- PDF text extraction with page markers and metadata inference
- Text cleaning (whitespace normalization, header/footer removal)
- Semantic chunking with configurable overlap for context preservation
- Automated metadata extraction (title, authors, publication year)
Retrieval System
- 3072-dimensional embeddings using OpenAI's
text-embedding-3-large - L2-normalized vectors for cosine similarity computation
- Confidence thresholds for citation reliability (high: 0.8+, medium: 0.65+, low: 0.5+)
- Thread-safe operations with persistent storage
Response Generation
- Context assembly from retrieved chunks with confidence-based grouping
- Conversation history integration (last 4 exchanges)
- Citation formatting based on similarity confidence levels
- Streaming response delivery with real-time UI updates
Installation
git clone <repository-url>
cd bioethics-rag
pip install -r requirements.txt
echo "OPENAI_API_KEY=your_key_here" > .env
streamlit run streamlit_app.py
Built for demonstration of RAG system design, vector search implementation, and conversational AI interfaces.