Medical Q&A Bot - System Architecture
Visual Overview
┌─────────────────────────────────────────────────────────────────┐
│                         USER INTERFACE                          │
│                                                                 │
│  ┌────────────────────┐          ┌────────────────────┐         │
│  │   Gradio Web UI    │          │  Streamlit Web UI  │         │
│  │      (app.py)      │    OR    │ (app_streamlit.py) │         │
│  │     Port: 7860     │          │     Port: 8501     │         │
│  └──────────┬─────────┘          └──────────┬─────────┘         │
└─────────────┼───────────────────────────────┼───────────────────┘
              │                               │
              └───────────────┬───────────────┘
                              │
                              ▼
              ┌────────────────────────────────┐
              │     Query Processing Layer     │
              │                                │
              │  1. Text Input Validation      │
              │  2. Embedding Generation       │
              │  3. Model Inference            │
              └───────────────┬────────────────┘
                              │
                              ▼
              ┌────────────────────────────────┐
              │       CLASSIFIER MODULE        │
              │         (classifier/)          │
              │                                │
              │  ┌──────────────────────────┐  │
              │  │   SentenceTransformer    │  │
              │  │     Embedding Model      │  │
              │  └────────────┬─────────────┘  │
              │               │                │
              │               ▼                │
              │  ┌──────────────────────────┐  │
              │  │   Classification Head    │  │
              │  │     (Neural Network)     │  │
              │  └────────────┬─────────────┘  │
              └───────────────┼────────────────┘
                              │
                   ┌──────────┴──────────┐
                   │                     │
          ┌────────▼────────┐   ┌────────▼────────┐
          │     MEDICAL     │   │ ADMINISTRATIVE  │
          │      QUERY      │   │      QUERY      │
          └────────┬────────┘   └────────┬────────┘
                   │                     │
                   │                     └──► End (No Retrieval)
                   │
                   ▼
   ┌────────────────────────────────┐
   │        RETRIEVAL MODULE        │
   │          (retriever/)          │
   │                                │
   │  ┌──────────────────────────┐  │
   │  │       BM25 Search        │  │
   │  │    (Sparse Retrieval)    │  │
   │  └────────────┬─────────────┘  │
   │               │                │
   │  ┌────────────▼─────────────┐  │
   │  │       Dense Search       │  │
   │  │   (Vector Similarity)    │  │
   │  └────────────┬─────────────┘  │
   │               │                │
   │  ┌────────────▼─────────────┐  │
   │  │        RRF Fusion        │  │
   │  │    (Rank Combination)    │  │
   │  └────────────┬─────────────┘  │
   │               │                │
   │  ┌────────────▼─────────────┐  │
   │  │    Optional Reranker     │  │
   │  │     (Cross-Encoder)      │  │
   │  └────────────┬─────────────┘  │
   └───────────────┼────────────────┘
                   │
                   ▼
       ┌───────────────────────┐
       │     DATA SOURCES      │
       │                       │
       │  • PubMed Articles    │
       │  • Miriad Q&A         │
       │  • UniDoc Q&A         │
       │                       │
       │    (data/corpora/)    │
       └───────────┬───────────┘
                   │
                   ▼
       ┌───────────────────────┐
       │        RESULTS        │
       │                       │
       │  • Document Title     │
       │  • Text Content       │
       │  • Relevance Scores   │
       │  • Metadata           │
       └───────────┬───────────┘
                   │
                   ▼
       ┌───────────────────────┐
       │      UI DISPLAY       │
       │                       │
       │  • Formatted Cards    │
       │  • JSON View          │
       │  • Score Badges       │
       └───────────────────────┘
Data Flow
1. User Input
User Types Query → Web Interface Captures Input → Sends to Backend
2. Classification Phase
Query Text
    ↓
Sentence Transformer (Embedding)
    ↓
Classification Head (Neural Network)
    ↓
Output: [Medical | Administrative | Other] + Confidence Scores
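The last step of this phase can be read as a softmax over the head's raw scores followed by a routing decision. The sketch below is a minimal, stdlib-only illustration of that idea. The label order, the `route` helper and the example scores are assumptions made for the example, not the project's actual code.

```python
import math

LABELS = ["Medical", "Administrative", "Other"]  # assumed label order


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def route(logits):
    """Return (label, confidences, needs_retrieval) for one query."""
    probs = softmax(logits)
    confidences = dict(zip(LABELS, probs))
    label = max(confidences, key=confidences.get)
    # Only medical queries continue to the retrieval phase.
    return label, confidences, label == "Medical"


# Hypothetical raw scores for a single query.
label, confidences, needs_retrieval = route([2.0, 0.5, -1.0])
print(label, needs_retrieval)
```

Only queries whose top label is Medical move on to retrieval, which matches the "End (No Retrieval)" branch in the overview.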
3. Retrieval Phase (Medical Queries Only)
Medical Query
      ↓
┌────────────────────────┐
│   Parallel Retrieval   │
│  ┌─────────────────┐   │
│  │  BM25 (Sparse)  │   │ → Top 100 docs
│  └─────────────────┘   │
│  ┌─────────────────┐   │
│  │ Dense (Vector)  │   │ → Top 100 docs
│  └─────────────────┘   │
└────────────────────────┘
      ↓
RRF Fusion Algorithm
      ↓
Top K Candidates
      ↓
Optional: Cross-Encoder Reranking
      ↓
Final Top N Results
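The fusion step can be sketched with Reciprocal Rank Fusion, where each document scores the sum of 1 / (k + rank) over the lists it appears in. This is a minimal illustration under assumptions: the constant `k=60` is the value commonly used for RRF, the function name `rrf_fuse` and the document IDs are made up, and the project's own `rrf.py` may differ.

```python
from collections import defaultdict


def rrf_fuse(rankings, k=60, top_k=10):
    """Fuse ranked lists of doc IDs with Reciprocal Rank Fusion.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by fused score (descending), then by doc ID for a stable tie-break.
    ordered = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return ordered[:top_k]


bm25_hits = ["d3", "d1", "d7"]   # hypothetical sparse ranking
dense_hits = ["d1", "d9", "d3"]  # hypothetical dense ranking
fused = rrf_fuse([bm25_hits, dense_hits], top_k=3)
print([doc_id for doc_id, _ in fused])
```

Documents that rank well in both lists rise to the top, and no score normalization between BM25 and dense similarity is needed because only ranks are used.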
Technology Stack
Frontend
- Gradio - Primary UI framework
- Streamlit - Alternative UI framework
- HTML/CSS - Custom styling
- JavaScript - Auto-generated by frameworks
Backend
- Python 3.8+ - Core language
- PyTorch - Deep learning framework
- Sentence-Transformers - Embedding models
- scikit-learn - ML utilities
Search & Retrieval
- Rank-BM25 - Sparse retrieval
- FAISS - Dense vector search
- Custom RRF - Rank fusion
- Cross-Encoder - Optional reranking
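To make the sparse retrieval entry concrete, here is a small stdlib-only sketch of BM25 scoring. It is not the Rank-BM25 library's implementation: the IDF variant `log(1 + (N - df + 0.5) / (df + 0.5))`, the parameter defaults and the toy corpus are all assumptions chosen for illustration.

```python
import math
from collections import Counter


def bm25_scores(query_tokens, corpus_tokens, k1=1.5, b=0.75):
    """Score every tokenized document against a tokenized query with BM25."""
    n_docs = len(corpus_tokens)
    avg_len = sum(len(doc) for doc in corpus_tokens) / n_docs
    # Document frequency: in how many documents each term occurs.
    doc_freq = Counter(term for doc in corpus_tokens for term in set(doc))
    scores = []
    for doc in corpus_tokens:
        term_freq = Counter(doc)
        score = 0.0
        for term in query_tokens:
            tf = term_freq.get(term, 0)
            if tf == 0:
                continue
            df = doc_freq[term]
            idf = math.log(1 + (n_docs - df + 0.5) / (df + 0.5))
            # Length normalization: longer documents are penalized via b.
            norm = tf + k1 * (1 - b + b * len(doc) / avg_len)
            score += idf * tf * (k1 + 1) / norm
        scores.append(score)
    return scores


# Toy corpus; real documents would come from data/corpora/.
corpus = [
    "aspirin reduces fever and pain".split(),
    "clinic opening hours and parking".split(),
    "fever in children dosage of aspirin".split(),
]
scores = bm25_scores("aspirin fever".split(), corpus)
print(scores)
```

Documents sharing no terms with the query score zero, which is why a dense retriever is paired with BM25 to catch paraphrased queries.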
Data
- PubMed - Medical research articles
- Miriad - Medical Q&A database
- UniDoc - Unified document corpus
- JSONL - Data storage format
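Since the corpora are stored as JSONL (one JSON object per line), a reader can be sketched as below. The field names `id`, `title` and `text` are hypothetical; the real corpus schema is not shown in this document.

```python
import io
import json


def read_jsonl(stream):
    """Yield one dict per non-empty line of a JSONL stream."""
    for line_number, line in enumerate(stream, start=1):
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between records
        try:
            yield json.loads(line)
        except json.JSONDecodeError as exc:
            raise ValueError(f"Invalid JSON on line {line_number}") from exc


# Hypothetical records; an in-memory stream stands in for a corpus file.
sample = io.StringIO(
    '{"id": "doc-1", "title": "Fever", "text": "Fever is ..."}\n'
    "\n"
    '{"id": "doc-2", "title": "Aspirin", "text": "Aspirin is ..."}\n'
)
docs = list(read_jsonl(sample))
print(len(docs))
```

For a file on disk, the same function works with `open(path, encoding="utf-8")` in place of the in-memory stream.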
Component Interactions
1. Initialization
# Load models once at startup
embedding_model, classifier = classifier_init()
2. Classification
classification = predict_query(
    text=[query],
    embedding_model=embedding_model,
    classifier_head=classifier
)
3. Retrieval
hits = get_candidates(
    query=query,
    k_retrieve=10,
    use_reranker=False
)
4. Display
# Gradio displays results in tabs
# - Formatted HTML view
# - Raw JSON view
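The four steps above can be wired together roughly as follows. This is a sketch, not the code in `app.py`: `classify` and `retrieve` are stand-in callables for `predict_query` and `get_candidates`, whose real return types are not shown here, and the result dictionary layout is an assumption.

```python
def answer_query(query, classify, retrieve, k_retrieve=10):
    """Classify a query and retrieve documents only for medical queries."""
    label, confidence = classify(query)
    hits = retrieve(query, k_retrieve) if label == "Medical" else []
    return {"query": query, "label": label, "confidence": confidence, "hits": hits}


# Stand-in callables so the sketch runs without any models loaded.
def fake_classify(query):
    if "fever" in query.lower():
        return ("Medical", 0.9)
    return ("Administrative", 0.8)


def fake_retrieve(query, k):
    return [{"title": "Fever overview", "score": 1.0}][:k]


medical = answer_query("What causes fever?", fake_classify, fake_retrieve)
admin = answer_query("When does the clinic open?", fake_classify, fake_retrieve)
print(medical["label"], len(medical["hits"]))
print(admin["label"], len(admin["hits"]))
```

Passing the classifier and retriever in as arguments keeps the flow testable without loading the embedding model.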
Performance Characteristics
Speed
- Classification: ~100-500ms
- BM25 Search: ~50-200ms
- Dense Search: ~100-300ms
- Reranking: ~500-2000ms (if enabled)
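Figures like these vary with hardware and corpus size, so it can help to measure each stage directly. The helper below is a hypothetical sketch using only the standard library; the stage names and placeholder workloads are assumptions.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(stage, timings):
    """Record the wall-clock duration of a block, in milliseconds."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[stage] = (time.perf_counter() - start) * 1000.0


timings = {}
with timed("classification", timings):
    sum(i * i for i in range(10_000))  # placeholder for the real stage
with timed("bm25", timings):
    sorted(range(10_000), reverse=True)  # placeholder for the real stage

for stage, elapsed_ms in timings.items():
    print(f"{stage}: {elapsed_ms:.2f} ms")
```

Wrapping the real classification and retrieval calls in `timed(...)` gives per-stage numbers that can be compared against the ranges listed above.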
Accuracy
- Classification: ~95% accuracy
- Retrieval: Depends on corpus and query
- Reranking: +5-10% improvement
Resource Usage
- Memory: ~2-4 GB (with models loaded)
- CPU: Moderate during inference
- GPU: Optional (speeds up inference)
Scalability Considerations
Current Setup (Single User)
- Perfect for demos and development
- Low latency
- Easy to debug
Future Scaling Options
- Add caching for common queries
- Deploy on cloud with autoscaling
- Use model quantization for faster inference
- Implement request queuing
- Add load balancing
Security & Privacy
Current Implementation
- Local hosting only
- No data persistence
- No user tracking
- No authentication by default (can be enabled optionally)
Production Considerations
- Add user authentication
- Implement rate limiting
- Sanitize inputs
- Log access for auditing
- HTTPS for encrypted communication
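Two of these items, input sanitization and rate limiting, can be sketched with the standard library alone. The length cap, the token-bucket parameters and both helper names are assumptions for illustration. A production deployment would more likely rely on a gateway or framework feature.

```python
import time
import unicodedata

MAX_QUERY_CHARS = 512  # assumed limit, not taken from the project


def sanitize_query(raw):
    """Strip control characters, collapse whitespace, and cap the length."""
    cleaned = "".join(
        ch if unicodedata.category(ch)[0] != "C" else " " for ch in raw
    )
    cleaned = " ".join(cleaned.split())
    if not cleaned:
        raise ValueError("Query is empty")
    return cleaned[:MAX_QUERY_CHARS]


class TokenBucket:
    """Allow bursts of up to `capacity` requests, refilled at `rate` per second."""

    def __init__(self, capacity, rate, clock=time.monotonic):
        self.capacity = capacity
        self.rate = rate
        self.clock = clock
        self.tokens = float(capacity)
        self.updated = clock()

    def allow(self):
        now = self.clock()
        elapsed = now - self.updated
        # Refill in proportion to elapsed time, never above capacity.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


limiter = TokenBucket(capacity=5, rate=1.0)
if limiter.allow():
    query = sanitize_query("  What causes\tfever?\n")
    print(query)
```

The injectable `clock` argument exists so the limiter can be tested deterministically without sleeping.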
Monitoring & Debugging
Available Information
- Query classification results
- Confidence scores per category
- Retrieval scores (BM25, Dense, RRF)
- Document metadata
- Error messages
Debug Mode
# In app.py, set:
demo.launch(show_error=True) # Shows detailed errors
Deployment Options
1. Local (Current)
Pros: Easy, fast, secure
Cons: Single user, not accessible remotely
2. Hugging Face Spaces
Pros: Free, easy deploy, public URL
Cons: Limited resources, public access
3. Cloud (AWS/GCP/Azure)
Pros: Scalable, private, customizable
Cons: Costs money, requires setup
4. Docker Container
Pros: Portable, consistent environment
Cons: Requires Docker knowledge
File Structure
health-query-classifier/
├── UI Layer
│   ├── app.py                    # Main Gradio UI
│   ├── app_streamlit.py          # Alternative Streamlit UI
│   ├── launch_ui.bat             # Windows launcher
│   └── launch_ui.ps1             # PowerShell launcher
│
├── Classifier Layer
│   └── classifier/
│       ├── infer.py              # Inference logic
│       ├── head.py               # Classification head
│       ├── train.py              # Training script
│       └── utils.py              # Utilities
│
├── Retrieval Layer
│   └── retriever/
│       ├── search.py             # Search interface
│       ├── index_bm25.py         # BM25 indexing
│       ├── index_dense.py        # Dense indexing
│       └── rrf.py                # Rank fusion
│
├── Team Layer
│   └── team/
│       ├── candidates.py         # Candidate retrieval
│       └── interfaces.py         # Data interfaces
│
├── Data Layer
│   └── data/
│       └── corpora/              # Corpus files
│           ├── medical_qa.jsonl
│           ├── miriad_text.jsonl
│           └── unidoc_qa.jsonl
│
└── Documentation
    ├── README.md                 # Main documentation
    ├── QUICKSTART.md             # Quick start guide
    ├── UI_README.md              # UI documentation
    ├── UI_IMPLEMENTATION.md      # Implementation details
    └── ARCHITECTURE.md           # This file
This architecture aims to provide:
- Clean separation of concerns
- Modular design
- Easy testing and debugging
- Scalability and maintainability
- Thorough documentation