Summary: Available UI Options for Medical Q&A Bot
Three Versions Created
1. app_demo.py (RECOMMENDED FOR DEMOS)
Port: 7863
Speed: Instant (<1 second)
Features:
- Real-time classification (medical vs administrative)
- Confidence scores with visualization
- Action recommendations
- Uses your group's trained models
- No document retrieval (for speed)
Best for:
- Class presentations
- Quick demonstrations
- Testing classification accuracy
- When time is limited
Run: python app_demo.py
2. app_full.py (COMPLETE SYSTEM)
Port: 7864
Speed: 6-10 minutes for the first query, 2-5 seconds for subsequent queries
Features:
- Real-time classification
- Full document retrieval from PubMed & Miriad
- BM25 + Dense search + RRF fusion
- Optional cross-encoder reranking
- Very slow first initialization
Best for:
- Showing full system capabilities
- When you have 10+ minutes to wait
- Detailed technical demonstrations
- Proving retrieval works
Run: python app_full.py
WARNING: The first medical query takes 6-10 minutes because the app:
- Loads 200 MB or more of medical corpus data
- Builds BM25 keyword index
- Generates embeddings for ALL documents (this is the slow part)
- Builds FAISS vector index
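The behavior described above (nothing is built until the first medical query arrives) is a lazy-initialization pattern. The following is a minimal sketch of that pattern, not the code in app_full.py: `LazyRetriever` and `fake_build` are hypothetical names, and the real build step (corpus load, BM25, embeddings, FAISS) is replaced by a short sleep. The lock is an assumption that matters if two users send the first query at the same time.

```python
import threading
import time


class LazyRetriever:
    """Builds expensive indexes once, on the first call that needs them."""

    def __init__(self, build_fn):
        self._build_fn = build_fn      # hypothetical: loads corpus, builds indexes
        self._index = None
        self._lock = threading.Lock()
        self.build_seconds = None      # how long the one-time build took

    def get(self):
        # Double-checked locking: concurrent first queries trigger one build.
        if self._index is None:
            with self._lock:
                if self._index is None:
                    start = time.perf_counter()
                    self._index = self._build_fn()
                    self.build_seconds = time.perf_counter() - start
        return self._index


def fake_build():
    # Stand-in for the real work (corpus load, BM25, embeddings, FAISS).
    time.sleep(0.05)
    return {"docs": ["doc-a", "doc-b"]}


retriever = LazyRetriever(fake_build)
first = retriever.get()    # pays the build cost
second = retriever.get()   # returns the already-built object
```

Only the first `get()` call is slow, which is why making one query before the presentation is enough to warm the system up.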
3. app.py / app_safe.py / app_lightweight.py (EXPERIMENTAL)
These were intermediate versions created while troubleshooting. Not recommended for use.
Recommendation for Your Group Presentation
Strategy 1: Fast Demo (5 minutes)
Use app_demo.py only:
- Show classification working instantly
- Test medical vs administrative queries
- Highlight confidence scores
- Explain that retrieval is available but disabled for demo speed
- Show the codebase that supports retrieval (team/candidates.py)
Advantage: Reliable, professional, no waiting
Strategy 2: Split Demo (15+ minutes)
Use BOTH versions:
Part 1: Use app_demo.py for quick classification demos (5 min)
- Show multiple queries rapidly
- Demonstrate accuracy
Part 2: Switch to app_full.py that you pre-initialized (10 min)
- Before the presentation: run app_full.py and make ONE medical query to initialize it
- Wait up to 10 minutes for it to build the indexes
- Keep it running
- During the presentation: show actual document retrieval responding in 2-5 seconds
Advantage: Shows both speed AND capabilities
Strategy 3: Video Backup
- Use app_demo.py for the live demo
- Record a video of app_full.py working with retrieval
- Show the video during the presentation if needed
Technical Details to Mention
Your Group's Implementation:
- Classification Model: Fine-tuned sentence-transformers model (embeddinggemma-300m-medical)
- Hybrid Retrieval: BM25 (sparse) + Dense embeddings (semantic)
- Fusion Algorithm: Reciprocal Rank Fusion (RRF)
- Data Sources: PubMed Medical Q&A + Miriad corpus
- Optional Enhancement: Cross-encoder reranker for accuracy
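If someone asks how Reciprocal Rank Fusion combines the BM25 and dense rankings, the idea fits in a few lines. This is a minimal sketch, not the code in team/candidates.py: each document scores the sum of 1 / (k + rank) over the lists it appears in. The function name and document IDs are illustrative, and k = 60 is the constant commonly used for RRF; the value your group uses is an assumption here.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse ranked lists of document IDs; higher fused score ranks first."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by descending score; break ties by ID so the order is deterministic.
    return sorted(scores, key=lambda d: (-scores[d], d))


bm25_ranking = ["d3", "d1", "d2"]    # hypothetical keyword-search results
dense_ranking = ["d1", "d4", "d3"]   # hypothetical embedding-search results
fused = reciprocal_rank_fusion([bm25_ranking, dense_ranking])
print(fused)  # ['d1', 'd3', 'd4', 'd2']
```

Because d1 ranks near the top of both lists, it beats d3, which is first in one list but third in the other. RRF uses only ranks, so BM25 scores and cosine similarities never need to be put on the same scale.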
Why Retrieval is Slow:
- Real ML systems need to index large datasets
- Your corpus has thousands of medical documents
- CPU-only inference (no GPU acceleration available)
- This is a REAL implementation, not a toy demo
Production Solutions:
- Pre-build and save indexes (don't rebuild each time)
- Use GPU for faster embedding
- Implement caching
- Deploy on cloud with more resources
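The first production suggestion (build once, save, reload) can be sketched with the standard library alone. This is a sketch under stated assumptions, not part of the current apps: `load_or_build`, `corpus_fingerprint`, and `toy_build` are hypothetical names, and a tiny inverted index stands in for the real BM25 and embedding indexes.

```python
import hashlib
import pickle
import tempfile
from pathlib import Path


def corpus_fingerprint(docs):
    """Hash the corpus so a changed corpus invalidates the cached index."""
    h = hashlib.sha256()
    for doc in docs:
        h.update(doc.encode("utf-8"))
        h.update(b"\0")
    return h.hexdigest()[:16]


def load_or_build(docs, build_fn, cache_dir):
    """Return (index, was_cached); build and save only on a cache miss."""
    cache_dir = Path(cache_dir)
    cache_dir.mkdir(parents=True, exist_ok=True)
    path = cache_dir / f"index-{corpus_fingerprint(docs)}.pkl"
    if path.exists():
        with path.open("rb") as f:
            return pickle.load(f), True
    index = build_fn(docs)
    with path.open("wb") as f:
        pickle.dump(index, f)
    return index, False


def toy_build(docs):
    # Stand-in for the slow step: a tiny inverted index of term -> doc positions.
    index = {}
    for i, doc in enumerate(docs):
        for term in set(doc.lower().split()):
            index.setdefault(term, []).append(i)
    return index


docs = ["aspirin reduces fever", "billing office hours"]
cache_dir = tempfile.mkdtemp()
index1, cached1 = load_or_build(docs, toy_build, cache_dir)  # builds and saves
index2, cached2 = load_or_build(docs, toy_build, cache_dir)  # loads from disk
```

Only unpickle files you created yourself. For the vector index, FAISS provides its own `faiss.write_index` and `faiss.read_index` functions, which would be the more natural choice than pickle.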
Demo Script Suggestions
Opening (30 seconds):
"We built an AI system that automatically classifies patient queries and retrieves relevant medical research. Let me show you how it works..."
Classification Demo (2-3 minutes):
"First, our classification system determines if a query is medical or administrative..." [Use app_demo.py, try 3-4 different queries]
Technical Explanation (2 minutes):
"Under the hood, we use:
- A fine-tuned 300-million-parameter transformer model
- Hybrid search combining keyword matching and semantic similarity
- Reciprocal Rank Fusion to combine results
- Medical corpora from PubMed and Miriad databases"
Show Retrieval (Optional, if pre-initialized):
"Now let me show you actual document retrieval..." [Use app_full.py if you pre-initialized it]
Closing (30 seconds):
"This demonstrates how AI can improve healthcare triage, reduce response times, and provide evidence-based information to both patients and providers."
Quick Start Commands
# For demos and presentations
python app_demo.py
# Access at: http://127.0.0.1:7863
# For full system (allow up to 10 minutes after the first medical query)
python app_full.py
# Access at: http://127.0.0.1:7864
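Before presenting, it can help to confirm that both apps are actually listening on their ports. The script below is a small optional helper, not part of the project; `is_listening` and `report` are hypothetical names, and the ports are the ones listed above.

```python
import socket


def is_listening(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def report(apps, host="127.0.0.1"):
    """Map each app name to 'up' or 'down' based on its port."""
    return {name: ("up" if is_listening(host, port) else "down")
            for name, port in apps.items()}


if __name__ == "__main__":
    print(report({"app_demo.py": 7863, "app_full.py": 7864}))
```

An "up" result only means the server is accepting connections. It does not show that app_full.py has finished building its indexes, so still make one medical query ahead of time.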
What You Successfully Built
- Working web UI with professional design
- Real-time classification using your trained model
- Full retrieval system integrated
- Two versions: fast demo + complete system
- Comprehensive documentation
- Example queries
- Clear visualization of results
You have everything you need for a successful presentation!
Final Recommendation
For your presentation, use app_demo.py.
It shows your ML work instantly and professionally. You can explain:
- "The classification happens in real-time"
- "The full system includes retrieval which we can show separately"
- "This demonstrates the core AI capability"
If anyone asks about retrieval, you can:
- Show the code in team/candidates.py
- Explain the hybrid search architecture
- Mention that it's fully implemented but slow due to index building
This is the smart approach for a live demo!