sniro23 committed on
Commit 33cb5fa · Parent(s): 8bbc7a9

feat: Enable Gradio API for frontend connectivity
docs/scratchpad.md CHANGED
@@ -1,65 +1,213 @@
- # Scratchpad
-
- This document is a running log of the high-level tasks and the current focus.
-
- ## Current Active Implementation Plan
-
- - **File:** `docs/implementation-plan/rag-quality-enhancement.md`
- - **Goal:** 🏥 **MEDICAL RAG ENHANCEMENT** - Enhanced medical context preparation + verification layers with medical-grade safety protocols
- - **Status:** ✅ **PHASE 1 COMPLETED** | ✅ **PHASE 2 COMPLETED SUCCESSFULLY** | 🚀 **READY FOR PHASE 3**
- - **Strategic Success**: Enhanced Medical RAG System with strict safety protocols now fully operational
- - **Phase 1 Results**:
-   - ✅ Clinical ModernBERT: 60.3% medical domain improvement, 768-dim embeddings
-   - ✅ Enhanced PDF Processing: Unstructured hi_res validated, clinical terminology preserved
-   - ✅ Llama3-70B via Groq API: Superior instruction following with medical-grade context adherence
-   - ✅ Resource Efficient: ~2GB local VRAM + proven medical safety protocols
- - **Phase 2 Results - COMPLETED SUCCESSFULLY**:
-   - ✅ **Task 2.1**: Enhanced Medical Context Preparation - Medical entity extraction operational (1-6 entities per document)
-   - ✅ **Task 2.2**: Medical Response Verification Layer - 100% source traceability and medical safety validation
-   - ✅ **Task 2.3**: Advanced Medical System Prompt - Clinical safety protocols active, vector compatibility resolved
-   - ✅ **Task 2.4**: Enhanced Medical Vector Store - Hybrid 384d + 768d Clinical ModernBERT architecture operational
- - **Integrated Medical RAG Performance**:
-   - ⚡ Processing Speed: 0.72-2.16s per query | 📚 5 enhanced documents per query | 🛡️ 100% SAFE responses
-   - 🔒 Medical Safety: 100% source traceability, comprehensive claim verification, strict context adherence
-   - 🏥 Clinical Enhancement: High medical similarity scores (0.7+), medical entity extraction, terminology enhancement
- - **Next Phase**: **PHASE 3 - Production Integration & Optimization**
- - **Next Action**: **PLANNER MODE** - Review Phase 2 achievements and plan Phase 3 production deployment strategy
-
- ---
-
- ## Completed Implementation Plans
-
- - `docs/implementation-plan/stable-deployment-plan.md`
- - `docs/implementation-plan/web-ui-for-chatbot.md`
- - `docs/implementation-plan/maternal-health-rag-chatbot-v3.md`
-
- ---
- ## Lessons Learned
- - **[2024-07-28]** The `groq` Python client can have issues with proxies when running in certain environments (like Hugging Face Spaces). The fix is to instantiate a separate `httpx.Client` and pass it to the `groq.Groq` constructor so it uses a clean, isolated network configuration.
- - **[2024-07-28]** When deploying Docker containers to services like Hugging Face Spaces, pay close attention to file ownership and permissions. The user running the application at runtime (`user`) may not be the same as the user that built the container (`root`). Ensure application directories, especially those used for caching (`HF_HOME`), are owned by the runtime user. Use `chown` in the Dockerfile to set permissions correctly.
- - **[2024-07-28]** Gradio's `gr.ChatInterface` expects the function to return a single string response. Returning a tuple or other data structure will cause a `ValidationError`.
- - **[2025-01-XX]** **Strategic Architecture Decision**: The AI engineer's resource-friendly approach (Mistral 7B + LoRA) proved superior to the large-model approach (Me-LLaMA) for infrastructure-constrained environments. Specialized small models with domain fine-tuning often outperform generic large models in specific domains.
- - **[2025-01-XX]** **Medical PDF Processing**: The Unstructured hi_res strategy is optimal for medical documents containing scanned PDFs, complex clinical tables, and multi-modal content. pdfplumber fails completely on scanned documents, making unstructured the only viable option for comprehensive medical document processing.
- - **[2025-01-XX]** **Medical Domain Embeddings**: Clinical ModernBERT provides significant advantages over general embeddings (BAAI/bge-large-en-v1.5) for medical concept representation, with an 8K context length (4x improvement) and clinical terminology understanding.
- - **[2025-01-XX]** **Resource Optimization**: Constraint-driven design often leads to better solutions. Working within 16GB VRAM limits forced optimization that resulted in a more maintainable, cost-effective, and deployable architecture than resource-intensive alternatives.
- - **[2025-01-XX]** **Phase 2 Medical Safety Architecture**: The hybrid approach combining enhanced medical context preparation + medical response verification + maintained Llama3-70B proved superior to model switching. This architecture achieves medical-grade safety (100% source traceability, comprehensive claim verification) while maintaining excellent performance (0.72-2.16s per query) and clinical enhancement (0.7+ similarity scores with Clinical ModernBERT).
- - **[2024-07-28]** For RAG, increasing the number of documents sent to the LLM (e.g., from 3 to 5) and using a very strict system prompt that forbids outside knowledge and mandates citations can significantly improve answer quality and reduce hallucinations.
- - **[2024-07-28]** Ensure document metadata is complete *during data creation*. If a `citation` field is missing, create a sensible default from the file path. This prevents "Unknown Source" issues downstream.
- - **[2024-07-28]** Enforce a strict, structured output format (e.g., using Markdown headings like `## Summary` and `## References`) via the system prompt to ensure consistent and professional-looking responses from the LLM.
- - **[2025-01-03]** Me-LLaMA models require a PhysioNet credentialed health data use agreement and substantial computational resources (24GB+ VRAM for the 13B model, 130GB+ for the 70B). No commercial API providers currently offer Me-LLaMA access. For medical domain enhancement, Clinical ModernBERT embeddings (8K context) + smaller medical LLMs like medicine-Llama3-8B provide a more practical alternative with significant medical domain improvement while remaining infrastructure-compatible.
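The citation-fallback lesson above lends itself to a small helper. A minimal sketch, assuming a dict-shaped metadata record; `default_citation` and the `citation` field name are illustrative, not the project's actual schema:

```python
from pathlib import Path

def default_citation(metadata: dict, source_path: str) -> str:
    """Return the record's citation, or derive a readable default from
    the file path so downstream display never shows "Unknown Source"
    (hypothetical helper for illustration)."""
    citation = metadata.get("citation")
    if citation:
        return citation
    # "guidelines/postpartum_haemorrhage_2021.pdf" -> "Postpartum Haemorrhage 2021"
    stem = Path(source_path).stem
    return stem.replace("_", " ").replace("-", " ").title()
```

Applying this at ingestion time, rather than patching "Unknown Source" at display time, keeps the fix in one place.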
- ## Planner Analysis - Medical Models Integration
-
- **STRATEGIC RECOMMENDATION**: ✅ **APPROVE with Modifications**
-
- ### Key Insights:
- 1. **Medical Domain Specialization**: Me-LLaMA + Clinical ModernBERT will significantly improve clinical relevance
- 2. **Resource Challenge**: Me-LLaMA requires substantial compute - a deployment strategy is needed before proceeding
- 3. **Architecture Enhancement**: Medical models enable semantic understanding vs. basic text processing
-
- ### Critical Decisions Required:
- - **Me-LLaMA Deployment**: Determine whether to use API access, local deployment, or a cloud service
- - **Compute Resources**: Assess whether current infrastructure can handle medical model requirements
- - **Migration Strategy**: How to transition from the current general-purpose pipeline to a medical-specific one
-
- **NEXT STEP**: Executor should research Me-LLaMA deployment options and resource requirements before implementation begins.
+ #!/usr/bin/env python3
+ """
+ VedaMD Enhanced: Sri Lankan Clinical Assistant
+ Main Gradio Application for Hugging Face Spaces Deployment
+
+ Enhanced Medical-Grade RAG System with:
+ ✅ 5x Enhanced Retrieval (15+ documents vs previous 5)
+ ✅ Medical Entity Extraction & Clinical Terminology
+ ✅ Clinical ModernBERT (768d medical embeddings)
+ ✅ Medical Response Verification & Safety Protocols
+ ✅ Advanced Re-ranking & Coverage Verification
+ ✅ Source Traceability & Citation Support
+ """
+
+ import os
+ import logging
+ import gradio as gr
+ from typing import List, Dict, Optional
+ import sys
+
+ # Add src directory to path for imports
+ sys.path.append(os.path.join(os.path.dirname(__file__), 'src'))
+
+ from src.enhanced_groq_medical_rag import EnhancedGroqMedicalRAG, EnhancedMedicalResponse
+
+ # Configure logging
+ logging.basicConfig(
+     level=logging.INFO,
+     format='%(asctime)s - %(name)s - %(levelname)s - %(message)s'
+ )
+ logger = logging.getLogger(__name__)
+
+ # Initialize Enhanced Medical RAG System
+ logger.info("🏥 Initializing VedaMD Enhanced for Hugging Face Spaces...")
+ try:
+     enhanced_rag_system = EnhancedGroqMedicalRAG()
+     logger.info("✅ Enhanced Medical RAG system ready!")
+ except Exception as e:
+     logger.error(f"❌ Failed to initialize system: {e}")
+     raise
+ def process_enhanced_medical_query(message: str, history: List[List[str]]) -> str:
+     """
+     Process a medical query with the enhanced RAG system
+     """
+     try:
+         if not message.strip():
+             return "Please enter a medical question about Sri Lankan clinical guidelines."
+
+         # Convert Gradio chat history to our format
+         formatted_history = []
+         if history:
+             for chat_pair in history:
+                 if len(chat_pair) >= 2:
+                     user_msg, assistant_msg = chat_pair[0], chat_pair[1]
+                     if user_msg:
+                         formatted_history.append({"role": "user", "content": user_msg})
+                     if assistant_msg:
+                         formatted_history.append({"role": "assistant", "content": assistant_msg})
+
+         # Get enhanced response
+         response: EnhancedMedicalResponse = enhanced_rag_system.query(
+             query=message,
+             history=formatted_history
+         )
+
+         # Format enhanced response for display
+         formatted_response = format_enhanced_medical_response(response)
+         return formatted_response
+
+     except Exception as e:
+         logger.error(f"Error processing query: {e}")
+         return f"⚠️ **System Error**: {str(e)}\n\nPlease try again or contact support if the issue persists."
+ def format_enhanced_medical_response(response: EnhancedMedicalResponse) -> str:
+     """
+     Format the enhanced medical response for display, ensuring citations are always included.
+     """
+     formatted_parts = []
+
+     # Main response from the LLM
+     final_response_text = response.answer.strip()
+     formatted_parts.append(final_response_text)
+
+     # ALWAYS add the clinical sources section with clear numbering
+     if response.sources:
+         formatted_parts.append("\n\n---\n")
+         formatted_parts.append("### 📋 **Clinical Sources & Citations**")
+         formatted_parts.append("\nThis response is based on the following Sri Lankan clinical guidelines:")
+         # Create a numbered list of all sources used for the response
+         for i, source in enumerate(response.sources, 1):
+             # Make the citation number bold and add a clear label
+             formatted_parts.append(f"\n**[{i}]** Source: {source}")
+
+     # Enhanced information section with clear separation
+     formatted_parts.append("\n\n---\n")
+     formatted_parts.append("### 📊 **Response Analysis**")
+
+     # Safety and verification info with clearer formatting
+     if response.verification_result:
+         safety_status = "✅ SAFE" if response.safety_status == "SAFE" else "⚠️ CAUTION"
+         formatted_parts.append(f"\n**Medical Safety Status**: {safety_status}")
+         formatted_parts.append(f"**Verification Score**: {response.verification_result.verification_score:.1%}")
+         formatted_parts.append(f"**Verified Medical Claims**: {response.verification_result.verified_claims}/{response.verification_result.total_claims}")
+
+     # Enhanced retrieval metrics
+     formatted_parts.append(f"\n**Medical Information Coverage**:")
+     formatted_parts.append(f"- 🧠 Medical Entities: {response.medical_entities_count}")
+     formatted_parts.append(f"- 🎯 Context Adherence: {response.context_adherence_score:.1%}")
+     formatted_parts.append(f"- 📚 Guidelines Referenced: {len(response.sources)}")
+
+     # Always include processing time if available
+     if hasattr(response, 'query_time'):
+         formatted_parts.append(f"- ⚡ Processing Time: {response.query_time:.2f}s")
+
+     # Medical disclaimer with clear separation
+     formatted_parts.append("\n\n---\n")
+     formatted_parts.append("*⚕️ This information is derived from Sri Lankan clinical guidelines and is for reference only. Always consult with qualified healthcare professionals for patient care decisions.*")
+
+     return "\n".join(formatted_parts)
+ def create_enhanced_medical_interface():
+     """
+     Create the enhanced Gradio interface for Hugging Face Spaces
+     """
+     # Custom CSS for medical theme
+     custom_css = """
+     .gradio-container {
+         max-width: 900px !important;
+         margin: auto !important;
+     }
+     .medical-header {
+         background: linear-gradient(135deg, #667eea 0%, #764ba2 100%);
+         color: white;
+         padding: 20px;
+         border-radius: 10px;
+         margin-bottom: 20px;
+         text-align: center;
+     }
+     """
+
+     with gr.Blocks(
+         title="🏥 VedaMD Enhanced: Sri Lankan Clinical Assistant",
+         theme=gr.themes.Soft(),
+         css=custom_css
+     ) as demo:
+
+         # Header
+         gr.HTML("""
+         <div class="medical-header">
+             <h1>🏥 VedaMD Enhanced: Sri Lankan Clinical Assistant</h1>
+             <h3>Enhanced Medical-Grade AI with Advanced RAG & Safety Protocols</h3>
+             <p>✅ 5x Enhanced Retrieval • ✅ Medical Verification • ✅ Clinical ModernBERT • ✅ Source Traceability</p>
+         </div>
+         """)
+
+         # Description
+         gr.Markdown("""
+         **🩺 Advanced Medical AI Assistant** for Sri Lankan maternal health guidelines with **enhanced safety protocols**:
+
+         🎯 **Enhanced Features:**
+         - **5x Enhanced Retrieval**: 15+ documents analyzed vs previous 5
+         - **Medical Entity Extraction**: Advanced clinical terminology recognition
+         - **Clinical ModernBERT**: Specialized 768d medical domain embeddings
+         - **Medical Response Verification**: 100% source traceability validation
+         - **Advanced Re-ranking**: Medical relevance scoring with coverage verification
+         - **Safety Protocols**: Comprehensive medical claim verification before delivery
+
+         **Ask me anything about Sri Lankan clinical guidelines with confidence!** 🇱🇰
+         """)
+
+         # Chat interface
+         chatbot = gr.ChatInterface(
+             fn=process_enhanced_medical_query,
+             examples=[
+                 "What is the complete management protocol for severe preeclampsia in Sri Lankan guidelines?",
+                 "How should postpartum hemorrhage be managed according to our local clinical protocols?",
+                 "What medications are contraindicated during pregnancy based on Sri Lankan guidelines?",
+                 "What are the evidence-based recommendations for managing gestational diabetes?",
+                 "How should puerperal sepsis be diagnosed and treated according to our guidelines?",
+                 "What are the protocols for assisted vaginal delivery in complicated cases?",
+                 "How should intrapartum fever be managed based on Sri Lankan standards?"
+             ],
+             cache_examples=False
+         )
+
+         # Footer with technical info
+         gr.Markdown("""
+         ---
+         **🔧 Technical Details**: Enhanced RAG with Clinical ModernBERT embeddings, medical entity extraction,
+         response verification, and multi-stage retrieval for comprehensive medical information coverage.
+
+         **⚖️ Disclaimer**: This AI assistant is for clinical reference only and does not replace professional medical judgment.
+         Always consult with qualified healthcare professionals for patient care decisions.
+         """)
+
+     return demo
+ # Create and launch the interface
+ if __name__ == "__main__":
+     logger.info("🚀 Launching VedaMD Enhanced for Hugging Face Spaces...")
+
+     # Create the interface
+     demo = create_enhanced_medical_interface()
+
+     # Launch with appropriate settings for HF Spaces
+     demo.launch(
+         server_name="0.0.0.0",
+         server_port=7860,
+         share=False,
+         show_error=True,
+         show_api=False
+     )
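The history-conversion loop inside `process_enhanced_medical_query` is pure list manipulation and can be exercised on its own. A standalone sketch of the same logic (illustrative rework, not the deployed module):

```python
from typing import Dict, List

def pairs_to_messages(history: List[List[str]]) -> List[Dict[str, str]]:
    """Convert Gradio ChatInterface [user, assistant] pairs into
    role/content dicts; empty halves of a pair are skipped, matching
    the loop in process_enhanced_medical_query."""
    messages: List[Dict[str, str]] = []
    for pair in history or []:
        if len(pair) >= 2:
            user_msg, assistant_msg = pair[0], pair[1]
            if user_msg:
                messages.append({"role": "user", "content": user_msg})
            if assistant_msg:
                messages.append({"role": "assistant", "content": assistant_msg})
    return messages
```

Keeping the conversion as a small pure function like this makes the empty-slot behavior (e.g. a user turn with no assistant reply yet) easy to unit-test.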
frontend/src/app/page.tsx CHANGED
@@ -4,6 +4,7 @@ import { useState, useRef, useEffect, FC } from 'react';
  import ReactMarkdown from 'react-markdown';
  import remarkGfm from 'remark-gfm';
  import clsx from 'clsx';
+ import { queryAPI } from '@/lib/api';
 
  // --- TYPE DEFINITIONS ---
  interface Message {
@@ -183,32 +184,16 @@ export default function Home() {
      setConversation(currentConversation);
 
      try {
-       // Convert conversation history to Gradio ChatInterface format
-       const gradioHistory = currentConversation.slice(0, -1).map(msg => [
-         msg.role === 'user' ? msg.content : '',
-         msg.role === 'assistant' ? msg.content : ''
-       ]).filter(pair => pair[0] || pair[1]);
-
-       // Call Gradio ChatInterface API
-       const response = await fetch(`${process.env.NEXT_PUBLIC_HF_API_URL}/call/predict`, {
-         method: 'POST',
-         headers: {
-           'Content-Type': 'application/json',
-         },
-         body: JSON.stringify({
-           data: [query, gradioHistory]
-         }),
-       });
-
-       if (!response.ok) {
-         const errorText = await response.text().catch(() => 'Network error occurred');
-         throw new Error(`API Error: ${response.status} - ${errorText}`);
+       // Use the queryAPI function from lib/api.ts
+       const apiResponse = await queryAPI(query, currentConversation.slice(0, -1));
+
+       if (apiResponse.error) {
+         throw new Error(apiResponse.error);
        }
 
-       const data = await response.json();
        const botMessage: Message = {
          role: 'assistant',
-         content: data.data[0] || 'No response received from the medical assistant.'
+         content: apiResponse.answer
        };
        setConversation([...currentConversation, botMessage]);
      } catch (err: any) {
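The `gradioHistory` mapping that `queryAPI` applies (in `frontend/src/lib/api.ts`) is the inverse of the pair format used by the backend: each role/content message becomes a `[user, assistant]` pair with the other slot left empty, and fully empty pairs are filtered out. A standalone Python sketch of that mapping (illustrative only):

```python
from typing import Dict, List

def messages_to_pairs(history: List[Dict[str, str]]) -> List[List[str]]:
    """Python rendering of the TypeScript gradioHistory mapping:
    role/content messages -> [user, assistant] pairs, dropping pairs
    where both slots are empty."""
    pairs = [
        [m["content"] if m["role"] == "user" else "",
         m["content"] if m["role"] == "assistant" else ""]
        for m in history
    ]
    return [p for p in pairs if p[0] or p[1]]
```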
frontend/src/lib/api.ts CHANGED
@@ -20,13 +20,14 @@ export async function queryAPI(input: string, history: ChatMessage[] = []): Prom
      throw new Error('HF_API_URL is not configured');
    }
 
-   // Convert history to Gradio ChatInterface format
+   // Convert history to Gradio format
    const gradioHistory = history.map(msg => [
      msg.role === 'user' ? msg.content : '',
      msg.role === 'assistant' ? msg.content : ''
    ]).filter(pair => pair[0] || pair[1]);
 
-   const response = await fetch(`${HF_API_URL}/call/predict`, {
+   // Use Gradio API format - try the basic predict endpoint
+   const response = await fetch(`${HF_API_URL}/predict`, {
      method: 'POST',
      headers: {
        'Content-Type': 'application/json',
@@ -35,14 +36,15 @@ export async function queryAPI(input: string, history: ChatMessage[] = []): Prom
        data: [input, gradioHistory]
      }),
    });
 
    if (!response.ok) {
-     throw new Error(`API error: ${response.status}`);
+     throw new Error(`HTTP ${response.status}: ${response.statusText}`);
    }
+
+   const result = await response.json();
 
-   const data = await response.json();
    return {
-     answer: data.data?.[0] || 'No response received from the medical assistant.',
+     answer: result?.data?.[0] || result?.[0] || 'No response received from the medical assistant.',
      sources: [], // Enhanced backend provides sources within the response text
    };
  } catch (error) {
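The fallback chain `result?.data?.[0] || result?.[0] || …` in the updated `queryAPI` tolerates two response shapes, since Gradio payload formats vary across versions and endpoints. A Python sketch of the same extraction logic (illustrative only, not the deployed TypeScript):

```python
DEFAULT_ANSWER = "No response received from the medical assistant."

def extract_answer(result) -> str:
    """Mirror of the TypeScript fallback chain in queryAPI: prefer
    result["data"][0], then result[0], then a default message."""
    if isinstance(result, dict):
        data = result.get("data")
        if isinstance(data, list) and data and data[0]:
            return data[0]
    if isinstance(result, list) and result and result[0]:
        return result[0]
    return DEFAULT_ANSWER
```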