Spaces: Pulastya0/Data-Science-Agent

Pulastya B committed 2 days ago

Commit 6badf55 (1 parent: 4eacfaa)

Fix visibility of changes: strengthen file path prohibition, prepend metrics to summary, add progress polling

Files changed (2)

FRRONTEEEND/components/ChatInterface.tsx +42 -1
src/orchestrator.py +30 -15

FRRONTEEEND/components/ChatInterface.tsx CHANGED

@@ -74,8 +74,35 @@ export const ChatInterface: React.FC<{ onBack: () => void }> = ({ onBack }) => {
     updateSession(activeSessionId, newMessages);
     setInput('');
     setIsTyping(true);
     try {
       // Use the current origin if running on same server, otherwise use env variable
       const API_URL = window.location.origin;
       console.log('API URL:', API_URL);
@@ -137,7 +164,10 @@ export const ChatInterface: React.FC<{ onBack: () => void }> = ({ onBack }) => {
       const data = await response.json();
-      // Clear progress indicator
       setCurrentStep('');
       let assistantContent = '';
@@ -234,6 +264,12 @@ export const ChatInterface: React.FC<{ onBack: () => void }> = ({ onBack }) => {
     } catch (error: any) {
       console.error("Chat Error:", error);
       let errorMessage = "I'm sorry, I encountered an error processing your request.";
       if (error.message) {
@@ -260,6 +296,11 @@ export const ChatInterface: React.FC<{ onBack: () => void }> = ({ onBack }) => {
         timestamp: new Date()
       }]);
     } finally {
       setIsTyping(false);
     }
   };

     updateSession(activeSessionId, newMessages);
     setInput('');
     setIsTyping(true);
+    // Start polling for progress updates
+    const sessionKey = activeSessionId || 'default';
+    let progressInterval: NodeJS.Timeout | null = null;
+    const pollProgress = async () => {
+      try {
+        const API_URL = window.location.origin;
+        const progressResponse = await fetch(`${API_URL}/api/progress/${sessionKey}`);
+        if (progressResponse.ok) {
+          const progressData = await progressResponse.json();
+          const steps = progressData.steps || [];
+          // Find the most recent running step
+          const runningSteps = steps.filter((s: any) => s.status === 'running');
+          if (runningSteps.length > 0) {
+            const lastStep = runningSteps[runningSteps.length - 1];
+            setCurrentStep(lastStep.tool);
+          }
+        }
+      } catch (err) {
+        console.error('Progress polling error:', err);
+      }
+    };
     try {
+      // Start polling every 1 second
+      progressInterval = setInterval(pollProgress, 1000);
       // Use the current origin if running on same server, otherwise use env variable
       const API_URL = window.location.origin;
       console.log('API URL:', API_URL);
       const data = await response.json();
+      // Stop progress polling and clear indicator
+      if (progressInterval) {
+        clearInterval(progressInterval);
+      }
       setCurrentStep('');
       let assistantContent = '';
     } catch (error: any) {
       console.error("Chat Error:", error);
+      // Stop progress polling
+      if (progressInterval) {
+        clearInterval(progressInterval);
+      }
+      setCurrentStep('');
       let errorMessage = "I'm sorry, I encountered an error processing your request.";
       if (error.message) {
         timestamp: new Date()
       }]);
     } finally {
+      // Stop progress polling
+      if (progressInterval) {
+        clearInterval(progressInterval);
+      }
+      setCurrentStep('');
       setIsTyping(false);
     }
   };
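
The step-selection rule in `pollProgress` implies a particular payload shape for `/api/progress/<session>`: a `steps` list whose entries carry `status` and `tool` fields. As a rough sketch of that rule in Python (the backend's language), assuming the payload looks the way the frontend reads it, the helper below picks the most recent running step. The function name `latest_running_tool` and the sample tool names are hypothetical and do not appear in the commit.

```python
from typing import Any, Dict, Optional


def latest_running_tool(progress_data: Dict[str, Any]) -> Optional[str]:
    """Return the tool name of the last step marked 'running', or None.

    Mirrors the frontend rule: filter steps by status == 'running',
    then take the final entry. The payload shape is an assumption
    inferred from how ChatInterface.tsx reads the response.
    """
    steps = progress_data.get("steps") or []
    running = [s for s in steps if s.get("status") == "running"]
    return running[-1]["tool"] if running else None


# Hypothetical payload for illustration only.
sample = {
    "steps": [
        {"tool": "profile_dataset", "status": "completed"},
        {"tool": "train_baseline_models", "status": "running"},
    ]
}
current_tool = latest_running_tool(sample)
```

When no step is running the helper returns `None`, which corresponds to the frontend leaving the current step indicator unchanged.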

src/orchestrator.py CHANGED

@@ -412,15 +412,17 @@ class DataScienceCopilot:
 **CRITICAL: User Interface Integration & Response Formatting**
 - The user interface automatically displays clickable buttons for all generated plots, reports, and outputs
-- **NEVER mention file paths** (e.g., "./outputs/plots/...", "./outputs/data/...", etc.) in your responses
-- **NEVER use markdown code blocks** for file paths or structured data in final summaries
-- DO NOT say "Output File: ..." or "Saved to: ..." - users can click buttons to view outputs
-- Simply describe what was created and what insights it shows
-- Use clean, aesthetic formatting with proper sections, bullet points, and spacing
-- Example: ❌ "📊 Output File: `./outputs/plots/heatmap.html`"
-           ✅ "Generated an interactive correlation heatmap showing relationships between variables"
-- Example: ❌ "Saved cleaned data to: `./outputs/data/cleaned.csv`"
-           ✅ "Cleaned the dataset by handling missing values and outliers"
 **CRITICAL: Tool Calling Format**
 When you need to use a tool, respond with a JSON block like this:
@@ -834,8 +836,12 @@ When you've finished all tool executions and are ready to return the final respo
    - What patterns were discovered in the data?
    - What were the most important features?
    - Were there any interesting correlations or anomalies?
-3. **Model performance** (if trained):
-   - Best model name and metrics (R², RMSE, MAE)
    - How accurate is the model? What does the score mean in practical terms?
    - Were there any challenges (imbalanced data, multicollinearity, etc.)?
 4. **Recommendations**:
@@ -1093,15 +1099,13 @@ You are a DOER. Complete workflows based on user intent."""
                     "url": f"/outputs/{nested_result['output_path'].replace('./outputs/', '')}"
                 })
-        # Build enhanced text summary
         summary_lines = [
             f"## 📊 Analysis Complete",
-            "",
-            llm_summary,
             ""
         ]
-        # Show all baseline models comparison first
         if "all_models" in metrics and metrics["all_models"]:
             summary_lines.extend([
                 "### 🔬 Baseline Models Comparison",
@@ -1152,6 +1156,17 @@ You are a DOER. Complete workflows based on user intent."""
                 ""
             ])
         # Add artifact links
         if artifacts["models"]:
             summary_lines.append("### 💾 Trained Models")

 **CRITICAL: User Interface Integration & Response Formatting**
 - The user interface automatically displays clickable buttons for all generated plots, reports, and outputs
+- **ABSOLUTELY FORBIDDEN**: NEVER EVER mention file paths in your responses
+  - ❌ NEVER write: "./outputs/...", "/outputs/...", "saved to", "output file:", "file path:"
+  - ❌ NEVER use markdown code blocks for file paths (no backticks around paths)
+  - ❌ NEVER say: "Output File:", "Saved to:", "File:", "Path:", "Location:"
+- **WHAT TO SAY INSTEAD**:
+  - ✅ "Generated an interactive correlation heatmap"
+  - ✅ "Cleaned the dataset by handling missing values"
+  - ✅ "Created visualizations showing the relationships"
+  - ✅ "Trained multiple models and optimized the best performer"
+- Users can click buttons to view outputs - you don't need to tell them where files are
+- Use clean, aesthetic formatting with sections, bullets, and proper spacing
 **CRITICAL: Tool Calling Format**
 When you need to use a tool, respond with a JSON block like this:
    - What patterns were discovered in the data?
    - What were the most important features?
    - Were there any interesting correlations or anomalies?
+3. **Model performance** (if trained) - **CRITICAL: YOU MUST INCLUDE THESE METRICS**:
+   - **ALWAYS extract and display** the exact metrics from tool results:
+   - R² Score, RMSE, MAE from the train_baseline_models results
+   - List ALL models trained (not just the best one)
+   - Example: "Trained 6 models: XGBoost (R²=0.713, RMSE=0.207), Random Forest (R²=0.685, RMSE=0.218), etc."
+   - If hyperparameter tuning was done, show before/after comparison
    - How accurate is the model? What does the score mean in practical terms?
    - Were there any challenges (imbalanced data, multicollinearity, etc.)?
 4. **Recommendations**:
                     "url": f"/outputs/{nested_result['output_path'].replace('./outputs/', '')}"
                 })
+        # Build enhanced text summary - start with metrics then LLM explanation
         summary_lines = [
             f"## 📊 Analysis Complete",
             ""
         ]
+        # Show all baseline models comparison FIRST (before LLM summary)
         if "all_models" in metrics and metrics["all_models"]:
             summary_lines.extend([
                 "### 🔬 Baseline Models Comparison",
                 ""
             ])
+        # Add LLM's explanation after metrics
+        if llm_summary and llm_summary.strip():
+            summary_lines.extend([
+                "---",
+                "",
+                "### 📝 Analysis Summary",
+                "",
+                llm_summary,
+                ""
+            ])
         # Add artifact links
         if artifacts["models"]:
             summary_lines.append("### 💾 Trained Models")
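
The reordering in this hunk can be sketched in isolation: metrics go into the summary first, and the LLM text follows under its own heading only when it is non-empty. The snippet below is a minimal standalone approximation, not the code in `orchestrator.py`. The function name `build_summary` and the per-model keys `r2` and `rmse` are assumptions made for illustration, since the diff elides how each model row is rendered.

```python
from typing import Dict, List


def build_summary(llm_summary: str, metrics: Dict[str, dict]) -> str:
    """Assemble a markdown summary with metrics ahead of the LLM text."""
    lines: List[str] = ["## 📊 Analysis Complete", ""]

    # Metrics come first so they stay visible even if the LLM text omits them.
    all_models = metrics.get("all_models") or {}
    if all_models:
        lines.extend(["### 🔬 Baseline Models Comparison", ""])
        for name, scores in all_models.items():
            # Assumed keys "r2" and "rmse"; the real structure may differ.
            lines.append(
                f"- {name}: R²={scores['r2']:.3f}, RMSE={scores['rmse']:.3f}"
            )
        lines.append("")

    # The LLM explanation follows, skipped when empty or whitespace-only.
    if llm_summary and llm_summary.strip():
        lines.extend(
            ["---", "", "### 📝 Analysis Summary", "", llm_summary, ""]
        )

    return "\n".join(lines)


# Hypothetical inputs for illustration only.
example = build_summary(
    "XGBoost performed best.",
    {"all_models": {"XGBoost": {"r2": 0.713, "rmse": 0.207}}},
)
```

Guarding on `llm_summary.strip()` avoids emitting an empty "Analysis Summary" section when the model returns only whitespace.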