Pulastya0/Data-Science-Agent
Pulastya B committed on Dec 28, 2025

Commit 4d07a53 (1 parent: 03b24f8)

fix: Clarify report intent and fix parameter issues

- Add 'DATA PROFILING REPORT' intent category
- 'detailed report' keyword now triggers ydata_profiling_report
- Fix 'None' string being passed as target_col -> convert to None
- Add plotly_dashboard to report detection in frontend
- Dashboard and ydata reports now show view buttons
- Fixes: LLM choosing visualization instead of profiling report
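
The routing rule this commit adds to the system prompt (a profiling keyword is present and no "plot", "chart", or "graph" is mentioned) can be sketched as a plain function. This is only an illustration: in the commit the decision is made by the LLM from the prompt text, and the names `PROFILING_KEYWORDS`, `VISUALIZATION_WORDS`, and `wants_profiling_report` are hypothetical and do not appear in the repository.

```python
# Keyword lists copied from the "DATA PROFILING REPORT" prompt category.
PROFILING_KEYWORDS = (
    "detailed report",
    "comprehensive report",
    "data report",
    "profiling report",
    "full analysis",
)
VISUALIZATION_WORDS = ("plot", "chart", "graph")


def wants_profiling_report(request: str) -> bool:
    """Hypothetical pre-check: True if the request reads like a profiling-report intent."""
    text = request.lower()
    has_keyword = any(keyword in text for keyword in PROFILING_KEYWORDS)
    mentions_viz = any(word in text for word in VISUALIZATION_WORDS)
    # A profiling keyword only counts when no specific visualization is requested.
    return has_keyword and not mentions_viz
```

Because this uses naive substring matching, a word such as "paragraph" would count as a visualization mention; a real implementation would likely need word-boundary matching.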

Files changed (2)

FRRONTEEEND/components/ChatInterface.tsx +1 -1
src/orchestrator.py +18 -3

FRRONTEEEND/components/ChatInterface.tsx

@@ -141,7 +141,7 @@ export const ChatInterface: React.FC<{ onBack: () => void }> = ({ onBack }) => {
         // Extract report paths from workflow history
         if (result.workflow_history) {
-          const reportTools = ['generate_ydata_profiling_report'];
           result.workflow_history.forEach((step: any) => {
             if (reportTools.includes(step.tool)) {
               // Check multiple possible locations for the report path

         // Extract report paths from workflow history
         if (result.workflow_history) {
+          const reportTools = ['generate_ydata_profiling_report', 'generate_plotly_dashboard', 'generate_all_plots'];
           result.workflow_history.forEach((step: any) => {
             if (reportTools.includes(step.tool)) {
               // Check multiple possible locations for the report path

src/orchestrator.py

@@ -397,14 +397,24 @@ When you need to use a tool, respond with a JSON block like this:
   2. STOP - DO NOT clean data, encode, or train models!
 - **Example**: "Generate interactive plots for Magnitude and latitude" → generate_interactive_scatter → DONE ✓
-**C. DATA ANALYSIS WITH ML** - Full workflow with model training:
 - Real dataset file path provided (CSV, Excel, etc. - NOT "dummy")
 - Keywords: "train model", "predict", "classify", "build model", "forecast"
 - User wants: cleaning + feature engineering + model training
 - **ACTION**: Run full ML workflow (steps 1-15 below)
 - **Example**: "Train a model to predict earthquake magnitude" → Full pipeline
-**D. UNCLEAR/AMBIGUOUS REQUESTS** - Intent is not obvious:
 - User says: "analyze", "look at", "check", "review" (without specifics)
 - Could mean: visualization only OR full ML OR just exploration
 - **ACTION**: ASK USER to clarify BEFORE starting work
@@ -413,7 +423,7 @@ When you need to use a tool, respond with a JSON block like this:
   - "Do you need model training or just want to explore the data visually?"
 - **DO NOT ASSUME** - Always ask when unclear!
-**E. SIMPLE QUESTIONS** - User asks for explanation/advice:
 - Keywords: "what is", "how to", "explain", "recommend"
 - **ACTION**: Answer directly, no tools needed
@@ -838,6 +848,11 @@ You are a DOER. Complete workflows based on user intent."""
                     # Convert directory to full file path
                     arguments["output_path"] = f"{output_dir}/ydata_profile.html"
             result = tool_func(**arguments)
             # Check if tool itself returned an error (some tools return dict with 'status': 'error')

   2. STOP - DO NOT clean data, encode, or train models!
 - **Example**: "Generate interactive plots for Magnitude and latitude" → generate_interactive_scatter → DONE ✓
+**C. DATA PROFILING REPORT** - User wants comprehensive data analysis report:
+- Keywords: "detailed report", "comprehensive report", "data report", "profiling report", "full analysis"
+- **NO specific visualization mentioned** (no "plot", "chart", "graph")
+- Real dataset provided
+- **ACTION**: Use generate_ydata_profiling_report tool
+- **Workflow**:
+  1. generate_ydata_profiling_report(file_path)
+  2. STOP - This generates a complete HTML report with all stats, correlations, distributions
+- **Example**: "Generate a detailed report for this" → generate_ydata_profiling_report → DONE ✓
+**D. DATA ANALYSIS WITH ML** - Full workflow with model training:
 - Real dataset file path provided (CSV, Excel, etc. - NOT "dummy")
 - Keywords: "train model", "predict", "classify", "build model", "forecast"
 - User wants: cleaning + feature engineering + model training
 - **ACTION**: Run full ML workflow (steps 1-15 below)
 - **Example**: "Train a model to predict earthquake magnitude" → Full pipeline
+**E. UNCLEAR/AMBIGUOUS REQUESTS** - Intent is not obvious:
 - User says: "analyze", "look at", "check", "review" (without specifics)
 - Could mean: visualization only OR full ML OR just exploration
 - **ACTION**: ASK USER to clarify BEFORE starting work
   - "Do you need model training or just want to explore the data visually?"
 - **DO NOT ASSUME** - Always ask when unclear!
+**F. SIMPLE QUESTIONS** - User asks for explanation/advice:
 - Keywords: "what is", "how to", "explain", "recommend"
 - **ACTION**: Answer directly, no tools needed
                     # Convert directory to full file path
                     arguments["output_path"] = f"{output_dir}/ydata_profile.html"
+            # Fix "None" string being passed as actual None
+            for key, value in list(arguments.items()):
+                if isinstance(value, str) and value.lower() in ["none", "null", "undefined"]:
+                    arguments[key] = None
             result = tool_func(**arguments)
             # Check if tool itself returned an error (some tools return dict with 'status': 'error')
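
The argument clean-up added in the last hunk can be tried in isolation with a sketch like the one below. It is not the orchestrator's code: `normalize_null_strings` and `fake_tool` are hypothetical names, the helper returns a copy instead of mutating in place, and the extra `strip()` is an assumption that goes slightly beyond what the commit does.

```python
from typing import Any, Dict

# Placeholder strings an LLM may emit in tool arguments instead of a real null.
NULL_LIKE = {"none", "null", "undefined"}


def normalize_null_strings(arguments: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of `arguments` with null-like strings replaced by None."""
    cleaned: Dict[str, Any] = {}
    for key, value in arguments.items():
        # strip() is an addition here; the commit compares value.lower() directly.
        if isinstance(value, str) and value.strip().lower() in NULL_LIKE:
            cleaned[key] = None
        else:
            cleaned[key] = value
    return cleaned


def fake_tool(file_path: str, target_col=None) -> dict:
    """Hypothetical stand-in for a tool that takes an optional target column."""
    return {"file_path": file_path, "has_target": target_col is not None}


# The LLM passed the string "None" rather than omitting the argument.
raw_args = {"file_path": "data.csv", "target_col": "None"}
result = fake_tool(**normalize_null_strings(raw_args))
```

Without the normalization, `fake_tool` would treat the string "None" as a real column name, which is the failure mode the commit message describes for `target_col`. One trade-off of this approach is that a column genuinely named "null" or "none" can no longer be passed as an argument.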