Pulastya B committed on
Commit b7bc364 · 1 Parent(s): 7f3326d

Add comprehensive final summary instructions to system prompt

Files changed (1)
  1. src/orchestrator.py +48 -0
src/orchestrator.py CHANGED
@@ -816,6 +816,54 @@ Use specialized tools FIRST. Only use execute_python_code for:
 
  File chain: original → cleaned.csv → no_outliers.csv → numeric.csv → encoded.csv → models (if requested)
 
+ **FINAL SUMMARY - WHEN WORKFLOW IS COMPLETE:**
+ When you've finished all tool executions and are ready to return the final response, provide a comprehensive summary that includes:
+
+ 1. **What was accomplished**: List all major steps completed (data cleaning, feature engineering, model training, etc.)
+ 2. **Key findings from the data**:
+ - What patterns were discovered in the data?
+ - What were the most important features?
+ - Were there any interesting correlations or anomalies?
+ 3. **Model performance** (if trained):
+ - Best model name and metrics (R², RMSE, MAE)
+ - How accurate is the model? What does the score mean in practical terms?
+ - Were there any challenges (imbalanced data, multicollinearity, etc.)?
+ 4. **Recommendations**:
+ - Is the model ready for use?
+ - What could improve performance further?
+ - Any data quality issues that should be addressed?
+ 5. **Generated artifacts**: Mention reports, plots, and visualizations (but DON'T include file paths - the UI shows buttons automatically)
+
+ Example final response:
+ "I've completed the full machine learning workflow for earthquake magnitude prediction:
+
+ **Data Preparation:**
+ - Cleaned 175,947 earthquake records (2000-2025)
+ - Removed 3 columns with >50% missing values (dmin, horizontalError, magError)
+ - Extracted time-based features (year, month, day, hour) from timestamps
+ - Encoded categorical variables (magType, net, type, status)
+
+ **Key Findings:**
+ - Depth shows strong negative correlation (-0.45) with magnitude
+ - Latitude and longitude patterns indicate geographic clustering of large earthquakes
+ - Most earthquakes occur at shallow depths (< 50km)
+
+ **Model Performance:**
+ - Best model: XGBoost Regressor
+ - R² Score: 0.713 (explains 71.3% of magnitude variance)
+ - RMSE: 0.207 (predictions within ±0.2 magnitude units)
+ - Cross-validation: 0.707 ± 0.012 (consistent performance across folds)
+
+ After hyperparameter tuning with 50 trials, improved RMSE from 0.214 to 0.199.
+
+ **Recommendation:**
+ The model shows good predictive power for earthquake magnitude. The 71% R² score indicates reliable predictions, though there's room for improvement. Consider:
+ - Adding seismic wave data if available
+ - Feature engineering for tectonic plate boundaries
+ - Ensemble methods to boost performance further
+
+ All visualizations, reports, and the trained model are available via the buttons above."
+
  You are a DOER. Complete workflows based on user intent."""
 
  def _generate_cache_key(self, file_path: str, task_description: str,
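
Note on the metrics named in the added prompt text: the sketch below shows how R², RMSE, and MAE are conventionally computed, which may help when judging what a summary such as "R² Score: 0.713" or "RMSE: 0.207" can reasonably claim. `regression_metrics` is a hypothetical helper written for illustration. It is not part of `src/orchestrator.py`, and the sample magnitudes are made up.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return R^2, RMSE and MAE for two equal-length sequences of numbers.

    Hypothetical helper for illustration only; not taken from the commit.
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(y_true)
    mean_true = sum(y_true) / n
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    ss_res = sum(r * r for r in residuals)              # unexplained variation
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total variation
    return {
        # R^2: share of the target's variance the predictions account for
        "r2": 1.0 - ss_res / ss_tot if ss_tot else float("nan"),
        # RMSE: root of the mean squared residual, in the target's units
        "rmse": math.sqrt(ss_res / n),
        # MAE: mean absolute residual, in the target's units
        "mae": sum(abs(r) for r in residuals) / n,
    }


# Made-up magnitudes, for illustration only
metrics = regression_metrics([4.0, 5.0, 6.0, 7.0], [4.5, 5.0, 5.5, 7.0])
```

RMSE and MAE describe the typical size of an error in the target's units, not a bound on it: individual predictions can miss by more than the RMSE.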