Hammad712 commited on
Commit
5367e2c
Β·
verified Β·
1 Parent(s): ea6a10c

Update app/core/agents.py

Browse files
Files changed (1) hide show
  1. app/core/agents.py +188 -148
app/core/agents.py CHANGED
@@ -1123,38 +1123,98 @@ def llm_judge(original_payload: Dict[str, Any], generated_plan: Dict[str, Any])
1123
  Do NOT include any extra text.
1124
  """
1125
 
1126
- ML_JUDGE_PROMPT = f"""
1127
- You are the **RiverGen ML Quality Auditor**. Your job is to validate a Machine Learning Execution Plan.
1128
- You must return your evaluation in a strictly valid **json** format.
1129
-
1130
- **VALIDATION CRITERIA:**
1131
- 1. **Target Leakage**: Ensure the 'labels' are not accidentally included in the 'features' list in Step 1.
1132
- 2. **Step Dependency**: Verify that Step 2 (Pre-processing) lists Step 1 as a dependency, and Step 3 (Training) lists Step 2.
1133
- 3. **Metric Alignment**: If the task is Regression, metrics must be RMSE/R2. If Classification, metrics must be F1/AUC-ROC.
1134
- 4. **Data Handling**: Check if the plan includes the specific imputation (e.g., mean/median) and scaling (e.g., min-max) requested in the prompt.
1135
- 5. **SQL Accuracy**: Verify the SQL joins the correct tables and aggregates data logically for ML consumption.
1136
-
1137
-
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1138
 
1139
- **INPUT TO EVALUATE:**
1140
- - User Prompt: {original_payload.get("user_prompt")}
1141
- - Generated Plan: {json.dumps(generated_plan, indent=2)}
1142
- OUTPUT:
1143
- Return ONLY a JSON object:
1144
- {{
1145
- "approved": boolean,
1146
- "feedback": "string",
1147
- "score": float,
1148
- "governance_enforcement": {{ }},
1149
- "validation": {{
1150
- "missing_fields": [],
1151
- "dropped_sources": [],
1152
- "notes": [],
1153
- "performance_warnings": []
1154
- }}
1155
- }}
1156
- Do NOT include any extra text.
1157
- """
1158
 
1159
 
1160
  general_qa_judge_prompt = f"""
@@ -1709,128 +1769,108 @@ def ml_agent(payload: Dict[str, Any], feedback: str = None) -> Dict[str, Any]:
1709
  system_prompt = f"""
1710
  You are the **RiverGen ML Architect Agent**.
1711
 
1712
- Your responsibility is to design a **fully executable, reproducible, and governance-safe machine learning pipeline plan**.
1713
- You MUST return a **single, valid JSON object** that conforms exactly to the provided output template.
1714
-
1715
- ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
1716
- 🎯 CORE OBJECTIVE
1717
- ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
1718
- Translate the user request and data schema into a **production-ready ML execution plan** that:
1719
- - Can be realistically executed by an ML engine
1720
- - Explicitly defines compute engines
1721
- - Produces reproducible artifacts
1722
- - Follows ML best practices without ambiguity
1723
-
1724
- ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
1725
- 🧠 ABSOLUTE LOGIC RULES (NON-NEGOTIABLE)
1726
- ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
1727
-
1728
- 1. **Feature vs Label Separation**
1729
- - You MUST explicitly define:
1730
- - `features`: input variables
1731
- - `labels`: target variables
1732
- - Labels MUST NOT appear inside features.
1733
-
1734
- 2. **Execution Strategy Selection**
1735
- - `sequential_dag` β†’ Python / CSV / Pandas / Scikit-Learn workflows
1736
- - `pushdown` β†’ BigQuery ML / Snowflake ML
1737
- - `distributed_training` β†’ Spark / Ray / >1M rows
1738
- - NEVER choose a strategy that conflicts with the data source.
1739
-
1740
- 3. **Compute Engine Declaration (CRITICAL)**
1741
- - EVERY operation MUST declare a valid `compute_engine`
1742
- - Examples:
1743
- - CSV / S3 β†’ `pandas`, `duckdb`, `spark`
1744
- - SQL DB β†’ `postgresql`, `bigquery`
1745
- - ❌ NEVER write raw SQL over CSV unless an engine (DuckDB / Athena / Spark) is explicitly stated.
1746
-
1747
- 4. **Data Access Semantics**
1748
- - CSV / S3 data MUST be loaded using:
1749
- - DuckDB
1750
- - Pandas
1751
- - Spark
1752
- - Athena (explicitly stated)
1753
- - ❌ Invalid example (FORBIDDEN):
1754
- `SELECT * FROM s3://bucket/file.csv`
1755
-
1756
- 5. **Pre-Processing (MANDATORY)**
1757
- - Always include:
1758
- - Missing value handling (imputation strategy per column or numeric default)
1759
- - Feature scaling for numerical features
1760
- - Include train/test split with:
1761
- - Explicit ratio
1762
- - Explicit `random_state`
1763
-
1764
- 6. **Metrics (STRICT ENFORCEMENT)**
1765
- - Regression:
1766
- - RMSE (REQUIRED)
1767
- - RΒ² (REQUIRED)
1768
- - Classification:
1769
- - Precision
1770
- - Recall
1771
- - F1-Score
1772
- - AUC-ROC
1773
- - ❌ Partial metric sets are NOT allowed.
1774
-
1775
- 7. **Model Specification**
1776
- - Always specify:
1777
- - Algorithm name (no β€œauto” unless justified)
1778
- - Hyperparameters (empty object allowed, omission NOT allowed)
1779
- - Declare output artifacts:
1780
- - Trained model path
1781
- - Evaluation report path
1782
-
1783
- 8. **Reproducibility & Governance**
1784
- - Include:
1785
- - `random_state`
1786
- - Deterministic splits
1787
- - Do NOT hallucinate governance rules.
1788
- - If no governance exists, explicitly state `"governance_applied": []`.
1789
-
1790
- 9. **JSON Integrity**
1791
- - Output MUST be:
1792
- - Valid JSON
1793
- - No comments
1794
- - No markdown
1795
- - No trailing commas
1796
- - No extra keys outside the template
1797
-
1798
- ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
1799
- πŸ“₯ INPUT CONTEXT
1800
- ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
1801
- - User Prompt:
1802
- "{user_prompt}"
1803
-
1804
- - Data Schema (AUTHORITATIVE β€” DO NOT HALLUCINATE):
1805
- {json.dumps(data_sources)}
1806
-
1807
- - ML Parameters:
1808
- {json.dumps(ml_params)}
1809
-
1810
- - User Context:
1811
- {json.dumps(user_context)}
1812
-
1813
- ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
1814
- πŸ“€ REQUIRED OUTPUT FORMAT
1815
- ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
1816
- Return ONLY a JSON object matching this structure EXACTLY:
1817
-
1818
  {json.dumps(response_template, indent=2)}
1819
 
1820
- ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
1821
- 🚨 FAILURE CONDITIONS (AUTO-REJECT)
1822
- ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
1823
- - Missing compute engine
1824
- - SQL executed directly on CSV without DuckDB/Athena/Spark
1825
- - Missing RMSE or RΒ² for regression
1826
- - No artifact paths
1827
- - Features and labels mixed
1828
- - Invalid JSON
1829
-
1830
- If information is missing, make the **safest reasonable assumption** and clearly encode it in the plan.
1831
  """
1832
 
1833
 
 
1834
  # 4. Inject Feedback for Self-Correction
1835
  if feedback:
1836
  system_prompt += f"\n\n🚨 **CRITICAL REVISION NEEDED:** {feedback}"
 
1123
  Do NOT include any extra text.
1124
  """
1125
 
1126
+ ml_judge_prompt = f"""
1127
+ You are the **RiverGen ML Quality Assurance Judge**.
1128
+
1129
+ You validate ML execution plans for:
1130
+ - correctness
1131
+ - ML best practices
1132
+ - execution safety
1133
+ - schema alignment
1134
+
1135
+ Your decision is FINAL.
1136
+
1137
+ ────────────────────────────────────────
1138
+ INPUTS
1139
+ ────────────────────────────────────────
1140
+ 1. User Prompt:
1141
+ "{original_payload.get("user_prompt")}"
1142
+
1143
+ 2. Valid Data Schema:
1144
+ {json.dumps(valid_schema_context)}
1145
+
1146
+ 3. Proposed ML Execution Plan:
1147
+ {json.dumps(generated_plan, indent=2)}
1148
+
1149
+ ────────────────────────────────────────
1150
+ VALIDATION RULES (HARD FAILS)
1151
+ ────────────────────────────────────────
1152
+
1153
+ ### 1️⃣ Feature / Label Validation
1154
+ REJECT if:
1155
+ - Target column appears in features
1156
+ - ID / primary key is used as a feature without justification
1157
+ - Features or labels do not exist in schema
1158
+
1159
+ ### 2️⃣ Strategy Validation
1160
+ REJECT if:
1161
+ - CSV/file-based workflows use anything other than `sequential_dag`
1162
+ - Distributed strategy used without dataset size justification
1163
+
1164
+ ### 3️⃣ Execution Correctness
1165
+ REJECT if:
1166
+ - DuckDB queries reference CSVs as tables
1167
+ - `read_csv_auto()` (or equivalent) is NOT used for CSV ingestion
1168
+ - SQL syntax is invalid for the declared engine
1169
+
1170
+ ### 4️⃣ Compute Engine Validation
1171
+ REJECT if:
1172
+ - Pandas is used as a model training engine
1173
+ - ML training lacks a defined ML framework (e.g., sklearn)
1174
+
1175
+ ### 5️⃣ Preprocessing Completeness
1176
+ REJECT if:
1177
+ - Missing value handling is absent
1178
+ - Scaling/normalization is missing for numeric features
1179
+ - Train/test split is missing or ambiguous
1180
+
1181
+ ### 6️⃣ Metrics Enforcement
1182
+ REJECT if:
1183
+ - Regression tasks do not include BOTH RMSE and RΒ²
1184
+ - Classification tasks do not include Precision, Recall, F1, AUC-ROC
1185
+
1186
+ ### 7️⃣ Artifact & Reproducibility
1187
+ REJECT if:
1188
+ - Model output path is missing
1189
+ - Evaluation report path is missing
1190
+ - random_state is missing for splits
1191
+
1192
+ ────────────────────────────────────────
1193
+ SCORING GUIDELINES
1194
+ ────────────────────────────────────────
1195
+ - 1.0 β†’ Production-ready, fully correct
1196
+ - 0.8–0.9 β†’ Minor issues, safe to auto-fix
1197
+ - <0.8 β†’ Must be regenerated
1198
+
1199
+ ────────────────────────────────────────
1200
+ OUTPUT FORMAT (JSON ONLY)
1201
+ ────────────────────────────────────────
1202
+ Return ONLY:
1203
+ {
1204
+ "approved": boolean,
1205
+ "score": float,
1206
+ "feedback": "string",
1207
+ "validation": {
1208
+ "feature_issues": [],
1209
+ "execution_issues": [],
1210
+ "ml_best_practice_violations": [],
1211
+ "notes": []
1212
+ }
1213
+ }
1214
+
1215
+ NO extra text.
1216
+ """
1217
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1218
 
1219
 
1220
  general_qa_judge_prompt = f"""
 
1769
  system_prompt = f"""
1770
  You are the **RiverGen ML Architect Agent**.
1771
 
1772
+ Your responsibility is to design a **fully executable, production-safe machine learning pipeline plan** in **valid JSON only**.
1773
+
1774
+ This plan will be executed by downstream systems β€” any ambiguity, invalid syntax, or ML anti-pattern is a FAILURE.
1775
+
1776
+ ────────────────────────────────────────
1777
+ CORE OBJECTIVES
1778
+ ────────────────────────────────────────
1779
+ 1. Translate the user request into a correct ML pipeline.
1780
+ 2. Explicitly separate FEATURES and LABELS.
1781
+ 3. Select the correct execution STRATEGY and COMPUTE ENGINES.
1782
+ 4. Enforce ML best practices and execution correctness.
1783
+ 5. Return ONLY valid JSON that matches the output template.
1784
+
1785
+ ────────────────────────────────────────
1786
+ NON-NEGOTIABLE RULES (CRITICAL)
1787
+ ────────────────────────────────────────
1788
+
1789
+ ### 1️⃣ Feature / Label Discipline
1790
+ - You MUST explicitly define:
1791
+ - `features`: input columns ONLY
1792
+ - `labels`: target column(s) ONLY
1793
+ - NEVER include:
1794
+ - primary keys
1795
+ - surrogate IDs
1796
+ - UUIDs
1797
+ - auto-increment fields
1798
+ **unless the user explicitly requests it.**
1799
+ - If an ID column appears in features, DROP IT and explain in reasoning.
1800
+
1801
+ ### 2️⃣ Strategy Selection (MANDATORY)
1802
+ - Use **sequential_dag** when:
1803
+ - CSV / Parquet / files
1804
+ - Pandas / sklearn workflows
1805
+ - Use **pushdown** ONLY for native warehouse ML (BigQuery ML, Snowflake ML).
1806
+ - Use **distributed_training** ONLY if dataset size is explicitly >1M rows.
1807
+
1808
+ ### 3️⃣ Data Source Execution Rules
1809
+ - **DuckDB + CSV**:
1810
+ - ALWAYS use `read_csv_auto()` or equivalent.
1811
+ - NEVER reference CSVs as tables.
1812
+ - Example:
1813
+ ```sql
1814
+ SELECT col1 FROM read_csv_auto('s3://bucket/file.csv')
1815
+ ```
1816
+
1817
+ - **SQL Sources**:
1818
+ - Use valid dialect syntax.
1819
+ - Do NOT hallucinate tables or columns.
1820
+
1821
+ ### 4️⃣ Preprocessing (REQUIRED)
1822
+ You MUST include:
1823
+ - Missing value handling (imputation)
1824
+ - Scaling or normalization for numeric features
1825
+ - Train / test split with explicit ratio
1826
+ - Fixed `random_state` for reproducibility
1827
+
1828
+ ### 5️⃣ Model Execution Rules
1829
+ - Training compute engine MUST be:
1830
+ - `scikit-learn` (or equivalent ML framework)
1831
+ - Pandas is NOT a model training engine.
1832
+ - Explicitly specify:
1833
+ - algorithm
1834
+ - task type
1835
+ - evaluation metrics
1836
+
1837
+ ### 6️⃣ Metrics Enforcement
1838
+ - **Regression** β†’ RMSE + RΒ² (MANDATORY)
1839
+ - **Classification** β†’ Precision, Recall, F1, AUC-ROC (MANDATORY)
1840
+
1841
+ ### 7️⃣ Output Artifacts (REQUIRED)
1842
+ - You MUST specify:
1843
+ - model artifact path
1844
+ - evaluation report path
1845
+
1846
+ ### 8️⃣ Reasoning Transparency
1847
+ - Populate `reasoning_steps`
1848
+ - Explicitly justify:
1849
+ - strategy choice
1850
+ - feature selection
1851
+ - algorithm choice
1852
+
1853
+ ────────────────────────────────────────
1854
+ INPUT CONTEXT
1855
+ ────────────────────────────────────────
1856
+ - User Prompt: "{user_prompt}"
1857
+ - Data Schema / Sources: {json.dumps(data_sources)}
1858
+ - ML Parameters: {json.dumps(ml_params)}
1859
+ - User Context: {json.dumps(user_context)}
1860
+
1861
+ ────────────────────────────────────────
1862
+ OUTPUT FORMAT (STRICT)
1863
+ ────────────────────────────────────────
1864
+ Return ONLY valid JSON matching this template exactly:
 
 
 
 
 
 
 
 
 
 
 
 
 
1865
  {json.dumps(response_template, indent=2)}
1866
 
1867
+ DO NOT include explanations outside JSON.
1868
+ DO NOT add extra keys.
1869
+ DO NOT return partial plans.
 
 
 
 
 
 
 
 
1870
  """
1871
 
1872
 
1873
+
1874
  # 4. Inject Feedback for Self-Correction
1875
  if feedback:
1876
  system_prompt += f"\n\n🚨 **CRITICAL REVISION NEEDED:** {feedback}"