ACA050 committed on
Commit
a309487
·
verified ·
1 Parent(s): b425234

Upload 79 files

Browse files
This view is limited to 50 files because it contains too many changes.   See raw diff
Files changed (50) hide show
  1. Dockerfile +12 -0
  2. README.md +62 -12
  3. backend/__init__.py +0 -0
  4. backend/__pycache__/__init__.cpython-310.pyc +0 -0
  5. backend/__pycache__/__init__.cpython-313.pyc +0 -0
  6. backend/api/__init__.py +0 -0
  7. backend/api/__pycache__/__init__.cpython-310.pyc +0 -0
  8. backend/api/__pycache__/__init__.cpython-313.pyc +0 -0
  9. backend/api/__pycache__/main.cpython-310.pyc +0 -0
  10. backend/api/__pycache__/main.cpython-313.pyc +0 -0
  11. backend/api/main.py +80 -0
  12. backend/config/__init__.py +0 -0
  13. backend/core/__init__.py +0 -0
  14. backend/core/__pycache__/__init__.cpython-310.pyc +0 -0
  15. backend/core/__pycache__/__init__.cpython-313.pyc +0 -0
  16. backend/core/__pycache__/dataset_analyzer.cpython-310.pyc +0 -0
  17. backend/core/__pycache__/dataset_analyzer.cpython-313.pyc +0 -0
  18. backend/core/__pycache__/deployment_generator.cpython-310.pyc +0 -0
  19. backend/core/__pycache__/deployment_generator.cpython-313.pyc +0 -0
  20. backend/core/__pycache__/explainability.cpython-310.pyc +0 -0
  21. backend/core/__pycache__/explainability.cpython-313.pyc +0 -0
  22. backend/core/__pycache__/model_factory.cpython-310.pyc +0 -0
  23. backend/core/__pycache__/model_factory.cpython-313.pyc +0 -0
  24. backend/core/__pycache__/monitoring.cpython-310.pyc +0 -0
  25. backend/core/__pycache__/monitoring.cpython-313.pyc +0 -0
  26. backend/core/__pycache__/orchestrator.cpython-310.pyc +0 -0
  27. backend/core/__pycache__/orchestrator.cpython-313.pyc +0 -0
  28. backend/core/__pycache__/problem_inference.cpython-310.pyc +0 -0
  29. backend/core/__pycache__/problem_inference.cpython-313.pyc +0 -0
  30. backend/core/__pycache__/strategy_reasoner.cpython-310.pyc +0 -0
  31. backend/core/__pycache__/strategy_reasoner.cpython-313.pyc +0 -0
  32. backend/core/dataset_analyzer.py +99 -0
  33. backend/core/deployment_generator.py +30 -0
  34. backend/core/explainability.py +33 -0
  35. backend/core/model_factory.py +40 -0
  36. backend/core/monitoring.py +13 -0
  37. backend/core/orchestrator.py +88 -0
  38. backend/core/problem_inference.py +13 -0
  39. backend/core/strategy_reasoner.py +68 -0
  40. backend/experiments/__init__.py +0 -0
  41. backend/experiments/__pycache__/benchmark_runner.cpython-313.pyc +0 -0
  42. backend/experiments/benchmark_runner.py +32 -0
  43. backend/experiments/run_benchmarks.py +32 -0
  44. backend/nlp/__pycache__/evaluators.cpython-310.pyc +0 -0
  45. backend/nlp/__pycache__/evaluators.cpython-313.pyc +0 -0
  46. backend/nlp/__pycache__/preprocess.cpython-310.pyc +0 -0
  47. backend/nlp/__pycache__/preprocess.cpython-313.pyc +0 -0
  48. backend/nlp/__pycache__/trainers.cpython-310.pyc +0 -0
  49. backend/nlp/__pycache__/trainers.cpython-313.pyc +0 -0
  50. backend/nlp/embeddings.py +12 -0
Dockerfile ADDED
@@ -0,0 +1,12 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
FROM python:3.10-slim

WORKDIR /app

# Install dependencies first so Docker layer caching skips the reinstall
# when only application code changes.
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

COPY . .

# 7860 is the port Hugging Face Spaces expects the app to listen on.
EXPOSE 7860

CMD ["uvicorn", "backend.api.main:app", "--host", "0.0.0.0", "--port", "7860"]
README.md CHANGED
@@ -1,12 +1,62 @@
1
- ---
2
- title: ModelSmith AI
3
- emoji: 🐨
4
- colorFrom: pink
5
- colorTo: indigo
6
- sdk: docker
7
- pinned: false
8
- license: apache-2.0
9
- short_description: Intelligent system that designs, explains and deploys ML sol
10
- ---
11
-
12
- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ # ModelSmith AI
2
+
3
+ An intelligent ML platform that automates tabular classification and regression tasks. It analyzes datasets, recommends optimal strategies, trains models, and provides explanations.
4
+
5
+ ## Features
6
+
7
+ - **Dataset Analysis**: Automatic detection of data types, missing values, and potential issues
8
+ - **Strategy Reasoning**: Intelligent model selection based on dataset characteristics
9
+ - **Automated Training**: End-to-end model training with preprocessing pipelines
10
+ - **Explainability**: SHAP-based feature importance explanations
11
+ - **FastAPI Backend**: RESTful API for seamless integration
12
+
13
+ ## Supported Scope
14
+
15
+ - **Task**: Tabular classification and regression
16
+ - **Input**: CSV files with ≥1200 rows
17
+ - **Target**: Binary or multiclass classification, regression
18
+ - **Features**: At least 2 usable features after preprocessing
19
+
20
+ ## API Endpoints
21
+
22
+ - `POST /analyze`: Analyze dataset and get strategy recommendations
23
+ - `POST /train`: Train a model on the dataset
24
+ - `POST /explain`: Get model explanations and feature importance
25
+ - `POST /predict`: Make predictions with trained model
26
+ - `GET /health`: Health check
27
+
28
+ ## Deployment
29
+
30
+ This project is designed for deployment on Hugging Face Spaces using Docker.
31
+
32
+ ### Files for Deployment
33
+
34
+ - `Dockerfile`
35
+ - `requirements.txt`
36
+ - `backend/` (entire directory)
37
+
38
+ ### Running Locally
39
+
40
+ ```bash
41
+ pip install -r requirements.txt
42
+ uvicorn backend.api.main:app --host 0.0.0.0 --port 7860
43
+ ```
44
+
45
+ ## Limitations
46
+
47
+ - NLP functionality is disabled
48
+ - Requires datasets with ≥1200 rows
49
+ - CPU-only, no GPU support
50
+ - Stateless API (models saved temporarily)
51
+
52
+ ## Architecture
53
+
54
+ - **Orchestrator**: Main workflow coordinator
55
+ - **Dataset Analyzer**: Data profiling and preprocessing
56
+ - **Strategy Reasoner**: Model selection logic
57
+ - **Model Factory**: Training and evaluation
58
+ - **Explainability Engine**: SHAP explanations
59
+
60
+ ## License
61
+
62
+ MIT License
backend/__init__.py ADDED
File without changes
backend/__pycache__/__init__.cpython-310.pyc ADDED
Binary file (166 Bytes). View file
 
backend/__pycache__/__init__.cpython-313.pyc ADDED
Binary file (142 Bytes). View file
 
backend/api/__init__.py ADDED
File without changes
backend/api/__pycache__/__init__.cpython-310.pyc ADDED
Binary file (170 Bytes). View file
 
backend/api/__pycache__/__init__.cpython-313.pyc ADDED
Binary file (146 Bytes). View file
 
backend/api/__pycache__/main.cpython-310.pyc ADDED
Binary file (1.82 kB). View file
 
backend/api/__pycache__/main.cpython-313.pyc ADDED
Binary file (4.43 kB). View file
 
backend/api/main.py ADDED
@@ -0,0 +1,80 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
"""FastAPI surface for the ModelSmith orchestrator.

Each endpoint accepts a CSV upload plus a target column name and delegates
all analysis/training work to a single module-level Orchestrator instance.
"""
from fastapi import FastAPI, UploadFile, File, HTTPException
import pandas as pd
from backend.core.orchestrator import Orchestrator

app = FastAPI()
# Shared orchestrator; endpoints are stateless apart from the model
# artifacts the orchestrator writes under exports/.
orchestrator = Orchestrator()

@app.post("/analyze")
async def analyze_dataset(file: UploadFile = File(...), target_column: str = "target"):
    """Profile the uploaded CSV and return strategy recommendations (no training)."""
    try:
        df = pd.read_csv(file.file)
        result = orchestrator.run(df, target_column)

        # Format response for frontend
        dataset_info = result.get("dataset_info", {})
        strategy = result.get("strategy", {})

        response = {
            "columns": list(df.columns),
            # NOTE(review): the analyzer does not appear to emit "data_types"
            # or "risks" keys, so these likely always hit the defaults —
            # confirm against DatasetAnalyzer's output.
            "dataTypes": dataset_info.get("data_types", {}),
            "risks": dataset_info.get("risks", []),
            "problemType": result.get("problem_type"),
            "confidence": strategy.get("confidence", 0),
            "strategy": strategy
        }
        return response
    except Exception as e:
        # Any failure (bad CSV, validation, analysis) maps to HTTP 400.
        raise HTTPException(status_code=400, detail=str(e))

@app.post("/train")
async def train_model(file: UploadFile = File(...), target_column: str = "target"):
    """Run the full pipeline with training enabled and return metrics."""
    try:
        df = pd.read_csv(file.file)
        result = orchestrator.run(df, target_column, train=True)

        # Ensure strategy is included in the response
        strategy = result.get("strategy", {})
        response = {
            "strategy": strategy,
            "metrics": result.get("metrics", {}),
            # NOTE(review): these fallbacks are placeholder literals returned
            # verbatim when the orchestrator omits the keys — confirm the
            # frontend tolerates them.
            "model_path": result.get("model_path", "/path/to/model.pkl"),
            "training_time": result.get("training_time", 0),
            "model_id": result.get("model_id", "trained_model_123")
        }
        return response
    except Exception as e:
        raise HTTPException(status_code=400, detail=str(e))

@app.post("/explain")
async def explain_model(file: UploadFile = File(...), target_column: str = "target"):
    """Retrain on the uploaded CSV and return explanation artifacts.

    NOTE(review): this retrains from scratch on every call rather than
    reusing the model produced by /train.
    """
    try:
        df = pd.read_csv(file.file)
        result = orchestrator.run(df, target_column, train=True)
        return {
            "strategy_explanation": result.get("strategy_explanation"),
            "metrics": result.get("metrics", {}),
            "feature_importance": result.get("feature_importance", [])
        }
    except Exception as e:
        raise HTTPException(status_code=400, detail=str(e))

@app.post("/predict")
async def predict(data: dict):
    """Predict with the most recently trained model (fixed on-disk path)."""
    try:
        # Load the trained model
        model = orchestrator.model_io.load("exports/models/trained_model.pkl")
        # Prepare data for prediction
        df = pd.DataFrame([data])
        preds = model.predict(df)
        return {"prediction": preds.tolist()}
    except Exception as e:
        raise HTTPException(status_code=400, detail=str(e))

@app.get("/health")
def health():
    """Liveness probe."""
    return {"status": "ok"}
backend/config/__init__.py ADDED
File without changes
backend/core/__init__.py ADDED
File without changes
backend/core/__pycache__/__init__.cpython-310.pyc ADDED
Binary file (171 Bytes). View file
 
backend/core/__pycache__/__init__.cpython-313.pyc ADDED
Binary file (147 Bytes). View file
 
backend/core/__pycache__/dataset_analyzer.cpython-310.pyc ADDED
Binary file (3.39 kB). View file
 
backend/core/__pycache__/dataset_analyzer.cpython-313.pyc ADDED
Binary file (6.55 kB). View file
 
backend/core/__pycache__/deployment_generator.cpython-310.pyc ADDED
Binary file (1.13 kB). View file
 
backend/core/__pycache__/deployment_generator.cpython-313.pyc ADDED
Binary file (1.21 kB). View file
 
backend/core/__pycache__/explainability.cpython-310.pyc ADDED
Binary file (912 Bytes). View file
 
backend/core/__pycache__/explainability.cpython-313.pyc ADDED
Binary file (1.72 kB). View file
 
backend/core/__pycache__/model_factory.cpython-310.pyc ADDED
Binary file (1.81 kB). View file
 
backend/core/__pycache__/model_factory.cpython-313.pyc ADDED
Binary file (2.07 kB). View file
 
backend/core/__pycache__/monitoring.cpython-310.pyc ADDED
Binary file (645 Bytes). View file
 
backend/core/__pycache__/monitoring.cpython-313.pyc ADDED
Binary file (815 Bytes). View file
 
backend/core/__pycache__/orchestrator.cpython-310.pyc ADDED
Binary file (2.86 kB). View file
 
backend/core/__pycache__/orchestrator.cpython-313.pyc ADDED
Binary file (4.74 kB). View file
 
backend/core/__pycache__/problem_inference.cpython-310.pyc ADDED
Binary file (611 Bytes). View file
 
backend/core/__pycache__/problem_inference.cpython-313.pyc ADDED
Binary file (786 Bytes). View file
 
backend/core/__pycache__/strategy_reasoner.cpython-310.pyc ADDED
Binary file (2.04 kB). View file
 
backend/core/__pycache__/strategy_reasoner.cpython-313.pyc ADDED
Binary file (3.02 kB). View file
 
backend/core/dataset_analyzer.py ADDED
@@ -0,0 +1,99 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ import pandas as pd
2
+ import numpy as np
3
+ from backend.utils.logger import logger
4
+
def convert_numpy_types(obj):
    """Recursively coerce numpy scalars/arrays into plain Python values.

    Makes analysis results JSON-serializable before they reach the API
    layer. Non-numpy values pass through unchanged.
    """
    if isinstance(obj, np.ndarray):
        return obj.tolist()
    if isinstance(obj, (np.integer, np.int64, np.int32)):
        return int(obj)
    if isinstance(obj, (np.floating, np.float64, np.float32)):
        return float(obj)
    if isinstance(obj, np.bool_):
        return bool(obj)
    if isinstance(obj, dict):
        return {k: convert_numpy_types(v) for k, v in obj.items()}
    if isinstance(obj, list):
        return [convert_numpy_types(v) for v in obj]
    return obj
21
+
class DatasetAnalyzer:
    """Cleans and profiles a dataframe for downstream strategy reasoning."""

    def analyze(self, df: pd.DataFrame, target_column: str = None):
        """Return a JSON-serializable profile of *df*.

        Cleaning steps (drop/drop_duplicates return new frames, so the
        caller's dataframe is not mutated): drop all-null columns,
        duplicate rows, and constant columns.

        Raises:
            ValueError: if fewer than 2 usable non-target features remain.
        """
        logger.info("Starting dataset analysis...")

        # Drop columns that are entirely null — they carry no signal.
        null_columns = df.columns[df.isnull().all()]
        if len(null_columns) > 0:
            logger.warning(f"Removing all-null columns: {list(null_columns)}")
            df = df.drop(columns=null_columns)

        # Drop exact duplicate rows.
        duplicate_rows = df.duplicated().sum()
        if duplicate_rows > 0:
            logger.warning(f"Removing {duplicate_rows} duplicate rows")
            df = df.drop_duplicates()

        # Drop constant (single-unique-value) columns.
        constant_columns = [col for col in df.columns if df[col].nunique() == 1]
        if len(constant_columns) > 0:
            logger.warning(f"Removing constant columns: {constant_columns}")
            df = df.drop(columns=constant_columns)

        # Require at least 2 usable features besides the target.
        usable_features = [col for col in df.columns if col != target_column]
        if len(usable_features) < 2:
            raise ValueError(f"Insufficient features: only {len(usable_features)} usable features after preprocessing, need at least 2")

        info = {}
        info["num_rows"] = df.shape[0]
        info["num_columns"] = df.shape[1]
        info["missing_ratio"] = df.isnull().mean().mean()
        info["row_count"] = df.shape[0]
        # Heuristic thresholds consumed by StrategyReasoner.
        info["high_dimensional"] = bool(df.shape[1] > 50)
        info["small_data"] = bool(df.shape[0] < 1200)
        info["sparse_data"] = bool(df.isnull().mean().mean() > 0.4)
        all_numeric_cols = df.select_dtypes(include="number").columns.tolist()
        all_categorical_cols = df.select_dtypes(exclude="number").columns.tolist()
        info["numeric_cols"] = [col for col in all_numeric_cols if col != target_column]
        info["categorical_cols"] = [col for col in all_categorical_cols if col != target_column]

        if len(info["numeric_cols"]) + len(info["categorical_cols"]) < 2:
            raise ValueError("Dataset must have at least 2 usable features after preprocessing")

        # Unique-value count per remaining column (target included).
        cardinality = {col: df[col].nunique() for col in df.columns}
        info["cardinality"] = cardinality

        # Target-specific checks
        if target_column and target_column in df.columns:
            target = df[target_column]
            unique_vals = target.nunique()
            # Fix: use is_numeric_dtype instead of comparing dtype names to
            # ['int64', 'float64'], which misclassified int32/float32
            # numeric targets as classification.
            if pd.api.types.is_numeric_dtype(target) and unique_vals > 10:
                info["target_type"] = "regression"
                info["class_distribution"] = None
                info["imbalance"] = None
            else:
                info["target_type"] = "classification"
                value_counts = target.value_counts(normalize=True)
                info["class_distribution"] = value_counts.to_dict()
                # Flag imbalance when one class dominates (> 80%).
                info["imbalance"] = bool(value_counts.max() > 0.8)
        else:
            info["target_type"] = None
            info["class_distribution"] = None
            info["imbalance"] = None

        # NLP detection heuristic: a long average string length suggests the
        # column holds free text. (Removed an unused `avg_text_len` local.)
        text_columns = []
        for col in info["categorical_cols"]:
            if df[col].astype(str).str.len().mean() > 30:
                text_columns.append(col)
        info["text_columns"] = text_columns
        info["possible_nlp"] = len(text_columns) > 0

        return convert_numpy_types(info)
backend/core/deployment_generator.py ADDED
@@ -0,0 +1,30 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ class DeploymentGenerator:
2
+ def generate_fastapi_app(self, model_path):
3
+ template = f'''
4
+ from fastapi import FastAPI
5
+ import joblib
6
+ import pandas as pd
7
+
8
+ app = FastAPI()
9
+ model = joblib.load("{model_path}")
10
+
11
+ @app.post("/predict")
12
+ async def predict(data: dict):
13
+ df = pd.DataFrame([data])
14
+ preds = model.predict(df)
15
+ return {{"prediction": preds.tolist()}}
16
+ '''
17
+ return template
18
+
19
+ def generate_dockerfile(self):
20
+ return '''
21
+ FROM python:3.9
22
+ WORKDIR /app
23
+ COPY . /app
24
+ RUN pip install fastapi uvicorn pandas scikit-learn joblib
25
+ CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8000"]
26
+ '''
27
+
28
+
29
+
30
+
backend/core/explainability.py ADDED
@@ -0,0 +1,33 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ import shap
2
+ import numpy as np
3
+
class ExplainabilityEngine:
    """Computes global feature-importance scores via SHAP."""

    def explain_tabular(self, model_pipeline, X_sample):
        """Return mean |SHAP value| per transformed feature.

        Expects *model_pipeline* to be a fitted sklearn-style Pipeline with
        "preprocessor" and "model" steps.

        Raises:
            ValueError: if the sample is empty (before or after
                preprocessing) or SHAP yields no output.
        """
        if X_sample.empty:
            raise ValueError("Sample data is empty, cannot compute explanations")

        # Pull the fitted stages out of the pipeline.
        steps = model_pipeline.named_steps
        transformed = steps["preprocessor"].transform(X_sample)

        if transformed.shape[0] == 0:
            raise ValueError("Transformed sample data is empty after preprocessing")

        explainer = shap.Explainer(steps["model"], transformed)
        shap_values = explainer(transformed, check_additivity=False)

        if shap_values is None or shap_values.values is None:
            raise ValueError("SHAP computation failed")

        # Global importance = mean absolute SHAP value across the sample.
        importance = np.abs(shap_values.values).mean(axis=0).tolist()

        if len(importance) == 0:
            raise ValueError("No feature importance computed")

        return importance
backend/core/model_factory.py ADDED
@@ -0,0 +1,40 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ import os
2
+ from sklearn.model_selection import train_test_split
3
+ from ..tabular.pipelines import build_preprocessing_pipeline
4
+ from ..tabular.trainers import train_model
5
+ from ..tabular.evaluators import evaluate_model
6
+ from ..nlp.trainers import TextClassifier
7
+ from ..nlp.evaluators import evaluate_nlp_model
8
+ from ..utils.model_io import ModelIO
9
+
class ModelFactory:
    """Builds, trains, evaluates, and persists models for tabular problems."""

    def __init__(self):
        # Handles model (de)serialization to disk.
        self.model_io = ModelIO()

    def build_and_train(self, df, target_column, dataset_info, problem_type, strategy):
        """Train a model on *df* and return ``(fitted_model, metrics)``.

        Also persists the model to exports/models/trained_model.pkl.

        Raises:
            ValueError: if the dataset is flagged small (< 1200 rows) or the
                problem type is "nlp" (disabled in this build).
        """
        if dataset_info["small_data"]:
            raise ValueError("Dataset is too small for training. Minimum 1200 rows required.")

        if problem_type == "nlp":
            raise ValueError("NLP functionality is not supported in this version.")
        else:
            # Tabular
            X = df.drop(columns=[target_column])
            y = df[target_column]

            # Fixed 80/20 split, seeded for reproducibility.
            X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

            pipeline = build_preprocessing_pipeline(dataset_info["numeric_cols"], dataset_info["categorical_cols"])
            # NOTE(review): the preprocessing pipeline is fitted here and then
            # handed to train_model, which presumably fits the final estimator
            # — confirm the preprocessor is not refit downstream.
            pipeline.fit(X_train, y_train)

            model = train_model(pipeline, X_train, y_train, problem_type, strategy)
            metrics = evaluate_model(model, X_test, y_test, problem_type)

            # Save model
            self.model_io.save(model, "exports/models/trained_model.pkl")

            return model, metrics
backend/core/monitoring.py ADDED
@@ -0,0 +1,13 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ import numpy as np
2
+
class MonitoringEngine:
    """Simple drift detector comparing per-feature summary statistics."""

    def detect_drift(self, train_stats, new_data_stats, threshold=0.2):
        """Return ``{feature: True}`` for features whose statistic moved more
        than *threshold* between training and new data.

        Features missing from *new_data_stats* fall back to their training
        value and therefore never flag as drifted.
        """
        flagged = {}
        for name, baseline in train_stats.items():
            current = new_data_stats.get(name, baseline)
            if abs(baseline - current) > threshold:
                flagged[name] = True
        return flagged
backend/core/orchestrator.py ADDED
@@ -0,0 +1,88 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ from .dataset_analyzer import DatasetAnalyzer
2
+ from .problem_inference import ProblemInference
3
+ from .strategy_reasoner import StrategyReasoner
4
+ from .model_factory import ModelFactory
5
+ from .explainability import ExplainabilityEngine
6
+ from .deployment_generator import DeploymentGenerator
7
+ from .monitoring import MonitoringEngine
8
+ from ..utils.logger import logger
9
+ from ..utils.validators import DataValidator
10
+ from ..utils.model_io import ModelIO
11
+ import json
12
+ import os
13
+
class Orchestrator:
    """Coordinates the pipeline: validate -> analyze -> infer -> strategize,
    and optionally -> train -> explain -> export deployment artifacts."""

    def __init__(self):
        self.validator = DataValidator()
        self.analyzer = DatasetAnalyzer()
        self.inferencer = ProblemInference()
        self.reasoner = StrategyReasoner()
        self.model_factory = ModelFactory()
        self.explainer = ExplainabilityEngine()
        self.deployer = DeploymentGenerator()
        self.monitor = MonitoringEngine()
        self.model_io = ModelIO()

    def run(self, df, target_column, train=False):
        """Run the pipeline on *df*.

        Args:
            df: input dataframe.
            target_column: name of the target column; also used to name the
                strategy log file under experiments/logs/.
            train: when True, additionally trains a model, computes SHAP
                feature importance, and writes model + deployment artifacts
                under exports/.

        Returns:
            A dict with dataset_info, problem_type, strategy,
            strategy_tradeoffs, and (when train=True) metrics,
            strategy_explanation, and feature_importance.
        """
        self.validator.validate_dataframe(df, target_column)
        logger.info("Validation passed")
        dataset_info = self.analyzer.analyze(df, target_column)
        problem_type = self.inferencer.infer(dataset_info, target_column)
        strategy = self.reasoner.decide(dataset_info, problem_type)

        tradeoff_explanation = self.reasoner.explain_tradeoffs(strategy)

        # Log strategy behavior
        log_data = {
            "dataset_characteristics": dataset_info,
            "chosen_model_family": strategy.get("model_family"),
            "detected_risks": strategy.get("risks", []),
            "confidence_score": strategy.get("confidence", 0)
        }
        os.makedirs("experiments/logs", exist_ok=True)
        # default=str keeps the dump from failing on non-JSON values.
        with open(f"experiments/logs/{target_column}_strategy.json", "w") as f:
            json.dump(log_data, f, indent=4, default=str)

        response = {
            "dataset_info": dataset_info,
            "problem_type": problem_type,
            "strategy": strategy,
            "strategy_tradeoffs": tradeoff_explanation
        }

        if problem_type == "nlp":
            response["nlp_mode"] = "activated"

        if train:
            model, metrics = self.model_factory.build_and_train(
                df, target_column, dataset_info, problem_type, strategy
            )
            response["metrics"] = metrics

            explanation = self.reasoner.explain_strategy(strategy)
            response["strategy_explanation"] = explanation

            X_sample = df.drop(columns=[target_column]).head(100)  # Sample for SHAP
            feature_importance = self.explainer.explain_tabular(model, X_sample)
            response["feature_importance"] = feature_importance

            # Save the trained model
            # NOTE(review): ModelFactory.build_and_train already saves to this
            # same path — the second save looks redundant but harmless.
            os.makedirs("exports/models", exist_ok=True)
            os.makedirs("exports/deployment", exist_ok=True)
            model_path = "exports/models/trained_model.pkl"
            self.model_io.save(model, model_path)

            # Generate deployment artifacts
            fastapi_app = self.deployer.generate_fastapi_app(model_path)
            dockerfile = self.deployer.generate_dockerfile()

            with open("exports/deployment/main.py", "w") as f:
                f.write(fastapi_app)
            with open("exports/deployment/Dockerfile", "w") as f:
                f.write(dockerfile)

        return response
backend/core/problem_inference.py ADDED
@@ -0,0 +1,13 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
class ProblemInference:
    """Maps dataset characteristics to a high-level problem type."""

    def infer(self, dataset_info, target_column):
        """Return "nlp", "classification", "regression", or "unknown"."""
        # Detected free-text columns take precedence over everything else.
        if dataset_info.get("possible_nlp"):
            return "nlp"

        if not target_column:
            return "unknown"

        # The analyzer records a class distribution only for classification
        # targets, so its presence distinguishes the two tabular cases.
        has_classes = bool(dataset_info.get("class_distribution"))
        return "classification" if has_classes else "regression"
backend/core/strategy_reasoner.py ADDED
@@ -0,0 +1,68 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
class StrategyReasoner:
    """Chooses a model family from dataset risks and explains the choice."""

    def decide(self, dataset_info, problem_type):
        """Return a strategy dict for *problem_type* given *dataset_info*.

        Returns:
            dict with keys "model_family", "reason", "risks" (list of
            detected risk labels), and "confidence" (1.0 minus a risk
            penalty, floored at 0.1).
        """
        risks = []
        score = 0.0

        if dataset_info.get("small_data"):
            risks.append("small_dataset")
            score += 0.1

        if dataset_info.get("high_dimensional"):
            risks.append("high_dimensionality")
            score += 0.1

        if dataset_info.get("imbalance"):
            risks.append("class_imbalance")
            score += 0.2

        if dataset_info.get("sparse_data"):
            risks.append("high_missingness")
            score += 0.2

        # Fix: the original left model_family/reason unbound for any
        # problem_type outside the three handled branches (e.g. the
        # "unknown" that ProblemInference can return), raising NameError.
        # Provide a safe default instead.
        model_family = "baseline"
        reason = "No specific strategy for this problem type; defaulting to a simple baseline"

        if problem_type == "classification":
            if "small_dataset" in risks:
                model_family = "tree_ensemble"
                reason = "Small datasets benefit from simpler models"
            elif "high_dimensionality" in risks:
                model_family = "tree_ensemble"
                reason = "Tree ensembles handle high-dimensional data better"
            else:
                model_family = "tree_ensemble"
                reason = "Tree ensembles handle complexity well"

        elif problem_type == "regression":
            if "high_dimensionality" in risks:
                model_family = "tree_ensemble"
                reason = "Tree ensembles handle high-dimensional data better"
            else:
                model_family = "linear_or_tree"
                reason = "Balances interpretability and accuracy"

        elif problem_type == "nlp":
            model_family = "transformer"
            reason = "Transformers best capture language semantics"

        strategy = {}
        strategy["model_family"] = model_family
        strategy["reason"] = reason
        strategy["risks"] = risks
        # Confidence shrinks with accumulated risk; capped so it never
        # drops below 0.1.
        strategy["confidence"] = round(1 - min(score, 0.9), 2)

        return strategy

    def explain_strategy(self, strategy):
        """Return a one/two-sentence explanation of the chosen strategy."""
        explanation = f"Selected {strategy['model_family']} models because: {strategy['reason']}."
        if strategy.get("risks"):
            explanation += f" Identified risks: {', '.join(strategy['risks'])}."
        return explanation

    def explain_tradeoffs(self, strategy):
        """Like explain_strategy, but also reports the confidence score."""
        explanation = f"Chose {strategy['model_family']} due to: {strategy['reason']}."
        if strategy.get("risks"):
            explanation += f" Risks detected: {', '.join(strategy['risks'])}."
        explanation += f" Confidence score: {strategy.get('confidence')}."
        return explanation
backend/experiments/__init__.py ADDED
File without changes
backend/experiments/__pycache__/benchmark_runner.cpython-313.pyc ADDED
Binary file (1.47 kB). View file
 
backend/experiments/benchmark_runner.py ADDED
@@ -0,0 +1,32 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ import time
2
+
class BenchmarkRunner:
    """Runs an orchestrator across several datasets, recording timing and errors."""

    def run(self, orchestrator, datasets):
        """Benchmark *orchestrator* on *datasets* (``{name: (df, target)}``).

        Returns one result dict per dataset with keys "dataset", "strategy",
        "metrics", "time" (seconds, 2 dp), and "error". Failures are captured
        in the "error" field rather than aborting the sweep.
        """
        results = []
        for name, (df, target) in datasets.items():
            started = time.time()
            strategy = metrics = error = None
            try:
                output = orchestrator.run(df, target, train=True)
                strategy = output.get("strategy")
                metrics = output.get("metrics")
            except Exception as exc:
                error = str(exc)
            elapsed = round(time.time() - started, 2)
            results.append({
                "dataset": name,
                "strategy": strategy,
                "metrics": metrics,
                "time": elapsed,
                "error": error,
            })
        return results
backend/experiments/run_benchmarks.py ADDED
@@ -0,0 +1,32 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
"""Benchmark driver: runs the Orchestrator over several real-world CSVs."""
import pandas as pd
# NOTE(review): this flat import only resolves when the script is executed
# directly (the script's own directory is then on sys.path) — confirm it is
# never imported as a package module.
from benchmark_runner import BenchmarkRunner
import sys
import os
# Make the repository root importable so backend.* absolute imports resolve.
sys.path.append(os.path.join(os.path.dirname(__file__), '..', '..'))
from backend.core.orchestrator import Orchestrator

# Load datasets
# Each entry maps a benchmark name to (dataframe, target column name).
datasets = {
    "titanic": (pd.read_csv(os.path.join(os.path.dirname(__file__), '..', '..', 'datasets', 'real_world', 'titanic.csv')), "Survived"),
    "credit_default": (pd.read_csv(os.path.join(os.path.dirname(__file__), '..', '..', 'datasets', 'real_world', 'credit_default.csv')), "default.payment.next.month"),
    "house_prices": (pd.read_csv(os.path.join(os.path.dirname(__file__), '..', '..', 'datasets', 'real_world', 'house_prices.csv')), "Price"),
    "telecom_churn": (pd.read_csv(os.path.join(os.path.dirname(__file__), '..', '..', 'datasets', 'real_world', 'telecom_churn.csv')), "Churn"),
    "news_classification": (pd.read_csv(os.path.join(os.path.dirname(__file__), '..', '..', 'datasets', 'real_world', 'news_classification.csv')), "label"),
}

orchestrator = Orchestrator()
runner = BenchmarkRunner()
results = runner.run(orchestrator, datasets)

print("Benchmark Results:")
for result in results:
    print(result)

# Save results to file
# NOTE(review): assumes an "experiments/" directory already exists in the
# CWD and that all metric values are JSON-serializable — confirm both.
with open("experiments/benchmark_results.json", "w") as f:
    import json
    json.dump(results, f, indent=4)
backend/nlp/__pycache__/evaluators.cpython-310.pyc ADDED
Binary file (747 Bytes). View file
 
backend/nlp/__pycache__/evaluators.cpython-313.pyc ADDED
Binary file (1.1 kB). View file
 
backend/nlp/__pycache__/preprocess.cpython-310.pyc ADDED
Binary file (584 Bytes). View file
 
backend/nlp/__pycache__/preprocess.cpython-313.pyc ADDED
Binary file (816 Bytes). View file
 
backend/nlp/__pycache__/trainers.cpython-310.pyc ADDED
Binary file (2.56 kB). View file
 
backend/nlp/__pycache__/trainers.cpython-313.pyc ADDED
Binary file (3.12 kB). View file
 
backend/nlp/embeddings.py ADDED
@@ -0,0 +1,12 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ from sentence_transformers import SentenceTransformer
2
+
class EmbeddingEngine:
    """Thin wrapper around a SentenceTransformer model for text embeddings."""

    def __init__(self, model_name="all-MiniLM-L6-v2"):
        # Instantiating SentenceTransformer downloads the model on first
        # use — requires network access in that case.
        self.model = SentenceTransformer(model_name)

    def encode(self, texts):
        # Returns whatever SentenceTransformer.encode produces — presumably
        # a numpy array of embeddings; confirm against the library version.
        return self.model.encode(texts)