init
Browse files- Dockerfile +35 -0
- README.md +129 -11
- README_spaces.md +26 -0
- app.py +11 -0
- main.py +268 -0
- models/data_processor.py +178 -0
- models/forecast_models.py +586 -0
- requirements.txt +23 -0
- run.py +53 -0
- test_api.py +125 -0
- train_catboost.py +316 -0
- utils/config.py +45 -0
- utils/logger.py +31 -0
Dockerfile
ADDED
|
@@ -0,0 +1,35 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
# Dockerfile for AgriPredict Analysis Service
FROM python:3.10-slim

# Set working directory
WORKDIR /app

# Install system dependencies.
# gcc/g++ are needed to build wheels for scientific packages;
# curl is required by the HEALTHCHECK below and is NOT present in the
# slim base image — without it the health check always fails.
RUN apt-get update && apt-get install -y --no-install-recommends \
    gcc \
    g++ \
    curl \
    && rm -rf /var/lib/apt/lists/*

# Copy requirements first for better layer caching
COPY requirements.txt .

# Install Python dependencies
RUN pip install --no-cache-dir -r requirements.txt

# Copy application code
COPY . .

# Create non-root user and hand over ownership of the app directory
RUN useradd --create-home --shell /bin/bash app \
    && chown -R app:app /app
USER app

# main.py binds to $PORT (defaulting to 8000); set it so the app actually
# listens on the port that is exposed and health-checked.
ENV PORT=7860

# Expose port (Hugging Face Spaces convention)
EXPOSE 7860

# Health check against the FastAPI /health endpoint
HEALTHCHECK --interval=30s --timeout=30s --start-period=5s --retries=3 \
    CMD curl -f http://localhost:7860/health || exit 1

# Start the application
CMD ["python", "main.py"]
|
README.md
CHANGED
|
@@ -1,11 +1,129 @@
|
|
| 1 |
-
|
| 2 |
-
|
| 3 |
-
|
| 4 |
-
|
| 5 |
-
|
| 6 |
-
|
| 7 |
-
|
| 8 |
-
|
| 9 |
-
-
|
| 10 |
-
|
| 11 |
-
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
# AgriPredict Analysis Service
|
| 2 |
+
|
| 3 |
+
A FastAPI-based service for advanced agricultural demand forecasting using multiple ML models including ensemble methods, statistical models, and machine learning algorithms.
|
| 4 |
+
|
| 5 |
+
## Features
|
| 6 |
+
|
| 7 |
+
- **Multi-Model Forecasting**: Ensemble, ARIMA, Exponential Smoothing, CatBoost, and more
|
| 8 |
+
- **Scenario Planning**: Optimistic, pessimistic, and realistic forecast scenarios
|
| 9 |
+
- **Confidence Intervals**: Uncertainty quantification for all predictions
|
| 10 |
+
- **Revenue Projections**: Automatic revenue forecasting based on demand predictions
|
| 11 |
+
- **Real-time Processing**: Asynchronous processing for high performance
|
| 12 |
+
- **RESTful API**: Clean, documented API endpoints
|
| 13 |
+
|
| 14 |
+
## API Endpoints
|
| 15 |
+
|
| 16 |
+
### Health Check
|
| 17 |
+
```
|
| 18 |
+
GET /health
|
| 19 |
+
```
|
| 20 |
+
Returns service health status and version information.
|
| 21 |
+
|
| 22 |
+
### Generate Forecast
|
| 23 |
+
```
|
| 24 |
+
POST /forecast
|
| 25 |
+
```
|
| 26 |
+
Generate demand forecast using specified models and parameters.
|
| 27 |
+
|
| 28 |
+
**Request Body:**
|
| 29 |
+
```json
|
| 30 |
+
{
|
| 31 |
+
"product_id": "string",
|
| 32 |
+
"historical_data": [
|
| 33 |
+
{
|
| 34 |
+
"date": "2024-01-01",
|
| 35 |
+
"quantity": 100.0,
|
| 36 |
+
"price": 25.0
|
| 37 |
+
}
|
| 38 |
+
],
|
| 39 |
+
"days": 30,
|
| 40 |
+
"selling_price": 25.0,
|
| 41 |
+
"models": ["ensemble"],
|
| 42 |
+
"include_confidence": true,
|
| 43 |
+
"scenario": "realistic"
|
| 44 |
+
}
|
| 45 |
+
```
|
| 46 |
+
|
| 47 |
+
### List Models
|
| 48 |
+
```
|
| 49 |
+
GET /models
|
| 50 |
+
```
|
| 51 |
+
Returns list of available forecasting models.
|
| 52 |
+
|
| 53 |
+
## Models Available
|
| 54 |
+
|
| 55 |
+
1. **Ensemble** - Combines multiple models for best accuracy
|
| 56 |
+
2. **SMA** - Simple Moving Average (basic trend analysis)
|
| 57 |
+
3. **WMA** - Weighted Moving Average (recent data weighted more)
|
| 58 |
+
4. **ES** - Exponential Smoothing (seasonal trend analysis)
|
| 59 |
+
5. **ARIMA** - Statistical time series model
|
| 60 |
+
6. **CatBoost** - Machine learning model (ready for training)
|
| 61 |
+
|
| 62 |
+
## Usage
|
| 63 |
+
|
| 64 |
+
### Local Development
|
| 65 |
+
|
| 66 |
+
1. Install dependencies:
|
| 67 |
+
```bash
|
| 68 |
+
pip install -r requirements.txt
|
| 69 |
+
```
|
| 70 |
+
|
| 71 |
+
2. Run the service:
|
| 72 |
+
```bash
|
| 73 |
+
python main.py
|
| 74 |
+
```
|
| 75 |
+
|
| 76 |
+
The API will be available at `http://localhost:8000` (or whatever port is set in the `PORT` environment variable).
|
| 77 |
+
|
| 78 |
+
### API Documentation
|
| 79 |
+
|
| 80 |
+
Once running, visit `http://localhost:8000/docs` for interactive API documentation.
|
| 81 |
+
|
| 82 |
+
## Deployment
|
| 83 |
+
|
| 84 |
+
This service is designed to run on Hugging Face Spaces with the following configuration:
|
| 85 |
+
|
| 86 |
+
- **Runtime**: Python 3.10+
|
| 87 |
+
- **Framework**: FastAPI
|
| 88 |
+
- **GPU**: Not required (CPU-only ML models)
|
| 89 |
+
- **Memory**: 2GB minimum recommended
|
| 90 |
+
|
| 91 |
+
## Training the CatBoost Model
|
| 92 |
+
|
| 93 |
+
The CatBoost model is currently using a placeholder implementation. To train it with real data:
|
| 94 |
+
|
| 95 |
+
1. Prepare your training dataset with features like:
|
| 96 |
+
- Historical prices and quantities
|
| 97 |
+
- Date-based features (day of week, month, etc.)
|
| 98 |
+
- Lag features (previous days' data)
|
| 99 |
+
- Rolling statistics
|
| 100 |
+
|
| 101 |
+
2. Train the model using the prepared dataset
|
| 102 |
+
|
| 103 |
+
3. Replace the placeholder implementation in `models/forecast_models.py`
|
| 104 |
+
|
| 105 |
+
## Architecture
|
| 106 |
+
|
| 107 |
+
```
|
| 108 |
+
analysis-service/
|
| 109 |
+
├── main.py # FastAPI application
|
| 110 |
+
├── models/
|
| 111 |
+
│ ├── forecast_models.py # Forecasting algorithms
|
| 112 |
+
│ └── data_processor.py # Data processing utilities
|
| 113 |
+
├── utils/
|
| 114 |
+
│ ├── config.py # Configuration settings
|
| 115 |
+
│ └── logger.py # Logging setup
|
| 116 |
+
└── requirements.txt # Python dependencies
|
| 117 |
+
```
|
| 118 |
+
|
| 119 |
+
## Contributing
|
| 120 |
+
|
| 121 |
+
1. Fork the repository
|
| 122 |
+
2. Create a feature branch
|
| 123 |
+
3. Make your changes
|
| 124 |
+
4. Add tests if applicable
|
| 125 |
+
5. Submit a pull request
|
| 126 |
+
|
| 127 |
+
## License
|
| 128 |
+
|
| 129 |
+
MIT License - see LICENSE file for details.
|
README_spaces.md
ADDED
|
@@ -0,0 +1,26 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
title: AgriPredict Analysis Service
|
| 2 |
+
emoji: 🌾
|
| 3 |
+
colorFrom: green
|
| 4 |
+
colorTo: blue
|
| 5 |
+
sdk: docker
sdk_version: null
|
| 7 |
+
app_file: main.py
|
| 8 |
+
pinned: false
|
| 9 |
+
|
| 10 |
+
# Hugging Face Spaces Configuration for AgriPredict Analysis Service
|
| 11 |
+
# This service provides advanced agricultural demand forecasting
|
| 12 |
+
|
| 13 |
+
# Python version requirement
|
| 14 |
+
python_version: "3.10"
|
| 15 |
+
|
| 16 |
+
# Build configuration
|
| 17 |
+
build:
|
| 18 |
+
python_version: "3.10"
|
| 19 |
+
|
| 20 |
+
# Environment variables
|
| 21 |
+
env:
|
| 22 |
+
PORT: 7860
|
| 23 |
+
PYTHONPATH: /app
|
| 24 |
+
|
| 25 |
+
# Startup command
|
| 26 |
+
start_command: "python main.py"
|
app.py
ADDED
|
@@ -0,0 +1,11 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
---
|
| 2 |
+
title: AgriPredict Analysis Service
|
| 3 |
+
emoji: 🌾
|
| 4 |
+
colorFrom: green
|
| 5 |
+
colorTo: blue
|
| 6 |
+
sdk: docker
|
| 7 |
+
sdk_version: null
|
| 8 |
+
app_file: main.py
|
| 9 |
+
pinned: false
|
| 10 |
+
license: mit
|
| 11 |
+
---
|
main.py
ADDED
|
@@ -0,0 +1,268 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
"""
|
| 2 |
+
AgriPredict Analysis Service
|
| 3 |
+
A FastAPI-based service for agricultural demand forecasting using multiple ML models.
|
| 4 |
+
"""
|
| 5 |
+
|
| 6 |
+
import logging
import os
from contextlib import asynccontextmanager
from datetime import datetime, timedelta, timezone
from typing import List, Dict, Any, Optional

import numpy as np
import pandas as pd
from fastapi import FastAPI, HTTPException, Depends
from fastapi.middleware.cors import CORSMiddleware
from fastapi.responses import JSONResponse
from pydantic import BaseModel, Field

# Import our custom modules
from models.forecast_models import ForecastEngine
from models.data_processor import DataProcessor
from utils.config import settings
from utils.logger import setup_logger
|
| 23 |
+
|
| 24 |
+
# Setup logging
|
| 25 |
+
logger = setup_logger(__name__)
|
| 26 |
+
|
| 27 |
+
# Lifespan context manager for startup/shutdown events
|
| 28 |
+
@asynccontextmanager
async def lifespan(app: FastAPI):
    """Application lifespan hook.

    Logs one message when the service starts and another once it shuts
    down cleanly; no other resources are acquired or released here.
    """
    logger.info("Starting AgriPredict Analysis Service")
    yield
    logger.info("Shutting down AgriPredict Analysis Service")
|
| 35 |
+
|
| 36 |
+
# Create FastAPI app
|
| 37 |
+
# Create FastAPI app
app = FastAPI(
    title="AgriPredict Analysis Service",
    description="Advanced agricultural demand forecasting using ensemble ML models",
    version="1.0.0",
    lifespan=lifespan
)

# CORS middleware for Next.js integration.
# NOTE: Starlette's CORSMiddleware does not expand wildcards inside
# allow_origins entries, so "https://*.huggingface.co" would never match any
# real subdomain; subdomain matching must go through allow_origin_regex.
app.add_middleware(
    CORSMiddleware,
    allow_origins=[
        "http://localhost:3000",
        "http://localhost:3001",
        "https://huggingface.co",
        # Deploy-time origin. The "*" fallback combined with
        # allow_credentials=True is rejected by browsers for credentialed
        # requests — set FRONTEND_URL explicitly in production.
        os.getenv("FRONTEND_URL", "*")
    ],
    allow_origin_regex=r"https://.*\.huggingface\.co",
    allow_credentials=True,
    allow_methods=["*"],
    allow_headers=["*"],
)
|
| 58 |
+
|
| 59 |
+
# Data Models
|
| 60 |
+
class DemandData(BaseModel):
    """A single historical demand observation for one product on one date."""
    date: str = Field(..., description="ISO date string")
    quantity: float = Field(..., gt=0, description="Demand quantity")
    price: float = Field(..., gt=0, description="Price per unit")
|
| 64 |
+
|
| 65 |
+
class ForecastRequest(BaseModel):
    """Request payload for POST /forecast.

    Carries the product's historical demand series plus options controlling
    which models run, the forecast horizon, and the applied scenario.
    """
    product_id: str = Field(..., description="Product identifier")
    historical_data: List[DemandData] = Field(..., min_items=3, description="Historical demand data")
    days: int = Field(..., ge=1, le=365, description="Forecast horizon in days")
    selling_price: Optional[float] = Field(None, gt=0, description="Selling price for revenue calculation")
    # NOTE(review): date_from/date_to are accepted but never referenced by the
    # /forecast handler in this file — presumably reserved for future
    # filtering; confirm before relying on them.
    date_from: Optional[str] = Field(None, description="Start date for historical data filter")
    date_to: Optional[str] = Field(None, description="End date for historical data filter")
    models: Optional[List[str]] = Field(["ensemble"], description="Models to use for forecasting")
    include_confidence: Optional[bool] = Field(True, description="Include confidence intervals")
    scenario: Optional[str] = Field("realistic", description="Forecast scenario")
|
| 75 |
+
|
| 76 |
+
class ForecastDataPoint(BaseModel):
    """One forecasted point in the returned series."""
    date: str = Field(..., description="Forecast date")
    predicted_value: float = Field(..., description="Predicted demand/price")
    confidence_lower: Optional[float] = Field(None, description="Lower confidence bound")
    confidence_upper: Optional[float] = Field(None, description="Upper confidence bound")
    model_used: Optional[str] = Field(None, description="Model that generated this prediction")
|
| 82 |
+
|
| 83 |
+
class RevenueProjection(BaseModel):
    """Projected revenue for a single forecast date."""
    date: str = Field(..., description="Projection date")
    projected_quantity: float = Field(..., description="Projected quantity")
    selling_price: float = Field(..., description="Selling price")
    projected_revenue: float = Field(..., description="Projected revenue")
    confidence_lower: Optional[float] = Field(None, description="Lower revenue confidence")
    confidence_upper: Optional[float] = Field(None, description="Upper revenue confidence")
|
| 90 |
+
|
| 91 |
+
class ForecastResponse(BaseModel):
    """Response body for POST /forecast."""
    forecast_data: List[ForecastDataPoint] = Field(..., description="Forecast data points")
    revenue_projection: Optional[List[RevenueProjection]] = Field(None, description="Revenue projections")
    models_used: List[str] = Field(..., description="Models used in forecasting")
    summary: str = Field(..., description="AI-generated summary in Markdown")
    confidence: Optional[float] = Field(None, description="Overall forecast confidence")
    scenario: Optional[str] = Field(None, description="Applied scenario")
    metadata: Optional[Dict[str, Any]] = Field(None, description="Additional metadata")
|
| 99 |
+
|
| 100 |
+
# Dependency injection
|
| 101 |
+
def get_forecast_engine() -> ForecastEngine:
    """FastAPI dependency: provide a fresh ForecastEngine per request."""
    engine = ForecastEngine()
    return engine
|
| 104 |
+
|
| 105 |
+
def get_data_processor() -> DataProcessor:
    """FastAPI dependency: provide a fresh DataProcessor per request."""
    processor = DataProcessor()
    return processor
|
| 108 |
+
|
| 109 |
+
# API Endpoints
|
| 110 |
+
# API Endpoints
@app.get("/health")
async def health_check():
    """Health check endpoint.

    Returns service status, name, a timezone-aware UTC timestamp and the
    service version. Probed by the container HEALTHCHECK.
    """
    # datetime.utcnow() is deprecated (returns a naive datetime);
    # use an explicit UTC zone instead.
    return {
        "status": "healthy",
        "service": "analysis-service",
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "version": "1.0.0"
    }
|
| 119 |
+
|
| 120 |
+
@app.post("/forecast", response_model=ForecastResponse)
async def generate_forecast(
    request: ForecastRequest,
    forecast_engine: ForecastEngine = Depends(get_forecast_engine),
    data_processor: DataProcessor = Depends(get_data_processor)
):
    """
    Generate demand forecast using ensemble ML models.

    Pipeline: clean/validate the historical series, run the requested
    models, optionally price the forecast into a revenue projection, then
    attach a textual summary, an overall confidence score and metadata.

    Raises:
        HTTPException 400: fewer than 3 usable historical data points.
        HTTPException 500: any unexpected failure during forecasting.
    """
    try:
        logger.info(f"Generating forecast for product {request.product_id}")

        # Process and validate data
        df = data_processor.process_historical_data(request.historical_data)

        if len(df) < 3:
            raise HTTPException(
                status_code=400,
                detail="Insufficient historical data. Need at least 3 data points."
            )

        # Generate forecast
        forecast_result = await forecast_engine.generate_forecast(
            df=df,
            days=request.days,
            models=request.models or ["ensemble"],
            include_confidence=request.include_confidence,
            scenario=request.scenario
        )

        # Calculate revenue projection if selling price provided
        revenue_projection = None
        if request.selling_price and request.selling_price > 0:
            revenue_projection = forecast_engine.calculate_revenue_projection(
                forecast_data=forecast_result["forecast_data"],
                selling_price=request.selling_price,
                historical_data=df
            )

        # Generate AI summary
        summary = forecast_engine.generate_summary(
            forecast_data=forecast_result["forecast_data"],
            historical_data=df,
            models_used=forecast_result["models_used"],
            scenario=request.scenario
        )

        # Calculate overall confidence
        confidence = forecast_engine.calculate_overall_confidence(
            forecast_data=forecast_result["forecast_data"]
        )

        # Prepare metadata
        metadata = {
            "data_points": len(df),
            "forecast_horizon": request.days,
            "product_id": request.product_id,
            "generated_at": datetime.now(timezone.utc).isoformat(),
            "scenario": request.scenario
        }

        response = ForecastResponse(
            forecast_data=forecast_result["forecast_data"],
            revenue_projection=revenue_projection,
            models_used=forecast_result["models_used"],
            summary=summary,
            confidence=confidence,
            scenario=request.scenario,
            metadata=metadata
        )

        logger.info(f"Successfully generated forecast for product {request.product_id}")
        return response

    except HTTPException:
        # Re-raise unchanged: the broad handler below would otherwise
        # convert the intentional 400 above into a misleading 500.
        raise
    except Exception as e:
        logger.error(f"Forecast generation failed: {str(e)}")
        raise HTTPException(
            status_code=500,
            detail=f"Forecast generation failed: {str(e)}"
        )
|
| 200 |
+
|
| 201 |
+
@app.get("/models")
async def list_available_models():
    """List all available forecasting models"""
    # (id, display name, description, category) — expanded into the
    # response schema below; order is the order shown to clients.
    catalog = [
        ("ensemble", "Ensemble (Recommended)", "Combines multiple models for best accuracy", "ensemble"),
        ("sma", "Simple Moving Average", "Basic trend analysis", "statistical"),
        ("wma", "Weighted Moving Average", "Recent data weighted more", "statistical"),
        ("es", "Exponential Smoothing", "Seasonal trend analysis", "statistical"),
        ("arima", "ARIMA", "Statistical time series model", "statistical"),
        ("catboost", "CatBoost", "Machine learning model", "ml"),
    ]
    return {
        "models": [
            {"id": model_id, "name": name, "description": desc, "type": kind}
            for model_id, name, desc, kind in catalog
        ]
    }
|
| 244 |
+
|
| 245 |
+
# Error handlers
|
| 246 |
+
# Error handlers
@app.exception_handler(HTTPException)
async def http_exception_handler(request, exc):
    """Serialize HTTPExceptions into plain {"detail": ...} JSON bodies."""
    payload = {"detail": exc.detail}
    return JSONResponse(status_code=exc.status_code, content=payload)
|
| 252 |
+
|
| 253 |
+
@app.exception_handler(Exception)
async def general_exception_handler(request, exc):
    """Last-resort handler: log the full traceback, return an opaque 500.

    Clients never see the exception detail (avoids leaking internals);
    diagnostics go to the service log instead.
    """
    # exc_info keeps the traceback in the log record — str(exc) alone loses it.
    logger.error("Unhandled exception: %s", exc, exc_info=exc)
    return JSONResponse(
        status_code=500,
        content={"detail": "Internal server error"}
    )
|
| 260 |
+
|
| 261 |
+
if __name__ == "__main__":
    import uvicorn

    # reload=True spawns a file-watcher and is for local development only;
    # the Dockerfile uses this entry point in production, so auto-reload is
    # opt-in via the RELOAD environment variable instead of hard-coded.
    uvicorn.run(
        "main:app",
        host="0.0.0.0",
        port=int(os.getenv("PORT", 8000)),
        reload=os.getenv("RELOAD", "false").lower() in ("1", "true", "yes")
    )
|
models/data_processor.py
ADDED
|
@@ -0,0 +1,178 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
"""
|
| 2 |
+
Data processing utilities for AgriPredict Analysis Service
|
| 3 |
+
"""
|
| 4 |
+
|
| 5 |
+
import pandas as pd
|
| 6 |
+
import numpy as np
|
| 7 |
+
from datetime import datetime
|
| 8 |
+
from typing import List, Dict, Any
|
| 9 |
+
from utils.logger import setup_logger
|
| 10 |
+
from utils.config import settings
|
| 11 |
+
|
| 12 |
+
logger = setup_logger(__name__)
|
| 13 |
+
|
| 14 |
+
class DataProcessor:
    """Handles data processing and validation for forecasting."""

    def __init__(self):
        # Module-level logger (configured in utils.logger), shared by instances.
        self.logger = logger

    def process_historical_data(self, historical_data: List[Any]) -> pd.DataFrame:
        """
        Process and validate historical demand data.

        Args:
            historical_data: List of demand data points — plain dicts or
                Pydantic models exposing .dict() (the /forecast endpoint
                passes a List[DemandData]).

        Returns:
            Cleaned DataFrame with date/quantity/price columns, sorted by
            date, deduplicated, capped at settings.MAX_DATA_POINTS rows.

        Raises:
            ValueError: if a required column is missing after conversion.
        """
        try:
            self.logger.info(f"Processing {len(historical_data)} historical data points")

            # pd.DataFrame only builds named columns from mappings; Pydantic
            # model objects must be converted to dicts first or the required-
            # column check below would reject valid input.
            records = [
                item.dict() if hasattr(item, "dict") else item
                for item in historical_data
            ]
            df = pd.DataFrame(records)

            # Validate required columns
            required_columns = ['date', 'quantity', 'price']
            missing_columns = [col for col in required_columns if col not in df.columns]
            if missing_columns:
                raise ValueError(f"Missing required columns: {missing_columns}")

            # Convert date column
            df['date'] = pd.to_datetime(df['date'])

            # Coerce to numeric; unparsable values become NaN and are dropped below.
            df['quantity'] = pd.to_numeric(df['quantity'], errors='coerce')
            df['price'] = pd.to_numeric(df['price'], errors='coerce')

            # Remove invalid data (missing or non-positive values)
            df = df.dropna(subset=['quantity', 'price'])
            df = df[df['quantity'] > 0]
            df = df[df['price'] > 0]

            # Sort chronologically; keep the latest record for duplicate dates
            df = df.sort_values('date').reset_index(drop=True)
            df = df.drop_duplicates(subset=['date'], keep='last')

            # Keep only the most recent MAX_DATA_POINTS rows
            if len(df) > settings.MAX_DATA_POINTS:
                self.logger.warning(f"Limiting data from {len(df)} to {settings.MAX_DATA_POINTS} points")
                df = df.tail(settings.MAX_DATA_POINTS)

            self.logger.info(f"Successfully processed {len(df)} data points")
            return df

        except Exception as e:
            self.logger.error(f"Data processing failed: {str(e)}")
            raise

    def validate_data_quality(self, df: pd.DataFrame) -> Dict[str, Any]:
        """
        Validate data quality and return metrics.

        Args:
            df: Processed DataFrame

        Returns:
            Dictionary with quality metrics; empty dict if computation fails.
        """
        try:
            quality_metrics = {
                'total_points': len(df),
                'date_range': {
                    'start': df['date'].min().isoformat() if len(df) > 0 else None,
                    'end': df['date'].max().isoformat() if len(df) > 0 else None
                },
                'missing_values': {
                    'quantity': df['quantity'].isnull().sum(),
                    'price': df['price'].isnull().sum()
                },
                'outliers': {
                    'quantity': self._detect_outliers(df['quantity']),
                    'price': self._detect_outliers(df['price'])
                },
                'data_completeness': self._calculate_completeness(df)
            }

            return quality_metrics

        except Exception as e:
            self.logger.error(f"Quality validation failed: {str(e)}")
            return {}

    def _detect_outliers(self, series: pd.Series) -> int:
        """Count outliers via the 1.5*IQR rule; 0 if it cannot be computed."""
        try:
            q1 = series.quantile(0.25)
            q3 = series.quantile(0.75)
            iqr = q3 - q1
            lower_bound = q1 - 1.5 * iqr
            upper_bound = q3 + 1.5 * iqr

            outliers = ((series < lower_bound) | (series > upper_bound)).sum()
            return int(outliers)
        # Narrowed from bare `except:` so SystemExit/KeyboardInterrupt pass through.
        except Exception:
            return 0

    def _calculate_completeness(self, df: pd.DataFrame) -> float:
        """Percentage of non-null quantity/price cells; 0.0 for empty input."""
        try:
            total_cells = len(df) * 2  # quantity and price columns
            if total_cells == 0:
                # Empty frame previously hit ZeroDivisionError, silently
                # swallowed by a bare except; handle it explicitly.
                return 0.0
            missing_cells = df[['quantity', 'price']].isnull().sum().sum()
            completeness = ((total_cells - missing_cells) / total_cells) * 100
            return round(completeness, 2)
        except Exception:
            return 0.0

    def prepare_features_for_ml(self, df: pd.DataFrame) -> pd.DataFrame:
        """
        Prepare features for machine learning models.

        Args:
            df: Processed DataFrame (date/quantity/price columns)

        Returns:
            DataFrame with date-part, lag, rolling and price-change features;
            the original df is returned unchanged if engineering fails.
        """
        try:
            feature_df = df.copy()

            # Date-based features
            feature_df['day_of_week'] = feature_df['date'].dt.dayofweek
            feature_df['month'] = feature_df['date'].dt.month
            feature_df['day_of_month'] = feature_df['date'].dt.day
            feature_df['quarter'] = feature_df['date'].dt.quarter

            # Lag features (only when enough history exists for the lag)
            for lag in [1, 7, 14, 30]:
                if len(feature_df) > lag:
                    feature_df[f'price_lag_{lag}'] = feature_df['price'].shift(lag)
                    feature_df[f'quantity_lag_{lag}'] = feature_df['quantity'].shift(lag)

            # Rolling statistics
            for window in [7, 14, 30]:
                if len(feature_df) > window:
                    feature_df[f'price_rolling_mean_{window}'] = feature_df['price'].rolling(window).mean()
                    feature_df[f'price_rolling_std_{window}'] = feature_df['price'].rolling(window).std()
                    feature_df[f'quantity_rolling_mean_{window}'] = feature_df['quantity'].rolling(window).mean()

            # Price change features
            feature_df['price_change'] = feature_df['price'].pct_change()
            feature_df['price_change_7d'] = feature_df['price'].pct_change(7)

            # Volume-weighted features
            feature_df['value'] = feature_df['quantity'] * feature_df['price']

            # Drop rows with NaN values created by lag/rolling features
            feature_df = feature_df.dropna()

            self.logger.info(f"Created {len(feature_df.columns) - len(df.columns)} additional features")
            return feature_df

        except Exception as e:
            self.logger.error(f"Feature engineering failed: {str(e)}")
            return df
|
models/forecast_models.py
ADDED
|
@@ -0,0 +1,586 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
"""
|
| 2 |
+
Forecasting models for AgriPredict Analysis Service
|
| 3 |
+
"""
|
| 4 |
+
|
| 5 |
+
import pandas as pd
|
| 6 |
+
import numpy as np
|
| 7 |
+
from datetime import datetime, timedelta
|
| 8 |
+
from typing import List, Dict, Any, Optional
|
| 9 |
+
from dataclasses import dataclass
|
| 10 |
+
import asyncio
|
| 11 |
+
from concurrent.futures import ThreadPoolExecutor
|
| 12 |
+
import traceback
|
| 13 |
+
|
| 14 |
+
# Import ML libraries (will be available when deployed).
# Fix: probe statsmodels and catboost independently. The original single
# try block set BOTH flags to False whenever EITHER import failed, so a
# missing catboost silently disabled the statsmodels-based models too.
try:
    from statsmodels.tsa.holtwinters import ExponentialSmoothing
    from statsmodels.tsa.arima.model import ARIMA
    STATS_MODELS_AVAILABLE = True
except ImportError:
    STATS_MODELS_AVAILABLE = False

try:
    from catboost import CatBoostRegressor
    CATBOOST_AVAILABLE = True
except ImportError:
    CATBOOST_AVAILABLE = False
|
| 24 |
+
|
| 25 |
+
from utils.logger import setup_logger
|
| 26 |
+
from utils.config import settings
|
| 27 |
+
|
| 28 |
+
logger = setup_logger(__name__)
|
| 29 |
+
|
| 30 |
+
@dataclass
class ForecastResult:
    """Container for the forecast produced by a single model."""
    # Predicted price for each forecast day (length == forecast horizon).
    values: List[float]
    # Optional per-day lower confidence bound (same length as `values`).
    confidence_lower: Optional[List[float]] = None
    # Optional per-day upper confidence bound (same length as `values`).
    confidence_upper: Optional[List[float]] = None
    # Short identifier of the producing model, e.g. "SMA", "Ensemble".
    model_name: str = ""
|
| 37 |
+
|
| 38 |
+
class ForecastEngine:
    """Main forecasting engine with multiple models"""

    def __init__(self):
        # Reuse the module-level logger for all engine instances.
        self.logger = logger
        # Thread pool used to run blocking model fits off the asyncio event loop.
        self.executor = ThreadPoolExecutor(max_workers=4)
|
| 44 |
+
|
| 45 |
+
async def generate_forecast(
    self,
    df: pd.DataFrame,
    days: int,
    models: List[str],
    include_confidence: bool = True,
    scenario: str = "realistic"
) -> Dict[str, Any]:
    """
    Generate forecast using specified models.

    Args:
        df: Historical data DataFrame (must contain a 'price' column)
        days: Number of days to forecast
        models: List of model names to use (e.g. 'sma', 'wma', 'ensemble')
        include_confidence: Whether to include confidence intervals
        scenario: Forecast scenario (optimistic, pessimistic, realistic)

    Returns:
        Dictionary with 'forecast_data', 'models_used' and 'scenario' keys.

    Raises:
        Exception: any unexpected failure is logged and re-raised.
    """
    try:
        self.logger.info(f"Generating {days}-day forecast using models: {models}")

        # Apply scenario adjustment on a copy so the caller's frame is untouched.
        scenario_multiplier = self._get_scenario_multiplier(scenario)
        df = df.copy()
        df['price'] = df['price'] * scenario_multiplier

        # Run each requested model in the thread pool: model fitting is
        # blocking work and must not run on the event loop itself.
        forecast_tasks = []
        model_results = {}

        # Fix: use get_running_loop() instead of the deprecated
        # asyncio.get_event_loop(), which is unreliable inside coroutines
        # on Python 3.10+.
        loop = asyncio.get_running_loop()

        for model_name in models:
            if model_name.lower() == 'ensemble':
                # Ensemble is derived below from the other models' results.
                continue
            elif hasattr(self, f'_generate_{model_name.lower()}_forecast'):
                task = loop.run_in_executor(
                    self.executor,
                    getattr(self, f'_generate_{model_name.lower()}_forecast'),
                    df.copy(),
                    days,
                    include_confidence
                )
                forecast_tasks.append((model_name, task))

        # Wait for all model forecasts. Individual model failures are
        # logged and skipped rather than failing the whole request.
        if forecast_tasks:
            results = await asyncio.gather(*[task for _, task in forecast_tasks], return_exceptions=True)

            for (model_name, _), result in zip(forecast_tasks, results):
                if isinstance(result, Exception):
                    self.logger.warning(f"Model {model_name} failed: {str(result)}")
                    continue

                if result and result.values:
                    model_results[model_name] = result

        # If no models succeeded, use fallback
        if not model_results:
            self.logger.warning("All models failed, using fallback forecast")
            fallback_result = self._generate_fallback_forecast(df, days)
            model_results['Fallback'] = fallback_result

        # Generate ensemble forecast if requested
        if 'ensemble' in [m.lower() for m in models]:
            ensemble_result = self._generate_ensemble_forecast(model_results, days, include_confidence)
            model_results['Ensemble'] = ensemble_result

        # Prepare final forecast data
        final_forecast = self._prepare_forecast_data(model_results, df, days)

        return {
            "forecast_data": final_forecast,
            "models_used": list(model_results.keys()),
            "scenario": scenario
        }

    except Exception as e:
        self.logger.error(f"Forecast generation failed: {str(e)}")
        raise
|
| 127 |
+
|
| 128 |
+
def _get_scenario_multiplier(self, scenario: str) -> float:
|
| 129 |
+
"""Get multiplier for scenario adjustment"""
|
| 130 |
+
multipliers = {
|
| 131 |
+
'optimistic': 1.1, # 10% increase
|
| 132 |
+
'pessimistic': 0.9, # 10% decrease
|
| 133 |
+
'realistic': 1.0 # No change
|
| 134 |
+
}
|
| 135 |
+
return multipliers.get(scenario.lower(), 1.0)
|
| 136 |
+
|
| 137 |
+
def _generate_sma_forecast(
    self,
    df: pd.DataFrame,
    days: int,
    include_confidence: bool = True
) -> ForecastResult:
    """Simple Moving Average forecast: repeat the latest 7-day mean price.

    Raises:
        ValueError: if fewer than 7 historical rows are available.
    """
    try:
        if len(df) < 7:
            raise ValueError("Insufficient data for SMA")

        window = min(7, len(df))
        level = df['price'].rolling(window=window).mean().iloc[-1]

        # Guard against a NaN tail by falling back to the global mean.
        if pd.isna(level):
            level = df['price'].mean()

        predictions = [float(level)] * days

        # Half a standard deviation on each side as a rough interval.
        lower = upper = None
        if include_confidence:
            spread = df['price'].std() * 0.5
            lower = [p - spread for p in predictions]
            upper = [p + spread for p in predictions]

        return ForecastResult(
            values=predictions,
            confidence_lower=lower,
            confidence_upper=upper,
            model_name="SMA"
        )

    except Exception as e:
        self.logger.error(f"SMA forecast failed: {str(e)}")
        raise
|
| 171 |
+
|
| 172 |
+
def _generate_wma_forecast(
    self,
    df: pd.DataFrame,
    days: int,
    include_confidence: bool = True
) -> ForecastResult:
    """Weighted Moving Average forecast.

    The last 7 prices are averaged with linearly increasing weights so the
    most recent observations dominate; the resulting level is repeated for
    every forecast day.

    Raises:
        ValueError: if fewer than 7 historical rows are available.
    """
    try:
        if len(df) < 7:
            raise ValueError("Insufficient data for WMA")

        window = min(7, len(df))
        # Linear weights 1..window, normalised to sum to 1.
        weights = np.arange(1, window + 1)
        weights = weights / weights.sum()

        level = (df['price'].tail(window) * weights).sum()
        if pd.isna(level):
            level = df['price'].mean()

        predictions = [float(level)] * days

        lower = upper = None
        if include_confidence:
            spread = df['price'].std() * 0.3
            lower = [p - spread for p in predictions]
            upper = [p + spread for p in predictions]

        return ForecastResult(
            values=predictions,
            confidence_lower=lower,
            confidence_upper=upper,
            model_name="WMA"
        )

    except Exception as e:
        self.logger.error(f"WMA forecast failed: {str(e)}")
        raise
|
| 209 |
+
|
| 210 |
+
def _generate_es_forecast(
    self,
    df: pd.DataFrame,
    days: int,
    include_confidence: bool = True
) -> ForecastResult:
    """Exponential Smoothing (Holt-Winters, additive weekly seasonality) forecast.

    Raises:
        ImportError: if statsmodels is not installed.
        ValueError: if fewer than 7 historical rows are available.
    """
    try:
        if not STATS_MODELS_AVAILABLE:
            raise ImportError("statsmodels not available")

        if len(df) < 7:
            raise ValueError("Insufficient data for Exponential Smoothing")

        # Prepare data for exponential smoothing
        ts_data = df.set_index('date')['price']

        model = ExponentialSmoothing(ts_data, seasonal='add', seasonal_periods=7)
        fitted_model = model.fit()

        forecast = fitted_model.forecast(days)
        values = forecast.values.tolist()

        # Get confidence intervals if available
        if include_confidence:
            try:
                # NOTE(review): get_prediction() without arguments yields
                # *in-sample* intervals; taking the tail is only a rough
                # proxy for out-of-sample uncertainty — confirm before
                # relying on these bounds.
                pred = fitted_model.get_prediction()
                confidence_intervals = pred.conf_int()
                confidence_lower = confidence_intervals.iloc[:, 0].tail(days).values.tolist()
                confidence_upper = confidence_intervals.iloc[:, 1].tail(days).values.tolist()
            except Exception:
                # Fix: narrow the previously bare `except:` so that
                # KeyboardInterrupt/SystemExit are no longer swallowed.
                std_dev = df['price'].std()
                confidence_lower = [v - std_dev for v in values]
                confidence_upper = [v + std_dev for v in values]
        else:
            confidence_lower = None
            confidence_upper = None

        return ForecastResult(
            values=values,
            confidence_lower=confidence_lower,
            confidence_upper=confidence_upper,
            model_name="ES"
        )

    except Exception as e:
        self.logger.error(f"ES forecast failed: {str(e)}")
        raise
|
| 259 |
+
|
| 260 |
+
def _generate_arima_forecast(
    self,
    df: pd.DataFrame,
    days: int,
    include_confidence: bool = True
) -> ForecastResult:
    """ARIMA(5, 1, 0) forecast.

    Raises:
        ImportError: if statsmodels is not installed.
        ValueError: if fewer than 10 historical rows are available.
    """
    try:
        if not STATS_MODELS_AVAILABLE:
            raise ImportError("statsmodels not available")

        if len(df) < 10:
            raise ValueError("Insufficient data for ARIMA")

        # Prepare data
        ts_data = df.set_index('date')['price']

        model = ARIMA(ts_data, order=(5, 1, 0))
        fitted_model = model.fit()

        forecast = fitted_model.forecast(days)
        values = forecast.values.tolist()

        # Get confidence intervals
        if include_confidence:
            try:
                pred = fitted_model.get_forecast(days)
                confidence_intervals = pred.conf_int()
                confidence_lower = confidence_intervals.iloc[:, 0].values.tolist()
                confidence_upper = confidence_intervals.iloc[:, 1].values.tolist()
            except Exception:
                # Fix: narrow the previously bare `except:` so that
                # KeyboardInterrupt/SystemExit are no longer swallowed.
                std_dev = df['price'].std()
                confidence_lower = [v - std_dev for v in values]
                confidence_upper = [v + std_dev for v in values]
        else:
            confidence_lower = None
            confidence_upper = None

        return ForecastResult(
            values=values,
            confidence_lower=confidence_lower,
            confidence_upper=confidence_upper,
            model_name="ARIMA"
        )

    except Exception as e:
        self.logger.error(f"ARIMA forecast failed: {str(e)}")
        raise
|
| 309 |
+
|
| 310 |
+
def _generate_catboost_forecast(
    self,
    df: pd.DataFrame,
    days: int,
    include_confidence: bool = True
) -> ForecastResult:
    """CatBoost forecast (placeholder for future training).

    Until a trained model is shipped, this extrapolates the mean daily
    percentage change away from the last observed price.

    Raises:
        ImportError: if the catboost package is not installed.
        ValueError: if fewer than 10 historical rows are available.
    """
    try:
        if not CATBOOST_AVAILABLE:
            raise ImportError("CatBoost not available")

        if len(df) < 10:
            raise ValueError("Insufficient data for CatBoost")

        # For now, use a simple fallback since model isn't trained yet.
        # This will be replaced with the actual trained model later.
        self.logger.info("Using CatBoost placeholder (model not trained yet)")

        recent_trend = df['price'].pct_change().mean()
        last_price = df['price'].iloc[-1]

        # Ramp the average daily trend in linearly over the horizon.
        predictions = [
            float(last_price * (1 + (recent_trend * (step + 1) / days)))
            for step in range(days)
        ]

        lower = upper = None
        if include_confidence:
            spread = df['price'].std()
            lower = [p - spread for p in predictions]
            upper = [p + spread for p in predictions]

        return ForecastResult(
            values=predictions,
            confidence_lower=lower,
            confidence_upper=upper,
            model_name="CatBoost"
        )

    except Exception as e:
        self.logger.error(f"CatBoost forecast failed: {str(e)}")
        raise
|
| 353 |
+
|
| 354 |
+
def _generate_fallback_forecast(self, df: pd.DataFrame, days: int) -> ForecastResult:
    """Fallback forecast using simple average.

    Used when every requested model fails; repeats the mean historical
    price with deliberately wide (+/- 2 std) confidence bands. Never
    raises — on any failure it returns fixed neutral values instead.
    """
    try:
        avg_price = df['price'].mean()
        # Fix: an empty frame makes .mean() return NaN, which previously
        # propagated NaN forecast values; use the same neutral constant
        # the exception path uses.
        if pd.isna(avg_price):
            avg_price = 100.0
        values = [float(avg_price)] * days

        # Wide confidence intervals for fallback
        std_dev = df['price'].std() if len(df) > 1 else avg_price * 0.1
        if pd.isna(std_dev):
            std_dev = avg_price * 0.1
        confidence_lower = [v - std_dev * 2 for v in values]
        confidence_upper = [v + std_dev * 2 for v in values]

        return ForecastResult(
            values=values,
            confidence_lower=confidence_lower,
            confidence_upper=confidence_upper,
            model_name="Fallback"
        )

    except Exception as e:
        self.logger.error(f"Fallback forecast failed: {str(e)}")
        # Ultimate fallback: fixed neutral values so the API still responds.
        return ForecastResult(
            values=[100.0] * days,
            confidence_lower=[80.0] * days,
            confidence_upper=[120.0] * days,
            model_name="Fallback"
        )
|
| 381 |
+
|
| 382 |
+
def _generate_ensemble_forecast(
    self,
    model_results: Dict[str, ForecastResult],
    days: int,
    include_confidence: bool = True
) -> ForecastResult:
    """Average the per-day predictions of every successful model.

    Confidence bounds are the per-day means of the individual models'
    bounds; when no model supplied bounds, +/- one standard deviation of
    the ensemble values is used instead.

    Raises:
        ValueError: if no usable model predictions are available.
    """
    try:
        if not model_results:
            raise ValueError("No model results available for ensemble")

        # Collect each model's predictions, truncated to the horizon.
        all_values = [
            result.values[:days]
            for result in model_results.values()
            if len(result.values) >= days
        ]
        if not all_values:
            raise ValueError("No valid predictions for ensemble")

        # Per-day mean across models.
        ensemble_values = [
            np.mean([series[day] for series in all_values if day < len(series)])
            for day in range(days)
        ]

        confidence_lower = None
        confidence_upper = None
        if include_confidence:
            all_lower = [
                result.confidence_lower[:days]
                for result in model_results.values()
                if result.confidence_lower and len(result.confidence_lower) >= days
            ]
            all_upper = [
                result.confidence_upper[:days]
                for result in model_results.values()
                if result.confidence_upper and len(result.confidence_upper) >= days
            ]

            if all_lower and all_upper:
                confidence_lower = [np.mean([lo[day] for lo in all_lower]) for day in range(days)]
                confidence_upper = [np.mean([hi[day] for hi in all_upper]) for day in range(days)]
            else:
                # Fallback confidence intervals
                std_dev = np.std(ensemble_values)
                confidence_lower = [v - std_dev for v in ensemble_values]
                confidence_upper = [v + std_dev for v in ensemble_values]

        return ForecastResult(
            values=ensemble_values,
            confidence_lower=confidence_lower,
            confidence_upper=confidence_upper,
            model_name="Ensemble"
        )

    except Exception as e:
        self.logger.error(f"Ensemble forecast failed: {str(e)}")
        raise
|
| 440 |
+
|
| 441 |
+
def _prepare_forecast_data(
|
| 442 |
+
self,
|
| 443 |
+
model_results: Dict[str, ForecastResult],
|
| 444 |
+
df: pd.DataFrame,
|
| 445 |
+
days: int
|
| 446 |
+
) -> List[Dict[str, Any]]:
|
| 447 |
+
"""Prepare final forecast data for API response"""
|
| 448 |
+
try:
|
| 449 |
+
last_date = df['date'].max()
|
| 450 |
+
|
| 451 |
+
forecast_data = []
|
| 452 |
+
for i in range(days):
|
| 453 |
+
forecast_date = last_date + timedelta(days=i+1)
|
| 454 |
+
|
| 455 |
+
# Use ensemble if available, otherwise use first available model
|
| 456 |
+
if 'Ensemble' in model_results:
|
| 457 |
+
result = model_results['Ensemble']
|
| 458 |
+
else:
|
| 459 |
+
result = next(iter(model_results.values()))
|
| 460 |
+
|
| 461 |
+
data_point = {
|
| 462 |
+
"date": forecast_date.isoformat(),
|
| 463 |
+
"predicted_value": round(result.values[i], 2),
|
| 464 |
+
"model_used": result.model_name
|
| 465 |
+
}
|
| 466 |
+
|
| 467 |
+
if result.confidence_lower and i < len(result.confidence_lower):
|
| 468 |
+
data_point["confidence_lower"] = round(result.confidence_lower[i], 2)
|
| 469 |
+
|
| 470 |
+
if result.confidence_upper and i < len(result.confidence_upper):
|
| 471 |
+
data_point["confidence_upper"] = round(result.confidence_upper[i], 2)
|
| 472 |
+
|
| 473 |
+
forecast_data.append(data_point)
|
| 474 |
+
|
| 475 |
+
return forecast_data
|
| 476 |
+
|
| 477 |
+
except Exception as e:
|
| 478 |
+
self.logger.error(f"Forecast data preparation failed: {str(e)}")
|
| 479 |
+
raise
|
| 480 |
+
|
| 481 |
+
def calculate_revenue_projection(
    self,
    forecast_data: List[Dict[str, Any]],
    selling_price: float,
    historical_data: pd.DataFrame
) -> List[Dict[str, Any]]:
    """Project daily revenue over the forecast horizon.

    Quantity is assumed to stay at its historical mean; revenue is that
    quantity times the caller-supplied selling price. Price confidence
    bounds, when present, are scaled by the same quantity. Returns an
    empty list on any failure.
    """
    try:
        # Expected daily volume: historical mean quantity.
        avg_quantity = historical_data['quantity'].mean()

        projections = []
        for point in forecast_data:
            entry = {
                "date": point["date"],
                "projected_quantity": round(float(avg_quantity), 2),
                "selling_price": round(float(selling_price), 2),
                "projected_revenue": round(float(avg_quantity * selling_price), 2)
            }

            # Scale price confidence bounds into revenue bounds when present.
            if "confidence_lower" in point:
                entry["confidence_lower"] = round(point["confidence_lower"] * avg_quantity, 2)
            if "confidence_upper" in point:
                entry["confidence_upper"] = round(point["confidence_upper"] * avg_quantity, 2)

            projections.append(entry)

        return projections

    except Exception as e:
        self.logger.error(f"Revenue projection calculation failed: {str(e)}")
        return []
|
| 517 |
+
|
| 518 |
+
def generate_summary(
    self,
    forecast_data: List[Dict[str, Any]],
    historical_data: pd.DataFrame,
    models_used: List[str],
    scenario: str
) -> str:
    """Generate AI-like summary of forecast results.

    Returns a Markdown report; on any failure a short error string is
    returned instead of raising.
    """
    try:
        # Calculate key metrics
        forecast_values = [point["predicted_value"] for point in forecast_data]
        avg_forecast = np.mean(forecast_values)
        avg_historical = historical_data['price'].mean()

        trend = "increasing" if avg_forecast > avg_historical else "decreasing"
        # Fix: guard against a zero historical mean price, which previously
        # produced a division error / infinite percentage.
        if avg_historical:
            change_percent = abs((avg_forecast - avg_historical) / avg_historical * 100)
        else:
            change_percent = 0.0

        # Generate summary
        summary = f"""# Price Forecast Summary

## Overview
Based on historical demand data, the forecast shows a **{trend}** trend over the next {len(forecast_data)} days using {scenario} scenario.

## Key Metrics
- **Average Historical Price**: ${avg_historical:.2f}
- **Average Forecasted Price**: ${avg_forecast:.2f}
- **Expected Change**: {change_percent:.1f}% {trend}
- **Models Used**: {', '.join(models_used)}
- **Forecast Horizon**: {len(forecast_data)} days

## Analysis
The forecast combines multiple statistical and machine learning models to provide reliable predictions. Confidence intervals are included to help assess prediction uncertainty.

## Recommendations
{'Consider increasing inventory to meet potential higher demand.' if trend == 'increasing' else 'Monitor market conditions closely as prices may decline.'}
Track actual prices against this forecast and adjust strategies accordingly."""

        return summary

    except Exception as e:
        self.logger.error(f"Summary generation failed: {str(e)}")
        return "Forecast summary generation failed."
|
| 560 |
+
|
| 561 |
+
def calculate_overall_confidence(self, forecast_data: List[Dict[str, Any]]) -> Optional[float]:
    """Derive a single 0-100 confidence score from the forecast points.

    Narrow confidence intervals (relative to the predicted value) score
    high. Returns None when no point carries both bounds, or on failure.
    """
    try:
        scores = []
        for point in forecast_data:
            if "confidence_lower" not in point or "confidence_upper" not in point:
                continue
            predicted = point["predicted_value"]
            if predicted == 0:
                continue
            # Interval width relative to the prediction, mapped onto 0-100.
            relative_width = (point["confidence_upper"] - point["confidence_lower"]) / predicted
            scores.append(max(0, min(100, 100 - (relative_width * 50))))

        return round(np.mean(scores), 1) if scores else None

    except Exception as e:
        self.logger.error(f"Confidence calculation failed: {str(e)}")
        return None
|
requirements.txt
ADDED
|
@@ -0,0 +1,23 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
# Core FastAPI dependencies
|
| 2 |
+
fastapi==0.104.1
|
| 3 |
+
uvicorn[standard]==0.24.0
|
| 4 |
+
pydantic==2.5.0
|
| 5 |
+
|
| 6 |
+
# Data processing
|
| 7 |
+
pandas==2.1.4
|
| 8 |
+
numpy==1.26.2
|
| 9 |
+
|
| 10 |
+
# Machine Learning & Statistics
|
| 11 |
+
scikit-learn==1.3.2
|
| 12 |
+
statsmodels==0.14.0
|
| 13 |
+
catboost==1.2.2
|
| 14 |
+
joblib==1.3.2
|
| 15 |
+
|
| 16 |
+
# Utilities
|
| 17 |
+
python-multipart==0.0.6
|
| 18 |
+
httpx==0.25.2
|
| 19 |
+
requests==2.31.0
|
| 20 |
+
|
| 21 |
+
# Optional: For development and testing (can be removed for production)
|
| 22 |
+
pytest==7.4.3
|
| 23 |
+
pytest-asyncio==0.21.1
|
run.py
ADDED
|
@@ -0,0 +1,53 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
#!/usr/bin/env python3
|
| 2 |
+
"""
|
| 3 |
+
Development script for AgriPredict Analysis Service
|
| 4 |
+
"""
|
| 5 |
+
|
| 6 |
+
import subprocess
|
| 7 |
+
import sys
|
| 8 |
+
import os
|
| 9 |
+
from pathlib import Path
|
| 10 |
+
|
| 11 |
+
def install_dependencies():
    """Install Python dependencies"""
    print("Installing dependencies...")
    # check=True: abort with CalledProcessError if pip fails.
    subprocess.run([sys.executable, "-m", "pip", "install", "-r", "requirements.txt"], check=True)
|
| 15 |
+
|
| 16 |
+
def run_service():
    """Run the FastAPI service"""
    print("Starting AgriPredict Analysis Service...")
    print("API will be available at: http://localhost:8000")
    print("API documentation at: http://localhost:8000/docs")

    # Set environment variables: put the project root on PYTHONPATH so
    # main.py can import the local packages regardless of the caller's cwd.
    env = os.environ.copy()
    env["PYTHONPATH"] = str(Path(__file__).parent)

    # Blocks until the service process exits. NOTE: no check=True here,
    # so a non-zero exit from the service is not treated as an error.
    subprocess.run([sys.executable, "main.py"], env=env)
|
| 27 |
+
|
| 28 |
+
def train_model():
    """Train the CatBoost model with artificial data"""
    print("Training CatBoost model with artificial data...")
    # check=True: propagate a failing training run as CalledProcessError.
    subprocess.run([sys.executable, "train_catboost.py"], check=True)
|
| 32 |
+
|
| 33 |
+
def main():
    """Dispatch the sub-command given on the command line.

    Prints usage when no command is supplied; unknown commands are
    reported but do not raise.
    """
    if len(sys.argv) < 2:
        print("Usage: python run.py [install|run|train]")
        print("  install - Install dependencies")
        print("  run - Run the service")
        print("  train - Train the CatBoost model")
        return

    command = sys.argv[1].lower()

    # Map each supported command to its handler.
    handlers = {
        "install": install_dependencies,
        "run": run_service,
        "train": train_model,
    }
    handler = handlers.get(command)
    if handler is not None:
        handler()
    else:
        print(f"Unknown command: {command}")

if __name__ == "__main__":
    main()
|
test_api.py
ADDED
|
@@ -0,0 +1,125 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
#!/usr/bin/env python3
|
| 2 |
+
"""
|
| 3 |
+
Example script showing how to use the AgriPredict Analysis Service API
|
| 4 |
+
"""
|
| 5 |
+
|
| 6 |
+
import requests
|
| 7 |
+
import json
|
| 8 |
+
from datetime import datetime, timedelta
|
| 9 |
+
import random
|
| 10 |
+
|
| 11 |
+
def generate_sample_data(days: int = 30):
    """Generate sample historical data for testing.

    Produces one record per day ending today, with randomly jittered
    quantity (floored at 1) and price (floored at 5).
    """
    start = datetime.now() - timedelta(days=days)
    records = []
    for offset in range(days):
        day = start + timedelta(days=offset)
        # Two draws keep the quantity roughly centred between 50 and 150.
        qty = random.randint(50, 150) + random.randint(-20, 20)
        cost = round(20 + random.uniform(-5, 5), 2)
        records.append({
            "date": day.strftime("%Y-%m-%d"),
            "quantity": max(1, qty),  # Ensure positive quantity
            "price": max(5, cost)  # Ensure positive price
        })
    return records
|
| 29 |
+
|
| 30 |
+
def test_health_check(base_url: str = "http://localhost:8000"):
    """Test the health check endpoint.

    Prints a pass/fail line and the JSON body on success; never raises
    (connection errors are caught and reported).
    """
    print("Testing health check...")
    try:
        response = requests.get(f"{base_url}/health")
        if response.status_code == 200:
            print("✅ Health check passed")
            print(f"Response: {response.json()}")
        else:
            print(f"❌ Health check failed: {response.status_code}")
    except Exception as e:
        print(f"❌ Health check error: {e}")
|
| 42 |
+
|
| 43 |
+
def test_list_models(base_url: str = "http://localhost:8000"):
    """Test the list models endpoint.

    Prints the model count and one line per model on success; never
    raises (connection errors are caught and reported).
    """
    print("\nTesting list models...")
    try:
        response = requests.get(f"{base_url}/models")
        if response.status_code == 200:
            print("✅ Models list retrieved")
            # assumes the response body is {"models": [{"name": ..., "id": ...}, ...]}
            models = response.json()["models"]
            print(f"Available models: {len(models)}")
            for model in models:
                print(f" - {model['name']} ({model['id']})")
        else:
            print(f"❌ Models list failed: {response.status_code}")
    except Exception as e:
        print(f"❌ Models list error: {e}")
|
| 58 |
+
|
| 59 |
+
def test_forecast_generation(base_url: str = "http://localhost:8000"):
    """Test forecast generation.

    Posts 30 days of synthetic history to /forecast and prints a short
    report of the returned forecast; never raises (errors are caught
    and reported).
    """
    print("\nTesting forecast generation...")

    # Generate sample data
    historical_data = generate_sample_data(30)

    # Prepare forecast request
    forecast_request = {
        "product_id": "sample_crop",
        "historical_data": historical_data,
        "days": 14,
        "selling_price": 25.0,
        "models": ["ensemble"],
        "include_confidence": True,
        "scenario": "realistic"
    }

    try:
        response = requests.post(
            f"{base_url}/forecast",
            json=forecast_request,
            headers={"Content-Type": "application/json"}
        )

        if response.status_code == 200:
            print("✅ Forecast generated successfully")
            result = response.json()

            print(f"Models used: {result['models_used']}")
            print(f"Forecast points: {len(result['forecast_data'])}")
            print(f"Confidence: {result.get('confidence', 'N/A')}%")

            if result.get('revenue_projection'):
                print(f"Revenue projections: {len(result['revenue_projection'])}")

            # Show first few forecast points
            print("\nFirst 3 forecast points:")
            for i, point in enumerate(result['forecast_data'][:3]):
                # Width of the confidence interval shown as ± after the value.
                print(f" Day {i+1}: {point['predicted_value']:.2f} "
                      f"(±{point.get('confidence_upper', 0) - point.get('confidence_lower', 0):.2f})")

        else:
            print(f"❌ Forecast failed: {response.status_code}")
            print(f"Error: {response.text}")

    except Exception as e:
        print(f"❌ Forecast error: {e}")
|
| 107 |
+
|
| 108 |
+
def main():
    """Main test function.

    Runs the three endpoint checks against a locally running service.
    """
    print("🚀 AgriPredict Analysis Service API Test")
    print("=" * 50)

    # Test with local service (change URL for deployed service)
    base_url = "http://localhost:8000"

    # Run tests
    test_health_check(base_url)
    test_list_models(base_url)
    test_forecast_generation(base_url)

    print("\n" + "=" * 50)
    print("API test completed!")

if __name__ == "__main__":
    main()
|
train_catboost.py
ADDED
|
@@ -0,0 +1,316 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
"""
|
| 2 |
+
CatBoost Model Training Script for AgriPredict
|
| 3 |
+
This script demonstrates how to train the CatBoost model with artificial agricultural data.
|
| 4 |
+
"""
|
| 5 |
+
|
| 6 |
+
import pandas as pd
|
| 7 |
+
import numpy as np
|
| 8 |
+
from datetime import datetime, timedelta
|
| 9 |
+
from catboost import CatBoostRegressor, Pool
|
| 10 |
+
from sklearn.model_selection import train_test_split
|
| 11 |
+
from sklearn.metrics import mean_absolute_error, mean_squared_error
|
| 12 |
+
import joblib
|
| 13 |
+
import os
|
| 14 |
+
from typing import Dict, Any
|
| 15 |
+
import logging
|
| 16 |
+
|
| 17 |
+
# Setup logging
|
| 18 |
+
logging.basicConfig(level=logging.INFO)
|
| 19 |
+
logger = logging.getLogger(__name__)
|
| 20 |
+
|
| 21 |
+
class CatBoostTrainer:
    """CatBoost model trainer for agricultural demand forecasting.

    Responsibilities: generate synthetic market data, engineer time-series
    features, train a CatBoost regressor that predicts *price*, evaluate it,
    and persist/restore the fitted model.
    """

    def __init__(self):
        # Populated by train_model() or load_model().
        self.model = None
        self.feature_names = None

    def generate_artificial_data(self, n_samples: int = 1000) -> pd.DataFrame:
        """
        Generate artificial agricultural data for training.

        Args:
            n_samples: Number of daily samples to generate (starting 2023-01-01)

        Returns:
            DataFrame with artificial agricultural data, including lag,
            rolling-window and price-change features; rows made NaN by the
            lag/rolling windows are dropped.
        """
        logger.info(f"Generating {n_samples} artificial data samples")

        # One row per calendar day.
        start_date = datetime(2023, 1, 1)
        dates = [start_date + timedelta(days=i) for i in range(n_samples)]

        np.random.seed(42)  # For reproducible results

        data = []

        for date in dates:
            # Seasonal patterns: yearly sine wave peaking mid-year.
            day_of_year = date.timetuple().tm_yday
            seasonal_factor = 1 + 0.3 * np.sin(2 * np.pi * day_of_year / 365)

            # Base demand with seasonal variation.
            base_quantity = np.random.normal(100, 20) * seasonal_factor

            # Price influenced by season plus gaussian noise.
            base_price = 25 + 5 * np.sin(2 * np.pi * day_of_year / 365)
            price_noise = np.random.normal(0, 2)
            price = base_price + price_noise

            # Weak negative price elasticity: higher price -> lower quantity.
            quantity_noise = np.random.normal(0, 15)
            quantity = base_quantity + quantity_noise - 0.1 * (price - 25)

            # Clamp to sensible positive values.
            quantity = max(1, quantity)
            price = max(5, price)

            data.append({
                'date': date,
                'quantity': round(quantity, 2),
                'price': round(price, 2),
                'day_of_week': date.weekday(),
                'month': date.month,
                'day_of_month': date.day,
                'quarter': (date.month - 1) // 3 + 1,
                'is_weekend': 1 if date.weekday() >= 5 else 0,
                'season': self._get_season(date.month)  # categorical (string)
            })

        df = pd.DataFrame(data)

        # Lag features: yesterday, last week, two weeks, last month.
        for lag in [1, 7, 14, 30]:
            df[f'price_lag_{lag}'] = df['price'].shift(lag)
            df[f'quantity_lag_{lag}'] = df['quantity'].shift(lag)

        # Rolling statistics over weekly/biweekly/monthly windows.
        for window in [7, 14, 30]:
            df[f'price_rolling_mean_{window}'] = df['price'].rolling(window).mean()
            df[f'price_rolling_std_{window}'] = df['price'].rolling(window).std()
            df[f'quantity_rolling_mean_{window}'] = df['quantity'].rolling(window).mean()

        # Relative price momentum.
        df['price_change'] = df['price'].pct_change()
        df['price_change_7d'] = df['price'].pct_change(7)

        # The first max(lag, window) rows contain NaNs from the shifts above.
        df = df.dropna().reset_index(drop=True)

        logger.info(f"Generated dataset with {len(df)} samples and {len(df.columns)} features")
        return df

    def _get_season(self, month: int) -> str:
        """Map a month number (1-12) to a meteorological season name."""
        if month in [12, 1, 2]:
            return 'winter'
        elif month in [3, 4, 5]:
            return 'spring'
        elif month in [6, 7, 8]:
            return 'summer'
        else:
            return 'fall'

    def prepare_features(self, df: pd.DataFrame) -> tuple:
        """
        Prepare features for training.

        Args:
            df: Input DataFrame produced by generate_artificial_data()

        Returns:
            Tuple of (X, y, feature_names) where y is the 'price' column
        """
        # Exclude the target ('price'), the raw quantity and the date column.
        exclude_cols = ['date', 'quantity', 'price']
        feature_cols = [col for col in df.columns if col not in exclude_cols]

        X = df[feature_cols]
        y = df['price']  # We're predicting price

        logger.info(f"Prepared {len(feature_cols)} features for training")
        return X, y, feature_cols

    def train_model(self, X_train, y_train, X_val=None, y_val=None, **kwargs) -> CatBoostRegressor:
        """
        Train CatBoost model.

        Args:
            X_train: Training features (DataFrame; string columns are treated
                as categorical)
            y_train: Training target
            X_val: Validation features (optional, enables early stopping)
            y_val: Validation target (optional)
            **kwargs: Additional CatBoost parameters overriding the defaults

        Returns:
            Trained CatBoost model (also stored on self.model)
        """
        default_params = {
            'iterations': 1000,
            'learning_rate': 0.1,
            'depth': 6,
            'loss_function': 'MAE',
            'eval_metric': 'MAE',
            'random_seed': 42,
            'verbose': 100,
            'early_stopping_rounds': 50
        }

        # Caller-supplied parameters win over the defaults.
        default_params.update(kwargs)

        model = CatBoostRegressor(**default_params)

        # BUGFIX: CatBoost rejects non-numeric columns (e.g. 'season') unless
        # they are declared categorical, so detect object-dtype columns.
        cat_features = [c for c in X_train.columns if X_train[c].dtype == object]
        train_pool = Pool(X_train, y_train, cat_features=cat_features)

        if X_val is not None and y_val is not None:
            val_pool = Pool(X_val, y_val, cat_features=cat_features)
            model.fit(train_pool, eval_set=val_pool)
        else:
            model.fit(train_pool)

        self.model = model
        self.feature_names = list(X_train.columns)

        logger.info(f"Trained CatBoost model with {model.tree_count_} trees")
        return model

    def evaluate_model(self, X_test, y_test) -> Dict[str, float]:
        """
        Evaluate model performance on a held-out set.

        Args:
            X_test: Test features
            y_test: Test target

        Returns:
            Dictionary with 'mae', 'mse', 'rmse' and 'mape' metrics

        Raises:
            ValueError: If the model has not been trained yet
        """
        if self.model is None:
            raise ValueError("Model not trained yet")

        y_pred = self.model.predict(X_test)

        mae = mean_absolute_error(y_test, y_pred)
        mse = mean_squared_error(y_test, y_pred)
        rmse = np.sqrt(mse)

        # MAPE (Mean Absolute Percentage Error); guard against division by
        # zero for any zero-valued targets.
        y_true = np.asarray(y_test, dtype=float)
        safe_denom = np.where(y_true == 0, np.finfo(float).eps, y_true)
        mape = np.mean(np.abs((y_true - y_pred) / safe_denom)) * 100

        metrics = {
            'mae': mae,
            'mse': mse,
            'rmse': rmse,
            'mape': mape
        }

        # BUGFIX: was logger.info(".2f") which logged a literal format string.
        logger.info(
            "Evaluation metrics: MAE=%.2f, RMSE=%.2f, MAPE=%.2f%%",
            mae, rmse, mape
        )
        return metrics

    def save_model(self, filepath: str):
        """
        Save trained model to file (joblib bundle with feature names and
        training timestamp).

        Args:
            filepath: Path to save the model

        Raises:
            ValueError: If the model has not been trained yet
        """
        if self.model is None:
            raise ValueError("Model not trained yet")

        # Create the directory if the path actually contains one; a bare
        # filename would make os.makedirs("") raise.
        directory = os.path.dirname(filepath)
        if directory:
            os.makedirs(directory, exist_ok=True)

        joblib.dump({
            'model': self.model,
            'feature_names': self.feature_names,
            'training_date': datetime.now().isoformat()
        }, filepath)

        logger.info(f"Model saved to {filepath}")

    def load_model(self, filepath: str):
        """
        Load trained model from file.

        Args:
            filepath: Path to the saved model

        Raises:
            FileNotFoundError: If the file does not exist
        """
        if not os.path.exists(filepath):
            raise FileNotFoundError(f"Model file not found: {filepath}")

        model_data = joblib.load(filepath)
        self.model = model_data['model']
        self.feature_names = model_data['feature_names']

        logger.info(f"Model loaded from {filepath}")

    def predict(self, features: pd.DataFrame) -> np.ndarray:
        """
        Make predictions with the trained model.

        Args:
            features: Input features (must contain the training columns)

        Returns:
            Predictions array

        Raises:
            ValueError: If no model has been trained or loaded
        """
        if self.model is None:
            raise ValueError("Model not trained or loaded yet")

        # Reorder columns to match the training layout.
        if self.feature_names:
            features = features[self.feature_names]

        return self.model.predict(features)
|
| 275 |
+
|
| 276 |
+
def main():
    """Train, evaluate and persist the CatBoost demand model end to end."""
    logger.info("Starting CatBoost model training")

    trainer = CatBoostTrainer()

    # Synthetic data stands in for real market history.
    dataset = trainer.generate_artificial_data(n_samples=2000)
    X, y, feature_names = trainer.prepare_features(dataset)

    # Hold out 20% for the final test, then carve a validation split out of
    # the remainder for early stopping.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=42
    )
    X_train, X_val, y_train, y_val = train_test_split(
        X_train, y_train, test_size=0.2, random_state=42
    )

    trainer.train_model(X_train, y_train, X_val, y_val)
    metrics = trainer.evaluate_model(X_test, y_test)

    model_path = "models/catboost_model.pkl"
    trainer.save_model(model_path)

    logger.info("Training completed successfully!")
    logger.info(f"Model saved to: {model_path}")
    logger.info(f"Test Metrics: {metrics}")

    return trainer

if __name__ == "__main__":
    trained_trainer = main()
|
utils/config.py
ADDED
|
@@ -0,0 +1,45 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
"""
|
| 2 |
+
Configuration settings for AgriPredict Analysis Service
|
| 3 |
+
"""
|
| 4 |
+
|
| 5 |
+
import os
|
| 6 |
+
from typing import List
|
| 7 |
+
|
| 8 |
+
class Settings:
    """Central configuration for the AgriPredict Analysis Service.

    Attributes read via os.getenv can be overridden through environment
    variables; everything else is a fixed application default.
    """

    # -- HTTP server --
    API_HOST: str = os.getenv("API_HOST", "0.0.0.0")
    API_PORT: int = int(os.getenv("PORT", 8000))
    API_WORKERS: int = int(os.getenv("API_WORKERS", 1))

    # -- CORS --
    # NOTE(review): wildcard patterns like "https://*.huggingface.co" are not
    # expanded by plain origin matching — confirm whether the consumer of this
    # list needs a regex-based option instead.
    ALLOWED_ORIGINS: List[str] = [
        "http://localhost:3000",
        "http://localhost:3001",
        "https://*.huggingface.co",
        "https://huggingface.co",
        os.getenv("FRONTEND_URL", "*"),
    ]

    # -- Forecasting limits --
    DEFAULT_MODELS: List[str] = ["ensemble"]
    MAX_FORECAST_DAYS: int = 365
    MIN_HISTORICAL_DATA_POINTS: int = 3

    # -- CatBoost hyperparameters (reserved for future training) --
    CATBOOST_ITERATIONS: int = 100
    CATBOOST_LEARNING_RATE: float = 0.1
    CATBOOST_DEPTH: int = 6
    CATBOOST_VERBOSE: bool = False

    # -- Logging --
    LOG_LEVEL: str = os.getenv("LOG_LEVEL", "INFO")
    LOG_FORMAT: str = "%(asctime)s - %(name)s - %(levelname)s - %(message)s"

    # -- Data processing --
    DATE_FORMAT: str = "%Y-%m-%d"
    MAX_DATA_POINTS: int = 10000

# Single shared settings object used across the service.
settings = Settings()
|
utils/logger.py
ADDED
|
@@ -0,0 +1,31 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
"""
|
| 2 |
+
Logging configuration for AgriPredict Analysis Service
|
| 3 |
+
"""
|
| 4 |
+
|
| 5 |
+
import logging
|
| 6 |
+
import sys
|
| 7 |
+
from utils.config import settings
|
| 8 |
+
|
| 9 |
+
def setup_logger(name: str) -> logging.Logger:
    """Return a logger writing to stdout, configured from app settings."""
    log_level = getattr(logging, settings.LOG_LEVEL)

    log = logging.getLogger(name)
    log.setLevel(log_level)

    # Drop any previously attached handlers so repeated calls for the same
    # name don't produce duplicate output.
    log.handlers.clear()

    handler = logging.StreamHandler(sys.stdout)
    handler.setLevel(log_level)
    handler.setFormatter(logging.Formatter(settings.LOG_FORMAT))
    log.addHandler(handler)

    return log

# Global logger instance
logger = setup_logger(__name__)
|