Spaces:
Sleeping
Sleeping
Fix: Ensure models train during HF build + Fix RAG pipeline loading
Browse files- Created app.py entry point that trains models if they don't exist
- Updated README.md to use new app_file path
- Fixed RAG pipeline import paths for HF compatibility
- Added build.py and start_dashboard.py helper scripts
- Added comprehensive deployment documentation
This fixes the 'Models Not Loaded' and 'AI Q&A Inactive' errors on Hugging Face Spaces.
- DEPLOYMENT.md +99 -0
- FIX_SUMMARY.md +169 -0
- README.md +1 -1
- fmcg_genai/app.py +67 -0
- fmcg_genai/build.py +124 -0
- fmcg_genai/src/dashboard_app.py +15 -1
- fmcg_genai/start_dashboard.py +110 -0
DEPLOYMENT.md
ADDED
|
@@ -0,0 +1,99 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
# Hugging Face Spaces Deployment Guide
|
| 2 |
+
|
| 3 |
+
## Current Issue Diagnosis
|
| 4 |
+
|
| 5 |
+
Based on the error screenshot showing "Models: β Not Loaded" and "AI Q&A: β Inactive", the issue is that **models are not being trained during the Hugging Face build process**.
|
| 6 |
+
|
| 7 |
+
## Solution
|
| 8 |
+
|
| 9 |
+
### 1. **Updated Files**
|
| 10 |
+
The following files have been updated to fix the deployment:
|
| 11 |
+
|
| 12 |
+
- **`README.md`**: Changed `app_file` from `fmcg_genai/src/dashboard_app_enhanced.py` to `fmcg_genai/app.py`
|
| 13 |
+
- **`fmcg_genai/app.py`**: New entry point that ensures models are trained before launching dashboard
|
| 14 |
+
- **`fmcg_genai/src/dashboard_app.py`**: Fixed RAG pipeline import paths for better compatibility
|
| 15 |
+
|
| 16 |
+
### 2. **How It Works**
|
| 17 |
+
|
| 18 |
+
The new `app.py` entry point:
|
| 19 |
+
1. Checks if models exist (`prophet.pkl`, `xgboost_sales.pkl`)
|
| 20 |
+
2. Checks if vector store exists (`faiss_index.bin`)
|
| 21 |
+
3. If missing, runs `run_pipeline.py` to train models (takes ~5-10 minutes on first build)
|
| 22 |
+
4. Then launches the dashboard
|
| 23 |
+
|
| 24 |
+
### 3. **Deployment Steps**
|
| 25 |
+
|
| 26 |
+
#### Option A: Push to Hugging Face (Recommended)
|
| 27 |
+
```bash
|
| 28 |
+
# From the project root
|
| 29 |
+
cd c:\Users\91880\Downloads\archive\fmcg_demand_forecasting
|
| 30 |
+
|
| 31 |
+
# Add all changes
|
| 32 |
+
git add .
|
| 33 |
+
|
| 34 |
+
# Commit with a clear message
|
| 35 |
+
git commit -m "Fix: Ensure models are trained during HF build process"
|
| 36 |
+
|
| 37 |
+
# Push to Hugging Face
|
| 38 |
+
git push
|
| 39 |
+
```
|
| 40 |
+
|
| 41 |
+
#### Option B: Manual Rebuild on Hugging Face
|
| 42 |
+
1. Go to your Hugging Face Space settings
|
| 43 |
+
2. Click "Factory reboot" to trigger a fresh build
|
| 44 |
+
3. The new `app.py` will run and train models automatically
|
| 45 |
+
|
| 46 |
+
### 4. **Expected Build Time**
|
| 47 |
+
|
| 48 |
+
- **First build**: ~10-15 minutes (includes model training)
|
| 49 |
+
- **Subsequent builds**: ~2-3 minutes (models are cached)
|
| 50 |
+
|
| 51 |
+
### 5. **Verification**
|
| 52 |
+
|
| 53 |
+
After deployment, you should see:
|
| 54 |
+
- β
Models: Loaded
|
| 55 |
+
- β
AI Q&A: Active
|
| 56 |
+
- Dashboard loads without "Could not load data" error
|
| 57 |
+
|
| 58 |
+
### 6. **Troubleshooting**
|
| 59 |
+
|
| 60 |
+
If the build fails:
|
| 61 |
+
|
| 62 |
+
1. **Check build logs** on Hugging Face Spaces
|
| 63 |
+
2. **Common issues**:
|
| 64 |
+
- Out of memory: Reduce batch size in `config.yaml`
|
| 65 |
+
- Timeout: Models take too long to train (HF has 1-hour build limit)
|
| 66 |
+
- Missing dependencies: Check `requirements.txt`
|
| 67 |
+
|
| 68 |
+
3. **Quick fix**: If build times out, you can:
|
| 69 |
+
- Train models locally
|
| 70 |
+
- Upload trained models to Hugging Face using Git LFS
|
| 71 |
+
- Skip training in `app.py`
|
| 72 |
+
|
| 73 |
+
### 7. **Git LFS Setup (If Needed)**
|
| 74 |
+
|
| 75 |
+
If you want to commit trained models instead of training during build:
|
| 76 |
+
|
| 77 |
+
```bash
|
| 78 |
+
# Install Git LFS
|
| 79 |
+
git lfs install
|
| 80 |
+
|
| 81 |
+
# Track large model files
|
| 82 |
+
git lfs track "fmcg_genai/models/*.pkl"
|
| 83 |
+
git lfs track "fmcg_genai/vector_store/*.bin"
|
| 84 |
+
git lfs track "fmcg_genai/vector_store/*.pkl"
|
| 85 |
+
|
| 86 |
+
# Add .gitattributes
|
| 87 |
+
git add .gitattributes
|
| 88 |
+
|
| 89 |
+
# Commit and push
|
| 90 |
+
git add fmcg_genai/models/* fmcg_genai/vector_store/*
|
| 91 |
+
git commit -m "Add pre-trained models via Git LFS"
|
| 92 |
+
git push
|
| 93 |
+
```
|
| 94 |
+
|
| 95 |
+
Then modify `app.py` to skip training if models exist.
|
| 96 |
+
|
| 97 |
+
## Summary
|
| 98 |
+
|
| 99 |
+
The main fix is the new `app.py` entry point that ensures models are trained during the Hugging Face build process. Push the changes and rebuild your Space to fix the issue.
|
FIX_SUMMARY.md
ADDED
|
@@ -0,0 +1,169 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
# FMCG Dashboard - Issue Diagnosis & Fix Summary
|
| 2 |
+
|
| 3 |
+
## π **What Was Wrong**
|
| 4 |
+
|
| 5 |
+
### Issue 1: Models Not Training During Build
|
| 6 |
+
**Problem**: The Hugging Face Space was trying to load pre-trained models that don't exist in the repository.
|
| 7 |
+
|
| 8 |
+
**Root Cause**:
|
| 9 |
+
- The `app_file` in README.md pointed directly to `dashboard_app_enhanced.py`
|
| 10 |
+
- This file expects models to already exist
|
| 11 |
+
- Models were never trained during the HF build process
|
| 12 |
+
- Result: "Models: β Not Loaded" error
|
| 13 |
+
|
| 14 |
+
### Issue 2: RAG Pipeline Not Loading
|
| 15 |
+
**Problem**: Even though vector store files exist locally, they weren't being created on HF deployment.
|
| 16 |
+
|
| 17 |
+
**Root Cause**:
|
| 18 |
+
- Vector store is generated by `run_pipeline.py`
|
| 19 |
+
- This script was never executed during HF build
|
| 20 |
+
- Import path issues in `dashboard_app.py` (hardcoded `from src.rag_pipeline`)
|
| 21 |
+
- Result: "AI Q&A: β Inactive" error
|
| 22 |
+
|
| 23 |
+
### Issue 3: Data Loading Error
|
| 24 |
+
**Problem**: "Could not load data. Please ensure data preprocessing has been completed."
|
| 25 |
+
|
| 26 |
+
**Root Cause**:
|
| 27 |
+
- Processed data files are generated by `run_pipeline.py`
|
| 28 |
+
- Without running the pipeline, `data/processed/cleaned.csv` doesn't exist on HF
|
| 29 |
+
- Dashboard can't load non-existent files
|
| 30 |
+
|
| 31 |
+
## β
**What Was Fixed**
|
| 32 |
+
|
| 33 |
+
### Fix 1: Created `app.py` Entry Point
|
| 34 |
+
**File**: `fmcg_genai/app.py`
|
| 35 |
+
|
| 36 |
+
**What it does**:
|
| 37 |
+
```python
|
| 38 |
+
1. Checks if models exist (prophet.pkl, xgboost_sales.pkl)
|
| 39 |
+
2. Checks if vector store exists (faiss_index.bin)
|
| 40 |
+
3. If missing β runs run_pipeline.py to train everything
|
| 41 |
+
4. Then launches the dashboard
|
| 42 |
+
```
|
| 43 |
+
|
| 44 |
+
**Impact**: Models are now trained automatically during HF build
|
| 45 |
+
|
| 46 |
+
### Fix 2: Updated README.md
|
| 47 |
+
**Change**: `app_file: fmcg_genai/src/dashboard_app_enhanced.py` β `app_file: fmcg_genai/app.py`
|
| 48 |
+
|
| 49 |
+
**Impact**: HF Spaces now uses the new entry point that handles model training
|
| 50 |
+
|
| 51 |
+
### Fix 3: Fixed RAG Pipeline Import
|
| 52 |
+
**File**: `fmcg_genai/src/dashboard_app.py`
|
| 53 |
+
|
| 54 |
+
**Changes**:
|
| 55 |
+
- Added fallback import paths for RAG pipeline
|
| 56 |
+
- Better error logging
|
| 57 |
+
- Handles both local and HF deployment paths
|
| 58 |
+
|
| 59 |
+
**Impact**: RAG pipeline loads correctly regardless of deployment environment
|
| 60 |
+
|
| 61 |
+
### Fix 4: Created Helper Scripts
|
| 62 |
+
**Files**:
|
| 63 |
+
- `build.py`: For local builds, checks if models need training
|
| 64 |
+
- `start_dashboard.py`: For local testing, validates environment
|
| 65 |
+
- `DEPLOYMENT.md`: Comprehensive deployment guide
|
| 66 |
+
|
| 67 |
+
## π **Deployment Checklist**
|
| 68 |
+
|
| 69 |
+
- [x] Created `app.py` entry point with model training logic
|
| 70 |
+
- [x] Updated `README.md` to use new app_file
|
| 71 |
+
- [x] Fixed RAG pipeline import paths
|
| 72 |
+
- [x] Added comprehensive error logging
|
| 73 |
+
- [x] Created deployment documentation
|
| 74 |
+
|
| 75 |
+
## π **Next Steps**
|
| 76 |
+
|
| 77 |
+
### To Deploy to Hugging Face:
|
| 78 |
+
|
| 79 |
+
```bash
|
| 80 |
+
# 1. Navigate to project root
|
| 81 |
+
cd c:\Users\91880\Downloads\archive\fmcg_demand_forecasting
|
| 82 |
+
|
| 83 |
+
# 2. Stage all changes
|
| 84 |
+
git add .
|
| 85 |
+
|
| 86 |
+
# 3. Commit with descriptive message
|
| 87 |
+
git commit -m "Fix: Ensure models train during HF build + Fix RAG pipeline loading"
|
| 88 |
+
|
| 89 |
+
# 4. Push to Hugging Face
|
| 90 |
+
git push
|
| 91 |
+
```
|
| 92 |
+
|
| 93 |
+
### Expected Outcome:
|
| 94 |
+
|
| 95 |
+
After pushing and HF rebuilds:
|
| 96 |
+
1. β
Build takes ~10-15 minutes (first time)
|
| 97 |
+
2. β
Models are trained automatically
|
| 98 |
+
3. β
Vector store is created
|
| 99 |
+
4. β
Dashboard shows "Models: β
Loaded"
|
| 100 |
+
5. β
Dashboard shows "AI Q&A: β
Active"
|
| 101 |
+
6. β
No "Could not load data" error
|
| 102 |
+
|
| 103 |
+
## π§ **Technical Details**
|
| 104 |
+
|
| 105 |
+
### Build Process Flow (New):
|
| 106 |
+
```
|
| 107 |
+
HF Starts Build
|
| 108 |
+
β
|
| 109 |
+
Runs app.py
|
| 110 |
+
β
|
| 111 |
+
Checks if models exist
|
| 112 |
+
β (No)
|
| 113 |
+
Runs run_pipeline.py
|
| 114 |
+
β
|
| 115 |
+
1. Data Preprocessing
|
| 116 |
+
2. Feature Engineering
|
| 117 |
+
3. Model Training (Prophet + XGBoost)
|
| 118 |
+
4. Model Evaluation
|
| 119 |
+
5. SHAP Explainability
|
| 120 |
+
6. RAG Pipeline Setup
|
| 121 |
+
β
|
| 122 |
+
Models + Vector Store Created
|
| 123 |
+
β
|
| 124 |
+
Launches dashboard_app.py
|
| 125 |
+
β
|
| 126 |
+
β
Dashboard Ready
|
| 127 |
+
```
|
| 128 |
+
|
| 129 |
+
### Build Process Flow (Old - BROKEN):
|
| 130 |
+
```
|
| 131 |
+
HF Starts Build
|
| 132 |
+
β
|
| 133 |
+
Runs dashboard_app_enhanced.py directly
|
| 134 |
+
β
|
| 135 |
+
Tries to load models
|
| 136 |
+
β (Not found)
|
| 137 |
+
β Models: Not Loaded
|
| 138 |
+
β AI Q&A: Inactive
|
| 139 |
+
β Could not load data
|
| 140 |
+
```
|
| 141 |
+
|
| 142 |
+
## π **File Changes Summary**
|
| 143 |
+
|
| 144 |
+
| File | Change Type | Purpose |
|
| 145 |
+
|------|-------------|---------|
|
| 146 |
+
| `README.md` | Modified | Updated app_file path |
|
| 147 |
+
| `fmcg_genai/app.py` | Created | New entry point with model training |
|
| 148 |
+
| `fmcg_genai/src/dashboard_app.py` | Modified | Fixed RAG import paths |
|
| 149 |
+
| `fmcg_genai/build.py` | Created | Local build helper |
|
| 150 |
+
| `fmcg_genai/start_dashboard.py` | Created | Local startup validator |
|
| 151 |
+
| `DEPLOYMENT.md` | Created | Deployment guide |
|
| 152 |
+
|
| 153 |
+
## β οΈ **Important Notes**
|
| 154 |
+
|
| 155 |
+
1. **First build will be slow**: Training models takes time (~10-15 min)
|
| 156 |
+
2. **Subsequent builds are fast**: Models are cached
|
| 157 |
+
3. **Memory requirements**: Ensure HF Space has enough RAM (recommend 16GB tier)
|
| 158 |
+
4. **Alternative approach**: Use Git LFS to commit pre-trained models (see DEPLOYMENT.md)
|
| 159 |
+
|
| 160 |
+
## π― **Success Criteria**
|
| 161 |
+
|
| 162 |
+
The deployment is successful when:
|
| 163 |
+
- [ ] HF build completes without errors
|
| 164 |
+
- [ ] Dashboard loads without "Could not load data" error
|
| 165 |
+
- [ ] System Status shows "Models: β
Loaded"
|
| 166 |
+
- [ ] System Status shows "AI Q&A: β
Active"
|
| 167 |
+
- [ ] Forecasting tab works and shows predictions
|
| 168 |
+
- [ ] AI Q&A Portal responds to queries
|
| 169 |
+
- [ ] All visualizations render correctly
|
README.md
CHANGED
|
@@ -5,7 +5,7 @@ colorFrom: blue
|
|
| 5 |
colorTo: purple
|
| 6 |
sdk: streamlit
|
| 7 |
sdk_version: "1.25.0"
|
| 8 |
-
app_file: fmcg_genai/
|
| 9 |
pinned: false
|
| 10 |
license: mit
|
| 11 |
python_version: "3.10"
|
|
|
|
| 5 |
colorTo: purple
|
| 6 |
sdk: streamlit
|
| 7 |
sdk_version: "1.25.0"
|
| 8 |
+
app_file: fmcg_genai/app.py
|
| 9 |
pinned: false
|
| 10 |
license: mit
|
| 11 |
python_version: "3.10"
|
fmcg_genai/app.py
ADDED
|
@@ -0,0 +1,67 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
"""
|
| 2 |
+
Hugging Face Spaces Entry Point
|
| 3 |
+
This script ensures models are trained before launching the dashboard
|
| 4 |
+
"""
|
| 5 |
+
|
| 6 |
+
import os
|
| 7 |
+
import sys
|
| 8 |
+
from pathlib import Path
|
| 9 |
+
import subprocess
|
| 10 |
+
import logging
|
| 11 |
+
|
| 12 |
+
# Setup logging
|
| 13 |
+
logging.basicConfig(level=logging.INFO)
|
| 14 |
+
logger = logging.getLogger(__name__)
|
| 15 |
+
|
| 16 |
+
def ensure_models_trained():
|
| 17 |
+
"""Ensure models and vector store are created"""
|
| 18 |
+
project_root = Path(__file__).parent
|
| 19 |
+
models_dir = project_root / "models"
|
| 20 |
+
vector_store_dir = project_root / "vector_store"
|
| 21 |
+
|
| 22 |
+
# Check if models exist
|
| 23 |
+
prophet_exists = (models_dir / "prophet.pkl").exists()
|
| 24 |
+
xgboost_exists = (models_dir / "xgboost_sales.pkl").exists()
|
| 25 |
+
vector_store_exists = (vector_store_dir / "faiss_index.bin").exists()
|
| 26 |
+
|
| 27 |
+
if prophet_exists and xgboost_exists and vector_store_exists:
|
| 28 |
+
logger.info("Models and vector store already exist. Skipping training.")
|
| 29 |
+
return True
|
| 30 |
+
|
| 31 |
+
logger.info("Models not found. Running pipeline to train models...")
|
| 32 |
+
logger.info("This will take several minutes on first deployment...")
|
| 33 |
+
|
| 34 |
+
try:
|
| 35 |
+
# Run the pipeline
|
| 36 |
+
result = subprocess.run(
|
| 37 |
+
[sys.executable, "run_pipeline.py"],
|
| 38 |
+
cwd=project_root,
|
| 39 |
+
check=True,
|
| 40 |
+
capture_output=True,
|
| 41 |
+
text=True
|
| 42 |
+
)
|
| 43 |
+
logger.info("Pipeline completed successfully!")
|
| 44 |
+
logger.info(result.stdout)
|
| 45 |
+
return True
|
| 46 |
+
except subprocess.CalledProcessError as e:
|
| 47 |
+
logger.error(f"Pipeline failed: {e}")
|
| 48 |
+
logger.error(f"STDOUT: {e.stdout}")
|
| 49 |
+
logger.error(f"STDERR: {e.stderr}")
|
| 50 |
+
return False
|
| 51 |
+
except Exception as e:
|
| 52 |
+
logger.error(f"Unexpected error: {e}")
|
| 53 |
+
return False
|
| 54 |
+
|
| 55 |
+
if __name__ == "__main__":
|
| 56 |
+
logger.info("Starting FMCG Analytics Dashboard...")
|
| 57 |
+
|
| 58 |
+
# Ensure models are trained
|
| 59 |
+
if not ensure_models_trained():
|
| 60 |
+
logger.error("Failed to train models. Dashboard may not work correctly.")
|
| 61 |
+
|
| 62 |
+
# Import and run the dashboard
|
| 63 |
+
logger.info("Launching dashboard...")
|
| 64 |
+
sys.path.insert(0, str(Path(__file__).parent / "src"))
|
| 65 |
+
|
| 66 |
+
from dashboard_app import main
|
| 67 |
+
main()
|
fmcg_genai/build.py
ADDED
|
@@ -0,0 +1,124 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
"""
|
| 2 |
+
Build script for FMCG Dashboard
|
| 3 |
+
Ensures models are trained and all components are ready before deployment
|
| 4 |
+
"""
|
| 5 |
+
|
| 6 |
+
import os
|
| 7 |
+
import sys
|
| 8 |
+
from pathlib import Path
|
| 9 |
+
import logging
|
| 10 |
+
import subprocess
|
| 11 |
+
|
| 12 |
+
# Setup logging
|
| 13 |
+
logging.basicConfig(
|
| 14 |
+
level=logging.INFO,
|
| 15 |
+
format='%(asctime)s - %(levelname)s - %(message)s'
|
| 16 |
+
)
|
| 17 |
+
logger = logging.getLogger(__name__)
|
| 18 |
+
|
| 19 |
+
def check_files_exist():
|
| 20 |
+
"""Check if required files exist"""
|
| 21 |
+
project_root = Path(__file__).parent
|
| 22 |
+
|
| 23 |
+
# Check data
|
| 24 |
+
data_file = project_root / "data" / "raw" / "FMCG_2022_2024.csv"
|
| 25 |
+
if not data_file.exists():
|
| 26 |
+
logger.error(f"β Raw data file not found: {data_file}")
|
| 27 |
+
return False
|
| 28 |
+
|
| 29 |
+
logger.info(f"β
Raw data file found: {data_file}")
|
| 30 |
+
return True
|
| 31 |
+
|
| 32 |
+
def check_models_trained():
|
| 33 |
+
"""Check if models are already trained"""
|
| 34 |
+
project_root = Path(__file__).parent
|
| 35 |
+
models_dir = project_root / "models"
|
| 36 |
+
|
| 37 |
+
required_models = ["prophet.pkl", "xgboost_sales.pkl"]
|
| 38 |
+
all_exist = all((models_dir / model).exists() for model in required_models)
|
| 39 |
+
|
| 40 |
+
if all_exist:
|
| 41 |
+
logger.info("β
All models already trained")
|
| 42 |
+
return True
|
| 43 |
+
else:
|
| 44 |
+
logger.warning("β οΈ Models not found or incomplete")
|
| 45 |
+
return False
|
| 46 |
+
|
| 47 |
+
def check_vector_store():
|
| 48 |
+
"""Check if vector store exists"""
|
| 49 |
+
project_root = Path(__file__).parent
|
| 50 |
+
vector_store_dir = project_root / "vector_store"
|
| 51 |
+
|
| 52 |
+
required_files = ["faiss_index.bin", "documents.pkl", "embeddings.pkl"]
|
| 53 |
+
all_exist = all((vector_store_dir / file).exists() for file in required_files)
|
| 54 |
+
|
| 55 |
+
if all_exist:
|
| 56 |
+
logger.info("β
Vector store already exists")
|
| 57 |
+
return True
|
| 58 |
+
else:
|
| 59 |
+
logger.warning("β οΈ Vector store not found or incomplete")
|
| 60 |
+
return False
|
| 61 |
+
|
| 62 |
+
def run_pipeline():
|
| 63 |
+
"""Run the full pipeline to train models and create vector store"""
|
| 64 |
+
logger.info("=" * 80)
|
| 65 |
+
logger.info("Running FMCG Pipeline - This may take several minutes...")
|
| 66 |
+
logger.info("=" * 80)
|
| 67 |
+
|
| 68 |
+
try:
|
| 69 |
+
result = subprocess.run(
|
| 70 |
+
[sys.executable, "run_pipeline.py"],
|
| 71 |
+
check=True,
|
| 72 |
+
capture_output=False,
|
| 73 |
+
text=True
|
| 74 |
+
)
|
| 75 |
+
logger.info("β
Pipeline completed successfully")
|
| 76 |
+
return True
|
| 77 |
+
except subprocess.CalledProcessError as e:
|
| 78 |
+
logger.error(f"β Pipeline failed with error: {e}")
|
| 79 |
+
return False
|
| 80 |
+
except Exception as e:
|
| 81 |
+
logger.error(f"β Unexpected error running pipeline: {e}")
|
| 82 |
+
return False
|
| 83 |
+
|
| 84 |
+
def main():
|
| 85 |
+
"""Main build function"""
|
| 86 |
+
print("\n" + "=" * 80)
|
| 87 |
+
print("FMCG Analytics Dashboard - Build Script")
|
| 88 |
+
print("=" * 80 + "\n")
|
| 89 |
+
|
| 90 |
+
# Check if data exists
|
| 91 |
+
if not check_files_exist():
|
| 92 |
+
print("\nβ BUILD FAILED: Required data files not found")
|
| 93 |
+
sys.exit(1)
|
| 94 |
+
|
| 95 |
+
# Check if models are trained
|
| 96 |
+
models_exist = check_models_trained()
|
| 97 |
+
vector_store_exists = check_vector_store()
|
| 98 |
+
|
| 99 |
+
if models_exist and vector_store_exists:
|
| 100 |
+
print("\nβ
All components already built!")
|
| 101 |
+
print("\nπ‘ To rebuild from scratch, delete the 'models' and 'vector_store' directories")
|
| 102 |
+
print(" Then run this script again.")
|
| 103 |
+
return
|
| 104 |
+
|
| 105 |
+
# Need to run pipeline
|
| 106 |
+
print("\nπ Building models and vector store...")
|
| 107 |
+
print(" This will take several minutes. Please wait...\n")
|
| 108 |
+
|
| 109 |
+
if not run_pipeline():
|
| 110 |
+
print("\nβ BUILD FAILED: Pipeline execution failed")
|
| 111 |
+
print("\nπ‘ Check the logs above for error details")
|
| 112 |
+
sys.exit(1)
|
| 113 |
+
|
| 114 |
+
print("\n" + "=" * 80)
|
| 115 |
+
print("β
BUILD SUCCESSFUL!")
|
| 116 |
+
print("=" * 80)
|
| 117 |
+
print("\nYou can now start the dashboard with:")
|
| 118 |
+
print(" python start_dashboard.py")
|
| 119 |
+
print(" OR")
|
| 120 |
+
print(" streamlit run src/dashboard_app.py")
|
| 121 |
+
print()
|
| 122 |
+
|
| 123 |
+
if __name__ == "__main__":
|
| 124 |
+
main()
|
fmcg_genai/src/dashboard_app.py
CHANGED
|
@@ -106,7 +106,19 @@ def get_models(config):
|
|
| 106 |
def get_rag_pipeline():
|
| 107 |
"""Load and setup RAG pipeline with caching"""
|
| 108 |
try:
|
| 109 |
-
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 110 |
|
| 111 |
config_path = project_root / "config.yaml"
|
| 112 |
if not config_path.exists():
|
|
@@ -119,8 +131,10 @@ def get_rag_pipeline():
|
|
| 119 |
vector_store_path = project_root / "vector_store" / "faiss_index.bin"
|
| 120 |
if not vector_store_path.exists():
|
| 121 |
logger.warning(f"Vector store not found at {vector_store_path}")
|
|
|
|
| 122 |
return None
|
| 123 |
|
|
|
|
| 124 |
if rag_pipeline.load_vector_store():
|
| 125 |
logger.info("RAG pipeline loaded successfully")
|
| 126 |
return rag_pipeline
|
|
|
|
| 106 |
def get_rag_pipeline():
|
| 107 |
"""Load and setup RAG pipeline with caching"""
|
| 108 |
try:
|
| 109 |
+
# Try different import paths
|
| 110 |
+
try:
|
| 111 |
+
from src.rag_pipeline import FMCGRAGPipeline
|
| 112 |
+
except ImportError:
|
| 113 |
+
try:
|
| 114 |
+
from rag_pipeline import FMCGRAGPipeline
|
| 115 |
+
except ImportError:
|
| 116 |
+
# Add src to path and try again
|
| 117 |
+
import sys
|
| 118 |
+
src_path = project_root / "src"
|
| 119 |
+
if str(src_path) not in sys.path:
|
| 120 |
+
sys.path.insert(0, str(src_path))
|
| 121 |
+
from rag_pipeline import FMCGRAGPipeline
|
| 122 |
|
| 123 |
config_path = project_root / "config.yaml"
|
| 124 |
if not config_path.exists():
|
|
|
|
| 131 |
vector_store_path = project_root / "vector_store" / "faiss_index.bin"
|
| 132 |
if not vector_store_path.exists():
|
| 133 |
logger.warning(f"Vector store not found at {vector_store_path}")
|
| 134 |
+
logger.warning("Run 'python run_pipeline.py' to create the vector store")
|
| 135 |
return None
|
| 136 |
|
| 137 |
+
logger.info(f"Loading vector store from {vector_store_path}")
|
| 138 |
if rag_pipeline.load_vector_store():
|
| 139 |
logger.info("RAG pipeline loaded successfully")
|
| 140 |
return rag_pipeline
|
fmcg_genai/start_dashboard.py
ADDED
|
@@ -0,0 +1,110 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
"""
|
| 2 |
+
Startup script for FMCG Dashboard
|
| 3 |
+
Ensures all components are properly initialized before launching
|
| 4 |
+
"""
|
| 5 |
+
|
| 6 |
+
import os
|
| 7 |
+
import sys
|
| 8 |
+
from pathlib import Path
|
| 9 |
+
import logging
|
| 10 |
+
|
| 11 |
+
# Setup logging
|
| 12 |
+
logging.basicConfig(
|
| 13 |
+
level=logging.INFO,
|
| 14 |
+
format='%(asctime)s - %(levelname)s - %(message)s'
|
| 15 |
+
)
|
| 16 |
+
logger = logging.getLogger(__name__)
|
| 17 |
+
|
| 18 |
+
def check_environment():
|
| 19 |
+
"""Check if all required files and directories exist"""
|
| 20 |
+
logger.info("Checking environment...")
|
| 21 |
+
|
| 22 |
+
project_root = Path(__file__).parent
|
| 23 |
+
issues = []
|
| 24 |
+
|
| 25 |
+
# Check config
|
| 26 |
+
config_path = project_root / "config.yaml"
|
| 27 |
+
if not config_path.exists():
|
| 28 |
+
issues.append(f"β Config file not found: {config_path}")
|
| 29 |
+
else:
|
| 30 |
+
logger.info(f"β
Config file found: {config_path}")
|
| 31 |
+
|
| 32 |
+
# Check data
|
| 33 |
+
data_dir = project_root / "data" / "processed"
|
| 34 |
+
required_files = ["cleaned.csv", "test_features.csv"]
|
| 35 |
+
for file in required_files:
|
| 36 |
+
file_path = data_dir / file
|
| 37 |
+
if not file_path.exists():
|
| 38 |
+
issues.append(f"β Data file not found: {file_path}")
|
| 39 |
+
else:
|
| 40 |
+
logger.info(f"β
Data file found: {file_path}")
|
| 41 |
+
|
| 42 |
+
# Check models
|
| 43 |
+
models_dir = project_root / "models"
|
| 44 |
+
model_files = ["prophet.pkl", "xgboost_sales.pkl"]
|
| 45 |
+
for file in model_files:
|
| 46 |
+
file_path = models_dir / file
|
| 47 |
+
if not file_path.exists():
|
| 48 |
+
issues.append(f"β Model file not found: {file_path}")
|
| 49 |
+
else:
|
| 50 |
+
logger.info(f"β
Model file found: {file_path}")
|
| 51 |
+
|
| 52 |
+
# Check vector store
|
| 53 |
+
vector_store_dir = project_root / "vector_store"
|
| 54 |
+
vector_files = ["faiss_index.bin", "documents.pkl", "embeddings.pkl"]
|
| 55 |
+
for file in vector_files:
|
| 56 |
+
file_path = vector_store_dir / file
|
| 57 |
+
if not file_path.exists():
|
| 58 |
+
issues.append(f"β Vector store file not found: {file_path}")
|
| 59 |
+
else:
|
| 60 |
+
logger.info(f"β
Vector store file found: {file_path}")
|
| 61 |
+
|
| 62 |
+
# Check dashboard app
|
| 63 |
+
dashboard_path = project_root / "src" / "dashboard_app.py"
|
| 64 |
+
if not dashboard_path.exists():
|
| 65 |
+
issues.append(f"β Dashboard app not found: {dashboard_path}")
|
| 66 |
+
else:
|
| 67 |
+
logger.info(f"β
Dashboard app found: {dashboard_path}")
|
| 68 |
+
|
| 69 |
+
return issues
|
| 70 |
+
|
| 71 |
+
def main():
|
| 72 |
+
"""Main startup function"""
|
| 73 |
+
print("=" * 80)
|
| 74 |
+
print("FMCG Analytics Dashboard - Startup Check")
|
| 75 |
+
print("=" * 80)
|
| 76 |
+
|
| 77 |
+
# Check environment
|
| 78 |
+
issues = check_environment()
|
| 79 |
+
|
| 80 |
+
if issues:
|
| 81 |
+
print("\nβ STARTUP FAILED - Issues detected:\n")
|
| 82 |
+
for issue in issues:
|
| 83 |
+
print(f" {issue}")
|
| 84 |
+
print("\nπ‘ Solution:")
|
| 85 |
+
print(" Run the pipeline first to generate all required files:")
|
| 86 |
+
print(" python run_pipeline.py")
|
| 87 |
+
sys.exit(1)
|
| 88 |
+
|
| 89 |
+
print("\nβ
All checks passed! Starting dashboard...\n")
|
| 90 |
+
print("=" * 80)
|
| 91 |
+
|
| 92 |
+
# Launch dashboard
|
| 93 |
+
import subprocess
|
| 94 |
+
dashboard_path = Path(__file__).parent / "src" / "dashboard_app.py"
|
| 95 |
+
|
| 96 |
+
try:
|
| 97 |
+
subprocess.run([
|
| 98 |
+
sys.executable, "-m", "streamlit", "run",
|
| 99 |
+
str(dashboard_path),
|
| 100 |
+
"--server.port=8501",
|
| 101 |
+
"--server.address=localhost"
|
| 102 |
+
], check=True)
|
| 103 |
+
except KeyboardInterrupt:
|
| 104 |
+
print("\n\nπ Dashboard stopped by user")
|
| 105 |
+
except Exception as e:
|
| 106 |
+
print(f"\nβ Error starting dashboard: {e}")
|
| 107 |
+
sys.exit(1)
|
| 108 |
+
|
| 109 |
+
if __name__ == "__main__":
|
| 110 |
+
main()
|