title: FMCG Demand Forecasting with RAG
emoji: ๐
colorFrom: blue
colorTo: purple
sdk: streamlit
sdk_version: 1.25.0
app_file: fmcg_genai/app.py
pinned: false
license: mit
python_version: '3.10'
FMCG Demand Forecasting with RAG
An advanced AI-powered analytics platform for FMCG (Fast-Moving Consumer Goods) sales forecasting and business intelligence. This system combines Machine Learning, Time Series Forecasting, and Retrieval-Augmented Generation (RAG) to provide comprehensive sales insights and predictions.
Live Demo
Try it on Hugging Face Spaces (link will be available after deployment).
Key Features
Advanced Sales Analytics
- Real-time KPI Dashboard: Track total sales, revenue, average pricing, and product portfolio metrics
- Interactive Visualizations: Dynamic charts with Plotly for sales trends, regional performance, and category distribution
- Trend Analysis: 30-day moving averages, growth comparisons, and seasonal pattern detection
- Promotion Impact Analysis: Measure the effectiveness of promotional campaigns with sales lift calculations
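The two calculations named above, the moving average and the promotional sales lift, can be sketched in plain Python. This is a minimal illustration, not the project's implementation: the function names `moving_average` and `promotion_lift` are hypothetical, and lift is assumed to mean the relative difference between mean sales on promoted days and mean sales on non-promoted days.

```python
from statistics import mean


def moving_average(values, window=30):
    """Trailing moving average; positions with fewer than `window` points are None."""
    out = []
    running = 0.0
    for i, v in enumerate(values):
        running += v
        if i >= window:
            running -= values[i - window]  # drop the value that left the window
        out.append(running / window if i >= window - 1 else None)
    return out


def promotion_lift(sales, promo_flags):
    """Relative lift of mean promoted-day sales over mean non-promoted-day sales."""
    promo = [s for s, p in zip(sales, promo_flags) if p]
    base = [s for s, p in zip(sales, promo_flags) if not p]
    if not promo or not base:
        raise ValueError("need both promoted and non-promoted days")
    return (mean(promo) - mean(base)) / mean(base)


# Illustrative numbers only, not taken from the project's dataset.
daily_sales = [100, 120, 90, 150, 160, 110]
on_promo = [0, 0, 0, 1, 1, 0]
print(moving_average(daily_sales, window=3))
print(f"lift: {promotion_lift(daily_sales, on_promo):.1%}")
```

A short window is used in the example so the output is easy to check by hand; the dashboard's 30-day average corresponds to the default `window=30`.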
AI-Powered Forecasting
- Prophet Time Series Model: Facebook's Prophet for robust seasonal forecasting
- XGBoost ML Model: Gradient boosting for feature-based predictions
- Multi-Scenario Forecasting: Best case, worst case, and confidence interval predictions
- Customizable Horizons: Forecast from 7 to 90 days ahead
- Trend Decomposition: Understand seasonal, weekly, and trend components
RAG-Based Q&A System
- Natural Language Queries: Ask questions about your data in plain English
- Intelligent Context Retrieval: FAISS vector database for semantic search
- Analytical Answers: Get data-driven insights, not just text extraction
- Pre-built Query Templates: Quick access to common business questions
- Query History: Track and revisit previous questions and answers
Business Intelligence
- Feature Importance Analysis: Understand which factors drive sales the most
- Regional Performance Breakdown: Compare sales across different regions
- Category Distribution: Analyze product category contributions
- Seasonal Insights: Identify peak and low sales periods
- Promotion Effectiveness: Quantify promotional impact on sales
System Architecture
┌─────────────────────────────────────────────────────────────┐
│                   Streamlit Dashboard UI                    │
│  ┌─────────────────────┐  ┌──────────────────────────────┐  │
│  │ Analytics & KPIs    │  │ AI Q&A Portal (RAG)          │  │
│  │ - Sales Trends      │  │ - Natural Language Queries   │  │
│  │ - Forecasting       │  │ - Semantic Search            │  │
│  │ - Visualizations    │  │ - Context Retrieval          │  │
│  └─────────────────────┘  └──────────────────────────────┘  │
└─────────────────────────────────────────────────────────────┘
                               ↓
┌─────────────────────────────────────────────────────────────┐
│                        ML/AI Engine                         │
│  ┌──────────────┐  ┌──────────────┐  ┌──────────────────┐   │
│  │ Prophet      │  │ XGBoost      │  │ RAG Pipeline     │   │
│  │ Forecasting  │  │ ML Model     │  │ - FAISS Vector   │   │
│  │              │  │              │  │ - Transformers   │   │
│  └──────────────┘  └──────────────┘  └──────────────────┘   │
└─────────────────────────────────────────────────────────────┘
                               ↓
┌─────────────────────────────────────────────────────────────┐
│                         Data Layer                          │
│  - Processed FMCG Sales Data (2022-2024)                    │
│  - Feature Engineering Pipeline                             │
│  - Vector Store (Embeddings)                                │
└─────────────────────────────────────────────────────────────┘
Technology Stack
Core ML/AI
- Prophet: Time series forecasting with seasonality detection
- XGBoost: Gradient boosting for feature-based predictions
- Sentence Transformers: Text embeddings for semantic search
- FAISS: Efficient similarity search and clustering
- LangChain: RAG pipeline orchestration
Data Processing
- Pandas & NumPy: Data manipulation and numerical computing
- Scikit-learn: Feature engineering and preprocessing
Visualization
- Plotly: Interactive charts and graphs
- Streamlit: Web application framework
- Matplotlib & Seaborn: Statistical visualizations
Deep Learning
- PyTorch: Neural network framework
- Transformers (Hugging Face): Pre-trained language models
Installation & Setup
Prerequisites
- Python 3.8 or higher (the Space configuration above pins Python 3.10)
- 4GB+ RAM recommended
- Git
Local Installation
- Clone the repository
git clone https://github.com/Ameya-Bhingurde/FMCG-Demand-Forecasting-with-RAG-.git
cd FMCG-Demand-Forecasting-with-RAG-
- Create virtual environment
python -m venv .venv
# Windows
.venv\Scripts\activate
# Linux/Mac
source .venv/bin/activate
- Install dependencies
cd fmcg_genai
pip install -r requirements.txt
- Run the pipeline (first-time setup)
# From the fmcg_genai directory
python run_pipeline.py
This will:
  - Process the raw data
  - Train the ML models
  - Create the vector store for RAG
- Launch the dashboard
streamlit run src/dashboard_app_enhanced.py
The dashboard will open at http://localhost:8501
How to Use
Dashboard & Forecasting Page
- View KPIs: See real-time metrics for sales, revenue, pricing, and product portfolio
- Analyze Trends: Explore interactive charts showing sales patterns, regional performance, and category distribution
- Generate Forecasts:
  - Use the slider to select the forecast horizon (7-90 days)
  - Choose a confidence level (80-95%)
  - Toggle scenario analysis for best-case and worst-case predictions
- Understand Drivers: Review feature importance to see which factors influence sales most
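One way to picture how a confidence level turns into best-case and worst-case scenarios is the sketch below. It is a simplified stand-in, not how Prophet computes its uncertainty intervals: it assumes forecast errors are roughly normal with a known standard deviation, and the name `scenario_bands` and its parameters are hypothetical.

```python
from statistics import NormalDist


def scenario_bands(point_forecast, residual_std, confidence=0.95):
    """Return (worst, expected, best) lists from a symmetric normal interval."""
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be between 0 and 1")
    # Two-sided z-score, roughly 1.96 for a 95% interval.
    z = NormalDist().inv_cdf(0.5 + confidence / 2.0)
    margin = z * residual_std
    # Sales cannot be negative, so the worst case is clipped at zero.
    worst = [max(0.0, y - margin) for y in point_forecast]
    best = [y + margin for y in point_forecast]
    return worst, list(point_forecast), best


# Illustrative numbers only.
worst, expected, best = scenario_bands([100.0, 5.0], residual_std=10.0, confidence=0.95)
print(worst, expected, best)
```

Raising the confidence level widens the band, which is why a 95% setting shows a larger gap between the best and worst cases than an 80% setting.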
AI Q&A Portal
- Quick Questions: Click pre-built query buttons for common analyses
  - Sales Performance: "What were total sales in 2023?"
  - Promotions: "How did promotions affect sales?"
  - Trends: "What are the seasonal sales patterns?"
- Custom Queries: Type your own questions in natural language. Examples:
  - "Which region had the highest sales growth in Q2 2024?"
  - "What is the average price for beverages?"
  - "How does stock availability impact sales?"
- View Sources: Expand the sources section to see the data context used for answers
- Review History: Check recent questions in the query history section
Data Overview
The system analyzes FMCG sales data with the following attributes:
- Time Period: 2022-2024
- Products: Multiple SKUs across various categories
- Regions: Multi-regional sales data
- Features:
  - Sales volume (units sold)
  - Pricing information
  - Promotion flags
  - Stock availability
  - Seasonal indicators
  - Regional data
  - Category classifications
Model Details
Prophet Forecasting Model
- Purpose: Time series forecasting with trend and seasonality
- Strengths:
  - Handles missing data
  - Detects seasonal patterns (weekly, monthly, yearly)
  - Provides uncertainty intervals
  - Robust to outliers
XGBoost Model
- Purpose: Feature-based sales prediction
- Features Used:
  - Temporal features (day, month, year, day of week)
  - Lag features (previous sales)
  - Promotion indicators
  - Stock availability
  - Regional and category encodings
- Strengths:
  - High accuracy
  - Feature importance analysis
  - Handles non-linear relationships
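The temporal and lag features listed above could be derived along these lines. This is a sketch under assumptions, not the project's preprocessing code: `build_features`, the `sales_lag_*` column names, and the choice of 1-day and 7-day lags are all hypothetical, and the input is assumed to be one row per consecutive day.

```python
from datetime import date, timedelta


def build_features(dates, sales, lags=(1, 7)):
    """Build one feature dict per day, skipping days without full lag history."""
    max_lag = max(lags)
    rows = []
    for i in range(max_lag, len(sales)):
        d = dates[i]
        row = {
            "day": d.day,
            "month": d.month,
            "year": d.year,
            "day_of_week": d.weekday(),  # Monday == 0
            "target": sales[i],
        }
        for lag in lags:
            # Sales observed `lag` days earlier; never uses future values.
            row[f"sales_lag_{lag}"] = sales[i - lag]
        rows.append(row)
    return rows


# Illustrative numbers only.
start = date(2024, 1, 1)
dates = [start + timedelta(days=i) for i in range(10)]
sales = [50, 52, 48, 60, 61, 75, 80, 51, 55, 47]
for row in build_features(dates, sales)[:2]:
    print(row)
```

Rows without a full lag history are dropped rather than filled, so the model never trains on placeholder values.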
RAG Pipeline
- Embedding Model: Sentence Transformers (all-MiniLM-L6-v2)
- Vector Store: FAISS for efficient similarity search
- Retrieval: Top-k semantic search (k=5)
- Generation: Context-aware analytical answers
- Strengths:
  - Natural language understanding
  - Accurate data retrieval
  - Analytical insights generation
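The top-k retrieval step can be illustrated without FAISS by scoring every chunk against the query and keeping the best k. This is a brute-force stand-in for what a vector index does more efficiently, and it assumes cosine similarity as the metric, which the README does not confirm. The names `cosine` and `top_k` are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec, chunk_vecs, k=5):
    """Return (index, score) pairs for the k chunks most similar to the query."""
    scored = [(i, cosine(query_vec, v)) for i, v in enumerate(chunk_vecs)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy 3-dimensional vectors; all-MiniLM-L6-v2 produces 384-dimensional embeddings.
chunks = [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
print(top_k([1.0, 0.0, 0.0], chunks, k=2))
```

The returned indices would then be used to look up the matching text chunks, which are passed to the answer-generation step as context.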
Configuration
Edit config.yaml to customize:
data:
  raw_dir: "data/raw"
  processed_dir: "data/processed"
models:
  prophet_model: "models/prophet_model.pkl"
  xgboost_model: "models/xgboost_model.pkl"
rag:
  vector_store_path: "vector_store"
  embedding_model: "sentence-transformers/all-MiniLM-L6-v2"
  chunk_size: 500
  chunk_overlap: 50
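To show what `chunk_size` and `chunk_overlap` control, here is a minimal sliding-window splitter. It assumes both settings are measured in characters, which the README does not state, and `chunk_text` is a hypothetical helper rather than the splitter the pipeline actually uses.

```python
def chunk_text(text, chunk_size=500, chunk_overlap=50):
    """Split text into character windows that overlap by `chunk_overlap`."""
    if chunk_size <= 0 or not 0 <= chunk_overlap < chunk_size:
        raise ValueError("require chunk_size > 0 and 0 <= chunk_overlap < chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
    return chunks


document = "x" * 1200
print([len(c) for c in chunk_text(document)])
```

With the defaults, each chunk starts 450 characters after the previous one, so the last 50 characters of a chunk reappear at the start of the next. The overlap reduces the chance that a sentence relevant to a query is cut in half at a chunk boundary.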
Project Structure
FMCG-Demand-Forecasting-with-RAG-/
├── fmcg_genai/
│   ├── src/
│   │   ├── dashboard_app_enhanced.py   # Main Streamlit dashboard
│   │   ├── rag_pipeline.py             # RAG implementation
│   │   ├── data_preprocessing.py       # Data cleaning & feature engineering
│   │   ├── model_training.py           # ML model training
│   │   └── forecasting.py              # Prophet forecasting
│   ├── data/
│   │   ├── raw/                        # Original datasets
│   │   └── processed/                  # Cleaned & engineered features
│   ├── models/                         # Trained model files
│   ├── vector_store/                   # FAISS index & embeddings
│   ├── requirements.txt                # Python dependencies
│   ├── config.yaml                     # Configuration file
│   └── run_pipeline.py                 # Pipeline orchestration
├── README.md
└── LICENSE
Deployment
Hugging Face Spaces (Recommended)
This app is optimized for Hugging Face Spaces deployment:
- Fork/Clone this repository
- Create a new Space on Hugging Face
- Connect your GitHub repository
- Configure Space settings:
  - SDK: Streamlit
  - Python version: 3.8+
- Deploy: the Space builds and deploys automatically
The app will be available at: https://huggingface.co/spaces/YOUR_USERNAME/SPACE_NAME
Other Platforms
- Railway: Supports Python apps with 1GB+ RAM
- Google Cloud Run: Serverless deployment with auto-scaling
- AWS EC2: Full control with custom instance sizing
Performance Metrics
- Forecast Accuracy: MAPE < 15% on test set
- RAG Retrieval: 95%+ relevant context retrieval
- Dashboard Load Time: < 3 seconds
- Query Response Time: < 2 seconds
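For reference, MAPE (mean absolute percentage error) averages the absolute error as a fraction of each actual value. The sketch below shows the standard formula; skipping days with zero actual sales is an assumption made here to avoid division by zero, and the README does not say how the reported figure handles them.

```python
def mape(actual, predicted):
    """Mean absolute percentage error, in percent; skips zero actuals."""
    pairs = [(a, p) for a, p in zip(actual, predicted) if a != 0]
    if not pairs:
        raise ValueError("MAPE is undefined when every actual value is zero")
    return 100.0 * sum(abs((a - p) / a) for a, p in pairs) / len(pairs)


# Illustrative numbers only, not the project's test set.
actual = [100, 200, 50, 0]
predicted = [110, 180, 55, 3]
print(f"MAPE: {mape(actual, predicted):.1f}%")
```

Each of the three non-zero days in the example is off by 10% of its actual value, so the result is 10%, which would meet the "below 15%" target stated above.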
Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
- Fork the repository
- Create your feature branch (git checkout -b feature/AmazingFeature)
- Commit your changes (git commit -m 'Add some AmazingFeature')
- Push to the branch (git push origin feature/AmazingFeature)
- Open a Pull Request
License
This project is licensed under the MIT License - see the LICENSE file for details.
Author
Ameya Bhingurde
- GitHub: @Ameya-Bhingurde
- LinkedIn: Connect with me
Acknowledgments
- Facebook Prophet for the excellent time series forecasting library
- Hugging Face for Transformers and hosting platform
- Streamlit for the amazing web app framework
- LangChain for RAG pipeline tools
Contact
For questions or feedback, please open an issue or reach out via GitHub.
⭐ If you find this project useful, please consider giving it a star! ⭐
Made with ❤️ and AI