Commit 48909ac: Initial commit
Author: Lohith Venkat Chamakura
Parent(s): 599f1a9
Files changed:
- .DS_Store +0 -0
- README.md +264 -7
- app.py +817 -0
- constants.py +41 -0
- data_processor.py +314 -0
- insights.py +204 -0
- requirements.txt +10 -0
- utils.py +111 -0
- visualizations.py +327 -0

.DS_Store
ADDED

Binary file (6.15 kB).

README.md
CHANGED (@@ -1,13 +1,270 @@)

The previous file held only the Space front matter (including `colorFrom: red` and a `short_description: Business Intelligence Dashboard` field); the updated README follows:
---
title: Business Intelligence Dashboard
emoji: 📊
colorFrom: blue
colorTo: green
sdk: gradio
sdk_version: 6.0.2
app_file: app.py
pinned: false
---

# Business Intelligence Dashboard

An interactive Business Intelligence dashboard built with Gradio that enables users to explore and analyze business data through an intuitive, Tableau-like web interface.

## Features

### 📁 Data Upload & Validation
- Upload CSV or Excel files through the web interface
- Display basic dataset information (shape, columns, data types)
- Show data preview (first 10 rows)
- Graceful error handling with informative messages

### 📈 Data Exploration & Summary Statistics
- **Automated Data Profiling:**
  - Numerical columns: mean, median, std, min, max, quartiles
  - Categorical columns: unique values, value counts, mode
- Missing value report
- Correlation matrix for numerical features

### 🔍 Interactive Filtering
- Dynamic filtering interface based on column types:
  - **Numerical:** Range sliders with min/max inputs
  - **Categorical:** Multi-select checkboxes
  - **Date:** Date range pickers (when applicable)
- Real-time row count updates as filters are applied
- Display filtered data preview

### 📊 Visualizations
Implements 5 different visualization types:
1. **Time Series Plot:** Trends over time with aggregation options
2. **Distribution Plot:** Histogram or box plot for numerical data
3. **Category Analysis:** Bar chart or pie chart for categorical data
4. **Scatter Plot:** Show relationships between variables
5. **Correlation Heatmap:** Visualize correlations between numerical features

**Features:**
- User selects which columns to visualize
- Clear titles, labels, and legends
- Multiple aggregation methods (sum, mean, count, median)
- Professional Plotly visualizations

### 💡 Insights Generation
Automatically generates insights:
- **Top/Bottom Performers:** Identify highest/lowest values
- **Basic Trends:** Detect patterns in time series data
- **Summary Statistics:** High-level dataset overview

### 💾 Export Functionality
- Export filtered data as CSV
- Export visualizations as PNG images

## High-Level Architecture

```
┌─────────────────────────────────────────────────────────────────┐
│                         User Interface                          │
│                     (Gradio Web Interface)                      │
│   ┌──────────────┐   ┌──────────────┐   ┌──────────────┐        │
│   │ Data Upload  │   │ Visualization│   │   Insights   │        │
│   │  & Preview   │   │   & Charts   │   │  Generation  │        │
│   └──────────────┘   └──────────────┘   └──────────────┘        │
│   ┌──────────────┐   ┌──────────────┐   ┌──────────────┐        │
│   │  Statistics  │   │   Filter &   │   │    Export    │        │
│   │  & Profiling │   │   Explore    │   │ Functionality│        │
│   └──────────────┘   └──────────────┘   └──────────────┘        │
└────────────────────────────┬────────────────────────────────────┘
                             │
                             ▼
┌─────────────────────────────────────────────────────────────────┐
│                   Application Layer (app.py)                    │
│   • Orchestrates user interactions                              │
│   • Manages global state (current_df, filters, figures)         │
│   • Routes requests to appropriate modules                      │
└────────────────────────────┬────────────────────────────────────┘
                             │
        ┌────────────────────┼────────────────────┐
        │                    │                    │
        ▼                    ▼                    ▼
┌──────────────────┐ ┌──────────────────┐ ┌──────────────────┐
│ Data Processing  │ │  Visualizations  │ │     Insights     │
│      Layer       │ │      Layer       │ │      Layer       │
│                  │ │                  │ │                  │
│ data_processor.py│ │visualizations.py │ │   insights.py    │
│                  │ │                  │ │                  │
│ • CSV/Excel Load │ │ • Time Series    │ │ • Top/Bottom     │
│ • Data Cleaning  │ │ • Distribution   │ │   Performers     │
│ • Filtering      │ │ • Category       │ │ • Trend Analysis │
│ • Statistics     │ │   Analysis       │ │ • Summary Stats  │
│   Generation     │ │ • Scatter Plot   │ │                  │
│                  │ │ • Correlation    │ │                  │
│                  │ │   Heatmap        │ │                  │
└──────────────────┘ └──────────────────┘ └──────────────────┘
        │                    │                    │
        └────────────────────┼────────────────────┘
                             │
                             ▼
┌─────────────────────────────────────────────────────────────────┐
│                    Utilities Layer (utils.py)                   │
│   • Column type detection (numerical, categorical, date)        │
│   • Missing value analysis                                      │
│   • Data validation helpers                                     │
└────────────────────────────┬────────────────────────────────────┘
                             │
                             ▼
┌─────────────────────────────────────────────────────────────────┐
│                          Data Sources                           │
│   ┌──────────────┐   ┌──────────────┐   ┌──────────────┐        │
│   │  stocks.csv  │   │ sales_train  │   │Online Retail │        │
│   │              │   │    .csv      │   │    .xlsx     │        │
│   └──────────────┘   └──────────────┘   └──────────────┘        │
│                                                                 │
│   • CSV files (pandas.read_csv)                                 │
│   • Excel files (pandas.read_excel)                             │
│   • User-uploaded datasets                                      │
└─────────────────────────────────────────────────────────────────┘

┌─────────────────────────────────────────────────────────────────┐
│                       External Libraries                        │
│   • pandas: Data manipulation and analysis                      │
│   • plotly: Interactive visualizations                          │
│   • gradio: Web interface framework                             │
│   • numpy: Numerical computations                               │
└─────────────────────────────────────────────────────────────────┘
```

## Project Structure

```
project/
├── app.py               # Main Gradio application
├── constants.py         # Shared configuration constants
├── data_processor.py    # Data loading, cleaning, filtering
├── visualizations.py    # Chart creation functions
├── insights.py          # Automated insight generation
├── utils.py             # Helper functions
├── requirements.txt     # Python dependencies
├── README.md            # This file
└── data/                # Sample datasets
    ├── sales_train.csv
    ├── stocks.csv
    └── Online Retail.xlsx
```

## Setup Instructions

### 1. Install Dependencies

```bash
pip install -r requirements.txt
```

**Note:** This project uses Gradio 6.0.2, which includes improved performance and updated APIs. Make sure you have Python 3.8 or higher installed.

### 2. Run the Application

```bash
python app.py
```

The application will launch and be accessible at `http://localhost:7860` in your web browser.

## Usage

1. **Upload Data:** Navigate to the "Data Upload & Preview" tab and upload a CSV or Excel file
2. **View Statistics:** Go to "Statistics & Profiling" to see comprehensive data statistics
3. **Apply Filters:** Use "Filter & Explore" to filter your data by column values
4. **Create Visualizations:** Visit "Visualizations" to create interactive charts
5. **Generate Insights:** Check "Insights" for automated data insights
6. **Export Data:** Use "Export" to download filtered data or visualizations

## Aggregation Methods

The dashboard supports multiple aggregation methods for visualizations (a short pandas sketch follows this list):
- **Sum**: Adds all values together (useful for totals, volumes)
- **Mean**: Calculates the average value (useful for prices, rates)
- **Count**: Counts the number of data points (useful for frequency)
- **Median**: Finds the middle value (robust to outliers)
- **None**: No aggregation (shows raw data points)
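
For intuition, each method maps onto a single pandas `groupby`/`agg` call. This is a minimal sketch of the idea, not the dashboard's own code, and the column names are placeholders:

```python
import pandas as pd

df = pd.DataFrame({
    "Region": ["East", "East", "West"],
    "Sales": [100.0, 150.0, 90.0],
})

# One reducer per aggregation method; "None" corresponds to
# plotting df["Sales"] directly, with no grouping at all.
for method in ["sum", "mean", "count", "median"]:
    print(method, df.groupby("Region")["Sales"].agg(method).to_dict())
```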

## Step-by-Step Tutorial: Monthly Average Closing Price

Let's walk through a complete example:

### Step 1: Load the Data
1. Open the dashboard
2. Go to the **📁 Data Upload & Preview** tab
3. Click **Upload Dataset**
4. Select `data/stocks.csv`
5. Click **Load Data**
6. Verify the data preview shows the stock data

### Step 2: Create the Visualization
1. Navigate to the **📊 Visualizations** tab
2. Configure the chart:
   - **Chart Type**: `Time Series`
   - **X-Axis Column**: `Date`
   - **Y-Axis Column**: `Close`
   - **Aggregation Method**: `Mean`
3. Click **Generate Visualization**

### Step 3: Interpret the Results
- The chart is a line graph with dates on the X-axis and average closing prices on the Y-axis
- Each point represents the mean closing price for that date
- You can see trends, patterns, and changes over time

### Step 4: Compare Different Aggregations
Try generating the same chart with different aggregation methods (a pandas sketch of the monthly computation follows this list):
- **Mean**: Average closing price (smooth trend)
- **Sum**: Total closing price (not meaningful for prices, but it shows the concept)
- **Median**: Middle closing price (robust to outliers)
- **None**: All individual closing prices (may be cluttered)
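
Outside the dashboard, the monthly average in this tutorial's title can be computed directly with pandas. A sketch, assuming `data/stocks.csv` has the `Date` and `Close` columns used above:

```python
import pandas as pd

df = pd.read_csv("data/stocks.csv", parse_dates=["Date"])

# Per-date mean: what the Time Series chart shows with the Mean aggregation
per_date = df.groupby("Date")["Close"].mean()

# Monthly average closing price ("ME" is the month-end frequency alias
# used by pandas 2.2, the version pinned in requirements.txt)
monthly = df.set_index("Date")["Close"].resample("ME").mean()
print(monthly.head())
```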

## Technical Details

### Design Patterns

The application uses the **Strategy Pattern** for the following (a miniature sketch follows this list; `data_processor.py` below contains the full implementation):
- **Data Loading:** Different strategies for CSV vs Excel files
- **Data Filtering:** Different strategies for numerical, categorical, and date filters
- **Visualizations:** Different strategies for each chart type
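
In miniature, the data-loading variant of the pattern looks like this (a condensed sketch of the approach only; see `data_processor.py` later in this commit for the real classes):

```python
from abc import ABC, abstractmethod
import pandas as pd

class LoadStrategy(ABC):
    @abstractmethod
    def load(self, path: str) -> pd.DataFrame:
        ...

class CSVLoad(LoadStrategy):
    def load(self, path: str) -> pd.DataFrame:
        return pd.read_csv(path)

class ExcelLoad(LoadStrategy):
    def load(self, path: str) -> pd.DataFrame:
        return pd.read_excel(path)

# The context selects a strategy by extension instead of branching inline,
# so supporting a new format means one new class and one new dict entry.
STRATEGIES = {".csv": CSVLoad(), ".xlsx": ExcelLoad(), ".xls": ExcelLoad()}

def load(path: str) -> pd.DataFrame:
    ext = path[path.rfind("."):].lower()
    return STRATEGIES[ext].load(path)
```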

### Code Quality

- Follows PEP 8 style guidelines
- Comprehensive docstrings for all functions
- Proper error handling with try/except blocks
- Modular design with clear separation of concerns
- No hardcoded values (uses constants and configuration)

### Libraries

- **pandas 2.2.0+:** All data manipulation and analysis
- **Gradio 6.0.2:** Web interface framework
- **Plotly 5.22.0+:** Interactive visualizations
- **matplotlib 3.8.0+ / seaborn 0.13.0+:** Additional visualization support
- **Python 3.8+:** Minimum supported interpreter version

## Sample Datasets

The `data/` folder includes sample datasets:
- `sales_train.csv`: Sales transaction data
- `stocks.csv`: Stock market data
- `Online Retail.xlsx`: E-commerce retail data

## Requirements

- Python 3.8 or higher
- All dependencies listed in `requirements.txt`:
  - pandas >= 2.2.0
  - numpy >= 1.26.0
  - gradio == 6.0.2
  - matplotlib >= 3.8.0
  - seaborn >= 0.13.0
  - plotly >= 5.22.0
  - kaleido >= 0.2.1
  - openpyxl >= 3.1.5
  - Pillow >= 10.4.0

## License

This project is created for educational purposes as part of CS5130 coursework.

app.py
ADDED (+817 lines)
```python
"""
Main Gradio application for the Business Intelligence Dashboard.

This module creates a Tableau-like interactive dashboard interface
for data exploration and analysis.
"""

import gradio as gr
import pandas as pd
import numpy as np
from typing import Optional, Dict, List, Tuple, Any
import io
import base64
from PIL import Image
import plotly.graph_objects as go

from data_processor import DataLoader, DataFilter, DataProfiler
from visualizations import VisualizationFactory
from insights import InsightGenerator
from utils import detect_column_types, get_missing_value_summary
from constants import (
    PREVIEW_ROWS,
    FILTERED_PREVIEW_ROWS,
    MAX_COLUMNS_DISPLAY,
    MAX_UNIQUE_VALUES_DISPLAY,
    EXPORT_IMAGE_WIDTH,
    EXPORT_IMAGE_HEIGHT,
    EXPORT_IMAGE_SCALE,
    EXPORT_IMAGE_FILENAME,
    EXPORT_HTML_FILENAME,
    DEFAULT_TOP_N,
    KB_CONVERSION,
    TEXTBOX_LINES_DEFAULT,
    TEXTBOX_LINES_INSIGHTS
)


# Global state
current_df: Optional[pd.DataFrame] = None
current_filters: Dict[str, Any] = {}
current_figure: Optional[go.Figure] = None


def load_and_preview_data(file) -> Tuple[str, pd.DataFrame, str]:
    """
    Load data file and return preview information.

    Args:
        file: Uploaded file object (can be string path or file object in Gradio 6.0.2)

    Returns:
        Tuple of (info_text, preview_df, error_message)
    """
    global current_df, current_filters

    if file is None:
        return "No file uploaded", None, ""

    try:
        loader = DataLoader()
        # Handle both string paths and file objects (Gradio 6.0.2 compatibility)
        file_path = file if isinstance(file, str) else file.name
        df, error = loader.load_data(file_path)

        if error:
            return f"Error: {error}", None, error

        current_df = df
        current_filters = {}

        # Get basic info
        profiler = DataProfiler()
        info = profiler.get_basic_info(df)

        info_text = f"""
**Dataset Information:**
- **Shape:** {info['shape'][0]:,} rows × {info['shape'][1]} columns
- **Memory Usage:** {info['memory_usage'] / KB_CONVERSION:.2f} KB
- **Columns:** {', '.join(info['columns'][:MAX_COLUMNS_DISPLAY])}{'...' if len(info['columns']) > MAX_COLUMNS_DISPLAY else ''}
"""

        # Preview first rows
        preview_df = df.head(PREVIEW_ROWS)

        return info_text, preview_df, ""

    except Exception as e:
        return f"Error loading file: {str(e)}", None, str(e)


def get_statistics() -> Tuple[str, pd.DataFrame, pd.DataFrame, pd.DataFrame]:
    """
    Generate comprehensive statistics for the loaded dataset.

    Returns:
        Tuple of (missing_values_text, numerical_stats, categorical_stats, correlation_matrix)
    """
    global current_df

    if current_df is None or current_df.empty:
        return "No data loaded", pd.DataFrame(), pd.DataFrame(), pd.DataFrame()

    try:
        profiler = DataProfiler()

        # Missing values
        missing_df = get_missing_value_summary(current_df)
        if missing_df.empty:
            missing_text = "✅ No missing values found in the dataset."
        else:
            missing_text = "**Missing Values Summary:**\n\n"
            missing_text += missing_df.to_string(index=False)

        # Numerical statistics
        numerical_stats = profiler.get_numerical_stats(current_df)

        # Categorical statistics
        categorical_stats = profiler.get_categorical_stats(current_df)

        # Correlation matrix
        correlation_matrix = profiler.get_correlation_matrix(current_df)

        return missing_text, numerical_stats, categorical_stats, correlation_matrix

    except Exception as e:
        return f"Error generating statistics: {str(e)}", pd.DataFrame(), pd.DataFrame(), pd.DataFrame()


def update_column_dropdowns():
    """
    Update column dropdown choices based on loaded data.

    Returns:
        Tuple of update dictionaries for x_column and y_column dropdowns
    """
    global current_df

    if current_df is None or current_df.empty:
        return gr.update(choices=[]), gr.update(choices=[])

    all_columns = list(current_df.columns)
    return gr.update(choices=all_columns), gr.update(choices=all_columns)


def apply_simple_filters(
    filter_column: Optional[str],
    filter_type: str,
    min_val: Optional[float],
    max_val: Optional[float],
    selected_values: List[str]
) -> Tuple[str, pd.DataFrame, int]:
    """
    Apply a single filter to the dataset.

    Args:
        filter_column: Column to filter on
        filter_type: Type of filter (numerical/categorical)
        min_val: Minimum value for numerical filter
        max_val: Maximum value for numerical filter
        selected_values: Selected values for categorical filter

    Returns:
        Tuple of (info_text, filtered_df, row_count)
    """
    global current_df, current_filters

    if current_df is None or current_df.empty:
        return "No data loaded", pd.DataFrame(), 0

    if filter_column is None or filter_column == "":
        # No filter applied, return original data
        current_filters = {}
        row_count = len(current_df)
        info_text = f"**Dataset:** {row_count:,} rows (no filters applied)"
        return info_text, current_df.head(FILTERED_PREVIEW_ROWS), row_count

    try:
        filters = {}
        numerical, categorical, date_columns = detect_column_types(current_df)

        if filter_type == "numerical" and filter_column in numerical:
            if min_val is not None and max_val is not None:
                original_min = float(current_df[filter_column].min())
                original_max = float(current_df[filter_column].max())
                if min_val != original_min or max_val != original_max:
                    filters[filter_column] = (min_val, max_val)
        elif filter_type == "categorical" and filter_column in categorical:
            if selected_values:
                all_vals = sorted(current_df[filter_column].dropna().unique().tolist())
                if set(selected_values) != set(all_vals):
                    filters[filter_column] = selected_values

        # Apply filters
        data_filter = DataFilter()
        filtered_df = data_filter.apply_filters(current_df, filters)
        current_filters = filters

        row_count = len(filtered_df)
        info_text = f"**Filtered Dataset:** {row_count:,} rows (from {len(current_df):,} original rows)"

        return info_text, filtered_df.head(FILTERED_PREVIEW_ROWS), row_count

    except Exception as e:
        return f"Error applying filters: {str(e)}", pd.DataFrame(), 0


def get_filter_options() -> Tuple[List[str], str, Dict]:
    """
    Get filter options based on current data.

    Returns:
        Tuple of (column_choices, default_type, filter_component_updates)
    """
    global current_df

    if current_df is None or current_df.empty:
        return [], "numerical", {}

    numerical, categorical, date_columns = detect_column_types(current_df)
    all_columns = list(current_df.columns)

    # Determine default filter type
    default_type = "numerical" if numerical else "categorical" if categorical else "numerical"

    return all_columns, default_type, {}


def create_visualization(
    chart_type: str,
    x_column: Optional[str],
    y_column: Optional[str],
    aggregation: str,
    category_chart_type: str = 'bar'
) -> go.Figure:
    """
    Create visualization based on user selections.

    Args:
        chart_type: Type of chart to create
        x_column: X-axis column
        y_column: Y-axis column
        aggregation: Aggregation method
        category_chart_type: Type for category charts (bar/pie)

    Returns:
        Plotly figure object
    """
    global current_df, current_filters, current_figure

    if current_df is None or current_df.empty:
        current_figure = None
        return None

    try:
        # Apply current filters
        if current_filters:
            data_filter = DataFilter()
            df = data_filter.apply_filters(current_df, current_filters)
        else:
            df = current_df.copy()

        if df.empty:
            current_figure = None
            return None

        # Validate required columns for specific chart types
        if chart_type in ['time_series', 'scatter']:
            if not x_column or not y_column:
                # Return a simple error message plot
                fig = go.Figure()
                fig.add_annotation(
                    text="Please select both X and Y columns for this chart type",
                    xref="paper", yref="paper",
                    x=0.5, y=0.5, showarrow=False,
                    font=dict(size=16)
                )
                fig.update_layout(title="Missing Required Columns")
                current_figure = fig
                return fig

        factory = VisualizationFactory()

        # Handle category chart type and distribution chart type
        # Pass sub-type (bar/pie for category, histogram/box for distribution) in kwargs
        # Use 'sub_chart_type' key to avoid conflict with factory's 'chart_type' parameter
        kwargs = {}
        if chart_type == 'category':
            kwargs['sub_chart_type'] = category_chart_type
        elif chart_type == 'distribution':
            kwargs['sub_chart_type'] = 'histogram'

        fig = factory.create_visualization(
            chart_type=chart_type,
            df=df,
            x_column=x_column,
            y_column=y_column,
            aggregation=aggregation,
            **kwargs
        )

        # Store the figure globally for export
        current_figure = fig

        return fig

    except Exception as e:
        print(f"Error creating visualization: {e}")
        # Return a simple error message plot
        fig = go.Figure()
        fig.add_annotation(
            text=f"Error creating visualization: {str(e)}",
            xref="paper", yref="paper",
            x=0.5, y=0.5, showarrow=False,
            font=dict(size=14)
        )
        fig.update_layout(title="Visualization Error")
        current_figure = fig
        return fig


def generate_insights() -> Tuple[str, str, str]:
    """
    Generate automated insights from the data.

    Returns:
        Tuple of (summary_insights, top_performers, trend_analysis)
    """
    global current_df, current_filters

    if current_df is None or current_df.empty:
        return "No data loaded", "", ""

    try:
        # Apply filters if any
        if current_filters:
            data_filter = DataFilter()
            df = data_filter.apply_filters(current_df, current_filters)
        else:
            df = current_df.copy()

        generator = InsightGenerator()

        # Summary insights
        summary = generator.generate_summary_insights(df)
        summary_text = "\n".join([f"• {insight}" for insight in summary])

        # Top/Bottom performers
        numerical, _, _ = detect_column_types(df)
        top_bottom_text = ""
        if numerical:
            # Use first numerical column
            col = numerical[0]
            performers = generator.get_top_bottom_performers(df, col, top_n=DEFAULT_TOP_N)

            top_bottom_text = f"**Top {DEFAULT_TOP_N} Performers for '{col}':**\n"
            for idx, val in performers['top']:
                top_bottom_text += f"  • Row {idx}: {val:,.2f}\n"

            top_bottom_text += f"\n**Bottom {DEFAULT_TOP_N} Performers for '{col}':**\n"
            for idx, val in performers['bottom']:
                top_bottom_text += f"  • Row {idx}: {val:,.2f}\n"

        # Trend analysis
        date_cols = [col for col in df.columns if 'date' in col.lower() or 'time' in col.lower()]
        trend_text = ""
        if date_cols and numerical:
            date_col = date_cols[0]
            value_col = numerical[0]
            trend = generator.detect_trends(df, date_col, value_col)
            trend_text = f"**Trend Analysis ({value_col} over {date_col}):**\n"
            trend_text += f"  • {trend.get('message', 'No trend detected')}\n"

        return summary_text, top_bottom_text, trend_text

    except Exception as e:
        return f"Error generating insights: {str(e)}", "", ""


def export_data() -> str:
    """
    Export filtered data as CSV.

    Returns:
        Path to exported CSV file
    """
    global current_df, current_filters

    if current_df is None or current_df.empty:
        return None

    try:
        # Apply filters
        if current_filters:
            data_filter = DataFilter()
            df = data_filter.apply_filters(current_df, current_filters)
        else:
            df = current_df.copy()

        # Save to temporary file
        output_path = "filtered_data_export.csv"
        df.to_csv(output_path, index=False)

        return output_path

    except Exception as e:
        print(f"Error exporting data: {e}")
        return None


def export_visualization(fig) -> Optional[str]:
    """
    Export visualization as PNG or HTML.

    Args:
        fig: Plotly figure object or PlotData from Gradio (can be None)

    Returns:
        Path to exported file, or None if no figure
    """
    global current_figure

    # Use the stored figure instead of the PlotData object from Gradio
    plotly_fig = current_figure

    if plotly_fig is None:
        return None

    try:
        output_path = EXPORT_IMAGE_FILENAME
        # Try to export as PNG, fallback to HTML if kaleido not available
        try:
            plotly_fig.write_image(
                output_path,
                width=EXPORT_IMAGE_WIDTH,
                height=EXPORT_IMAGE_HEIGHT,
                scale=EXPORT_IMAGE_SCALE
            )
        except Exception as img_error:
            # If image export fails, save as HTML instead
            try:
                output_path = EXPORT_HTML_FILENAME
                plotly_fig.write_html(output_path)
            except Exception as html_error:
                print(f"Error exporting visualization: {html_error}")
                return None
        return output_path

    except Exception as e:
        print(f"Error exporting visualization: {e}")
        return None


def create_dashboard():
    """Create and configure the Gradio dashboard interface."""

    # The theme belongs on gr.Blocks(); gr.Blocks.launch() does not accept it.
    with gr.Blocks(title="Business Intelligence Dashboard", theme=gr.themes.Soft()) as demo:
        gr.Markdown(
            """
            # 📊 Business Intelligence Dashboard
            **Interactive Data Analysis and Visualization Platform**

            Upload your dataset and explore insights through an intuitive, Tableau-like interface.
            """
        )

        # State to store current dataframe
        df_state = gr.State(value=None)

        # Tab 1: Data Upload
        with gr.Tab("📁 Data Upload & Preview"):
            with gr.Row():
                with gr.Column(scale=1):
                    file_input = gr.File(
                        label="Upload Dataset",
                        file_types=[".csv", ".xlsx", ".xls"],
                        type="filepath"
                    )
                    upload_btn = gr.Button("Load Data", variant="primary", size="lg")

                with gr.Column(scale=2):
                    info_output = gr.Markdown("Upload a CSV or Excel file to begin.")
                    preview_output = gr.Dataframe(
                        label=f"Data Preview (First {PREVIEW_ROWS} Rows)",
                        interactive=False,
                        wrap=True
                    )

            upload_btn.click(
                fn=load_and_preview_data,
                inputs=[file_input],
                outputs=[info_output, preview_output, df_state]
            )

        # Tab 2: Statistics
        with gr.Tab("📈 Statistics & Profiling"):
            with gr.Row():
                with gr.Column():
                    stats_btn = gr.Button("Generate Statistics", variant="primary")
                    missing_output = gr.Textbox(
                        label="Missing Values Report",
                        lines=TEXTBOX_LINES_DEFAULT,
                        interactive=False
                    )

                with gr.Column():
                    numerical_stats_output = gr.Dataframe(
                        label="Numerical Statistics",
                        interactive=False,
                        wrap=True
                    )

            with gr.Row():
                categorical_stats_output = gr.Dataframe(
                    label="Categorical Statistics",
                    interactive=False,
                    wrap=True
                )
                correlation_output = gr.Dataframe(
                    label="Correlation Matrix",
                    interactive=False,
                    wrap=True
                )

            stats_btn.click(
                fn=get_statistics,
                inputs=[],
                outputs=[missing_output, numerical_stats_output, categorical_stats_output, correlation_output]
            )

        # Tab 3: Filter & Explore
        with gr.Tab("🔍 Filter & Explore"):
            with gr.Row():
                with gr.Column(scale=1):
                    filter_info = gr.Markdown("**Apply filters to explore your data:**")
                    filter_column = gr.Dropdown(
                        choices=[],
                        label="Select Column to Filter",
                        interactive=True
                    )
                    filter_type = gr.Radio(
                        choices=["numerical", "categorical"],
                        label="Filter Type",
                        value="numerical",
                        interactive=True
                    )

                    with gr.Group(visible=True) as numerical_filter_group:
                        min_val_input = gr.Number(label="Minimum Value", interactive=True)
                        max_val_input = gr.Number(label="Maximum Value", interactive=True)

                    with gr.Group(visible=False) as categorical_filter_group:
                        selected_values = gr.CheckboxGroup(
                            choices=[],
                            label="Select Values",
                            interactive=True
                        )

                    filter_btn = gr.Button("Apply Filter", variant="primary")
                    clear_filter_btn = gr.Button("Clear Filters", variant="secondary")

                with gr.Column(scale=2):
                    filter_result_info = gr.Markdown("")
                    filtered_data_output = gr.Dataframe(
                        label=f"Filtered Data Preview (First {FILTERED_PREVIEW_ROWS} Rows)",
                        interactive=False,
                        wrap=True
                    )
                    row_count_output = gr.Number(
                        label="Filtered Row Count",
                        interactive=False
                    )

            def update_filter_ui(column, filter_type_val):
                """Update filter UI based on column and type selection."""
                global current_df

                if current_df is None or current_df.empty or not column:
                    return (
                        gr.update(visible=False),
                        gr.update(visible=False),
                        gr.update(value=None),
                        gr.update(value=None),
                        gr.update(choices=[])
                    )

                numerical, categorical, _ = detect_column_types(current_df)

                if filter_type_val == "numerical" and column in numerical:
                    min_val = float(current_df[column].min())
                    max_val = float(current_df[column].max())
                    return (
                        gr.update(visible=True),
                        gr.update(visible=False),
                        gr.update(value=min_val, label=f"Min {column}"),
                        gr.update(value=max_val, label=f"Max {column}"),
                        gr.update(choices=[])
                    )
                elif filter_type_val == "categorical" and column in categorical:
                    unique_vals = sorted(
                        current_df[column].dropna().unique().tolist()
                    )[:MAX_UNIQUE_VALUES_DISPLAY]
                    return (
                        gr.update(visible=False),
                        gr.update(visible=True),
                        gr.update(value=None),
                        gr.update(value=None),
                        gr.update(choices=unique_vals, value=unique_vals)
                    )
                else:
                    return (
                        gr.update(visible=False),
                        gr.update(visible=False),
                        gr.update(value=None),
                        gr.update(value=None),
                        gr.update(choices=[])
                    )

            filter_column.change(
                fn=update_filter_ui,
                inputs=[filter_column, filter_type],
                outputs=[numerical_filter_group, categorical_filter_group,
                         min_val_input, max_val_input, selected_values]
            )

            filter_type.change(
                fn=update_filter_ui,
                inputs=[filter_column, filter_type],
                outputs=[numerical_filter_group, categorical_filter_group,
                         min_val_input, max_val_input, selected_values]
            )

            filter_btn.click(
                fn=apply_simple_filters,
                inputs=[filter_column, filter_type, min_val_input, max_val_input, selected_values],
                outputs=[filter_result_info, filtered_data_output, row_count_output]
            )

            def clear_filters():
                """Clear all filters."""
                global current_filters
                current_filters = {}
                if current_df is not None:
                    row_count = len(current_df)
                    info_text = f"**Dataset:** {row_count:,} rows (filters cleared)"
                    return info_text, current_df.head(FILTERED_PREVIEW_ROWS), row_count
                return "No data loaded", pd.DataFrame(), 0

            clear_filter_btn.click(
                fn=clear_filters,
                inputs=[],
                outputs=[filter_result_info, filtered_data_output, row_count_output]
            )

            def update_filter_column_choices():
                """Update filter column dropdown when data is loaded."""
                global current_df
                if current_df is not None and not current_df.empty:
                    return gr.update(choices=list(current_df.columns))
                return gr.update(choices=[])

            # Update filter column choices when data is loaded
            upload_btn.click(
                fn=update_filter_column_choices,
                inputs=[],
                outputs=[filter_column],
                queue=False
            )

        # Tab 4: Visualizations
        with gr.Tab("📊 Visualizations"):
            with gr.Row():
                with gr.Column(scale=1):
                    chart_type = gr.Dropdown(
                        choices=[
                            ("Time Series", "time_series"),
                            ("Distribution (Histogram)", "distribution"),
                            ("Category Analysis", "category"),
                            ("Scatter Plot", "scatter"),
                            ("Correlation Heatmap", "correlation")
                        ],
                        label="Chart Type",
                        value="time_series"
                    )

                    x_column = gr.Dropdown(
                        choices=[],
                        label="X-Axis Column",
                        interactive=True
                    )

                    y_column = gr.Dropdown(
                        choices=[],
                        label="Y-Axis Column (Optional)",
                        interactive=True
                    )

                    aggregation = gr.Dropdown(
                        choices=["sum", "mean", "count", "median", "none"],
                        label="Aggregation Method",
                        value="sum"
                    )

                    category_chart_type = gr.Radio(
                        choices=["bar", "pie"],
                        label="Category Chart Type",
                        value="bar",
                        visible=False
                    )

                    viz_btn = gr.Button("Generate Visualization", variant="primary")

                    export_viz_btn = gr.Button("Export Visualization", variant="secondary")
                    export_viz_file = gr.File(label="Download Visualization (PNG or HTML)")

                with gr.Column(scale=2):
                    visualization_output = gr.Plot(
                        label="Visualization",
                        container=True
                    )

            def toggle_category_type(chart_type_val):
                """Show/hide category chart type based on selection."""
                return gr.update(visible=(chart_type_val == "category"))

            def update_viz_column_choices():
                """Update column dropdowns based on loaded data."""
                global current_df
                if current_df is not None and not current_df.empty:
                    all_columns = list(current_df.columns)
                    return gr.update(choices=all_columns), gr.update(choices=all_columns)
                return gr.update(choices=[]), gr.update(choices=[])

            chart_type.change(
                fn=toggle_category_type,
                inputs=[chart_type],
                outputs=[category_chart_type]
            )

            # Update visualization column choices when data is loaded
            upload_btn.click(
                fn=update_viz_column_choices,
                inputs=[],
                outputs=[x_column, y_column],
                queue=False
            )

            viz_btn.click(
                fn=create_visualization,
                inputs=[chart_type, x_column, y_column, aggregation, category_chart_type],
                outputs=[visualization_output]
            )

            export_viz_btn.click(
                fn=export_visualization,
                inputs=[visualization_output],
                outputs=[export_viz_file]
            )

        # Tab 5: Insights
        with gr.Tab("💡 Insights"):
            with gr.Row():
                insights_btn = gr.Button("Generate Insights", variant="primary", size="lg")

            with gr.Row():
                with gr.Column():
                    summary_insights = gr.Markdown("### Summary Insights")
                    summary_output = gr.Textbox(
                        label="",
                        lines=TEXTBOX_LINES_DEFAULT,
                        interactive=False
                    )

                with gr.Column():
                    top_bottom_output = gr.Textbox(
                        label="Top/Bottom Performers",
                        lines=TEXTBOX_LINES_DEFAULT,
                        interactive=False
                    )

                    trend_output = gr.Textbox(
                        label="Trend Analysis",
                        lines=TEXTBOX_LINES_INSIGHTS,
                        interactive=False
                    )

            insights_btn.click(
                fn=generate_insights,
                inputs=[],
                outputs=[summary_output, top_bottom_output, trend_output]
            )

        # Tab 6: Export
        with gr.Tab("💾 Export"):
            with gr.Row():
                with gr.Column():
                    gr.Markdown("### Export Filtered Data")
                    export_data_btn = gr.Button("Export as CSV", variant="primary")
                    export_data_file = gr.File(label="Download CSV")

            export_data_btn.click(
                fn=export_data,
                inputs=[],
                outputs=[export_data_file]
            )

    return demo


if __name__ == "__main__":
    demo = create_dashboard()
    demo.launch(
        share=False,
        server_name="0.0.0.0",
        server_port=7860
    )
```

constants.py
ADDED (+41 lines)
```python
"""
Constants for the Business Intelligence Dashboard.

This module contains all configuration constants to avoid hardcoded values
throughout the codebase.
"""

# Preview and Display Constants
PREVIEW_ROWS = 10
FILTERED_PREVIEW_ROWS = 100
MAX_CATEGORY_DISPLAY = 20
MAX_UNIQUE_VALUES_DISPLAY = 100
MAX_COLUMNS_DISPLAY = 10

# Export Constants
EXPORT_IMAGE_WIDTH = 1200
EXPORT_IMAGE_HEIGHT = 800
EXPORT_IMAGE_SCALE = 2
EXPORT_IMAGE_FILENAME = "visualization_export.png"
EXPORT_HTML_FILENAME = "visualization_export.html"

# Statistical Constants
Q1_QUANTILE = 0.25
Q3_QUANTILE = 0.75
IQR_MULTIPLIER = 1.5
TREND_THRESHOLD_PERCENT = 5

# Analysis Constants
DEFAULT_TOP_N = 5
HISTOGRAM_BINS = 30
MIN_DATA_POINTS_FOR_TREND = 2
MIN_NUMERICAL_COLUMNS_FOR_CORRELATION = 2

# Data Conversion Constants
KB_CONVERSION = 1024
BYTES_TO_KB_DIVISOR = 1024

# UI Constants
TEXTBOX_LINES_DEFAULT = 10
TEXTBOX_LINES_INSIGHTS = 5
```

data_processor.py
ADDED (+314 lines; the listing below is cut off at line 237 of the source page)
| 1 |
+
"""
|
| 2 |
+
Data processing module for the Business Intelligence Dashboard.
|
| 3 |
+
|
| 4 |
+
This module handles data loading, cleaning, filtering, and profiling
|
| 5 |
+
using the Strategy Pattern for different data operations.
|
| 6 |
+
"""
|
| 7 |
+
|
| 8 |
+
from abc import ABC, abstractmethod
|
| 9 |
+
from typing import Dict, List, Optional, Tuple, Any
|
| 10 |
+
import pandas as pd
|
| 11 |
+
import numpy as np
|
| 12 |
+
from utils import detect_column_types, validate_dataframe, get_missing_value_summary
|
| 13 |
+
from constants import MIN_NUMERICAL_COLUMNS_FOR_CORRELATION
|
| 14 |
+
|
| 15 |
+
|
| 16 |
+
class DataLoadStrategy(ABC):
|
| 17 |
+
"""Abstract base class for data loading strategies."""
|
| 18 |
+
|
| 19 |
+
@abstractmethod
|
| 20 |
+
def load(self, file_path: str) -> pd.DataFrame:
|
| 21 |
+
"""
|
| 22 |
+
Load data from file.
|
| 23 |
+
|
| 24 |
+
Args:
|
| 25 |
+
file_path: Path to the data file
|
| 26 |
+
|
| 27 |
+
Returns:
|
| 28 |
+
Loaded DataFrame
|
| 29 |
+
"""
|
| 30 |
+
pass
|
| 31 |
+
|
| 32 |
+
|
| 33 |
+
class CSVLoadStrategy(DataLoadStrategy):
|
| 34 |
+
"""Strategy for loading CSV files."""
|
| 35 |
+
|
| 36 |
+
def load(self, file_path: str) -> pd.DataFrame:
|
| 37 |
+
"""Load CSV file."""
|
| 38 |
+
return pd.read_csv(file_path)
|
| 39 |
+
|
| 40 |
+
|
| 41 |
+
class ExcelLoadStrategy(DataLoadStrategy):
|
| 42 |
+
"""Strategy for loading Excel files."""
|
| 43 |
+
|
| 44 |
+
def load(self, file_path: str) -> pd.DataFrame:
|
| 45 |
+
"""Load Excel file."""
|
| 46 |
+
return pd.read_excel(file_path)
|
| 47 |
+
|
| 48 |
+
|
| 49 |
+
class DataLoader:
|
| 50 |
+
"""Context class for data loading using Strategy Pattern."""
|
| 51 |
+
|
| 52 |
+
def __init__(self):
|
| 53 |
+
"""Initialize with default strategies."""
|
| 54 |
+
self._strategies = {
|
| 55 |
+
'.csv': CSVLoadStrategy(),
|
| 56 |
+
'.xlsx': ExcelLoadStrategy(),
|
| 57 |
+
'.xls': ExcelLoadStrategy()
|
| 58 |
+
}
|
| 59 |
+
|
| 60 |
+
def load_data(self, file_path: str) -> Tuple[pd.DataFrame, Optional[str]]:
|
| 61 |
+
"""
|
| 62 |
+
Load data file using appropriate strategy.
|
| 63 |
+
|
| 64 |
+
Args:
|
| 65 |
+
file_path: Path to the data file
|
| 66 |
+
|
| 67 |
+
Returns:
|
| 68 |
+
Tuple of (DataFrame, error_message)
|
| 69 |
+
"""
|
| 70 |
+
try:
|
| 71 |
+
import os
|
| 72 |
+
_, ext = os.path.splitext(file_path.lower())
|
| 73 |
+
|
| 74 |
+
if ext not in self._strategies:
|
| 75 |
+
return None, f"Unsupported file format: {ext}"
|
| 76 |
+
|
| 77 |
+
strategy = self._strategies[ext]
|
| 78 |
+
df = strategy.load(file_path)
|
| 79 |
+
|
| 80 |
+
# Validate loaded data
|
| 81 |
+
is_valid, error = validate_dataframe(df)
|
| 82 |
+
if not is_valid:
|
| 83 |
+
return None, error
|
| 84 |
+
|
| 85 |
+
return df, None
|
| 86 |
+
|
| 87 |
+
except Exception as e:
|
| 88 |
+
return None, f"Error loading file: {str(e)}"
|
| 89 |
+
|
| 90 |
+
|
| 91 |
+
class FilterStrategy(ABC):
    """Abstract base class for filtering strategies."""

    @abstractmethod
    def apply_filter(
        self,
        df: pd.DataFrame,
        column: str,
        filter_value: Any
    ) -> pd.DataFrame:
        """
        Apply filter to DataFrame.

        Args:
            df: Input DataFrame
            column: Column to filter on
            filter_value: Filter value/range

        Returns:
            Filtered DataFrame
        """
        pass


class NumericalFilterStrategy(FilterStrategy):
    """Strategy for filtering numerical columns."""

    def apply_filter(
        self,
        df: pd.DataFrame,
        column: str,
        filter_value: Tuple[float, float]
    ) -> pd.DataFrame:
        """Apply range filter to numerical column."""
        min_val, max_val = filter_value
        return df[(df[column] >= min_val) & (df[column] <= max_val)]


class CategoricalFilterStrategy(FilterStrategy):
    """Strategy for filtering categorical columns."""

    def apply_filter(
        self,
        df: pd.DataFrame,
        column: str,
        filter_value: List[str]
    ) -> pd.DataFrame:
        """Apply multi-select filter to categorical column."""
        if not filter_value:
            return df
        return df[df[column].isin(filter_value)]


class DateFilterStrategy(FilterStrategy):
    """Strategy for filtering date columns."""

    def apply_filter(
        self,
        df: pd.DataFrame,
        column: str,
        filter_value: Tuple[str, str]
    ) -> pd.DataFrame:
        """Apply date range filter."""
        start_date, end_date = filter_value
        if start_date and end_date:
            df[column] = pd.to_datetime(df[column], errors='coerce')
            return df[(df[column] >= start_date) & (df[column] <= end_date)]
        return df


class DataFilter:
    """Context class for data filtering using the Strategy Pattern."""

    def __init__(self):
        """Initialize with filter strategies."""
        self._strategies = {
            'numerical': NumericalFilterStrategy(),
            'categorical': CategoricalFilterStrategy(),
            'date': DateFilterStrategy()
        }

    def apply_filters(
        self,
        df: pd.DataFrame,
        filters: Dict[str, Any]
    ) -> pd.DataFrame:
        """
        Apply multiple filters to a DataFrame.

        Args:
            df: Input DataFrame
            filters: Dictionary of {column: filter_value}

        Returns:
            Filtered DataFrame
        """
        filtered_df = df.copy()
        numerical, categorical, date_columns = detect_column_types(df)

        for column, filter_value in filters.items():
            if filter_value is None:
                continue

            if column in numerical:
                strategy = self._strategies['numerical']
            elif column in categorical:
                strategy = self._strategies['categorical']
            elif column in date_columns:
                strategy = self._strategies['date']
            else:
                continue

            try:
                filtered_df = strategy.apply_filter(filtered_df, column, filter_value)
            except Exception as e:
                print(f"Error applying filter to {column}: {e}")
                continue

        return filtered_df
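Filters arrive as a plain `{column: value}` dictionary, and the context class routes each entry to a strategy based on the detected column type. A hedged sketch with invented column names (the date filter assumes the column was already parsed to `datetime64`):

```python
# 'revenue', 'region', and 'order_date' are hypothetical columns.
data_filter = DataFilter()
filtered = data_filter.apply_filters(df, {
    'revenue': (1000.0, 50000.0),                # numerical -> range filter
    'region': ['North', 'West'],                 # categorical -> isin filter
    'order_date': ('2024-01-01', '2024-06-30'),  # date -> range filter
})
```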
class DataProfiler:
    """Class for generating data profiles and statistics."""

    @staticmethod
    def get_basic_info(df: pd.DataFrame) -> Dict[str, Any]:
        """
        Get basic dataset information.

        Args:
            df: Input DataFrame

        Returns:
            Dictionary with basic info
        """
        return {
            'shape': df.shape,
            'columns': list(df.columns),
            'dtypes': df.dtypes.to_dict(),
            'memory_usage': df.memory_usage(deep=True).sum()
        }

    @staticmethod
    def get_numerical_stats(df: pd.DataFrame) -> pd.DataFrame:
        """
        Get statistics for numerical columns.

        Args:
            df: Input DataFrame

        Returns:
            DataFrame with numerical statistics, with column names as a column
        """
        numerical, _, _ = detect_column_types(df)
        if not numerical:
            return pd.DataFrame()

        stats = df[numerical].describe()
        stats.loc['median'] = df[numerical].median()
        stats.loc['std'] = df[numerical].std()

        # Transpose so column names become rows (index)
        stats_transposed = stats.T

        # Reset index to make column names a regular column for display
        stats_transposed = stats_transposed.reset_index()
        stats_transposed.rename(columns={'index': 'Column'}, inplace=True)

        # Reorder columns for better readability (Column first, then statistics)
        column_order = ['Column', 'count', 'mean', 'std', 'min', '25%', '50%', '75%', 'max', 'median']
        # Only include columns that exist
        available_columns = [col for col in column_order if col in stats_transposed.columns]
        stats_transposed = stats_transposed[available_columns]

        return stats_transposed

    @staticmethod
    def get_categorical_stats(df: pd.DataFrame) -> pd.DataFrame:
        """
        Get statistics for categorical columns.

        Args:
            df: Input DataFrame

        Returns:
            DataFrame with categorical statistics
        """
        _, categorical, _ = detect_column_types(df)
        if not categorical:
            return pd.DataFrame()

        stats = []
        for col in categorical:
            unique_count = df[col].nunique()
            mode_value = df[col].mode().iloc[0] if not df[col].mode().empty else None
            # Guard against all-NaN columns, where value_counts() is empty
            # even though the column itself is not.
            value_counts = df[col].value_counts()
            mode_count = value_counts.iloc[0] if not value_counts.empty else 0

            stats.append({
                'Column': col,
                'Unique_Values': unique_count,
                'Mode': mode_value,
                'Mode_Count': mode_count,
                'Total_Count': len(df)
            })

        return pd.DataFrame(stats)

    @staticmethod
    def get_correlation_matrix(df: pd.DataFrame) -> pd.DataFrame:
        """
        Get correlation matrix for numerical columns.

        Args:
            df: Input DataFrame

        Returns:
            Correlation matrix DataFrame
        """
        numerical, _, _ = detect_column_types(df)
        if len(numerical) < MIN_NUMERICAL_COLUMNS_FOR_CORRELATION:
            return pd.DataFrame()

        return df[numerical].corr()
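All profiler methods are static and read-only, so they can be called straight off the class once a DataFrame is loaded. For example (shapes and contents depend on the uploaded data):

```python
info = DataProfiler.get_basic_info(df)
print(info['shape'], f"{info['memory_usage']:,} bytes")

num_stats = DataProfiler.get_numerical_stats(df)    # one row per numerical column
cat_stats = DataProfiler.get_categorical_stats(df)  # one row per categorical column
corr = DataProfiler.get_correlation_matrix(df)      # empty if too few numeric columns
```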
insights.py
ADDED
@@ -0,0 +1,204 @@
"""
Insights generation module for the Business Intelligence Dashboard.

This module automatically generates insights and identifies patterns
in the data.
"""

from typing import Dict, List, Tuple, Optional, Any
import pandas as pd
import numpy as np
from utils import detect_column_types
from constants import (
    Q1_QUANTILE,
    Q3_QUANTILE,
    IQR_MULTIPLIER,
    TREND_THRESHOLD_PERCENT,
    MIN_DATA_POINTS_FOR_TREND
)


class InsightGenerator:
    """Class for generating automated insights from data."""

    @staticmethod
    def get_top_bottom_performers(
        df: pd.DataFrame,
        column: str,
        top_n: int = 5
    ) -> Dict[str, List[Tuple[str, float]]]:
        """
        Identify top and bottom performers for a column.

        Args:
            df: Input DataFrame
            column: Column to analyze
            top_n: Number of top/bottom items to return

        Returns:
            Dictionary with 'top' and 'bottom' lists
        """
        if column not in df.columns:
            return {'top': [], 'bottom': []}

        df_clean = df.dropna(subset=[column])
        if df_clean.empty:
            return {'top': [], 'bottom': []}

        # Get top performers
        top = df_clean.nlargest(top_n, column)[[column]]
        top_list = [(idx, float(val)) for idx, val in top[column].items()]

        # Get bottom performers
        bottom = df_clean.nsmallest(top_n, column)[[column]]
        bottom_list = [(idx, float(val)) for idx, val in bottom[column].items()]

        return {
            'top': top_list,
            'bottom': bottom_list
        }

    @staticmethod
    def detect_trends(df: pd.DataFrame, date_column: str, value_column: str) -> Dict[str, Any]:
        """
        Detect trends in time series data.

        Args:
            df: Input DataFrame
            date_column: Date column name
            value_column: Value column name

        Returns:
            Dictionary with trend information
        """
        if date_column not in df.columns or value_column not in df.columns:
            return {'trend': 'insufficient_data', 'message': 'Required columns not found'}

        df_clean = df[[date_column, value_column]].copy()
        df_clean[date_column] = pd.to_datetime(df_clean[date_column], errors='coerce')
        df_clean = df_clean.dropna()

        if len(df_clean) < MIN_DATA_POINTS_FOR_TREND:
            return {
                'trend': 'insufficient_data',
                'message': f'Not enough data points (need at least {MIN_DATA_POINTS_FOR_TREND})'
            }

        df_clean = df_clean.sort_values(date_column)

        # Calculate trend by comparing first-half and second-half means
        first_half = df_clean[:len(df_clean)//2][value_column].mean()
        second_half = df_clean[len(df_clean)//2:][value_column].mean()

        change = ((second_half - first_half) / first_half * 100) if first_half != 0 else 0

        if change > TREND_THRESHOLD_PERCENT:
            trend = 'increasing'
            message = f'Strong upward trend: {change:.2f}% increase'
        elif change < -TREND_THRESHOLD_PERCENT:
            trend = 'decreasing'
            message = f'Downward trend: {change:.2f}% decrease'
        else:
            trend = 'stable'
            message = f'Relatively stable: {change:.2f}% change'

        return {
            'trend': trend,
            'message': message,
            'change_percentage': change,
            'first_half_avg': float(first_half),
            'second_half_avg': float(second_half)
        }

    @staticmethod
    def detect_anomalies(df: pd.DataFrame, column: str) -> List[Dict[str, Any]]:
        """
        Detect anomalies in numerical data using the IQR method.

        Args:
            df: Input DataFrame
            column: Column to analyze

        Returns:
            List of anomaly dictionaries
        """
        if column not in df.columns:
            return []

        df_clean = df.dropna(subset=[column])
        if df_clean.empty:
            return []

        Q1 = df_clean[column].quantile(Q1_QUANTILE)
        Q3 = df_clean[column].quantile(Q3_QUANTILE)
        IQR = Q3 - Q1

        lower_bound = Q1 - IQR_MULTIPLIER * IQR
        upper_bound = Q3 + IQR_MULTIPLIER * IQR

        anomalies = df_clean[
            (df_clean[column] < lower_bound) | (df_clean[column] > upper_bound)
        ]

        result = []
        for idx, row in anomalies.iterrows():
            result.append({
                'index': int(idx),
                'value': float(row[column]),
                'type': 'high' if row[column] > upper_bound else 'low'
            })

        return result

    @staticmethod
    def generate_summary_insights(df: pd.DataFrame) -> List[str]:
        """
        Generate high-level summary insights.

        Args:
            df: Input DataFrame

        Returns:
            List of insight strings
        """
        insights = []

        # Basic stats
        insights.append(f"Dataset contains {len(df):,} rows and {len(df.columns)} columns")

        # Missing values
        missing = df.isnull().sum().sum()
        if missing > 0:
            missing_pct = (missing / (len(df) * len(df.columns))) * 100
            insights.append(
                f"Found {missing:,} missing values ({missing_pct:.1f}% of data)"
            )

        # Numerical column insights
        numerical, categorical, date_columns = detect_column_types(df)

        if numerical:
            insights.append(f"Dataset has {len(numerical)} numerical columns")
            # Find the column with the highest variance
            variances = df[numerical].var()
            if not variances.empty:
                max_var_col = variances.idxmax()
                insights.append(
                    f"'{max_var_col}' shows the highest variability"
                )

        if categorical:
            insights.append(f"Dataset has {len(categorical)} categorical columns")
            # Find the most diverse category
            unique_counts = {col: df[col].nunique() for col in categorical}
            if unique_counts:
                max_unique_col = max(unique_counts, key=unique_counts.get)
                insights.append(
                    f"'{max_unique_col}' has the most unique values ({unique_counts[max_unique_col]})"
                )

        if date_columns:
            insights.append(f"Dataset has {len(date_columns)} date columns")

        return insights
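A quick sketch of how the generator might be driven from the app (the column names are placeholders, not fields the dashboard requires):

```python
# Hypothetical columns: 'order_date' (dates) and 'revenue' (numeric).
gen = InsightGenerator()

trend = gen.detect_trends(df, 'order_date', 'revenue')
print(trend['message'])  # e.g. "Relatively stable: 1.20% change"

anomalies = gen.detect_anomalies(df, 'revenue')
print(f"{len(anomalies)} IQR outliers")  # each entry carries index, value, type

for line in gen.generate_summary_insights(df):
    print(f"- {line}")
```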
requirements.txt
ADDED
@@ -0,0 +1,10 @@
pandas>=2.2.0
numpy>=1.26.0
gradio==6.0.2
matplotlib>=3.8.0
seaborn>=0.13.0
plotly>=5.22.0
kaleido>=0.2.1
openpyxl>=3.1.5
Pillow>=10.4.0
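A standard local setup is `pip install -r requirements.txt`; openpyxl backs the Excel load strategy and kaleido enables static image export from Plotly figures.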
utils.py
ADDED
@@ -0,0 +1,111 @@
"""
Utility functions for the Business Intelligence Dashboard.

This module contains helper functions for data type detection,
validation, and common operations.
"""

from typing import List, Optional, Tuple
import pandas as pd
import numpy as np


def detect_column_types(df: pd.DataFrame) -> Tuple[List[str], List[str], List[str]]:
    """
    Detect column types in a DataFrame.

    Args:
        df: Input DataFrame

    Returns:
        Tuple of (numerical_columns, categorical_columns, date_columns)
    """
    numerical = []
    categorical = []
    date_columns = []

    for col in df.columns:
        if pd.api.types.is_datetime64_any_dtype(df[col]):
            date_columns.append(col)
        elif pd.api.types.is_numeric_dtype(df[col]):
            numerical.append(col)
        else:
            categorical.append(col)

    return numerical, categorical, date_columns


def validate_dataframe(df: pd.DataFrame) -> Tuple[bool, Optional[str]]:
    """
    Validate that a DataFrame is not empty and has a valid structure.

    Args:
        df: DataFrame to validate

    Returns:
        Tuple of (is_valid, error_message)
    """
    if df is None or df.empty:
        return False, "DataFrame is empty or None"

    if len(df.columns) == 0:
        return False, "DataFrame has no columns"

    return True, None


def format_number(value: float, decimals: int = 2) -> str:
    """
    Format a number with the specified number of decimal places.

    Args:
        value: Number to format
        decimals: Number of decimal places

    Returns:
        Formatted string
    """
    if pd.isna(value):
        return "N/A"
    return f"{value:,.{decimals}f}"


def safe_divide(numerator: float, denominator: float) -> float:
    """
    Safely divide two numbers, returning 0 if the denominator is 0.

    Args:
        numerator: Numerator value
        denominator: Denominator value

    Returns:
        Division result or 0
    """
    if denominator == 0 or pd.isna(denominator):
        return 0.0
    return numerator / denominator


def get_missing_value_summary(df: pd.DataFrame) -> pd.DataFrame:
    """
    Get a summary of missing values in a DataFrame.

    Args:
        df: Input DataFrame

    Returns:
        DataFrame with missing value statistics
    """
    missing = df.isnull().sum()
    missing_pct = (missing / len(df)) * 100

    summary = pd.DataFrame({
        'Column': missing.index,
        'Missing_Count': missing.values,
        'Missing_Percentage': missing_pct.values
    })

    return summary[summary['Missing_Count'] > 0].sort_values(
        'Missing_Count', ascending=False
    )
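These helpers are pure functions over a DataFrame, so they are easy to sanity-check in isolation. A small illustrative run, assuming the functions above are in scope (the toy frame is invented for the example):

```python
import pandas as pd

# Toy frame invented for illustration.
toy = pd.DataFrame({
    'amount': [10.0, 20.0, None],
    'city': ['Austin', 'Boston', 'Austin'],
    'when': pd.to_datetime(['2024-01-01', '2024-01-02', '2024-01-03']),
})

print(detect_column_types(toy))        # (['amount'], ['city'], ['when'])
print(format_number(1234.5))           # '1,234.50'
print(safe_divide(1, 0))               # 0.0
print(get_missing_value_summary(toy))  # one row, for 'amount'
```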
visualizations.py
ADDED
@@ -0,0 +1,327 @@
"""
Visualization module for the Business Intelligence Dashboard.

This module creates various types of charts and visualizations
using the Strategy Pattern for different chart types.
"""

from abc import ABC, abstractmethod
from typing import Dict, List, Optional, Tuple, Any
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px
import plotly.graph_objects as go
from plotly.subplots import make_subplots
from utils import detect_column_types
from constants import (
    HISTOGRAM_BINS,
    MAX_CATEGORY_DISPLAY,
    MIN_NUMERICAL_COLUMNS_FOR_CORRELATION
)


class VisualizationStrategy(ABC):
    """Abstract base class for visualization strategies."""

    @abstractmethod
    def create_chart(
        self,
        df: pd.DataFrame,
        x_column: Optional[str] = None,
        y_column: Optional[str] = None,
        aggregation: str = 'sum',
        **kwargs
    ) -> go.Figure:
        """
        Create a visualization.

        Args:
            df: Input DataFrame
            x_column: X-axis column
            y_column: Y-axis column
            aggregation: Aggregation method (sum, mean, count, median)
            **kwargs: Additional parameters

        Returns:
            Plotly figure object
        """
        pass


class TimeSeriesStrategy(VisualizationStrategy):
    """Strategy for creating time series plots."""

    def create_chart(
        self,
        df: pd.DataFrame,
        x_column: Optional[str] = None,
        y_column: Optional[str] = None,
        aggregation: str = 'sum',
        **kwargs
    ) -> go.Figure:
        """Create time series plot."""
        if x_column is None or y_column is None:
            raise ValueError("Both x_column and y_column required for time series")

        # Convert date column
        df = df.copy()
        df[x_column] = pd.to_datetime(df[x_column], errors='coerce')
        df = df.dropna(subset=[x_column, y_column])

        # Aggregate if needed
        if aggregation != 'none':
            df = df.groupby(x_column)[y_column].agg(aggregation).reset_index()

        fig = px.line(
            df,
            x=x_column,
            y=y_column,
            title=f'Time Series: {y_column} over {x_column}',
            labels={x_column: x_column, y_column: y_column}
        )

        fig.update_layout(
            xaxis_title=x_column,
            yaxis_title=y_column,
            hovermode='x unified',
            template='plotly_white'
        )

        return fig


class DistributionStrategy(VisualizationStrategy):
    """Strategy for creating distribution plots."""

    def create_chart(
        self,
        df: pd.DataFrame,
        x_column: Optional[str] = None,
        y_column: Optional[str] = None,
        aggregation: str = 'sum',
        sub_chart_type: str = 'histogram',
        **kwargs
    ) -> go.Figure:
        """Create distribution plot (histogram or box plot)."""
        if x_column is None:
            raise ValueError("x_column required for distribution plot")

        # Get sub_chart_type from kwargs if provided, otherwise use the parameter.
        # Check both 'sub_chart_type' (new) and 'chart_type' (legacy) for compatibility.
        sub_chart_type = kwargs.pop('sub_chart_type', kwargs.pop('chart_type', sub_chart_type))

        df = df.copy()
        df = df.dropna(subset=[x_column])

        if sub_chart_type == 'histogram':
            fig = px.histogram(
                df,
                x=x_column,
                title=f'Distribution of {x_column}',
                labels={x_column: x_column, 'count': 'Frequency'},
                nbins=HISTOGRAM_BINS
            )
        else:  # box plot
            fig = px.box(
                df,
                y=x_column,
                title=f'Box Plot of {x_column}',
                labels={x_column: x_column}
            )

        fig.update_layout(
            template='plotly_white',
            showlegend=False
        )

        return fig


class CategoryAnalysisStrategy(VisualizationStrategy):
    """Strategy for creating category analysis charts."""

    def create_chart(
        self,
        df: pd.DataFrame,
        x_column: Optional[str] = None,
        y_column: Optional[str] = None,
        aggregation: str = 'sum',
        sub_chart_type: str = 'bar',
        **kwargs
    ) -> go.Figure:
        """Create category analysis (bar chart or pie chart)."""
        if x_column is None:
            raise ValueError("x_column required for category analysis")

        # Get sub_chart_type from kwargs if provided, otherwise use the parameter.
        # Check both 'sub_chart_type' (new) and 'chart_type' (legacy) for compatibility.
        sub_chart_type = kwargs.pop('sub_chart_type', kwargs.pop('chart_type', sub_chart_type))

        df = df.copy()
        df = df.dropna(subset=[x_column])

        if y_column:
            # Aggregate by category
            if aggregation != 'none':
                df_agg = df.groupby(x_column)[y_column].agg(aggregation).reset_index()
                df_agg.columns = [x_column, y_column]
            else:
                df_agg = df[[x_column, y_column]]

            # Sort by value
            df_agg = df_agg.sort_values(y_column, ascending=False).head(MAX_CATEGORY_DISPLAY)

            if sub_chart_type == 'bar':
                fig = px.bar(
                    df_agg,
                    x=x_column,
                    y=y_column,
                    title=f'{y_column} by {x_column}',
                    labels={x_column: x_column, y_column: y_column}
                )
            else:  # pie
                fig = px.pie(
                    df_agg,
                    names=x_column,
                    values=y_column,
                    title=f'{y_column} Distribution by {x_column}'
                )
        else:
            # Count by category
            value_counts = df[x_column].value_counts().head(MAX_CATEGORY_DISPLAY)

            if sub_chart_type == 'bar':
                fig = px.bar(
                    x=value_counts.index,
                    y=value_counts.values,
                    title=f'Count by {x_column}',
                    labels={'x': x_column, 'y': 'Count'}
                )
            else:  # pie
                fig = px.pie(
                    values=value_counts.values,
                    names=value_counts.index,
                    title=f'Distribution of {x_column}'
                )

        fig.update_layout(template='plotly_white')
        return fig


class ScatterStrategy(VisualizationStrategy):
    """Strategy for creating scatter plots."""

    def create_chart(
        self,
        df: pd.DataFrame,
        x_column: Optional[str] = None,
        y_column: Optional[str] = None,
        aggregation: str = 'sum',
        color_column: Optional[str] = None,
        **kwargs
    ) -> go.Figure:
        """Create scatter plot."""
        if x_column is None or y_column is None:
            raise ValueError("Both x_column and y_column required for scatter plot")

        df = df.copy()
        df = df.dropna(subset=[x_column, y_column])

        fig = px.scatter(
            df,
            x=x_column,
            y=y_column,
            color=color_column,
            title=f'Scatter Plot: {y_column} vs {x_column}',
            labels={x_column: x_column, y_column: y_column},
            hover_data=df.columns.tolist()
        )

        fig.update_layout(template='plotly_white')
        return fig


class CorrelationHeatmapStrategy(VisualizationStrategy):
    """Strategy for creating correlation heatmaps."""

    def create_chart(
        self,
        df: pd.DataFrame,
        x_column: Optional[str] = None,
        y_column: Optional[str] = None,
        aggregation: str = 'sum',
        **kwargs
    ) -> go.Figure:
        """Create correlation heatmap."""
        numerical, _, _ = detect_column_types(df)

        if len(numerical) < MIN_NUMERICAL_COLUMNS_FOR_CORRELATION:
            raise ValueError(
                f"Need at least {MIN_NUMERICAL_COLUMNS_FOR_CORRELATION} "
                "numerical columns for correlation"
            )

        corr_matrix = df[numerical].corr()

        fig = px.imshow(
            corr_matrix,
            title='Correlation Heatmap',
            labels=dict(x="Column", y="Column", color="Correlation"),
            color_continuous_scale='RdBu',
            aspect="auto"
        )

        fig.update_layout(template='plotly_white')
        return fig


class VisualizationFactory:
    """Factory class for creating visualizations using the Strategy Pattern."""

    def __init__(self):
        """Initialize with visualization strategies."""
        self._strategies = {
            'time_series': TimeSeriesStrategy(),
            'distribution': DistributionStrategy(),
            'category': CategoryAnalysisStrategy(),
            'scatter': ScatterStrategy(),
            'correlation': CorrelationHeatmapStrategy()
        }

    def create_visualization(
        self,
        chart_type: str,
        df: pd.DataFrame,
        x_column: Optional[str] = None,
        y_column: Optional[str] = None,
        aggregation: str = 'sum',
        **kwargs
    ) -> go.Figure:
        """
        Create a visualization using the appropriate strategy.

        Args:
            chart_type: Type of chart to create
            df: Input DataFrame
            x_column: X-axis column
            y_column: Y-axis column
            aggregation: Aggregation method
            **kwargs: Additional parameters

        Returns:
            Plotly figure object
        """
        if chart_type not in self._strategies:
            raise ValueError(f"Unknown chart type: {chart_type}")

        strategy = self._strategies[chart_type]
        return strategy.create_chart(
            df,
            x_column=x_column,
            y_column=y_column,
            aggregation=aggregation,
            **kwargs
        )
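The factory keeps the app decoupled from concrete chart classes: callers pass a chart-type key, columns, and optional kwargs that flow through to the chosen strategy. A hedged usage sketch, given a loaded DataFrame `df` (column names are placeholders):

```python
factory = VisualizationFactory()

# 'order_date' and 'revenue' are hypothetical column names.
fig = factory.create_visualization(
    'time_series',
    df,
    x_column='order_date',
    y_column='revenue',
    aggregation='mean',
)
fig.show()

# Distribution charts accept a sub-chart selector through kwargs:
box = factory.create_visualization(
    'distribution', df, x_column='revenue', sub_chart_type='box'
)
```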