Commit a657e9e · 1 parent 463fa64 · npuliga committed: added files
Dockerfile ADDED
@@ -0,0 +1,21 @@
+ # Use the official Python 3.9 image
+ FROM python:3.9
+
+ # Set the working directory to /code
+ WORKDIR /code
+
+ # Copy the requirements file into the container at /code
+ COPY ./requirements.txt /code/requirements.txt
+
+ # Install the dependencies
+ RUN pip install --no-cache-dir --upgrade -r /code/requirements.txt
+
+ # Copy the rest of your application code
+ COPY . .
+
+ # Create a writable directory for cache/temporary files if needed (good practice)
+ RUN mkdir -p /code/cache && chmod 777 /code/cache
+
+ # Command to run the application
+ # We use host 0.0.0.0 and port 7860 (Hugging Face's default port)
+ CMD ["uvicorn", "app:app", "--host", "0.0.0.0", "--port", "7860"]
HUGGINGFACE_DEPLOYMENT.md ADDED
@@ -0,0 +1,288 @@
+ # 🚀 Deploying RAG Analytics to Hugging Face Spaces
+
+ ## 📦 Required Files (Keep These)
+
+ ### Core Application Files:
+ - ✅ `app.py` - Main application
+ - ✅ `config.py` - Configuration settings
+ - ✅ `data_loader.py` - Data loading logic
+ - ✅ `requirements.txt` - Python dependencies
+ - ✅ `data/` folder with CSV files:
+   - `Biomedical-pubmedqa.csv`
+   - `Finance-finqa.csv`
+   - `General-msmarco.csv`
+   - `Legal-cuad.csv`
+
+ ### Optional Files:
+ - ✅ `README.md` - Documentation (recommended)
+ - ✅ `.gitattributes` - Git settings (auto-generated)
+
+ ## 🗑️ Files to DELETE (Not Needed for Deployment)
+
+ These are just test/debug files:
+ - ❌ `debug_plot.py`
+ - ❌ `test_data.py`
+ - ❌ `test_fix.py`
+ - ❌ `show_expected_data.py`
+ - ❌ `RESTART_INSTRUCTIONS.md`
+ - ❌ `Dockerfile` (unless you specifically want Docker support)
+ - ❌ `__pycache__/` (auto-generated, will be ignored)
+
+ ---
+
+ ## 🎯 Deployment Steps
+
+ ### Step 1: Create Hugging Face Space
+
+ 1. Go to https://huggingface.co/spaces
+ 2. Click **"Create new Space"**
+ 3. Configure:
+    - **Owner:** Your username/organization
+    - **Space name:** `rag-analytics-dashboard` (or your choice)
+    - **License:** Apache 2.0 (recommended)
+    - **Select SDK:** **Gradio**
+    - **Space hardware:** CPU basic (free tier is fine)
+    - **Visibility:** Public or Private (your choice)
+ 4. Click **"Create Space"**
+
+ ### Step 2: Upload Files to Space
+
+ #### Option A: Using Git (Recommended)
+ ```bash
+ # Clone your new space
+ git clone https://huggingface.co/spaces/YOUR_USERNAME/rag-analytics-dashboard
+ cd rag-analytics-dashboard
+
+ # Copy your files from the rag12-analytics folder
+ # (Windows commands; on macOS/Linux use `cp` and `cp -r` instead)
+ copy app.py .
+ copy config.py .
+ copy data_loader.py .
+ copy requirements.txt .
+ copy README.md .
+
+ # Copy data folder
+ xcopy /E /I data data
+
+ # Commit and push
+ git add .
+ git commit -m "Initial deployment"
+ git push
+ ```
+
+ #### Option B: Using Web UI (Easier)
+ 1. In your Space page, click the **"Files"** tab
+ 2. Click **"Add file"** → **"Upload files"**
+ 3. Upload these files one by one:
+    - `app.py`
+    - `config.py`
+    - `data_loader.py`
+    - `requirements.txt`
+ 4. Create the `data` folder:
+    - Click **"Add file"** → **"Create a new file"**
+    - Name it `data/.gitkeep` (this creates the folder)
+    - Click "Commit"
+ 5. Upload CSV files to the `data/` folder:
+    - Click on the `data` folder
+    - Click **"Add file"** → **"Upload files"**
+    - Upload all 4 CSV files
+
+ ### Step 3: Update requirements.txt
+
+ Make sure your `requirements.txt` contains:
+ ```
+ gradio>=4.0.0
+ plotly>=5.18.0
+ pandas>=2.0.0
+ fastapi
+ uvicorn
+ python-multipart
+ ```
+
+ **Delete these lines if present:**
+ - `huggingface-hub<1.0.0` (causes conflicts)
+ - Any pydantic version restrictions
+
+ ### Step 4: Verify Deployment
+
+ 1. After upload, Hugging Face will automatically:
+    - Install dependencies from `requirements.txt`
+    - Run `app.py`
+    - Build the Gradio interface
+
+ 2. Wait for the build to complete (1-3 minutes)
+    - You'll see logs in the **"Logs"** tab
+    - Look for: "Running on local URL: http://0.0.0.0:7860"
+
+ 3. Your app will be live at:
+    ```
+    https://huggingface.co/spaces/YOUR_USERNAME/rag-analytics-dashboard
+    ```
+
+ ---
+
+ ## ⚙️ Configuration for Hugging Face
+
+ ### Update config.py (if needed)
+
+ The app is already configured to use the `./data` folder by default, which works on HF Spaces:
+
+ ```python
+ DATA_FOLDER = os.environ.get("DATA_FOLDER", "./data")
+ ```
+
+ No changes needed! ✅
+
+ ### Environment Variables (Optional)
+
+ If you want to change the data folder location:
+ 1. Go to Space **Settings** → **Variables and secrets**
+ 2. Add: `DATA_FOLDER` = `/path/to/data`
+
+ ---
+
+ ## ✅ Verification Checklist
+
+ After deployment, check:
+
+ 1. **App loads successfully**
+    - Visit your Space URL
+    - Should see the "RAG Pipeline Analytics" header
+    - Should show "Version: v2.1.0-fixed"
+
+ 2. **Data loads automatically**
+    - Status box should show: "Successfully loaded 23 test runs from 4 file(s)"
+    - Should list all 4 CSV files
+
+ 3. **Dropdowns populate**
+    - Domain dropdown should show: msmarco, pubmedqa, finqa, cuad
+
+ 4. **Graphs display correctly**
+    - Select a domain
+    - RMSE graph should show values like 0.325, 0.200, 0.436
+    - Performance graph should show values like 0.595, 0.513
+    - NOT 0.000, 1.000, 2.000
+
+ 5. **Inter-domain comparison works**
+    - Click the "Generate Comparison" button
+    - Table should show configuration differences
+    - Bar chart should show different F1 scores per domain
+
+ ---
+
+ ## 🐛 Troubleshooting
+
+ ### Build Fails
+ - Check the **Logs** tab for errors
+ - Common issues:
+   - Missing dependencies → Add them to `requirements.txt`
+   - Incompatible versions → Use version ranges (`>=`), not exact pins (`==`)
+
+ ### App Loads but No Data
+ - Verify the CSV files are in the `data/` folder
+ - Check that file names match exactly:
+   - `Biomedical-pubmedqa.csv`
+   - `Finance-finqa.csv`
+   - `General-msmarco.csv`
+   - `Legal-cuad.csv`
+
+ ### Graphs Show Wrong Values
+ - This was a local caching issue and should NOT happen on HF Spaces
+ - HF Spaces runs fresh code every time
+ - If it happens, restart the Space: Settings → Factory reboot
+
+ ### Out of Memory
+ - Upgrade to a better hardware tier (Settings → Change hardware)
+ - Or reduce the data size
+
+ ---
+
+ ## 🎨 Customization (Optional)
+
+ ### Update README.md
+ Create a nice README for your Space visitors:
+
+ ```markdown
+ ---
+ title: RAG Analytics Dashboard
+ emoji: 🧬
+ colorFrom: blue
+ colorTo: green
+ sdk: gradio
+ sdk_version: 4.0.0
+ app_file: app.py
+ pinned: false
+ ---
+
+ # RAG Pipeline Analytics Dashboard
+
+ Analyzes RAG (Retrieval-Augmented Generation) system performance across multiple domains.
+
+ ## Features
+ - Intra-domain analysis with filtering
+ - Inter-domain comparison
+ - Interactive visualizations
+ - Supports multiple domains: Biomedical, Finance, Legal, General
+
+ ## Usage
+ 1. Select a domain from the dropdown
+ 2. Apply filters to compare specific configurations
+ 3. View RMSE and performance metrics
+ 4. Compare peak performance across domains
+ ```
+
+ ### Add Space Thumbnail
+ - Add `thumbnail.png` or `thumbnail.jpg` to the root
+ - Recommended size: 1200x630 pixels
+
+ ---
+
+ ## 📱 Sharing Your Space
+
+ Once deployed, share with:
+ ```
+ https://huggingface.co/spaces/YOUR_USERNAME/rag-analytics-dashboard
+ ```
+
+ Or embed in websites:
+ ```html
+ <iframe
+     src="https://YOUR_USERNAME-rag-analytics-dashboard.hf.space"
+     frameborder="0"
+     width="100%"
+     height="800"
+ ></iframe>
+ ```
+
+ ---
+
+ ## 🔄 Updating Your Space
+
+ To update after deployment:
+
+ ### Via Git:
+ ```bash
+ cd rag-analytics-dashboard
+ # Make changes to files
+ git add .
+ git commit -m "Update: description of changes"
+ git push
+ ```
+
+ ### Via Web:
+ 1. Click on the file to edit
+ 2. Click the pencil icon (Edit)
+ 3. Make changes
+ 4. Commit the changes
+
+ The Space will automatically rebuild and deploy! 🚀
+
+ ---
+
+ ## 💰 Cost
+
+ - **Free tier:** CPU basic (sufficient for this app)
+ - **Upgrade options:** if you need faster performance
+   - CPU upgrade: $0.03/hour
+   - GPU T4: $0.60/hour (overkill for this app)
+
+ For this analytics dashboard, the **FREE tier is perfectly fine**! ✅
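The "Required Files" checklist above can be verified before uploading with a small pre-flight script. A minimal sketch, assuming the files are checked relative to the project root; the helper name `missing_files` is an illustration, not part of the repo:

```python
from pathlib import Path

# Required files from the deployment checklist above.
REQUIRED = [
    "app.py", "config.py", "data_loader.py", "requirements.txt",
    "data/Biomedical-pubmedqa.csv", "data/Finance-finqa.csv",
    "data/General-msmarco.csv", "data/Legal-cuad.csv",
]

def missing_files(root: str, required=REQUIRED):
    """Return every required path that does not exist under `root`."""
    base = Path(root)
    return [p for p in required if not (base / p).exists()]
```

Running `missing_files(".")` in the project folder should return an empty list before you push to the Space.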
README.md CHANGED
@@ -1,11 +1,76 @@
  ---
- title: Rag12 Analytics
+ title: RAG Analytics Dashboard
- emoji: 🐠
  colorFrom: blue
- colorTo: blue
- sdk: docker
+ colorTo: green
+ sdk: gradio
+ sdk_version: 4.0.0
+ app_file: app.py
  pinned: false
- short_description: rag12 analytics dashboard
+ license: apache-2.0
+ short_description: Compare RAG system performance across multiple domains
  ---
 
- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
+ # RAG Pipeline Analytics Dashboard
+
+ Interactive dashboard for analyzing RAG (Retrieval-Augmented Generation) system performance across multiple domains.
+
+ ## Features
+
+ - **Intra-Domain Analysis:** Compare different RAG configurations within a single domain
+ - **Performance Metrics:** RMSE (Relevance, Utilization, Completeness), F1 Score, AUC-ROC
+ - **Interactive Filtering:** Filter tests by reranker model, summarization model, and chunking strategy
+ - **Inter-Domain Comparison:** Compare peak performance across different domains
+ - **Data Preview:** Inspect raw data and configuration parameters
+
+ ## Supported Domains
+
+ - **Biomedical** (PubMedQA)
+ - **Finance** (FinQA)
+ - **General** (MS MARCO)
+ - **Legal** (CUAD)
+
+ ## Usage
+
+ 1. **Load Data:** Click "Load/Refresh Data" to load all test results
+ 2. **Select Domain:** Choose a domain from the dropdown
+ 3. **Apply Filters:** Use the filter dropdowns to compare specific configurations
+ 4. **View Metrics:**
+    - RMSE graph shows relevance, utilization, and completeness (lower is better)
+    - Performance graph shows F1 Score and AUC-ROC (higher is better)
+ 5. **Compare Domains:** Switch to the "Inter-Domain Comparison" tab to see the overall best configurations
+
+ ## Interpreting Results
+
+ ### RMSE Metrics (Lower is Better)
+ - **Relevance:** How well retrieved documents match the query
+ - **Utilization:** How efficiently the context is used
+ - **Completeness:** Coverage of required information
+
+ ### Performance Metrics (Higher is Better)
+ - **F1 Score:** Balance of precision and recall
+ - **AUC-ROC:** Overall classification performance
+
+ ## Configuration Parameters
+
+ The dashboard analyzes variations in:
+ - Embedding models
+ - Reranker models
+ - Summarization strategies
+ - Chunking strategies
+ - Retrieval strategies (Dense, Sparse, Hybrid)
+ - Hyperparameters (chunk size, overlap, alpha, top-k)
+
+ ## Technology Stack
+
+ - **Framework:** Gradio 4.0+
+ - **Visualization:** Plotly Express
+ - **Data Processing:** Pandas
+ - **Backend:** FastAPI
+
+ ## License
+
+ Apache 2.0
+
+ ---
+
+ **Version:** v2.1.0-fixed | Built for AIML @ IIIT Hyderabad - TalentSprint
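The three RMSE columns described in the README follow the standard root-mean-square-error definition; as a sketch, assuming each test compares predicted trace scores $\hat{y}_i$ against reference scores $y_i$ over $n$ samples:

```latex
\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left(\hat{y}_i - y_i\right)^2}
```

Zero means perfect agreement with the reference trace, which is why lower values are better in the RMSE graphs.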
app.py ADDED
@@ -0,0 +1,335 @@
+ import pandas as pd
+ import gradio as gr
+ import plotly.express as px
+ from fastapi import FastAPI
+ from typing import Dict
+
+ from config import METADATA_COLUMNS, DATA_FOLDER
+ from data_loader import load_csv_from_folder, get_available_datasets
+
+ app = FastAPI()
+ DB: Dict[str, pd.DataFrame] = {}
+
+ # --- 1. DATA PROCESSING FUNCTIONS ---
+
+ def analyze_domain_configs(df_subset):
+     """Separates configuration columns into constants and variables for a domain."""
+     actual_cols = [c for c in df_subset.columns if c not in METADATA_COLUMNS]
+
+     constants = {}
+     variables = []
+
+     for col in actual_cols:
+         unique_vals = df_subset[col].astype(str).unique()
+         if len(unique_vals) <= 1:
+             constants[col] = unique_vals[0] if len(unique_vals) > 0 else "N/A"
+         else:
+             variables.append(col)
+
+     return constants, variables
+
+ def load_data() -> str:
+     """Loads data from the configured data folder."""
+     try:
+         df, status_msg = load_csv_from_folder(DATA_FOLDER)
+         if not df.empty:
+             DB["data"] = df
+         # Always surface the loader's status message, even when no rows were loaded
+         return status_msg
+     except Exception as e:
+         return f"Error loading data: {str(e)}"
+
+ # --- 2. UI LOGIC ---
+
+ def get_dataset_choices():
+     """Safely retrieves dataset choices for the dropdown."""
+     try:
+         if "data" in DB and not DB["data"].empty:
+             return get_available_datasets(DB["data"])
+         return []
+     except Exception as e:
+         print(f"Error getting dataset choices: {e}")
+         return []
+
+ def get_data_preview():
+     """Returns the raw dataframe for inspection."""
+     if "data" not in DB:
+         return pd.DataFrame()
+     return DB["data"].head(10)
+
+ def get_domain_state(dataset):
+     empty_update = gr.update(visible=False, value=None, choices=[])
+
+     if "data" not in DB:
+         return "", empty_update, empty_update, empty_update
+
+     df = DB["data"]
+     subset = df[df['dataset_name'] == dataset]
+
+     if subset.empty:
+         return "No data for this domain.", empty_update, empty_update, empty_update
+
+     consts, vars_list = analyze_domain_configs(subset)
+     const_text = "CONSTANTS (Fixed for this domain):\n" + "\n".join([f"{k}: {v}" for k, v in consts.items()])
+
+     updates = []
+     for i in range(3):
+         if i < len(vars_list):
+             col_name = vars_list[i]
+             unique_choices = list(subset[col_name].astype(str).unique())
+             unique_choices.insert(0, "All")
+             updates.append(gr.update(
+                 label=f"Filter by {col_name}",
+                 choices=unique_choices,
+                 value="All",
+                 visible=True,
+                 interactive=True
+             ))
+         else:
+             updates.append(empty_update)
+
+     return const_text, updates[0], updates[1], updates[2]
+
+ def plot_metrics_on_x_axis(dataset, f1_val, f2_val, f3_val):
+     """Generates RMSE and performance metric plots for the selected domain and filters."""
+     if "data" not in DB or not dataset:
+         return None, None
+
+     try:
+         df = DB["data"]
+         subset = df[df['dataset_name'] == dataset].copy()
+     except Exception as e:
+         print(f"Error accessing data: {e}")
+         return None, None
+
+     # Filter logic
+     _, vars_list = analyze_domain_configs(subset)
+     filters = [f1_val, f2_val, f3_val]
+     for i, val in enumerate(filters):
+         if i < len(vars_list) and val != "All" and val is not None:
+             col = vars_list[i]
+             subset = subset[subset[col].astype(str) == str(val)].copy()  # Explicit copy
+
+     if subset.empty:
+         return None, None
+
+     # Reset index to avoid any index-related issues
+     subset = subset.reset_index(drop=True)
+
+     # Create legend label; ensure test_id is a string to prevent errors
+     subset['Legend'] = "Test " + subset['test_id'].astype(str) + ": " + subset['config_purpose'].astype(str)
+
+     # --- PLOT 1: RMSE ---
+     # Check that columns exist before melting
+     rmse_cols = ['rmse_relevance', 'rmse_utilization', 'rmse_completeness']
+     available_rmse = [c for c in rmse_cols if c in subset.columns]
+
+     if available_rmse:
+         rmse_melted = subset.melt(
+             id_vars=['Legend', 'test_id'],
+             value_vars=available_rmse,
+             var_name='Metric Name',
+             value_name='Score'
+         )
+         # Explicitly ensure Score is a numeric float
+         rmse_melted['Score'] = pd.to_numeric(rmse_melted['Score'], errors='coerce').fillna(0.0).astype(float)
+         rmse_melted['Metric Name'] = rmse_melted['Metric Name'].str.replace('rmse_', '').str.capitalize()
+         rmse_melted = rmse_melted.reset_index(drop=True)
+
+         # DEBUG: Print to verify values
+         print(f"[DEBUG] RMSE melted data - Score range: {rmse_melted['Score'].min():.4f} to {rmse_melted['Score'].max():.4f}")
+         print(f"[DEBUG] Sample scores: {rmse_melted['Score'].head(6).tolist()}")
+
+         fig_rmse = px.bar(
+             rmse_melted,
+             x="Metric Name",
+             y="Score",
+             color="Legend",
+             barmode="group",
+             title=f"RMSE Breakdown (Lower is Better) - {len(subset)} Tests",
+             text_auto='.3f'
+         )
+         fig_rmse.update_traces(textposition='outside')
+         fig_rmse.update_layout(legend=dict(orientation="h", yanchor="bottom", y=1.02, xanchor="right", x=1))
+     else:
+         fig_rmse = None
+
+     # --- PLOT 2: Performance ---
+     perf_cols = ['f1_score', 'aucroc']
+     available_perf = [c for c in perf_cols if c in subset.columns]
+
+     if available_perf:
+         perf_melted = subset.melt(
+             id_vars=['Legend', 'test_id'],
+             value_vars=available_perf,
+             var_name='Metric Name',
+             value_name='Score'
+         )
+         # Explicitly ensure Score is a numeric float
+         perf_melted['Score'] = pd.to_numeric(perf_melted['Score'], errors='coerce').fillna(0.0).astype(float)
+         perf_melted['Metric Name'] = perf_melted['Metric Name'].replace({
+             'f1_score': 'F1 Score', 'aucroc': 'AUC-ROC'
+         })
+         perf_melted = perf_melted.reset_index(drop=True)
+
+         # DEBUG: Print to verify values
+         print(f"[DEBUG] Performance melted data - Score range: {perf_melted['Score'].min():.4f} to {perf_melted['Score'].max():.4f}")
+         print(f"[DEBUG] Sample scores: {perf_melted['Score'].head(6).tolist()}")
+
+         fig_perf = px.bar(
+             perf_melted,
+             x="Metric Name",
+             y="Score",
+             color="Legend",
+             barmode="group",
+             title=f"Performance Metrics (Higher is Better) - {len(subset)} Tests",
+             text_auto='.3f'
+         )
+         fig_perf.update_traces(textposition='outside')
+         fig_perf.update_layout(legend=dict(orientation="h", yanchor="bottom", y=1.02, xanchor="right", x=1))
+     else:
+         fig_perf = None
+
+     return fig_rmse, fig_perf
+
+ def generate_inter_domain_comparison():
+     """Generates a comparison table and plot across all domains."""
+     if "data" not in DB:
+         return pd.DataFrame(), None
+
+     try:
+         df = DB["data"]
+     except Exception as e:
+         print(f"Error accessing data: {e}")
+         return pd.DataFrame(), None
+
+     datasets = df['dataset_name'].unique()
+
+     all_keys = set()
+     domain_constants = {}
+
+     for ds in datasets:
+         subset = df[df['dataset_name'] == ds]
+         consts, _ = analyze_domain_configs(subset)
+         domain_constants[ds] = consts
+         all_keys.update(consts.keys())
+
+     table_rows = []
+     for key in sorted(list(all_keys)):
+         row = {"Configuration Parameter": key}
+         for ds in datasets:
+             val = domain_constants[ds].get(key, "Variable")
+             row[ds] = val
+         table_rows.append(row)
+
+     comp_df = pd.DataFrame(table_rows)
+
+     best_results = []
+     for ds in datasets:
+         subset = df[df['dataset_name'] == ds]
+         if 'f1_score' in subset.columns:
+             max_f1 = subset['f1_score'].max()
+             best_idx = subset['f1_score'].idxmax()
+             best_row = subset.loc[best_idx]
+             best_results.append({
+                 "Domain": ds,
+                 "Max F1 Score": max_f1,
+                 "Best Config": best_row['config_purpose']
+             })
+
+     if best_results:
+         best_df = pd.DataFrame(best_results)
+         fig_global = px.bar(
+             best_df, x="Domain", y="Max F1 Score",
+             color="Domain",
+             text_auto='.4f',
+             hover_data=["Best Config"],
+             title="Peak Performance per Domain (Max F1 Score)"
+         )
+         fig_global.update_traces(textposition='outside')
+     else:
+         fig_global = None
+
+     return comp_df, fig_global
+
+ # --- 3. UI ---
+ APP_VERSION = "v2.1.0-fixed"  # Version stamp to verify code is updated
+
+ with gr.Blocks(title="RAG Analytics Pro", theme=gr.themes.Soft()) as demo:
+     gr.Markdown("## RAG Pipeline Analytics")
+     gr.Markdown(f"**Data Source:** `{DATA_FOLDER}` | **Version:** {APP_VERSION}")
+
+     with gr.Row():
+         refresh_data_btn = gr.Button("Load/Refresh Data", variant="primary")
+         status = gr.Textbox(label="Status (Check here for debug info)", interactive=False, scale=3)
+
+     with gr.Tabs():
+         # TAB 1: Main Analytics
+         with gr.TabItem("Intra-Domain Analysis"):
+             with gr.Row():
+                 with gr.Column(scale=1):
+                     ds_dropdown = gr.Dropdown(label="1. Select Domain", choices=[], interactive=True)
+                     constants_box = gr.Textbox(label="Domain Constants", lines=5, interactive=False)
+
+                     gr.Markdown("### Filter Tests")
+                     filter_1 = gr.Dropdown(visible=False)
+                     filter_2 = gr.Dropdown(visible=False)
+                     filter_3 = gr.Dropdown(visible=False)
+
+                 with gr.Column(scale=3):
+                     plot_r = gr.Plot(label="RMSE Comparison")
+                     plot_p = gr.Plot(label="Performance Comparison")
+
+         # TAB 2: Data Inspector
+         with gr.TabItem("Data Preview"):
+             gr.Markdown("### Verify your data loaded correctly here")
+             preview_table = gr.Dataframe(interactive=False)
+             preview_btn = gr.Button("Refresh Data Preview")
+
+         # TAB 3: Comparison
+         with gr.TabItem("Inter-Domain Comparison"):
+             refresh_btn = gr.Button("Generate Comparison")
+             gr.Markdown("### Configuration Differences")
+             comp_table = gr.Dataframe(interactive=False)
+             gr.Markdown("### Peak Performance")
+             global_plot = gr.Plot()
+
+     # EVENTS
+     refresh_data_btn.click(
+         load_data, inputs=None, outputs=[status]
+     ).then(
+         lambda: gr.Dropdown(choices=get_dataset_choices()),
+         outputs=[ds_dropdown]
+     )
+
+     ds_dropdown.change(
+         get_domain_state,
+         inputs=[ds_dropdown],
+         outputs=[constants_box, filter_1, filter_2, filter_3]
+     ).then(
+         plot_metrics_on_x_axis,
+         inputs=[ds_dropdown, filter_1, filter_2, filter_3],
+         outputs=[plot_r, plot_p]
+     )
+
+     gr.on(
+         triggers=[filter_1.change, filter_2.change, filter_3.change],
+         fn=plot_metrics_on_x_axis,
+         inputs=[ds_dropdown, filter_1, filter_2, filter_3],
+         outputs=[plot_r, plot_p]
+     )
+
+     # Debug preview events
+     preview_btn.click(get_data_preview, inputs=None, outputs=preview_table)
+
+     refresh_btn.click(
+         generate_inter_domain_comparison,
+         inputs=None,
+         outputs=[comp_table, global_plot]
+     )
+
+ # Auto-load data on startup
+ print(f"Loading data from {DATA_FOLDER}...")
+ startup_status = load_data()
+ print(startup_status)
+
+ app = gr.mount_gradio_app(app, demo, path="/")
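The `analyze_domain_configs` helper above drives both the "Domain Constants" box and the dynamic filter dropdowns: any non-metadata column with a single value becomes a constant, anything else becomes a filterable variable. A minimal self-contained illustration (with a toy subset of `METADATA_COLUMNS` and made-up rows):

```python
import pandas as pd

# Toy subset of the METADATA_COLUMNS list from config.py
METADATA_COLUMNS = ["test_id", "config_purpose", "dataset_name"]

def analyze_domain_configs(df_subset):
    """Split non-metadata columns into constants (one value) and variables."""
    actual_cols = [c for c in df_subset.columns if c not in METADATA_COLUMNS]
    constants, variables = {}, []
    for col in actual_cols:
        unique_vals = df_subset[col].astype(str).unique()
        if len(unique_vals) <= 1:
            constants[col] = unique_vals[0] if len(unique_vals) > 0 else "N/A"
        else:
            variables.append(col)
    return constants, variables

# Two toy test runs in the same domain: the embedding model is fixed,
# the reranker differs between runs.
df = pd.DataFrame({
    "test_id": [1, 2],
    "config_purpose": ["baseline", "rerank"],
    "dataset_name": ["pubmedqa", "pubmedqa"],
    "embedding_model": ["pubmedbert", "pubmedbert"],  # constant
    "reranker_model": ["minilm", "bge"],              # varies
})
consts, variables = analyze_domain_configs(df)
# consts    -> {"embedding_model": "pubmedbert"}
# variables -> ["reranker_model"]
```

In the app, each entry of `variables` (up to three) becomes one of the filter dropdowns, with an "All" option prepended.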
config.py ADDED
@@ -0,0 +1,78 @@
+ """
+ Configuration file for RAG Analytics Application
+ """
+ import os
+
+ # Data folder configuration
+ DATA_FOLDER = os.environ.get("DATA_FOLDER", "./data")
+
+ # Required columns after normalization
+ REQUIRED_COLUMNS = {
+     'test_id',
+     'config_purpose',
+     'dataset_name'
+ }
+
+ # Metric columns that need numeric conversion to float
+ METRIC_COLUMNS = [
+     'rmse_relevance',
+     'rmse_utilization',
+     'rmse_completeness',
+     'f1_score',
+     'aucroc',
+     'failed_samples'
+ ]
+
+ # Numeric configuration columns (also need float conversion)
+ NUMERIC_CONFIG_COLUMNS = [
+     'chunk_size',
+     'overlap',
+     'stride',
+     'alpha',
+     'retr_k',
+     'final_k',
+     'summ_max',
+     'summ_min',
+     'test_id'
+ ]
+
+ # Column mapping for normalization
+ COLUMN_MAP = {
+     'test': 'test_id',
+     'configurationpurpose': 'config_purpose',
+     'subsets': 'dataset_name',
+     'embeddingmodel': 'embedding_model',
+     'rerankermodel': 'reranker_model',
+     'summarizationmodel': 'summarization_model',
+     'chunkingstrategy': 'chunking_strategy',
+     'chunksize': 'chunk_size',
+     'overlap': 'overlap',
+     'stride': 'stride',
+     'retreivalstrategy': 'retrieval_strategy',
+     'retrievalstrategy': 'retrieval_strategy',  # Catch typo
+     'alpha': 'alpha',
+     'retrk': 'retr_k',
+     'finalk': 'final_k',
+     'repacking': 'repacking',
+     'summmax': 'summ_max',
+     'summmin': 'summ_min',
+     '8bgptlabel': 'gpt_label',
+
+     # Metrics
+     'rmsetracerelevance': 'rmse_relevance',
+     'rmsetraceutilization': 'rmse_utilization',
+     'rmsetracecompleteness': 'rmse_completeness',
+     'aucroc': 'aucroc',
+     'f1score': 'f1_score',
+     'failedtotalsamples': 'failed_samples'
+ }
+
+ # Metadata columns (excluded from constant/variable analysis)
+ METADATA_COLUMNS = [
+     'rmse_relevance', 'rmse_utilization', 'rmse_completeness',
+     'aucroc', 'f1_score', 'failed_samples',
+     'test_id', 'config_purpose', 'dataset_name'
+ ]
+
+ # Debug mode
+ DEBUG = True
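The keys of `COLUMN_MAP` are the raw CSV headers lowercased with punctuation and spaces stripped (e.g. `RMSE=trace relevance` → `rmsetracerelevance`). A minimal sketch of that normalization, assuming the real logic lives in `data_loader.py`; the helper name `normalize_header` and the map excerpt are illustrative:

```python
import re

COLUMN_MAP = {  # excerpt of the full map in config.py
    "test": "test_id",
    "rmsetracerelevance": "rmse_relevance",
    "f1score": "f1_score",
}

def normalize_header(raw: str) -> str:
    """Lowercase the header, drop everything that is not a letter or digit,
    then look the key up in COLUMN_MAP (unknown keys pass through)."""
    key = re.sub(r"[^a-z0-9]", "", raw.lower())
    return COLUMN_MAP.get(key, key)

print(normalize_header("Test #"))                # test_id
print(normalize_header("RMSE=trace relevance"))  # rmse_relevance
print(normalize_header("F1-score"))              # f1_score
```

This explains why both `retreivalstrategy` (the CSV's misspelling) and `retrievalstrategy` appear as keys: either spelling normalizes to a key the map can catch.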
data/Biomedical-pubmedqa.csv ADDED
@@ -0,0 +1,7 @@
+ Test #,Configuration purpose,Subset(s),Embedding Model,Reranker Model,Summarization Model,Chunking Strategy,Chunk size,Overlap,Stride,Retreival Strategy,Alpha,Retr. K,Final K,Repacking,Summ. Max,Summ. Min,8B GPT Label,RMSE=trace relevance,RMSE=trace utilization,RMSE=trace completeness,AUCROC,F1-score,# Failed/Total Samples
+ 1,Efficiency Baseline,pubmedqa,NeuML/pubmedbert-base-embeddings,cross-encoder/ms-marco-MiniLM-L-6-v2,None,hard_cut,256,50,N/A,Hybrid,0.8,50,5,forward,N/A,N/A,short,0.3677,0.3011,0.5556,0.604,0.5049,32/100
+ 2,Chunking Proof,pubmedqa,NeuML/pubmedbert-base-embeddings,cross-encoder/ms-marco-MiniLM-L-6-v2,None,sliding_window,256,50,206,Hybrid,0.8,50,5,forward,N/A,N/A,short,0.3632,0.2886,0.5074,0.604,0.5049,29/100
+ 3,Reranking proof,pubmedqa,NeuML/pubmedbert-base-embeddings,BAAI/bge-reranker-base,None,sliding_window,256,50,206,Hybrid,0.8,50,5,forward,N/A,N/A,long,0.3289,0.2663,0.6015,0.482,0.38,8/100
+ 4,Repacking proof,pubmedqa,NeuML/pubmedbert-base-embeddings,BAAI/bge-reranker-base,None,hard_cut,256,50,206,Hybrid,0.8,50,5,reverse,N/A,N/A,long,0.2752,0.252,0.6246,0.5951,0.449,8/100
+ 5,Prove Summarization,pubmedqa,NeuML/pubmedbert-base-embeddings,BAAI/bge-reranker-base,fangyuan/nq_abstractive_compressor,sliding_window,256,50,206,Hybrid,0.8,50,5,reverse,150,20,long_cot,0.4934,1.0537,0.5161,cannot compute,0,9/100
+ 6,Optimal Medical Hybrid,pubmedqa,NeuML/pubmedbert-base-embeddings,BAAI/bge-reranker-base,N/A,sliding_window,256,50,206,Hybrid,0.8,50,5,reverse,N/A,N/A,long,0.3223,0.2733,0.6561,0.5053,0.3542,13/100
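The last column of this file records failures as `failed/total` counts (e.g. `32/100`), while `Legal-cuad.csv` below uses percentages (`35.00%`), so a loader has to treat the two formats differently. A hypothetical parsing helper, not part of the repo, illustrating the count format:

```python
def failed_fraction(cell: str):
    """Parse a '# Failed/Total Samples' cell like '32/100' into a float
    fraction in [0, 1]; returns None for any other format (hypothetical helper)."""
    try:
        failed, total = cell.split("/")
        return int(failed) / int(total)
    except (ValueError, ZeroDivisionError):
        return None

print(failed_fraction("32/100"))   # 0.32
print(failed_fraction("35.00%"))   # None (percentage format, handled separately)
```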
data/Finance-finqa.csv ADDED
@@ -0,0 +1,7 @@
+ Test #,Configuration purpose,Subset(s),Embedding Model,Reranker Model,Summarization Model,Chunking Strategy,Chunk size,Overlap,Stride,Retreival Strategy,Alpha,Retr. K,Final K,Repacking,Summ. Max,Summ. Min,8B GPT Label,RMSE=trace relevance,RMSE=trace utilization,RMSE=trace completeness,AUCROC,F1-score,# Failed/Total Samples
+ 1,Efficiency Baseline,finqa,BAAI/bge-m3,cross-encoder/ms-marco-MiniLM-L-6-v2,N/A,hard-cut,512,50,N/A,Hybrid,0.6,50,5,forward,N/A,N/A,short,0.1409,0.0831,0.6365,0.4263,0.099,18/100
+ 2,Prove Chunking,finqa,BAAI/bge-m3,cross-encoder/ms-marco-MiniLM-L-6-v2,N/A,sliding-window,512,100,412,Hybrid,0.6,50,5,forward,N/A,N/A,short,0.1667,0.1188,0.6431,0.4316,0.1176,23/100
+ 3,Prove Hybrid/Rerank,finqa,BAAI/bge-m3,BAAI/bge-reranker-v2-m3,N/A,sliding-window,512,100,412,Hybrid,0.6,50,5,forward,N/A,N/A,long,0.1316,0.0693,0.6763,0.4263,0.099,11/100
+ 4,Max Raw Context,finqa,BAAI/bge-m3,BAAI/bge-reranker-v2-m3,N/A,hard-cut,512,100,412,Hybrid,0.6,50,5,reverse,N/A,N/A,long,0.1947,0.0795,0.7239,0.4316,0.1176,14/100
+ 5,Golden Setup,finqa,BAAI/bge-m3,BAAI/bge-reranker-v2-m3,fangyuan/nq_abstractive_compressor,sliding-window,512,100,412,Hybrid,0.6,50,5,reverse,200,20,long_cot,0.4158,0.8363,0.7073,Cannot compute (insufficient class variance),0,4/100
+ 6,Optimized Financial,finqa,BAAI/bge-m3,BAAI/bge-reranker-v2-m3,N/A,sliding-window,512,100,412,Hybrid,0.8,50,3,forward,N/A,N/A,long,0.2468,0.1679,0.6177,0.5474,0.1731,6/100
data/General-msmarco.csv ADDED
@@ -0,0 +1,7 @@
+ Test #,Configuration purpose,Subset(s),Embedding Model,Reranker Model,Summarization Model,Chunking Strategy,Chunk size,Overlap,Stride,Retreival Strategy,Alpha,Retr. K,Final K,Repacking,Summ. Max,Summ. Min,8B GPT Label,RMSE=trace relevance,RMSE=trace utilization,RMSE=trace completeness,AUCROC,F1-score,# Failed/Total Samples
+ 1,Efficiency Baseline,msmarco,BAAI/bge-base-en-v1.5,cross-encoder/ms-marco-MiniLM-L-6-v2,N/A,hard-cut,256,50,N/A,Hybrid,0.8,50,5,forward,N/A,N/A,short,0.3252,0.1998,0.4362,0.5125,0.5954,23/100
+ 2,Prove Chunking,msmarco,BAAI/bge-base-en-v1.5,cross-encoder/ms-marco-MiniLM-L-6-v2,N/A,sliding-window,256,50,206,Hybrid,0.8,50,5,forward,N/A,N/A,short,0.3449,0.1947,0.4248,0.495,0.5625,30/100
+ 3,Prove Hybrid/Rerank,msmarco,BAAI/bge-base-en-v1.5,BAAI/bge-reranker-base,N/A,sliding-window,256,50,206,Hybrid,0.8,50,5,forward,N/A,N/A,short,0.3183,0.1793,0.407,0.5183,0.6061,22/100
+ 4,Prove Repacking,msmarco,BAAI/bge-base-en-v1.5,BAAI/bge-reranker-base,N/A,hard-cut,256,50,206,Hybrid,0.8,50,5,reverse,N/A,N/A,long,0.3416,0.1837,0.4491,0.559,0.6763,11/100
+ 5,Prove Summarization,msmarco,BAAI/bge-base-en-v1.5,BAAI/bge-reranker-base,fangyuan/nq_abstractive_compressor,sliding-window,256,50,206,Hybrid,0.8,50,5,reverse,150,20,long_cot,0.5066,0.8781,0.5049,N/A,0,3/100
+ 6,Optimized Hybrid,msmarco,BAAI/bge-base-en-v1.5,BAAI/bge-reranker-base,N/A,hard-cut,256,50,206,Hybrid,0.8,50,5,forward,N/A,N/A,long,0.3292,0.1754,0.5477,0.4842,0.4706,0/100
data/Legal-cuad.csv ADDED
@@ -0,0 +1,6 @@
1
+ Test #,Configuration purpose,Subset(s),Embedding Model,Reranker Model,Summarization Model,Chunking Strategy,Chunk size,Overlap,Stride,Retreival Strategy,Alpha,Retr. K,Final K,Repacking,Summ. Max,Summ. Min,8B GPT Label,RMSE=trace relevance,RMSE=trace utilization,RMSE=trace completeness,AUCROC,F1-score,% Failed Sample
2
+ 1,Efficiency Baseline,cuad,BAAI/bge-m3,cross-encoder/ms-marco-MiniLM-L-6-v2,N/A,hard-cut/ token aware chunking ,512,100,N/A,Hybrid,0.6,50,5,forward,N/A,N/A,short,0.2951,0.1697,0.6225,0.4321,0.3761,35.00%
3
+ 2,Prove Chunking,cuad,BAAI/bge-m3,cross-encoder/ms-marco-MiniLM-L-6-v2,N/A,sliding-window,512,100,412,Hybrid,0.6,50,5,forward,N/A,N/A,short,0.2927,0.1623,0.5612,0.4065,0.2609,32.00%
4
+ 3,Prove Hybrid/Rerank,cuad,BAAI/bge-m3,BAAI/bge-reranker-v2-m3,N/A,sliding-window,512,100,412,Hybrid,0.6,50,5,forward,N/A,N/A,long,0.3087,0.1296,0.5315,0.5197,0.5543,15.00%
5
+ 4,Max Raw Context,cuad,BAAI/bge-m3,BAAI/bge-reranker-v2-m3,N/A,hard-cut/ token aware chunking ,512,100,412,Hybrid,0.6,50,5,reverse,N/A,N/A,long,0.3287,0.1429,0.6583,0.4132,0.3859,17.00%
6
+ 5,Golden Setup,cuad,BAAI/bge-m3,BAAI/bge-reranker-v2-m3,fangyuan/nq_abstractive_compressor,sliding-window,512,100,412,Hybrid,0.6,50,5,reverse,250,50,long_cot,0.5048,0.7648,0.4832,0.0215,0.5054,17.00%
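The failure counts are reported in two different shapes across these files: the finqa and msmarco tables use a fraction like `23/100` ("# Failed/Total Samples"), while the cuad table above reports a percentage like `35.00%` ("% Failed Sample"). A small helper (hypothetical, not part of this commit) could normalize both into a single float rate before plotting:

```python
# Hypothetical helper -- NOT part of the repo. Normalizes the two failure
# formats found in the CSVs ("23/100" and "35.00%") to a rate in [0, 1].
def parse_failure_rate(value: str) -> float:
    value = value.strip()
    if value.endswith("%"):
        # "35.00%" -> 0.35
        return float(value.rstrip("%")) / 100.0
    # "23/100" -> 0.23
    failed, total = value.split("/")
    return float(failed) / float(total)
```

This would let a dashboard compare failure rates across all four domains on one axis despite the inconsistent headers.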
data_loader.py ADDED
@@ -0,0 +1,160 @@
+ """
+ Data loading and processing module for RAG Analytics
+ """
+ import pandas as pd
+ import os
+ from pathlib import Path
+ from typing import Tuple, List, Optional
+ from config import DATA_FOLDER, COLUMN_MAP, METRIC_COLUMNS, NUMERIC_CONFIG_COLUMNS, REQUIRED_COLUMNS, DEBUG
+
+
+ def normalize_dataframe(df: pd.DataFrame) -> pd.DataFrame:
+     """
+     1. Renames columns by stripping special chars (spaces, =, -).
+     2. Forces metric columns to numeric (floats).
+     3. Retains all data without schema validation dropping rows.
+
+     Args:
+         df: Raw dataframe loaded from CSV
+
+     Returns:
+         Normalized dataframe with standardized column names and types
+     """
+     rename_dict = {}
+     for col in df.columns:
+         # Aggressive clean: "RMSE=trace relevance" -> "rmsetracerelevance"
+         # Keep only alphanumeric characters (drops spaces, underscores, hyphens, equals signs)
+         clean_col = "".join(ch for ch in str(col).lower() if ch.isalnum())
+
+         if clean_col in COLUMN_MAP:
+             rename_dict[col] = COLUMN_MAP[clean_col]
+
+     df = df.rename(columns=rename_dict)
+
+     # Force ALL metric columns to float64 (coerce errors to NaN, then 0.0)
+     # This ensures "Empty" strings or invalid values don't crash the graph
+     # Using astype(float) explicitly ensures floating-point display
+     for metric in METRIC_COLUMNS:
+         if metric in df.columns:
+             df[metric] = pd.to_numeric(df[metric], errors='coerce').fillna(0.0).astype(float)
+
+     # Force ALL numeric configuration columns to float64
+     # This prevents integers like "256" from displaying as integers in graphs
+     for config_col in NUMERIC_CONFIG_COLUMNS:
+         if config_col in df.columns:
+             # Convert to numeric, but preserve N/A as NaN (don't fill)
+             df[config_col] = pd.to_numeric(df[config_col], errors='coerce').astype(float)
+
+     return df
+
+
+ def validate_dataframe(df: pd.DataFrame) -> Tuple[bool, str]:
+     """
+     Validates that the dataframe has required columns.
+
+     Args:
+         df: Dataframe to validate
+
+     Returns:
+         Tuple of (is_valid, error_message)
+     """
+     missing_cols = REQUIRED_COLUMNS - set(df.columns)
+
+     if missing_cols:
+         return False, f"Missing required columns: {', '.join(missing_cols)}"
+
+     if df.empty:
+         return False, "Dataframe is empty"
+
+     return True, "Valid"
+
+
+ def load_csv_from_folder(folder_path: Optional[str] = None) -> Tuple[pd.DataFrame, str]:
+     """
+     Loads all CSV files from the specified folder and combines them.
+
+     Args:
+         folder_path: Path to folder containing CSV files. If None, uses DATA_FOLDER from config.
+
+     Returns:
+         Tuple of (combined_dataframe, status_message)
+     """
+     if folder_path is None:
+         folder_path = DATA_FOLDER
+
+     folder = Path(folder_path)
+
+     if not folder.exists():
+         return pd.DataFrame(), f"Error: Data folder '{folder_path}' does not exist."
+
+     if not folder.is_dir():
+         return pd.DataFrame(), f"Error: '{folder_path}' is not a directory."
+
+     # Find all CSV files
+     csv_files = list(folder.glob("*.csv"))
+
+     if not csv_files:
+         return pd.DataFrame(), f"Error: No CSV files found in '{folder_path}'."
+
+     all_dfs = []
+     loaded_files = []
+     errors = []
+
+     for csv_file in csv_files:
+         try:
+             # Load raw CSV (utf-8-sig strips a BOM left by spreadsheet exports)
+             df_raw = pd.read_csv(csv_file, encoding='utf-8-sig')
+
+             # Normalize column names and types
+             df_clean = normalize_dataframe(df_raw)
+
+             # Validate
+             is_valid, error_msg = validate_dataframe(df_clean)
+             if not is_valid:
+                 errors.append(f"{csv_file.name}: {error_msg}")
+                 continue
+
+             all_dfs.append(df_clean)
+             loaded_files.append(csv_file.name)
+
+         except Exception as e:
+             errors.append(f"{csv_file.name}: {str(e)}")
+
+     if not all_dfs:
+         error_summary = "\n".join(errors) if errors else "Unknown error"
+         return pd.DataFrame(), f"Error: Failed to load any valid CSV files.\n{error_summary}"
+
+     # Combine all dataframes
+     final_df = pd.concat(all_dfs, ignore_index=True)
+
+     # Build status message
+     status_parts = [f"Successfully loaded {len(final_df)} test runs from {len(loaded_files)} file(s):"]
+     status_parts.extend([f" • {fname}" for fname in loaded_files])
+
+     if errors:
+         status_parts.append(f"\n{len(errors)} file(s) skipped due to errors:")
+         status_parts.extend([f" • {err}" for err in errors])
+
+     # Add debug info if enabled
+     if DEBUG and not final_df.empty:
+         sample = final_df.iloc[0]
+         debug_info = f"\nDEBUG (Row 1): Relevance={sample.get('rmse_relevance', 'N/A')}, F1={sample.get('f1_score', 'N/A')}, AUCROC={sample.get('aucroc', 'N/A')}"
+         status_parts.append(debug_info)
+
+     return final_df, "\n".join(status_parts)
+
+
+ def get_available_datasets(df: pd.DataFrame) -> List[str]:
+     """
+     Extracts unique dataset names from the dataframe.
+
+     Args:
+         df: Dataframe containing dataset_name column
+
+     Returns:
+         List of unique dataset names
+     """
+     if df.empty or 'dataset_name' not in df.columns:
+         return []
+
+     return sorted(df['dataset_name'].unique().tolist())
requirements.txt ADDED
@@ -0,0 +1,6 @@
+ gradio>=4.44.0
+ plotly>=5.18.0
+ pandas>=2.0.0
+ fastapi
+ uvicorn
+ python-multipart
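With these dependencies in place, the loading flow implemented in `data_loader.py` (glob the data folder, read each CSV with `utf-8-sig`, combine the rows) can be sketched with the standard library alone. This is a simplified stand-in for illustration, not the module's actual pandas implementation, which also normalizes columns and validates each file:

```python
import csv
from pathlib import Path

def load_rows(folder: str) -> list:
    """Combine the rows of every *.csv under `folder` into one list of dicts
    (simplified stand-in for load_csv_from_folder)."""
    rows = []
    for csv_file in sorted(Path(folder).glob("*.csv")):
        # utf-8-sig strips the BOM that spreadsheet exports often prepend
        with open(csv_file, encoding="utf-8-sig", newline="") as f:
            rows.extend(csv.DictReader(f))
    return rows
```

Pointing this at the `data/` folder would yield one row per test run across all four domain CSVs, keyed by the original header names.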