Spaces: MCP-1st-Birthday / dataviz

Медведев Андрей Васильевич committed on Nov 25, 2025

Commit

ac776ac

0 Parent(s):

init commit

Files changed (10)

.gitignore +34 -0
LICENSE +21 -0
README.md +66 -0
agent/agent.py +172 -0
app.py +11 -0
mcp_tools/client.py +52 -0
mcp_tools/server.py +136 -0
requirements.txt +11 -0
run.bat +5 -0
ui/app.py +438 -0

.gitignore ADDED

	@@ -0,0 +1,34 @@

+# Python
+__pycache__/
+*.pyc
+*.pyo
+*.pyd
+.Python
+env/
+venv/
+.env
+.env.*
+.venv
+pip-log.txt
+pip-delete-this-directory.txt
+.tox/
+.coverage
+.coverage.*
+.cache
+nosetests.xml
+coverage.xml
+*.cover
+*.log
+.pytest_cache/
+.mypy_cache/
+# VS Code
+.vscode/
+# Project specific
+*.parquet
+*.png
+output.png
+# Logs
+dataviz_agent.log

LICENSE ADDED

	@@ -0,0 +1,21 @@

+MIT License
+Copyright (c) 2025 DataViz Agent
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.

README.md ADDED

	@@ -0,0 +1,66 @@

+# 📊 DataViz Agent — MCP-Powered Data Analyst
+**DataViz Agent** is an intelligent data analyst that turns your CSV/Excel files into beautiful charts through conversation.
+The project demonstrates the **Model Context Protocol (MCP)**: the UI talks to a separate tool-server process over a standard protocol (stdio transport), so chart code runs outside the UI process and tools can be swapped without touching the UI.
+🚀 **Demo for Hugging Face Hackathon**
+---
+## ✨ Key Features
+*   **🗣️ Chat with Data**: Just ask "Plot a histogram of age" or "Show correlation between salary and experience".
+*   **🛡️ Sandboxed Execution**: Chart generation code runs in a separate subprocess inside a temporary directory, with a timeout. This limits, but does not fully prevent, access to the host system.
+*   **🔌 MCP Architecture**: The application is split into Client (UI) and Server (Tools), communicating via the MCP standard (Stdio).
+*   **📈 Interactive Gallery**: All charts are saved, have IDs, and can be modified ("Make chart #2 green").
+*   **📦 Export**: Download charts as an archive (ZIP) or a ready-made report (Word).
+---
+## 🛠 Tech Stack
+*   **UI**: Gradio (Async)
+*   **LLM**: Gemini 2.0 Flash (via Google GenAI)
+*   **Protocol**: Model Context Protocol (MCP) Python SDK
+*   **Data**: Pandas, Matplotlib, Seaborn
+*   **Security**: `tempfile` isolation, `ast` validation, `matplotlib` Agg backend
+---
+## 🚀 How to Run
+### Locally
+1.  Clone the repository.
+2.  Create a `.env` file with your key: `GEMINI_API_KEY=your_key`
+3.  Install dependencies:
+    ```bash
+    pip install -r requirements.txt
+    ```
+4.  Run the application:
+    ```bash
+    python app.py
+    ```
+### Hugging Face Spaces
+The project is fully ready for deployment on HF Spaces (SDK: Gradio).
+Just add `GEMINI_API_KEY` to secrets (Settings -> Variables and secrets).
+---
+## 📂 Project Structure
+```text
+/
+├── app.py                 # Entry point (for HF Spaces)
+├── agent/                 # LLM Agent logic
+├── mcp_tools/
+│   ├── server.py          # MCP Server (visualization tools)
+│   └── client.py          # MCP Client (connection to server)
+├── ui/                    # Gradio Interface
+└── requirements.txt       # Dependencies
+```
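
Note on the "Sandboxed Execution" feature described in the README: a minimal sketch of the idea, assuming that isolation here means a child interpreter started in a throwaway working directory with a timeout. The helper name `run_in_temp_dir` and the returned keys are hypothetical and not part of this commit. This kind of isolation covers files and run time only; it does not restrict what the child process can reach on the host.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def run_in_temp_dir(code: str, timeout: float = 30.0) -> dict:
    """Run `code` in a child interpreter whose working directory is a throwaway folder."""
    with tempfile.TemporaryDirectory() as temp_dir:
        script = Path(temp_dir) / "script.py"
        script.write_text(code, encoding="utf-8")
        try:
            proc = subprocess.run(
                [sys.executable, str(script)],
                capture_output=True,
                text=True,
                cwd=temp_dir,
                timeout=timeout,
                stdin=subprocess.DEVNULL,
            )
        except subprocess.TimeoutExpired:
            return {"success": False, "error": f"Timed out after {timeout} seconds"}
        # List what the child wrote before the directory is deleted on exit.
        created = sorted(p.name for p in Path(temp_dir).iterdir() if p.name != "script.py")
        return {
            "success": proc.returncode == 0,
            "stdout": proc.stdout,
            "stderr": proc.stderr,
            "files": created,
        }
```

Calling `run_in_temp_dir("print('hi')")` returns a dict whose `stdout` holds the child's output. Anything the child writes with a relative path lands in the temporary folder and disappears with it.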

agent/agent.py ADDED

	@@ -0,0 +1,172 @@

+import os
+import google.generativeai as genai
+from dotenv import load_dotenv
+import re
+import logging
+# Configure logging
+logging.basicConfig(level=logging.INFO)
+logger = logging.getLogger(__name__)
+# Load environment variables
+load_dotenv()
+# Configure Gemini
+api_key = os.getenv("GEMINI_API_KEY")
+if api_key:
+    genai.configure(api_key=api_key)
+class DataVizAgent:
+    def __init__(self):
+        if not api_key:
+            raise ValueError("GEMINI_API_KEY not found. Please check your .env file.")
+        self.model = genai.GenerativeModel('gemini-2.0-flash') # Using a fast model
+    def generate_plot_code(self, user_query, columns_summary, history=None, existing_code=None):
+        """
+        Generates Python code for plotting based on user query and dataset summary.
+        Can also respond conversationally without generating code.
+        Args:
+            user_query: User's message
+            columns_summary: Dataset column information
+            history: Chat history for context (list of dicts with 'role' and 'content')
+            existing_code: Code from existing chart to modify
+        Returns:
+            dict with 'type' ('code' or 'message') and 'content'
+        """
+        summary_str = "Dataset Columns:\n"
+        for col in columns_summary.get("columns", []):
+            summary_str += f"- {col['name']} ({col['type']})"
+            if col.get('is_numeric') and col.get('min') is not None:
+                summary_str += f", range: [{col.get('min')}, {col.get('max')}]"
+            summary_str += f", unique values: {col.get('unique_values')}\n"
+        system_prompt = f"""
+You are an expert Data Visualization Assistant. You have access to a pandas DataFrame named `df`.
+{summary_str}
+YOUR CAPABILITIES:
+1. **Conversational Mode**: Answer questions, provide suggestions, explain concepts about data visualization
+2. **Code Generation Mode**: Generate Python code for creating visualizations
+WHEN TO USE EACH MODE:
+- Use CONVERSATIONAL mode when user:
+  * Asks for suggestions or advice (e.g., "What graphs can I build?", "What would you recommend?")
+  * Asks questions about the data (e.g., "What columns do I have?")
+  * Wants explanations or discussions
+  * Greets you or makes small talk
+- Use CODE GENERATION mode when user:
+  * Explicitly requests a visualization (e.g., "Create a histogram", "Show distribution", "Plot X vs Y")
+  * Asks to modify an existing chart
+  * Uses visualization-related verbs (plot, show, draw, create, build, visualize)
+CODE GENERATION RULES (only when generating code):
+1. The DataFrame `df` is ALREADY LOADED and available with the columns listed above.
+2. Import pandas as pd, matplotlib.pyplot as plt, and seaborn as sns at the start of your code.
+3. MANDATORY: Save the plot to a file named 'plot.png' in the current directory:
+   ```python
+   plt.savefig('plot.png')
+   ```
+4. Do NOT use `plt.show()`.
+5. Create clear plots with proper titles, labels, and legends.
+6. Handle potential NaN or missing values appropriately.
+7. If modifying an existing chart, I will provide the existing code - update it to match the new request.
+OUTPUT FORMAT:
+- For CONVERSATIONAL responses: Reply naturally in plain text, no code blocks
+- For CODE GENERATION: Output ONLY Python code wrapped in ```python ... ``` markdown block
+EXAMPLES:
+User: "What visualizations would you suggest for this data?"
+Assistant: "Based on your dataset, here are some visualization ideas:
+1. Distribution plots for numerical columns like [column names]
+2. Count plots for categorical data
+3. Correlation heatmaps if you want to see relationships between variables
+4. Scatter plots to explore relationships between specific pairs of columns
+What interests you most?"
+User: "Create a histogram of age"
+Assistant: ```python
+import pandas as pd
+import matplotlib.pyplot as plt
+import seaborn as sns
+plt.figure(figsize=(10, 6))
+plt.hist(df['age'].dropna(), bins=30, edgecolor='black')
+plt.title('Distribution of Age')
+plt.xlabel('Age')
+plt.ylabel('Frequency')
+plt.grid(alpha=0.3)
+plt.savefig('plot.png')
+```
+"""
+        messages = []
+        # Add chat history for context
+        if history:
+            for msg in history:
+                role = "user" if msg["role"] == "user" else "model"
+                messages.append({"role": role, "parts": [msg["content"]]})
+        # Add system prompt and current query
+        if existing_code:
+            current_message = f"{system_prompt}\n\nExisting Code:\n```python\n{existing_code}\n```\n\nUser Request: {user_query}"
+        else:
+            current_message = f"{system_prompt}\n\nUser Request: {user_query}"
+        messages.append({"role": "user", "parts": [current_message]})
+        try:
+            logger.info(f"Generating response for: {user_query}")
+            response = self.model.generate_content(messages)
+            response_text = response.text
+            # Check if response contains code
+            if "```python" in response_text or "```\n" in response_text:
+                code = self._extract_code(response_text)
+                logger.info("Generated code response")
+                return {"type": "code", "content": code}
+            else:
+                logger.info("Generated conversational response")
+                return {"type": "message", "content": response_text}
+        except Exception as e:
+            logger.error(f"Error generating response: {str(e)}")
+            return {"type": "error", "content": f"Error generating response: {str(e)}"}
+    def _extract_code(self, text):
+        """
+        Extracts python code from markdown code blocks.
+        """
+        match = re.search(r'```python\n(.*?)\n```', text, re.DOTALL)
+        if match:
+            return match.group(1)
+        # Fallback: try finding any code block
+        match = re.search(r'```\n(.*?)\n```', text, re.DOTALL)
+        if match:
+            return match.group(1)
+        return text # Return raw text if no code block found (might be an error message or direct code)
+    def describe_chart(self, user_query, code):
+        """
+        Generates a short description/title for the chart.
+        """
+        prompt = f"""
+        Based on the user query: "{user_query}" and the generated code, provide a short, descriptive title for this chart (max 10 words).
+        Code:
+        {code}
+        """
+        try:
+            response = self.model.generate_content(prompt)
+            return response.text.strip()
+        except Exception:
+            return "Chart"
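
Note on the code/message split in `generate_plot_code`: the agent treats a reply as code when it contains a fenced block and as conversation otherwise. The sketch below shows that decision in isolation with a single regular expression. The names `FENCE_RE` and `classify_response` are hypothetical, and the pattern is slightly more tolerant than the one in this commit (optional language tag, case-insensitive, no newline required before the closing fence).

```python
import re

# Matches ```python ... ``` or a bare ``` ... ``` block; the language tag is optional.
FENCE_RE = re.compile(r"```(?:python|py)?[ \t]*\n(.*?)```", re.DOTALL | re.IGNORECASE)


def classify_response(text: str) -> dict:
    """Return {'type': 'code' | 'message', 'content': ...} for a raw model reply."""
    match = FENCE_RE.search(text)
    if match:
        # Only the body of the first fenced block is kept.
        return {"type": "code", "content": match.group(1).strip()}
    return {"type": "message", "content": text.strip()}
```

Doing detection and extraction with the same pattern means a reply cannot be classified as code and then fail extraction.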

app.py ADDED

	@@ -0,0 +1,11 @@

+import os
+import sys
+# Add current directory to path so we can import from agent, mcp_tools, ui
+sys.path.append(os.path.dirname(os.path.abspath(__file__)))
+# Import the demo object from ui/app.py
+from ui.app import demo
+if __name__ == "__main__":
+    demo.launch()

mcp_tools/client.py ADDED

	@@ -0,0 +1,52 @@

+import asyncio
+import os
+import sys
+import json
+from contextlib import asynccontextmanager
+from mcp import ClientSession, StdioServerParameters
+from mcp.client.stdio import stdio_client
+class DataVizClient:
+    def __init__(self):
+        # Determine the path to the server script
+        current_dir = os.path.dirname(os.path.abspath(__file__))
+        self.server_script = os.path.join(current_dir, "server.py")
+        # Server launch parameters (python mcp_tools/server.py)
+        self.server_params = StdioServerParameters(
+            command=sys.executable, # Use the same python as the main app
+            args=[self.server_script],
+            env=None
+        )
+    @asynccontextmanager
+    async def connect(self):
+        # Start server and connect to it
+        async with stdio_client(self.server_params) as (read, write):
+            async with ClientSession(read, write) as session:
+                yield session
+    async def generate_plot(self, code: str, data_path: str = None):
+        """
+        Calls the 'run_plot_code' tool via MCP protocol
+        """
+        async with self.connect() as session:
+            # Initialization (handshake)
+            await session.initialize()
+            # Call tool
+            result = await session.call_tool(
+                "run_plot_code",
+                arguments={"code": code, "data_path": data_path}
+            )
+            # Parse result
+            if not result.content:
+                return {"success": False, "error": "Empty response from MCP server"}
+            # FastMCP returns JSON string inside TextContent
+            try:
+                text_content = result.content[0].text
+                return json.loads(text_content)
+            except Exception as e:
+                return {"success": False, "error": f"Failed to parse MCP response: {e}", "raw": str(result.content)}
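
Note on result parsing in `generate_plot`: the client expects the first content item of the tool result to carry a JSON string. The sketch below isolates that parsing step and adds a shape check, so callers can rely on a `success` key being present. It works on a plain string and therefore runs without the MCP SDK. The function name `parse_tool_payload` and the "Unexpected response shape" message are hypothetical.

```python
import json
from typing import Any, Optional


def parse_tool_payload(text: Optional[str]) -> dict[str, Any]:
    """Turn the text of a tool result into a dict that always has a 'success' key."""
    if not text:
        return {"success": False, "error": "Empty response from MCP server"}
    try:
        payload = json.loads(text)
    except json.JSONDecodeError as exc:
        return {"success": False, "error": f"Failed to parse MCP response: {exc}", "raw": text}
    # Valid JSON that is not the expected object is reported rather than passed through.
    if not isinstance(payload, dict) or "success" not in payload:
        return {"success": False, "error": "Unexpected response shape", "raw": text}
    return payload
```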

mcp_tools/server.py ADDED

	@@ -0,0 +1,136 @@

+from mcp.server.fastmcp import FastMCP
+import subprocess
+import os
+import tempfile
+import base64
+import sys
+import ast
+import logging
+# Configure logging to file to avoid interfering with Stdio
+logging.basicConfig(
+    level=logging.INFO,
+    filename='mcp_server.log',
+    filemode='a',
+    format='%(asctime)s - %(name)s - %(levelname)s - %(message)s'
+)
+logger = logging.getLogger(__name__)
+# Initialize FastMCP server
+mcp = FastMCP("DataViz Tools")
+# Whitelist of allowed imports for security
+ALLOWED_IMPORTS = {
+    'pandas', 'pd',
+    'matplotlib', 'pyplot', 'plt',
+    'seaborn', 'sns',
+    'numpy', 'np',
+    'warnings',
+    'math',
+    'datetime',
+    'collections',
+}
+def validate_code_safety(code: str) -> tuple[bool, str]:
+    """
+    Validates Python code for security risks.
+    Returns (is_safe, error_message)
+    """
+    try:
+        tree = ast.parse(code)
+    except SyntaxError as e:
+        return False, f"Syntax error: {str(e)}"
+    for node in ast.walk(tree):
+        # Check imports
+        if isinstance(node, ast.Import):
+            for alias in node.names:
+                module_name = alias.name.split('.')[0]
+                if module_name not in ALLOWED_IMPORTS:
+                    return False, f"Import '{alias.name}' is not allowed for security reasons"
+        elif isinstance(node, ast.ImportFrom):
+            if node.module:
+                module_name = node.module.split('.')[0]
+                if module_name not in ALLOWED_IMPORTS:
+                    return False, f"Import from '{node.module}' is not allowed for security reasons"
+        # Check for dangerous functions
+        elif isinstance(node, ast.Call):
+            if isinstance(node.func, ast.Name):
+                # Block dangerous built-in functions
+                dangerous_funcs = {'eval', 'exec', 'compile', '__import__', 'open'}
+                if node.func.id in dangerous_funcs:
+                    return False, f"Function '{node.func.id}' is not allowed for security reasons"
+    return True, ""
+@mcp.tool()
+def run_plot_code(code: str, data_path: str = None) -> dict:
+    """
+    Executes Python code to generate a plot.
+    The code should use matplotlib/seaborn and save the figure to 'plot.png'.
+    Args:
+        code: The Python code to execute.
+        data_path: Optional path to a dataset file (csv, xlsx, parquet) to load as 'df' before execution.
+    Returns:
+        A dictionary containing success status, base64 encoded image, stdout, and stderr.
+    """
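+    # Reject code that fails the static safety check before writing anything to disk
+    is_safe, safety_error = validate_code_safety(code)
+    if not is_safe:
+        logger.warning(f"Rejected unsafe code: {safety_error}")
+        return {"success": False, "error": safety_error}
+    # Create a temporary directory for execution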
+    with tempfile.TemporaryDirectory() as temp_dir:
+        script_path = os.path.join(temp_dir, 'script.py')
+        plot_path = os.path.join(temp_dir, 'plot.png')
+        # Prepare the script content
+        script_content = "import matplotlib\nmatplotlib.use('Agg')\nimport matplotlib.pyplot as plt\n"
+        if data_path:
+            # Inject data loading code
+            # Embed the path via repr() so quotes or backslashes cannot break out of the string literal
+            path_literal = repr(data_path.replace('\\', '/'))
+            if data_path.endswith('.csv'):
+                script_content += f"import pandas as pd\ndf = pd.read_csv({path_literal})\n"
+            elif data_path.endswith('.xlsx'):
+                script_content += f"import pandas as pd\ndf = pd.read_excel({path_literal})\n"
+            elif data_path.endswith('.parquet'):
+                script_content += f"import pandas as pd\ndf = pd.read_parquet({path_literal})\n"
+        script_content += code
+        # Write the script
+        with open(script_path, 'w', encoding='utf-8') as f:
+            f.write(script_content)
+        try:
+            # Run the script in the temporary directory
+            result = subprocess.run(
+                [sys.executable, script_path],
+                capture_output=True,
+                text=True,
+                cwd=temp_dir,
+                timeout=120,
+                stdin=subprocess.DEVNULL
+            )
+            if result.returncode != 0:
+                return {"success": False, "error": result.stderr, "logs": result.stdout}
+            # Check if plot was created
+            if os.path.exists(plot_path):
+                with open(plot_path, "rb") as img_file:
+                    b64_img = base64.b64encode(img_file.read()).decode('utf-8')
+                return {"success": True, "image": b64_img, "logs": result.stdout}
+            else:
+                return {"success": False, "error": "Plot file 'plot.png' was not created.", "logs": result.stdout}
+        except Exception as e:
+            return {"success": False, "error": str(e)}
+if __name__ == "__main__":
+    try:
+        mcp.run()
+    except Exception as e:
+        logger.critical(f"Server failed to start: {e}", exc_info=True)
+        sys.exit(1)
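
Note on `validate_code_safety`: the validator walks the AST and rejects imports outside an allowlist plus direct calls to a few builtins. The sketch below is a self-contained variant for experimenting with that approach. The names `ALLOWED_MODULES`, `BLOCKED_CALLS` and `check_code` are hypothetical, and the dunder-attribute check is an extra assumption that is not in this commit. Static checks of this kind reduce risk but are not a complete sandbox.

```python
import ast

ALLOWED_MODULES = {"pandas", "matplotlib", "seaborn", "numpy",
                   "math", "datetime", "collections", "warnings"}
BLOCKED_CALLS = {"eval", "exec", "compile", "__import__", "open"}


def check_code(code: str) -> tuple[bool, str]:
    """Return (is_safe, reason); reason is empty when the code passes."""
    try:
        tree = ast.parse(code)
    except SyntaxError as exc:
        return False, f"Syntax error: {exc}"
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom):
            # Relative imports (no module name or level > 0) are rejected outright.
            names = [node.module] if node.module and node.level == 0 else [""]
        else:
            names = []
        for name in names:
            if name.split(".")[0] not in ALLOWED_MODULES:
                return False, f"Import '{name}' is not allowed"
        if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
            if node.func.id in BLOCKED_CALLS:
                return False, f"Function '{node.func.id}' is not allowed"
        # Extra check (assumption): dunder attribute access is a common escape route.
        if isinstance(node, ast.Attribute) and node.attr.startswith("__"):
            return False, f"Attribute '{node.attr}' is not allowed"
    return True, ""
```

The allowlist holds top-level package names only, because `import matplotlib.pyplot as plt` is checked by its first component and aliases such as `plt` never appear as module names.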

requirements.txt ADDED

	@@ -0,0 +1,11 @@

+gradio
+mcp
+pandas
+matplotlib
+seaborn
+google-generativeai
+python-dotenv
+openpyxl
+uvicorn
+pyarrow
+python-docx

run.bat ADDED

	@@ -0,0 +1,5 @@

+@echo off
+call env\Scripts\activate
+set PYTHONPATH=%CD%
+python -m app
+pause

ui/app.py ADDED

	@@ -0,0 +1,438 @@

+import gradio as gr
+import pandas as pd
+import os
+import tempfile
+import re
+import base64
+import io
+import zipfile
+import logging
+import asyncio
+from PIL import Image
+from docx import Document
+from docx.shared import Inches
+from agent.agent import DataVizAgent
+from mcp_tools.client import DataVizClient
+# Configure logging
+logging.basicConfig(
+    level=logging.INFO,
+    format='%(asctime)s - %(name)s - %(levelname)s - %(message)s',
+    handlers=[
+        logging.FileHandler('dataviz_agent.log'),
+        logging.StreamHandler()
+    ]
+)
+logger = logging.getLogger(__name__)
+# Initialize Agent
+agent = DataVizAgent()
+# Initialize MCP Client
+mcp_client = DataVizClient()
+def b64_to_pil(b64_str):
+    return Image.open(io.BytesIO(base64.b64decode(b64_str)))
+def analyze_dataset(file_path):
+    """
+    Analyzes the dataset and returns a summary and the dataframe.
+    """
+    if file_path is None:
+        return None, "No file uploaded."
+    try:
+        if file_path.endswith('.csv'):
+            df = pd.read_csv(file_path)
+        elif file_path.endswith('.xlsx'):
+            df = pd.read_excel(file_path)
+        else:
+            return None, "Unsupported file format. Please upload CSV or Excel."
+        # Validate dataset
+        if df.empty:
+            return None, "Error: The uploaded file is empty."
+        if len(df.columns) == 0:
+            return None, "Error: No columns found in the dataset."
+        if len(df) > 1000000:
+            return None, "Error: Dataset is too large (>1M rows). Please use a smaller file."
+    except Exception as e:
+        return None, f"Error loading file: {str(e)}"
+    summary = {
+        "columns": [],
+        "row_count": len(df)
+    }
+    for col in df.columns:
+        col_info = {
+            "name": col,
+            "type": str(df[col].dtype),
+            "unique_values": int(df[col].nunique()),
+            # Cast to a plain int: .sum() returns a NumPy integer
+            "missing_values": int(df[col].isnull().sum())
+        }
+        if pd.api.types.is_numeric_dtype(df[col]):
+            try:
+                min_val = df[col].min()
+                max_val = df[col].max()
+                col_info["min"] = float(min_val) if pd.notna(min_val) else None
+                col_info["max"] = float(max_val) if pd.notna(max_val) else None
+            except (ValueError, TypeError):
+                col_info["min"] = None
+                col_info["max"] = None
+            col_info["is_numeric"] = True
+        else:
+            col_info["is_numeric"] = False
+        summary["columns"].append(col_info)
+    return df, summary
+def process_upload(file):
+    logger.info(f"Processing file upload: {file.name}")
+    df, summary = analyze_dataset(file.name)
+    if df is None:
+        # On failure, analyze_dataset returns the error message in place of the summary
+        logger.error(f"Failed to load file: {file.name} ({summary})")
+        return None, {}, summary, None
+    # Save dataframe to a temporary parquet file for the MCP tool
+    fd, path = tempfile.mkstemp(suffix='.parquet')
+    os.close(fd)
+    df.to_parquet(path)
+    logger.info(f"Dataset saved to temp file: {path}")
+    # Create a readable summary string
+    summary_str = f"Dataset Loaded: {len(df)} rows, {len(df.columns)} columns.\n\nColumns:\n"
+    for col in summary["columns"]:
+        summary_str += f"- {col['name']} ({col['type']}): {col['unique_values']} unique"
+        if col['is_numeric'] and col.get('min') is not None and col.get('max') is not None:
+            summary_str += f", range: [{col['min']:.2f}, {col['max']:.2f}]"
+        summary_str += "\n"
+    return df, summary, summary_str, path
+async def respond(message, chat_history, state):
+    logger.info(f"User message: {message}")
+    if state["dataframe"] is None:
+        logger.warning("User attempted to chat without uploading dataset")
+        chat_history.append({"role": "user", "content": message})
+        chat_history.append({"role": "assistant", "content": "Please upload a dataset first."})
+        return "", chat_history, gr.update(), state, gr.update(choices=[])
+    # Check for chart modification request
+    chart_id_match = re.search(r'#(\d+)', message)
+    existing_code = None
+    target_chart_id = None
+    if chart_id_match:
+        chart_id = int(chart_id_match.group(1))
+        if chart_id in state["charts"]:
+            existing_code = state["charts"][chart_id]["code"]
+            target_chart_id = chart_id
+            logger.info(f"Modifying chart #{chart_id}")
+        else:
+            chat_history.append({"role": "user", "content": message})
+            chat_history.append({"role": "assistant", "content": f"Chart #{chart_id} not found."})
+            return "", chat_history, _get_gallery_items(state), state, _get_chart_choices(state)
+    # Generate response using Agent (with chat history)
+    response = agent.generate_plot_code(
+        message,
+        state["columns_summary"],
+        history=chat_history,
+        existing_code=existing_code
+    )
+    chat_history.append({"role": "user", "content": message})
+    # Check response type
+    if response["type"] == "error":
+        logger.error(f"Agent error: {response['content']}")
+        chat_history.append({"role": "assistant", "content": f"Error: {response['content']}"})
+        return "", chat_history, _get_gallery_items(state), state, _get_chart_choices(state)
+    elif response["type"] == "message":
+        # Conversational response - no code to execute
+        logger.info("Agent provided conversational response")
+        chat_history.append({"role": "assistant", "content": response["content"]})
+        return "", chat_history, _get_gallery_items(state), state, _get_chart_choices(state)
+    elif response["type"] == "code":
+        # Code generation - execute it
+        code = response["content"]
+        logger.info("Executing generated code")
+        # Execute code using MCP Tool
+        result = await mcp_client.generate_plot(code, state["data_path"])
+        gallery_update = _get_gallery_items(state)
+        if result["success"]:
+            # Determine Chart ID
+            if target_chart_id is not None:
+                cid = target_chart_id
+                action = "Updated"
+            else:
+                cid = state["next_chart_id"]
+                state["next_chart_id"] += 1
+                action = "Created"
+            # Generate description (blocking LLM call, so run it off the event loop)
+            description = await asyncio.to_thread(agent.describe_chart, message, code)
+            # Update State
+            state["charts"][cid] = {
+                "code": code,
+                "image": result["image"],
+                "description": description
+            }
+            response_text = f"{action} chart #{cid}: {description}"
+            chat_history.append({"role": "assistant", "content": response_text})
+            logger.info(f"{action} chart #{cid}")
+            gallery_update = _get_gallery_items(state, selected_cid=cid)
+        else:
+            error_details = result.get('stderr', result.get('error', 'Unknown error occurred'))
+            error_msg = f"Failed to generate chart.\nError: {error_details}\n\nCode:\n```python\n{code}\n```"
+            chat_history.append({"role": "assistant", "content": error_msg})
+            logger.error(f"Chart generation failed: {error_details}")
+        return "", chat_history, gallery_update, state, _get_chart_choices(state)
+    # Fallback
+    return "", chat_history, _get_gallery_items(state), state, _get_chart_choices(state)
+def _get_gallery_items(state, selected_cid=None):
+    items = []
+    selected_index = None
+    current_idx = 0
+    # Sort by ID
+    for cid in sorted(state["charts"].keys()):
+        chart = state["charts"][cid]
+        if chart["image"]:
+            img = b64_to_pil(chart["image"])
+            items.append((img, f"#{cid} {chart['description']}"))
+            if selected_cid is not None and cid == selected_cid:
+                selected_index = current_idx
+            current_idx += 1
+    if selected_cid is not None:
+        return gr.update(value=items, selected_index=selected_index)
+    return items
+def _get_chart_choices(state):
+    return gr.update(choices=[f"#{cid}" for cid in sorted(state["charts"].keys())])
+def delete_chart(chart_str, chat_history, state):
+    if not chart_str:
+        return chat_history, _get_gallery_items(state), state, _get_chart_choices(state)
+    try:
+        cid = int(chart_str.replace("#", ""))
+        if cid in state["charts"]:
+            del state["charts"][cid]
+            chat_history.append({"role": "assistant", "content": f"🗑️ Chart #{cid} has been deleted."})
+    except ValueError:
+        logger.warning(f"Ignoring malformed chart selection: {chart_str!r}")
+    return chat_history, _get_gallery_items(state), state, _get_chart_choices(state)
+def download_zip(state):
+    if not state["charts"]:
+        return None
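+    # mkstemp creates the file atomically; tempfile.mktemp is deprecated and race-prone
+    fd, zip_filename = tempfile.mkstemp(suffix=".zip")
+    os.close(fd)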
+    with zipfile.ZipFile(zip_filename, 'w') as zipf:
+        for cid, chart in state["charts"].items():
+            if chart["image"]:
+                img_data = base64.b64decode(chart["image"])
+                zipf.writestr(f"chart_{cid}.png", img_data)
+    return zip_filename
+def download_report(state):
+    if not state["charts"]:
+        return None
+    doc = Document()
+    doc.add_heading('DataViz Agent Report', 0)
+    for cid in sorted(state["charts"].keys()):
+        chart = state["charts"][cid]
+        if chart["image"]:
+            doc.add_heading(f"Chart #{cid}: {chart['description']}", level=1)
+            # Save temp image for docx
+            with tempfile.NamedTemporaryFile(suffix=".png", delete=False) as tmp_img:
+                tmp_img.write(base64.b64decode(chart["image"]))
+                tmp_img_path = tmp_img.name
+            try:
+                doc.add_picture(tmp_img_path, width=Inches(6))
+            finally:
+                os.remove(tmp_img_path)
+            doc.add_paragraph(f"Code:\n{chart['code']}")
+            doc.add_page_break()
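+    # mkstemp creates the file atomically; tempfile.mktemp is deprecated and race-prone
+    fd, doc_filename = tempfile.mkstemp(suffix=".docx")
+    os.close(fd)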
+    doc.save(doc_filename)
+    return doc_filename
+def global_clear():
+    logger.info("Global clear initiated")
+    new_state = {
+        "dataframe": None,
+        "columns_summary": {},
+        "charts": {},
+        "next_chart_id": 1,
+        "data_path": None
+    }
+    return (
+        None, # File
+        "Upload a dataset to get started.", # Info
+        [], # Chat
+        [], # Gallery
+        new_state, # State
+        gr.update(choices=[]), # Dropdown
+        None # Download File
+    )
+with gr.Blocks(title="DataViz Agent", theme=gr.themes.Soft(), fill_height=True) as demo:
+    state = gr.State({
+        "dataframe": None,
+        "columns_summary": {},
+        "charts": {},
+        "next_chart_id": 1,
+        "data_path": None
+    })
+    with gr.Row():
+        gr.Markdown("## 🤖 DataViz Agent Chat")
+        gr.Markdown("## 📊 Charts Gallery")
+    with gr.Row():
+        with gr.Column(scale=3):
+            with gr.Row():
+                with gr.Group():
+                    file_upload = gr.File(label="Upload Dataset (CSV/XLSX)", file_types=[".csv", ".xlsx"])
+                    with gr.Accordion("Dataset Info", open=False):
+                        dataset_info = gr.Markdown("Upload a dataset to get started.")
+            with gr.Row(scale=1, height=700):
+                chatbot = gr.Chatbot(type="messages", height=700)
+            with gr.Row(height=50, equal_height=True):
+                msg = gr.Textbox(
+                    placeholder="Ask to visualize data (e.g., 'Show distribution of age')",
+                    show_label=False,
+                    elem_id="chat-input",
+                    lines=1,
+                    max_lines=1,
+                    scale=1
+                )
+                send_btn = gr.Button("Send", variant="primary", scale=0)
+        with gr.Column(scale=2):
+            with gr.Row(height=626):
+                gallery = gr.Gallery(label="Generated Charts", columns=1, object_fit="contain", height=626)
+            with gr.Row():
+                with gr.Group():
+                    gr.Markdown("### Manage Charts")
+                    with gr.Row():
+                        chart_selector = gr.Dropdown(label="Select Chart to Delete", choices=[])
+                        delete_btn = gr.Button("🗑️ Delete Chart", variant="stop")
+                    with gr.Row():
+                        dl_zip_btn = gr.Button("💾 Download All (ZIP)")
+                        dl_report_btn = gr.Button("📄 Download Report (Word)")
+                    with gr.Row(height=80):
+                        dl_file = gr.File(label="Download", visible=True)
+    # Global Clear (Bottom)
+    with gr.Row():
+        global_clear_btn = gr.Button("Global Clear (Reset All)", variant="stop")
+    # Event Handlers
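+    def on_file_upload(file, current_state):
+        if file is None:
+            # Upload was cleared: drop the stale dataset, whose temp file the wrapper already removed
+            current_state["dataframe"] = None
+            current_state["columns_summary"] = {}
+            current_state["data_path"] = None
+            return current_state, "Upload a dataset to get started."
+        df, summary, summary_str, path = process_upload(file)
+        # On failure df and path are None, which also clears the previous dataset
+        current_state["dataframe"] = df
+        current_state["columns_summary"] = summary
+        current_state["data_path"] = path
+        return current_state, summary_str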
+    def on_file_upload_wrapper(file, current_state):
+        # Clean up old temporary file if exists
+        if current_state.get("data_path") and os.path.exists(current_state["data_path"]):
+            try:
+                os.remove(current_state["data_path"])
+                logger.info(f"Cleaned up old temp file: {current_state['data_path']}")
+            except Exception as e:
+                logger.warning(f"Failed to remove temp file: {e}")
+        return on_file_upload(file, current_state)
+    file_upload.change(
+        on_file_upload_wrapper,
+        inputs=[file_upload, state],
+        outputs=[state, dataset_info]
+    )
+    # Chat interactions
+    msg.submit(
+        respond,
+        inputs=[msg, chatbot, state],
+        outputs=[msg, chatbot, gallery, state, chart_selector]
+    ).then(
+        None, None, None,
+        js="() => { setTimeout(() => { const el = document.getElementById('chat-input'); if (el) { const input = el.querySelector('textarea') || el.querySelector('input'); if (input) input.focus(); } }, 200); }"
+    )
+    send_btn.click(
+        respond,
+        inputs=[msg, chatbot, state],
+        outputs=[msg, chatbot, gallery, state, chart_selector]
+    ).then(
+        None, None, None,
+        js="() => { setTimeout(() => { const el = document.getElementById('chat-input'); if (el) { const input = el.querySelector('textarea') || el.querySelector('input'); if (input) input.focus(); } }, 200); }"
+    )
+    # Chart Management
+    delete_btn.click(
+        delete_chart,
+        inputs=[chart_selector, chatbot, state],
+        outputs=[chatbot, gallery, state, chart_selector]
+    )
+    dl_zip_btn.click(
+        download_zip,
+        inputs=[state],
+        outputs=[dl_file]
+    )
+    dl_report_btn.click(
+        download_report,
+        inputs=[state],
+        outputs=[dl_file]
+    )
+    global_clear_btn.click(
+        global_clear,
+        inputs=[],
+        outputs=[file_upload, dataset_info, chatbot, gallery, state, chart_selector, dl_file]
+    )
+if __name__ == "__main__":
+    demo.launch()
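
Note on the ZIP export in `download_zip`: the archive is built from the base64-encoded images kept in the chart state. The sketch below shows the same packing step done in memory, which makes it easy to check without touching the filesystem. The function name `charts_to_zip_bytes` is hypothetical. The `charts` argument is assumed to have the same shape as `state["charts"]` in this commit (chart ID mapped to a dict with an `image` key).

```python
import base64
import io
import zipfile


def charts_to_zip_bytes(charts: dict) -> bytes:
    """Pack every chart that has an image into a ZIP archive held in memory."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for cid in sorted(charts):
            image_b64 = charts[cid].get("image")
            if image_b64:
                # Entries are named after the chart ID, as in the UI's export.
                archive.writestr(f"chart_{cid}.png", base64.b64decode(image_b64))
    return buffer.getvalue()
```

Gradio's file output expects a path, so the returned bytes would still have to be written to a temporary file before being handed to the UI.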