Spaces: duqing26/smart-data-refinery


3v324v23 committed on Feb 1

Commit e15a3ce · 0 parent(s)

Initial commit with robust upload and demo data

Files changed (7)

Dockerfile +15 -0
README.md +62 -0
__pycache__/app.cpython-314.pyc +0 -0
app.py +385 -0
requirements.txt +4 -0
templates/index.html +431 -0
test.csv +6 -0

Dockerfile ADDED

	@@ -0,0 +1,15 @@

+FROM python:3.9-slim
+WORKDIR /app
+COPY requirements.txt .
+RUN pip install --no-cache-dir -r requirements.txt
+COPY . .
+# Create upload directory
+RUN mkdir -p /tmp/uploads
+EXPOSE 7860
+CMD ["python", "app.py"]

README.md ADDED

	@@ -0,0 +1,62 @@

+---
+title: Smart Data Refinery
+emoji: 🛢️
+colorFrom: blue
+colorTo: purple
+sdk: docker
+pinned: false
+short_description: One-stop CSV/JSON data cleaning and conversion tool with a visual pipeline.
+---
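+# Smart Data Refinery (智能数据炼油厂)
+## Overview
+**Smart Data Refinery** is a modern data cleaning and transformation tool (ETL Lite) designed for non-technical users and data analysts. Through a web interface, users can upload CSV, JSON, or Excel files, build a data processing pipeline, preview the cleaned result in real time, and export the clean data.
+The project targets the "dirty data" problems that companies and individuals run into in day-to-day work, and offers advanced data processing without writing code.
+## Core Features
+1.  **Multiple formats**: import and export of CSV, JSON, and Excel files.
+2.  **Visual pipeline**:
+    *   **Filter**: filter rows by condition (>, <, ==, contains, and so on).
+    *   **Dedupe**: remove duplicate rows, optionally based on specific columns.
+    *   **Fill NA**: fill missing values with a given value, or use forward/backward fill.
+    *   **Sort**: multi-column sorting.
+    *   **Column operations**: rename columns or select specific ones.
+3.  **Live preview**: see the effect of each step immediately (first 50 rows).
+4.  **Privacy**: all processing happens inside the container, with no dependency on external APIs.
+5.  **Performance**: built on the Pandas engine; can process million-row datasets, subject to available memory.
+## Business Value
+*   **Productivity tool**: replaces tedious manual work in Excel and automates repetitive data cleaning tasks.
+*   **Data assets**: could later be extended with saved "cleaning recipes" to standardize data processing.
+*   **Use cases**: e-commerce order cleanup, marketing list filtering, log analysis preprocessing.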
+## Quick Start
+### Docker Deployment (Recommended)
+```bash
+# Build the image
+docker build -t smart-data-refinery .
+# Run the container
+docker run -p 7860:7860 smart-data-refinery
+```
+Then open `http://localhost:7860`.
+### Local Development
+```bash
+pip install -r requirements.txt
+python app.py
+```
+## Tech Stack
+*   **Backend**: Flask, Pandas, OpenPyxl
+*   **Frontend**: Vue 3, Tailwind CSS (Dark Mode)
+*   **Deployment**: Docker
+## License
+MIT License
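
The pipeline features listed in the README correspond to the `operations` list that `/api/process` and `/api/export` read from the JSON request body in `app.py`. A minimal sketch of such a payload follows. The operation types and parameter names mirror the branches in `apply_operations()`; the filename and the `Value` and `Status` columns come from the demo data and are illustrative only.

```python
import json

# Each step is {"type": ..., "params": {...}}; steps run in list order.
# The filename and column names are assumptions based on the demo dataset.
payload = {
    "filename": "demo_data.csv",
    "operations": [
        {"type": "fillna", "params": {"subset": "Value", "value": 0}},
        {"type": "filter", "params": {"column": "Status", "operator": "==", "value": "Active"}},
        {"type": "drop_duplicates", "params": {}},
        {"type": "sort", "params": {"column": "Value", "ascending": False}},
    ],
}

# Serialized request body, as a client would POST it to /api/process.
body = json.dumps(payload)
print(body)
```

Because the server stores no per-user pipeline state, the client resends the full list on every preview and export request.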

__pycache__/app.cpython-314.pyc ADDED

Binary file (18.2 kB).

app.py ADDED

	@@ -0,0 +1,385 @@

+import os
+import io
+import json
+import logging
+import pandas as pd
+from flask import Flask, render_template, request, jsonify, send_file, session
+from werkzeug.utils import secure_filename
+# Configure logging
+logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(name)s - %(levelname)s - %(message)s')
+logger = logging.getLogger(__name__)
+app = Flask(__name__)
+app.secret_key = os.urandom(24)
+app.config['MAX_CONTENT_LENGTH'] = 50 * 1024 * 1024  # 50MB limit
+app.config['UPLOAD_FOLDER'] = '/tmp/uploads'
+# Ensure upload directory exists
+os.makedirs(app.config['UPLOAD_FOLDER'], exist_ok=True)
+ALLOWED_EXTENSIONS = {'csv', 'json', 'xlsx'}
+def allowed_file(filename):
+    return '.' in filename and filename.rsplit('.', 1)[1].lower() in ALLOWED_EXTENSIONS
+def check_robustness(file_stream):
+    """Sample the first 4 KB of the stream and flag null bytes.
+    Returns (ok, message). Null bytes yield a warning message rather than a
+    failure: the extension is not known here, and XLSX files are zip archives
+    that legitimately contain them, so the caller decides how to react.
+    """
+    try:
+        chunk = file_stream.read(4096)
+        file_stream.seek(0)  # rewind so the caller can still save or parse the stream
+        if b'\0' in chunk:
+            return True, "Binary content detected (warning)"
+        return True, ""
+    except Exception as e:
+        return False, f"Error checking file robustness: {str(e)}"
+def load_df(filepath, ext):
+    if ext == 'csv':
+        return pd.read_csv(filepath)
+    elif ext == 'json':
+        return pd.read_json(filepath)
+    elif ext == 'xlsx':
+        return pd.read_excel(filepath)
+    return None
+def df_to_json_preview(df, rows=50):
+    """Convert first N rows of DF to JSON for preview."""
+    preview = df.head(rows).fillna("").to_dict(orient='records')
+    columns = list(df.columns)
+    stats = {
+        "rows": len(df),
+        "columns": len(columns),
+        "missing_values": int(df.isnull().sum().sum()),
+        "duplicates": int(df.duplicated().sum())
+    }
+    return {"data": preview, "columns": columns, "stats": stats}
+@app.route('/')
+def index():
+    return render_template('index.html')
+@app.route('/health')
+def health():
+    return jsonify({"status": "healthy"}), 200
+@app.route('/api/load_demo', methods=['POST'])
+def load_demo():
+    try:
+        # Create a simple demo dataframe
+        data = {
+            "Date": pd.date_range(start='2024-01-01', periods=100),
+            "Category": ['A', 'B', 'C', 'A', 'B'] * 20,
+            "Value": pd.Series(range(100)) + pd.Series([1, 2, 5] * 33 + [1]),
+            "Status": ['Active', 'Inactive', 'Pending', 'Active'] * 25
+        }
+        df = pd.DataFrame(data)
+        # Add some random missing values
+        import numpy as np
+        df.loc[5:10, 'Value'] = np.nan
+        df.loc[15:20, 'Status'] = np.nan
+        filename = "demo_data.csv"
+        filepath = os.path.join(app.config['UPLOAD_FOLDER'], filename)
+        df.to_csv(filepath, index=False)
+        return jsonify({
+            "message": "Demo data loaded successfully",
+            "filename": filename,
+            "preview": df_to_json_preview(df)
+        })
+    except Exception as e:
+        logger.error(f"Demo load error: {e}")
+        return jsonify({"error": str(e)}), 500
+@app.route('/api/upload', methods=['POST'])
+def upload_file():
+    try:
+        if 'file' not in request.files:
+            return jsonify({"error": "No file part"}), 400
+        file = request.files['file']
+        if file.filename == '':
+            return jsonify({"error": "No selected file"}), 400
+        if not allowed_file(file.filename):
+            return jsonify({"error": "File type not allowed. Use CSV, JSON, or XLSX."}), 400
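+        # Take the extension from the original name: secure_filename() drops
+        # non-ASCII characters, so a name such as "数据.csv" sanitizes to "csv".
+        ext = file.filename.rsplit('.', 1)[1].lower()
+        filename = secure_filename(file.filename)
+        if not allowed_file(filename):
+            filename = f"upload.{ext}"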
+        # Robustness check: CSV and JSON are text formats, so a null byte in the
+        # first 4 KB suggests a binary file. XLSX is a zip archive and is skipped.
+        if ext in ['csv', 'json']:
+            chunk = file.stream.read(4096)
+            file.stream.seek(0)
+            if b'\0' in chunk:
+                return jsonify({"error": "File contains null bytes (binary suspected). Please upload a valid text file for CSV/JSON."}), 400
+        filepath = os.path.join(app.config['UPLOAD_FOLDER'], filename)
+        file.save(filepath)
+        # Load and Preview
+        try:
+            df = load_df(filepath, ext)
+        except Exception as e:
+            return jsonify({"error": f"Failed to parse file: {str(e)}"}), 400
+        # No session state is kept: the client receives the stored filename
+        # and sends it back with every /api/process and /api/export request.
+        return jsonify({
+            "message": "File uploaded successfully",
+            "filename": filename,
+            "preview": df_to_json_preview(df)
+        })
+    except Exception as e:
+        logger.error(f"Upload error: {e}")
+        return jsonify({"error": str(e)}), 500
+@app.route('/api/process', methods=['POST'])
+def process_data():
+    try:
+        data = request.json
+        filename = data.get('filename')
+        operations = data.get('operations', [])
+        if not filename:
+            return jsonify({"error": "Filename missing"}), 400
+        filepath = os.path.join(app.config['UPLOAD_FOLDER'], secure_filename(filename))
+        if not os.path.exists(filepath):
+            return jsonify({"error": "File not found. Please upload again."}), 404
+        ext = filename.rsplit('.', 1)[1].lower()
+        df = load_df(filepath, ext)
+        # Apply the operations pipeline. This uses the same helper as /api/export
+        # so that the preview and the exported file cannot diverge.
+        df = apply_operations(df, operations)
+        return jsonify({
+            "message": "Processed successfully",
+            "preview": df_to_json_preview(df)
+        })
+    except Exception as e:
+        logger.error(f"Processing error: {e}")
+        return jsonify({"error": str(e)}), 500
+@app.route('/api/export', methods=['POST'])
+def export_data():
+    try:
+        data = request.json
+        filename = data.get('filename')
+        operations = data.get('operations', [])
+        format_type = data.get('format', 'csv')
+        if not filename:
+            return jsonify({"error": "Filename missing"}), 400
+        safe_name = secure_filename(filename)
+        filepath = os.path.join(app.config['UPLOAD_FOLDER'], safe_name)
+        if not os.path.exists(filepath):
+            return jsonify({"error": "File not found. Please upload again."}), 404
+        ext = safe_name.rsplit('.', 1)[1].lower()
+        df = load_df(filepath, ext)
+        # Re-apply the pipeline from the request (the server keeps no state)
+        df = apply_operations(df, operations)
+        output = io.BytesIO()
+        if format_type == 'csv':
+            df.to_csv(output, index=False)
+            mimetype = 'text/csv'
+            download_name = 'processed_data.csv'
+        elif format_type == 'json':
+            df.to_json(output, orient='records')
+            mimetype = 'application/json'
+            download_name = 'processed_data.json'
+        elif format_type == 'xlsx':
+            df.to_excel(output, index=False)
+            mimetype = 'application/vnd.openxmlformats-officedocument.spreadsheetml.sheet'
+            download_name = 'processed_data.xlsx'
+        else:
+            return jsonify({"error": "Invalid format"}), 400
+        output.seek(0)
+        return send_file(
+            output,
+            mimetype=mimetype,
+            as_attachment=True,
+            download_name=download_name
+        )
+    except Exception as e:
+        logger.error(f"Export error: {e}")
+        return jsonify({"error": str(e)}), 500
+def apply_operations(df, operations):
+    """Helper to apply operations to DF."""
+    for op in operations:
+        op_type = op.get('type')
+        params = op.get('params', {})
+        if op_type == 'drop_duplicates':
+            subset = params.get('subset')
+            if subset:
+                df = df.drop_duplicates(subset=subset)
+            else:
+                df = df.drop_duplicates()
+        elif op_type == 'dropna':
+            how = params.get('how', 'any')
+            subset = params.get('subset')
+            if subset:
+                df = df.dropna(how=how, subset=subset)
+            else:
+                df = df.dropna(how=how)
+        elif op_type == 'fillna':
+            value = params.get('value')
+            method = params.get('method')
+            subset = params.get('subset')
+            if subset:
+                # Handle list of columns
+                if isinstance(subset, str):
+                    subset = [subset]
+                # Check if columns exist
+                valid_subset = [c for c in subset if c in df.columns]
+                if method:
+                    df[valid_subset] = df[valid_subset].fillna(method=method)
+                else:
+                    df[valid_subset] = df[valid_subset].fillna(value)
+            else:
+                if method:
+                    df = df.fillna(method=method)
+                else:
+                    df = df.fillna(value)
+        elif op_type == 'filter':
+            col = params.get('column')
+            operator = params.get('operator')
+            value = params.get('value')
+            if col in df.columns:
+                if operator == '==':
+                    df = df[df[col].astype(str) == str(value)]
+                elif operator == '!=':
+                    df = df[df[col].astype(str) != str(value)]
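+                elif operator == '>':
+                    # A non-numeric or missing threshold leaves the data unfiltered
+                    try:
+                        df = df[pd.to_numeric(df[col], errors='coerce') > float(value)]
+                    except (TypeError, ValueError):
+                        pass
+                elif operator == '<':
+                    try:
+                        df = df[pd.to_numeric(df[col], errors='coerce') < float(value)]
+                    except (TypeError, ValueError):
+                        pass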
+                elif operator == 'contains':
+                    df = df[df[col].astype(str).str.contains(str(value), na=False)]
+        elif op_type == 'sort':
+            col = params.get('column')
+            ascending = params.get('ascending', True)
+            if col in df.columns:
+                df = df.sort_values(by=col, ascending=ascending)
+        elif op_type == 'rename':
+            mapping = params.get('mapping')
+            if mapping:
+                df = df.rename(columns=mapping)
+        elif op_type == 'select_columns':
+            cols = params.get('columns')
+            if cols:
+                valid_cols = [c for c in cols if c in df.columns]
+                df = df[valid_cols]
+    return df
+if __name__ == '__main__':
+    app.run(host='0.0.0.0', port=7860, debug=False)
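
The upload route only looks for null bytes when the extension is `csv` or `json`, since `xlsx` files are zip archives and contain them legitimately. A minimal sketch of that check as a standalone helper follows. The name `looks_binary` and its signature are hypothetical and not part of this commit; the 4096-byte sample size matches the route.

```python
import io

TEXT_EXTENSIONS = {"csv", "json"}  # xlsx is a zip archive, so null bytes are expected there


def looks_binary(stream, ext, sample_size=4096):
    """Return True if a text-format upload has a null byte in its first bytes."""
    if ext not in TEXT_EXTENSIONS:
        return False  # binary formats are not checked
    start = stream.tell()
    chunk = stream.read(sample_size)
    stream.seek(start)  # rewind so the caller can still save or parse the stream
    return b"\0" in chunk


csv_ok = io.BytesIO(b"a,b\n1,2\n")
csv_bad = io.BytesIO(b"a,b\n1,\x002\n")
xlsx = io.BytesIO(b"PK\x03\x04\x00\x00")

print(looks_binary(csv_ok, "csv"))
print(looks_binary(csv_bad, "csv"))
print(looks_binary(xlsx, "xlsx"))
```

One caveat with this heuristic: CSV files saved as UTF-16 contain null bytes for ASCII characters, so they would be rejected as binary.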

requirements.txt ADDED

	@@ -0,0 +1,4 @@

+flask==2.3.3
+pandas==2.0.3
+openpyxl==3.1.2
+werkzeug==2.3.7

templates/index.html ADDED

	@@ -0,0 +1,431 @@

+<!DOCTYPE html>
+<html lang="en" class="dark">
+<head>
+    <meta charset="UTF-8">
+    <meta name="viewport" content="width=device-width, initial-scale=1.0">
+    <title>Smart Data Refinery</title>
+    <script src="https://cdn.tailwindcss.com"></script>
+    <script src="https://unpkg.com/vue@3/dist/vue.global.js"></script>
+    <script src="https://cdn.jsdelivr.net/npm/axios/dist/axios.min.js"></script>
+    <script>
+        tailwind.config = {
+            darkMode: 'class',
+            theme: {
+                extend: {
+                    colors: {
+                        primary: '#3b82f6',
+                        secondary: '#10b981',
+                        dark: '#111827',
+                        darker: '#0f172a',
+                        panel: '#1e293b'
+                    }
+                }
+            }
+        }
+    </script>
+    <style>
+        body { font-family: 'Inter', sans-serif; }
+        [v-cloak] { display: none !important; }
+        .glass {
+            background: rgba(30, 41, 59, 0.7);
+            backdrop-filter: blur(10px);
+            border: 1px solid rgba(255, 255, 255, 0.1);
+        }
+        ::-webkit-scrollbar { width: 8px; height: 8px; }
+        ::-webkit-scrollbar-track { background: #1e293b; }
+        ::-webkit-scrollbar-thumb { background: #475569; border-radius: 4px; }
+        ::-webkit-scrollbar-thumb:hover { background: #64748b; }
+    </style>
+</head>
+<body class="bg-darker text-gray-200 min-h-screen flex flex-col">
+    <div id="app" class="flex flex-col h-screen" v-cloak>
+        <!-- Header -->
+        <header class="h-16 border-b border-gray-700 bg-panel flex items-center justify-between px-6 shrink-0">
+            <div class="flex items-center gap-3">
+                <div class="w-8 h-8 rounded bg-gradient-to-br from-blue-500 to-purple-600 flex items-center justify-center font-bold text-white">D</div>
+                <h1 class="text-xl font-bold bg-clip-text text-transparent bg-gradient-to-r from-blue-400 to-purple-400">Smart Data Refinery</h1>
+            </div>
+            <div class="flex items-center gap-4">
+                <button @click="loadDemoData" class="px-3 py-1.5 bg-gray-600 hover:bg-gray-700 rounded text-sm font-medium transition flex items-center gap-2" :disabled="loading">
+                    <span>🧪 Load demo data</span>
+                </button>
+                <button @click="exportData('csv')" class="px-3 py-1.5 bg-green-600 hover:bg-green-700 rounded text-sm font-medium transition flex items-center gap-2" :disabled="!filename">
+                    <span>Export CSV</span>
+                </button>
+                <button @click="exportData('json')" class="px-3 py-1.5 bg-yellow-600 hover:bg-yellow-700 rounded text-sm font-medium transition flex items-center gap-2" :disabled="!filename">
+                    <span>Export JSON</span>
+                </button>
+            </div>
+        </header>
+        <!-- Main Content -->
+        <main class="flex-1 flex overflow-hidden">
+            <!-- Sidebar (Pipeline) -->
+            <aside class="w-80 bg-panel border-r border-gray-700 flex flex-col shrink-0">
+                <div class="p-4 border-b border-gray-700">
+                    <h2 class="font-semibold text-gray-300 mb-2">Processing Pipeline</h2>
+                    <div class="text-xs text-gray-500">The operations below run in order</div>
+                </div>
+                <div class="flex-1 overflow-y-auto p-4 space-y-3">
+                    <div v-if="operations.length === 0" class="text-center text-gray-500 py-10 border-2 border-dashed border-gray-700 rounded-lg">
+                        No operations yet
+                    </div>
+                    <div v-for="(op, index) in operations" :key="index" class="bg-dark p-3 rounded border border-gray-600 relative group">
+                        <button @click="removeOperation(index)" class="absolute top-2 right-2 text-gray-500 hover:text-red-400 opacity-0 group-hover:opacity-100 transition">✕</button>
+                        <div class="text-sm font-bold text-blue-400 mb-1">${ getOpName(op.type) }</div>
+                        <!-- Dynamic Params Display -->
+                        <div class="text-xs text-gray-400 space-y-1">
+                            <div v-if="op.type === 'filter'">
+                                ${ op.params.column } ${ op.params.operator } ${ op.params.value }
+                            </div>
+                            <div v-if="op.type === 'fillna'">
+                                ${ op.params.subset ? op.params.subset : 'All columns' } -> ${ op.params.method || op.params.value }
+                            </div>
+                            <div v-if="op.type === 'drop_duplicates'">
+                                ${ op.params.subset ? 'By: ' + op.params.subset : 'Exact duplicates' }
+                            </div>
+                            <div v-if="op.type === 'sort'">
+                                ${ op.params.column } (${ op.params.ascending ? 'ascending' : 'descending' })
+                            </div>
+                            <div v-if="op.type === 'select_columns'">
+                                Keep: ${ op.params.columns.join(', ') }
+                            </div>
+                             <div v-if="op.type === 'rename'">
+                                Rename: ${ JSON.stringify(op.params.mapping) }
+                            </div>
+                        </div>
+                    </div>
+                </div>
+                <!-- Add Operation Button -->
+                <div class="p-4 border-t border-gray-700 bg-panel">
+                    <button @click="showAddOpModal = true" class="w-full py-2 bg-blue-600 hover:bg-blue-700 rounded text-sm font-medium transition" :disabled="!filename">
+                        + Add operation
+                    </button>
+                </div>
+            </aside>
+            <!-- Main Area -->
+            <div class="flex-1 flex flex-col bg-darker overflow-hidden relative">
+                <!-- Upload / Empty State -->
+                <div v-if="!filename" class="absolute inset-0 flex items-center justify-center z-10 bg-darker/90 backdrop-blur-sm">
+                    <div
+                        class="w-96 h-64 border-2 border-dashed border-gray-600 rounded-xl flex flex-col items-center justify-center cursor-pointer hover:border-blue-500 hover:bg-blue-500/5 transition group"
+                        @click="triggerFileInput"
+                        @dragover.prevent
+                        @drop.prevent="handleDrop"
+                    >
+                        <input type="file" ref="fileInput" class="hidden" @change="handleFileSelect" accept=".csv,.json,.xlsx">
+                        <div class="text-4xl mb-4 group-hover:scale-110 transition">📂</div>
+                        <div class="text-lg font-medium text-gray-300">Click or drag a file here to upload</div>
+                        <div class="text-sm text-gray-500 mt-2">Supports CSV, JSON, Excel (up to 50 MB)</div>
+                    </div>
+                </div>
+                <!-- Data Table -->
+                <div class="flex-1 overflow-auto p-0 relative">
+                    <div v-if="loading" class="absolute inset-0 flex items-center justify-center bg-darker/50 z-20">
+                        <div class="animate-spin rounded-full h-12 w-12 border-b-2 border-blue-500"></div>
+                    </div>
+                    <table v-if="previewData" class="w-full text-left border-collapse">
+                        <thead class="bg-panel sticky top-0 z-10 shadow-md">
+                            <tr>
+                                <th v-for="col in previewColumns" :key="col" class="p-3 text-xs font-medium text-gray-400 uppercase tracking-wider border-b border-gray-700 whitespace-nowrap">
+                                    ${ col }
+                                </th>
+                            </tr>
+                        </thead>
+                        <tbody class="divide-y divide-gray-800">
+                            <tr v-for="(row, idx) in previewData" :key="idx" class="hover:bg-gray-800/50 transition">
+                                <td v-for="col in previewColumns" :key="col" class="p-3 text-sm text-gray-300 whitespace-nowrap border-r border-gray-800 last:border-r-0">
+                                    ${ row[col] }
+                                </td>
+                            </tr>
+                        </tbody>
+                    </table>
+                </div>
+                <!-- Footer Stats -->
+                <div class="h-10 bg-panel border-t border-gray-700 flex items-center px-4 gap-6 text-xs text-gray-400 shrink-0">
+                    <div v-if="stats">
+                        <span>Rows: <span class="text-white">${ stats.rows }</span></span>
+                        <span class="ml-4">Columns: <span class="text-white">${ stats.columns }</span></span>
+                        <span class="ml-4">Missing values: <span class="text-yellow-500">${ stats.missing_values }</span></span>
+                        <span class="ml-4">Duplicate rows: <span class="text-red-500">${ stats.duplicates }</span></span>
+                    </div>
+                    <div class="ml-auto">
+                        <span v-if="filename" class="text-blue-400">${ filename }</span>
+                    </div>
+                </div>
+            </div>
+        </main>
+        <!-- Add Operation Modal -->
+        <div v-if="showAddOpModal" class="fixed inset-0 bg-black/50 backdrop-blur-sm flex items-center justify-center z-50">
+            <div class="bg-panel border border-gray-600 rounded-lg w-[500px] shadow-2xl p-6">
+                <h3 class="text-lg font-bold mb-4 text-white">Add Operation</h3>
+                <div class="mb-4">
+                    <label class="block text-sm text-gray-400 mb-1">Operation type</label>
+                    <select v-model="newOp.type" class="w-full bg-dark border border-gray-600 rounded px-3 py-2 text-white focus:outline-none focus:border-blue-500">
+                        <option value="filter">Filter</option>
+                        <option value="sort">Sort</option>
+                        <option value="fillna">Fill NA</option>
+                        <option value="drop_duplicates">Drop Duplicates</option>
+                        <option value="select_columns">Select Columns</option>
+                        <option value="rename">Rename</option>
+                    </select>
+                </div>
+                <!-- Dynamic Inputs based on Type -->
+                <div class="space-y-3 mb-6">
+                    <!-- Filter -->
+                    <div v-if="newOp.type === 'filter'">
+                        <select v-model="newOp.params.column" class="w-full bg-dark border border-gray-600 rounded px-3 py-2 text-white mb-2">
+                            <option v-for="col in previewColumns" :value="col">${ col }</option>
+                        </select>
+                        <div class="flex gap-2 mb-2">
+                            <select v-model="newOp.params.operator" class="w-1/3 bg-dark border border-gray-600 rounded px-3 py-2 text-white">
+                                <option value="==">Equals</option>
+                                <option value="!=">Not equal to</option>
+                                <option value=">">Greater than</option>
+                                <option value="<">Less than</option>
+                                <option value="contains">Contains</option>
+                            </select>
+                            <input v-model="newOp.params.value" placeholder="Value" class="w-2/3 bg-dark border border-gray-600 rounded px-3 py-2 text-white">
+                        </div>
+                    </div>
+                    <!-- Sort -->
+                    <div v-if="newOp.type === 'sort'">
+                        <select v-model="newOp.params.column" class="w-full bg-dark border border-gray-600 rounded px-3 py-2 text-white mb-2">
+                            <option v-for="col in previewColumns" :value="col">${ col }</option>
+                        </select>
+                        <label class="flex items-center gap-2 text-sm text-gray-300">
+                            <input type="checkbox" v-model="newOp.params.ascending"> Ascending
+                        </label>
+                    </div>
+                    <!-- FillNA -->
+                    <div v-if="newOp.type === 'fillna'">
+                         <select v-model="newOp.params.subset" class="w-full bg-dark border border-gray-600 rounded px-3 py-2 text-white mb-2">
+                            <option value="">All columns</option>
+                            <option v-for="col in previewColumns" :value="col">${ col }</option>
+                        </select>
+                         <div class="flex gap-2">
+                            <input v-model="newOp.params.value" placeholder="Fill value (e.g. 0, Unknown)" class="flex-1 bg-dark border border-gray-600 rounded px-3 py-2 text-white">
+                            <select v-model="newOp.params.method" class="w-1/3 bg-dark border border-gray-600 rounded px-3 py-2 text-white">
+                                <option value="">Fixed value</option>
+                                <option value="ffill">Forward fill</option>
+                                <option value="bfill">Backward fill</option>
+                            </select>
+                        </div>
+                    </div>
+                    <!-- Drop Duplicates -->
+                    <div v-if="newOp.type === 'drop_duplicates'">
+                         <select v-model="newOp.params.subset" class="w-full bg-dark border border-gray-600 rounded px-3 py-2 text-white mb-2">
+                            <option value="">All columns (exact duplicates)</option>
+                            <option v-for="col in previewColumns" :value="col">${ col }</option>
+                        </select>
+                    </div>
+                    <!-- Select Columns -->
+                    <div v-if="newOp.type === 'select_columns'">
+                        <div class="h-32 overflow-y-auto border border-gray-600 rounded p-2 bg-dark">
+                            <label v-for="col in previewColumns" :key="col" class="flex items-center gap-2 text-sm text-gray-300 mb-1">
+                                <input type="checkbox" :value="col" v-model="newOp.params.columns"> ${ col }
+                            </label>
+                        </div>
+                    </div>
+                     <!-- Rename -->
+                    <div v-if="newOp.type === 'rename'">
+                         <select v-model="tempRenameCol" class="w-full bg-dark border border-gray-600 rounded px-3 py-2 text-white mb-2">
+                            <option v-for="col in previewColumns" :value="col">${ col }</option>
+                        </select>
+                        <input v-model="tempRenameVal" placeholder="New column name" class="w-full bg-dark border border-gray-600 rounded px-3 py-2 text-white">
+                    </div>
+                </div>
+                <div class="flex justify-end gap-3">
+                    <button @click="showAddOpModal = false" class="px-4 py-2 text-gray-400 hover:text-white transition">Cancel</button>
+                    <button @click="addOperation" class="px-4 py-2 bg-blue-600 hover:bg-blue-700 rounded text-white font-medium transition">Add</button>
+                </div>
+            </div>
+        </div>
+    </div>
+    <script>
+        const { createApp, ref, reactive } = Vue
+        createApp({
+            delimiters: ['${', '}'],
+            setup() {
+                const filename = ref('')
+                const previewData = ref(null)
+                const previewColumns = ref([])
+                const stats = ref(null)
+                const operations = ref([])
+                const loading = ref(false)
+                const showAddOpModal = ref(false)
+                // Add Op Form
+                const newOp = reactive({
+                    type: 'filter',
+                    params: {
+                        columns: [], // for select_columns
+                        ascending: true
+                    }
+                })
+                const tempRenameCol = ref('')
+                const tempRenameVal = ref('')
+                const fileInput = ref(null)
+                const triggerFileInput = () => fileInput.value.click()
+                const handleFileSelect = (e) => {
+                    const file = e.target.files[0]
+                    if (file) uploadFile(file)
+                }
+                const handleDrop = (e) => {
+                    const file = e.dataTransfer.files[0]
+                    if (file) uploadFile(file)
+                }
+                const loadDemoData = async () => {
+                    loading.value = true
+                    try {
+                        const res = await axios.post('/api/load_demo')
+                        filename.value = res.data.filename
+                        previewData.value = res.data.preview.data
+                        previewColumns.value = res.data.preview.columns
+                        stats.value = res.data.preview.stats
+                        operations.value = []
+                    } catch (e) {
+                        alert('Demo load failed: ' + (e.response?.data?.error || e.message))
+                    } finally {
+                        loading.value = false
+                    }
+                }
+                const uploadFile = async (file) => {
+                    // The backend rejects uploads over 50MB, so stop here instead of sending a request that will fail
+                    if (file.size > 50 * 1024 * 1024) {
+                        alert('File is too large; please keep it under 50MB')
+                        return
+                    }
+                    const formData = new FormData()
+                    formData.append('file', file)
+                    loading.value = true
+                    try {
+                        const res = await axios.post('/api/upload', formData)
+                        filename.value = res.data.filename
+                        previewData.value = res.data.preview.data
+                        previewColumns.value = res.data.preview.columns
+                        stats.value = res.data.preview.stats
+                        operations.value = [] // Reset operations
+                    } catch (e) {
+                        alert('Upload failed: ' + (e.response?.data?.error || e.message))
+                    } finally {
+                        loading.value = false
+                    }
+                }
+                const addOperation = () => {
+                    const op = JSON.parse(JSON.stringify(newOp)) // Deep copy
+                    // Type-specific validation and parameter shaping
+                    if (op.type === 'rename') {
+                        if (!tempRenameCol.value || !tempRenameVal.value) return
+                        op.params.mapping = { [tempRenameCol.value]: tempRenameVal.value }
+                    }
+                    if (op.type === 'select_columns' && op.params.columns.length === 0) return
+                    operations.value.push(op)
+                    showAddOpModal.value = false
+                    // Reset specialized params
+                    newOp.params = { columns: [], ascending: true }
+                    tempRenameCol.value = ''
+                    tempRenameVal.value = ''
+                    // Trigger process
+                    processPipeline()
+                }
+                const removeOperation = (index) => {
+                    operations.value.splice(index, 1)
+                    processPipeline()
+                }
+                const processPipeline = async () => {
+                    loading.value = true
+                    try {
+                        const res = await axios.post('/api/process', {
+                            filename: filename.value,
+                            operations: operations.value
+                        })
+                        previewData.value = res.data.preview.data
+                        previewColumns.value = res.data.preview.columns
+                        stats.value = res.data.preview.stats
+                    } catch (e) {
+                        alert('Processing failed: ' + (e.response?.data?.error || e.message))
+                    } finally {
+                        loading.value = false
+                    }
+                }
+                const exportData = async (format) => {
+                    try {
+                        const res = await axios.post('/api/export', {
+                            filename: filename.value,
+                            operations: operations.value,
+                            format: format
+                        }, { responseType: 'blob' })
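+                        const url = window.URL.createObjectURL(new Blob([res.data]))
+                        const link = document.createElement('a')
+                        link.href = url
+                        // Strip only the final extension so names such as "sales.2024.csv" keep their full stem
+                        link.setAttribute('download', `processed_${filename.value.replace(/\.[^.]+$/, '')}.${format}`)
+                        document.body.appendChild(link)
+                        link.click()
+                        // Remove the temporary link and release the blob URL
+                        link.remove()
+                        window.URL.revokeObjectURL(url)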
+                    } catch (e) {
+                        // The response body is a Blob here, so report the request-level message instead
+                        alert('Export failed: ' + e.message)
+                    }
+                }
+                const getOpName = (type) => {
+                    const map = {
+                        'filter': 'Filter',
+                        'sort': 'Sort',
+                        'fillna': 'Fill missing values (Fill NA)',
+                        'drop_duplicates': 'Remove duplicates (Dedupe)',
+                        'select_columns': 'Select columns (Select)',
+                        'rename': 'Rename'
+                    }
+                    return map[type] || type
+                }
+                return {
+                    filename, previewData, previewColumns, stats, operations, loading,
+                    showAddOpModal, newOp, tempRenameCol, tempRenameVal, fileInput,
+                    triggerFileInput, handleFileSelect, handleDrop,
+                    addOperation, removeOperation, exportData, getOpName,
+                    loadDemoData
+                }
+            }
+        }).mount('#app')
+    </script>
+</body>
+</html>
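
The `addOperation` handler in the template above deep-copies the form state, validates it by operation type, and attaches a `mapping` object for renames before queuing the step. A minimal sketch of that payload construction as a standalone function follows; the name `buildOperation` and its arguments are hypothetical and do not appear in the commit, and the sketch only mirrors the checks visible in the template.

```javascript
// Hypothetical helper (not part of the commit) mirroring addOperation's checks.
// Returns the operation object to queue, or null when the form is incomplete.
function buildOperation(form, renameCol, renameVal) {
  // Deep copy so that resetting the form later cannot mutate the queued operation
  const op = JSON.parse(JSON.stringify(form));

  if (op.type === 'rename') {
    // Both the source column and the new name are required
    if (!renameCol || !renameVal) return null;
    op.params.mapping = { [renameCol]: renameVal };
  }

  // Selecting zero columns would produce an empty table, so treat it as invalid
  if (op.type === 'select_columns' && op.params.columns.length === 0) return null;

  return op;
}

const renameOp = buildOperation(
  { type: 'rename', params: { columns: [], ascending: true } },
  'city',
  'location'
);
console.log(JSON.stringify(renameOp));
```

Returning `null` rather than throwing matches the template's silent early `return` on incomplete input; a caller could use it to decide whether to close the modal and trigger `/api/process`.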

test.csv ADDED Viewed

	@@ -0,0 +1,6 @@

+id,name,age,city
+1,Alice,30,New York
+2,Bob,25,Los Angeles
+3,Charlie,,Chicago
+4,Alice,30,New York
+5,David,40,
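
The fixture above appears designed to exercise the dedupe and fill-missing steps: rows 1 and 4 differ only in `id`, and rows 3 and 5 each have one empty field. The sketch below approximates a keep-first dedupe in plain JavaScript to show how the sample behaves. It is an illustration only: the real processing happens in the Pandas-based backend, which is not shown in this section, and `parseCsv` and `dropDuplicates` are hypothetical names.

```javascript
const csv = `id,name,age,city
1,Alice,30,New York
2,Bob,25,Los Angeles
3,Charlie,,Chicago
4,Alice,30,New York
5,David,40,`;

// Naive parser: sufficient for this sample, which has no quoted fields.
// Empty cells become null to stand in for missing values.
function parseCsv(text) {
  const [header, ...lines] = text.trim().split('\n');
  const columns = header.split(',');
  return lines.map(line => {
    const cells = line.split(',');
    return Object.fromEntries(
      columns.map((col, i) => [col, cells[i] === '' ? null : cells[i]])
    );
  });
}

// Keep the first row seen for each distinct key.
// An empty subset means "compare all columns".
function dropDuplicates(rows, subset) {
  const seen = new Set();
  return rows.filter(row => {
    const keys = subset.length ? subset : Object.keys(row);
    const key = JSON.stringify(keys.map(k => row[k]));
    if (seen.has(key)) return false;
    seen.add(key);
    return true;
  });
}

const rows = parseCsv(csv);
const allCols = dropDuplicates(rows, []);
const byName = dropDuplicates(rows, ['name']);
const missing = rows
  .filter(row => Object.values(row).some(v => v === null))
  .map(row => row.id);

console.log(allCols.length, byName.map(r => r.id), missing);
```

On this sample, deduplicating across all columns removes nothing because every `id` is unique, while deduplicating on `name` drops row 4. Rows 3 and 5 are the ones a fill-missing step would touch.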