---
title: Data Analyst Agent
emoji: 📊
colorFrom: yellow
colorTo: blue
sdk: gradio
app_file: app.py
pinned: false
license: mit
---

# Data Analyst Agent

## Question

What does an agentic data-analysis loop look like when the generated code is visible?

## System Boundary

This Space analyzes CSV files by generating pandas code, executing it in a constrained namespace, and returning both the result and the code. The transparency is deliberate.

## Method

The app reads the uploaded CSV, summarizes the dataframe schema, sends the user question and schema to an instruction model, extracts executable pandas code, runs it with safeguards, and displays tables or Plotly charts.

## Technique

This is a tool-using agent pattern. The language model does not directly compute the answer; it writes code that a deterministic tool executes.

The useful boundary is between the model and the runtime. The model proposes a program. Python computes the result. The user can inspect both.

## Output

The app returns generated code, execution logs, result tables, and visualizations.

## Why It Matters

Agent demos are often opaque. This one makes the reasoning artifact inspectable: the code. That lets users verify calculations, learn from the workflow, and debug failures.

## What To Notice

Look at the generated pandas before trusting the answer. If the code is wrong, the result is wrong. This is the correct failure mode because it is visible.

## Effect In Practice

Transparent code generation can speed up exploratory analysis while preserving auditability. It is especially useful for teaching, notebooks, and internal analytics tools.

## Hugging Face Extension

The Space can be evaluated with a dataset of CSV files, natural-language questions, expected code patterns, and expected answers.

## Limitations

Generated code should be reviewed. The execution sandbox is intentionally narrow and does not replace a hardened production isolation layer.
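
To make the "constrained namespace" idea concrete, here is a minimal sketch of what a narrow in-process runner might look like. It is not the code in `app.py`: the names `SAFE_BUILTINS`, `check_code`, and `run_generated_code`, the allow-list contents, and the convention that generated code assigns to a variable called `result` are all assumptions made for illustration. Plain Python rows stand in for the dataframe so the sketch has no third-party dependencies; in the Space the injected variable would be a pandas DataFrame.

```python
import ast
import contextlib
import io

# Hypothetical allow-list of builtins; the Space's actual list may differ.
SAFE_BUILTINS = {
    "abs": abs, "len": len, "max": max, "min": min, "sum": sum,
    "round": round, "sorted": sorted, "range": range, "print": print,
    "list": list, "dict": dict, "float": float, "int": int, "str": str,
}


def check_code(code: str) -> None:
    """Statically reject imports and dunder access before anything runs."""
    for node in ast.walk(ast.parse(code)):
        if isinstance(node, (ast.Import, ast.ImportFrom)):
            raise ValueError("imports are not allowed in generated code")
        if isinstance(node, ast.Attribute) and node.attr.startswith("__"):
            raise ValueError("dunder attribute access is not allowed")
        if isinstance(node, ast.Name) and node.id.startswith("__"):
            raise ValueError("dunder names are not allowed")


def run_generated_code(code: str, variables: dict) -> dict:
    """Execute model-written code in a constrained namespace.

    Returns the code itself, captured stdout, the value bound to `result`
    (an assumed convention), and an error string if execution failed.
    """
    check_code(code)
    # Only the allow-listed builtins and the caller's variables are visible.
    namespace = {"__builtins__": SAFE_BUILTINS, **variables}
    stdout = io.StringIO()
    error = None
    try:
        with contextlib.redirect_stdout(stdout):
            exec(code, namespace)
    except Exception as exc:  # surface the failure instead of hiding it
        error = f"{type(exc).__name__}: {exc}"
    return {
        "code": code,
        "logs": stdout.getvalue(),
        "result": namespace.get("result"),
        "error": error,
    }


# Stand-in for a dataframe; the "generated" code below is hand-written.
rows = [{"region": "north", "sales": 10}, {"region": "south", "sales": 32}]
generated = "print('rows:', len(rows))\nresult = sum(r['sales'] for r in rows)"
out = run_generated_code(generated, {"rows": rows})
print(out["code"])
print(out["logs"], out["result"], out["error"])
```

Returning the code alongside the result mirrors the transparency goal above: the caller always gets the program that produced the answer. A filter like this narrows what generated code can reach, but it runs in the same process as the app, which is why it is no substitute for process-level or container-level isolation.
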
## Run Locally

```bash
pip install -r requirements.txt
python app.py
```