---
title: Pandas CSV Analyzer (LlamaIndex)
emoji: 🐼
colorFrom: purple
colorTo: indigo
sdk: gradio
app_file: app.py
pinned: false
license: mit
short_description: An AI agent for CSV analysis with LlamaIndex & Pandas.
sdk_version: 5.44.1
---

# 🐼 Pandas CSV Analyzer (LlamaIndex)

This is an advanced AI agent that allows you to analyze CSV files by asking questions in natural language. Upload a file, ask your questions, and get instant insights without writing a single line of code.

## ✨ Key Features

* **Natural Language Queries:** Ask questions like, "Which branch had the highest total revenue?" or "Show the top 5 best-selling products."
* **Pandas Code Generation:** The AI generates and executes the necessary Python (Pandas) code to answer your query, displaying the code used for full transparency.
* **PDF Report Generation:** Download your entire conversation history and analysis in a clean and organized PDF report.
* **Robust Architecture:** Built with the new LlamaIndex Workflows library, ensuring a modular and reliable data processing pipeline.

## 🚀 How to Use

1. **Upload your CSV File:** Use the upload panel on the left to load your data.
2. **Ask a Question:** Type your question about the data in the text box at the bottom.
3. **Get Insights:** The agent will process your request, generate the answer, and display it in the chat.
4. **Download the Report:** When your analysis is complete, click "Generate and Download PDF" to get a full report.

## 🛠️ How it Works (Tech Stack)

This project integrates several cutting-edge technologies to create a seamless data analysis experience:

* **Interface:** **Gradio** is used to build the interactive web interface.
* **Data Manipulation:** **Pandas** is the engine behind all data manipulation and analysis of the CSV file.
* **AI Orchestration:** **LlamaIndex Workflows** manages the entire process, from receiving the user's question to synthesizing the final answer. This is a modern replacement for the older `QueryPipelines`.
* **Language Model (LLM):** The **Groq API** provides access to high-speed language models (like Llama 3) to generate Pandas code and synthesize human-readable answers.
* **PDF Generation:** The **FPDF2** library is used to create reports from the conversation history.

The workflow is as follows:

1. The user submits a question.
2. The LlamaIndex `PandasWorkflow` is initiated.
3. The LLM receives the question, data schema, and examples, then generates a Pandas expression.
4. This expression is safely executed in the backend.
5. The result is sent back to the LLM, which generates a final, natural-language response for the user.

---

*Developed by Yuri Arduino Bernardineli Alves*

* **GitHub:** [YuriArduino](https://github.com/YuriArduino)
* **Email:** yuriarduino@gmail.com
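
### Appendix: a sketch of the workflow steps

The workflow listed under "How it Works" (question, generated Pandas expression, execution, result) can be illustrated with the minimal sketch below. It is not the code in `app.py`: the function names (`fake_llm_generate_expression`, `describe_schema`, `run_expression`, `answer_question`) and the sample data are hypothetical, the LLM call is replaced by a hard-coded stub instead of a Groq API request, and the final answer-synthesis step is left out. The restricted `eval` shown here is only one possible way to narrow what a generated expression can reach, and it should not be read as a complete sandbox.

```python
import pandas as pd


def fake_llm_generate_expression(question: str, schema: str) -> str:
    """Hypothetical stand-in for the LLM call.

    A real implementation would send the question, schema, and examples
    to a model and return the Pandas expression it produces.
    """
    return "df.groupby('branch')['revenue'].sum().idxmax()"


def describe_schema(df: pd.DataFrame) -> str:
    """Summarize column names and dtypes so the model knows the data layout."""
    return ", ".join(f"{col} ({dtype})" for col, dtype in df.dtypes.items())


def run_expression(expression: str, df: pd.DataFrame):
    """Evaluate a single expression with only `df` and `pd` in scope.

    Builtins are removed, so names such as `__import__` or `open` are
    undefined. This limits the expression but is not a full sandbox.
    """
    return eval(expression, {"__builtins__": {}}, {"df": df, "pd": pd})


def answer_question(question: str, df: pd.DataFrame):
    """Run the question -> expression -> result part of the workflow."""
    schema = describe_schema(df)
    expression = fake_llm_generate_expression(question, schema)
    result = run_expression(expression, df)
    return expression, result


# Made-up sample data, for illustration only.
df = pd.DataFrame(
    {
        "branch": ["North", "South", "North", "East"],
        "revenue": [120.0, 300.0, 250.0, 90.0],
    }
)

expression, result = answer_question(
    "Which branch had the highest total revenue?", df
)
print(expression)  # the generated code, shown to the user for transparency
print(result)      # North
```

Returning the expression alongside the result mirrors the "displaying the code used" feature: the interface can show both, and the result can then be passed back to the model to phrase a natural-language answer.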