---
title: Pandas CSV Analyzer (LlamaIndex)
emoji: ๐Ÿผ
colorFrom: purple
colorTo: indigo
sdk: gradio
app_file: app.py
pinned: false
license: mit
short_description: An AI agent for CSV analysis with LlamaIndex & Pandas.
sdk_version: 5.44.1
---
# ๐Ÿผ Pandas CSV Analyzer (LlamaIndex)
This is an AI agent that lets you analyze CSV files by asking questions in natural language. Upload a file, ask your questions, and get answers without writing a single line of code yourself.
## ✨ Key Features
* **Natural Language Queries:** Ask questions like, "Which branch had the highest total revenue?" or "Show the top 5 best-selling products."
* **Pandas Code Generation:** The AI generates and executes the necessary Python (Pandas) code to answer your query, displaying the code used for full transparency.
* **PDF Report Generation:** Download your entire conversation history and analysis in a clean and organized PDF report.
* **Robust Architecture:** Built on LlamaIndex Workflows, which split the data processing pipeline into separate, event-driven steps that are easier to test and maintain.
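To make the code-generation feature concrete, here is the kind of Pandas code the two sample questions could translate into. The DataFrame, its column names (`branch`, `product`, `quantity`, `revenue`) and its values are hypothetical, and "best-selling" is assumed to mean "most units sold"; the code the agent actually generates depends on your CSV's schema and on the model's output.

```python
import pandas as pd

# Hypothetical sales data standing in for an uploaded CSV file.
df = pd.DataFrame(
    {
        "branch": ["North", "South", "North", "East", "South"],
        "product": ["A", "B", "B", "A", "C"],
        "quantity": [10, 5, 7, 3, 20],
        "revenue": [100.0, 250.0, 140.0, 60.0, 300.0],
    }
)

# "Which branch had the highest total revenue?"
# Sum revenue per branch, then return the label of the largest total.
top_branch = df.groupby("branch")["revenue"].sum().idxmax()

# "Show the top 5 best-selling products." (assumption: ranked by units sold)
top_products = (
    df.groupby("product")["quantity"]
    .sum()
    .sort_values(ascending=False)
    .head(5)
)

print(top_branch)
print(top_products)
```

Each answer is a short chain of standard Pandas calls (`groupby`, `sum`, `idxmax`, `sort_values`, `head`), which is what makes it practical to show the generated code next to the answer.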
## 🚀 How to Use
1. **Upload Your CSV File:** Use the upload panel on the left to load your data.
2. **Ask a Question:** Type your question about the data in the text box at the bottom.
3. **Get Insights:** The agent will process your request, generate the answer, and display it in the chat.
4. **Download the Report:** When your analysis is complete, click "Generate and Download PDF" to get a full report.
## ๐Ÿ› ๏ธ How it Works (Tech Stack)
This project combines the following components into a single data analysis app:
* **Interface:** **Gradio** is used to build the interactive web interface.
* **Data Manipulation:** **Pandas** is the engine behind all data manipulation and analysis of the CSV file.
* **AI Orchestration:** **LlamaIndex Workflows** manages the entire process, from receiving the user's question to synthesizing the final answer. Workflows supersede LlamaIndex's older `QueryPipeline` abstraction.
* **Language Model (LLM):** The **Groq API** provides access to high-speed language models (like Llama 3) to generate Pandas code and synthesize human-readable answers.
* **PDF Generation:** The **FPDF2** library is used to create reports from the conversation history.
The workflow is as follows:
1. The user submits a question.
2. The LlamaIndex `PandasWorkflow` is initiated.
3. The LLM receives the question, data schema, and examples, then generates a Pandas expression.
4. This expression is safely executed in the backend.
5. The result is sent back to the LLM, which generates a final, natural-language response for the user.
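The five steps above can be sketched in plain Python. This is not the project's `PandasWorkflow` and it does not use LlamaIndex: the two LLM calls are replaced by hypothetical stub functions (`fake_generate_expression`, `fake_synthesize`) so the sketch runs offline, and `build_prompt` and `run_expression` are likewise illustrative names. The execution step is a restricted `eval` that exposes only `df` and `pd`; removing builtins this way limits accidents but is not a real security sandbox.

```python
import pandas as pd


def build_prompt(question: str, df: pd.DataFrame) -> str:
    """Step 3: combine the question with the data schema and a few sample rows."""
    schema = "\n".join(f"- {col}: {dtype}" for col, dtype in df.dtypes.items())
    return (
        "You are given a pandas DataFrame `df` with columns:\n"
        f"{schema}\n\n"
        f"Sample rows:\n{df.head(3).to_string()}\n\n"
        "Reply with a single pandas expression that answers the question.\n"
        f"Question: {question}"
    )


def fake_generate_expression(prompt: str) -> str:
    """Hypothetical stand-in for the LLM call that writes the Pandas expression."""
    return 'df.groupby("branch")["revenue"].sum().idxmax()'


def run_expression(expression: str, df: pd.DataFrame):
    """Step 4: evaluate the expression with only `df` and `pd` in scope.

    An empty __builtins__ makes names like open() or __import__() undefined,
    but this is NOT a complete sandbox; untrusted code needs real isolation.
    """
    return eval(expression, {"__builtins__": {}}, {"df": df, "pd": pd})


def fake_synthesize(question: str, expression: str, result) -> str:
    """Hypothetical stand-in for the second LLM call that phrases the answer."""
    return f"{question} -> {result} (computed with `{expression}`)"


def answer(question: str, df: pd.DataFrame) -> str:
    prompt = build_prompt(question, df)                   # step 3: prompt
    expression = fake_generate_expression(prompt)         # step 3: LLM writes code
    result = run_expression(expression, df)               # step 4: execute
    return fake_synthesize(question, expression, result)  # step 5: final answer


# Usage with a tiny hypothetical dataset.
sales = pd.DataFrame(
    {"branch": ["North", "South", "North"], "revenue": [100.0, 250.0, 140.0]}
)
print(answer("Which branch had the highest total revenue?", sales))
```

Keeping generation (step 3) and execution (step 4) as separate functions mirrors the separation described above: the model only ever returns text, and the backend decides what that text is allowed to touch.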
---
*Developed by Yuri Arduino Bernardineli Alves*
* **GitHub:** [YuriArduino](https://github.com/YuriArduino)
* **Email:** yuriarduino@gmail.com