---
title: Pandas CSV Analyzer (LlamaIndex)
emoji: 🐼
colorFrom: purple
colorTo: indigo
sdk: gradio
app_file: app.py
pinned: false
license: mit
short_description: An AI agent for CSV analysis with LlamaIndex & Pandas.
sdk_version: 5.44.1
---

# 🐼 Pandas CSV Analyzer (LlamaIndex)

This Space hosts an AI agent that lets you analyze CSV files by asking questions in natural language. Upload a file, ask your questions, and get answers without writing any code yourself.

## ✨ Key Features

*   **Natural Language Queries:** Ask questions like, "Which branch had the highest total revenue?" or "Show the top 5 best-selling products."
*   **Pandas Code Generation:** The AI generates and executes the necessary Python (Pandas) code to answer your query, displaying the code used for full transparency.
*   **PDF Report Generation:** Download your entire conversation history and analysis in a clean and organized PDF report.
*   **Robust Architecture:** Built with LlamaIndex Workflows, which structures the data processing pipeline as modular, event-driven steps.
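
To give a feel for the kind of Pandas code the example questions above might translate to, here is a minimal sketch. The column names (`branch`, `product`, `revenue`, `units_sold`) and the sample rows are assumptions made for this illustration; the code the agent actually generates depends on your CSV and on the model's output.

```python
import pandas as pd

# Hypothetical sales data; a real session would use the uploaded CSV instead.
df = pd.DataFrame(
    {
        "branch": ["North", "South", "North", "East", "South"],
        "product": ["A", "B", "C", "A", "A"],
        "revenue": [120.0, 300.0, 250.0, 90.0, 50.0],
        "units_sold": [12, 30, 20, 9, 5],
    }
)

# "Which branch had the highest total revenue?"
# Sum revenue per branch, then take the label of the largest total.
top_branch = df.groupby("branch")["revenue"].sum().idxmax()

# "Show the top 5 best-selling products."
# Sum units per product, then keep the five largest totals.
top_products = df.groupby("product")["units_sold"].sum().nlargest(5)

print(top_branch)
print(top_products)
```

Both queries are single expressions, which is the shape of output the agent is described as generating and displaying alongside its answer.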

## 🚀 How to Use

1.  **Upload your CSV File:** Use the upload panel on the left to load your data.
2.  **Ask a Question:** Type your question about the data in the text box at the bottom.
3.  **Get Insights:** The agent will process your request, generate the answer, and display it in the chat.
4.  **Download the Report:** When your analysis is complete, click "Generate and Download PDF" to get a full report.

## 🛠️ How it Works (Tech Stack)

This project combines the following technologies:

*   **Interface:** **Gradio** is used to build the interactive web interface.
*   **Data Manipulation:** **Pandas** is the engine behind all data manipulation and analysis of the CSV file.
*   **AI Orchestration:** **LlamaIndex Workflows** manages the entire process, from receiving the user's question to synthesizing the final answer. Workflows are the successor to LlamaIndex's older `QueryPipeline` API.
*   **Language Model (LLM):** The **Groq API** provides access to high-speed language models (like Llama 3) to generate Pandas code and synthesize human-readable answers.
*   **PDF Generation:** The **fpdf2** library is used to create reports from the conversation history.

The workflow is as follows:
1.  The user submits a question.
2.  The LlamaIndex `PandasWorkflow` is initiated.
3.  The LLM receives the question, data schema, and examples, then generates a Pandas expression.
4.  This expression is safely executed in the backend.
5.  The result is sent back to the LLM, which generates a final, natural-language response for the user.
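
The steps above can be sketched in plain Python. This is not the project's `PandasWorkflow` code: `build_prompt`, `fake_llm`, `run_expression`, and `answer` are hypothetical names, the LLM is replaced by a stub that returns a fixed expression, and the validation shown is only one possible way to restrict what gets evaluated. It is a sketch under those assumptions, not a real sandbox.

```python
import ast

import pandas as pd


def build_prompt(df: pd.DataFrame, question: str) -> str:
    """Step 3 input: the question plus a compact description of the columns."""
    schema = ", ".join(f"{col} ({dtype})" for col, dtype in df.dtypes.items())
    return f"Columns: {schema}\nQuestion: {question}\nPandas expression:"


def fake_llm(prompt: str) -> str:
    """Stand-in for the real LLM call (assumption); returns a fixed expression."""
    return 'df.groupby("branch")["revenue"].sum().idxmax()'


def run_expression(expression: str, df: pd.DataFrame):
    """Step 4: evaluate a single expression with only `df` and `pd` in scope."""
    # mode="eval" accepts one expression; statements like `import os` fail to parse.
    tree = ast.parse(expression, mode="eval")
    for node in ast.walk(tree):
        # Reject dunder attributes, a common way to escape a restricted eval().
        if isinstance(node, ast.Attribute) and node.attr.startswith("__"):
            raise ValueError(f"disallowed attribute: {node.attr}")
        # Reject every bare name except the two we provide.
        if isinstance(node, ast.Name) and node.id not in {"df", "pd"}:
            raise ValueError(f"disallowed name: {node.id}")
    code = compile(tree, "<llm-expression>", "eval")
    return eval(code, {"__builtins__": {}}, {"df": df, "pd": pd})


def answer(df: pd.DataFrame, question: str, llm=fake_llm):
    """Steps 1 to 4; step 5 would pass `result` back to the LLM for phrasing."""
    prompt = build_prompt(df, question)
    expression = llm(prompt)
    result = run_expression(expression, df)
    return expression, result


# Hypothetical data for the usage example.
sample = pd.DataFrame(
    {"branch": ["North", "South", "North"], "revenue": [100.0, 250.0, 200.0]}
)
expression, result = answer(sample, "Which branch had the highest total revenue?")
print(expression)
print(result)
```

The name check is deliberately strict: it also rejects `lambda` bodies and builtins such as `len`, so a real deployment would need a more permissive allowlist or a separate isolated process for execution.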

---

*Developed by Yuri Arduino Bernardineli Alves*
*   **GitHub:** [YuriArduino](https://github.com/YuriArduino)
*   **Email:** yuriarduino@gmail.com