---
license: mit
sdk: streamlit
sdk_version: 1.55.0
---
# 🧠 DataMind Agent
### AI-Powered Data Analyst: LangChain + Gemini + Streamlit

Upload a data file (CSV, Excel, or JSON) and chat with your data in natural language. The agent analyzes, visualizes, and explains your data using Google Gemini.

---

## 🚀 Features

| Feature | Description |
|---|---|
| 📂 Multi-format support | CSV, Excel (.xlsx/.xls), JSON |
| 💬 Natural language Q&A | Ask anything, get intelligent answers |
| 📊 Auto visualizations | AI picks the best chart for your question |
| 🎨 Custom chart builder | Build any chart with dropdown controls |
| 🔍 Data explorer | Filter, search, and download raw data |
| 🧠 AI data summary | Executive summary generated by Gemini |

---

## 📁 Project Structure

```
data-analyst-agent/
├── app.py              # Streamlit UI (main app)
├── core_agent.py       # LangChain + Gemini logic
├── requirements.txt    # Python dependencies
├── .env                # API key config
├── sample_data.csv     # Test dataset (sales data)
└── README.md           # This file
```

---

## ⚙️ Setup & Installation

### Step 1: Clone / download the project
```bash
cd data-analyst-agent
```

### Step 2: Create a virtual environment (recommended)
```bash
python -m venv venv

# On Windows:
venv\Scripts\activate

# On Mac/Linux:
source venv/bin/activate
```

### Step 3: Install dependencies
```bash
pip install -r requirements.txt
```

### Step 4: Get your free Gemini API key
1. Go to [https://aistudio.google.com/app/apikey](https://aistudio.google.com/app/apikey)
2. Sign in with Google
3. Click **"Create API Key"**
4. Copy the key (starts with `AIza...`)

### Step 5: Add your API key
Either paste it directly into the app sidebar or add it to `.env`:
```
GOOGLE_API_KEY=AIzaYourKeyHere
```
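
The two options above imply a lookup order for the key. The following is a minimal sketch of one such order, assuming the sidebar value takes precedence over the environment and the `.env` file. `read_env_file` and `resolve_api_key` are hypothetical helpers for illustration, not functions from `app.py`; the actual app may load `.env` differently (for example with `python-dotenv`).

```python
import os
from pathlib import Path


def read_env_file(path=".env"):
    """Parse simple KEY=VALUE lines from a .env file into a dict."""
    values = {}
    env_path = Path(path)
    if not env_path.exists():
        return values
    for raw_line in env_path.read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        # Skip blank lines, comments, and lines without an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def resolve_api_key(sidebar_value="", env_path=".env"):
    """Return the first non-empty key: sidebar, process environment, .env file."""
    # Assumed precedence (hypothetical): sidebar > environment > .env file.
    candidates = (
        sidebar_value,
        os.environ.get("GOOGLE_API_KEY", ""),
        read_env_file(env_path).get("GOOGLE_API_KEY", ""),
    )
    for candidate in candidates:
        if candidate and candidate.strip():
            return candidate.strip()
    return None
```

Returning `None` when no key is found lets the UI show a prompt instead of failing on the first Gemini call.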

### Step 6: Run the app
```bash
streamlit run app.py
```

The app opens at **http://localhost:8501**

---

## 🎯 How to Use

1. **Paste your Gemini API key** in the sidebar
2. **Upload a data file** (CSV, Excel, or JSON)
3. **Dashboard tab**: see auto-generated stats and charts
4. **Chat tab**: ask questions like:
   - *"What are the top selling products?"*
   - *"Is there a correlation between age and spending?"*
   - *"Show me outliers in the sales column"*
5. **Charts tab**: build custom visualizations
6. **Raw Data tab**: filter and download your data

---

## 💡 Example Questions to Ask

```
"What is the average profit by category?"
"Which region has the highest sales?"
"Are there any missing values I should worry about?"
"What trends do you see in the data over time?"
"Which customers are the most valuable?"
"Give me a statistical summary of all numeric columns"
"What correlations exist between the columns?"
```

---

## 🏗️ Architecture

```
User (Streamlit UI)
       │
       ▼
  app.py (UI Layer)
       │
       ├── core_agent.py
       │       ├── load_file()          → Parses CSV/Excel/JSON → DataFrame
       │       ├── profile_dataframe()  → Statistical profiling
       │       ├── ask_agent()          → LangChain → Gemini → Answer
       │       ├── make_plotly_chart()  → Renders visualizations
       │       └── ai_recommend_chart() → Gemini picks best chart
       │
       └── Google Gemini 1.5 Flash (via LangChain)
```
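
As an illustration of the `load_file()` step in the diagram, here is a minimal sketch of dispatching on the file extension to a pandas reader. The signature, the `READERS` table, and the `pick_reader` helper are assumptions made for this example; the real function in `core_agent.py` may be structured differently.

```python
from pathlib import Path

# Maps supported extensions to pandas reader function names.
READERS = {
    ".csv": "read_csv",
    ".xlsx": "read_excel",
    ".xls": "read_excel",
    ".json": "read_json",
}


def pick_reader(filename):
    """Return the pandas reader name for a filename, or raise ValueError."""
    suffix = Path(filename).suffix.lower()
    if suffix not in READERS:
        raise ValueError(f"Unsupported file type: {suffix or '(none)'}")
    return READERS[suffix]


def load_file(file_obj, filename):
    """Parse an uploaded file (path or file-like object) into a DataFrame."""
    # Imported here so the dispatch logic can be used without pandas installed.
    import pandas as pd

    reader = getattr(pd, pick_reader(filename))
    return reader(file_obj)
```

Keeping the extension lookup separate from the parsing makes it straightforward to reject unsupported uploads before any data is read.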

---

## 📦 Key Libraries Used

| Library | Purpose |
|---|---|
| `langchain` | Agent framework, prompt management |
| `langchain-google-genai` | Gemini LLM integration |
| `streamlit` | Web UI |
| `pandas` | Data loading and manipulation |
| `plotly` | Interactive visualizations |
| `openpyxl` / `xlrd` | Excel file support |

---

## 🔧 Customization Ideas

- Add **PDF support** using `pdfplumber`
- Add **database connection** (SQLite, PostgreSQL)
- Add **export to PowerPoint** for chart reports
- Add **multi-file comparison** mode
- Deploy to **Streamlit Cloud** (free hosting)

---

## 🆓 Free Tier Limits (Gemini 1.5 Flash)
- 15 requests per minute
- 1 million tokens per minute
- 1,500 requests per day

This is more than enough for personal data analysis projects!
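
If you script many questions in a row, a small client-side limiter can keep you under the per-minute request limit. The sketch below is a generic sliding-window limiter and is not part of this project; `SlidingWindowLimiter` is a hypothetical name, and it covers only the requests-per-minute limit, not the token or daily limits.

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Allow at most max_calls within any rolling window of period seconds."""

    def __init__(self, max_calls=15, period=60.0, clock=time.monotonic):
        self.max_calls = max_calls
        self.period = period
        self.clock = clock
        self._calls = deque()  # timestamps of recent calls, oldest first

    def wait_time(self):
        """Seconds until the next call is allowed (0.0 if allowed now)."""
        now = self.clock()
        # Drop timestamps that have left the window.
        while self._calls and now - self._calls[0] >= self.period:
            self._calls.popleft()
        if len(self._calls) < self.max_calls:
            return 0.0
        return self.period - (now - self._calls[0])

    def acquire(self, sleep=time.sleep):
        """Block until a call is allowed, then record it."""
        while (delay := self.wait_time()) > 0:
            sleep(delay)
        self._calls.append(self.clock())
```

Call `limiter.acquire()` immediately before each request to Gemini. The `clock` and `sleep` parameters exist so the limiter can be tested without real waiting.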

---

*Built with ❤️ using LangChain + Google Gemini + Streamlit*