---
license: mit
sdk: streamlit
sdk_version: 1.55.0
---
# 🧠 DataMind Agent
### AI-Powered Data Analyst: LangChain + Gemini + Streamlit
Upload a data file (CSV, Excel, or JSON) and chat with your data in natural language. The agent, powered by Google Gemini, analyzes, visualizes, and explains your data.
---
## Features
| Feature | Description |
|---|---|
| Multi-format support | CSV, Excel (.xlsx/.xls), JSON |
| 💬 Natural language Q&A | Ask anything, get intelligent answers |
| Auto visualizations | AI picks the best chart for your question |
| 🎨 Custom chart builder | Build any chart with dropdown controls |
| Data explorer | Filter, search, and download raw data |
| 🧠 AI data summary | Executive summary generated by Gemini |
---
## Project Structure
```
data-analyst-agent/
├── app.py              # Streamlit UI (main app)
├── core_agent.py       # LangChain + Gemini logic
├── requirements.txt    # Python dependencies
├── .env                # API key config
├── sample_data.csv     # Test dataset (sales data)
└── README.md           # This file
```
---
## ⚙️ Setup & Installation
### Step 1: Clone or download the project
```bash
cd data-analyst-agent
```
### Step 2: Create a virtual environment (recommended)
```bash
python -m venv venv
# On Windows:
venv\Scripts\activate
# On Mac/Linux:
source venv/bin/activate
```
### Step 3: Install dependencies
```bash
pip install -r requirements.txt
```
### Step 4: Get your free Gemini API key
1. Go to [https://aistudio.google.com/app/apikey](https://aistudio.google.com/app/apikey)
2. Sign in with Google
3. Click **"Create API Key"**
4. Copy the key (starts with `AIza...`)
### Step 5: Add your API key
Either paste it into the app sidebar or add it to `.env`:
```
GOOGLE_API_KEY=AIzaYourKeyHere
```
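The two options above imply some precedence between the sidebar value and the `.env` value. The snippet below is a minimal sketch of one way to resolve it, not the code in `app.py`. It assumes the sidebar wins when both are set, and that something such as python-dotenv's `load_dotenv()` has already copied `.env` into the process environment. `resolve_api_key` is a hypothetical helper name.

```python
import os


def resolve_api_key(sidebar_value: str = "", env=None) -> str:
    """Return the sidebar key if one was pasted, else GOOGLE_API_KEY from the environment.

    Assumption: values from .env are already present in os.environ
    (for example via python-dotenv's load_dotenv()).
    """
    env = os.environ if env is None else env
    key = (sidebar_value or "").strip() or env.get("GOOGLE_API_KEY", "").strip()
    if not key:
        raise ValueError(
            "No Gemini API key found: paste one in the sidebar or set GOOGLE_API_KEY in .env"
        )
    return key
```

Failing early with a clear message avoids a harder-to-read authentication error on the first Gemini request.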
### Step 6: Run the app
```bash
streamlit run app.py
```
The app opens at **http://localhost:8501** by default.
---
## 🎯 How to Use
1. **Paste your Gemini API key** in the sidebar
2. **Upload a data file** (CSV, Excel, or JSON)
3. **Dashboard tab**: see auto-generated stats and charts
4. **Chat tab**: ask questions like:
- *"What are the top selling products?"*
- *"Is there a correlation between age and spending?"*
- *"Show me outliers in the sales column"*
5. **Charts tab**: build custom visualizations
6. **Raw Data tab**: filter and download your data
---
## 💡 Example Questions to Ask
```
"What is the average profit by category?"
"Which region has the highest sales?"
"Are there any missing values I should worry about?"
"What trends do you see in the data over time?"
"Which customers are the most valuable?"
"Give me a statistical summary of all numeric columns"
"What correlations exist between the columns?"
```
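For the curious, here is a rough sketch of how a question like the ones above might travel to Gemini through LangChain. It is an illustration only and may differ from what `core_agent.py` actually does. `build_prompt` and `ask_gemini` are hypothetical names, and the prompt wording is an assumption. `ChatGoogleGenerativeAI` comes from the `langchain-google-genai` package.

```python
def build_prompt(question: str, columns: list[str], sample_rows: list[dict]) -> str:
    """Combine a short description of the dataset with the user's question."""
    lines = [
        "You are a data analyst. Answer using only the dataset described below.",
        "Columns: " + ", ".join(columns),
        "Sample rows:",
    ]
    lines += [str(row) for row in sample_rows]
    lines.append("Question: " + question)
    return "\n".join(lines)


def ask_gemini(prompt: str, api_key: str, model: str = "gemini-1.5-flash") -> str:
    """Send the prompt to Gemini through LangChain and return the reply text."""
    # Imported lazily so build_prompt() can be used without the dependency installed.
    from langchain_google_genai import ChatGoogleGenerativeAI

    llm = ChatGoogleGenerativeAI(model=model, google_api_key=api_key)
    return llm.invoke(prompt).content
```

Sending only column names and a few sample rows, rather than the whole file, keeps the request small, which helps with the token limits on the free tier.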
---
## 🏗️ Architecture
```
User (Streamlit UI)
        │
        ▼
app.py (UI Layer)
        │
        ├── core_agent.py
        │     ├── load_file()          → Parses CSV/Excel/JSON → DataFrame
        │     ├── profile_dataframe()  → Statistical profiling
        │     ├── ask_agent()          → LangChain → Gemini → Answer
        │     ├── make_plotly_chart()  → Renders visualizations
        │     └── ai_recommend_chart() → Gemini picks best chart
        │
        └── Google Gemini 1.5 Flash (via LangChain)
```
```
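To make the first two boxes in the diagram more concrete, here is a minimal sketch of what `load_file()` and `profile_dataframe()` could look like. The function names come from the diagram, but the bodies are assumptions for illustration and not the contents of `core_agent.py`. `detect_format` and `FORMATS` are hypothetical helpers.

```python
from pathlib import Path

# Supported extensions (from the Features table) mapped to a format label.
FORMATS = {".csv": "csv", ".xlsx": "excel", ".xls": "excel", ".json": "json"}


def detect_format(filename: str) -> str:
    """Return 'csv', 'excel', or 'json' based on the file extension."""
    suffix = Path(filename).suffix.lower()
    if suffix not in FORMATS:
        raise ValueError(f"Unsupported file type: {suffix or filename}")
    return FORMATS[suffix]


def load_file(path: str):
    """Parse a CSV, Excel, or JSON file into a pandas DataFrame."""
    import pandas as pd  # lazy import: detect_format() needs no third-party packages

    readers = {"csv": pd.read_csv, "excel": pd.read_excel, "json": pd.read_json}
    return readers[detect_format(path)](path)


def profile_dataframe(df) -> dict:
    """Collect basic statistics an LLM prompt or dashboard could use."""
    return {
        "rows": len(df),
        "columns": list(df.columns),
        "missing": df.isna().sum().to_dict(),        # missing values per column
        "numeric_summary": df.describe().to_dict(),  # count, mean, std, quartiles
    }
```

For example, `profile_dataframe(load_file("sample_data.csv"))` would return a plain dictionary that is easy to display in the dashboard or embed in a prompt.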
---
## 📦 Key Libraries Used
| Library | Purpose |
|---|---|
| `langchain` | Agent framework, prompt management |
| `langchain-google-genai` | Gemini LLM integration |
| `streamlit` | Web UI |
| `pandas` | Data loading and manipulation |
| `plotly` | Interactive visualizations |
| `openpyxl` / `xlrd` | Excel file support |
---
## 🔧 Customization Ideas
- Add **PDF support** using `pdfplumber`
- Add **database connection** (SQLite, PostgreSQL)
- Add **export to PowerPoint** for chart reports
- Add **multi-file comparison** mode
- Deploy to **Streamlit Cloud** (free hosting)
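
As a starting point for the database idea, the sketch below reads a SQLite table using only the standard library. It is an assumption about how such a feature might begin, not existing project code. `load_table` is a hypothetical helper.

```python
import sqlite3


def load_table(conn: sqlite3.Connection, table: str) -> list[dict]:
    """Return every row of `table` as a list of dicts keyed by column name."""
    # Table names cannot be bound as SQL parameters, so check the name
    # against the tables that actually exist before interpolating it.
    known = {
        row[0]
        for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
    }
    if table not in known:
        raise ValueError(f"Unknown table: {table}")
    cursor = conn.execute(f'SELECT * FROM "{table}"')
    names = [col[0] for col in cursor.description]
    return [dict(zip(names, row)) for row in cursor]
```

The resulting list of dicts can be passed straight to `pandas.DataFrame(...)`, so the rest of the app could treat a database table like any uploaded file.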
---
## Free Tier Limits (Gemini 1.5 Flash)
- 15 requests per minute
- 1 million tokens per minute
- 1,500 requests per day
This is more than enough for personal data analysis projects!
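
If you script many questions in a row, a small client-side limiter can help you stay under the 15 requests per minute quoted above. The class below is a sketch of a sliding-window limiter. It is not part of the project, and `RateLimiter` is a hypothetical name.

```python
import time
from collections import deque


class RateLimiter:
    """Sliding-window limiter: at most `max_calls` calls per `period` seconds."""

    def __init__(self, max_calls: int = 15, period: float = 60.0, clock=time.monotonic):
        self.max_calls = max_calls
        self.period = period
        self.clock = clock    # injectable so the limiter can be tested without sleeping
        self.calls = deque()  # timestamps of recent calls, oldest first

    def wait_time(self) -> float:
        """Seconds to wait before the next call is allowed (0.0 if allowed now)."""
        now = self.clock()
        # Forget calls that have left the window.
        while self.calls and now - self.calls[0] >= self.period:
            self.calls.popleft()
        if len(self.calls) < self.max_calls:
            return 0.0
        return self.period - (now - self.calls[0])

    def record(self) -> None:
        """Note that a request is being sent now."""
        self.calls.append(self.clock())
```

Before each Gemini request, call `time.sleep(limiter.wait_time())` and then `limiter.record()`.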
---
*Built with ❤️ using LangChain + Google Gemini + Streamlit*