---
license: mit
sdk: streamlit
sdk_version: 1.55.0
---
# 🧠 DataMind Agent
### AI-Powered Data Analyst: LangChain + Gemini + Streamlit
Upload a data file (CSV, Excel, or JSON) and chat with your data in natural language. The agent, powered by Google Gemini, analyzes, visualizes, and explains your data.
---
## 🚀 Features
| Feature | Description |
|---|---|
| 📂 Multi-format support | CSV, Excel (.xlsx/.xls), JSON |
| 💬 Natural language Q&A | Ask anything, get intelligent answers |
| 📊 Auto visualizations | AI picks the best chart for your question |
| 🎨 Custom chart builder | Build any chart with dropdown controls |
| 🔍 Data explorer | Filter, search, and download raw data |
| 🧠 AI data summary | Executive summary generated by Gemini |
---
## 📁 Project Structure
```
data-analyst-agent/
├── app.py              # Streamlit UI (main app)
├── core_agent.py       # LangChain + Gemini logic
├── requirements.txt    # Python dependencies
├── .env                # API key config
├── sample_data.csv     # Test dataset (sales data)
└── README.md           # This file
```
---
## ⚙️ Setup & Installation
### Step 1: Clone / download the project
```bash
cd data-analyst-agent
```
### Step 2: Create a virtual environment (recommended)
```bash
python -m venv venv
# On Windows:
venv\Scripts\activate
# On Mac/Linux:
source venv/bin/activate
```
### Step 3: Install dependencies
```bash
pip install -r requirements.txt
```
### Step 4: Get your free Gemini API key
1. Go to [https://aistudio.google.com/app/apikey](https://aistudio.google.com/app/apikey)
2. Sign in with Google
3. Click **"Create API Key"**
4. Copy the key (starts with `AIza...`)
### Step 5: Add your API key
Either paste it directly into the app sidebar or add it to `.env`:
```
GOOGLE_API_KEY=AIzaYourKeyHere
```
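One way the two options could be reconciled in code is sketched below. The helper names `parse_env_text`, `read_env_file`, and `resolve_api_key`, and the precedence order (sidebar value first, then the process environment, then `.env`), are assumptions for illustration and are not taken from `app.py`; the project may instead rely on a library such as `python-dotenv`.

```python
import os
from pathlib import Path


def parse_env_text(text):
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def read_env_file(path=".env"):
    """Return the parsed contents of a .env file, or {} if it is missing."""
    env_path = Path(path)
    if not env_path.exists():
        return {}
    return parse_env_text(env_path.read_text(encoding="utf-8"))


def resolve_api_key(sidebar_value="", env_path=".env"):
    """Hypothetical precedence: sidebar input, then os.environ, then .env."""
    if sidebar_value.strip():
        return sidebar_value.strip()
    from_environment = os.environ.get("GOOGLE_API_KEY", "").strip()
    if from_environment:
        return from_environment
    return read_env_file(env_path).get("GOOGLE_API_KEY", "")
```

With this arrangement, a key typed into the sidebar overrides whatever is stored in `.env`, which makes it easy to try a different key without editing files.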
### Step 6: Run the app
```bash
streamlit run app.py
```
The app opens at **http://localhost:8501**
---
## 🎯 How to Use
1. **Paste your Gemini API key** in the sidebar
2. **Upload a data file** (CSV, Excel, or JSON)
3. **Dashboard tab** β€” see auto-generated stats and charts
4. **Chat tab** β€” ask questions like:
- *"What are the top selling products?"*
- *"Is there a correlation between age and spending?"*
- *"Show me outliers in the sales column"*
5. **Charts tab** β€” build custom visualizations
6. **Raw Data tab** β€” filter and download your data
---
## 💡 Example Questions to Ask
```
"What is the average profit by category?"
"Which region has the highest sales?"
"Are there any missing values I should worry about?"
"What trends do you see in the data over time?"
"Which customers are the most valuable?"
"Give me a statistical summary of all numeric columns"
"What correlations exist between the columns?"
```
---
## 🏗️ Architecture
```
User (Streamlit UI)
        │
        ▼
app.py (UI Layer)
        │
        ├── core_agent.py
        │     ├── load_file()           → Parses CSV/Excel/JSON → DataFrame
        │     ├── profile_dataframe()   → Statistical profiling
        │     ├── ask_agent()           → LangChain → Gemini → Answer
        │     ├── make_plotly_chart()   → Renders visualizations
        │     └── ai_recommend_chart()  → Gemini picks best chart
        │
        └── Google Gemini 1.5 Flash (via LangChain)
```
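As a rough illustration of the `load_file()` step in the diagram, the sketch below dispatches on the file extension to a pandas reader. The function signature, the `READERS` table, and the `pick_reader` helper are assumptions made for this example; the actual `core_agent.py` may be organized differently.

```python
from pathlib import Path

# Map each supported extension to the name of a pandas reader function.
READERS = {
    ".csv": "read_csv",
    ".xlsx": "read_excel",
    ".xls": "read_excel",
    ".json": "read_json",
}


def pick_reader(filename):
    """Return the pandas reader name for a filename, based on its extension."""
    suffix = Path(filename).suffix.lower()
    if suffix not in READERS:
        raise ValueError(f"Unsupported file type: {filename!r}")
    return READERS[suffix]


def load_file(source, filename):
    """Load a path or file-like object into a DataFrame (hypothetical signature)."""
    # Imported lazily so pick_reader can be used without pandas installed.
    import pandas as pd

    reader = getattr(pd, pick_reader(filename))
    return reader(source)
```

Passing the filename separately from the source is a design choice for uploads, where the data arrives as a file-like object and only the original name carries the extension.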
---
## 📦 Key Libraries Used
| Library | Purpose |
|---|---|
| `langchain` | Agent framework, prompt management |
| `langchain-google-genai` | Gemini LLM integration |
| `streamlit` | Web UI |
| `pandas` | Data loading and manipulation |
| `plotly` | Interactive visualizations |
| `openpyxl` / `xlrd` | Excel file support |
---
## 🔧 Customization Ideas
- Add **PDF support** using `pdfplumber`
- Add **database connection** (SQLite, PostgreSQL)
- Add **export to PowerPoint** for chart reports
- Add **multi-file comparison** mode
- Deploy to **Streamlit Cloud** (free hosting)
---
## 🆓 Free Tier Limits (Gemini 1.5 Flash)
- 15 requests per minute
- 1 million tokens per minute
- 1,500 requests per day
This is more than enough for personal data analysis projects!
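If a script sends many questions in a loop, a small client-side limiter can help stay under the 15 requests per minute figure listed above. The sketch below is a generic sliding-window limiter; the class name `SlidingWindowLimiter` and its use here are hypothetical and not part of this project.

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Allow at most max_calls recorded calls within any window of period seconds."""

    def __init__(self, max_calls=15, period=60.0, clock=time.monotonic):
        self.max_calls = max_calls
        self.period = period
        self.clock = clock  # injectable so the limiter can be tested without sleeping
        self.calls = deque()  # timestamps of recent calls, oldest first

    def wait_time(self):
        """Return seconds to wait before the next call is allowed (0.0 if allowed now)."""
        now = self.clock()
        # Forget calls that have fallen out of the window.
        while self.calls and now - self.calls[0] >= self.period:
            self.calls.popleft()
        if len(self.calls) < self.max_calls:
            return 0.0
        return self.period - (now - self.calls[0])

    def record(self):
        """Record that a call is being made now."""
        self.calls.append(self.clock())
```

A caller would check `wait_time()`, sleep for that long if it is positive, then call `record()` just before sending the request.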
---
*Built with ❤️ using LangChain + Google Gemini + Streamlit*