metadata
license: mit
sdk: streamlit
sdk_version: 1.55.0
DataMind Agent
AI-Powered Data Analyst: LangChain + Gemini + Streamlit
Upload a data file (CSV, Excel, or JSON) and chat with your data in natural language. The agent uses Google Gemini to analyze, visualize, and explain your data.
Features
| Feature | Description |
|---|---|
| Multi-format support | CSV, Excel (.xlsx/.xls), JSON |
| Natural language Q&A | Ask anything, get intelligent answers |
| Auto visualizations | AI picks the best chart for your question |
| Custom chart builder | Build any chart with dropdown controls |
| Data explorer | Filter, search, and download raw data |
| AI data summary | Executive summary generated by Gemini |
Project Structure
data-analyst-agent/
├── app.py # Streamlit UI (main app)
├── core_agent.py # LangChain + Gemini logic
├── requirements.txt # Python dependencies
├── .env # API key config
├── sample_data.csv # Test dataset (sales data)
└── README.md # This file
Setup & Installation
Step 1: Clone or download the project
cd data-analyst-agent
Step 2: Create a virtual environment (recommended)
python -m venv venv
# On Windows:
venv\Scripts\activate
# On Mac/Linux:
source venv/bin/activate
Step 3: Install dependencies
pip install -r requirements.txt
Step 4: Get your free Gemini API key
- Go to https://aistudio.google.com/app/apikey
- Sign in with Google
- Click "Create API Key"
- Copy the key (starts with AIza...)
Step 5: Add your API key
Either paste it directly into the app sidebar or add it to .env:
GOOGLE_API_KEY=AIzaYourKeyHere
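One way the two options in Step 5 could be reconciled in code is sketched below. This is a minimal, standard-library-only illustration, not the app's actual implementation: the helper names `read_env_file` and `resolve_api_key` are hypothetical, and the real app may instead rely on a package such as python-dotenv.

```python
import os
from pathlib import Path


def read_env_file(path=".env"):
    """Parse simple KEY=VALUE lines from a .env file into a dict."""
    values = {}
    env_path = Path(path)
    if not env_path.exists():
        return values
    for raw_line in env_path.read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        # Skip blank lines, comments, and anything that is not KEY=VALUE.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def resolve_api_key(sidebar_value="", env_path=".env"):
    """Prefer the key typed in the sidebar, then the environment, then .env."""
    if sidebar_value and sidebar_value.strip():
        return sidebar_value.strip()
    from_environment = os.environ.get("GOOGLE_API_KEY", "").strip()
    if from_environment:
        return from_environment
    return read_env_file(env_path).get("GOOGLE_API_KEY", "")
```

Giving the sidebar value priority means a key pasted into the UI overrides whatever is stored in .env, which is the assumed (not confirmed) precedence here.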
Step 6: Run the app
streamlit run app.py
The app opens at http://localhost:8501
How to Use
- Paste your Gemini API key in the sidebar
- Upload a data file (CSV, Excel, or JSON)
- Dashboard tab: see auto-generated stats and charts
- Chat tab: ask questions like:
- "What are the top selling products?"
- "Is there a correlation between age and spending?"
- "Show me outliers in the sales column"
- Charts tab: build custom visualizations
- Raw Data tab: filter and download your data
Example Questions to Ask
"What is the average profit by category?"
"Which region has the highest sales?"
"Are there any missing values I should worry about?"
"What trends do you see in the data over time?"
"Which customers are the most valuable?"
"Give me a statistical summary of all numeric columns"
"What correlations exist between the columns?"
Architecture
User (Streamlit UI)
        │
        ▼
app.py (UI Layer)
        │
        ├── core_agent.py
        │     ├── load_file()          → Parses CSV/Excel/JSON → DataFrame
        │     ├── profile_dataframe()  → Statistical profiling
        │     ├── ask_agent()          → LangChain → Gemini → Answer
        │     ├── make_plotly_chart()  → Renders visualizations
        │     └── ai_recommend_chart() → Gemini picks best chart
        │
        └── Google Gemini 1.5 Flash (via LangChain)
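To make the `load_file()` step in the diagram more concrete, here is a rough sketch of how a loader might dispatch on file extension. The body of the real `load_file()` in core_agent.py is not shown in this README, so the signature, the `READERS` table, and the `pick_reader` helper below are assumptions made purely for illustration. The pandas readers used (`read_csv`, `read_excel`, `read_json`) are standard pandas functions.

```python
from pathlib import Path

# Map each supported extension to the name of the pandas reader that handles it.
READERS = {
    ".csv": "read_csv",
    ".xlsx": "read_excel",
    ".xls": "read_excel",
    ".json": "read_json",
}


def pick_reader(filename):
    """Return the pandas reader name for a file, based on its extension."""
    suffix = Path(filename).suffix.lower()
    if suffix not in READERS:
        raise ValueError(f"Unsupported file type: {suffix or filename}")
    return READERS[suffix]


def load_file(source, filename):
    """Load a path or file-like object into a DataFrame (hypothetical signature)."""
    import pandas as pd  # imported lazily so pick_reader works without pandas

    reader = getattr(pd, pick_reader(filename))
    return reader(source)
```

With a Streamlit upload, this could be called as `load_file(uploaded, uploaded.name)`, where `uploaded` is the object returned by `st.file_uploader`. Rejecting unknown extensions up front keeps the error message readable instead of surfacing a parser failure.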
Key Libraries Used
| Library | Purpose |
|---|---|
| langchain | Agent framework, prompt management |
| langchain-google-genai | Gemini LLM integration |
| streamlit | Web UI |
| pandas | Data loading and manipulation |
| plotly | Interactive visualizations |
| openpyxl / xlrd | Excel file support |
Customization Ideas
- Add PDF support using pdfplumber
- Add database connection (SQLite, PostgreSQL)
- Add export to PowerPoint for chart reports
- Add multi-file comparison mode
- Deploy to Streamlit Cloud (free hosting)
Free Tier Limits (Gemini 1.5 Flash)
- 15 requests per minute
- 1 million tokens per minute
- 1,500 requests per day
These limits are more than enough for personal data analysis projects.
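If you script many questions in a row, a small client-side throttle can help stay under the per-minute request limit listed above. The sketch below is a generic sliding-window limiter, not something this project ships: the `RequestThrottle` class is hypothetical, and its default of 15 calls per 60 seconds simply mirrors the figure in this section.

```python
import time
from collections import deque


class RequestThrottle:
    """Sliding-window limiter: at most max_calls per period (in seconds)."""

    def __init__(self, max_calls=15, period=60.0,
                 clock=time.monotonic, sleep=time.sleep):
        self.max_calls = max_calls
        self.period = period
        self._clock = clock
        self._sleep = sleep
        self._calls = deque()  # timestamps of recent calls, oldest first

    def wait_time(self):
        """Return how many seconds to wait before the next call is allowed."""
        now = self._clock()
        # Drop timestamps that have left the window.
        while self._calls and now - self._calls[0] >= self.period:
            self._calls.popleft()
        if len(self._calls) < self.max_calls:
            return 0.0
        return self.period - (now - self._calls[0])

    def acquire(self):
        """Block until a call is allowed, then record it."""
        while True:
            delay = self.wait_time()
            if delay <= 0:
                break
            self._sleep(delay)
        self._calls.append(self._clock())
```

Calling `throttle.acquire()` immediately before each model request is enough. The clock and sleep functions are injectable so the behavior can be checked without real waiting.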
Built with ❤️ using LangChain + Google Gemini + Streamlit