---
license: mit
sdk: streamlit
sdk_version: 1.55.0
---

# 🧠 DataMind Agent

### AI-Powered Data Analyst: LangChain + Gemini + Streamlit

Upload any data file (CSV, Excel, JSON) and chat with your data using natural language.
The agent analyzes, visualizes, and explains your data, powered by Google Gemini.

---

## 🚀 Features

| Feature | Description |
|---|---|
| 📂 Multi-format support | CSV, Excel (.xlsx/.xls), JSON |
| 💬 Natural language Q&A | Ask anything, get intelligent answers |
| 📊 Auto visualizations | AI picks the best chart for your question |
| 🎨 Custom chart builder | Build any chart with dropdown controls |
| 🔍 Data explorer | Filter, search, and download raw data |
| 🧠 AI data summary | Executive summary generated by Gemini |

---

## 📁 Project Structure

```
data-analyst-agent/
├── app.py              # Streamlit UI (main app)
├── core_agent.py       # LangChain + Gemini logic
├── requirements.txt    # Python dependencies
├── .env                # API key config
├── sample_data.csv     # Test dataset (sales data)
└── README.md           # This file
```

---

## ⚙️ Setup & Installation

### Step 1: Clone / download the project

```bash
cd data-analyst-agent
```

### Step 2: Create a virtual environment (recommended)

```bash
python -m venv venv

# On Windows:
venv\Scripts\activate

# On Mac/Linux:
source venv/bin/activate
```

### Step 3: Install dependencies

```bash
pip install -r requirements.txt
```

### Step 4: Get your free Gemini API key

1. Go to [https://aistudio.google.com/app/apikey](https://aistudio.google.com/app/apikey)
2. Sign in with Google
3. Click **"Create API Key"**
4. Copy the key (starts with `AIza...`)

### Step 5: Add your API key

Either paste it directly in the app sidebar, or add it to `.env`:

```
GOOGLE_API_KEY=AIzaYourKeyHere
```

### Step 6: Run the app

```bash
streamlit run app.py
```

The app opens at **http://localhost:8501**

---

## 🎯 How to Use

1. **Paste your Gemini API key** in the sidebar
2. **Upload a data file** (CSV, Excel, or JSON)
3. **Dashboard tab**: see auto-generated stats and charts
4. **Chat tab**: ask questions like:
   - *"What are the top selling products?"*
   - *"Is there a correlation between age and spending?"*
   - *"Show me outliers in the sales column"*
5. **Charts tab**: build custom visualizations
6. **Raw Data tab**: filter and download your data

---

## 💡 Example Questions to Ask

```
"What is the average profit by category?"
"Which region has the highest sales?"
"Are there any missing values I should worry about?"
"What trends do you see in the data over time?"
"Which customers are the most valuable?"
"Give me a statistical summary of all numeric columns"
"What correlations exist between the columns?"
```

---

## 🏗️ Architecture

```
User (Streamlit UI)
        │
        ▼
app.py (UI Layer)
        │
        ├── core_agent.py
        │     ├── load_file()          → Parses CSV/Excel/JSON → DataFrame
        │     ├── profile_dataframe()  → Statistical profiling
        │     ├── ask_agent()          → LangChain → Gemini → Answer
        │     ├── make_plotly_chart()  → Renders visualizations
        │     └── ai_recommend_chart() → Gemini picks best chart
        │
        └── Google Gemini 1.5 Flash (via LangChain)
```

---

## 📦 Key Libraries Used

| Library | Purpose |
|---|---|
| `langchain` | Agent framework, prompt management |
| `langchain-google-genai` | Gemini LLM integration |
| `streamlit` | Web UI |
| `pandas` | Data loading and manipulation |
| `plotly` | Interactive visualizations |
| `openpyxl` / `xlrd` | Excel file support |

---

## 🔧 Customization Ideas

- Add **PDF support** using `pdfplumber`
- Add **database connection** (SQLite, PostgreSQL)
- Add **export to PowerPoint** for chart reports
- Add **multi-file comparison** mode
- Deploy to **Streamlit Cloud** (free hosting)

---

## 🆓 Free Tier Limits (Gemini 1.5 Flash)

- 15 requests per minute
- 1 million tokens per minute
- 1,500 requests per day

This is more than enough for personal data analysis projects!

---

*Built with ❤️ using LangChain + Google Gemini + Streamlit*
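
---

## 🔑 Appendix: Resolving the API Key (sketch)

Step 5 offers two ways to supply the key: the app sidebar or the `.env` file. The sketch below shows one way the two could be combined. It is not taken from `app.py`: the helper names `read_env_file` and `resolve_api_key` are hypothetical, and the precedence order (sidebar first, then the process environment, then `.env`) is an assumption. Only the variable name `GOOGLE_API_KEY` comes from this README.

```python
import os
from pathlib import Path
from typing import Optional


def read_env_file(path: str = ".env") -> dict:
    """Parse simple KEY=VALUE lines, skipping blanks and '#' comments."""
    values = {}
    env_path = Path(path)
    if not env_path.is_file():
        return values
    for raw in env_path.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def resolve_api_key(sidebar_value: str = "", env_path: str = ".env") -> Optional[str]:
    """Return the first non-empty key found, or None if no source has one.

    Assumed precedence: sidebar input, process environment, then the .env file.
    """
    candidates = (
        sidebar_value,
        os.environ.get("GOOGLE_API_KEY", ""),
        read_env_file(env_path).get("GOOGLE_API_KEY", ""),
    )
    for candidate in candidates:
        if candidate and candidate.strip():
            return candidate.strip()
    return None
```

In a Streamlit app, the sidebar value would typically come from something like `st.sidebar.text_input(..., type="password")`, and a `None` result would be the cue to prompt the user for a key before calling Gemini.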