---
license: mit
sdk: streamlit
sdk_version: 1.55.0
---
# 🧠 DataMind Agent
### AI-Powered Data Analyst: LangChain + Gemini + Streamlit

Upload a data file (CSV, Excel, or JSON) and chat with your data in natural language. The agent, powered by Google Gemini, analyzes, visualizes, and explains your data.

---
## Features

| Feature | Description |
|---|---|
| Multi-format support | CSV, Excel (.xlsx/.xls), JSON |
| 💬 Natural language Q&A | Ask anything, get intelligent answers |
| Auto visualizations | AI picks the best chart for your question |
| 🎨 Custom chart builder | Build any chart with dropdown controls |
| Data explorer | Filter, search, and download raw data |
| 🧠 AI data summary | Executive summary generated by Gemini |

---
## Project Structure

```
data-analyst-agent/
├── app.py              # Streamlit UI (main app)
├── core_agent.py       # LangChain + Gemini logic
├── requirements.txt    # Python dependencies
├── .env                # API key config
├── sample_data.csv     # Test dataset (sales data)
└── README.md           # This file
```

---
## ⚙️ Setup & Installation

### Step 1: Clone / download the project
```bash
cd data-analyst-agent
```

### Step 2: Create a virtual environment (recommended)
```bash
python -m venv venv
# On Windows:
venv\Scripts\activate
# On Mac/Linux:
source venv/bin/activate
```

### Step 3: Install dependencies
```bash
pip install -r requirements.txt
```

### Step 4: Get your free Gemini API key
1. Go to [https://aistudio.google.com/app/apikey](https://aistudio.google.com/app/apikey)
2. Sign in with Google
3. Click **"Create API Key"**
4. Copy the key (starts with `AIza...`)

### Step 5: Add your API key
Either paste it directly into the app sidebar or add it to `.env`:
```
GOOGLE_API_KEY=AIzaYourKeyHere
```
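
As a rough illustration of how the two options in Step 5 could be reconciled, the sketch below prefers the sidebar value and falls back to the `GOOGLE_API_KEY` environment variable and then to `.env`. The helper names `read_env_file` and `resolve_api_key` are hypothetical and do not come from `app.py`; the project itself may load `.env` differently (for example with `python-dotenv`).

```python
import os
from pathlib import Path


def read_env_file(path=".env"):
    """Parse simple KEY=VALUE lines from a .env file; a missing file yields {}."""
    values = {}
    env_path = Path(path)
    if not env_path.is_file():
        return values
    for line in env_path.read_text(encoding="utf-8").splitlines():
        line = line.strip()
        # Skip blanks, comments, and lines without an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip("'\"")
    return values


def resolve_api_key(sidebar_value="", env_path=".env"):
    """Return the first non-empty key: sidebar input, process env, then .env."""
    candidates = [
        sidebar_value,
        os.environ.get("GOOGLE_API_KEY", ""),
        read_env_file(env_path).get("GOOGLE_API_KEY", ""),
    ]
    for candidate in candidates:
        if candidate and candidate.strip():
            return candidate.strip()
    return None
```

Giving the sidebar priority is a design assumption: it lets a user override a stale key in `.env` without editing the file.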

### Step 6: Run the app
```bash
streamlit run app.py
```
The app opens at **http://localhost:8501**

---
## 🎯 How to Use

1. **Paste your Gemini API key** in the sidebar
2. **Upload a data file** (CSV, Excel, or JSON)
3. **Dashboard tab**: see auto-generated stats and charts
4. **Chat tab**: ask questions like:
   - *"What are the top selling products?"*
   - *"Is there a correlation between age and spending?"*
   - *"Show me outliers in the sales column"*
5. **Charts tab**: build custom visualizations
6. **Raw Data tab**: filter and download your data

---
## 💡 Example Questions to Ask

```
"What is the average profit by category?"
"Which region has the highest sales?"
"Are there any missing values I should worry about?"
"What trends do you see in the data over time?"
"Which customers are the most valuable?"
"Give me a statistical summary of all numeric columns"
"What correlations exist between the columns?"
```
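
For orientation, here is roughly what a few of these questions correspond to in plain pandas. This is only a sketch for checking the agent's answers by hand, not a description of what the agent runs internally. The rows and column names (`category`, `region`, `sales`, `profit`) are made up for illustration and are not taken from `sample_data.csv`.

```python
import pandas as pd

# Hypothetical rows; the column names are illustrative only.
df = pd.DataFrame(
    {
        "category": ["Office", "Office", "Tech", "Tech", "Home"],
        "region": ["East", "West", "East", "West", "East"],
        "sales": [120.0, 80.0, 300.0, None, 50.0],
        "profit": [30.0, 10.0, 90.0, 60.0, 5.0],
    }
)

# "What is the average profit by category?"
avg_profit = df.groupby("category")["profit"].mean()

# "Which region has the highest sales?" (missing values are skipped by sum)
top_region = df.groupby("region")["sales"].sum().idxmax()

# "Are there any missing values I should worry about?"
missing = df.isna().sum()

# "What correlations exist between the columns?" (numeric columns only)
correlations = df[["sales", "profit"]].corr()

print(avg_profit, top_region, missing, correlations, sep="\n\n")
```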

---

## 🏗️ Architecture

```
User (Streamlit UI)
        │
        ▼
app.py (UI Layer)
        │
        ├── core_agent.py
        │       ├── load_file()          → Parses CSV/Excel/JSON → DataFrame
        │       ├── profile_dataframe()  → Statistical profiling
        │       ├── ask_agent()          → LangChain → Gemini → Answer
        │       ├── make_plotly_chart()  → Renders visualizations
        │       └── ai_recommend_chart() → Gemini picks best chart
        │
        └── Google Gemini 1.5 Flash (via LangChain)
```
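
To make the first two boxes of the diagram more concrete, the following is a minimal sketch of what `load_file()` and `profile_dataframe()` might look like. The signatures, return shapes, and error handling are assumptions made for illustration; the real implementations in `core_agent.py` may differ.

```python
import io
from pathlib import Path

import pandas as pd


def load_file(name, data):
    """Parse raw bytes into a DataFrame, picking the reader from the extension."""
    suffix = Path(name).suffix.lower()
    buffer = io.BytesIO(data)
    if suffix == ".csv":
        return pd.read_csv(buffer)
    if suffix in (".xlsx", ".xls"):
        # pandas delegates to openpyxl (.xlsx) or xlrd (.xls) for Excel files.
        return pd.read_excel(buffer)
    if suffix == ".json":
        return pd.read_json(buffer)
    raise ValueError(f"Unsupported file type: {suffix or name}")


def profile_dataframe(df):
    """Collect basic facts that could be passed to the model as context."""
    return {
        "rows": len(df),
        "columns": list(df.columns),
        "dtypes": {col: str(dtype) for col, dtype in df.dtypes.items()},
        "missing": {col: int(n) for col, n in df.isna().sum().items()},
        "numeric_summary": df.describe().to_dict(),
    }
```

In a Streamlit app, the object returned by `st.file_uploader` exposes `.name` and `.getvalue()`, so a call such as `load_file(uploaded.name, uploaded.getvalue())` would fit this sketch.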

---

## 📦 Key Libraries Used

| Library | Purpose |
|---|---|
| `langchain` | Agent framework, prompt management |
| `langchain-google-genai` | Gemini LLM integration |
| `streamlit` | Web UI |
| `pandas` | Data loading and manipulation |
| `plotly` | Interactive visualizations |
| `openpyxl` / `xlrd` | Excel file support |

---
## 🔧 Customization Ideas

- Add **PDF support** using `pdfplumber`
- Add **database connection** (SQLite, PostgreSQL)
- Add **export to PowerPoint** for chart reports
- Add **multi-file comparison** mode
- Deploy to **Streamlit Cloud** (free hosting)

---
## Free Tier Limits (Gemini 1.5 Flash)

- 15 requests per minute
- 1 million tokens per minute
- 1,500 requests per day

This is more than enough for personal data analysis projects!
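
If the app is scripted or shared between several users, a small client-side throttle can help stay under the request limits listed above. The sketch below is one possible approach using a sliding window. The `SlidingWindowLimiter` class and `throttle()` helper are hypothetical and not part of this project, and the sketch does not track the tokens-per-minute limit.

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Allow at most `max_calls` within any rolling `period` seconds."""

    def __init__(self, max_calls, period, clock=time.monotonic):
        self.max_calls = max_calls
        self.period = period
        self.clock = clock
        self.calls = deque()

    def wait_time(self):
        """Seconds until the next call is allowed (0.0 if allowed now)."""
        now = self.clock()
        # Drop timestamps that have left the window.
        while self.calls and now - self.calls[0] >= self.period:
            self.calls.popleft()
        if len(self.calls) < self.max_calls:
            return 0.0
        return self.period - (now - self.calls[0])

    def record(self):
        """Register that a request was just sent."""
        self.calls.append(self.clock())


# Request limits quoted in the list above.
per_minute = SlidingWindowLimiter(max_calls=15, period=60)
per_day = SlidingWindowLimiter(max_calls=1500, period=24 * 60 * 60)


def throttle():
    """Sleep until both limits permit a request, then record it."""
    delay = max(per_minute.wait_time(), per_day.wait_time())
    if delay > 0:
        time.sleep(delay)
    per_minute.record()
    per_day.record()
```

Calling `throttle()` immediately before each model request keeps the bookkeeping in one place. The injectable `clock` argument exists only to make the limiter easy to test without real waiting.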

---

*Built with ❤️ using LangChain + Google Gemini + Streamlit*