license: mit
sdk: streamlit
sdk_version: 1.55.0

🧠 DataMind Agent

AI-Powered Data Analyst: LangChain + Gemini + Streamlit

Upload a data file (CSV, Excel, or JSON) and chat with your data in natural language. The agent, powered by Google Gemini, analyzes, visualizes, and explains your data.


🚀 Features

| Feature | Description |
|---|---|
| 📂 Multi-format support | CSV, Excel (.xlsx/.xls), JSON |
| 💬 Natural language Q&A | Ask anything, get intelligent answers |
| 📊 Auto visualizations | AI picks the best chart for your question |
| 🎨 Custom chart builder | Build any chart with dropdown controls |
| 🔍 Data explorer | Filter, search, and download raw data |
| 🧠 AI data summary | Executive summary generated by Gemini |

📁 Project Structure

data-analyst-agent/
├── app.py              # Streamlit UI (main app)
├── core_agent.py       # LangChain + Gemini logic
├── requirements.txt    # Python dependencies
├── .env                # API key config
├── sample_data.csv     # Test dataset (sales data)
└── README.md           # This file

⚙️ Setup & Installation

Step 1: Clone / download the project

cd data-analyst-agent

Step 2: Create a virtual environment (recommended)

python -m venv venv

# On Windows:
venv\Scripts\activate

# On Mac/Linux:
source venv/bin/activate

Step 3: Install dependencies

pip install -r requirements.txt

Step 4: Get your free Gemini API key

  1. Go to https://aistudio.google.com/app/apikey
  2. Sign in with Google
  3. Click "Create API Key"
  4. Copy the key (starts with AIza...)

Step 5: Add your API key

Either paste it directly into the app sidebar or add it to .env:

GOOGLE_API_KEY=AIzaYourKeyHere
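
For reference, here is a minimal, dependency-free sketch of how a `KEY=VALUE` file like this can be read. This README does not show how `app.py` actually loads the key (many projects use the python-dotenv package for this), so `parse_env_text` and `get_api_key` are hypothetical helper names used only for illustration.

```python
import os
from typing import Dict, Optional


def parse_env_text(text: str) -> Dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    values: Dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Strip optional surrounding quotes from the value.
        values[key.strip()] = value.strip().strip("'\"")
    return values


def get_api_key(env_text: str = "") -> Optional[str]:
    """Prefer a key already set in the environment, then fall back to .env text."""
    from_environment = os.environ.get("GOOGLE_API_KEY")
    return from_environment or parse_env_text(env_text).get("GOOGLE_API_KEY")
```

In practice you would pass the contents of `.env` (for example `open(".env").read()`) as `env_text`. Keep `.env` out of version control so the key is not published.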

Step 6: Run the app

streamlit run app.py

The app opens at http://localhost:8501 by default.


🎯 How to Use

  1. Paste your Gemini API key in the sidebar
  2. Upload a data file (CSV, Excel, or JSON)
  3. Dashboard tab: see auto-generated stats and charts
  4. Chat tab β€” ask questions like:
    • "What are the top selling products?"
    • "Is there a correlation between age and spending?"
    • "Show me outliers in the sales column"
  5. Charts tab: build custom visualizations
  6. Raw Data tab: filter and download your data
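
To give a feel for what a question like "Show me outliers in the sales column" can mean numerically, here is a small sketch of the common 1.5 × IQR rule. This is only one possible definition of an outlier. The README does not say which method the agent or Gemini applies, and `find_outliers_iqr` and the sample values are made up for illustration.

```python
import statistics
from typing import List


def find_outliers_iqr(values: List[float], k: float = 1.5) -> List[float]:
    """Return values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    # quantiles(n=4) returns the three quartile cut points Q1, Q2, Q3.
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]


sales = [10, 12, 11, 13, 12, 11, 95]  # made-up sample values
print(find_outliers_iqr(sales))  # prints [95]
```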

💡 Example Questions to Ask

"What is the average profit by category?"
"Which region has the highest sales?"
"Are there any missing values I should worry about?"
"What trends do you see in the data over time?"
"Which customers are the most valuable?"
"Give me a statistical summary of all numeric columns"
"What correlations exist between the columns?"
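
Questions about missing values and statistical summaries come down to simple per-column calculations. The sketch below shows the idea using only the standard library. It is not the app's `profile_dataframe()`, whose code is not shown in this README and which presumably works on a pandas DataFrame. `summarize_csv` is a hypothetical name.

```python
import csv
import io
import statistics
from typing import Dict


def summarize_csv(text: str) -> Dict[str, dict]:
    """Per column: count missing cells and, for numeric columns, mean/min/max."""
    reader = csv.DictReader(io.StringIO(text))
    rows = list(reader)
    summary: Dict[str, dict] = {}
    for column in reader.fieldnames or []:
        cells = [row[column] for row in rows]
        present = [c for c in cells if c not in ("", None)]
        info = {"missing": len(cells) - len(present)}
        try:
            numbers = [float(c) for c in present]
        except ValueError:
            numbers = []  # non-numeric column: report the missing count only
        if numbers:
            info.update(mean=statistics.mean(numbers), min=min(numbers), max=max(numbers))
        summary[column] = info
    return summary


sample = "region,sales\nNorth,100\nSouth,\nEast,300\n"  # made-up data
print(summarize_csv(sample))
```

Here the `sales` column has one missing cell and a mean of 200.0, while `region` is treated as non-numeric and only gets a missing count.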

🏗️ Architecture

User (Streamlit UI)
       │
       ▼
  app.py (UI Layer)
       │
       ├── core_agent.py
       │       ├── load_file()          → Parses CSV/Excel/JSON → DataFrame
       │       ├── profile_dataframe()  → Statistical profiling
       │       ├── ask_agent()          → LangChain → Gemini → Answer
       │       ├── make_plotly_chart()  → Renders visualizations
       │       └── ai_recommend_chart() → Gemini picks best chart
       │
       └── Google Gemini 1.5 Flash (via LangChain)
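
The first step in this flow, choosing a parser from the file type, can be sketched as follows. This is a simplified stand-in that assumes dispatch by file extension and returns plain row dicts instead of a DataFrame. The real `load_file()` in `core_agent.py` is not shown here and may work differently, and `load_records` is a hypothetical name.

```python
import csv
import io
import json
from pathlib import Path
from typing import Dict, List


def load_records(filename: str, raw: bytes) -> List[Dict[str, object]]:
    """Parse uploaded bytes into a list of row dicts, chosen by file extension."""
    suffix = Path(filename).suffix.lower()
    if suffix == ".csv":
        return list(csv.DictReader(io.StringIO(raw.decode("utf-8"))))
    if suffix == ".json":
        data = json.loads(raw.decode("utf-8"))
        # Accept either a list of records or a single object.
        return data if isinstance(data, list) else [data]
    if suffix in (".xlsx", ".xls"):
        # Excel needs a third-party reader (this README lists openpyxl / xlrd).
        raise NotImplementedError("Excel parsing is out of scope for this sketch")
    raise ValueError(f"Unsupported file type: {suffix or filename}")


print(load_records("sales.csv", b"region,sales\nNorth,100\n"))
```

Rejecting unknown extensions early with a clear error keeps the later profiling and chat steps from running on data that was never parsed.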

📦 Key Libraries Used

| Library | Purpose |
|---|---|
| langchain | Agent framework, prompt management |
| langchain-google-genai | Gemini LLM integration |
| streamlit | Web UI |
| pandas | Data loading and manipulation |
| plotly | Interactive visualizations |
| openpyxl / xlrd | Excel file support |

🔧 Customization Ideas

  • Add PDF support using pdfplumber
  • Add database connection (SQLite, PostgreSQL)
  • Add export to PowerPoint for chart reports
  • Add multi-file comparison mode
  • Deploy to Streamlit Cloud (free hosting)

🆓 Free Tier Limits (Gemini 1.5 Flash)

  • 15 requests per minute
  • 1 million tokens per minute
  • 1,500 requests per day

This is more than enough for personal data analysis projects!
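
If you script many questions in a row, a small client-side limiter can help you stay under the requests-per-minute figure above. The sketch below is a generic sliding-window limiter, not something this README says the app includes. `SlidingWindowLimiter` is a hypothetical class, and the injectable `clock` parameter exists only to make it easy to test.

```python
import time
from collections import deque
from typing import Callable, Deque


class SlidingWindowLimiter:
    """Allow at most max_calls within any rolling window of period seconds."""

    def __init__(self, max_calls: int = 15, period: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.max_calls = max_calls
        self.period = period
        self.clock = clock
        self._stamps: Deque[float] = deque()

    def try_acquire(self) -> bool:
        """Record a call and return True if it fits in the window, else False."""
        now = self.clock()
        # Drop timestamps that have aged out of the window.
        while self._stamps and now - self._stamps[0] >= self.period:
            self._stamps.popleft()
        if len(self._stamps) < self.max_calls:
            self._stamps.append(now)
            return True
        return False


limiter = SlidingWindowLimiter()  # defaults mirror the 15 requests per minute above
if limiter.try_acquire():
    print("OK to send a request")
```

When `try_acquire()` returns False, the caller can wait briefly and retry instead of sending a request that would be rejected.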


Built with ❤️ using LangChain + Google Gemini + Streamlit