Spaces:
Running
Running
Pulastya B
fix: Support both GOOGLE_API_KEY and GEMINI_API_KEY env vars for HuggingFace compatibility
fde5dd3
metadata
title: DevSprint Data Science Agent
emoji: π€
colorFrom: blue
colorTo: purple
sdk: docker
pinned: false
license: mit
app_port: 7860
DevSprint Data Science Agent π€
An intelligent AI agent for automated data science workflows, powered by Google Gemini 2.5 Flash with 82+ specialized tools for data analysis, visualization, and machine learning.
Features
- π Automated EDA: YData profiling, statistical analysis, data quality reports
- π Smart Visualizations: Plotly dashboards, matplotlib plots, interactive charts
- π§Ή Data Cleaning: Missing value handling, outlier detection, type conversion
- π οΈ Feature Engineering: Automated feature creation, encoding, scaling
- π€ ML Training: AutoML with XGBoost, LightGBM, CatBoost, Neural Networks
- π¬ Natural Language Interface: Chat-based interaction for complex workflows
- π Business Intelligence: KPI tracking, trend analysis, forecasting
Tech Stack
- Backend: FastAPI + Python 3.12
- LLM: Google Gemini 2.5 Flash (text-based tool calling)
- Data Processing: Polars (high-performance dataframes)
- Frontend: React 19 + TypeScript + Vite
- ML Libraries: Scikit-learn, XGBoost, LightGBM, CatBoost, PyTorch
Usage
- Upload your CSV/Excel dataset
- Ask questions in natural language (e.g., "Generate a detailed profiling report")
- The agent automatically selects and executes the right tools
- View generated reports, visualizations, and insights
Memory Optimization
For large datasets (>50k rows or >10MB), the agent automatically:
- Samples to 50,000 rows for profiling
- Enables minimal mode to reduce memory usage
- Disables expensive correlation/interaction calculations
This ensures smooth operation even with large datasets on HuggingFace's 16GB RAM.
Environment Variables
Set these in HuggingFace Spaces settings (Settings β Repository secrets):
Required:
GEMINI_API_KEY- Your Google Gemini API key (get from https://aistudio.google.com/app/apikey)LLM_PROVIDER- Set togeminito use Gemini (orgroqif you have Groq API key)
Optional:
GROQ_API_KEY- Only if using Groq provider instead of Gemini
Note: The code supports both GOOGLE_API_KEY and GEMINI_API_KEY environment variable names.
Local Development
# Clone repository
git clone https://huggingface.co/spaces/YOUR_USERNAME/devs-print-data-science-agent
cd devs-print-data-science-agent
# Install dependencies
pip install -r requirements.txt
npm install --prefix FRRONTEEEND
# Build frontend
cd FRRONTEEEND && npm run build && cd ..
# Set API key
export GEMINI_API_KEY=your_key_here
# Run server
uvicorn src.api.app:app --host 0.0.0.0 --port 7860
Architecture
βββββββββββββββββββ
β React Frontend β β User uploads data + asks questions
ββββββββββ¬βββββββββ
β
ββββββββββΌβββββββββ
β FastAPI Server β β Serves frontend + API endpoints
ββββββββββ¬βββββββββ
β
ββββββββββΌβββββββββ
β Orchestrator β β LLM-driven tool selection & execution
ββββββββββ¬βββββββββ
β
ββββββββββΌβββββββββ
β 82+ Tools β β Specialized data science functions
βββββββββββββββββββ
Key Components
- Orchestrator (src/orchestrator.py): ReAct-based tool calling with Gemini
- Tools Registry (src/tools/): 82+ specialized data science tools
- Session Memory (src/session_memory.py): Conversation history + file tracking
- Artifact Store (src/storage/artifact_store.py): File management + metadata
Deployment
This Space uses a Docker deployment for maximum compatibility:
- Base image:
python:3.12-slim - Multi-stage build (Node.js for frontend, Python for backend)
- Auto-exposes port 7860 for HuggingFace
- All dependencies bundled in container
Contributing
Built for DevSprint Hackathon 2025. Contributions welcome post-hackathon!
License
MIT License - see LICENSE file for details