---
title: DevSprint Data Science Agent
emoji: 🤖
colorFrom: blue
colorTo: purple
sdk: docker
pinned: false
license: mit
app_port: 7860
---
# DevSprint Data Science Agent 🤖
An intelligent AI agent for automated data science workflows, powered by Google Gemini 2.5 Flash with 82+ specialized tools for data analysis, visualization, and machine learning.
## Features
- 📊 **Automated EDA**: YData profiling, statistical analysis, data quality reports
- 📈 **Smart Visualizations**: Plotly dashboards, matplotlib plots, interactive charts
- 🧹 **Data Cleaning**: Missing value handling, outlier detection, type conversion
- 🛠️ **Feature Engineering**: Automated feature creation, encoding, scaling
- 🤖 **ML Training**: AutoML with XGBoost, LightGBM, CatBoost, Neural Networks
- 💬 **Natural Language Interface**: Chat-based interaction for complex workflows
- 📈 **Business Intelligence**: KPI tracking, trend analysis, forecasting
## Tech Stack
- **Backend**: FastAPI + Python 3.12
- **LLM**: Google Gemini 2.5 Flash (text-based tool calling)
- **Data Processing**: Polars (high-performance dataframes)
- **Frontend**: React 19 + TypeScript + Vite
- **ML Libraries**: Scikit-learn, XGBoost, LightGBM, CatBoost, PyTorch
## Usage
1. Upload your CSV/Excel dataset
2. Ask questions in natural language (e.g., "Generate a detailed profiling report")
3. The agent automatically selects and executes the right tools
4. View generated reports, visualizations, and insights
## Memory Optimization
For large datasets (>50k rows or >10MB), the agent automatically:
- Samples to 50,000 rows for profiling
- Enables minimal mode to reduce memory usage
- Disables expensive correlation/interaction calculations
This keeps profiling of large datasets within the 16 GB of RAM available to the Space on HuggingFace.
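The thresholds above can be sketched as a small decision function. This is a minimal illustration, not the agent's actual code: `ProfilingPlan`, `plan_profiling`, and the constant names are hypothetical, and reading "10MB" as 10 MiB is an assumption.

```python
from dataclasses import dataclass
from typing import Optional

# Thresholds quoted in the README; the constant names are hypothetical.
MAX_ROWS = 50_000
MAX_BYTES = 10 * 1024 * 1024  # assumption: "10MB" read as 10 MiB


@dataclass(frozen=True)
class ProfilingPlan:
    sample_rows: Optional[int]  # None means "profile every row"
    minimal: bool               # reduced-memory profiling mode
    correlations: bool          # compute correlation matrices
    interactions: bool          # compute pairwise interaction plots


def plan_profiling(n_rows: int, size_bytes: int) -> ProfilingPlan:
    """Pick profiling settings from the dataset's row count and file size."""
    is_large = n_rows > MAX_ROWS or size_bytes > MAX_BYTES
    if not is_large:
        return ProfilingPlan(None, False, True, True)
    # Sampling only applies when the row count itself exceeds the cap.
    sample_rows = MAX_ROWS if n_rows > MAX_ROWS else None
    return ProfilingPlan(sample_rows, True, False, False)
```

For example, `plan_profiling(200_000, 5_000_000)` would request a 50,000-row sample in minimal mode, while a 1,000-row file under the size limit would be profiled in full.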
## Environment Variables
Set these in HuggingFace Spaces settings (Settings → Repository secrets):
**Required:**
- `GEMINI_API_KEY` - Your Google Gemini API key (get from https://aistudio.google.com/app/apikey)
- `LLM_PROVIDER` - Set to `gemini` to use Gemini (or `groq` if you have a Groq API key)
**Optional:**
- `GROQ_API_KEY` - Only needed when using the Groq provider instead of Gemini
**Note**: The code supports both `GOOGLE_API_KEY` and `GEMINI_API_KEY` environment variable names.
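One way the variables above might be resolved at startup is sketched below. The function name `resolve_llm_config`, the error handling, and the choice to check `GEMINI_API_KEY` before `GOOGLE_API_KEY` are assumptions for illustration; the project's own lookup may differ.

```python
import os
from typing import Mapping, Optional, Tuple


def resolve_llm_config(env: Optional[Mapping[str, str]] = None) -> Tuple[str, str]:
    """Return (provider, api_key) from environment-style settings."""
    env = os.environ if env is None else env
    provider = env.get("LLM_PROVIDER", "").strip().lower()

    if provider == "gemini":
        # Either variable name is accepted; the precedence here is an assumption.
        key = env.get("GEMINI_API_KEY") or env.get("GOOGLE_API_KEY")
    elif provider == "groq":
        key = env.get("GROQ_API_KEY")
    else:
        raise ValueError("LLM_PROVIDER must be 'gemini' or 'groq'")

    if not key:
        raise RuntimeError(f"No API key found for provider {provider!r}")
    return provider, key
```

Passing a plain dict instead of `os.environ` makes the lookup easy to check without touching real secrets.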
## Local Development
```bash
# Clone repository
git clone https://huggingface.co/spaces/YOUR_USERNAME/devs-print-data-science-agent
cd devs-print-data-science-agent
# Install dependencies
pip install -r requirements.txt
npm install --prefix FRRONTEEEND
# Build frontend
cd FRRONTEEEND && npm run build && cd ..
# Set API key and provider
export GEMINI_API_KEY=your_key_here
export LLM_PROVIDER=gemini
# Run server
uvicorn src.api.app:app --host 0.0.0.0 --port 7860
```
## Architecture
```
┌─────────────────┐
│ React Frontend  │ ← User uploads data + asks questions
└────────┬────────┘
         │
┌────────▼────────┐
│ FastAPI Server  │ ← Serves frontend + API endpoints
└────────┬────────┘
         │
┌────────▼────────┐
│  Orchestrator   │ ← LLM-driven tool selection & execution
└────────┬────────┘
         │
┌────────▼────────┐
│   82+ Tools     │ ← Specialized data science functions
└─────────────────┘
```
## Key Components
- **Orchestrator** ([src/orchestrator.py](src/orchestrator.py)): ReAct-based tool calling with Gemini
- **Tools Registry** ([src/tools/](src/tools/)): 82+ specialized data science tools
- **Session Memory** ([src/session_memory.py](src/session_memory.py)): Conversation history + file tracking
- **Artifact Store** ([src/storage/artifact_store.py](src/storage/artifact_store.py)): File management + metadata
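To make the orchestrator and tools registry concrete, here is a minimal sketch of a ReAct-style loop with text-based tool calling. Everything in it is hypothetical: the `TOOLS` registry, the `tool` decorator, the JSON reply format, and the `scripted_llm` stand-in (which replaces a real Gemini call) are assumptions for illustration, not the contents of `src/orchestrator.py`.

```python
import json
from typing import Callable, Dict, List

# Hypothetical registry mapping tool names to plain Python callables.
TOOLS: Dict[str, Callable[..., str]] = {}


def tool(fn: Callable[..., str]) -> Callable[..., str]:
    """Register a function as a tool under its own name."""
    TOOLS[fn.__name__] = fn
    return fn


@tool
def count_rows(path: str) -> str:
    # Stand-in tool: a real one would read the dataset at `path`.
    return f"{path}: 3 rows"


def run_agent(llm: Callable[[List[str]], str], question: str, max_steps: int = 5) -> str:
    """Alternate between LLM replies and tool observations until a final answer."""
    history = [f"user: {question}"]
    for _ in range(max_steps):
        reply = json.loads(llm(history))  # the model answers in JSON text
        if "final" in reply:
            return reply["final"]
        name = reply["tool"]
        if name not in TOOLS:
            observation = f"error: unknown tool {name!r}"
        else:
            observation = TOOLS[name](**reply.get("args", {}))
        history.append(f"observation: {observation}")
    return "stopped: step limit reached"


def scripted_llm(history: List[str]) -> str:
    """Fake model: call count_rows once, then repeat its result as the answer."""
    if not any(h.startswith("observation: ") for h in history):
        return json.dumps({"tool": "count_rows", "args": {"path": "data.csv"}})
    return json.dumps({"final": history[-1][len("observation: "):]})
```

Calling `run_agent(scripted_llm, "How many rows?")` runs one tool call followed by a final answer. The step limit guards against a model that never stops requesting tools.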
## Deployment
This Space uses a **Docker** deployment for maximum compatibility:
- Base image: `python:3.12-slim`
- Multi-stage build (Node.js for frontend, Python for backend)
- Exposes port 7860, matching `app_port` in the Space metadata
- All dependencies bundled in container
## Contributing
Built for DevSprint Hackathon 2025. Contributions welcome post-hackathon!
## License
MIT License - see LICENSE file for details