---
title: DevSprint Data Science Agent
emoji: 🤖
colorFrom: blue
colorTo: purple
sdk: docker
pinned: false
license: mit
app_port: 7860
---
# DevSprint Data Science Agent 🤖
An intelligent AI agent for automated data science workflows, powered by Google Gemini 2.5 Flash with 82+ specialized tools for data analysis, visualization, and machine learning.
## Features
- **Automated EDA**: YData profiling, statistical analysis, data quality reports
- **Smart Visualizations**: Plotly dashboards, matplotlib plots, interactive charts
- 🧹 **Data Cleaning**: Missing value handling, outlier detection, type conversion
- 🛠️ **Feature Engineering**: Automated feature creation, encoding, scaling
- 🤖 **ML Training**: AutoML with XGBoost, LightGBM, CatBoost, Neural Networks
- 💬 **Natural Language Interface**: Chat-based interaction for complex workflows
- **Business Intelligence**: KPI tracking, trend analysis, forecasting
## Tech Stack
- **Backend**: FastAPI + Python 3.12
- **LLM**: Google Gemini 2.5 Flash (text-based tool calling)
- **Data Processing**: Polars (high-performance dataframes)
- **Frontend**: React 19 + TypeScript + Vite
- **ML Libraries**: Scikit-learn, XGBoost, LightGBM, CatBoost, PyTorch
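One plausible reading of "text-based tool calling" is that the model replies in plain text and the backend parses a tool request out of that text. The sketch below illustrates the idea under an assumed convention: the reply embeds a single JSON object with `tool` and `arguments` keys. The function name `parse_tool_call` and the JSON shape are hypothetical; the project's actual prompt format and parser may differ.

```python
import json
import re
from typing import Any, Dict, Optional

# Assumed convention (not taken from the project): the reply embeds one JSON
# object such as {"tool": "profile_dataset", "arguments": {"path": "data.csv"}}.
_JSON_OBJECT = re.compile(r"\{.*\}", re.DOTALL)


def parse_tool_call(reply: str) -> Optional[Dict[str, Any]]:
    """Return the tool request found in a model reply, or None if there is none."""
    match = _JSON_OBJECT.search(reply)
    if match is None:
        return None
    try:
        payload = json.loads(match.group(0))
    except json.JSONDecodeError:
        return None
    if not isinstance(payload, dict) or not isinstance(payload.get("tool"), str):
        return None
    arguments = payload.get("arguments", {})
    if not isinstance(arguments, dict):
        return None
    return {"tool": payload["tool"], "arguments": arguments}
```

A reply with no parseable tool request returns `None`, which a caller could treat as the final answer to show the user.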
## Usage
1. Upload your CSV/Excel dataset
2. Ask questions in natural language (e.g., "Generate a detailed profiling report")
3. The agent automatically selects and executes the right tools
4. View generated reports, visualizations, and insights
## Memory Optimization
For large datasets (>50k rows or >10MB), the agent automatically:
- Samples to 50,000 rows for profiling
- Enables minimal mode to reduce memory usage
- Disables expensive correlation/interaction calculations
These limits are intended to keep profiling within the 16GB of RAM available on HuggingFace Spaces, even for large datasets.
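The thresholds above can be expressed as a small decision function. This is a sketch only: `ProfilingPlan` and `plan_profiling` are illustrative names, and treating "10MB" as 10 MiB is an assumption.

```python
from dataclasses import dataclass
from typing import Optional

# Thresholds from the list above; the constant names are illustrative.
MAX_ROWS = 50_000
MAX_BYTES = 10 * 1024 * 1024  # assumption: "10MB" is read as 10 MiB


@dataclass(frozen=True)
class ProfilingPlan:
    sample_rows: Optional[int]  # None means "profile every row"
    minimal: bool               # reduced-memory profiling mode
    correlations: bool          # whether correlation/interaction steps run


def plan_profiling(n_rows: int, size_bytes: int) -> ProfilingPlan:
    """Choose profiling settings from the dataset's row count and size on disk."""
    is_large = n_rows > MAX_ROWS or size_bytes > MAX_BYTES
    if not is_large:
        return ProfilingPlan(sample_rows=None, minimal=False, correlations=True)
    # Large dataset: cap the sample, switch to minimal mode, skip correlations.
    return ProfilingPlan(
        sample_rows=min(n_rows, MAX_ROWS), minimal=True, correlations=False
    )
```

Note that a file can exceed the size threshold while having fewer than 50,000 rows. In that case the sketch keeps all rows but still enables minimal mode.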
## Environment Variables
Set `GEMINI_API_KEY` in HuggingFace Spaces settings (Settings → Repository secrets):
```
GEMINI_API_KEY=your_google_gemini_api_key_here
```
Get your API key from: https://aistudio.google.com/app/apikey
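On the backend, it can help to fail fast when the key is missing rather than erroring on the first LLM call. Here is a minimal sketch using only the standard library. The helper name `load_gemini_api_key` is hypothetical and not part of this repository.

```python
import os
from typing import Mapping, Optional


def load_gemini_api_key(env: Optional[Mapping[str, str]] = None) -> str:
    """Read GEMINI_API_KEY, raising a clear error if it is unset or blank."""
    env = os.environ if env is None else env
    key = env.get("GEMINI_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "GEMINI_API_KEY is not set; add it as a repository secret "
            "or export it in your shell before starting the server."
        )
    return key
```

Passing a mapping explicitly (instead of relying on `os.environ`) makes the check easy to exercise in tests.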
## Local Development
```bash
# Clone repository
git clone https://huggingface.co/spaces/YOUR_USERNAME/devs-print-data-science-agent
cd devs-print-data-science-agent
# Install dependencies
pip install -r requirements.txt
npm install --prefix FRRONTEEEND
# Build frontend
cd FRRONTEEEND && npm run build && cd ..
# Set API key
export GEMINI_API_KEY=your_key_here
# Run server
uvicorn src.api.app:app --host 0.0.0.0 --port 7860
```
## Architecture
```
┌─────────────────┐
│  React Frontend │ ← User uploads data + asks questions
└────────┬────────┘
         │
┌────────▼────────┐
│  FastAPI Server │ ← Serves frontend + API endpoints
└────────┬────────┘
         │
┌────────▼────────┐
│   Orchestrator  │ ← LLM-driven tool selection & execution
└────────┬────────┘
         │
┌────────▼────────┐
│    82+ Tools    │ ← Specialized data science functions
└─────────────────┘
```
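The bottom two layers of the diagram (orchestrator and tools) can be illustrated with a toy loop. This is a sketch under stated assumptions, not the code in `src/orchestrator.py`: the `tool` decorator, `run_agent`, the JSON reply shape, and `stub_model` (a stand-in for the Gemini call) are all hypothetical.

```python
import json
from typing import Any, Callable, Dict, List

TOOLS: Dict[str, Callable[..., Any]] = {}  # hypothetical registry: name -> function


def tool(fn: Callable[..., Any]) -> Callable[..., Any]:
    """Register a function in TOOLS under its own name."""
    TOOLS[fn.__name__] = fn
    return fn


@tool
def row_count(rows: list) -> int:  # illustrative tool, not one of the real 82+
    return len(rows)


def run_agent(question: str, model: Callable[[List[str]], str], max_steps: int = 5) -> str:
    """Alternate model replies and tool executions until a final answer appears."""
    transcript = [f"Question: {question}"]
    for _ in range(max_steps):
        reply = json.loads(model(transcript))
        if "final" in reply:
            return reply["final"]
        fn = TOOLS.get(reply["tool"])
        if fn is None:
            transcript.append(f"Observation: unknown tool {reply['tool']!r}")
            continue
        transcript.append(f"Observation: {fn(**reply['arguments'])}")
    return "Stopped: step limit reached."


def stub_model(transcript: List[str]) -> str:
    # Stand-in for the LLM: request one tool, then answer with its observation.
    if len(transcript) == 1:
        return json.dumps({"tool": "row_count", "arguments": {"rows": [1, 2, 3]}})
    return json.dumps({"final": transcript[-1]})
```

The step limit is a deliberate design choice in this sketch: it bounds cost if the model keeps requesting tools without converging on an answer.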
## Key Components
- **Orchestrator** ([src/orchestrator.py](src/orchestrator.py)): ReAct-based tool calling with Gemini
- **Tools Registry** ([src/tools/](src/tools/)): 82+ specialized data science tools
- **Session Memory** ([src/session_memory.py](src/session_memory.py)): Conversation history + file tracking
- **Artifact Store** ([src/storage/artifact_store.py](src/storage/artifact_store.py)): File management + metadata
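To make the "conversation history + file tracking" role of session memory concrete, here is a hypothetical sketch. The class shape, the `max_turns` cap, and the method names are assumptions for illustration and are not taken from `src/session_memory.py`.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class SessionMemory:
    """Illustrative only: keeps recent chat turns and files produced in a session."""
    max_turns: int = 20
    history: List[Tuple[str, str]] = field(default_factory=list)  # (role, text)
    files: List[str] = field(default_factory=list)                # creation order

    def add_turn(self, role: str, text: str) -> None:
        self.history.append((role, text))
        # Drop the oldest turns so the prompt context stays bounded.
        overflow = len(self.history) - self.max_turns
        if overflow > 0:
            self.history = self.history[overflow:]

    def track_file(self, path: str) -> None:
        # Record each artifact once, preserving first-seen order.
        if path not in self.files:
            self.files.append(path)

    def latest_file(self) -> Optional[str]:
        return self.files[-1] if self.files else None
```

Tracking the most recent file lets a follow-up request such as "now plot it" resolve to the dataset or report produced in the previous step.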
## Deployment
This Space uses a **Docker** deployment for maximum compatibility:
- Base image: `python:3.12-slim`
- Multi-stage build (Node.js for frontend, Python for backend)
- Exposes port 7860, matching `app_port` in the Space metadata
- All dependencies bundled in container
## Contributing
Built for DevSprint Hackathon 2025. Contributions welcome post-hackathon!
## License
MIT License - see LICENSE file for details