| title: DevSprint Data Science Agent | |
| emoji: 🤖 | |
| colorFrom: blue | |
| colorTo: purple | |
| sdk: docker | |
| pinned: false | |
| license: mit | |
| app_port: 7860 | |
| # DevSprint Data Science Agent 🤖 | |
| An intelligent AI agent for automated data science workflows, powered by Google Gemini 2.5 Flash with 82+ specialized tools for data analysis, visualization, and machine learning. | |
| ## Features | |
| - **Automated EDA**: YData profiling, statistical analysis, data quality reports | |
| - **Smart Visualizations**: Plotly dashboards, matplotlib plots, interactive charts | |
| - 🧹 **Data Cleaning**: Missing value handling, outlier detection, type conversion | |
| - 🛠️ **Feature Engineering**: Automated feature creation, encoding, scaling | |
| - 🤖 **ML Training**: AutoML with XGBoost, LightGBM, CatBoost, Neural Networks | |
| - 💬 **Natural Language Interface**: Chat-based interaction for complex workflows | |
| - **Business Intelligence**: KPI tracking, trend analysis, forecasting | |
| ## Tech Stack | |
| - **Backend**: FastAPI + Python 3.12 | |
| - **LLM**: Google Gemini 2.5 Flash (text-based tool calling) | |
| - **Data Processing**: Polars (high-performance dataframes) | |
| - **Frontend**: React 19 + TypeScript + Vite | |
| - **ML Libraries**: Scikit-learn, XGBoost, LightGBM, CatBoost, PyTorch | |
| ## Usage | |
| 1. Upload your CSV/Excel dataset | |
| 2. Ask questions in natural language (e.g., "Generate a detailed profiling report") | |
| 3. The agent automatically selects and executes the right tools | |
| 4. View generated reports, visualizations, and insights | |
| ## Memory Optimization | |
| For large datasets (>50k rows or >10MB), the agent automatically: | |
| - Samples to 50,000 rows for profiling | |
| - Enables minimal mode to reduce memory usage | |
| - Disables expensive correlation/interaction calculations | |
| These limits are intended to keep profiling within the 16GB of RAM available on HuggingFace Spaces, even for large datasets. | |
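The size rules above can be sketched as a small decision function. This is only an illustration of the thresholds stated in this section, not the agent's actual code: `ProfilingPlan`, `plan_profiling`, and the constant names are hypothetical, and reading "10MB" as 10 × 1024 × 1024 bytes is an assumption.

```python
from dataclasses import dataclass
from typing import Optional

# Thresholds taken from the text above; all names here are hypothetical.
MAX_PROFILE_ROWS = 50_000
MAX_PROFILE_BYTES = 10 * 1024 * 1024  # assumes "10MB" means 10 MiB


@dataclass(frozen=True)
class ProfilingPlan:
    sample_rows: Optional[int]  # None means "profile every row"
    minimal: bool               # reduced-memory profiling mode
    correlations: bool          # whether to compute correlations
    interactions: bool          # whether to compute pairwise interactions


def plan_profiling(n_rows: int, size_bytes: int) -> ProfilingPlan:
    """Pick profiling settings from dataset size (illustrative only)."""
    is_large = n_rows > MAX_PROFILE_ROWS or size_bytes > MAX_PROFILE_BYTES
    if not is_large:
        return ProfilingPlan(None, False, True, True)
    # Never request more rows than the dataset actually has.
    return ProfilingPlan(min(n_rows, MAX_PROFILE_ROWS), True, False, False)
```

For example, `plan_profiling(200_000, 1_000)` yields a plan that samples 50,000 rows, turns minimal mode on, and turns correlations and interactions off.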
| ## Environment Variables | |
| Set `GEMINI_API_KEY` in HuggingFace Spaces settings (Settings → Repository secrets): | |
| ``` | |
| GEMINI_API_KEY=your_google_gemini_api_key_here | |
| ``` | |
| Get your API key from: https://aistudio.google.com/app/apikey | |
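On the Python side, the key can be read from the environment and checked at startup so that a missing secret fails early with a clear message. A minimal sketch, assuming the backend reads the variable directly; `load_gemini_api_key` is a hypothetical helper, not a function from this repository.

```python
import os


def load_gemini_api_key() -> str:
    """Return GEMINI_API_KEY from the environment or raise a clear error."""
    key = os.environ.get("GEMINI_API_KEY", "").strip()
    if not key:
        # Hypothetical message; the real app may report this differently.
        raise RuntimeError(
            "GEMINI_API_KEY is not set. Add it as a repository secret "
            "or export it in your shell before starting the server."
        )
    return key
```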
| ## Local Development | |
| ```bash | |
| # Clone repository | |
| git clone https://huggingface.co/spaces/YOUR_USERNAME/devs-print-data-science-agent | |
| cd devs-print-data-science-agent | |
| # Install dependencies | |
| pip install -r requirements.txt | |
| npm install --prefix FRRONTEEEND | |
| # Build frontend | |
| cd FRRONTEEEND && npm run build && cd .. | |
| # Set API key | |
| export GEMINI_API_KEY=your_key_here | |
| # Run server | |
| uvicorn src.api.app:app --host 0.0.0.0 --port 7860 | |
| ``` | |
| ## Architecture | |
| ``` | |
| ┌─────────────────┐ | |
| │ React Frontend  │ ← User uploads data + asks questions | |
| └────────┬────────┘ | |
|          │ | |
| ┌────────▼────────┐ | |
| │ FastAPI Server  │ ← Serves frontend + API endpoints | |
| └────────┬────────┘ | |
|          │ | |
| ┌────────▼────────┐ | |
| │  Orchestrator   │ ← LLM-driven tool selection & execution | |
| └────────┬────────┘ | |
|          │ | |
| ┌────────▼────────┐ | |
| │   82+ Tools     │ ← Specialized data science functions | |
| └─────────────────┘ | |
| ``` | |
| ## Key Components | |
| - **Orchestrator** ([src/orchestrator.py](src/orchestrator.py)): ReAct-based tool calling with Gemini | |
| - **Tools Registry** ([src/tools/](src/tools/)): 82+ specialized data science tools | |
| - **Session Memory** ([src/session_memory.py](src/session_memory.py)): Conversation history + file tracking | |
| - **Artifact Store** ([src/storage/artifact_store.py](src/storage/artifact_store.py)): File management + metadata | |
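To make the orchestrator and tools registry roles more concrete, here is a minimal sketch of a ReAct-style loop with text-based tool calling. It is not the code in `src/orchestrator.py`: the JSON reply format, the `tool` decorator, `run_agent`, the stand-in `row_count` tool, and the scripted fake model are all assumptions made for illustration, and no Gemini API is called.

```python
import json
from typing import Callable, Dict, List

Tool = Callable[..., str]
TOOLS: Dict[str, Tool] = {}  # hypothetical registry: tool name -> function


def tool(fn: Tool) -> Tool:
    """Register a function in the tool registry under its own name."""
    TOOLS[fn.__name__] = fn
    return fn


@tool
def row_count(path: str) -> str:
    # Stand-in tool with a canned result; a real tool would read the file.
    return f"{path}: 3 rows"


def run_agent(llm: Callable[[List[str]], str], question: str,
              max_steps: int = 5) -> str:
    """Alternate between model replies and tool runs until a final answer."""
    history = [f"user: {question}"]
    for _ in range(max_steps):
        # Assumed protocol: the model replies with a JSON object as text.
        reply = json.loads(llm(history))
        if "final" in reply:
            return reply["final"]
        name, args = reply["tool"], reply.get("args", {})
        if name not in TOOLS:
            observation = f"error: unknown tool {name!r}"
        else:
            observation = TOOLS[name](**args)
        # Feed the observation back so the next model call can use it.
        history.append(f"tool {name}: {observation}")
    return "stopped: step limit reached"


def scripted_llm(history: List[str]) -> str:
    """Fake model: call row_count once, then answer with the observation."""
    if len(history) == 1:
        return json.dumps({"tool": "row_count", "args": {"path": "data.csv"}})
    return json.dumps({"final": history[-1]})
```

Calling `run_agent(scripted_llm, "How many rows?")` runs one tool step and then returns the final answer. The step limit is a design choice in this sketch that keeps a model that never produces a final answer from looping forever.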
| ## Deployment | |
| This Space uses a **Docker** deployment for maximum compatibility: | |
| - Base image: `python:3.12-slim` | |
| - Multi-stage build (Node.js for frontend, Python for backend) | |
| - Exposes port 7860, matching the `app_port` set in the Space configuration | |
| - All dependencies bundled in container | |
| ## Contributing | |
| Built for DevSprint Hackathon 2025. Contributions welcome post-hackathon! | |
| ## License | |
| MIT License - see LICENSE file for details | |