title: AutoML Studio
emoji: ✨
colorFrom: blue
colorTo: indigo
sdk: docker
app_port: 7860
pinned: false
🎨 AutoML Studio
This repository is configured to run on Hugging Face Spaces as a single Docker Space that serves:
- Streamlit frontend through Nginx on public port 7860
- FastAPI backend internally on 127.0.0.1:8000
- Celery worker plus Redis inside the same container for background training jobs
AutoML Studio is an end-to-end automated machine learning platform. You upload a tabular dataset, and the platform uses "Dataset DNA" heuristics to preprocess the data, shortlist the most appropriate algorithms, and train competitive ML models through a multi-page Streamlit frontend paired with a FastAPI backend.
✨ Features
- Dataset DNA Analyzer: Instantly parses uploaded datasets (CSV, JSON, Excel, Parquet) to automatically determine shape, calculate missing value distributions, identify imbalances, and heuristically suggest target configurations.
- Auto-Imputing & Auto-Encoding: The backend applies ColumnTransformers automatically, routing numeric columns through median imputation and StandardScaler, and categorical columns through constant-value imputation and OneHotEncoder.
- Smart Model Selection (Pro Mode): Rather than testing every algorithm blindly, it evaluates the shape and taxonomy of your dataset to build a tailored algorithm roster (e.g. SVM for small datasets and XGBoost for high-dimensional data).
- Time Travel Training Logs: View live metric updates as pipelines iteratively optimize.
- Auto Report (Story Mode): Generates an automated "wrap-up" narrative explaining what data was analyzed, which algorithm dominated, and why it succeeded.
- Deep Insights: Explore exactly where the model fails via the Mistake Analyzer, view low-confidence classifications, and receive "Explain-Like-I'm-5" ML coaching strategies.
- One-Click Deploy Bundles: Bundles and exports your trained .pkl model alongside a generated FastAPI script, giving you a deployment-ready inference server in one click.
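The Auto-Imputing & Auto-Encoding feature above can be illustrated with scikit-learn. This is a minimal sketch, not the repository's code: build_preprocessor, the step names, and the "missing" fill value are assumptions made for the example.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler


def build_preprocessor(df: pd.DataFrame) -> ColumnTransformer:
    """Route numeric and non-numeric columns through separate pipelines."""
    numeric_cols = df.select_dtypes(include="number").columns.tolist()
    categorical_cols = df.select_dtypes(exclude="number").columns.tolist()

    # Numeric columns: fill gaps with the median, then standardize.
    numeric_steps = Pipeline([
        ("impute", SimpleImputer(strategy="median")),
        ("scale", StandardScaler()),
    ])
    # Categorical columns: fill gaps with a constant label, then one-hot encode.
    # handle_unknown="ignore" keeps unseen categories from raising at predict time.
    categorical_steps = Pipeline([
        ("impute", SimpleImputer(strategy="constant", fill_value="missing")),
        ("encode", OneHotEncoder(handle_unknown="ignore")),
    ])
    return ColumnTransformer([
        ("num", numeric_steps, numeric_cols),
        ("cat", categorical_steps, categorical_cols),
    ])


# Tiny illustrative frame with one gap in each column.
features = pd.DataFrame({
    "age": [25.0, np.nan, 40.0, 31.0],
    "city": ["Oslo", "Lima", np.nan, "Oslo"],
})
matrix = build_preprocessor(features).fit_transform(features)
```

Here the result has one scaled numeric column plus one indicator column per category, including the "missing" label introduced by the imputer.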
🏗️ Architecture
AutoML Studio
├── frontend/
│ ├── app.py # Streamlit entry point
│ ├── pages/ # Streamlit workflows
│ ├── style.css # Shared visual system
│ └── ui_shell.py # Shared UI helpers
├── backend/
│ ├── main.py # FastAPI entry point
│ └── core/
│ ├── data_profiler.py # Dataset heuristic extraction logic
│ ├── insights.py # Narrative generation and AI coaching synthesis
│ └── export.py # ZIP creation for trained model bundles
├── requirements.txt # Shared dependencies
├── start.sh # Docker / HF launcher
├── run.sh # Local development launcher
└── README.md
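For orientation, the kind of heuristic extraction attributed to backend/core/data_profiler.py could look roughly like the following. This is a hedged sketch under stated assumptions: profile_dataset and the dictionary keys are hypothetical names, and the actual module may compute different statistics.

```python
from typing import Any, Dict, Optional

import pandas as pd


def profile_dataset(df: pd.DataFrame, target: Optional[str] = None) -> Dict[str, Any]:
    """Collect shape, per-column missing ratios, and an optional class imbalance ratio."""
    profile: Dict[str, Any] = {
        "rows": int(df.shape[0]),
        "columns": int(df.shape[1]),
        # Fraction of missing values in each column.
        "missing_ratio": {col: float(df[col].isna().mean()) for col in df.columns},
    }
    if target is not None:
        counts = df[target].value_counts()
        # Ratio of the largest class to the smallest; 1.0 means perfectly balanced.
        profile["imbalance_ratio"] = float(counts.max() / counts.min())
    return profile


sample = pd.DataFrame({
    "x": [1.0, None, 3.0, 4.0],
    "label": ["a", "a", "a", "b"],
})
dna = profile_dataset(sample, target="label")
```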
🚀 Installation & Usage
Prerequisites
- Python 3.8+
- pip package manager
1. Setup Environment
Clone the repository and set up a virtual environment:
python3 -m venv venv
source venv/bin/activate
2. Install Dependencies
pip install -r requirements.txt
3. Quick Start (Recommended)
You can launch the FastAPI backend, worker stack, and Streamlit frontend using the provided shell script:
bash run.sh
4. Manual Launch
If you prefer to run it manually:
- Start the backend:
cd backend
uvicorn main:app --host 0.0.0.0 --port 8000
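- Start the frontend in a second terminal from the repository root (assuming the Streamlit entry point at frontend/app.py listed under Architecture):
streamlit run frontend/app.py
- Open the frontend:
http://localhost:8501
Background training jobs also depend on the worker stack that run.sh starts; see run.sh for those commands.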
📖 How to Use
Once the application is live on http://localhost:8501, follow these steps:
- Upload Dataset: Navigate to the Home tab and drag-and-drop your dataset (CSV, JSON, Excel, or Parquet).
- Review DNA: Click on the DNA tab to review the automatic imputation plan and exploratory data analysis.
- Train Engine: Go to Training & Results to start the parallel training pipeline. Watch the time-travel metrics update live.
- Export: Once training completes, download the deployment-ready .zip bundle to serve your model immediately.
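To make the export step concrete, the sketch below packs a pickled model and a small FastAPI serving script into one ZIP archive. It is an illustration only: build_bundle, the file names model.pkl and serve.py, and the /predict route are assumptions, and the bundle produced by AutoML Studio may be laid out differently.

```python
import io
import pickle
import zipfile
from typing import Any

# Hypothetical serving script written into the bundle; stored as text, not executed here.
SERVE_SCRIPT = '''\
import pickle
from typing import List

from fastapi import FastAPI

app = FastAPI()
with open("model.pkl", "rb") as fh:
    model = pickle.load(fh)


@app.post("/predict")
def predict(rows: List[List[float]]):
    # Assumes a scikit-learn style model exposing .predict().
    return {"predictions": model.predict(rows).tolist()}
'''


def build_bundle(model: Any) -> bytes:
    """Return ZIP bytes holding the pickled model and the serving script."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr("model.pkl", pickle.dumps(model))
        archive.writestr("serve.py", SERVE_SCRIPT)
    return buffer.getvalue()


# A plain dict stands in for a trained model in this example.
bundle = build_bundle({"weights": [0.1, 0.2]})
```

With a layout like this, unzipping the archive and running uvicorn serve:app in that directory would start the inference server.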
🌐 Deployment Options
Docker
# Build and run with Docker Compose
docker-compose up -d
# Access the app on the container's public port (7860 by default, set via PORT)
Production Considerations
- Database: Add PostgreSQL for production data persistence
- Redis: Required for background job queuing
- Storage: Use cloud storage (S3, GCS) for large model files
- Scaling: Consider a load balancer in front of multiple instances
- Security: Add authentication, rate limiting, and input validation
Configuration Notes
- PORT controls the public Nginx listener. Default is 7860.
- AUTOML_ALLOWED_ORIGINS accepts a comma-separated CORS allowlist for the FastAPI backend.
- STREAMLIT_ENABLE_CORS and STREAMLIT_ENABLE_XSRF_PROTECTION let you tighten frontend security for non-HF deployments.
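As an illustration of how a comma-separated allowlist such as AUTOML_ALLOWED_ORIGINS might be consumed, here is a small parsing sketch. The helper name parse_allowed_origins and the choice to treat an unset variable as an empty allowlist are assumptions; the backend's actual handling may differ.

```python
import os
from typing import List, Optional


def parse_allowed_origins(raw: Optional[str]) -> List[str]:
    """Split a comma-separated allowlist, dropping blanks and stray whitespace."""
    if not raw:
        return []
    return [origin.strip() for origin in raw.split(",") if origin.strip()]


# An unset variable yields an empty allowlist in this sketch.
allowed_origins = parse_allowed_origins(os.environ.get("AUTOML_ALLOWED_ORIGINS"))
```

A list like this could then be passed as allow_origins when adding FastAPI's CORSMiddleware to the app.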
🤝 Contributing
Contributions are welcome! Please feel free to submit a Pull Request or open an Issue for any bugs, feature requests, or improvements.
📄 License
This project is licensed under the MIT License.