AutoMLOps · Mohammad Noorchenarboo

Architecture Overview

End-to-End MLOps Architecture

AutoMLOps runs entirely within a single Docker container, which starts an Apache Airflow scheduler alongside a Flask API served by Gunicorn. Raw datasets from scikit-learn and real-world sources are preprocessed and fed to any of 50+ algorithms, and every run is logged to an SQLite-backed MLflow tracking server. The Pipeline Studio renders live DAG execution state directly in the browser, with no page reloads required.

Datasets: sklearn + California Housing
Preprocessing: Scale · Encode · Split
Training: 50+ algorithms via Airflow DAG
MLflow: Params · Metrics · Artifacts
Registry: Staging → Production
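The Preprocessing stage is summarised only as Scale · Encode · Split. A minimal plain-Python sketch of those three steps follows, assuming standardisation (zero mean, unit variance), integer label encoding in sorted order, and a seeded hold-out split. The function names and the five-row toy data are hypothetical; the platform more likely uses scikit-learn's StandardScaler, LabelEncoder and train_test_split for the same jobs.

```python
import random
from statistics import mean, pstdev


def label_encode(labels):
    """Map each distinct label to an integer code, assigned in sorted order."""
    mapping = {label: code for code, label in enumerate(sorted(set(labels)))}
    return [mapping[label] for label in labels], mapping


def train_test_split(rows, targets, test_size=0.2, seed=42):
    """Shuffle row indices with a fixed seed and hold out test_size of them."""
    idx = list(range(len(rows)))
    random.Random(seed).shuffle(idx)
    n_test = round(len(rows) * test_size)
    test, train = idx[:n_test], idx[n_test:]
    return ([rows[i] for i in train], [rows[i] for i in test],
            [targets[i] for i in train], [targets[i] for i in test])


def fit_scaler(rows):
    """Learn per-column mean and population standard deviation."""
    columns = list(zip(*rows))
    return [mean(c) for c in columns], [pstdev(c) for c in columns]


def apply_scaler(rows, stats):
    """Standardise every value; constant columns (std 0) map to 0.0."""
    means, stds = stats
    return [[(x - m) / s if s else 0.0 for x, m, s in zip(row, means, stds)]
            for row in rows]


# Encode, split, then fit the scaler on the training rows only to avoid leakage.
X = [[5.1, 3.5], [4.9, 3.0], [6.2, 3.4], [5.9, 3.0], [6.7, 3.1]]
y, classes = label_encode(["setosa", "setosa", "virginica", "versicolor", "virginica"])
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
stats = fit_scaler(X_train)
X_train_s, X_test_s = apply_scaler(X_train, stats), apply_scaler(X_test, stats)
```

Fitting the scaler after the split, on training rows only, keeps test-set statistics out of training; the same ordering applies whichever library does the work.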

Module Breakdown

Six Core Modules

🎨 Studio

Pipeline Studio

Full-screen interactive DAG canvas rendered with HTML/CSS absolute positioning and SVG bezier arrows. Click any node to open a slide-in config panel; execution logs stream live in a built-in terminal.

Pipelines: 3 pre-built DAGs

Engine: Apache Airflow
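The canvas draws each edge as an SVG bezier arrow between absolutely positioned nodes. Below is a sketch, under stated assumptions, of how such an edge path can be computed: a cubic bezier from the right edge of the upstream node to the left edge of the downstream node, with both control points pulled horizontally. The node dictionary keys, the 0.5 curvature factor and the 40 px minimum offset are illustrative choices, not taken from the Studio's code.

```python
def _fmt(value):
    """Render a coordinate compactly (130.0 -> '130')."""
    return f"{value:g}"


def port_positions(node):
    """Return (out_port, in_port) for a node box positioned with left/top/width/height."""
    mid_y = node["top"] + node["height"] / 2
    return (node["left"] + node["width"], mid_y), (node["left"], mid_y)


def bezier_edge_path(src, dst, curvature=0.5, min_offset=40):
    """Build the SVG path 'd' for a cubic bezier from src to dst (canvas pixels).

    Both control points are pushed horizontally, so the curve leaves the source
    and enters the target travelling left-to-right.
    """
    (x1, y1), (x2, y2) = src, dst
    dx = max(abs(x2 - x1) * curvature, min_offset)
    points = [(x1, y1), (x1 + dx, y1), (x2 - dx, y2), (x2, y2)]
    p0, c1, c2, p3 = (f"{_fmt(x)} {_fmt(y)}" for x, y in points)
    return f"M {p0} C {c1}, {c2}, {p3}"


# Two hypothetical nodes, positioned the way absolutely positioned divs would be.
load = {"left": 40, "top": 100, "width": 160, "height": 60}
train = {"left": 320, "top": 180, "width": 160, "height": 60}
d = bezier_edge_path(port_positions(load)[0], port_positions(train)[1])
# d is used as <path d="..."/> inside an SVG layered over the node divs.
```

Because the arrows are computed from the same left/top/width/height values that position the divs, moving a node only requires recomputing its incident paths, which is one way to avoid a graph-layout library.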

🤖 AutoML

AutoML Engine

Automatically sweeps algorithms from the registry for a chosen dataset and task type, up to the configured run limit. Each trial is tracked as an MLflow run, so results can be compared across every algorithm tried.

Search space: 50+ algorithms

Max runs: configurable (default 20)
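The AutoML loop can be pictured as: filter the registry to algorithms that support the task, evaluate at most max_runs of them, and keep the best score. This sketch assumes a registry of dicts with hypothetical name/tasks/params keys and an injected evaluate callable standing in for training plus MLflow logging; the scores in the usage lines are made up.

```python
from dataclasses import dataclass


@dataclass
class Trial:
    algorithm: str
    params: dict
    score: float


def run_automl(candidates, evaluate, task, max_runs=20, higher_is_better=True):
    """Evaluate up to max_runs candidates that support `task`; return (best, trials)."""
    eligible = [c for c in candidates if task in c["tasks"]]
    trials = []
    for cand in eligible[:max_runs]:
        # In the platform, each of these evaluations would be one MLflow run.
        score = evaluate(cand["name"], cand["params"])
        trials.append(Trial(cand["name"], cand["params"], score))
    if not trials:
        raise ValueError(f"no algorithms registered for task {task!r}")
    pick = max if higher_is_better else min
    return pick(trials, key=lambda t: t.score), trials


candidates = [
    {"name": "logistic_regression", "tasks": {"classification"}, "params": {"C": 1.0}},
    {"name": "random_forest", "tasks": {"classification", "regression"},
     "params": {"n_estimators": 100}},
    {"name": "ridge", "tasks": {"regression"}, "params": {"alpha": 1.0}},
]
fake_scores = {"logistic_regression": 0.93, "random_forest": 0.96, "ridge": 0.81}  # made up
best, trials = run_automl(candidates, lambda name, params: fake_scores[name],
                          task="classification")
```

Setting higher_is_better=False lets the same loop minimise error metrics such as RMSE for regression.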

✈️ Airflow

Airflow Scheduler

A real Apache Airflow 2.10 installation runs inside the container with the SequentialExecutor. DAGs use PythonOperator tasks and XCom to pass results between tasks, and DagRun/TaskInstance status is surfaced back to the UI.

Version: Airflow 2.10.4

Executor: SequentialExecutor
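To make the SequentialExecutor and XCom behaviour concrete, here is a pure-Python imitation rather than Airflow code: tasks run one at a time in topological order, each task's return value is stored in an XCom-like dict, and a failed task marks its downstream tasks upstream_failed, as Airflow's default all_success trigger rule does. In an actual Airflow 2.x DAG, a PythonOperator's return value is pushed to XCom automatically and read downstream with ti.xcom_pull(task_ids=...). The function and task names are hypothetical. SequentialExecutor is also the only executor Airflow supports with a SQLite metadata database, which fits the single-container design.

```python
from graphlib import TopologicalSorter


def run_sequential(tasks, upstream):
    """Run tasks one at a time in dependency order (SequentialExecutor-style).

    tasks:    task_id -> callable(pull) returning a value; pull(task_id) reads an
              upstream task's return value, playing the role of XCom.
    upstream: task_id -> set of upstream task_ids.
    Returns (xcom, states) where states uses Airflow-like state names.
    """
    xcom, states = {}, {}
    graph = {task_id: upstream.get(task_id, set()) for task_id in tasks}
    for task_id in TopologicalSorter(graph).static_order():
        if any(states.get(u) != "success" for u in graph[task_id]):
            states[task_id] = "upstream_failed"  # never started
            continue
        try:
            xcom[task_id] = tasks[task_id](xcom.__getitem__)
            states[task_id] = "success"
        except Exception:
            states[task_id] = "failed"
    return xcom, states


tasks = {
    "load": lambda pull: [3.0, 1.0, 2.0],
    "preprocess": lambda pull: sorted(pull("load")),
    "train": lambda pull: sum(pull("preprocess")) / len(pull("preprocess")),
}
upstream = {"preprocess": {"load"}, "train": {"preprocess"}}
xcom, states = run_sequential(tasks, upstream)
```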

📈 MLflow

MLflow Tracking

Every training run — whether manual, AutoML, or pipeline — logs parameters, metrics, and sklearn model artifacts to a shared SQLite MLflow store. Experiments are organised by dataset name.

Backend: SQLite (mlflow.db)

Tracked: params · metrics · artifacts

📦 Registry

Model Registry

Browse all registered models, their versions, and lifecycle stages (None → Staging → Production → Archived). Register any MLflow run directly from the UI and promote versions with a single click.

Stages: 4 lifecycle stages

API: MLflow Model Registry
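The lifecycle can be modelled as a small state machine. The toy registry below is an in-memory sketch, not MLflow's implementation; it mirrors the four stages and the behaviour of MLflow's MlflowClient.transition_model_version_stage(..., archive_existing_versions=True), where promoting a version to Staging or Production archives whichever version previously held that stage. Class and method names are hypothetical.

```python
STAGES = ("None", "Staging", "Production", "Archived")


class ModelRegistry:
    """Toy in-memory registry mirroring MLflow's four lifecycle stages."""

    def __init__(self):
        self.versions = {}  # model name -> list of {"version", "run_id", "stage"}

    def register(self, name, run_id):
        """Create a new version of `name` from a tracking run; it starts in 'None'."""
        versions = self.versions.setdefault(name, [])
        entry = {"version": len(versions) + 1, "run_id": run_id, "stage": "None"}
        versions.append(entry)
        return entry["version"]

    def transition(self, name, version, stage, archive_existing=True):
        """Move one version to `stage`, optionally archiving others already there."""
        if stage not in STAGES:
            raise ValueError(f"unknown stage {stage!r}")
        for entry in self.versions[name]:
            if entry["version"] == version:
                entry["stage"] = stage
            elif (archive_existing and stage in ("Staging", "Production")
                  and entry["stage"] == stage):
                entry["stage"] = "Archived"

    def stage_of(self, name, version):
        return next(e["stage"] for e in self.versions[name] if e["version"] == version)


registry = ModelRegistry()
v1 = registry.register("iris_classifier", run_id="run-a")
v2 = registry.register("iris_classifier", run_id="run-b")
registry.transition("iris_classifier", v1, "Production")
registry.transition("iris_classifier", v2, "Staging")
registry.transition("iris_classifier", v2, "Production")  # v1 is archived here
```

A single-click promotion in the UI maps naturally onto one transition call.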

🗂️ Datasets

Dataset Library

Five built-in datasets covering both classification (Iris, Wine, Breast Cancer) and regression (Diabetes Progression, California Housing). All expose a consistent loader interface used by training, AutoML, and pipelines.

Datasets: 5 built-in

Tasks: classification · regression
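A consistent loader interface can be as simple as a catalog of specs plus one load function. In this sketch the DatasetSpec fields and catalog keys are hypothetical. The sample and feature counts are those of the scikit-learn versions of the five datasets, and the loader names are real sklearn.datasets functions, imported lazily so the catalog itself runs without scikit-learn (fetch_california_housing downloads its data on first use).

```python
import importlib
from dataclasses import dataclass


@dataclass(frozen=True)
class DatasetSpec:
    name: str
    task: str            # "classification" or "regression"
    n_samples: int
    n_features: int
    sklearn_loader: str  # function name in sklearn.datasets


CATALOG = {
    spec.name: spec
    for spec in [
        DatasetSpec("iris", "classification", 150, 4, "load_iris"),
        DatasetSpec("wine", "classification", 178, 13, "load_wine"),
        DatasetSpec("breast_cancer", "classification", 569, 30, "load_breast_cancer"),
        DatasetSpec("diabetes", "regression", 442, 10, "load_diabetes"),
        DatasetSpec("california_housing", "regression", 20640, 8, "fetch_california_housing"),
    ]
}


def datasets_for_task(task):
    """Names of catalog datasets for a task, sorted alphabetically."""
    return sorted(name for name, spec in CATALOG.items() if spec.task == task)


def load(name):
    """Return (X, y) for a catalog dataset; requires scikit-learn at call time."""
    spec = CATALOG[name]
    loader = getattr(importlib.import_module("sklearn.datasets"), spec.sklearn_loader)
    return loader(return_X_y=True)
```

Training, AutoML and pipelines would then all call the same load(name), so adding a dataset means adding one catalog entry.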

Algorithm & Tech Stack

50+ Algorithms Across 7 Categories

The algorithm registry covers the full spectrum from interpretable linear models to deep MLP networks. XGBoost and LightGBM are included as first-class citizens alongside scikit-learn. All algorithms share a uniform interface — the same JSON-serialisable config dict is used by manual training, AutoML search, and pipeline execution.

Linear Models

Classification: Logistic Regression (L1/L2), Ridge, SGD, Passive Aggressive, LDA · Regression: Ridge, Lasso, ElasticNet, Bayesian, Huber

12 algorithms

Ensemble / Boosting

Gradient Boosting, AdaBoost, Bagging, XGBoost, LightGBM — for both classification and regression

10 algorithms

Tree-Based Models

Decision Tree, Random Forest and Extra Trees in classifier and regressor variants, plus QDA

7 algorithms

Neural Networks & SVMs

MLP (Small/Medium/Deep), SVC (RBF/Poly/Linear/LinearSVC), SVR (RBF/Linear), KNN k=3/5/9

17 algorithms
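The uniform interface described above can be sketched as one dict naming the algorithm, task and hyperparameters, checked against a registry and round-tripped through JSON so it can be stored or passed between tasks unchanged. The registry keys and config layout are assumptions; the dotted class paths are real scikit-learn estimators, and build_estimator (which needs scikit-learn installed) shows how such a config could become a model.

```python
import importlib
import json

# Hypothetical registry: algorithm key -> (task, dotted scikit-learn class path).
ALGORITHMS = {
    "logistic_regression": ("classification", "sklearn.linear_model.LogisticRegression"),
    "random_forest_clf": ("classification", "sklearn.ensemble.RandomForestClassifier"),
    "mlp_clf": ("classification", "sklearn.neural_network.MLPClassifier"),
    "ridge_reg": ("regression", "sklearn.linear_model.Ridge"),
    "random_forest_reg": ("regression", "sklearn.ensemble.RandomForestRegressor"),
}


def validate_config(config):
    """Check a config dict and confirm it survives a JSON round trip unchanged."""
    algo = config["algorithm"]
    if algo not in ALGORITHMS:
        raise KeyError(f"unknown algorithm {algo!r}")
    task, _ = ALGORITHMS[algo]
    if config.get("task") != task:
        raise ValueError(f"{algo} is a {task} algorithm, not {config.get('task')!r}")
    restored = json.loads(json.dumps(config))  # json.dumps raises TypeError if unserialisable
    if restored != config:
        raise TypeError("config changes under JSON round trip (e.g. tuples become lists)")
    return restored


def build_estimator(config):
    """Instantiate the estimator named by a validated config (requires scikit-learn)."""
    _, dotted = ALGORITHMS[config["algorithm"]]
    module_name, class_name = dotted.rsplit(".", 1)
    cls = getattr(importlib.import_module(module_name), class_name)
    return cls(**config.get("params", {}))


config = {"algorithm": "random_forest_clf", "task": "classification",
          "params": {"n_estimators": 200, "max_depth": 8}}
checked = validate_config(config)
```

Tuples are a common trap: JSON turns them into lists, so an MLP config written with hidden_layer_sizes=(64, 32) would not come back identical and is rejected here; writing it as [64, 32] avoids the problem.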

Interactive Explorer

Pipeline Scenario Explorer

Each pipeline scenario comes with representative metrics from the live platform, all taken from real training runs logged to MLflow.

These metrics come from demo seed runs. The live app scores models in real time, so you can train your own in the Pipeline Studio.

Performance Snapshot

Algorithm & Dataset Performance

Classification Accuracy: top classification algorithm accuracy on the demo datasets. LightGBM leads with 97.2% on Wine.

Dataset Distribution: sample distribution across the 5 built-in datasets. California Housing is the largest at 20,640 samples.

Regression R² Scores: R² scores for regression algorithms on California Housing. LightGBM Regressor achieves the highest R², at 0.834.
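The charts report classification accuracy and regression R². For reference, both metrics in plain Python; these follow the standard unweighted definitions used by scikit-learn's accuracy_score and r2_score, and the inputs in the usage lines are toy values, not platform results.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that exactly match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot.

    1.0 is a perfect fit, 0.0 matches always predicting the mean, and the
    score goes negative for models worse than that baseline.
    """
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R² is undefined for a constant target")
    return 1 - ss_res / ss_tot


print(accuracy([0, 1, 2, 2], [0, 1, 1, 2]))        # 0.75
print(r2_score([3.0, 5.0, 7.0], [2.5, 5.0, 7.5]))  # 0.9375
```

Read this way, an R² of 0.834 means the model's squared error is about 16.6% of the error from always predicting the mean house value.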

Design Decisions

Key Engineering Choices

Real Airflow

Pipelines execute as genuine Airflow DAGs with XCom, TaskInstance status tracking, and DagRun polling, rather than on a simulated engine. The built-in fallback engine is used only when Airflow is not installed.
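DagRun polling plus the install-time fallback check might look like the sketch below. get_state stands in for however the app reads a DagRun's state (for example from Airflow's metadata database); it is a hypothetical hook, as are the interval and timeout defaults. The terminal states success and failed are Airflow's DagRun end states, and the availability check uses importlib.util.find_spec so Airflow is not imported just to test for it.

```python
import importlib.util
import time

# Decide once at start-up whether the real Airflow path or the fallback engine is used.
AIRFLOW_AVAILABLE = importlib.util.find_spec("airflow") is not None

TERMINAL_STATES = {"success", "failed"}  # DagRun states after which polling stops


def poll_dag_run(get_state, interval=2.0, timeout=600.0,
                 sleep=time.sleep, clock=time.monotonic):
    """Poll a DagRun state getter until it is terminal; return (final_state, history)."""
    deadline = clock() + timeout
    history = []
    while True:
        state = get_state()
        if not history or history[-1] != state:
            history.append(state)  # record transitions only, e.g. for the UI
        if state in TERMINAL_STATES:
            return state, history
        if clock() >= deadline:
            raise TimeoutError(f"DAG run still {state!r} after {timeout}s")
        sleep(interval)


# Usage with a canned sequence of states instead of a live Airflow instance.
states = iter(["queued", "running", "running", "success"])
final, history = poll_dag_run(lambda: next(states), sleep=lambda _: None)
```

Injecting sleep and clock keeps the loop testable without waiting in real time.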

Zero External Services

MLflow tracking, Airflow metadata, and the model registry all use SQLite. No Postgres, Redis, or message broker is needed; the entire platform runs in a single Hugging Face Space container.
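Because SQLite is an embedded, file-backed database, a tracking store needs no server process. The toy run store below illustrates that with Python's built-in sqlite3 module; its three-table schema is a simplification invented for this example, not MLflow's actual schema. With MLflow itself, the equivalent is pointing the tracking URI at a file, e.g. mlflow.set_tracking_uri("sqlite:///mlflow.db").

```python
import sqlite3

# Illustrative schema only; MLflow's real tables are more detailed.
SCHEMA = """
CREATE TABLE IF NOT EXISTS runs (run_id TEXT PRIMARY KEY, experiment TEXT NOT NULL);
CREATE TABLE IF NOT EXISTS params (run_id TEXT, key TEXT, value TEXT, PRIMARY KEY (run_id, key));
CREATE TABLE IF NOT EXISTS metrics (run_id TEXT, key TEXT, value REAL, PRIMARY KEY (run_id, key));
"""


def open_store(path=":memory:"):
    """Open (or create) the store; pass a file path such as 'runs.db' to persist it."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def log_run(conn, run_id, experiment, params, metrics):
    """Write one run atomically: the `with` block commits, or rolls back on error."""
    with conn:
        conn.execute("INSERT INTO runs VALUES (?, ?)", (run_id, experiment))
        conn.executemany("INSERT INTO params VALUES (?, ?, ?)",
                         [(run_id, k, str(v)) for k, v in params.items()])
        conn.executemany("INSERT INTO metrics VALUES (?, ?, ?)",
                         [(run_id, k, float(v)) for k, v in metrics.items()])


def best_run(conn, experiment, metric):
    """Return (run_id, value) of the highest-scoring run for a metric, or None."""
    return conn.execute(
        "SELECT r.run_id, m.value FROM runs r JOIN metrics m ON m.run_id = r.run_id "
        "WHERE r.experiment = ? AND m.key = ? ORDER BY m.value DESC LIMIT 1",
        (experiment, metric),
    ).fetchone()


conn = open_store()
log_run(conn, "run-1", "iris", {"algorithm": "logistic_regression"}, {"accuracy": 0.90})
log_run(conn, "run-2", "iris", {"algorithm": "random_forest"}, {"accuracy": 0.95})
```

The trade-off is write concurrency: SQLite serialises writers, which suits one container running one task at a time.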

Canvas-Native DAG UI

The Pipeline Studio uses HTML divs with absolute positioning and SVG bezier arrows, with no graph library dependency. Node status animations and the slide-in config panel are pure CSS transitions.

AutoMLOps — ML Experiment Tracking & Pipeline Studio
