📦 Olist E-Commerce Delivery Prediction

🎥 Project Video Presentation

📄 Project Overview

This project predicts delivery times for Brazilian e-commerce orders using the Olist public dataset. The goal is to improve customer experience by giving more accurate delivery estimates.

The project is divided into two main tasks:

Regression: Predicting the exact number of days for delivery.
Classification: Categorizing delivery speed into Fast, Medium, or Slow.

📊 The Dataset

Source: Brazilian E-Commerce Public Dataset by Olist.
Size: ~100,000 orders after cleaning.
Key Features:
- customer_state: The destination state (critical for logistics).
- product_weight_g: Weight of the package.
- freight_value: Shipping cost.
- price: Product value.

🛠️ Feature Engineering & Clustering

To improve the model's performance beyond the baseline, I applied several techniques:

Data Cleaning: Removed outliers (negative delivery times and deliveries longer than 60 days).
Geolocation Encoding: Converted customer_state into One-Hot Encoded features to capture geographical patterns (e.g., separating Sao Paulo from Amazonas).
K-Means Clustering: Applied unsupervised learning to group orders into 4 distinct clusters based on weight, price, and shipping cost. These cluster labels were added as a new feature to the model.
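The three steps above could be sketched roughly as follows. This is a minimal illustration on toy data, not the notebook's actual code: the `delivery_days` column name, the `engineer_features` helper, and the scaling step before K-Means are all assumptions made for the example.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler


def engineer_features(df: pd.DataFrame, n_clusters: int = 4, seed: int = 0) -> pd.DataFrame:
    """Clean, cluster, and one-hot encode an orders table (illustrative only)."""
    # 1. Data cleaning: drop negative durations and deliveries longer than 60 days.
    #    "delivery_days" is a hypothetical column name for the target.
    keep = (df["delivery_days"] >= 0) & (df["delivery_days"] <= 60)
    out = df.loc[keep].copy()

    # 2. K-Means on weight, price, and shipping cost.
    #    Scaling first is an assumption, so that grams do not dominate the distances.
    cluster_cols = ["product_weight_g", "price", "freight_value"]
    scaled = StandardScaler().fit_transform(out[cluster_cols])
    kmeans = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed)
    out["cluster"] = kmeans.fit_predict(scaled)

    # 3. Geolocation encoding: one indicator column per destination state.
    return pd.get_dummies(out, columns=["customer_state"], prefix="state")


# Toy data standing in for the real orders table.
rng = np.random.default_rng(0)
toy = pd.DataFrame({
    "customer_state": ["SP", "RJ", "AM"] * 10,
    "product_weight_g": rng.uniform(100, 5000, size=30),
    "price": rng.uniform(10, 500, size=30),
    "freight_value": rng.uniform(5, 80, size=30),
    "delivery_days": rng.integers(1, 40, size=30),
})
toy.loc[0, "delivery_days"] = -2   # invalid: negative duration
toy.loc[1, "delivery_days"] = 75   # outlier: longer than 60 days

features = engineer_features(toy)
print(features.filter(like="state_").columns.tolist())
```

The cluster label is stored as a plain integer column here; treating it as a categorical feature (for example, one-hot encoding it as well) is another reasonable choice for linear models.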

🤖 Model Performance

Part 1: Regression (Predicting Days)

I compared a Baseline Linear Regression against advanced models trained on the engineered dataset.

| Model | R2 Score | Improvement |
|---|---|---|
| Baseline (Linear Regression) | 0.057 | - |
| Improved Linear Regression | 0.216 | +278% |
| XGBoost / Random Forest (Winner) | 0.267 | Best Performance |

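Insight: The jump in R2 from 0.057 to 0.216 for the same linear model suggests that location data and cluster features carry much of the predictive signal. Even so, the best model (R2 = 0.267) explains only about 27% of the variance in delivery time.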
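A comparison like the one above can be set up by scoring each model on the same held-out split. The sketch below uses synthetic data and a hypothetical `compare_r2` helper, so its scores bear no relation to the table; the split ratio and hyperparameters are assumptions, not the notebook's settings.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split


def compare_r2(X, y, seed=0):
    """Fit each model on the same training split and return held-out R2 scores."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=seed
    )
    models = {
        "linear_regression": LinearRegression(),
        "random_forest": RandomForestRegressor(n_estimators=100, random_state=seed),
    }
    scores = {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        scores[name] = r2_score(y_test, model.predict(X_test))
    return scores


# Synthetic stand-in for the engineered feature matrix and delivery-time target.
rng = np.random.default_rng(0)
X_toy = rng.normal(size=(400, 3))
y_toy = 3.0 * X_toy[:, 0] + rng.normal(scale=0.1, size=400)

scores = compare_r2(X_toy, y_toy)
for name, score in scores.items():
    print(f"{name}: R2 = {score:.3f}")
```

Scoring every model on the same test split keeps the comparison fair; running the baseline and the engineered feature set through the same function isolates the effect of the added features.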

Part 2: Classification (Fast / Medium / Slow)

I converted the target variable into 3 balanced classes using Quantile Binning:

Fast: Bottom 33%
Medium: Middle 33%
Slow: Top 33% (Critical to identify)

Best Model: XGBoost
Accuracy: ~57% (vs. 33% random chance)
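Quantile binning of this kind is commonly done with `pandas.qcut`. The snippet below is a small sketch on made-up delivery times; the `delivery_days` name is an assumption, and the notebook may compute its bins differently.

```python
import numpy as np
import pandas as pd

# Toy delivery times: 1 to 30 days, one order each.
days = pd.Series(np.arange(1, 31), name="delivery_days")

# q=3 cuts at the 33rd and 67th percentiles, giving three equally sized classes.
speed = pd.qcut(days, q=3, labels=["Fast", "Medium", "Slow"])

counts = speed.value_counts()
print(counts)
```

Because the classes are equally sized by construction, always predicting a single class would score about 33%, which is the baseline the accuracy figure is compared against.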

📂 Repository Structure

This repository contains the following files:

Assignment_2.ipynb: The complete Python notebook with all code, EDA, and visualizations.
regression_model.pkl: The trained Random Forest model for regression.
classification_model.pkl: The trained XGBoost model for classification.
README.md: Project documentation.
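To reuse the saved models, something like the following may work. It assumes the `.pkl` files were written with Python's standard `pickle` module, which this README does not confirm; if they were saved with `joblib`, `joblib.load` would be the call to use instead. The `load_model` helper is hypothetical.

```python
import pickle
from pathlib import Path


def load_model(path):
    """Unpickle a saved model. Only load pickle files from sources you trust."""
    with Path(path).open("rb") as fh:
        return pickle.load(fh)


# Runs only when the repository file is present in the working directory.
if Path("regression_model.pkl").exists():
    regressor = load_model("regression_model.pkl")
    print(type(regressor).__name__)
```

Any data passed to a loaded model needs the same columns, in the same order, as the engineered training set, and the libraries used at training time (scikit-learn, XGBoost) must be installed for unpickling to succeed.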
