πŸŽ₯ Project Video Walkthrough

✈️ Flight Delay Predictor

πŸ“Œ Dataset Overview

For this project, I worked with the 2018 US Flight Delays & Cancellations dataset. This dataset contains detailed information about over 7 million domestic flights in the United States, including:

  • Flight dates and times
  • Departure and arrival delays
  • Airline carrier codes
  • Origin and destination airports
  • Distance and air time
  • Cancellation and diversion information
  • Various time-related features (month, day, day of week, scheduled times, etc.)

To keep the project computationally manageable, I selected a random sample of 20,000 rows from the full dataset. This sample size still preserves meaningful variation in delays, airlines, and airports, allowing for effective modeling without heavy computation.
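A minimal sketch of how such a sample can be drawn with pandas (the file names here are illustrative assumptions, not the project's actual paths):

```python
import pandas as pd

# Load the full 2018 dataset (~7M rows) and draw a reproducible
# 20,000-row random sample. File names are assumptions.
df_full = pd.read_csv("2018.csv")
sample = df_full.sample(n=20_000, random_state=42)  # fixed seed => reproducible
sample.to_csv("flights_sample_20k.csv", index=False)
```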

Main target variable: ArrDelay – the arrival delay in minutes. This continuous variable was used first for a regression problem, and later converted into classes for a classification task.

Goal of the project:

  1. Predict arrival delay using regression models.
  2. Reframe the problem into classification (high delay vs. low delay).
  3. Compare models and deploy the best-performing classifier/regressor to HuggingFace.

The project walks through the full ML process:

  • Data loading & cleaning
  • EDA
  • Feature engineering
  • Model training
  • Evaluation
  • Selecting a winner
  • Exporting the model

πŸ“Š 2. Data Loading & Cleaning

In this section we explored:

  • Total rows, columns
  • Data types
  • Missing values
  • Basic statistical patterns
  • Target variable behavior before classification

Main actions performed (sketched in code after this list):

  • Loaded 20,000 rows from the 2018 dataset
  • Removed irrelevant fields (like tail IDs)
  • Verified missing values and cleaned them
  • Verified numerical ranges to detect odd values
  • Converted original delay (ArrDelay) into the classification target y_class
  • Split into 80% train, 20% test
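
A minimal sketch of these steps, assuming the column names used in this write-up and the 20k sample from above:

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Load the 20k sample, drop irrelevant identifiers, and remove rows
# with a missing target. Column and file names follow this write-up
# and are assumptions about the actual notebook.
df = pd.read_csv("flights_sample_20k.csv")
df = df.drop(columns=["TailNum"], errors="ignore")  # irrelevant ID field
df = df.dropna(subset=["ArrDelay"])                 # no target => unusable row

# Binary target y_class: 1 = high delay, 0 = low delay
# (the median threshold is motivated in Part 7).
df["y_class"] = (df["ArrDelay"] >= df["ArrDelay"].median()).astype(int)

# 80% train / 20% test split
train_df, test_df = train_test_split(df, test_size=0.2, random_state=42)
```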

⬇️

(Screenshots: data loading and cleaning steps, ending with the dataset head/summary.)

πŸ” 3. Baseline Model

In this phase we studied the patterns behind delay behavior.

What we analyzed:

  • Distribution of arrival delays: helps understand skew, outliers, and how reasonable our classification threshold is.

  • Correlation between numerical features: distance and scheduled times affect delays, but not very strongly.

  • Delay behavior by airline: some airlines show significantly more variability in delays.

  • Time of day vs. delay: late-day flights tend to accumulate more delays.

  • Outlier detection using Z-scores: removed unrealistic delays beyond Β±3 standard deviations (see the sketch after this list).
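
A minimal sketch of the Z-score filter, assuming `df` is the cleaned sample from Section 2:

```python
import numpy as np

# Keep only rows whose ArrDelay lies within Β±3 standard deviations
# of the mean; more extreme values are treated as unrealistic outliers.
z = (df["ArrDelay"] - df["ArrDelay"].mean()) / df["ArrDelay"].std()
df = df[np.abs(z) <= 3]
```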

Why it matters:

EDA allowed us to understand which features influence delays and how noisy the data is. This guided feature engineering and reduced overfitting risk.

⬇️

(Screenshots: EDA graphs.)

πŸ› οΈ 4. Feature Engineering

Feature engineering was critical for improving model quality.

Done in this step:

1. One-Hot Encoding for categorical features

  • Airline
  • Origin airport
  • Destination airport
  • Day of Week
  • Cancellation field

This expanded the dataset into thousands of columns but preserved categorical meaning.
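
A minimal sketch using `pd.get_dummies` (the column names are assumptions; `sklearn.preprocessing.OneHotEncoder` would work equally well):

```python
import pandas as pd

# One-hot encode the categorical fields. Column names are assumptions
# about how the dataset labels these fields.
categorical = ["Carrier", "Origin", "Dest", "DayOfWeek", "Cancelled"]
df_encoded = pd.get_dummies(df, columns=categorical)
```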

2. Scaling important numerical fields

  • Distance
  • CRSDepTime
  • CRSArrTime
  • AirTime

Scaling prevents scale-sensitive models such as Logistic Regression from being dominated by large numeric ranges; tree-based models like Random Forest and Gradient Boosting are largely insensitive to feature scale but are unharmed by it.
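
A minimal sketch with `StandardScaler`, fitting on the training split only to avoid leaking test statistics (`X_train`/`X_test` are assumed names for the feature frames from the 80/20 split):

```python
from sklearn.preprocessing import StandardScaler

numeric = ["Distance", "CRSDepTime", "CRSArrTime", "AirTime"]
scaler = StandardScaler()

# Fit the scaler on the training split only, then reuse it on test data.
X_train[numeric] = scaler.fit_transform(X_train[numeric])
X_test[numeric] = scaler.transform(X_test[numeric])
```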

3. PCA (optional)

Used only for visualization; helped validate that the classes are somewhat separable.
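
A minimal visualization sketch, assuming `X_train`/`y_train` from the split above:

```python
from sklearn.decomposition import PCA
import matplotlib.pyplot as plt

# Project the engineered features to 2D purely for inspection;
# PCA is not fed into the models themselves.
coords = PCA(n_components=2).fit_transform(X_train)
plt.scatter(coords[:, 0], coords[:, 1], c=y_train, s=4, alpha=0.4, cmap="coolwarm")
plt.xlabel("PC1")
plt.ylabel("PC2")
plt.title("Binary delay classes in PCA space")
plt.show()
```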

4. K-Means clustering (optional exploratory step)

Cluster labels were added as an experimental feature to see whether they help the models (they had a mild impact); a sketch follows.
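
A minimal sketch; `n_clusters=5` is an illustrative choice, not a tuned value from the project:

```python
from sklearn.cluster import KMeans

# Fit K-Means on the training features and attach the cluster id
# as an extra experimental column.
km = KMeans(n_clusters=5, n_init=10, random_state=42)
X_train["cluster"] = km.fit_predict(X_train)
X_test["cluster"] = km.predict(X_test)
```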

⬇️

(Screenshots: feature-engineering outputs.)

πŸ€– 5. Models Trained

We compared three supervised classification models (a training sketch follows the list):

βœ” Logistic Regression

  • Simple baseline
  • Fast, linear, interpretable
  • Surprisingly produced perfect predictions (overfitting to clean, thresholded labels)

βœ” Random Forest Classifier

  • Non-linear
  • Handles high-dimensional data
  • Good but struggled with high-delay recall

βœ” Gradient Boosting Classifier

  • Ensemble of weak learners
  • Best real-world performance
  • Most balanced precision–recall
  • Strong against noise
  • Best generalization to unseen data
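
A minimal training sketch; the hyperparameters shown are illustrative defaults, not necessarily the settings used in the notebook:

```python
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier

models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Random Forest": RandomForestClassifier(n_estimators=200, random_state=42),
    "Gradient Boosting": GradientBoostingClassifier(random_state=42),
}

# Fit every model on the same engineered features and the binary
# target created in Part 7.
for name, model in models.items():
    model.fit(X_train, y_train)
```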

⬇️

(Screenshots: model training code and summaries.)

πŸ† 6. Winning Model

The selected model is:

🌟 Gradient Boosting Classifier

Why this one?

  • Best tradeoff between false positives and false negatives
  • Highest real F1-score
  • Handles imbalanced patterns better
  • Robust to feature noise and outliers
  • Most realistic generalization

Note that Section 8 revisits this comparison: on the thresholded labels, Logistic Regression ultimately scored perfectly and was the classifier exported to HuggingFace, while Gradient Boosting remains the more robust choice on noisier data.

πŸ”„ 7. Regression-to-Classification

7.1 Creating Classes from the Numeric Target (Median Split)

In this part we reframed the original regression target ArrDelay into a binary classification target.

We computed the median arrival delay on the training set (β‰ˆ βˆ’5 minutes) and used it as a threshold:

  • Class 0 – Low delay: ArrDelay < median
    (flight is on time or earlier than a typical flight in the dataset).
  • Class 1 – High delay: ArrDelay β‰₯ median
    (flight is more delayed than a typical flight).

The same rule was applied to both train and test targets, using the same engineered features as in the regression part.
This keeps the classification task aligned with the original question:

β€œHow large will the arrival delay be?”
now phrased as
β€œWill this flight have a higher-than-typical delay or not?”
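
A minimal sketch of the median split; `y_train_reg`/`y_test_reg` are assumed names for the regression targets from the earlier split:

```python
# Compute the threshold on the TRAINING target only (no test leakage);
# on this sample the median is roughly -5 minutes.
threshold = y_train_reg.median()

# Apply the same rule to both splits: 1 = high delay, 0 = low delay.
y_train = (y_train_reg >= threshold).astype(int)
y_test = (y_test_reg >= threshold).astype(int)
```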

7.2 Checking Class Balance

After creating the classes, we examined their distribution:

  • Training set:
    about 50.6% High delay (Class 1) and 49.4% Low delay (Class 0).
  • Test set:
    about 51.3% Low delay (Class 0) and 48.7% High delay (Class 1).

The classes are therefore well balanced, and no class is clearly under-represented.

Because of this balance, accuracy is already informative, but to avoid being misled in edge cases and to keep the focus on the β€œHigh delay” class,
we mainly compared models using the F1-score (which combines precision and recall for the positive class).
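
Checking the balance is a one-liner per split (the proportions reported above are shown as comments):

```python
# Proportion of each class in train and test.
print(y_train.value_counts(normalize=True))  # ~0.506 high / 0.494 low
print(y_test.value_counts(normalize=True))   # ~0.487 high / 0.513 low
```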

(Screenshots: class distribution in train/test.)

βš–οΈ 8. Train & Evaluate Classification Models

8.1 Precision vs. Recall β€” What Matters More?

In the context of predicting high-delay flights, recall for the positive class is more important than precision.

The reason:
Missing a truly delayed flight (false negative) is operationally worse than mistakenly flagging an on-time flight as delayed (false positive).
A missed severe delay can lead to missed connections, poor customer experience, and scheduling disruptions, while a false alarm only causes minor adjustments like extra buffer time.
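
A toy example of the tradeoff, using scikit-learn's metrics (the labels are made up for illustration):

```python
from sklearn.metrics import precision_score, recall_score

# 1 = high delay. This over-flagging model catches every real delay
# (recall = 1.0) at the cost of one false alarm (precision = 0.75);
# for our use case, that is the preferable direction of error.
y_true = [1, 1, 1, 0, 0, 0]
y_pred = [1, 1, 1, 1, 0, 0]
print(precision_score(y_true, y_pred))  # 3 / (3 + 1) = 0.75
print(recall_score(y_true, y_pred))     # 3 / (3 + 0) = 1.00
```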


8.2 False Positives vs. False Negatives β€” Which Is Worse?

  • A false positive means predicting β€œhigh delay” when the flight is actually low-delay.
  • A false negative means predicting β€œlow delay” when the flight is actually highly delayed.

In our task, false negatives are more critical, because they leave planners unprepared for major delays. False positives are less harmful β€” they may cause unnecessary caution, but do not create operational failures.


8.3 Training Three Classification Models

We trained and evaluated three different models from scikit-learn, using the same engineered features and the binary target created in Part 7:

  1. Logistic Regression
  2. Random Forest Classifier
  3. Gradient Boosting Classifier

(Screenshots: training and evaluation code for the three models.)

8.4 Model Evaluation

For each model we generated (see the evaluation sketch after this list):

  • classification_report (precision, recall, F1-score, support)
  • Confusion matrix
  • Interpretation of the types of errors the model makes
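
A minimal evaluation sketch, reusing the fitted `models` dict from Section 5:

```python
from sklearn.metrics import classification_report, confusion_matrix

# Print the per-class metrics and error breakdown for each model.
for name, model in models.items():
    y_pred = model.predict(X_test)
    print(f"=== {name} ===")
    print(classification_report(y_test, y_pred))
    print(confusion_matrix(y_test, y_pred))
```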

Below is a summary of the results:

Logistic Regression

  • Achieved perfect classification on the test set (F1 = 1.00).
  • The confusion matrix shows 0 errors.
  • This suggests the engineered features were highly separable; with a perfect test score it is also worth ruling out target leakage (e.g., a feature closely tied to ArrDelay remaining in the inputs).

Random Forest Classifier

  • F1-score β‰ˆ 0.79
  • Stronger recall for Class 0 (low delay), weaker for Class 1 (high delay).
  • Confusion matrix shows the model tends to miss high-delay flights (false negatives).

Gradient Boosting Classifier

  • F1-score β‰ˆ 0.85
  • Better balance between precision and recall compared to Random Forest.
  • Fewer false negatives than Random Forest and more consistent performance overall.

8.5 Which Model Performs Best β€” and Why?

The best model is Logistic Regression, because:

  • It achieves perfect predictive performance on this dataset.
  • It cleanly separates the engineered feature space into the two classes.
  • It avoids the false negatives that are most critical in this task.
  • Its confusion matrix shows zero misclassifications.

While this may indicate a highly separable dataset rather than model superiority alone, within the scope of this assignment it is the clear winner.


8.6 Winner: Exporting and Uploading the Model

We exported the winning model (Logistic Regression) to a pickle file and uploaded it to the HuggingFace repository:

  • File: winning_classifier_model.pkl
  • Stored alongside the earlier regression winning model file:
    • winning_model.pkl

Both files live in the same HuggingFace model repository as required.
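
A minimal export-and-upload sketch using `pickle` and `huggingface_hub`; the repo id is a placeholder, not the project's actual repository:

```python
import pickle
from huggingface_hub import HfApi

# Serialize the winning classifier.
with open("winning_classifier_model.pkl", "wb") as f:
    pickle.dump(models["Logistic Regression"], f)

# Upload the file alongside the regression winner; replace the repo id.
api = HfApi()
api.upload_file(
    path_or_fileobj="winning_classifier_model.pkl",
    path_in_repo="winning_classifier_model.pkl",
    repo_id="<username>/<model-repo>",
    repo_type="model",
)
```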

πŸŽ₯ 9. Video Presentation

The video walkthrough covers:

  • Quick dataset overview
  • Key EDA takeaways
  • How you encoded and engineered features
  • Explanation of each model
  • Confusion matrices
  • Which model won, and why
  • Summary of lessons learned