🚒 PIML Hybrid Ensemble for Biofouling Prediction (LOF 0-4)

🥇 Executive Summary

This repository hosts the trained artifacts for a Physics-Informed Machine Learning (PIML) solution, developed for Augmented Predictive Maintenance in the naval sector. The model is a multi-class classifier that predicts the Level of Fouling (LOF) on Transpetro vessels, translating the physical risk into a direct financial loss metric (USD).

The core strength of the solution lies in overcoming data scarcity (small dataset of 29 records) and severe class imbalance through rigorous, causal Feature Engineering and the use of Natural Language Processing (NLP).

🧠 Architecture and Essential Artifacts

The final model is a Hybrid Ensemble (Voting Classifier with soft voting), optimized to handle the multi-class classification problem (LOF 0 to 4).

The Ensemble model (modelo_casco.joblib) is composed of:

  • Random Forest (RF): Configured with class_weight='balanced' to mitigate class imbalance.
  • XGBoost (XGB): A gradient-boosted tree model, included for its stronger discriminative power.
  • Weight Optimization: The Ensemble uses the weights RF: 1, XGB: 2, so XGBoost's predicted class probabilities count twice as much as Random Forest's in the soft vote.
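
The effect of the (RF: 1, XGB: 2) weights can be illustrated with a minimal, dependency-free sketch of weighted soft voting. The probability vectors below are made up for illustration and are not outputs of the trained model; the helper names `soft_vote` and `predict_class` are hypothetical.

```python
def soft_vote(prob_rf, prob_xgb, w_rf=1.0, w_xgb=2.0):
    """Weighted average of two class-probability vectors (soft voting)."""
    if len(prob_rf) != len(prob_xgb):
        raise ValueError("both models must score the same set of classes")
    total = w_rf + w_xgb
    return [(w_rf * p + w_xgb * q) / total for p, q in zip(prob_rf, prob_xgb)]


def predict_class(probs):
    """Index of the highest combined probability (the predicted LOF class)."""
    return max(range(len(probs)), key=probs.__getitem__)


# Hypothetical probabilities for LOF classes 0..4 on a single sample.
prob_rf = [0.10, 0.50, 0.30, 0.05, 0.05]   # Random Forest favours LOF 1
prob_xgb = [0.05, 0.25, 0.60, 0.05, 0.05]  # XGBoost favours LOF 2

combined = soft_vote(prob_rf, prob_xgb)
predicted_lof = predict_class(combined)    # XGBoost's double weight decides: LOF 2
```

In scikit-learn, this combination rule corresponds to `VotingClassifier(voting="soft", weights=[1, 2])`; the exact estimator settings used for modelo_casco.joblib are not reproduced here.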

The essential artifacts for deployment were persisted using joblib:

| Artifact File | Description | Purpose |
|---|---|---|
| modelo_casco.joblib | The trained Voting Classifier Ensemble. | Makes the final LOF class prediction. |
| regua_fisica.joblib | The Physical Baseline (Linear Regression). | Calculates the expected energy consumption of a clean ship. |
| label_encoder.joblib | The Target Encoder. | Decodes the numerical prediction output back to the corresponding LOF label. |

🔬 Physics-Informed Feature Engineering (PIML)

The core of the solution is the causal Feature Engineering that isolates the fouling signal, orchestrated by the function criar_features_geo_fisicas.

  1. PERFORMANCE_LOSS_MJ (Pure Fouling Signal)

    • Foundation: The calculation is based on the naval propulsion principle that a clean ship's power consumption (P) is ideally a cubic function of its velocity (V), i.e. P ∝ V³.
    • Calculation: This feature is the residual between the real energy consumed and the expected energy predicted by the regua_fisica baseline model (real minus expected).
    • Function: This residual quantifies the energy loss not explained by normal operating factors, isolating the impact of fouling-induced drag.
  2. RISCO_REGIONAL (Geo-Environmental Prior)

    • The function RISCO_REGIONAL maps geographical coordinates (Lat/Lon) to a discrete Biological Growth Risk score (1.0 to 5.0).
    • Function: This prior allows the model to modulate the rate of LOF accumulation, recognizing that the time since the last cleaning (Dias Desde UltimaLimpeza) matters more in high-risk zones.
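
The two features above can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not the repository's criar_features_geo_fisicas: the baseline is assumed to be a one-variable linear regression on V³, the clean-hull data are synthetic, and the risk zones and their scores are invented placeholders. All function and variable names here are hypothetical.

```python
def fit_cubic_baseline(speeds, energies):
    """Least-squares fit of energy = a * V**3 + b on clean-hull records."""
    xs = [v ** 3 for v in speeds]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(energies) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, energies))
    a = sxy / sxx
    b = mean_y - a * mean_x
    return a, b


def performance_loss_mj(speed, real_energy_mj, baseline):
    """Residual: real energy minus the energy expected for a clean hull."""
    a, b = baseline
    expected_energy_mj = a * speed ** 3 + b
    return real_energy_mj - expected_energy_mj


# Hypothetical risk zones: (lat_min, lat_max, lon_min, lon_max, risk score).
RISK_ZONES = [
    (-25.0, -20.0, -48.0, -40.0, 5.0),
    (-35.0, -25.0, -55.0, -45.0, 3.0),
]
DEFAULT_RISK = 1.0


def regional_risk(lat, lon):
    """Return the score of the first zone containing the point, else the default."""
    for lat_min, lat_max, lon_min, lon_max, score in RISK_ZONES:
        if lat_min <= lat <= lat_max and lon_min <= lon <= lon_max:
            return score
    return DEFAULT_RISK


# Synthetic clean-hull records that follow energy = 2 * V**3 + 100 exactly.
clean_speeds = [8.0, 10.0, 12.0, 14.0]
clean_energy = [2.0 * v ** 3 + 100.0 for v in clean_speeds]
baseline = fit_cubic_baseline(clean_speeds, clean_energy)

# A fouled ship at 12 knots that consumed 4000 MJ: the excess is the fouling signal.
loss = performance_loss_mj(12.0, 4000.0, baseline)
risk = regional_risk(-23.0, -43.0)
```

A positive residual means the ship consumed more energy than a clean hull would at the same speed; the model then reads that residual alongside the regional risk score and the time since the last cleaning.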

📊 Performance and Business Impact

The PIML architecture reached an overall accuracy of 0.82 (82%) on the test set of 724 samples. Performance is strongest on the extreme classes (LOF 0 and 4) and weakest on LOF 1 (F1-Score of 0.51).

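| LOF Class | Precision | Recall | F1-Score | Support |
|---|---|---|---|---|
| 0 | 0.86 | 0.97 | 0.91 | 98 |
| 1 | 0.51 | 0.51 | 0.51 | 41 |
| 2 | 0.71 | 0.74 | 0.73 | 162 |
| 3 | 0.75 | 0.70 | 0.72 | 30 |
| 4 | 0.90 | 0.86 | 0.88 | 393 |
| Overall Accuracy | | | 0.82 | |
| Macro Avg | 0.75 | 0.76 | 0.75 | 724 |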
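
As a consistency check, the summary rows can be recomputed from the per-class figures reported above. The sketch below uses only the numbers in the table; because the per-class recalls are already rounded to two decimals, the recovered accuracy is approximate.

```python
# Per-class metrics from the report: (precision, recall, f1, support).
report = {
    0: (0.86, 0.97, 0.91, 98),
    1: (0.51, 0.51, 0.51, 41),
    2: (0.71, 0.74, 0.73, 162),
    3: (0.75, 0.70, 0.72, 30),
    4: (0.90, 0.86, 0.88, 393),
}

n_classes = len(report)
total_support = sum(support for _, _, _, support in report.values())

# Macro average: unweighted mean over classes, so rare classes count equally.
macro_precision = sum(p for p, _, _, _ in report.values()) / n_classes
macro_recall = sum(r for _, r, _, _ in report.values()) / n_classes
macro_f1 = sum(f for _, _, f, _ in report.values()) / n_classes

# Accuracy: correctly classified samples (recall * support per class) over all samples.
accuracy = sum(r * s for _, r, _, s in report.values()) / total_support

summary = {
    "support": total_support,
    "macro_precision": round(macro_precision, 2),
    "macro_recall": round(macro_recall, 2),
    "macro_f1": round(macro_f1, 2),
    "accuracy": round(accuracy, 2),
}
```

The recomputed values agree with the reported Macro Avg row and the overall accuracy, which also shows why the macro averages sit below the accuracy: LOF 1 and LOF 3 pull the unweighted means down while contributing few samples.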

Strategic Value and Financial Impact

The technical success translates directly into business value:

  • Maintenance Optimization: High reliability in the extreme classes (LOF 0 and 4, with F1-Scores of 0.91 and 0.88) supports the transition from reactive or scheduled maintenance to predictive maintenance.
  • Actionable Decision: The post-processing function (formatar_saida_modelo_v2) transforms the discrete LOF prediction into a continuous financial impact metric (e.g., loss in USD), giving Transpetro a basis for reducing fuel costs.
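
One plausible way to turn an energy loss into a dollar figure is sketched below. It is not the repository's formatar_saida_modelo_v2, whose internals are not documented here. The function name, the assumption that the loss is expressed as fuel energy, the heating value of roughly 40 MJ/kg for marine fuel oil, and the fuel price are all illustrative assumptions.

```python
# Assumed constants, for illustration only.
FUEL_LHV_MJ_PER_KG = 40.0        # approximate lower heating value of marine fuel oil
FUEL_PRICE_USD_PER_TONNE = 600.0  # hypothetical bunker price


def estimate_loss_usd(energy_loss_mj,
                      lhv_mj_per_kg=FUEL_LHV_MJ_PER_KG,
                      price_usd_per_tonne=FUEL_PRICE_USD_PER_TONNE):
    """Convert excess fuel energy (MJ) into an approximate fuel cost (USD)."""
    if energy_loss_mj <= 0:
        return 0.0  # no excess consumption, no attributable loss
    extra_fuel_kg = energy_loss_mj / lhv_mj_per_kg
    extra_fuel_tonnes = extra_fuel_kg / 1000.0
    return extra_fuel_tonnes * price_usd_per_tonne


# 80,000 MJ of excess energy -> 2,000 kg of fuel -> 2 tonnes -> 1,200 USD.
example_loss_usd = estimate_loss_usd(80_000.0)
```

If the energy residual were measured at the shaft rather than as fuel energy, an engine efficiency factor would also be needed; that choice depends on how PERFORMANCE_LOSS_MJ is defined in the training data.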

🛠️ Usage and Implementation

To use this model in Python, download the artifacts directly from this repository with the huggingface_hub library:

import joblib
from huggingface_hub import hf_hub_download

# Define your repository ID
MODEL_REPO_ID = "YOUR_USERNAME/YOUR_MODEL_NAME" 

# 1. Download and load the Main Ensemble
ensemble_path = hf_hub_download(repo_id=MODEL_REPO_ID, filename="modelo_casco.joblib")
modelo_casco = joblib.load(ensemble_path)

# 2. Download and load the Physical Baseline (Essential for feature creation)
regua_path = hf_hub_download(repo_id=MODEL_REPO_ID, filename="regua_fisica.joblib")
regua_fisica = joblib.load(regua_path)

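# 3. Download and load the Target Encoder (decodes predictions back to LOF labels)
encoder_path = hf_hub_download(repo_id=MODEL_REPO_ID, filename="label_encoder.joblib")
label_encoder = joblib.load(encoder_path)

# The model is now ready for inference, after generating the necessary features
# (e.g., PERFORMANCE_LOSS_MJ) using 'regua_fisica'.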
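
Once the artifacts are loaded, inference reduces to predicting encoded classes and decoding them. The sketch below assumes the ensemble was trained on encoded labels and that the feature rows are already built in the column order used at training time, which this card does not list. The helper `predict_lof` is hypothetical, and the two stand-in classes exist only so the example runs without downloading anything.

```python
def predict_lof(model, label_encoder, feature_rows):
    """Predict encoded classes, then decode them back to LOF labels."""
    encoded = model.predict(feature_rows)
    return list(label_encoder.inverse_transform(encoded))


class StandInModel:
    """Stand-in for modelo_casco: flags high fouling when the first feature is large."""

    def predict(self, rows):
        return [4 if row[0] > 300.0 else 0 for row in rows]


class StandInEncoder:
    """Stand-in for label_encoder: maps encoded integers to LOF labels."""

    def inverse_transform(self, encoded):
        return [f"LOF {code}" for code in encoded]


# Hypothetical feature rows: [PERFORMANCE_LOSS_MJ, RISCO_REGIONAL, days since cleaning].
rows = [[444.0, 5.0, 180.0], [12.0, 1.0, 20.0]]
labels = predict_lof(StandInModel(), StandInEncoder(), rows)
```

With the real artifacts, the call would be `predict_lof(modelo_casco, label_encoder, X)`, where `X` holds the engineered features.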

Alternatively, the model is available through a REST API at https://huggingface.co/spaces/carpenterbb/api-transpetro-hackathon

If you use the model's weights or the PIML architecture in your research or application, please cite the corresponding paper or repository.

  • Model Artifacts DOI: 10.57967/hf/7136

  • If you prefer a direct text citation:

Alves Gomes, Gabrielly (2025).
Hybrid Physics-Informed ML Architecture for Biofouling Prediction and Economic Impact in Vessels.
Hugging Face Hub. [doi: 10.57967/hf/7136]