PIML Hybrid Ensemble for Biofouling Prediction (LOF 0-4)
Executive Summary
This repository hosts the trained artifacts for a Physics-Informed Machine Learning (PIML) solution, developed for Augmented Predictive Maintenance in the naval sector. The model is a multi-class classifier that predicts the Level of Fouling (LOF) on Transpetro vessels, translating the physical risk into a direct financial loss metric (USD).
The core strength of the solution lies in overcoming data scarcity (small dataset of 29 records) and severe class imbalance through rigorous, causal Feature Engineering and the use of Natural Language Processing (NLP).
Architecture and Essential Artifacts
The final model is a Hybrid Ensemble (Voting Classifier with soft voting), optimized to handle the multi-class classification problem (LOF 0 to 4).
The Ensemble model (modelo_casco.joblib) is composed of:
- Random Forest (RF): Configured with `class_weight='balanced'` to mitigate class imbalance.
- XGBoost (XGB): Used for its superior discriminatory power in gradient boosting.
- Weight Optimization: The Ensemble was optimized with weights (RF: 1, XGB: 2), prioritizing XGBoost's precision.
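To make the effect of the (RF: 1, XGB: 2) weights concrete, the sketch below reproduces the arithmetic of weighted soft voting in plain Python: each model's class probabilities are averaged with the given weights, and the class with the highest combined probability wins. The helper name `soft_vote` and the probability values are hypothetical and only illustrate the mechanism; they are not outputs of the trained ensemble.

```python
def soft_vote(prob_by_model, weights):
    """Weighted average of per-model class probabilities, then argmax."""
    total_weight = sum(weights.values())
    n_classes = len(next(iter(prob_by_model.values())))
    combined = [
        sum(weights[name] * probs[k] for name, probs in prob_by_model.items())
        / total_weight
        for k in range(n_classes)
    ]
    predicted = max(range(n_classes), key=lambda k: combined[k])
    return combined, predicted


# Hypothetical probabilities for one sample over LOF classes 0..4.
probs = {
    "rf": [0.10, 0.40, 0.30, 0.10, 0.10],   # RF alone would pick class 1
    "xgb": [0.05, 0.20, 0.50, 0.15, 0.10],  # XGB alone would pick class 2
}

combined, lof = soft_vote(probs, {"rf": 1, "xgb": 2})
print(combined, lof)
```

In this made-up case the two models disagree, and the double weight on XGBoost tips the combined vote to class 2.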
The essential artifacts for deployment were persisted using joblib:
| Artifact File | Description | Purpose |
|---|---|---|
| `modelo_casco.joblib` | The trained Voting Classifier Ensemble. | Makes the final LOF class prediction. |
| `regua_fisica.joblib` | The Physical Baseline (Linear Regression). | Calculates the expected energy consumption of a clean ship. |
| `label_encoder.joblib` | The Target Encoder. | Decodes the numerical prediction output back to the corresponding LOF label. |
Physics-Informed Feature Engineering (PIML)
The main force of the solution is the causal Feature Engineering that isolates the fouling signal, orchestrated by the function criar_features_geo_fisicas.
PERFORMANCE_LOSS_MJ (Pure Incrustation Signal)
- Foundation: The calculation is based on the naval propulsion principle that a clean ship's power consumption (P) is ideally a cubic function of its velocity (V).
- Calculation: This feature is calculated as the residual between the REAL ENERGY consumed and the EXPECTED ENERGY derived from the `regua_fisica` baseline model.
- Function: This residual is the pure, quantified signal of energy loss not explained by normal factors, isolating the impact of incrustation drag.
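The residual idea can be sketched in a few lines. The example below assumes the baseline is a one-feature linear regression of energy on speed cubed, fitted on clean-hull records; the actual inputs of `regua_fisica` are not documented here, so the helper names (`fit_cubic_baseline`, `performance_loss_mj`) and all numbers are hypothetical.

```python
def fit_cubic_baseline(speeds, energies):
    """Ordinary least squares of energy on speed**3 (slope and intercept)."""
    xs = [v ** 3 for v in speeds]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(energies) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, energies))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def performance_loss_mj(speed, real_energy, baseline):
    """Residual: real energy minus the clean-hull expectation at this speed."""
    slope, intercept = baseline
    expected_energy = slope * speed ** 3 + intercept
    return real_energy - expected_energy


# Hypothetical clean-hull records that follow 2.0 * v**3 + 100 exactly.
clean_speeds = [8.0, 10.0, 12.0, 14.0]
clean_energy = [2.0 * v ** 3 + 100.0 for v in clean_speeds]
baseline = fit_cubic_baseline(clean_speeds, clean_energy)

# A hypothetical fouled hull at the same speed consumes more than expected.
loss = performance_loss_mj(12.0, 4000.0, baseline)
print(round(loss, 1))
```

With these made-up values the clean-hull expectation at speed 12 is 3556, so the residual is roughly 444 MJ. A positive residual is the excess energy attributed to fouling drag.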
RISCO_REGIONAL (Geo-Environmental Prior)
- The function `RISCO_REGIONAL` maps geographical coordinates (Lat/Lon) to a discrete Biological Growth Risk score (1.0 to 5.0).
- Function: This prior allows the model to modulate the rate of LOF accumulation, recognizing that the temporal feature (`Dias Desde UltimaLimpeza`) is more critical in high-risk zones.
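One simple way to implement such a coordinate-to-risk lookup is a table of bounding boxes with a default score. The sketch below is purely illustrative: the zone names, boundaries, and scores are placeholders invented for this example and do not reflect the mapping used to train the model.

```python
# Hypothetical zones: (name, lat_min, lat_max, lon_min, lon_max, score).
# Placeholder values only; the real RISCO_REGIONAL mapping is not documented here.
HYPOTHETICAL_ZONES = [
    ("warm_equatorial_coast", -10.0, 5.0, -50.0, -30.0, 5.0),
    ("subtropical_coast", -28.0, -18.0, -50.0, -38.0, 3.0),
]
DEFAULT_SCORE = 1.0  # lowest risk when no zone matches


def risco_regional(lat, lon, zones=HYPOTHETICAL_ZONES):
    """Return the score of the first zone containing (lat, lon)."""
    for _name, lat_min, lat_max, lon_min, lon_max, score in zones:
        if lat_min <= lat <= lat_max and lon_min <= lon <= lon_max:
            return score
    return DEFAULT_SCORE


print(risco_regional(-3.0, -38.5))   # inside the first placeholder zone
print(risco_regional(-23.0, -43.0))  # inside the second placeholder zone
print(risco_regional(60.0, 0.0))     # outside all zones, falls back to default
```

A lookup table keeps the prior easy to audit and adjust, which matters when the scores encode domain judgment rather than learned parameters.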
Performance and Business Impact
The PIML architecture reached an Overall Accuracy of 0.82 (82%) on the test set (724 samples).
| LOF Class | Precision | Recall | F1-Score | Support |
|---|---|---|---|---|
| 0 | 0.86 | 0.97 | 0.91 | 98 |
| 1 | 0.51 | 0.51 | 0.51 | 41 |
| 2 | 0.71 | 0.74 | 0.73 | 162 |
| 3 | 0.75 | 0.70 | 0.72 | 30 |
| 4 | 0.90 | 0.86 | 0.88 | 393 |
| Overall Accuracy | 0.82 | | | |
| Macro Avg | 0.75 | 0.76 | 0.75 | 724 |
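The summary rows can be re-derived from the per-class rows as a sanity check. The snippet below copies the table values, computes the macro averages as unweighted means, and approximates the overall accuracy as the support-weighted recall. Because the table values are rounded to two decimals, the accuracy is an approximation rather than an exact recomputation.

```python
# Per-class rows copied from the table: (precision, recall, f1, support).
report = {
    0: (0.86, 0.97, 0.91, 98),
    1: (0.51, 0.51, 0.51, 41),
    2: (0.71, 0.74, 0.73, 162),
    3: (0.75, 0.70, 0.72, 30),
    4: (0.90, 0.86, 0.88, 393),
}

n_classes = len(report)
total_support = sum(row[3] for row in report.values())

# Macro average: every class counts equally, regardless of its support.
macro_precision = sum(row[0] for row in report.values()) / n_classes
macro_recall = sum(row[1] for row in report.values()) / n_classes
macro_f1 = sum(row[2] for row in report.values()) / n_classes

# Recall times support approximates the correctly classified samples per class.
approx_correct = sum(row[1] * row[3] for row in report.values())
accuracy = approx_correct / total_support

print(total_support)
print(round(macro_precision, 2), round(macro_recall, 2), round(macro_f1, 2))
print(round(accuracy, 2))
```

The recomputed values agree with the Macro Avg and Overall Accuracy rows. The gap between the macro F1 (0.75) and the accuracy (0.82) reflects the weaker scores on the smaller classes, LOF 1 in particular.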
Strategic Value and Financial Impact
The technical success translates directly into business value:
- Maintenance Optimization: High reliability in the extreme classes (LOF 0 and 4) allows the transition from a reactive/scheduled model to a predictive maintenance model.
- Actionable Decision: The post-processing function (`formatar_saida_modelo_v2`) transforms the discrete LOF prediction into a continuous financial impact metric (e.g., Loss in USD). This allows Transpetro to optimize millions of dollars in fuel costs.
Usage and Implementation
To consume this model in Python, you must use the huggingface_hub library to download the artifacts directly from this repository:
```python
import joblib
from huggingface_hub import hf_hub_download

# Define your repository ID
MODEL_REPO_ID = "YOUR_USERNAME/YOUR_MODEL_NAME"

# 1. Download and load the Main Ensemble
ensemble_path = hf_hub_download(repo_id=MODEL_REPO_ID, filename="modelo_casco.joblib")
modelo_casco = joblib.load(ensemble_path)

# 2. Download and load the Physical Baseline (Essential for feature creation)
regua_path = hf_hub_download(repo_id=MODEL_REPO_ID, filename="regua_fisica.joblib")
regua_fisica = joblib.load(regua_path)

# The model is now ready for inference, after generating the necessary features
# (e.g., PERFORMANCE_LOSS_MJ) using 'regua_fisica'.
```
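A prediction from the ensemble still has to be decoded back to an LOF label with the persisted `label_encoder.joblib`. The sketch below shows one way to wire this up. The helper names `load_artifacts` and `predict_lof` are hypothetical, and it assumes the artifacts expose the usual scikit-learn `predict` and `inverse_transform` methods and that the feature rows are already built in the column order used at training, which is not documented here.

```python
ARTIFACT_FILES = ("modelo_casco.joblib", "regua_fisica.joblib", "label_encoder.joblib")


def load_artifacts(repo_id):
    """Download and load all three artifacts; imports are local so the
    helpers below can be used without network access."""
    import joblib
    from huggingface_hub import hf_hub_download

    return {
        name: joblib.load(hf_hub_download(repo_id=repo_id, filename=name))
        for name in ARTIFACT_FILES
    }


def predict_lof(model, encoder, feature_rows):
    """Predict encoded classes, then decode them to the original LOF labels."""
    encoded = model.predict(feature_rows)
    return list(encoder.inverse_transform(encoded))


# Example wiring (requires network access and a real repository ID):
# artifacts = load_artifacts("YOUR_USERNAME/YOUR_MODEL_NAME")
# labels = predict_lof(
#     artifacts["modelo_casco.joblib"],
#     artifacts["label_encoder.joblib"],
#     feature_rows,  # hypothetical: rows produced by the feature pipeline
# )
```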
Or, use the model REST API at https://huggingface.co/spaces/carpenterbb/api-transpetro-hackathon
If you use the model's weights or the PIML architecture in your research or application, please cite the corresponding paper or repository.
Model Artifacts DOI: 10.57967/hf/7136
If you prefer a direct text citation:
Alves Gomes, Gabrielly (2025).
Hybrid Physics-Informed ML Architecture for Biofouling Prediction and Economic Impact in Vessels.
Hugging Face Hub. [doi: 10.57967/hf/7136]