---
language:
- en
tags:
- chemistry
- fuel
- engines
- YSI
---

# YSI Predictor: Yield Sooting Index Model

## Overview

This repository contains a machine learning model that predicts the **Yield Sooting Index (YSI)** of single-component fuel molecules directly from their **SMILES** representation.

**YSI is a soot formation metric** used in combustion science.
- **Lower YSI → cleaner combustion**
- Highly relevant for **diesel replacement fuels**, **biofuels**, and **oxygenated fuels**.

This model supports:
- molecular design and optimization,
- genetic algorithms (e.g., CREM),
- Pareto optimization (cetane number, CN, vs. YSI),
- rapid candidate screening.

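
For the Pareto use case, the sketch below shows one simple way to keep only non-dominated candidates. It assumes each candidate already has a cetane number (CN, to be maximized) and a predicted YSI (to be minimized). The `Candidate` type, the `dominates` and `pareto_front` helpers, the labels, and all numbers are hypothetical placeholders, not part of this repository and not measured data.

```python
from typing import List, NamedTuple


class Candidate(NamedTuple):
    label: str
    cn: float   # cetane number, higher is better (placeholder values below)
    ysi: float  # predicted YSI, lower is better (placeholder values below)


def dominates(a: Candidate, b: Candidate) -> bool:
    """True if `a` is no worse than `b` on both objectives and strictly better on one."""
    no_worse = a.cn >= b.cn and a.ysi <= b.ysi
    strictly_better = a.cn > b.cn or a.ysi < b.ysi
    return no_worse and strictly_better


def pareto_front(candidates: List[Candidate]) -> List[Candidate]:
    """Return the candidates that no other candidate dominates (O(n^2) scan)."""
    return [
        c for c in candidates
        if not any(dominates(other, c) for other in candidates)
    ]


# Made-up numbers, purely for illustration
pool = [
    Candidate("cand_A", cn=55.0, ysi=30.0),
    Candidate("cand_B", cn=60.0, ysi=45.0),
    Candidate("cand_C", cn=50.0, ysi=35.0),  # worse than cand_A on both objectives
    Candidate("cand_D", cn=40.0, ysi=10.0),
]
front = pareto_front(pool)
print([c.label for c in front])
```

The quadratic scan is adequate for a few thousand candidates; larger pools would likely call for a sort-based approach.
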

---

## How It Works

The prediction pipeline uses:
- **RDKit**: molecule parsing
- **Mordred**: 2D/3D molecular descriptors
- **FeatureSelector**: dimensionality reduction
- **Tree-based regression model** trained on experimental YSI values

**Prediction flow:**
1. Input SMILES → RDKit molecule
2. Mordred descriptors are generated
3. Feature selection is applied
4. YSI is predicted by the trained regressor

Two model artifacts are included:
- `model.joblib`: trained regressor
- `selector.joblib`: feature selector used during training


---

## Training Data

The model was trained on a curated dataset of **experimentally measured YSI values** covering a diverse set of fuel molecule structures, including:
- linear alkanes
- branched alkanes
- cyclic hydrocarbons
- aromatics
- oxygenated species (ethers, esters)

YSI range in dataset: **≈ 3 to 80**


---

## Performance

Performance was evaluated on both the training set and a **held-out test** set.

### Training Performance

| Metric | Score |
|--------|--------|
| RMSE | **6.9661** |
| MAE | **4.0581** |
| R² | **0.9309** |


---

### Test Performance

| Metric | Score |
|--------|--------|
| RMSE | **5.9667** |
| MAE | **3.8324** |
| R² | **0.9440** |
| MAPE | **18.38%** |

A **test R² of 0.9440** means the model explains about 94% of the variance in held-out YSI values. The comparatively high MAPE is likely influenced by low-YSI molecules, where a small absolute error is a large relative error.

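
The evaluation script is not included here, so the following is only a sketch of how the four reported metrics are conventionally defined, written in plain Python. The `regression_metrics` helper is a hypothetical name, and the toy values are invented for illustration, not taken from the YSI dataset.

```python
import math
from typing import Dict, Sequence


def regression_metrics(y_true: Sequence[float], y_pred: Sequence[float]) -> Dict[str, float]:
    """Compute RMSE, MAE, R^2 and MAPE (in percent) using their standard definitions."""
    n = len(y_true)
    errors = [p - t for t, p in zip(y_true, y_pred)]
    mean_true = sum(y_true) / n
    ss_res = sum(e * e for e in errors)                 # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    return {
        "rmse": math.sqrt(ss_res / n),
        "mae": sum(abs(e) for e in errors) / n,
        "r2": 1.0 - ss_res / ss_tot,
        # MAPE divides by the true value, so it is undefined if any y_true is 0
        "mape_percent": 100.0 * sum(abs(e / t) for e, t in zip(errors, y_true)) / n,
    }


# Toy values, not taken from the YSI dataset
toy_true = [10.0, 20.0, 40.0]
toy_pred = [12.0, 18.0, 40.0]
metrics = regression_metrics(toy_true, toy_pred)
print(metrics)
```

In the toy data, the same absolute error of 2 contributes 20% to MAPE at a true value of 10 but only 10% at a true value of 20. This shows how low target values can raise MAPE even when RMSE and MAE are small.
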

---

### Generalization Check

| Metric | Value |
|--------------|--------|
| Train RMSE | **6.9661** |
| Test RMSE | **5.9667** |
| Δ (Test − Train) | **−0.9994** |

The negative Δ (test RMSE below training RMSE) shows **no sign of overfitting**. The **better test performance** is attributed to a more stable distribution in the test split.

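
The Δ value can be reproduced directly from the two RMSE figures. The short sketch below also expresses the gap relative to the training RMSE; `generalization_gap` is a hypothetical helper name, not part of this repository.

```python
from typing import Tuple


def generalization_gap(train_rmse: float, test_rmse: float) -> Tuple[float, float]:
    """Return (absolute gap, gap as a fraction of the training RMSE)."""
    delta = test_rmse - train_rmse
    return delta, delta / train_rmse


# RMSE values as reported in the tables of this model card
delta, relative = generalization_gap(train_rmse=6.9661, test_rmse=5.9667)
print(f"delta = {delta:.4f}, relative = {relative:.1%}")
```

A positive gap would mean the test error exceeds the training error, the usual warning sign for overfitting. Here the test RMSE is roughly 14% below the training RMSE.
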

---

## Usage

Below is a minimal example showing how to use the model in Python.

> The feature calculation must match the training pipeline.

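```python
import joblib
from rdkit import Chem

# FeatureSelector is imported so joblib can resolve the class when
# unpickling selector.joblib (assumed to be a FeatureSelector instance)
from shared_features import featurize_df, FeatureSelector  # noqa: F401

# Load model & selector
model = joblib.load("model.joblib")
selector = joblib.load("selector.joblib")


def predict_ysi(smiles: str) -> float:
    """Predict the YSI of a single molecule given as a SMILES string."""
    # MolFromSmiles returns None when the SMILES cannot be parsed
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f"Invalid SMILES: {smiles!r}")
    df = featurize_df([smiles])  # descriptors, computed as in training
    X = selector.transform(df)   # keep only the features selected in training
    y = model.predict(X)
    return float(y[0])


print(predict_ysi("CCCCCCC"))  # n-heptane
```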