Odor Prediction Model

Predict odor descriptors and perceptual ratings from a molecule's SMILES string.

What it does:

  • Predicts 112 odor descriptors (fruity, floral, woody, sweet, etc.)
  • Predicts 3 perceptual ratings (Pleasantness, Intensity, Familiarity)

Results

Task Metric Score
Odor Classification Top-20 Macro F1 0.45
Odor Classification Mean AUC-ROC 0.84
Pleasantness Pearson r 0.55
Intensity Pearson r 0.34
Familiarity Pearson r 0.44

Quick Start

pip install torch xgboost rdkit numpy
from src.predict import predict

predict("c1ccc(cc1)C=O")  # Benzaldehyde - smells like almonds

Output:

Odor descriptors:
  sweet      : 0.98
  spicy      : 0.98
  almond     : 0.98
  floral     : 0.96
  green      : 0.93

Perceptual ratings (0-1 scale):
  Pleasantness: 0.99
  Intensity   : 0.89
  Familiarity : 0.92

Data Sources

All training data comes from the Pyrfume project:

Dataset What it contains
GoodScents ~4,500 molecules with expert odor annotations
Leffingwell ~3,200 molecules with odor descriptors
Keller et al. 2016/2012 Perceptual ratings (pleasantness, intensity, familiarity)
MA et al. 2021 Additional perceptual ratings
Arshamian et al. 2022 Pleasantness rankings

Model Details

Classifier (odor descriptors):

  • XGBoost with 300 trees
  • Morgan fingerprints (2048-bit) from RDKit
  • Per-label threshold tuning

Regressor (perceptual ratings):

  • PyTorch MLP (1024-512-256 hidden layers)
  • Separate models for each rating

Limitations

  • Only works on single molecules, not mixtures
  • Intensity predictions are weaker (r=0.34)
  • Perceptual ratings trained on ~500 molecules only

Citation

@misc{molecular-odor-prediction,
  title={Molecular Odor Prediction},
  author={batterdaysahead},
  year={2026},
  url={https://huggingface.co/batterdaysahead/molecular-odor-prediction},
}

License

MIT

Downloads last month
-
Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support