Molecular Odor Predictor

A PyTorch MLP that predicts odor descriptors from a molecule's SMILES string using Morgan (ECFP4) fingerprints. Given a molecule that RDKit can parse, the model outputs a probability for each of 50 odor descriptors.

Model description

Property        Detail
Architecture    MLP: 2048 → 512 → 256 → 50
Regularisation  BatchNorm + Dropout (0.4) per hidden layer
Input           2048-bit Morgan fingerprint (ECFP4, radius=2)
Output          50-class multi-label probabilities (sigmoid)
Loss            BCEWithLogitsLoss
Optimiser       Adam, lr=1e-3 with ReduceLROnPlateau
Epochs          80 (best checkpoint at epoch 22)
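
The configuration above can be wired together roughly as follows. This is a minimal sketch, not the training script behind this model: the random tensors stand in for real fingerprints and labels, and the scheduler's factor and patience, the single full batch per epoch, and the three-epoch loop are illustrative assumptions.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

def make_mlp(n_inputs=2048, n_outputs=50, dropout=0.4):
    # Architecture from the table: 2048 -> 512 -> 256 -> 50.
    return nn.Sequential(
        nn.Linear(n_inputs, 512), nn.BatchNorm1d(512), nn.ReLU(), nn.Dropout(dropout),
        nn.Linear(512, 256), nn.BatchNorm1d(256), nn.ReLU(), nn.Dropout(dropout),
        nn.Linear(256, n_outputs),  # raw logits; the loss applies the sigmoid
    )

model = make_mlp()
criterion = nn.BCEWithLogitsLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
# ASSUMPTION: factor and patience are illustrative, not the values used in training.
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
    optimizer, mode="min", factor=0.5, patience=3
)

# ASSUMPTION: random stand-in data; real inputs are 0/1 fingerprints and 0/1 labels.
x_train = torch.randint(0, 2, (64, 2048)).float()
y_train = torch.randint(0, 2, (64, 50)).float()
x_val = torch.randint(0, 2, (32, 2048)).float()
y_val = torch.randint(0, 2, (32, 50)).float()

val_losses = []
for epoch in range(3):  # the card reports 80 epochs; 3 keeps the sketch fast
    model.train()
    optimizer.zero_grad()
    loss = criterion(model(x_train), y_train)
    loss.backward()
    optimizer.step()

    model.eval()
    with torch.no_grad():
        val_loss = criterion(model(x_val), y_val).item()
    scheduler.step(val_loss)  # lowers the learning rate when val_loss plateaus
    val_losses.append(val_loss)
```

Because the network returns logits, a sigmoid is needed at inference time to turn them into per-label probabilities.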

Performance

Evaluated on a held-out test set of 545 molecules:

Metric        Score
Macro F1      0.421
Micro F1      0.498
Hamming loss  0.080

A RandomForest baseline reaches a macro F1 of 0.374, so the MLP's 0.421 is a relative improvement of about 13%.

Top-performing labels: sulfurous (0.76), fruity (0.67), balsamic (0.61), floral (0.61), fatty (0.60)
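
For readers unfamiliar with the multi-label metrics above, the sketch below computes them with NumPy on a toy example. The toy arrays and function names are illustrative and not part of this repository. scikit-learn's f1_score and hamming_loss cover the same quantities. Macro F1 averages the per-label F1 scores, micro F1 pools all label decisions first, and Hamming loss is the fraction of label slots predicted incorrectly.

```python
import numpy as np

def per_label_f1(y_true, y_pred):
    """F1 per label column; a label with no positives and no predictions scores 0."""
    tp = np.sum((y_true == 1) & (y_pred == 1), axis=0)
    fp = np.sum((y_true == 0) & (y_pred == 1), axis=0)
    fn = np.sum((y_true == 1) & (y_pred == 0), axis=0)
    denom = 2 * tp + fp + fn
    return np.where(denom > 0, 2 * tp / np.maximum(denom, 1), 0.0)

def macro_f1(y_true, y_pred):
    # Unweighted mean of the per-label F1 scores.
    return float(per_label_f1(y_true, y_pred).mean())

def micro_f1(y_true, y_pred):
    # Pool true/false positives and false negatives over all labels.
    tp = np.sum((y_true == 1) & (y_pred == 1))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    denom = 2 * tp + fp + fn
    return float(2 * tp / denom) if denom else 0.0

def hamming_loss(y_true, y_pred):
    # Fraction of (molecule, label) slots where prediction and truth differ.
    return float(np.mean(y_true != y_pred))

# Toy data: 4 molecules, 3 labels (illustrative only).
y_true = np.array([[1, 0, 1], [0, 1, 0], [1, 1, 0], [0, 0, 1]])
y_pred = np.array([[1, 0, 0], [0, 1, 0], [1, 0, 0], [0, 1, 1]])

print(macro_f1(y_true, y_pred), micro_f1(y_true, y_pred), hamming_loss(y_true, y_pred))

# Relative improvement of the MLP over the RandomForest baseline (reported scores).
rel_gain = (0.421 - 0.374) / 0.374
print(f"{rel_gain:.1%}")
```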

Labels (50)

fruity, green, sweet, floral, herbal, woody, fatty, fresh, waxy, spicy, citrus, sulfurous, tropical, oily, nutty, earthy, rose, balsamic, apple, vegetable, meaty, ethereal, roasted, caramellic, winey, pineapple, musty, pungent, creamy, cheesy, minty, phenolic, onion, burnt, powdery, berry, aldehydic, camphoreous, honey, pear, melon, fermented, buttery, metallic, leafy, savory, animal, alliaceous, cocoa, dairy

Usage

import json
import numpy as np
import torch
import torch.nn as nn
from rdkit import Chem
from rdkit.Chem import rdFingerprintGenerator
from huggingface_hub import hf_hub_download

REPO = "Hari5115/molecular-odor-predictor"

class OdorMLP(nn.Module):
    def __init__(self, n_inputs, n_outputs, dropout=0.4):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_inputs, 512), nn.BatchNorm1d(512), nn.ReLU(), nn.Dropout(dropout),
            nn.Linear(512, 256),     nn.BatchNorm1d(256), nn.ReLU(), nn.Dropout(dropout),
            nn.Linear(256, n_outputs),
        )
    def forward(self, x):
        return self.net(x)

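# Load the label list and the trained weights from the Hub
with open(hf_hub_download(REPO, "labels.json")) as f:
    labels = json.load(f)
model = OdorMLP(2048, len(labels))
state = torch.load(hf_hub_download(REPO, "best_model.pt"), map_location="cpu")
model.load_state_dict(state)
model.eval()  # disable Dropout and use BatchNorm running statistics

# Predict
fp_gen = rdFingerprintGenerator.GetMorganGenerator(radius=2, fpSize=2048)
mol = Chem.MolFromSmiles("COc1cc(C=O)ccc1O")  # vanillin
if mol is None:
    raise ValueError("RDKit could not parse the SMILES string")
# Convert the 2048-bit fingerprint to a (1, 2048) float tensor
fp = torch.from_numpy(fp_gen.GetFingerprintAsNumPy(mol)).float().unsqueeze(0)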

with torch.no_grad():
    probs = torch.sigmoid(model(fp)).squeeze().numpy()

for label, prob in sorted(zip(labels, probs), key=lambda x: -x[1]):
    if prob >= 0.3:
        print(f"{label}: {prob:.0%}")
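
To score several molecules at once, the fingerprint step can be wrapped in a small helper that skips SMILES strings RDKit cannot parse. The sketch below is one way to do this. The featurize function is a hypothetical helper, not part of this repository, and the example strings are illustrative.

```python
import numpy as np
from rdkit import Chem
from rdkit.Chem import rdFingerprintGenerator

# Same fingerprint settings as the model input: ECFP4, 2048 bits.
_FP_GEN = rdFingerprintGenerator.GetMorganGenerator(radius=2, fpSize=2048)

def featurize(smiles_list):
    """Hypothetical helper: return (fingerprint matrix, per-input validity flags)."""
    rows, valid = [], []
    for smi in smiles_list:
        mol = Chem.MolFromSmiles(smi)  # returns None for unparsable SMILES
        valid.append(mol is not None)
        if mol is not None:
            rows.append(_FP_GEN.GetFingerprintAsNumPy(mol))
    # One row per valid molecule; an empty input yields shape (0, 2048).
    X = np.array(rows, dtype=np.float32).reshape(-1, 2048)
    return X, valid

X, valid = featurize(["COc1cc(C=O)ccc1O", "not_a_smiles", "CCO"])
print(X.shape, valid)
```

The resulting matrix can be passed to the model as torch.from_numpy(X), and the validity flags map each row back to its input.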

Training data

See molecular-odor-dataset.

5,308 unique molecules from the GoodScents and Leffingwell databases, split 80/10/10 into train/val/test using stratified multi-label splitting (iterative_train_test_split).

Data sources & credits

This model was trained on data from two public olfactory databases, accessed via the Pyrfume open-science library:

  • The Good Scents Company (TGSC), GoodScents database: odor descriptors and molecular identifiers from goodscentscompany.com

  • Leffingwell & Associates, Leffingwell Flavor & Fragrance database: odor descriptors from leffingwell.com

  • Pyrfume: open science library for standardising and publishing olfactory data. Mainland JD, et al. Pyrfume: A window to the world's olfactory data. github.com/pyrfume/pyrfume

Molecular SMILES strings sourced from PubChem via Pyrfume.

Limitations

  • The model predicts odor descriptors (how people describe the smell), not physical odor intensity or threshold.
  • Performance is lower on rare or subjective descriptors (buttery, powdery, metallic), which may lack distinct molecular fingerprint patterns.
  • Trained only on molecules with PubChem entries, so novel or proprietary molecules may be out-of-distribution.

Demo

Try the live demo: Hari5115/molecular-odor-demo
