InventMol-R1

Target-Conditioned Molecular Ideation Model for Drug Discovery Research

Research prototype. Not for clinical use. No experimental validation.

Model Description

InventMol-R1 is a fine-tuned version of Qwen2.5-0.5B that generates novel drug-like molecules conditioned on biological context. Given a protein target, disease, mutation, and mechanism of action, the model outputs molecular structures in SELFIES format.

The model is a proof of concept for reasoning-guided molecular ideation, the candidate-proposal step that sits early in AI-driven drug discovery pipelines.
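For readers new to SELFIES: a SELFIES string is a sequence of bracketed symbols, which is what makes it easy to tell apart from free text in model output. The sketch below splits a string into symbols using only the standard library. The helper name `split_symbols` is an illustration and not part of this model or of the selfies package, and it assumes a single fragment with no "." separators.

```python
import re

# One SELFIES symbol: a bracket pair with no nested brackets inside.
_SYMBOL = re.compile(r"\[[^\[\]]+\]")

def split_symbols(selfies_str):
    """Return the bracketed symbols, or None if anything else is present.

    Assumption: a single fragment, so "." separators are treated as invalid.
    """
    symbols = _SYMBOL.findall(selfies_str)
    if "".join(symbols) != selfies_str:
        return None  # stray characters between or around the symbols
    return symbols

# SELFIES encoding of benzene.
benzene = "[C][=C][C][=C][C][=C][Ring1][=Branch1]"
print(split_symbols(benzene))
print(split_symbols("[C][=O] trailing text"))
```

Passing this check only means the string is well-formed at the symbol level; converting it to a molecule still requires the selfies decoder and a chemistry toolkit such as RDKit.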

Intended Use

  • Research prototype for computational drug discovery
  • Molecular ideation and scaffold hopping
  • Educational demonstration of LLMs in cheminformatics
  • Target-conditioned molecular generation

Training Data

The model was trained on kinase inhibitors with bioactivity data from ChEMBL, filtered for drug-likeness and converted to SELFIES representation. The dataset includes:

  • 9 protein targets (EGFR, BRAF, ALK, KIT, VEGFR, BTK, FGFR, MET, RET)
  • Multiple disease contexts (NSCLC, melanoma, GIST, AML, etc.)
  • Clinically relevant mutations (T790M, V600E, D816V, etc.)
  • Mechanism of action annotations
  • Potency labels (active, intermediate, inactive)
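The card does not state which drug-likeness filter or potency cutoffs were used. As a rough sketch of what such a preparation step can look like, the example below applies Lipinski's rule of five and bins IC50 values into the three potency labels. The cutoffs (pIC50 of 7 and 5), the function names, and the two records are assumptions made up for illustration, not the actual training pipeline or data.

```python
import math

# Assumed cutoffs, NOT taken from the model card.
ACTIVE_PIC50 = 7.0    # IC50 of 100 nM or better
INACTIVE_PIC50 = 5.0  # IC50 weaker than 10 uM

def potency_label(ic50_nm):
    """Map an IC50 in nM to 'active', 'intermediate', or 'inactive'."""
    pic50 = 9.0 - math.log10(ic50_nm)  # pIC50 = -log10(IC50 in mol/L)
    if pic50 >= ACTIVE_PIC50:
        return "active"
    if pic50 < INACTIVE_PIC50:
        return "inactive"
    return "intermediate"

def passes_rule_of_five(mol_wt, logp, h_donors, h_acceptors):
    """Lipinski's rule of five, tolerating at most one violation."""
    violations = sum([mol_wt > 500, logp > 5, h_donors > 5, h_acceptors > 10])
    return violations <= 1

# Made-up descriptor values, for illustration only.
records = [
    {"ic50_nm": 10.0, "mol_wt": 450.0, "logp": 4.0, "h_donors": 2, "h_acceptors": 7},
    {"ic50_nm": 1000.0, "mol_wt": 720.0, "logp": 6.5, "h_donors": 3, "h_acceptors": 9},
]

for rec in records:
    keep = passes_rule_of_five(rec["mol_wt"], rec["logp"],
                               rec["h_donors"], rec["h_acceptors"])
    print(potency_label(rec["ic50_nm"]), "kept" if keep else "filtered out")
```

In practice the descriptor values would come from a toolkit such as RDKit instead of being typed in by hand.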

Quick Start

from unsloth import FastLanguageModel
from selfies import decoder
from rdkit import Chem
from rdkit.Chem import Descriptors
import re

model, tokenizer = FastLanguageModel.from_pretrained("Hamdan003/InventMol-R1")
FastLanguageModel.for_inference(model)  # switch Unsloth to inference mode

def extract_selfies(text):
    """Return the longest run of consecutive SELFIES symbols in text, or ""."""
    # A run is 5 or more bracketed symbols back to back, e.g. [C][=C][N]...
    # Isolated tags such as [Target] are too short to match.
    runs = re.findall(r'(?:\[[^\[\]]+\]){5,}', text)
    return max(runs, key=len) if runs else ""

def generate_molecule(target, disease, mutation, mechanism):
    """Sample one molecule; return (smiles, mol_weight, logp) or (None, 0, 0)."""
    prompt = f"[Target]: {target}\n[Disease]: {disease}\n[Mutation]: {mutation}\n[Mechanism]: {mechanism}\n[Potency]: High\n"
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    outputs = model.generate(**inputs, max_new_tokens=300, temperature=0.7, do_sample=True, top_p=0.95)
    # Decode only the new tokens so the prompt tags are never parsed as SELFIES.
    new_tokens = outputs[0][inputs["input_ids"].shape[1]:]
    generated = tokenizer.decode(new_tokens, skip_special_tokens=True)
    selfies_str = extract_selfies(generated)
    if selfies_str:
        try:
            smiles = decoder(selfies_str)
        except Exception:  # the model produced malformed SELFIES
            return None, 0, 0
        mol = Chem.MolFromSmiles(smiles) if smiles else None
        if mol:
            return smiles, Descriptors.MolWt(mol), Descriptors.MolLogP(mol)
    return None, 0, 0

smiles, mw, logp = generate_molecule("EGFR", "NSCLC", "T790M", "Irreversible covalent inhibition")
if smiles is None:
    print("No valid molecule generated; sample again.")
else:
    print(f"SMILES: {smiles}\nMW: {mw:.0f}\nLogP: {logp:.1f}")
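Because sampling is stochastic, a single call can return nothing or repeat an earlier molecule. One possible pattern is to sample several times and keep each distinct result once, as sketched below. `collect_unique` is a hypothetical helper, and `fake_generate` is a stand-in with placeholder numbers so the sketch runs without a GPU; neither is part of this model's code.

```python
import itertools

def collect_unique(generate_fn, n_attempts=20):
    """Call generate_fn repeatedly and keep each distinct valid SMILES once.

    generate_fn must return (smiles, mol_weight, logp), with smiles=None
    on failure, matching generate_molecule in the Quick Start.
    """
    unique = {}
    for _ in range(n_attempts):
        smiles, mw, logp = generate_fn()
        if smiles is None:
            continue  # failed sample, try again
        unique.setdefault(smiles, (mw, logp))
    return unique

# Stand-in generator with placeholder values (not computed descriptors).
_canned = itertools.cycle([
    ("CCO", 46.1, 0.0),
    (None, 0, 0),
    ("c1ccccc1", 78.1, 1.7),
])

def fake_generate():
    return next(_canned)

hits = collect_unique(fake_generate, n_attempts=6)
for smiles, (mw, logp) in hits.items():
    print(f"{smiles}\tMW {mw:.0f}\tLogP {logp:.1f}")
```

With the real model, the call would look like `collect_unique(lambda: generate_molecule("EGFR", "NSCLC", "T790M", "Irreversible covalent inhibition"))`. The helper compares SMILES strings as returned, so canonicalizing them first (for example with RDKit) would catch duplicates that are written differently.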