Model Card for M1-AIS-rds

Model Details

Model Description

M1-AIS-rds is a Natural Product–specific Chemical Language Model (CLM) built on the Mamba selective state-space architecture and the Atom-in-SMILES tokenizer. It is pre-trained on ~1M natural products (NPs) represented as SMILES strings. The model can be used for de novo molecule generation and fine-tuned for molecular property prediction in the Natural Product domain. Further details on the pre-training process can be found in the paper.

  • Model type: Selective State-Space Model (Mamba)
  • Language(s): SMILES molecular strings
  • Training and Data: Trained from scratch on a curated Natural Product corpus consisting of around 1.3M molecules.
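To make "SMILES as a language" concrete, the sketch below splits a SMILES string into atom-level tokens with a commonly used regular expression. This is only a simplified baseline and not the Atom-in-SMILES tokenizer the model uses, which additionally enriches each atom token with information about its local chemical environment. The names `SMILES_TOKEN_PATTERN` and `tokenize_smiles` are hypothetical and not part of the repository.

```python
import re

# Atom-level SMILES pattern (bracket atoms, two-letter halogens, bonds, ring digits).
# Assumption: an illustrative baseline only; this is NOT the Atom-in-SMILES scheme.
SMILES_TOKEN_PATTERN = re.compile(
    r"(\[[^\]]+\]|Br?|Cl?|N|O|S|P|F|I|b|c|n|o|s|p|\(|\)|\.|=|#|-|\+|\\|/|:|~|@|\?|>|\*|\$|%[0-9]{2}|[0-9])"
)


def tokenize_smiles(smiles: str) -> list[str]:
    """Split a SMILES string into atom-level tokens.

    Raises ValueError if some characters are not covered by the pattern,
    so that unknown symbols are never dropped silently.
    """
    tokens = SMILES_TOKEN_PATTERN.findall(smiles)
    if "".join(tokens) != smiles:
        raise ValueError(f"Could not tokenize SMILES completely: {smiles!r}")
    return tokens


print(tokenize_smiles("CC(=O)O"))  # ['C', 'C', '(', '=', 'O', ')', 'O']
```

Each token would then be mapped to an integer ID before being fed to a sequence model; an environment-aware tokenizer yields a larger vocabulary than this baseline because the same element can appear under several distinct tokens.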

Model Sources


Uses

Direct Use

  • De novo generation of Natural Product–like molecules

Downstream Use

  • Integration into drug discovery and virtual screening pipelines
  • Fine-tuning for NP-related property prediction tasks

Bias, Risks, and Limitations

[More Information Needed]

Recommendations

Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.

How to Get Started with the Model

Prerequisites

# Clone the repository
git clone https://github.com/rozariwang/CLMs-for-NPs.git
cd CLMs-for-NPs

# Install dependencies
pip install -r requirements.txt

Example Use

Task 1: Molecule Sequence Generation
Generate new pseudo-Natural Product SMILES sequences using the pretrained model:

from mol_generation import run_generation

# Configure and run generation
config = {
    "model_name": "rozariwang/M1-AIS-rds",
    "num_mols": 1000,              # Number of molecules to generate
    "max_length": 512,              # Maximum SMILES length
    "temperature": 1.0,             # Sampling temperature (1.0 = no adjustment)
    "outfile": "generated_molecules.csv"
}

run_generation(config)

Output: CSV file with columns [Molecule, Log-Likelihood]
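
As a possible next step, the output file can be inspected with the standard library alone. The sketch below assumes that the CSV header uses exactly the column names `Molecule` and `Log-Likelihood` listed above and that a higher log-likelihood means the model considers the sequence more probable. The helper names `parse_generated`, `load_generated`, and `unique_fraction` are hypothetical and not part of the repository.

```python
import csv
import math


def parse_generated(lines):
    """Parse CSV lines into (smiles, log_likelihood) tuples, most likely first.

    Assumes a header row with the columns 'Molecule' and 'Log-Likelihood'.
    Rows with an empty molecule or a non-numeric score are skipped.
    """
    rows = []
    for record in csv.DictReader(lines):
        smiles = (record.get("Molecule") or "").strip()
        try:
            score = float(record.get("Log-Likelihood"))
        except (TypeError, ValueError):
            continue
        if smiles and math.isfinite(score):
            rows.append((smiles, score))
    rows.sort(key=lambda row: row[1], reverse=True)
    return rows


def load_generated(path):
    """Open a generation CSV file and parse it with parse_generated."""
    with open(path, newline="") as handle:
        return parse_generated(handle)


def unique_fraction(rows):
    """Fraction of distinct SMILES strings (exact string match, not canonicalized)."""
    if not rows:
        return 0.0
    return len({smiles for smiles, _ in rows}) / len(rows)
```

Calling `load_generated("generated_molecules.csv")` would return the generated molecules ranked by score. Note that exact string matching only approximates chemical uniqueness, since one molecule can be written as several different SMILES strings; a cheminformatics toolkit would be needed to canonicalize and validate them.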

Alternatively, use the CLI:

python3 main.py \
  --task generate \
  --model_names rozariwang/M1-AIS-rds \
  --num_mols 1000 \
  --temperature 1.0 \
  --max_length 512
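
The `temperature` option in the examples above is described as leaving the distribution unadjusted at 1.0. The sketch below shows the standard form of temperature sampling that this description is consistent with; whether the repository implements it in exactly this way is an assumption, and the function names are hypothetical.

```python
import math
import random


def softmax_with_temperature(logits, temperature=1.0):
    """Convert raw logits to probabilities after dividing them by the temperature.

    temperature == 1.0 leaves the distribution unchanged, values below 1.0
    sharpen it, and values above 1.0 flatten it.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_token(logits, temperature=1.0, rng=random):
    """Draw one token index from the temperature-adjusted distribution."""
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]
```

In practice, lower temperatures tend to produce more conservative, repetitive molecules, while higher temperatures increase diversity at the cost of more invalid or implausible SMILES strings.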

Training Details

Training Data

[More Information Needed]

Training Procedure

Preprocessing [optional]

[More Information Needed]

Training Hyperparameters

  • Training regime: [More Information Needed]

Speeds, Sizes, Times [optional]

[More Information Needed]

Evaluation

Testing Data, Factors & Metrics

Testing Data

[More Information Needed]

Factors

[More Information Needed]

Metrics

[More Information Needed]

Results

[More Information Needed]

Summary

Model Examination [optional]

[More Information Needed]

Environmental Impact

Carbon emissions can be estimated using the Machine Learning Impact calculator presented in Lacoste et al. (2019).

  • Hardware Type: [More Information Needed]
  • Hours used: [More Information Needed]
  • Cloud Provider: [More Information Needed]
  • Compute Region: [More Information Needed]
  • Carbon Emitted: [More Information Needed]
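
Once the fields above are known, a rough estimate follows from the general relation that such calculators rely on: energy consumed multiplied by the carbon intensity of the local grid. The sketch below is illustrative only. The function name, the optional `pue` (data-centre overhead) factor, and the numbers in the example call are all hypothetical placeholders and are not measurements for this model.

```python
def estimate_emissions_kg(power_watts, hours, carbon_intensity_kg_per_kwh, pue=1.0):
    """Estimate emissions in kg CO2-equivalent.

    power_watts: average hardware power draw in watts
    hours: total hours of use
    carbon_intensity_kg_per_kwh: grid carbon intensity for the compute region
    pue: power usage effectiveness of the data centre (1.0 = no overhead)
    """
    if min(power_watts, hours, carbon_intensity_kg_per_kwh) < 0 or pue < 1.0:
        raise ValueError("inputs must be non-negative and pue must be at least 1.0")
    energy_kwh = power_watts / 1000.0 * hours * pue
    return energy_kwh * carbon_intensity_kg_per_kwh


# Placeholder values for illustration only (NOT this model's actual figures):
# one 300 W accelerator running for 10 hours on a 0.4 kg CO2eq/kWh grid.
example_kg = estimate_emissions_kg(300, 10, 0.4)
```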

Technical Specifications [optional]

Model Architecture and Objective

[More Information Needed]

Compute Infrastructure

[More Information Needed]

Hardware

[More Information Needed]

Software

[More Information Needed]

Citation [optional]

BibTeX:

[More Information Needed]

APA:

[More Information Needed]

Glossary [optional]

[More Information Needed]

More Information [optional]

[More Information Needed]

Model Card Authors [optional]

[More Information Needed]

Model Card Contact

[More Information Needed]

  • Model size: 59.1M params
  • Tensor type: F32
  • Weights format: Safetensors