Model Card for M1-AIS-rds
Model Details
Model Description
M1-AIS-rds is a Natural Product–specific Chemical Language Model (CLM) based on the Mamba selective state-space architecture and the Atom-in-SMILES tokenizer. It is pre-trained on ~1M natural products (NPs) represented as SMILES strings. The model can be used for de novo molecule generation and, after fine-tuning, for molecular property prediction in the Natural Product domain. Further details on the pre-training process can be found in the paper.
- Model type: Selective State-Space Model (Mamba)
- Language(s): SMILES molecular strings
- Training and Data: Trained from scratch on a curated Natural Product corpus consisting of around 1.3M molecules.
Model Sources
- Paper: Chemical Language Models for Natural Products: A State-Space Model Approach
- Repository: CLMs-for-NPs
Uses
Direct Use
- De novo generation of Natural Product–like molecules
Downstream Use
- Integration into drug discovery and virtual screening pipelines
- Fine-tuning for NP-related property prediction tasks
Bias, Risks, and Limitations
[More Information Needed]
Recommendations
Users (both direct and downstream) should be made aware of the risks, biases, and limitations of the model. More information is needed before further recommendations can be made.
How to Get Started with the Model
Prerequisites
# Clone the repository
git clone https://github.com/rozariwang/CLMs-for-NPs.git
cd CLMs-for-NPs
# Install dependencies
pip install -r requirements.txt
Example Use
Task 1: Molecule Sequence Generation
Generate new pseudo-Natural Product SMILES sequences using the pretrained model:
from mol_generation import run_generation
# Configure and run generation
config = {
    "model_name": "rozariwang/M1-AIS-rds",
    "num_mols": 1000,       # Number of molecules to generate
    "max_length": 512,      # Maximum SMILES length
    "temperature": 1.0,     # Sampling temperature (1.0 = no adjustment)
    "outfile": "generated_molecules.csv"
}
run_generation(config)
Output: CSV file with columns [Molecule, Log-Likelihood]
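Once the CSV has been written, a common next step is to rank the generated molecules by log-likelihood and drop repeats. The sketch below is not part of the repository. It assumes the file has a header row with exactly the column names listed above (`Molecule`, `Log-Likelihood`) and is comma-delimited. The helper name `load_generated` and its `top_k` parameter are hypothetical.

```python
import csv


def load_generated(path, top_k=None):
    """Return (smiles, log_likelihood) pairs from the generation CSV, best first.

    Assumes a header row with the columns "Molecule" and "Log-Likelihood";
    rows with an empty molecule or a non-numeric score are skipped.
    """
    best = {}
    with open(path, newline="") as fh:
        for record in csv.DictReader(fh):
            smiles = (record.get("Molecule") or "").strip()
            if not smiles:
                continue
            try:
                score = float(record["Log-Likelihood"])
            except (KeyError, TypeError, ValueError):
                continue
            # Keep the highest score seen for each distinct SMILES string.
            if smiles not in best or score > best[smiles]:
                best[smiles] = score

    ranked = sorted(best.items(), key=lambda item: item[1], reverse=True)
    return ranked if top_k is None else ranked[:top_k]


if __name__ == "__main__":
    for smiles, score in load_generated("generated_molecules.csv", top_k=10):
        print(f"{score:10.3f}  {smiles}")
```

Deduplication here is purely string-based, so two different SMILES spellings of the same molecule count as distinct. Canonicalizing with a cheminformatics toolkit before comparing would be needed to merge them.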
Alternatively, use the CLI:
python3 main.py \
    --task generate \
    --model_names rozariwang/M1-AIS-rds \
    --num_mols 1000 \
    --temperature 1.0 \
    --max_length 512
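For readers unfamiliar with the `temperature` setting, the following sketch shows how a sampling temperature is conventionally applied to next-token logits in autoregressive generation. It illustrates the general technique and is not taken from the repository's code. The function names are hypothetical, and the way M1-AIS-rds applies temperature internally is assumed to follow this standard scheme.

```python
import math
import random


def softmax_with_temperature(logits, temperature=1.0):
    """Convert logits to probabilities after dividing them by the temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_token(logits, temperature=1.0, rng=random):
    """Draw one token index from the temperature-adjusted distribution."""
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


if __name__ == "__main__":
    toy_logits = [2.0, 1.0, 0.0]  # made-up scores for three tokens
    for t in (0.5, 1.0, 2.0):
        probs = softmax_with_temperature(toy_logits, t)
        print(t, [round(p, 3) for p in probs])
```

A temperature of 1.0 leaves the model's distribution unchanged. Values below 1.0 concentrate probability on the highest-scoring tokens, giving more conservative output, and values above 1.0 flatten the distribution, giving more diverse but potentially less valid SMILES.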
Training Details
Training Data
[More Information Needed]
Training Procedure
Preprocessing [optional]
[More Information Needed]
Training Hyperparameters
- Training regime: [More Information Needed]
Speeds, Sizes, Times [optional]
[More Information Needed]
Evaluation
Testing Data, Factors & Metrics
Testing Data
[More Information Needed]
Factors
[More Information Needed]
Metrics
[More Information Needed]
Results
[More Information Needed]
Summary
Model Examination [optional]
[More Information Needed]
Environmental Impact
Carbon emissions can be estimated using the Machine Learning Impact calculator presented in Lacoste et al. (2019).
- Hardware Type: [More Information Needed]
- Hours used: [More Information Needed]
- Cloud Provider: [More Information Needed]
- Compute Region: [More Information Needed]
- Carbon Emitted: [More Information Needed]
Technical Specifications [optional]
Model Architecture and Objective
[More Information Needed]
Compute Infrastructure
[More Information Needed]
Hardware
[More Information Needed]
Software
[More Information Needed]
Citation [optional]
BibTeX:
[More Information Needed]
APA:
[More Information Needed]
Glossary [optional]
[More Information Needed]
More Information [optional]
[More Information Needed]
Model Card Authors [optional]
[More Information Needed]
Model Card Contact
[More Information Needed]