DemoDiff-0.7B / README.md
metadata
datasets:
  - liuganghuggingface/demodiff_downstream
license: mit
tags:
  - chemistry
  - biology
pipeline_tag: graph-ml

DemoDiff: Graph Diffusion Transformers are In-Context Molecular Designers

This repository contains the DemoDiff model, a diffusion-based molecular foundation model for in-context inverse molecular design, as presented in the paper Graph Diffusion Transformers are In-Context Molecular Designers.

DemoDiff leverages graph diffusion transformers to generate molecules based on contextual examples, enabling few-shot molecular design across diverse chemical tasks without task-specific fine-tuning. It introduces demonstration-conditioned diffusion models, which define task contexts using a small set of molecule-score examples instead of text descriptions to guide a denoising Transformer for molecule generation. A novel molecular tokenizer with Node Pair Encoding is developed for scalable pretraining, representing molecules at the motif level.
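As a rough illustration of what a demonstration-defined task context might look like, the sketch below stores each example as a molecule-score pair and keeps the highest-scoring ones. The `Demonstration` class, the `build_task_context` helper, the SMILES representation, and the ordering rule are all assumptions made for illustration; the actual DemoDiff pipeline may select and encode demonstrations differently.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Demonstration:
    """One context example: a molecule (as a SMILES string) and its task score."""
    smiles: str
    score: float


def build_task_context(demos, max_examples):
    """Order demonstrations from best to worst score and keep at most max_examples.

    Hypothetical helper: not part of the DemoDiff codebase.
    """
    ranked = sorted(demos, key=lambda d: d.score, reverse=True)
    return ranked[:max_examples]


# Toy values only: these scores are made up for the example, not measured.
demos = [
    Demonstration("CCO", 0.21),
    Demonstration("c1ccccc1O", 0.87),
    Demonstration("CC(=O)O", 0.55),
]
context = build_task_context(demos, max_examples=2)
```

The point of the sketch is only that the task is specified by data (molecules with scores) rather than by a text prompt.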

Code: https://github.com/liugangcode/DemoDiff

🌟 Key Features

  • In-Context Learning: Generate molecules using only contextual examples (no fine-tuning required)
  • Graph-Based Tokenization: Novel molecular graph tokenization with BPE-style vocabulary
  • Comprehensive Benchmarks: 30+ downstream tasks covering drug discovery, docking, and polymer design
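To make the "BPE-style" idea concrete, here is a toy sketch of a single merge step on a labeled graph: count the label pairs that share an edge, pick the most frequent pair, and contract matching edges into one node. This is a simplified analogue written for this card, not the Node Pair Encoding implementation from the paper; the function names are hypothetical, and bond types and chemical validity are ignored.

```python
from collections import Counter


def most_frequent_pair(labels, edges):
    """Return the most common unordered pair of node labels joined by an edge."""
    counts = Counter(tuple(sorted((labels[u], labels[v]))) for u, v in edges)
    return counts.most_common(1)[0][0]


def merge_pair(labels, edges, pair):
    """Contract non-overlapping edges whose endpoint labels match `pair`."""
    parent = {}   # removed node -> node it was merged into
    used = set()  # nodes already consumed by a merge in this pass
    for u, v in edges:
        if u in used or v in used:
            continue
        if tuple(sorted((labels[u], labels[v]))) == pair:
            used.update((u, v))
            parent[v] = u

    new_labels = dict(labels)
    for removed, kept in parent.items():
        new_labels[kept] = "+".join(pair)  # merged node gets a new token
        del new_labels[removed]

    # Redirect edges to the kept nodes, dropping self-loops and duplicates.
    new_edges = set()
    for u, v in edges:
        a, b = parent.get(u, u), parent.get(v, v)
        if a != b:
            new_edges.add((min(a, b), max(a, b)))
    return new_labels, sorted(new_edges)


# Toy chain C-C-C-O (heavy atoms only).
labels = {0: "C", 1: "C", 2: "C", 3: "O"}
edges = [(0, 1), (1, 2), (2, 3)]

pair = most_frequent_pair(labels, edges)
new_labels, new_edges = merge_pair(labels, edges, pair)
```

Repeating this step and recording each merged label as a new vocabulary entry is what gives a BPE-like, motif-level vocabulary in this simplified picture.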

Model Configuration

| Parameter | Value | Description |
|---|---|---|
| context_length | 150 | Maximum sequence length for the input context. |
| depth | 24 | Number of transformer layers. |
| diffusion_steps | 500 | Number of diffusion steps during training. |
| hidden_size | 1280 | Hidden dimension size in the transformer. |
| mlp_ratio | 4 | Expansion ratio in the MLP block. |
| num_heads | 16 | Number of attention heads. |
| task_name | pretrain | Task type for model training. |
| tokenizer_name | pretrain | Tokenizer used for model input. |
| vocab_ring_len | 300 | Length of the circular vocabulary window. |
| vocab_size | 3000 | Total vocabulary size. |
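The values above can be collected into a small config object to derive a few quantities, such as the per-head dimension and the MLP width. The sketch below is a convenience written for this card: the `DemoDiffConfig` class and its methods are hypothetical, not the repository's API, and the parameter estimate assumes a standard transformer block (four attention projections plus a two-layer MLP), which the card does not confirm.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DemoDiffConfig:
    """Hypothetical container for the configuration values listed above."""
    context_length: int = 150
    depth: int = 24
    diffusion_steps: int = 500
    hidden_size: int = 1280
    mlp_ratio: int = 4
    num_heads: int = 16
    task_name: str = "pretrain"
    tokenizer_name: str = "pretrain"
    vocab_ring_len: int = 300
    vocab_size: int = 3000

    @property
    def head_dim(self):
        """Width of each attention head."""
        return self.hidden_size // self.num_heads

    @property
    def mlp_hidden_size(self):
        """Inner width of the MLP block."""
        return self.hidden_size * self.mlp_ratio

    def approx_block_params(self):
        """Rough weight count for attention and MLP matrices only.

        Assumes standard transformer blocks; biases, embeddings, norms,
        and any conditioning layers are excluded.
        """
        attn = 4 * self.hidden_size ** 2                 # Q, K, V, output projections
        mlp = 2 * self.hidden_size * self.mlp_hidden_size  # up and down projections
        return self.depth * (attn + mlp)


cfg = DemoDiffConfig()
print(cfg.head_dim, cfg.mlp_hidden_size, cfg.approx_block_params())
```

Because the estimate leaves out embeddings and conditioning layers, it undercounts the full model and should be read only as an order-of-magnitude check.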