---
license: cc-by-nc-4.0
---

# HIGHT: Hierarchical Graph Tokenization for Graph-Language Alignment


This repo contains the model checkpoints of our ICML 2025 paper *[Hierarchical Graph Tokenization for Molecule-Language Alignment](https://arxiv.org/abs/2406.14021)*, which was also presented at the ICML 2024 workshop on [Foundation Models in the Wild](https://icml.cc/virtual/2024/workshop/29954). 😆😆😆

## File Structures

The pretrained hierarchical VQ-VAE model is stored in `hivqvae.pth`. The checkpoints of the graph-language models based on llama2-7b-chat and vicuna-v1-3-7b are contained in `/llama2` and `/vicuna`, respectively. Inside each directory, the checkpoints are organized as follows (using vicuna as an example; see the loading sketch at the end of this card):

- `llava-hvqvae2-vicuna-v1-3-7b-pretrain`: model after stage 1 pretraining;
- `graph-text-molgen`: models finetuned on Mol-Instructions data for different tasks, e.g., forward reaction prediction;
- `molcap-llava-hvqvae2-vicuna-v1-3-7b-finetune_lora-50ep`: model finetuned on the CHEBI-20 dataset for molecular captioning;
- `MoleculeNet-llava-hvqvae2-vicuna-v1-3-7b-finetune_lora-large*`: models finetuned on different classification-based molecular property prediction tasks.

## Citation

If you find our model, paper, and repo useful, please cite our paper:

```bibtex
@inproceedings{chen2025hierarchical,
  title={Hierarchical Graph Tokenization for Molecule-Language Alignment},
  author={Yongqiang Chen and Quanming Yao and Juzheng Zhang and James Cheng and Yatao Bian},
  booktitle={Forty-second International Conference on Machine Learning},
  year={2025},
  url={https://openreview.net/forum?id=wpbNczwAwV}
}
```
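
## Loading the Checkpoints (sketch)

Below is a minimal, hedged sketch of how these files could be loaded. It assumes that `hivqvae.pth` is an ordinary `torch.save` checkpoint, that the `*finetune_lora*` directories are standard PEFT LoRA adapters, and that `lmsys/vicuna-7b-v1.3` is the matching base model on the Hub; actually running inference (graph tokenizer, projector, prompt format) requires the code from the GitHub repo.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Load the pretrained hierarchical VQ-VAE weights
# (assumed to be a plain PyTorch state dict).
vqvae_state = torch.load("hivqvae.pth", map_location="cpu")

# Load the assumed vicuna-v1-3-7b base model and tokenizer.
base = AutoModelForCausalLM.from_pretrained("lmsys/vicuna-7b-v1.3")
tokenizer = AutoTokenizer.from_pretrained("lmsys/vicuna-7b-v1.3")

# Attach one of the LoRA adapter directories listed above,
# assuming it follows the standard PEFT layout.
model = PeftModel.from_pretrained(
    base,
    "vicuna/molcap-llava-hvqvae2-vicuna-v1-3-7b-finetune_lora-50ep",
)
model.eval()
```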