---
license: mit
tags:
  - DNA-methylation
  - epigenetics
  - foundation-model
  - aging
  - biology
---

# CpGPT Model Checkpoints

Model weights, configurations, and vocabularies for [CpGPT: A Foundation Model for DNA Methylation](https://github.com/lcamillo/CpGPT).

## Contents

```
weights/     # PyTorch Lightning checkpoint files (.ckpt)
config/      # Hydra YAML configuration files
vocab/       # CpG vocabulary files (.json)
```

## Pre-trained Models

| Model | Size | Parameters | Model Name |
|-------|------|------------|------------|
| CpGPT-2M | 30 MB | ~2.5M | `small` |
| CpGPT-100M | 1.1 GB | ~101M | `large` |

## Download

```bash
# Install huggingface_hub
pip install huggingface_hub

# Download all model files
huggingface-cli download lucascamillomd/cpgpt-models --local-dir dependencies/model

# Or download a specific model
huggingface-cli download lucascamillomd/cpgpt-models weights/small.ckpt config/small.yaml vocab/small.json --local-dir dependencies/model
```
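The same download can be scripted from Python. The following is a minimal sketch, not an official CpGPT helper: `model_files` and `download_model` are hypothetical names, and the `weights/<name>.ckpt`, `config/<name>.yaml`, `vocab/<name>.json` naming is an assumption inferred from the `small` example in the command above. It relies on `hf_hub_download` from `huggingface_hub`.

```python
from pathlib import Path

# Repository named in the CLI commands above.
REPO_ID = "lucascamillomd/cpgpt-models"


def model_files(model_name: str) -> list[str]:
    """Return the repo-relative paths for one model.

    Assumes every model follows the naming shown for `small`.
    """
    return [
        f"weights/{model_name}.ckpt",
        f"config/{model_name}.yaml",
        f"vocab/{model_name}.json",
    ]


def download_model(model_name: str, local_dir: str = "dependencies/model") -> list[Path]:
    """Download the three files for one model and return their local paths."""
    # Imported lazily so model_files() works without huggingface_hub installed.
    from huggingface_hub import hf_hub_download

    return [
        Path(hf_hub_download(repo_id=REPO_ID, filename=name, local_dir=local_dir))
        for name in model_files(model_name)
    ]
```

Calling `download_model("small")` should place the files under `dependencies/model`, mirroring the `--local-dir` option of the CLI command.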

## Dependencies

You will also need the DNA embeddings for your species of interest:
- **Human**: [lucascamillomd/cpgpt-human-dependencies](https://huggingface.co/lucascamillomd/cpgpt-human-dependencies)
- **Mammalian (multi-species)**: [lucascamillomd/cpgpt-mammalian-dependencies](https://huggingface.co/lucascamillomd/cpgpt-mammalian-dependencies)

## Usage

After downloading the model files and species dependencies, follow the tutorials at the [CpGPT GitHub repository](https://github.com/lcamillo/CpGPT) to get started.
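Before starting the tutorials, it can help to confirm that the download produced the expected layout. The sketch below is illustrative only: `missing_model_files` is a hypothetical helper, and it assumes the files were saved under `dependencies/model` with the `weights/`, `config/`, and `vocab/` folders described in Contents.

```python
from pathlib import Path


def missing_model_files(model_name: str, root: str = "dependencies/model") -> list[Path]:
    """List the expected files for `model_name` that are absent under `root`."""
    # Assumed layout: one checkpoint, one config, and one vocabulary per model.
    expected = [
        Path(root) / "weights" / f"{model_name}.ckpt",
        Path(root) / "config" / f"{model_name}.yaml",
        Path(root) / "vocab" / f"{model_name}.json",
    ]
    return [path for path in expected if not path.is_file()]
```

An empty list from `missing_model_files("small")` means all three files for the `small` model are in place; any returned paths still need to be downloaded.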

## Citation

```bibtex
@article{camillo2024cpgpt,
  title={CpGPT: A Foundation Model for DNA Methylation},
  author={de Lima Camillo, Lucas Paulo and others},
  journal={bioRxiv},
  year={2024},
  doi={10.1101/2024.10.24.619766}
}
```

## License

MIT License. See the [GitHub repository](https://github.com/lcamillo/CpGPT) for details.