---
license: apache-2.0
tags:
  - biology
  - protein
  - molecule
  - dna
  - rna
  - graph-neural-network
---

# Cuttlefish-Encoder

This is the graph encoder component of [Cuttlefish](https://huggingface.co/zihaojing/Cuttlefish), pretrained with masked reconstruction on all-atom structures of proteins, molecules, DNA, and RNA.

## Usage

```python
from huggingface_hub import snapshot_download
encoder_dir = snapshot_download("zihaojing/Cuttlefish-Encoder")

# Load via the Cuttlefish codebase
# See https://github.com/your-repo/cuttlefish for full usage
```
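
Before loading the checkpoint through the Cuttlefish codebase, it can help to check what the downloaded snapshot contains. The sketch below makes no assumption about the file names in the repository. `list_snapshot_files` and `download_and_list` are hypothetical helpers written for this example; they are not part of Cuttlefish or `huggingface_hub`.

```python
from pathlib import Path


def list_snapshot_files(snapshot_dir):
    """Return (relative_path, size_in_bytes) for every file under snapshot_dir."""
    root = Path(snapshot_dir)
    files = []
    for path in sorted(root.rglob("*")):
        if path.is_file():
            files.append((path.relative_to(root).as_posix(), path.stat().st_size))
    return files


def download_and_list(repo_id="zihaojing/Cuttlefish-Encoder"):
    """Download the repo snapshot (needs network access) and list its files."""
    # Imported here so list_snapshot_files works without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    return list_snapshot_files(snapshot_download(repo_id))
```

Calling `download_and_list()` returns one `(path, size)` pair per file, which is usually enough to locate the config and weight files that the loading code expects.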

## Pretraining data

Pretrained on **[Cuttlefish-Encoder-Data](https://huggingface.co/datasets/zihaojing/Cuttlefish-Encoder-Data)**, covering:
- Molecules (SMILES → 3D graph)
- Proteins (PDB/CIF → all-atom graph)
- DNA and RNA sequences
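
To make the idea of an all-atom graph concrete, here is a minimal sketch that turns atoms with 3D coordinates into nodes and edges. This card does not say how Cuttlefish defines edges, so the distance cutoff used here is an assumption chosen only for illustration, and `build_atom_graph` is a hypothetical helper, not a function from the Cuttlefish codebase.

```python
import math


def build_atom_graph(elements, coords, cutoff=2.0):
    """Build an undirected graph with one node per atom.

    An edge (i, j) is added for every atom pair closer than `cutoff`
    (same length unit as the coordinates, here angstroms).
    """
    if len(elements) != len(coords):
        raise ValueError("elements and coords must have the same length")
    edges = []
    for i in range(len(coords)):
        for j in range(i + 1, len(coords)):
            if math.dist(coords[i], coords[j]) < cutoff:
                edges.append((i, j))
    return {"nodes": list(elements), "edges": edges}


# Water with approximate coordinates in angstroms.
water = build_atom_graph(
    ["O", "H", "H"],
    [(0.000, 0.000, 0.000), (0.957, 0.000, 0.000), (-0.240, 0.927, 0.000)],
    cutoff=1.2,
)
print(water["edges"])  # prints [(0, 1), (0, 2)]
```

The two O-H pairs fall under the 1.2 Å cutoff while the H-H pair does not, so the graph recovers the two bonds of water. The pairwise loop is quadratic in the number of atoms, which is fine for a small example but would need a spatial index for large structures.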

## Model details

- Architecture: All-atom graph encoder with masked reconstruction pretraining
- Encoder hidden dim: 256
- Modalities: molecule, protein, dna, rna
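
Masked reconstruction pretraining generally hides part of the input and trains the encoder to recover it. The sketch below shows only the corruption step and a simple accuracy measure on node labels. The card does not state what Cuttlefish masks, the mask ratio, or the loss, so `MASK_TOKEN`, `mask_nodes`, `reconstruction_accuracy`, and the 15% default are all hypothetical choices for illustration.

```python
import random

MASK_TOKEN = "<mask>"  # hypothetical placeholder label, not taken from Cuttlefish


def mask_nodes(node_labels, mask_ratio=0.15, seed=None):
    """Replace a random subset of node labels with MASK_TOKEN.

    Returns (corrupted_labels, targets), where targets maps each masked
    index to the original label a model would be asked to recover.
    """
    if not 0.0 < mask_ratio <= 1.0:
        raise ValueError("mask_ratio must be in (0, 1]")
    if not node_labels:
        return [], {}
    rng = random.Random(seed)
    # Always mask at least one node so every graph yields a training signal.
    n_mask = max(1, round(len(node_labels) * mask_ratio))
    masked = rng.sample(range(len(node_labels)), n_mask)
    corrupted = list(node_labels)
    targets = {}
    for idx in masked:
        targets[idx] = corrupted[idx]
        corrupted[idx] = MASK_TOKEN
    return corrupted, targets


def reconstruction_accuracy(predictions, targets):
    """Fraction of masked positions whose predicted label matches the original."""
    if not targets:
        return 0.0
    correct = sum(predictions.get(i) == label for i, label in targets.items())
    return correct / len(targets)
```

For example, `mask_nodes(["C", "C", "O", "N", "H", "H"], mask_ratio=0.5, seed=0)` hides three of the six atom labels and returns the originals as reconstruction targets.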

## Related resources

| Resource | Link |
|---|---|
| Full Cuttlefish LLM | [zihaojing/Cuttlefish](https://huggingface.co/zihaojing/Cuttlefish) |
| SFT instruction data | [zihaojing/Cuttlefish-SFT-Data](https://huggingface.co/datasets/zihaojing/Cuttlefish-SFT-Data) |
| Encoder pretraining data | [zihaojing/Cuttlefish-Encoder-Data](https://huggingface.co/datasets/zihaojing/Cuttlefish-Encoder-Data) |