File size: 2,116 Bytes
c81de59
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
---
license: llama3
base_model: meta-llama/Llama-3.1-8B-Instruct
tags:
  - biology
  - protein
  - molecule
  - dna
  - rna
  - multimodal
  - structure-grounded
---

# Cuttlefish

**Cuttlefish** is a unified all-atom multimodal LLM that grounds language reasoning in geometric cues while scaling structural tokens with structural complexity. Built on Llama-3.1-8B-Instruct, it extends the base LLM with a graph encoder and a Scaling-Aware Patching connector for processing proteins, molecules, DNA, and RNA structures.

## Quick start

```python
from huggingface_hub import snapshot_download

# Download model
local_dir = snapshot_download("zihaojing/Cuttlefish")

# Run inference (requires cuttlefish codebase)
# python src/runner/inference.py --config configs/inference/octopus_8B_s3_v1_5.yaml
```

## Input format

Cuttlefish accepts a unified parquet schema with structural graph columns:

| Field | Description |
|---|---|
| `modality` | `"molecule"`, `"protein"`, `"dna"`, or `"rna"` |
| `node_feat` | Atom/node features (N × d) |
| `pos` | 3D coordinates in Å (N × 3) |
| `edge_index` | Spatial graph edges in COO (2 × E) |
| `messages` | Chat-style instruction with `<STRUCTURE>` token |

The `<STRUCTURE>` placeholder in the user message is replaced by the encoded structural tokens at inference time.

## Training details

- **Base model**: Llama-3.1-8B-Instruct
- **Encoder**: [Cuttlefish-Encoder](https://huggingface.co/zihaojing/Cuttlefish-Encoder) (pretrained on all-atom graph data)
- **SFT data**: [Cuttlefish-SFT-Data](https://huggingface.co/datasets/zihaojing/Cuttlefish-SFT-Data)
- **Training stages**: 2-stage SFT — connector training then full LLM fine-tuning with LoRA

## Related resources

| Resource | Link |
|---|---|
| Cuttlefish-Encoder | [zihaojing/Cuttlefish-Encoder](https://huggingface.co/zihaojing/Cuttlefish-Encoder) |
| SFT instruction data | [zihaojing/Cuttlefish-SFT-Data](https://huggingface.co/datasets/zihaojing/Cuttlefish-SFT-Data) |
| Encoder pretraining data | [zihaojing/Cuttlefish-Encoder-Data](https://huggingface.co/datasets/zihaojing/Cuttlefish-Encoder-Data) |