Cuttlefish

Cuttlefish is a unified all-atom multimodal LLM that grounds language reasoning in geometric cues while scaling structural tokens with structural complexity. Built on Llama-3.1-8B-Instruct, it extends the base LLM with a graph encoder and a Scaling-Aware Patching connector for processing proteins, molecules, DNA, and RNA structures.
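The scaling behavior can be pictured with a minimal sketch: the number of structural tokens handed to the LLM grows with the size of the input graph, so a large protein receives more tokens than a small molecule. The `patch_size` parameter and the ceiling-division scheme below are illustrative assumptions, not Cuttlefish's actual connector design.

```python
# Minimal sketch of complexity-aware token scaling: token count grows with
# the number of atoms/nodes. `patch_size` is an assumed hyperparameter.
import math

def num_structure_tokens(num_nodes: int, patch_size: int = 16, min_tokens: int = 1) -> int:
    """Allocate structural tokens in proportion to node count."""
    return max(min_tokens, math.ceil(num_nodes / patch_size))

print(num_structure_tokens(20))    # small molecule -> 2 tokens
print(num_structure_tokens(3000))  # large protein  -> 188 tokens
```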

Quick start

```python
from huggingface_hub import snapshot_download

# Download the model weights from the Hugging Face Hub
local_dir = snapshot_download("zihaojing/Cuttlefish")

# Run inference (requires the Cuttlefish codebase)
# python src/runner/inference.py --config configs/inference/octopus_8B_s3_v1_5.yaml
```

Input format

Cuttlefish accepts a unified parquet schema with structural graph columns:

| Field | Description |
| --- | --- |
| `modality` | `"molecule"`, `"protein"`, `"dna"`, or `"rna"` |
| `node_feat` | Atom/node features (N × d) |
| `pos` | 3D coordinates in Å (N × 3) |
| `edge_index` | Spatial graph edges in COO format (2 × E) |
| `messages` | Chat-style instruction containing the `<STRUCTURE>` token |

The `<STRUCTURE>` placeholder in the user message is replaced by the encoded structural tokens at inference time.
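As a sketch, one row of this schema can be built with pandas. Column names follow the table above; the exact dtypes, nesting, and feature dimension are assumptions for illustration.

```python
# Hedged example: constructing one row of the unified parquet schema.
import numpy as np
import pandas as pd

row = {
    "modality": "molecule",
    "node_feat": np.zeros((5, 16), dtype=np.float32).tolist(),  # N x d atom features (d=16 assumed)
    "pos": np.zeros((5, 3), dtype=np.float32).tolist(),         # N x 3 coordinates in Angstrom
    "edge_index": [[0, 1, 2, 3], [1, 2, 3, 4]],                 # 2 x E spatial edges in COO format
    "messages": [
        {"role": "user", "content": "Describe this structure: <STRUCTURE>"}
    ],
}

df = pd.DataFrame([row])
# df.to_parquet("example.parquet")  # serialization requires pyarrow or fastparquet
```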

Training details

  • Base model: Llama-3.1-8B-Instruct
  • Encoder: Cuttlefish-Encoder (pretrained on all-atom graph data)
  • SFT data: Cuttlefish-SFT-Data
  • Training stages: two-stage SFT: connector training first, then LLM fine-tuning with LoRA
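The two-stage recipe above amounts to toggling which parameter groups are trainable. The predicate below is a sketch under the assumption that connector and LoRA parameters are identifiable by name; the naming convention is illustrative, not Cuttlefish's actual code.

```python
# Hedged sketch of the two-stage SFT freeze/unfreeze logic:
# stage 1 trains only the connector; stage 2 also unfreezes LoRA adapters.

def trainable(stage: int, param_name: str) -> bool:
    """Decide whether a named parameter is updated in the given SFT stage."""
    if stage == 1:
        return "connector" in param_name
    return "connector" in param_name or "lora" in param_name

params = ["encoder.layer0.weight", "connector.proj.weight", "llm.lora_A.weight"]
print([p for p in params if trainable(1, p)])  # stage 1: connector only
print([p for p in params if trainable(2, p)])  # stage 2: connector + LoRA
```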

Related resources

| Resource | Link |
| --- | --- |
| Cuttlefish-Encoder | zihaojing/Cuttlefish-Encoder |
| SFT instruction data | zihaojing/Cuttlefish-SFT-Data |
| Encoder pretraining data | zihaojing/Cuttlefish-Encoder-Data |