Cuttlefish

Cuttlefish is a unified all-atom multimodal LLM that grounds language reasoning in geometric cues while scaling structural tokens with structural complexity. Built on Llama-3.1-8B-Instruct, it extends the base LLM with a graph encoder and a Scaling-Aware Patching connector for processing proteins, molecules, DNA, and RNA structures.
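The scaling behavior can be pictured with a minimal sketch: the number of structural tokens handed to the LLM grows with the size of the input graph, so a large protein receives more tokens than a small molecule. The `patch_size` parameter and the ceiling-division scheme below are illustrative assumptions, not Cuttlefish's actual connector design.

```python
# Minimal sketch of complexity-aware token scaling: token count grows with
# the number of atoms/nodes. `patch_size` is an assumed hyperparameter.
import math

def num_structure_tokens(num_nodes: int, patch_size: int = 16, min_tokens: int = 1) -> int:
    """Allocate structural tokens in proportion to node count."""
    return max(min_tokens, math.ceil(num_nodes / patch_size))

print(num_structure_tokens(20))    # small molecule -> 2 tokens
print(num_structure_tokens(3000))  # large protein  -> 188 tokens
```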

Quick start

```python
from huggingface_hub import snapshot_download

# Download the model weights from the Hugging Face Hub
local_dir = snapshot_download("zihaojing/Cuttlefish")

# Run inference (requires the Cuttlefish codebase)
# python src/runner/inference.py --config configs/inference/octopus_8B_s3_v1_5.yaml
```

Input format

Cuttlefish accepts a unified parquet schema with structural graph columns:

| Field | Description |
| --- | --- |
| `modality` | `"molecule"`, `"protein"`, `"dna"`, or `"rna"` |
| `node_feat` | Atom/node features (N × d) |
| `pos` | 3D coordinates in Å (N × 3) |
| `edge_index` | Spatial graph edges in COO format (2 × E) |
| `messages` | Chat-style instruction containing the `<STRUCTURE>` token |

The `<STRUCTURE>` placeholder in the user message is replaced by the encoded structural tokens at inference time.
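As a sketch, one row of this schema can be built with pandas. Column names follow the table above; the exact dtypes, nesting, and feature dimension are assumptions for illustration.

```python
# Hedged example: constructing one row of the unified parquet schema.
import numpy as np
import pandas as pd

row = {
    "modality": "molecule",
    "node_feat": np.zeros((5, 16), dtype=np.float32).tolist(),  # N x d atom features (d=16 assumed)
    "pos": np.zeros((5, 3), dtype=np.float32).tolist(),         # N x 3 coordinates in Angstrom
    "edge_index": [[0, 1, 2, 3], [1, 2, 3, 4]],                 # 2 x E spatial edges in COO format
    "messages": [
        {"role": "user", "content": "Describe this structure: <STRUCTURE>"}
    ],
}

df = pd.DataFrame([row])
# df.to_parquet("example.parquet")  # serialization requires pyarrow or fastparquet
```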

Training details

  • Base model: Llama-3.1-8B-Instruct
  • Encoder: Cuttlefish-Encoder (pretrained on all-atom graph data)
  • SFT data: Cuttlefish-SFT-Data
  • Training stages: two-stage SFT: connector training first, then LLM fine-tuning with LoRA
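The two-stage recipe above amounts to toggling which parameter groups are trainable. The predicate below is a sketch under the assumption that connector and LoRA parameters are identifiable by name; the naming convention is illustrative, not Cuttlefish's actual code.

```python
# Hedged sketch of the two-stage SFT freeze/unfreeze logic:
# stage 1 trains only the connector; stage 2 also unfreezes LoRA adapters.

def trainable(stage: int, param_name: str) -> bool:
    """Decide whether a named parameter is updated in the given SFT stage."""
    if stage == 1:
        return "connector" in param_name
    return "connector" in param_name or "lora" in param_name

params = ["encoder.layer0.weight", "connector.proj.weight", "llm.lora_A.weight"]
print([p for p in params if trainable(1, p)])  # stage 1: connector only
print([p for p in params if trainable(2, p)])  # stage 2: connector + LoRA
```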

Related resources

| Resource | Link |
| --- | --- |
| Cuttlefish-Encoder | zihaojing/Cuttlefish-Encoder |
| SFT instruction data | zihaojing/Cuttlefish-SFT-Data |
| Encoder pretraining data | zihaojing/Cuttlefish-Encoder-Data |