# Cuttlefish
Cuttlefish is a unified all-atom multimodal LLM that grounds language reasoning in geometric cues while scaling structural tokens with structural complexity. Built on Llama-3.1-8B-Instruct, it extends the base LLM with a graph encoder and a Scaling-Aware Patching connector for processing proteins, molecules, DNA, and RNA structures.
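The connector's core idea, allocating more structural tokens to larger structures, can be sketched with a toy token-budget function. The patch size, floor, and ceiling below are illustrative assumptions for this sketch, not the published Scaling-Aware Patching design:

```python
import math

def num_structure_tokens(num_atoms: int, patch_size: int = 32,
                         min_tokens: int = 4, max_tokens: int = 256) -> int:
    """Illustrative token budget that grows with structure size.

    All three hyperparameters are assumptions, not values from the model card.
    """
    return max(min_tokens, min(max_tokens, math.ceil(num_atoms / patch_size)))

# A small molecule stays near the floor; a protein gets many more tokens.
print(num_structure_tokens(50))    # -> 4
print(num_structure_tokens(5000))  # -> 157
```

The point is only that the structural token count is not fixed: it scales with the size of the input graph rather than using one budget for all modalities.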
## Quick start

```python
from huggingface_hub import snapshot_download

# Download the model weights
local_dir = snapshot_download("zihaojing/Cuttlefish")
```

Inference requires the Cuttlefish codebase:

```shell
python src/runner/inference.py --config configs/inference/octopus_8B_s3_v1_5.yaml
```
## Input format
Cuttlefish accepts a unified parquet schema with structural graph columns:
| Field | Description |
|---|---|
| `modality` | `"molecule"`, `"protein"`, `"dna"`, or `"rna"` |
| `node_feat` | Atom/node features (N × d) |
| `pos` | 3D coordinates in Å (N × 3) |
| `edge_index` | Spatial graph edges in COO format (2 × E) |
| `messages` | Chat-style instruction containing the `<STRUCTURE>` token |
The `<STRUCTURE>` placeholder in the user message is replaced by the encoded structural tokens at inference time.
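One record in this schema can be assembled as follows. This is a minimal sketch: the feature dimension (`d = 8`), the 4.5 Å distance cutoff for spatial edges, and the `make_record` helper are all illustrative assumptions, not part of the published pipeline:

```python
import numpy as np

def make_record(positions, node_feat, modality, instruction, cutoff=4.5):
    """Build one schema-compatible record; edges connect atom pairs within `cutoff` Å.

    The radius-graph construction here is an assumption for illustration.
    """
    n = positions.shape[0]
    # Pairwise distances -> directed spatial edges in COO format (2 x E), no self-loops
    dists = np.linalg.norm(positions[:, None, :] - positions[None, :, :], axis=-1)
    src, dst = np.nonzero((dists < cutoff) & ~np.eye(n, dtype=bool))
    return {
        "modality": modality,
        "node_feat": node_feat,                  # (N, d)
        "pos": positions,                        # (N, 3), in Å
        "edge_index": np.stack([src, dst]),      # (2, E)
        "messages": [{"role": "user",
                      "content": f"<STRUCTURE> {instruction}"}],
    }

# Toy 3-atom "molecule": all pairs lie within the cutoff
pos = np.array([[0.0, 0.0, 0.0], [1.5, 0.0, 0.0], [0.0, 1.5, 0.0]])
feat = np.zeros((3, 8))  # d = 8 is an arbitrary placeholder dimension
rec = make_record(pos, feat, "molecule", "Describe this molecule.")
print(rec["edge_index"].shape)  # -> (2, 6): 3 atoms, all 6 directed pairs connected
```

A collection of such records would then be written to parquet (e.g. via `pandas.DataFrame.to_parquet`) to match the unified schema above.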
## Training details
- Base model: Llama-3.1-8B-Instruct
- Encoder: Cuttlefish-Encoder (pretrained on all-atom graph data)
- SFT data: Cuttlefish-SFT-Data
- Training stages: two-stage SFT, first connector training, then full-LLM fine-tuning with LoRA
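The two-stage schedule can be sketched as a per-stage map of trainable parameter groups. The group names, and the choice to keep the connector trainable in stage 2, are assumptions for illustration:

```python
# Illustrative two-stage SFT parameter schedule for an encoder + connector + LLM stack.
# Group names ("encoder", "connector", "llm_base", "llm_lora") are assumptions.
STAGES = {
    # Stage 1: align structural tokens with the frozen LLM by training
    # only the connector.
    "stage1_connector": {"connector"},
    # Stage 2: fine-tune the LLM via LoRA adapters (base weights stay frozen);
    # keeping the connector trainable here is an assumption.
    "stage2_lora": {"connector", "llm_lora"},
}

ALL_GROUPS = {"encoder", "connector", "llm_base", "llm_lora"}

def trainable_groups(stage: str) -> set:
    """Return the parameter groups updated in `stage`; all others stay frozen."""
    return STAGES[stage]

print(sorted(ALL_GROUPS - trainable_groups("stage1_connector")))
# -> ['encoder', 'llm_base', 'llm_lora']
```

Note that the pretrained encoder stays frozen in both stages of this sketch; only the connector and the LoRA adapters ever receive gradients.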
## Related resources
| Resource | Link |
|---|---|
| Cuttlefish-Encoder | zihaojing/Cuttlefish-Encoder |
| SFT instruction data | zihaojing/Cuttlefish-SFT-Data |
| Encoder pretraining data | zihaojing/Cuttlefish-Encoder-Data |