zihaojing
/

Cuttlefish

structure-grounded

Model card Files Files and versions

zihaojing commited on 5 days ago

Commit

c81de59

·

verified ·

1 Parent(s): a516e1b

Add model card

Files changed (1) hide show

README.md +57 -0

README.md ADDED Viewed

	@@ -0,0 +1,57 @@

+---
+license: llama3
+base_model: meta-llama/Llama-3.1-8B-Instruct
+tags:
+  - biology
+  - protein
+  - molecule
+  - dna
+  - rna
+  - multimodal
+  - structure-grounded
+---
+# Cuttlefish
+**Cuttlefish** is a unified all-atom multimodal LLM that grounds language reasoning in geometric cues while scaling structural tokens with structural complexity. Built on Llama-3.1-8B-Instruct, it extends the base LLM with a graph encoder and a Scaling-Aware Patching connector for processing proteins, molecules, DNA, and RNA structures.
+## Quick start
+```python
+from huggingface_hub import snapshot_download
+# Download model
+local_dir = snapshot_download("zihaojing/Cuttlefish")
+# Run inference (requires cuttlefish codebase)
+# python src/runner/inference.py --config configs/inference/octopus_8B_s3_v1_5.yaml
+```
+## Input format
+Cuttlefish accepts a unified parquet schema with structural graph columns:
+| Field | Description |
+|---|---|
+| `modality` | `"molecule"`, `"protein"`, `"dna"`, or `"rna"` |
+| `node_feat` | Atom/node features (N × d) |
+| `pos` | 3D coordinates in Å (N × 3) |
+| `edge_index` | Spatial graph edges in COO (2 × E) |
+| `messages` | Chat-style instruction with `<STRUCTURE>` token |
+The `<STRUCTURE>` placeholder in the user message is replaced by the encoded structural tokens at inference time.
+## Training details
+- **Base model**: Llama-3.1-8B-Instruct
+- **Encoder**: [Cuttlefish-Encoder](https://huggingface.co/zihaojing/Cuttlefish-Encoder) (pretrained on all-atom graph data)
+- **SFT data**: [Cuttlefish-SFT-Data](https://huggingface.co/datasets/zihaojing/Cuttlefish-SFT-Data)
+- **Training stages**: 2-stage SFT — connector training then full LLM fine-tuning with LoRA
+## Related resources
+| Resource | Link |
+|---|---|
+| Cuttlefish-Encoder | [zihaojing/Cuttlefish-Encoder](https://huggingface.co/zihaojing/Cuttlefish-Encoder) |
+| SFT instruction data | [zihaojing/Cuttlefish-SFT-Data](https://huggingface.co/datasets/zihaojing/Cuttlefish-SFT-Data) |
+| Encoder pretraining data | [zihaojing/Cuttlefish-Encoder-Data](https://huggingface.co/datasets/zihaojing/Cuttlefish-Encoder-Data) |