Haiku / README.md
C1vy's picture
Upload Haiku trimodal checkpoint + tokenizer + marker assets
c39883a verified
---
library_name: pytorch
tags:
- multimodal
- histology
- codex
- retrieval
license: other
---
# Haiku β€” Trimodal (CODEX + H&E + Text) Retrieval Model
This repo bundles a fine-tuned Haiku checkpoint together with the tokenizer
and marker assets needed to run inference without any additional downloads
from `xiangjx/musk` or `microsoft/BiomedNLP-BiomedBERT-*`.
## Contents
- `haiku_state_dict.pt` β€” model weights (CODEX + H&E + Text encoders + projections)
- `config.json` β€” architecture config + marker lists
- `tokenizer/` β€” BiomedBERT tokenizer files (+ bert config)
- `esm_embeddings/` β€” per-biomarker ESM embeddings (also embedded in state_dict; kept here for downstream use)
- `vocab.pkl` β€” marker vocabulary
## Quick start
```python
from models import Haiku
model, tokenizer, marker_embedding = Haiku.from_pretrained(
"zhihuanglab/Haiku",
device="cuda",
token="hf_...", # omit if HF_TOKEN / hf auth login is set
)
model.eval()
```