Haiku / README.md
C1vy's picture
Upload Haiku trimodal checkpoint + tokenizer + marker assets
c39883a verified
metadata
library_name: pytorch
tags:
  - multimodal
  - histology
  - codex
  - retrieval
license: other

Haiku — Trimodal (CODEX + H&E + Text) Retrieval Model

This repo bundles a fine-tuned Haiku checkpoint together with the tokenizer and marker assets needed to run inference without any additional downloads from xiangjx/musk or microsoft/BiomedNLP-BiomedBERT-*.

Contents

  • haiku_state_dict.pt — model weights (CODEX + H&E + Text encoders + projections)
  • config.json — architecture config + marker lists
  • tokenizer/ — BiomedBERT tokenizer files (+ bert config)
  • esm_embeddings/ — per-biomarker ESM embeddings (also embedded in state_dict; kept here for downstream use)
  • vocab.pkl — marker vocabulary

Quick start

from models import Haiku

model, tokenizer, marker_embedding = Haiku.from_pretrained(
    "zhihuanglab/Haiku",
    device="cuda",
    token="hf_...",  # omit if HF_TOKEN / hf auth login is set
)
model.eval()