Haiku / README.md

Upload Haiku trimodal checkpoint + tokenizer + marker assets

c39883a verified about 2 months ago

977 Bytes

	---
	library_name: pytorch
	tags:
	- multimodal
	- histology
	- codex
	- retrieval
	license: other
	---

	# Haiku — Trimodal (CODEX + H&E + Text) Retrieval Model

	This repo bundles a fine-tuned Haiku checkpoint together with the tokenizer
	and marker assets needed to run inference without any additional downloads
	from `xiangjx/musk` or `microsoft/BiomedNLP-BiomedBERT-*`.

	## Contents
	- `haiku_state_dict.pt` — model weights (CODEX + H&E + Text encoders + projections)
	- `config.json` — architecture config + marker lists
	- `tokenizer/` — BiomedBERT tokenizer files (+ bert config)
	- `esm_embeddings/` — per-biomarker ESM embeddings (also embedded in state_dict; kept here for downstream use)
	- `vocab.pkl` — marker vocabulary

	## Quick start
	```python
	from models import Haiku

	model, tokenizer, marker_embedding = Haiku.from_pretrained(
	"zhihuanglab/Haiku",
	device="cuda",
	token="hf_...", # omit if HF_TOKEN / hf auth login is set
	)
	model.eval()
	```