Jolia — A 3D CT foundation model with anatomical representations

Jolia is a 3D CT foundation model that encodes a whole CT volume into vector representations:

a global embedding (embed_dim = 576), and
per-organ embeddings — 102 named organ slots produced by organ-query cross-attention pooling, trained to align with per-organ report text.
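The card does not describe the organ-query pooling beyond its name. As a rough illustration only, the NumPy sketch below implements generic single-head cross-attention pooling. Each organ slot owns a learned query that attends over the patch tokens of one backbone scale, and the three per-scale 192-d results are concatenated into the 576-d slot embedding listed under Model details. The function names, token counts and random weights are hypothetical; the real model may use multiple heads, normalization layers or a different layout.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the max for numerical stability before exponentiating
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def organ_query_pool(tokens, queries, w_k, w_v):
    """Single-head cross-attention pooling (hypothetical sketch).

    tokens:   (B, N, d) patch tokens from one backbone scale
    queries:  (S, d) learned query vectors, one per organ slot
    w_k, w_v: (d, d) key and value projections
    returns:  (B, S, d), one pooled vector per organ slot
    """
    d = tokens.shape[-1]
    keys = tokens @ w_k                                   # (B, N, d)
    values = tokens @ w_v                                 # (B, N, d)
    scores = np.einsum("sd,bnd->bsn", queries, keys) / np.sqrt(d)
    attn = softmax(scores, axis=-1)                       # each slot's weights over N tokens sum to 1
    return attn @ values                                  # (B, S, N) @ (B, N, d) -> (B, S, d)

# Pool three scales separately, then concatenate: 3 x 192 = 576 per slot
rng = np.random.default_rng(0)
B, S, d = 2, 102, 192
token_counts = (512, 64, 8)                               # hypothetical token counts per scale
pooled = []
for n in token_counts:
    tokens = rng.standard_normal((B, n, d))
    queries = 0.02 * rng.standard_normal((S, d))          # separate learned queries per scale
    w_k = rng.standard_normal((d, d)) / np.sqrt(d)
    w_v = rng.standard_normal((d, d)) / np.sqrt(d)
    pooled.append(organ_query_pool(tokens, queries, w_k, w_v))
organ_emb = np.concatenate(pooled, axis=-1)               # (B, S, 576)
```

With all-zero queries every slot attends uniformly, so the pooled vector reduces to the mean of the (value-projected) tokens; learned queries let each slot concentrate its weight on the tokens relevant to its organ.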

Installation

pip install torch transformers timm einops numpy safetensors

Quick start

import torch
from transformers import AutoModel

model = AutoModel.from_pretrained("raidium/Jolia", trust_remote_code=True).eval()

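# image: a preprocessed CT volume of shape (B, 11, 192, 192, 192); see Preprocessing.
# An all-zero placeholder is used here so the snippet runs on its own.
image = torch.zeros(1, 11, 192, 192, 192)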
with torch.no_grad():
    cls = model(image).pooler_output          # (B, 576) global embedding
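Global embeddings are usually compared with cosine similarity, for example in the retrieval tasks mentioned under Intended use & limitations. The sketch below assumes the pooler_output rows have already been collected into a NumPy array (e.g. with .cpu().numpy()); top_k_similar is a hypothetical helper, not part of the model's API, and random vectors stand in for real embeddings.

```python
import numpy as np

def l2_normalize(x, eps=1e-8):
    # Divide each vector by its L2 norm (eps guards against zero vectors)
    return x / np.maximum(np.linalg.norm(x, axis=-1, keepdims=True), eps)

def top_k_similar(query, gallery, k=5):
    """Rank gallery volumes by cosine similarity to one query embedding.

    query:   (576,) global embedding of the query volume
    gallery: (M, 576) global embeddings of the searchable volumes
    returns: (indices, scores) of the k most similar gallery rows, best first
    """
    sims = l2_normalize(gallery) @ l2_normalize(query)   # (M,) cosine similarities
    k = min(k, len(sims))
    order = np.argsort(-sims)[:k]
    return order, sims[order]

# Usage with random stand-ins for real pooler_output rows
rng = np.random.default_rng(0)
gallery = rng.standard_normal((100, 576))
query = gallery[42] + 0.01 * rng.standard_normal(576)    # near-duplicate of row 42
idx, scores = top_k_similar(query, gallery, k=3)
print(idx[0])  # the near-duplicate, row 42, ranks first
```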

Preprocessing

Raw CT volumes must be brought to the Atlas input format ((11, 192, 192, 192): 1.5 mm isotropic, 192³ crop, 11 CT windowing channels). Grab the bundled preprocessor from the repo:

import sys
from huggingface_hub import snapshot_download

repo = snapshot_download("raidium/Jolia")   # local path of the downloaded model repo
sys.path.append(repo)                       # make the repo's preprocessing_jolia.py importable
from preprocessing_jolia import JoliaPreprocessor

pre = JoliaPreprocessor()
# volume: (H, W, D) in Hounsfield units; resolution in mm (row, col, slice)
image = pre(volume, resolution=(0.7, 0.7, 1.0)).unsqueeze(0)   # (1, 11, 192, 192, 192)
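The bundled JoliaPreprocessor is the reference. To make the three steps named above concrete (1.5 mm isotropic resampling, 192³ crop, 11 windowing channels), here is a minimal NumPy sketch. Several parts are assumptions and may differ from the bundled preprocessor's behavior: nearest-neighbor resampling (a real pipeline would usually interpolate), a center crop/pad with -1024 HU (air) as the padding value, and the 11 (level, width) windows in WINDOWS, which are made up for illustration.

```python
import numpy as np

TARGET_MM = 1.5
TARGET_SIZE = 192
# Hypothetical (level, width) pairs in HU; the 11 windows actually used by
# JoliaPreprocessor are not listed in this card.
WINDOWS = [(40, 400), (50, 350), (60, 160), (-600, 1500), (400, 1800),
           (40, 80), (30, 150), (100, 700), (0, 2000), (-100, 300), (300, 1500)]

def resample_nearest(vol, spacing, target=TARGET_MM):
    """Nearest-neighbor resampling of a 3D volume to isotropic `target` mm spacing."""
    idx = []
    for n, s in zip(vol.shape, spacing):
        new_n = max(1, int(round(n * s / target)))
        # Map each output voxel center (in mm) back to the input voxel containing it
        src = np.floor((np.arange(new_n) + 0.5) * target / s).astype(int)
        idx.append(np.clip(src, 0, n - 1))
    return vol[np.ix_(*idx)]

def center_crop_or_pad(vol, size=TARGET_SIZE, pad_value=-1024.0):
    """Crop or pad each axis symmetrically to `size` voxels (padding with air HU)."""
    out = np.full((size,) * 3, pad_value, dtype=np.float32)
    src, dst = [], []
    for n in vol.shape:
        if n >= size:
            start = (n - size) // 2
            src.append(slice(start, start + size))
            dst.append(slice(0, size))
        else:
            start = (size - n) // 2
            src.append(slice(0, n))
            dst.append(slice(start, start + n))
    out[tuple(dst)] = vol[tuple(src)]
    return out

def window_channels(vol, windows=WINDOWS):
    """Stack one clipped copy of the volume per window, rescaled to [0, 1]."""
    chans = []
    for level, width in windows:
        lo, hi = level - width / 2, level + width / 2
        chans.append((np.clip(vol, lo, hi) - lo) / (hi - lo))
    return np.stack(chans).astype(np.float32)             # (C, *vol.shape)

def preprocess(volume_hu, spacing, size=TARGET_SIZE):
    """(H, W, D) HU volume + (row, col, slice) spacing in mm -> (11, size, size, size)."""
    vol = resample_nearest(np.asarray(volume_hu, dtype=np.float32), spacing)
    return window_channels(center_crop_or_pad(vol, size))

# Usage on a small random stand-in volume
rng = np.random.default_rng(0)
ct = rng.uniform(-1024, 2000, size=(64, 64, 40))
x = preprocess(ct, spacing=(0.7, 0.7, 1.0), size=48)      # small size keeps the demo light
print(x.shape)  # (11, 48, 48, 48)
```

With the default size of 192 the output has the documented (11, 192, 192, 192) shape; torch.from_numpy(x).unsqueeze(0) would add the batch axis.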

Working with organ queries (the easy way)

Per-organ embeddings are addressed by name:

# All 102 organs as {name: (B, 576)}
organs = model.encode_organs(image)

# A subset, L2-normalized (cosine-ready)
sub = model.encode_organs(image, organs=["liver", "spleen", "pancreas"], normalize=True)

print(model.organ_slot_names)   # the 102 available organ names
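Because each organ is keyed by name, two studies can be compared organ by organ, for example a baseline and a follow-up scan. The sketch below is a hypothetical helper that assumes the dict values have been converted to NumPy arrays; random vectors stand in for real embeddings. With normalize=True the dot product alone would already be the cosine similarity.

```python
import numpy as np

def per_organ_similarity(organs_a, organs_b, eps=1e-8):
    """Cosine similarity per organ between two studies.

    organs_a, organs_b: {name: (B, 576)} arrays, e.g. encode_organs outputs
    converted with .cpu().numpy(). Only organs present in both are compared.
    returns: {name: (B,)} cosine similarities, keys in sorted order
    """
    scores = {}
    for name in sorted(organs_a.keys() & organs_b.keys()):
        a, b = organs_a[name], organs_b[name]
        num = (a * b).sum(axis=-1)
        den = np.maximum(np.linalg.norm(a, axis=-1) * np.linalg.norm(b, axis=-1), eps)
        scores[name] = num / den
    return scores

# Usage with random stand-ins: the "liver" embedding is identical in both
# studies, while the "spleen" embeddings are unrelated.
rng = np.random.default_rng(0)
liver = rng.standard_normal((1, 576))
baseline = {"liver": liver, "spleen": rng.standard_normal((1, 576))}
followup = {"liver": liver.copy(), "spleen": rng.standard_normal((1, 576)),
            "pancreas": rng.standard_normal((1, 576))}
sims = per_organ_similarity(baseline, followup)
```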

For linear probing, the concatenated, normalized feature vector comes from a single call:

flat = model.extract_flat_feature(image)   # (B, 576 * (1 + num_organs))
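The card does not spell out how extract_flat_feature assembles its vector. The sketch below is one plausible construction, assuming the global embedding and each organ embedding are L2-normalized separately and concatenated global-first in slot order; flat_feature is a hypothetical helper. With all 102 organs the width is 576 × 103 = 59,328.

```python
import numpy as np

def l2_normalize(x, eps=1e-8):
    # Normalize along the last axis (the 576-d embedding dimension)
    return x / np.maximum(np.linalg.norm(x, axis=-1, keepdims=True), eps)

def flat_feature(global_emb, organ_embs):
    """global_emb: (B, D); organ_embs: (B, S, D) -> (B, D * (1 + S))."""
    parts = [l2_normalize(global_emb)[:, None, :], l2_normalize(organ_embs)]
    stacked = np.concatenate(parts, axis=1)               # (B, 1 + S, D), global slot first
    return stacked.reshape(stacked.shape[0], -1)

# Usage with random stand-ins for the model's embeddings
rng = np.random.default_rng(0)
B, S, D = 2, 102, 576
flat = flat_feature(rng.standard_normal((B, D)), rng.standard_normal((B, S, D)))
print(flat.shape)  # (2, 59328), i.e. 576 * (1 + 102)
```

Normalizing each block separately keeps any single embedding from dominating the concatenation, which is usually what a linear probe wants.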

Model details


Backbone	`MultiModalAtlas` — multi-scale 3D ViT, `dim=192`, heads `6`, stages `[2, 2, 8]`
Patch embed	`6×6×6`, 11 input channels (CT windowing), `merge_ratio = 4³`
Global embedding	576-d
Organ queries	102 slots × 192-d × 3 scales → 576-d
Parameters	~22 M (89 MB `safetensors`)
Input	`(B, 11, 192, 192, 192)` float32
Training data	INSPECT, CT-RATE, Stanford-Abdominal-CT (chest + abdomen CT)
Objectives	Volume–report CLIP + per-organ ParallelOrganCLIP
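Both objectives are CLIP-style contrastive losses. The sketch below implements the standard symmetric InfoNCE loss used by CLIP, plus a hypothetical per-organ variant that applies it slot by slot over the studies whose report has a section for that organ. How ParallelOrganCLIP is actually implemented is not described here; the masking rule and the temperature of 0.07 (CLIP's initial value) are assumptions.

```python
import numpy as np

def log_softmax(x, axis):
    # Numerically stable log-softmax along one axis
    x = x - x.max(axis=axis, keepdims=True)
    return x - np.log(np.exp(x).sum(axis=axis, keepdims=True))

def clip_loss(img_emb, txt_emb, temperature=0.07):
    """Symmetric InfoNCE: row i of img_emb is paired with row i of txt_emb."""
    img = img_emb / np.linalg.norm(img_emb, axis=-1, keepdims=True)
    txt = txt_emb / np.linalg.norm(txt_emb, axis=-1, keepdims=True)
    logits = img @ txt.T / temperature                    # (B, B); diagonal = matching pairs
    diag = np.arange(len(logits))
    loss_i2t = -log_softmax(logits, axis=1)[diag, diag].mean()   # image -> text
    loss_t2i = -log_softmax(logits, axis=0)[diag, diag].mean()   # text -> image
    return (loss_i2t + loss_t2i) / 2

def per_organ_clip_loss(organ_img, organ_txt, mask, temperature=0.07):
    """Hypothetical per-organ variant.

    organ_img, organ_txt: (B, S, D); mask: (B, S) bool, True where the report
    has a section for that organ. Averages clip_loss over slots with at least
    two valid studies, since a contrastive batch needs negatives.
    """
    losses = []
    for s in range(organ_img.shape[1]):
        valid = mask[:, s]
        if valid.sum() >= 2:
            losses.append(clip_loss(organ_img[valid, s], organ_txt[valid, s], temperature))
    return float(np.mean(losses)) if losses else 0.0
```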

The 102 organ-slot names are the alphabetically sorted union of the per-organ report sections across the training datasets; slots 102–199 are unused padding. Methods such as encode_organs expose only the named slots.
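The slot layout described above can be reproduced with a sorted set union. In this sketch the section names are made-up examples, the total of 200 slots is inferred from the padding range 102–199, and build_slot_table is a hypothetical helper.

```python
def build_slot_table(dataset_sections, num_slots=200):
    """dataset_sections: one collection of organ section names per dataset.

    Returns (names, index): names is the alphabetically sorted union and
    index maps each name to its slot. Slots len(names)..num_slots-1 stay
    unnamed padding.
    """
    names = sorted(set().union(*dataset_sections))
    if len(names) > num_slots:
        raise ValueError(f"{len(names)} organ names exceed {num_slots} slots")
    index = {name: i for i, name in enumerate(names)}
    return names, index

# Usage with two hypothetical datasets
names, index = build_slot_table([{"liver", "lungs", "heart"},
                                 {"liver", "pancreas", "spleen"}])
padding_slots = range(len(names), 200)   # unused slots after the named ones
print(names)  # ['heart', 'liver', 'lungs', 'pancreas', 'spleen']
```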

Outputs

model(image) returns a JoliaOutput with:

pooler_output — (B, 576) global embedding,
organ_queries — (B, num_organs, 576), populated when called with output_organ_queries=True.

Intended use & limitations

⚠️ Research preview. Not a medical device; not for clinical use.

Jolia is a feature extractor for downstream radiology tasks (classification, retrieval, per-organ analysis) via linear probing or fine-tuning. It is trained on adult chest/abdominal CT and will not generalize to other modalities or unusual acquisition protocols. It does not produce diagnoses and must not be used for clinical decision-making.

Citation

@misc{raidium_jolia,
  title  = {Jolia: a 3D CT Atlas foundation model with per-organ queries},
  author = {Raidium},
  year   = {2026},
  howpublished = {\url{https://huggingface.co/raidium/Jolia}}
}


Model size: 22.4M params (Safetensors); tensor type: F32