PHerc. Paris 4 — 2-class fiber segmentation, DINO-embedding-guided self-training (step 10000)

Segments fiber vs. background in 3D (2 classes; not an orientation split, see below), directly on micro-CT of PHerc. Paris 4. The model is trained with no fixed ground truth: a pseudo-label is regenerated every step from a frozen teacher UNet's own predictions, adaptively thresholded and refined against a DINO-embedding similarity map.

This is a precursor / component checkpoint, not villa's flagship 4-class fiber/ink model. It is published because its lineage is verified: the module docstring of villa's own scripts/fiber_5class/train.py (PR #985) names this exact checkpoint by its W&B run ID ("ihoo3tpl ckpt") as the frozen fiber teacher loaded by villa's separate, later, 4-class (background / vertical fiber / horizontal-angular fiber / ink) self-distillation trainer. See Relationship to the 4-class model below; we do not currently have that 4-class model's weights to publish.

Model details


Architecture	`vesuvius` `NetworkFromConfig` 3D UNet (`shared_encoder` / `shared_decoder` / `task_heads`, the same family used across villa's segmentation models), single `fibers` head
Output	2 channels, softmax. Channel 0 = background, channel 1 = fiber foreground — confirmed via villa's own `label_generator.py::_to_fg_prob`, the function that loads this exact checkpoint as a frozen teacher (2-channel case: `softmax(logits, dim=1)[:, 1:2]`)
Input	1-channel CT, 256³ patches
This checkpoint	step 10000 · W&B run `ps256_fiber_dinoguided__dino362500__cls3__embbackbone__ddp8__20260522` (`ihoo3tpl`, project `vesuvius_fibers_3d`, state finished)
Weights	`model` (raw) and `ema` (EMA — recommended for inference, decay 0.9995)
Optimisation	SGD + Nesterov (lr 0.005, momentum 0.99, weight_decay 3e-5), cosine LR, 1500-step warmup, bf16 mixed precision, BCE + soft Dice (0.1 label smoothing each), batch size 2 × 8 GPUs (`ddp8`), 12000 total iterations
Trained on	PHerc. Paris 4, 2.4 µm scan (`s3://vesuvius-challenge-open-data/PHercParis4/volumes/20260411134726-2.400um-0.2m-78keV-masked.zarr/`)
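
As a rough illustration of what the `ema` weights represent, the sketch below applies a standard exponential-moving-average update with the decay listed above (0.9995). The update rule is the textbook one and is an assumption about this trainer; the function name `ema_update` and the toy scalar "weights" are hypothetical, and any EMA warmup the real trainer may apply is not modelled.

```python
def ema_update(ema_state, model_state, decay=0.9995):
    """One textbook EMA step: ema <- decay * ema + (1 - decay) * model.

    Both arguments are dicts mapping parameter names to numbers (or arrays).
    This is an assumed form, not a copy of the villa trainer.
    """
    return {
        name: decay * ema_state[name] + (1.0 - decay) * model_state[name]
        for name in ema_state
    }


# Toy example with a single scalar "weight".
ema = {"w": 0.0}
raw = {"w": 1.0}
ema = ema_update(ema, raw)

# A decay of 0.9995 averages over roughly 1 / (1 - decay) recent steps.
horizon_steps = 1.0 / (1.0 - 0.9995)
print(ema["w"], round(horizon_steps))
```

With this decay the average spans on the order of 2000 optimizer steps, which is one reason EMA weights are usually the smoother choice for inference.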

Training procedure: DINO-embedding-guided dynamic pseudo-labeling

Unlike a conventional teacher→student distillation with a fixed label set, this run generates a new pseudo-label every step from:

A frozen self-trained fiber UNet's own probability map — scrollprize/fiber_selftrain_teacher_epoch30 (the same checkpoint also warm-starts this model's own weights, i.e. this is continued self-training, not distillation from an unrelated architecture).
An Otsu-adaptive light/dark voxel threshold (otsu_light_threshold=70, plus otsu_min_light_voxels, otsu_fallback_threshold, otsu_tail_floor_percentile, otsu_min_tail_voxels).
A similarity map between the dense patch embeddings of the supcon-fine-tuned DINO backbone (step 362500) and a single reference "fiber" prototype embedding (avg_fiber_embedding__864d_backbone.npz, bundled in that backbone's repo), computed at stride 128 and blended in with dino_blend_sigma=4.0.

We reconstructed this description from the run's own config field names (dynamic_label.*); the specific script implementing this exact blend was not found in the available villa repository snapshot (unlike the 4-class pipeline discussed below, whose source we did read directly), so treat the precise algorithmic combination as a well-supported inference, not a verbatim account of the code.
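
Two of the ingredients named above are standard techniques and can be sketched independently of the unknown blend. The block below is a minimal NumPy illustration, not the run's code: `cosine_similarity_map` and `otsu_threshold` are hypothetical helper names, cosine similarity is an assumed choice of similarity measure, the embedding dimension and data are toy values, and the way the run actually combines these with the teacher's probability map is deliberately left out because we did not find that script.

```python
import numpy as np


def cosine_similarity_map(embeddings, prototype, eps=1e-8):
    """Cosine similarity of every voxel embedding to one prototype.

    embeddings: (D, Z, Y, X) dense features; prototype: (D,).
    Returns a (Z, Y, X) map with values in [-1, 1].
    """
    emb_unit = embeddings / (np.linalg.norm(embeddings, axis=0, keepdims=True) + eps)
    proto_unit = prototype / (np.linalg.norm(prototype) + eps)
    return np.tensordot(proto_unit, emb_unit, axes=([0], [0]))


def otsu_threshold(values, bins=256):
    """Classic Otsu: pick the histogram split maximising between-class variance."""
    hist, edges = np.histogram(np.ravel(values), bins=bins)
    centers = 0.5 * (edges[:-1] + edges[1:])
    w0 = np.cumsum(hist)                      # voxels at or below each bin
    w1 = w0[-1] - w0                          # voxels above each bin
    m0 = np.cumsum(hist * centers)
    mu0 = m0 / np.maximum(w0, 1)              # mean of the lower class
    mu1 = (m0[-1] - m0) / np.maximum(w1, 1)   # mean of the upper class
    between = w0 * w1 * (mu0 - mu1) ** 2
    return centers[np.argmax(between)]


# Toy data: 16-d embeddings on a 2x2x2 grid, one voxel aligned with the prototype.
rng = np.random.default_rng(0)
proto = rng.normal(size=16)
emb = rng.normal(size=(16, 2, 2, 2))
emb[:, 0, 0, 0] = 3.0 * proto
sim = cosine_similarity_map(emb, proto)

# Toy bimodal intensities: a "dark" and a "light" cluster.
intensities = np.concatenate([np.linspace(0.1, 0.3, 500), np.linspace(0.7, 0.9, 500)])
cut = otsu_threshold(intensities)
print(sim.shape, float(sim[0, 0, 0]), float(cut))
```

The extra config fields (otsu_min_light_voxels, otsu_fallback_threshold, and so on) suggest the run guards the Otsu step against degenerate crops; those guards are not reproduced here.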

Checkpoint provenance: a mid-run bugfix

This run's own config records resume_from_ckpt pointing at its own ckpt_002000.pth, under the same W&B run ID (ihoo3tpl) — confirming training paused at step 2000 and resumed later in the same run (verified both via the W&B API and by loading this checkpoint's own embedded config directly, which lists wandb_run_id: ihoo3tpl and the matching resume_from_ckpt path). This lines up exactly with the two checkpoints produced:

step_002000__resume_point__pre_dark_mask_fix.pth — not published here (superseded).
step_010000__post_dark_mask_fix__latest.pth — this repo.

The filenames describe this as a fix to "dark voxel masking." We can confirm the resume-at-step-2000 mechanics precisely (embedded config, matching filenames, same run ID), but we cannot independently confirm the exact nature of the underlying bug or fix. The post-fix config does contain plausible candidate parameters related to dark/light voxel handling (input_mask_threshold, otsu_light_threshold, the dataset's dark_threshold); however, we have only the post-fix config, not a diff against the pre-fix run.
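
The resume check described above can be repeated against config.json (or the checkpoint's embedded config) with a few lines of Python. This is a sketch under assumptions: the field names wandb_run_id and resume_from_ckpt come from the text above, but where they sit inside the config is not known to us, so the hypothetical helper `find_key` searches the whole nested structure. The example config and its path are made up for illustration.

```python
def find_key(obj, key):
    """Depth-first search for `key` in nested dicts/lists; first hit or None."""
    if isinstance(obj, dict):
        if key in obj:
            return obj[key]
        children = list(obj.values())
    elif isinstance(obj, (list, tuple)):
        children = list(obj)
    else:
        return None
    for child in children:
        found = find_key(child, key)
        if found is not None:
            return found
    return None


def check_resume_provenance(config, run_id="ihoo3tpl", resume_name="ckpt_002000.pth"):
    """Report whether the config names the expected run and resume checkpoint."""
    resume_path = str(find_key(config, "resume_from_ckpt") or "")
    return {
        "run_id_matches": find_key(config, "wandb_run_id") == run_id,
        "resumed_from_step_2000": resume_path.endswith(resume_name),
    }


# Made-up stand-in for json.load(open("config.json")); the path is hypothetical.
example_config = {
    "wandb_run_id": "ihoo3tpl",
    "training": {"resume_from_ckpt": "/some/out_dir/ckpt_002000.pth"},
}
report = check_resume_provenance(example_config)
print(report)
```

Both flags being true reproduces only the mechanics of the resume; it says nothing about what the fix changed.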

Metrics

This run has no held-out validation metric: its W&B summary contains no val_* key at all, because the entire pipeline is label-free self-training and there is no independent ground truth to validate against. The values below are the final ones logged at step 11999 (of a 12000-step schedule; run marked finished), i.e. about 2000 steps after the step-10000 checkpoint published here:

metric	value
`loss` (bce + dice)	0.7824
`loss_bce`	0.3318
`loss_dice`	0.4506
`pseudo_fg_frac` (dynamic label's foreground fraction)	0.132
`sim_mean` (mean DINO-embedding similarity to the reference)	0.432
`otsu_threshold` (final adaptive cut)	0.503
`lr`	≈1.1×10⁻¹⁰ (cosine-annealed to ~0)

These are training-loop / pseudo-label-consistency statistics, not accuracy against independently verified ground truth.
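
Two of the logged values can be sanity-checked from the hyperparameters in Model details. The sketch below assumes a linear warmup followed by a cosine decay to zero measured over the post-warmup steps; that exact schedule form is our assumption (the card only states "cosine LR, 1500-step warmup"), and `lr_at` is a hypothetical helper name. Under that assumption the learning rate at step 11999 comes out at roughly 1.1e-10, in line with the logged `lr`, and the two loss terms add up to the logged total.

```python
import math

BASE_LR = 0.005       # from the Optimisation row
WARMUP_STEPS = 1500   # from the Optimisation row
TOTAL_STEPS = 12000   # from the Optimisation row


def lr_at(step):
    """Assumed schedule: linear warmup, then cosine decay to 0."""
    if step < WARMUP_STEPS:
        return BASE_LR * (step + 1) / WARMUP_STEPS
    progress = (step - WARMUP_STEPS) / (TOTAL_STEPS - WARMUP_STEPS)
    return BASE_LR * 0.5 * (1.0 + math.cos(math.pi * progress))


final_lr = lr_at(11999)
total_loss = 0.3318 + 0.4506  # loss_bce + loss_dice from the table

print(f"lr at step 11999: {final_lr:.2e}")
print(f"bce + dice: {total_loss:.4f}")
```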

Relationship to the 4-class fiber/ink model — please read before assuming this is that model

Villa's actual finished 4-class (background / vertical fiber / horizontal-angular fiber / ink) self-distillation model — matching scripts/fiber_5class exactly (watershed-from-minima + per-instance PCA orientation split + ink-teacher override + dark-voxel guard) — is a separate checkpoint: W&B run p4_4class_ddp8_20260526 (36pykwky, project paris4-full-features, state finished, created 2026-05-26 — four days after this run). Its config confirms it uses this checkpoint's lineage as its frozen fiber teacher, plus a separate frozen ink teacher that we do not have. We do not currently have that 4-class model's weight files: its training out_dir was an ephemeral cloud-instance scratch path (/ephemeral/fiber_5class_ckpts/p4_4class_ddp8_20260526), not S3, and a scoped search of s3://philodemos/giorgio/PHercParis4/ turned up only DINO-backbone checkpoints and what appears to be stitched inference output (not model weights). It is not published on HuggingFace at this time.

For reference, that run's self-consistency metrics (student vs. its own pseudo-label on the training crop — again, not held-out validation): dice_0_bg=0.961, dice_1_vert_fiber=0.673, dice_2_horiz_fiber=0.705, dice_3_ink=0.791, dice_fg_mean=0.723.
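
The reported dice_fg_mean is consistent with an unweighted mean over the three non-background classes. That reading is our inference from the numbers, not something taken from the trainer's source; the short check below shows the arithmetic.

```python
# Per-class self-consistency Dice values quoted above (background excluded).
foreground_dice = {
    "dice_1_vert_fiber": 0.673,
    "dice_2_horiz_fiber": 0.705,
    "dice_3_ink": 0.791,
}

# Unweighted mean over the foreground classes.
fg_mean = sum(foreground_dice.values()) / len(foreground_dice)
print(round(fg_mean, 3))
```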

Prior / sibling work

An earlier, independently trained, supervised 2-class horizontal/vertical fiber model (labels traced from WebKnossos skeleton annotations via cross-frame affine registration, villa PR #825) is already published as scrollprize/fiber_hz_vt (W&B run xnjpitfg, project fibers, val_hzvt_mean_dice=0.60, val_hzvt_mean_iou=0.52). That model predicts horizontal-vs-vertical orientation from real annotations; this model predicts fiber-vs-background from self-generated pseudo-labels. The two are not directly comparable.

Files

File	Size	Role
`step_010000__post_dark_mask_fix__latest.pth`	~2.1 GB	`model` (raw) + `ema.model_state` (recommended for inference) + `optimizer` + embedded `config`.
`config.json`	—	The same training config embedded in the checkpoint, for quick inspection without loading the full file.

Usage

import torch
from huggingface_hub import hf_hub_download

path = hf_hub_download(
    "scrollprize/fiber_dinoguided_2class_step010000",
    "step_010000__post_dark_mask_fix__latest.pth",
)
ckpt = torch.load(path, map_location="cpu", weights_only=False)
state = ckpt["ema"]["model_state"]   # recommended over ckpt["model"]
# Build with vesuvius' NetworkFromConfig (target "fibers", out_channels=2,
# in_channels=1, patch_size 256^3) then load_state_dict(state).

The vesuvius package is in https://github.com/ScrollPrize/villa.
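
Once a network has been built and run, the fiber probability map follows from the channel convention in Model details (channel 0 = background, channel 1 = fiber). The sketch below is a NumPy stand-in for the torch expression `softmax(logits, dim=1)[:, 1:2]` quoted there; `fiber_probability` is a hypothetical helper name, the random logits are placeholders for real network output, and the 0.5 cut used for the mask is an illustrative choice rather than a recommended operating point.

```python
import numpy as np


def fiber_probability(logits):
    """Softmax over the channel axis, keeping only channel 1 (fiber).

    logits: (N, 2, D, H, W) raw network output.
    Returns (N, 1, D, H, W) foreground probabilities.
    """
    shifted = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    exp = np.exp(shifted)
    probs = exp / exp.sum(axis=1, keepdims=True)
    return probs[:, 1:2]


# Placeholder logits for a tiny 8^3 crop; a real patch would be 256^3.
rng = np.random.default_rng(0)
dummy_logits = rng.normal(size=(1, 2, 8, 8, 8)).astype(np.float32)

fg_prob = fiber_probability(dummy_logits)
fiber_mask = fg_prob > 0.5  # illustrative threshold only
print(fg_prob.shape, float(fg_prob.min()), float(fg_prob.max()))
```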

Related models

Frozen fiber teacher / weight init for this run: scrollprize/fiber_selftrain_teacher_epoch30
Frozen DINO backbone used for guidance: scrollprize/dinovol_v2_ps8_supcon3class_step362500
Prior supervised hz/vt model: scrollprize/fiber_hz_vt

License

MIT.
