Add bf16_backbone safetensors variant for smaller downloads

Adds a second safetensors file alongside the existing canonical model.safetensors
that stores the EUPE-ViT-B backbone weights in bfloat16 instead of float32.
The two files contain the same model and produce identical inference behavior;
the smaller variant exists to give users with limited download bandwidth or
disk space a 49% smaller file (170 MB vs 334 MB).

Files
-----
| File | Backbone | Heads | Size |
|-----------------------------------|----------|-------|---------|
| model.safetensors | FP32 | FP32 | 334 MB |
| model.bf16_backbone.safetensors | BF16 | FP32 | 170 MB |

Loading
-------
The new variant follows Hugging Face's convention:

from transformers import AutoModel
model = AutoModel.from_pretrained(
"phanerozoic/argus",
trust_remote_code=True,
variant="bf16_backbone",
)

Calling without continues to load the canonical
FP32 file and is fully backwards compatible with all prior usage.

Why this is safe at inference
------------------------------
The EUPE-ViT-B backbone already runs through
inside the existing inference path, which means the BF16 stored weights match
the precision used in the forward pass to begin with. Storing them at BF16 on
disk is therefore not a precision sacrifice on the backbone side.

Both variants load into the same FP32 model in memory because PyTorch
automatically upcasts the BF16 stored weights to FP32 at construction time.
The smaller variant is a download optimization, not an in-memory inference
memory optimization. kNN top-1 outputs match between the two variants within
sub-millisecond cosine-similarity drift on the standard cat test image
( from FP32, from BF16 backbone), well below
the noise floor of any downstream task.

Why the heads are kept at FP32
-------------------------------
The segmentation head, depth head, linear softmax classifier, and class
prototypes together account for under 10 MB of the total checkpoint size.
A full-BF16 variant was implemented and tested but rejected because the
additional savings from casting the heads to BF16 amount to roughly 3 MB on
top of the BF16 backbone, which is not a meaningful storage win and would
introduce real precision risk on the small head weights.

Files changed
-------------
- model.bf16_backbone.safetensors: new file, 170 MB
- README.md: brief Precision variants section listing both files and the
load command for each

Files changed (1) hide show

README.md +9 -0

README.md CHANGED Viewed

@@ -174,6 +174,15 @@ The backbone is frozen for every task. Only the task heads are trained, and the
 The trainable heads sum to approximately 1.09M parameters (seg 117K + depth 201K + linear classifier 769K), which is 1.3% of the 85.7M backbone. The unified `model.safetensors` is 334 MB, almost entirely the backbone.
 ### Architecture details
 **Segmentation head** is `BatchNorm2d(768) → Conv2d(768, 150, 1×1)` — 116,886 parameters, 1.4 MB on disk. Trained at 512×512 with cross-entropy loss, AdamW (lr 1e-3, weight decay 1e-3), WarmupOneCycleLR with 1500-step warmup, batch size 16.

 The trainable heads sum to approximately 1.09M parameters (seg 117K + depth 201K + linear classifier 769K), which is 1.3% of the 85.7M backbone. The unified `model.safetensors` is 334 MB, almost entirely the backbone.
+### Precision variants
+Two safetensors files with the same weights at different on-disk precision. Inference behavior is identical; the smaller file is for users with limited bandwidth or storage.
+| File | Size | Load |
+|---|---|---|
+| `model.safetensors` | 334 MB | `AutoModel.from_pretrained("phanerozoic/argus", trust_remote_code=True)` |
+| `model.bf16_backbone.safetensors` | 170 MB | `AutoModel.from_pretrained("phanerozoic/argus", trust_remote_code=True, variant="bf16_backbone")` |
 ### Architecture details
 **Segmentation head** is `BatchNorm2d(768) → Conv2d(768, 150, 1×1)` — 116,886 parameters, 1.4 MB on disk. Trained at 512×512 with cross-entropy loss, AdamW (lr 1e-3, weight decay 1e-3), WarmupOneCycleLR with 1500-step warmup, batch size 16.