This is an ONNX export and quantisation of google/siglip2-base-patch16-224, not an original model.
All credit for the original model goes to its authors. This repo exists solely to host pre-exported ONNX variants for use by Scene Atlas, a tabletop RPG scene management tool.
| Encoder | Quantisation | Size |
|---|---|---|
| Vision | fp16 | 177.4 MB |
| Text | int8 | 290.6 MB |
| Total | | 468.0 MB |
| Parameter | Value |
|---|---|
| model_family | siglip2 |
| embedding_dim | 768 |
| image_size | 224 |
| image_mean | 0.5000, 0.5000, 0.5000 |
| image_std | 0.5000, 0.5000, 0.5000 |
| interpolation | bilinear |
| resize_mode | direct_resize |
| tokenizer_type | sentencepiece |
| tokenizer_max_length | 64 |
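The preprocessing parameters above can be sketched as follows. This is a minimal illustration rather than Scene Atlas's actual pipeline: the function names are hypothetical, and the channel-first `1x3x224x224` layout and `float32` output are assumptions that should be checked against the exported model's declared inputs. `direct_resize` is read here as resizing straight to 224x224 without cropping or preserving aspect ratio.

```python
import numpy as np

IMAGE_SIZE = 224
IMAGE_MEAN = np.array([0.5, 0.5, 0.5], dtype=np.float32)
IMAGE_STD = np.array([0.5, 0.5, 0.5], dtype=np.float32)


def normalise(pixels: np.ndarray) -> np.ndarray:
    """Convert an HxWx3 uint8 RGB array to a 1x3xHxW float32 array."""
    scaled = pixels.astype(np.float32) / 255.0   # map 0..255 to 0..1
    normed = (scaled - IMAGE_MEAN) / IMAGE_STD   # map 0..1 to -1..1
    # Assumption: the encoder expects a batched, channel-first layout.
    return normed.transpose(2, 0, 1)[np.newaxis, ...]


def preprocess(path: str) -> np.ndarray:
    """Load an image, resize it directly to 224x224 (bilinear), and normalise."""
    # Pillow is imported lazily so normalise() can be used without it.
    from PIL import Image

    with Image.open(path) as img:
        resized = img.convert("RGB").resize(
            (IMAGE_SIZE, IMAGE_SIZE), Image.Resampling.BILINEAR
        )
    return normalise(np.asarray(resized))
```

With a mean and standard deviation of 0.5, a pixel value of 0 maps to -1 and 255 maps to 1.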
These models are intended for use with Scene Atlas. The repo contains `clip_vision_encoder.onnx`, `clip_text_encoder.onnx`, `manifest.json`, and a `tokenizer/` directory, all at the repo root.
```python
from huggingface_hub import hf_hub_download

# Download the vision encoder; repeat with the other filenames as needed
vision_path = hf_hub_download(
    repo_id="jennis0/scene-atlas-medium",
    filename="clip_vision_encoder.onnx",
)
```
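For use outside Scene Atlas, a sketch along these lines could fetch both encoders and the manifest and report what each model expects. The helper names are hypothetical, and using `onnxruntime` is an assumption. This card does not document the encoders' input names or dtypes, nor the schema of `manifest.json`, so the code inspects them instead of hard-coding them.

```python
import json

REPO_ID = "jennis0/scene-atlas-medium"
ENCODER_FILES = ("clip_vision_encoder.onnx", "clip_text_encoder.onnx")


def describe_inputs(model_path: str) -> list:
    """Return (name, shape, dtype) for each declared input of an ONNX model."""
    import onnxruntime as ort  # lazy import: optional dependency

    session = ort.InferenceSession(model_path, providers=["CPUExecutionProvider"])
    return [(inp.name, inp.shape, inp.type) for inp in session.get_inputs()]


def fetch_and_describe() -> dict:
    """Download both encoders and the manifest, then report their contents."""
    from huggingface_hub import hf_hub_download  # lazy import: optional dependency

    report = {}
    for filename in ENCODER_FILES:
        local_path = hf_hub_download(repo_id=REPO_ID, filename=filename)
        report[filename] = describe_inputs(local_path)

    manifest_path = hf_hub_download(repo_id=REPO_ID, filename="manifest.json")
    with open(manifest_path, encoding="utf-8") as fh:
        report["manifest.json"] = json.load(fh)  # schema is not assumed here
    return report
```

Calling `fetch_and_describe()` needs network access on first use; `hf_hub_download` caches the files locally after that.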
Please refer to the original model card for licensing information.