Preprocessing Specification

Image (visual.onnx)

  • Input shape: [N, 3, 336, 336] (NCHW, batch first)
  • Input dtype: float32
  • Channel order: RGB
  • Resolution: 336×336, reached without distortion: resize preserving the aspect ratio so the image fills the 336×336 frame, then center crop the overflow
  • Normalization: per channel, (pixel / 255 - mean) / std, using:

    Channel   Mean         Std
    R         0.48145466   0.26862954
    G         0.4578275    0.26130258
    B         0.40821073   0.27577711
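A minimal NumPy sketch of the image steps, assuming the image has already been decoded to an H×W×3 uint8 RGB array. The helper names `fill_size`, `center_crop_box`, and `to_tensor` are hypothetical, and the "scale the shorter side to 336, then center crop" reading of the resolution rule is an assumption. The actual resampling is left to your image library because the interpolation filter is not specified here.

```python
import numpy as np

# Per-channel RGB statistics from the table above.
MEAN = np.array([0.48145466, 0.4578275, 0.40821073], dtype=np.float32)
STD = np.array([0.26862954, 0.26130258, 0.27577711], dtype=np.float32)
SIZE = 336


def fill_size(width, height, size=SIZE):
    """Aspect-preserving (width, height) whose shorter side equals `size`.

    Assumption: "resize without distortion to fill" means scaling the
    shorter side to `size`, so the image covers the whole square frame.
    """
    scale = size / min(width, height)
    # max() guards against float rounding leaving a side at size - 1.
    return max(size, round(width * scale)), max(size, round(height * scale))


def center_crop_box(width, height, size=SIZE):
    """(left, top, right, bottom) of a centered size x size window."""
    left = (width - size) // 2
    top = (height - size) // 2
    return left, top, left + size, top + size


def to_tensor(rgb):
    """Cropped HWC uint8 RGB array (336, 336, 3) -> float32 [1, 3, 336, 336]."""
    if rgb.shape != (SIZE, SIZE, 3):
        raise ValueError(f"expected ({SIZE}, {SIZE}, 3), got {rgb.shape}")
    x = rgb.astype(np.float32) / 255.0
    x = (x - MEAN) / STD  # broadcasts over the last (channel) axis
    # HWC -> CHW, add the batch axis, and make the buffer contiguous.
    return np.ascontiguousarray(x.transpose(2, 0, 1)[np.newaxis])
```

With Pillow, for example, one could resize with `img.resize(fill_size(*img.size))`, crop the result with `.crop(center_crop_box(*resized.size))`, and pass `np.asarray(...)` of the crop to `to_tensor`. Several outputs can be stacked with `np.concatenate(..., axis=0)` to form a batch of N.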

Text (textual.onnx)

  • Input shape: [N, 77]
  • Input dtype: int64
  • Lowercase input text: yes
  • Sequence: [BOS] + token_ids + [EOS], right-padded with 0 (pad) to length 77
  • Special IDs: pad=0, unk=1, bos=2, eos=3
  • Tokenizer: tokenizer.json or bpe.model (YouTokenToMe)
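A minimal sketch of packing token IDs into the [N, 77] int64 input. The BPE step itself is abstracted as a hypothetical `encode` callable (for example a wrapper around the YouTokenToMe model in bpe.model) that returns IDs without BOS/EOS. How over-long inputs are handled is not stated in this specification; truncating while always keeping EOS is an assumption made here.

```python
import numpy as np

CONTEXT_LENGTH = 77
PAD_ID, UNK_ID, BOS_ID, EOS_ID = 0, 1, 2, 3


def pack_ids(token_ids, context_length=CONTEXT_LENGTH):
    """[BOS] + token_ids + [EOS], right-padded with PAD_ID to context_length.

    Assumption: inputs that are too long are truncated so BOS and EOS
    both fit; the specification does not define this case.
    """
    body = list(token_ids)[: context_length - 2]
    seq = [BOS_ID] + body + [EOS_ID]
    return seq + [PAD_ID] * (context_length - len(seq))


def encode_batch(texts, encode):
    """List of strings -> int64 array of shape [N, 77].

    `encode` is a hypothetical callable: lowercased string -> list of
    BPE token IDs, without BOS/EOS.
    """
    rows = [pack_ids(encode(text.lower())) for text in texts]
    # reshape keeps the [N, 77] shape even when texts is empty.
    return np.array(rows, dtype=np.int64).reshape(len(rows), CONTEXT_LENGTH)
```

Lowercasing happens before tokenization so that the same text always maps to the same IDs, and the special IDs are added here rather than inside `encode` so the padding logic stays independent of the tokenizer backend.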