Preprocessing Specification
Image (visual.onnx)
- Input shape: `[N, 3, 336, 336]` (NCHW, batch first)
- Input dtype: float32
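- Channel order: RGB
- Resolution: 336×336. Do not stretch the image: either center-crop, or resize with the aspect ratio preserved until the image fills 336×336 and then center-crop the overflow.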
- Normalization: per-channel, `(pixel / 255 - mean) / std`, using the values below:
| Channel | mean | std |
|---|---|---|
| R | 0.48145466 | 0.26862954 |
| G | 0.4578275 | 0.26130258 |
| B | 0.40821073 | 0.27577711 |
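A minimal NumPy sketch of the image steps above, assuming the caller has already decoded each image to an H×W×3 uint8 RGB array. The helper names `resize_then_crop_box` and `normalize_batch` are hypothetical, and the shorter-side resize followed by a center crop is one assumed way to satisfy the no-distortion rule, not a documented requirement of `visual.onnx`.

```python
import numpy as np

# Per-channel statistics from the table, in R, G, B order.
MEAN = np.array([0.48145466, 0.4578275, 0.40821073], dtype=np.float32)
STD = np.array([0.26862954, 0.26130258, 0.27577711], dtype=np.float32)
SIZE = 336


def resize_then_crop_box(width: int, height: int, size: int = SIZE):
    """Return (new_w, new_h, left, top) for an aspect-preserving resize that
    makes the shorter side equal `size`, followed by a centered size x size crop.

    Assumption: this resize-then-center-crop policy is one possible reading
    of "without distortion to fill"; the spec does not mandate it.
    """
    scale = size / min(width, height)
    new_w = max(size, round(width * scale))
    new_h = max(size, round(height * scale))
    left = (new_w - size) // 2
    top = (new_h - size) // 2
    return new_w, new_h, left, top


def normalize_batch(images: list) -> np.ndarray:
    """Turn a list of 336x336x3 uint8 RGB arrays into a float32 NCHW batch
    normalized as (pixel / 255 - mean) / std."""
    batch = np.stack(images)  # shape: N, H, W, 3
    if batch.shape[1:] != (SIZE, SIZE, 3):
        raise ValueError(f"expected {SIZE}x{SIZE} RGB images, got {batch.shape[1:]}")
    batch = batch.astype(np.float32) / 255.0
    batch = (batch - MEAN) / STD  # broadcasts over the trailing channel axis
    # Reorder to N, 3, H, W (batch first, channels before spatial dims).
    return np.ascontiguousarray(batch.transpose(0, 3, 1, 2))
```

With Pillow, the returned box values could be passed to `Image.resize((new_w, new_h))` and then `Image.crop((left, top, left + 336, top + 336))`. The spec does not name an interpolation filter, so that choice is left to the caller.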
Text (textual.onnx)
- Input shape: `[N, 77]`
- Input dtype: int64
- Lowercase: yes
- Sequence: `[BOS] + token_ids + [EOS]`, padded with 0 to length 77
- Special IDs: pad=0, unk=1, bos=2, eos=3
- Tokenizer: `tokenizer.json` or `bpe.model` (YouTokenToMe)
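A minimal sketch of the text sequence layout, assuming a hypothetical `tokenize` callable that returns raw BPE ids without special tokens (for example, a wrapper around YouTokenToMe's `BPE.encode([...], output_type=yttm.OutputType.ID)`, whose `bos`/`eos` flags are off by default). The spec does not say what happens when a text exceeds 75 tokens, so this sketch raises an error instead of guessing a truncation policy.

```python
from typing import Callable, List

import numpy as np

CONTEXT_LENGTH = 77
PAD_ID, UNK_ID, BOS_ID, EOS_ID = 0, 1, 2, 3


def build_sequence(token_ids: List[int], length: int = CONTEXT_LENGTH) -> List[int]:
    """Build [BOS] + token_ids + [EOS], then pad with PAD_ID up to `length`.

    Assumption: over-long inputs are rejected, because the spec does not
    state whether or how to truncate them.
    """
    seq = [BOS_ID] + list(token_ids) + [EOS_ID]
    if len(seq) > length:
        raise ValueError(f"{len(token_ids)} tokens do not fit in {length} with BOS/EOS")
    return seq + [PAD_ID] * (length - len(seq))


def encode_batch(texts: List[str], tokenize: Callable[[str], List[int]]) -> np.ndarray:
    """Lowercase each text, tokenize it with the caller-supplied (hypothetical)
    `tokenize` function, and stack the rows into an int64 [N, 77] array."""
    rows = [build_sequence(tokenize(text.lower())) for text in texts]
    return np.array(rows, dtype=np.int64)
```

Keeping `tokenize` as a parameter lets the same layout code sit in front of either tokenizer file, as long as the wrapper does not add BOS/EOS itself.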