ONNX export of OleehyO/TexTeller, an image-to-LaTeX model based on VisionEncoderDecoderModel.
This export is tuned for:
decoder_with_past_model.onnx with dynamic batch
Main files:
encoder_model.onnx
  pixel_values of shape [batch_size, 1, 448, 448]
decoder_model.onnx
decoder_with_past_model.onnx
  input_ids: [batch_size, decoder_sequence_length]
  encoder_hidden_states: [batch_size, encoder_sequence_length, 768]
  past_key_values.N.decoder.{key,value}: [batch_size, 16, past_decoder_sequence_length, 64]
  past_key_values.N.encoder.{key,value}: [batch_size, 16, encoder_sequence_length, 64]
Supporting files:
config.json – model config
tokenizer.json / tokenizer_config.json – tokenizer
preprocessor_config.json – image preprocessing config
preprocessor_config.json:
{
  "do_resize": true,
  "size": { "height": 448, "width": 448 },
  "resample": 3,
  "do_normalize": true,
  "image_mean": [0.9545467],
  "image_std": [0.15394445],
  "do_convert_rgb": false,
  "num_channels": 1,
  "feature_extractor_type": "ViTFeatureExtractor"
}
Important:
Input must be grayscale (num_channels = 1)
Resize to 448 × 448
Normalize (per channel):
x = (x / 255.0 - 0.9545467) / 0.15394445
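If you preprocess by hand instead, the normalization step can be sketched as follows. This is a minimal illustration, not code from this repo: `normalizeGray` and `rgbaToGray` are hypothetical helper names, the luma weights in `rgbaToGray` are an assumption (the common ITU-R BT.601 weights), and the input is assumed to be already resized to 448 × 448.

```javascript
// Constants copied from preprocessor_config.json.
const IMAGE_MEAN = 0.9545467;
const IMAGE_STD = 0.15394445;

// Hypothetical helper: collapse RGBA bytes (e.g. from canvas ImageData)
// to one gray byte per pixel. The BT.601 weights are an assumption;
// the alpha channel is ignored.
function rgbaToGray(rgba) {
  const gray = new Uint8ClampedArray(rgba.length / 4);
  for (let i = 0; i < gray.length; i++) {
    const r = rgba[4 * i];
    const g = rgba[4 * i + 1];
    const b = rgba[4 * i + 2];
    gray[i] = 0.299 * r + 0.587 * g + 0.114 * b;
  }
  return gray;
}

// Hypothetical helper: apply x = (x / 255.0 - mean) / std to gray bytes.
// For a 448 x 448 image the result holds 448 * 448 floats, which matches
// one [1, 448, 448] slice of pixel_values.
function normalizeGray(gray) {
  const out = new Float32Array(gray.length);
  for (let i = 0; i < gray.length; i++) {
    out[i] = (gray[i] / 255.0 - IMAGE_MEAN) / IMAGE_STD;
  }
  return out;
}
```

Because the mean is close to 1, a white background maps to a small positive value and dark ink maps to a large negative value.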
If you use AutoProcessor / ImageProcessor in transformers / transformers.js with this repo, it will apply these settings automatically.
import { pipeline } from "@huggingface/transformers";

// Replace with this repo id
const MODEL_ID = "your-username/texteller-onnx";

const captioner = await pipeline("image-to-text", MODEL_ID, {
  device: "webgpu", // or "wasm"
  dtype: "fp16", // good default for WebGPU
});

// Any image source supported by transformers.js: URL, HTMLImageElement, etc.
const outputs = await captioner("path-or-url-to-image.png", {
  max_new_tokens: 128,
});
console.log(outputs[0]?.generated_text);
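If the pipeline defaults are too limiting, a custom decoding loop is an option. The following is a minimal sketch of the bookkeeping for one beam-search step, under stated assumptions: `expandBeams` and `reorderCache` are hypothetical helpers (not part of transformers.js or this repo), per-beam log-probabilities are assumed to be already computed from the decoder logits, and each cache tensor is assumed to be a flat row-major Float32Array with dims [batch_size, 16, sequence_length, 64].

```javascript
// Hypothetical helper: keep the `beamWidth` best continuations overall.
// beams:    [{ tokens: number[], score: number }], score = summed log-prob
// logProbs: one array of per-token log-probabilities for each beam
function expandBeams(beams, logProbs, beamWidth) {
  const candidates = [];
  for (let beamIndex = 0; beamIndex < beams.length; beamIndex++) {
    const row = logProbs[beamIndex];
    for (let tokenId = 0; tokenId < row.length; tokenId++) {
      candidates.push({
        beamIndex, // which existing beam (and cache row) this continues
        tokenId,
        score: beams[beamIndex].score + row[tokenId],
      });
    }
  }
  candidates.sort((a, b) => b.score - a.score);
  return candidates.slice(0, beamWidth).map(({ beamIndex, tokenId, score }) => ({
    beamIndex,
    tokens: [...beams[beamIndex].tokens, tokenId],
    score,
  }));
}

// Hypothetical helper: copy cache rows so that row i of the result is the
// cache of the beam that candidate i continues.
// data: flat Float32Array, dims: [batch, heads, seqLen, headDim]
function reorderCache(data, dims, beamIndices) {
  const [, heads, seqLen, headDim] = dims;
  const rowSize = heads * seqLen * headDim;
  const out = new Float32Array(beamIndices.length * rowSize);
  beamIndices.forEach((src, dst) => {
    out.set(data.subarray(src * rowSize, (src + 1) * rowSize), dst * rowSize);
  });
  return { data: out, dims: [beamIndices.length, heads, seqLen, headDim] };
}
```

After each decoder call, the `beamIndex` values returned by `expandBeams` would be passed to `reorderCache` for every past_key_values tensor, so the cache rows stay aligned with the surviving beams. With dtype "fp16" the cache would not be a Float32Array, so the copy would need a matching typed array.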
The export includes decoder_with_past_model.onnx with dynamic batch, so you can implement your own batched, KV-cached beam search on top of model.forward and past_key_values.
Otherwise, pipeline("image-to-text", ...) as shown above is enough.
Base model
OleehyO/TexTeller