CoEdIT-Large ONNX (INT8 Quantized)

ONNX export of grammarly/coedit-large (770M params, flan-t5-large) optimized for @huggingface/transformers v3+.

Includes both FP32 and INT8 quantized versions. The INT8 version totals ~780 MB and runs in the browser via WASM or WebGPU.

Usage

import { pipeline } from '@huggingface/transformers';

const pipe = await pipeline('text2text-generation', 'rabden/coedit-large-onnx', {
  dtype: 'q8', // load the INT8 quantized weights (in v3, `dtype` replaces the older `quantized` flag)
});

const result = await pipe(
  'Fix grammatical errors in this sentence: ' +
  'The protocol utilize a novel encryption scheme that ensure data integrity across multiple node.',
  {
    max_new_tokens: 64,
  }
);

console.log(result[0].generated_text);
// "The protocol utilizes a novel encryption scheme that ensures data integrity across multiple nodes."
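To correct several sentences with the same instruction, the prefix used in the example above can be factored into a small helper. This is only a sketch: `buildPrompts` is a hypothetical name, not part of the library, and the commented-out call assumes that `pipe` accepts an array of prompts, as text2text pipelines in transformers.js generally do.

```javascript
// Hypothetical helper (not part of @huggingface/transformers):
// prepend one instruction to each input sentence.
function buildPrompts(instruction, sentences) {
  // Normalize to exactly one ": " between instruction and sentence.
  const prefix = instruction.trim().replace(/:$/, '') + ': ';
  return sentences
    .map((s) => s.trim())
    .filter((s) => s.length > 0) // skip empty inputs
    .map((s) => prefix + s);
}

const prompts = buildPrompts('Fix grammatical errors in this sentence', [
  'The protocol utilize a novel encryption scheme.',
  '  It ensure data integrity across multiple node. ',
]);

// Assumed usage with a pipeline created as shown above:
// const results = await pipe(prompts, { max_new_tokens: 64 });
```

Keeping the instruction text in one place makes it harder for individual calls to drift from the prefix shown in the example.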

Generation Config

The model has repetition_penalty: 1.5 baked in by default to prevent repeated output. You can override it:

const result = await pipe(text, {
  max_new_tokens: 64,
  repetition_penalty: 1.0, // 1.0 applies no penalty
});

Files

File                                      Size      Description
onnx/encoder_model_quantized.onnx         326 MB    INT8 quantized encoder
onnx/decoder_model_merged_quantized.onnx  454 MB    INT8 quantized decoder (with lm_head)
onnx/encoder_model.onnx                   1302 MB   FP32 encoder
onnx/decoder_model_merged.onnx            1812 MB   FP32 decoder (with lm_head)
config.json                               -         T5 config
generation_config.json                    -         Generation parameters
tokenizer.json / spiece.model             -         T5 tokenizer
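As a quick check on download size, the ONNX file sizes listed above can be summed per precision. The sketch below copies the sizes from the list. The `variant` labels and the `totalSizeMB` helper are illustrative names, not fields of the repository.

```javascript
// Sizes in MB, copied from the file list above.
const files = [
  { name: 'onnx/encoder_model_quantized.onnx', sizeMB: 326, variant: 'int8' },
  { name: 'onnx/decoder_model_merged_quantized.onnx', sizeMB: 454, variant: 'int8' },
  { name: 'onnx/encoder_model.onnx', sizeMB: 1302, variant: 'fp32' },
  { name: 'onnx/decoder_model_merged.onnx', sizeMB: 1812, variant: 'fp32' },
];

// Sum the sizes of all files belonging to one variant.
function totalSizeMB(variant) {
  return files
    .filter((f) => f.variant === variant)
    .reduce((sum, f) => sum + f.sizeMB, 0);
}

console.log(totalSizeMB('int8')); // 780
console.log(totalSizeMB('fp32')); // 3114
```

The INT8 total matches the ~780 MB figure quoted at the top, and the FP32 weights are roughly four times larger.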

Performance

Tested on Node.js (WASM backend, Intel Xeon, quantized):

  • Load time: ~7s (cached)
  • Inference: 300ms–1300ms per sentence (varies with length)
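Figures like these depend heavily on hardware and input length, so it may be worth measuring on your own machine. The following is a minimal sketch of one way to do that. `timeAsync` and `summarize` are hypothetical helpers, and the commented-out loop assumes a `pipe` created as in the Usage section plus your own `texts` array.

```javascript
// Hypothetical helper: time one async call with a monotonic clock.
async function timeAsync(fn) {
  const start = performance.now();
  const result = await fn();
  return { result, ms: performance.now() - start };
}

// Summarize a list of latency samples (milliseconds).
function summarize(samplesMs) {
  if (samplesMs.length === 0) throw new Error('no samples');
  const sorted = [...samplesMs].sort((a, b) => a - b);
  const mean = sorted.reduce((s, x) => s + x, 0) / sorted.length;
  return { min: sorted[0], max: sorted[sorted.length - 1], mean };
}

// Assumed usage:
// const samples = [];
// for (const text of texts) {
//   const { ms } = await timeAsync(() => pipe(text, { max_new_tokens: 64 }));
//   samples.push(ms);
// }
// console.log(summarize(samples));
```

Discarding the first call before collecting samples is usually sensible, since it may include one-time warm-up work.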

The WebGPU backend is faster but requires a browser with WebGPU support.
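One way to handle both cases is to check for WebGPU at runtime and fall back to WASM. The sketch below assumes the `device` option of `pipeline` in transformers.js v3. `pickDevice` is a hypothetical helper, and the presence of `navigator.gpu` does not by itself guarantee that a usable adapter is available.

```javascript
// Hypothetical helper: choose a device string for transformers.js.
// `nav` is the browser's `navigator` object, or undefined where none exists.
function pickDevice(nav) {
  // `navigator.gpu` is only defined where the WebGPU API is exposed.
  return nav && 'gpu' in nav && nav.gpu ? 'webgpu' : 'wasm';
}

const device = pickDevice(typeof navigator !== 'undefined' ? navigator : undefined);

// Assumed usage:
// const pipe = await pipeline('text2text-generation', 'rabden/coedit-large-onnx', {
//   dtype: 'q8',
//   device,
// });
```

Passing `navigator` in as an argument keeps the check easy to exercise outside a browser.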

Model Details

License

CC-BY-NC-4.0 (same as the original model).
