grammarly/coedit
Quantized weights of CoEdIT for inference with candle.
You can run the smaller models directly from the browser using this space.
Clone [candle](https://github.com/huggingface/candle) and run the `quantized-t5` example:
```shell
$ cargo run --example quantized-t5 --release -- \
  --model-id "jbochi/candle-coedit-quantized" \
  --prompt "Make this text coherent: Their flight is weak. They run quickly through the tree canopy." \
  --temperature 0
...
Although their flight is weak, they run quickly through the tree canopy.
```
By default, it will use CoEdIT-large with q6k quantization (770M params, 643 MB).
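As a back-of-the-envelope check on these figures, the default file stores about 6.7 bits per parameter, slightly above the nominal 6.5625 bits per weight of a ggml Q6K block. The sketch below assumes "MB" means 10^6 bytes; the small surplus is presumably tensors kept at higher precision plus GGUF metadata, which is a guess rather than something stated in this card.

```python
# Back-of-the-envelope bits-per-parameter check for the default CoEdIT-large q6k file.
# Assumption: "MB" means 10**6 bytes (decimal), as Hugging Face reports file sizes.

PARAMS = 770e6          # CoEdIT-large parameter count from the README
FILE_BYTES = 643e6      # size of the default q6k weight file from the README

# A ggml Q6K block stores 256 weights in 210 bytes:
# 128 bytes of low 4 bits, 64 bytes of high 2 bits, 16 int8 sub-block scales, one fp16 scale.
Q6K_BITS_PER_WEIGHT = 210 * 8 / 256

bits_per_param = FILE_BYTES * 8 / PARAMS

print(f"file:   {bits_per_param:.2f} bits/param")
print(f"Q6K:    {Q6K_BITS_PER_WEIGHT:.4f} bits/weight")
print(f"vs f32: {32 / bits_per_param:.1f}x smaller")
```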
To use CoEdIT-xl (3B params, 2.34 GB) or any other provided model, specify `--weight-file` and `--config-file`:
```shell
$ cargo run --example quantized-t5 --release -- \
  --model-id "jbochi/candle-coedit-quantized" \
  --weight-file "model-xl.gguf" \
  --config-file "config-xl.json" \
  --prompt "Rewrite to make this easier to understand: Note that a storm surge is what forecasters consider a hurricane's most treacherous aspect." \
  --temperature 0
...
Note that a storm surge is what forecasters consider a hurricane's most dangerous part.
```
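To call the example from a script, a minimal Python sketch could assemble the same command line, using only the flags shown in the commands above and assuming a local candle checkout at a path you supply. `quantized_t5_command` and `run_coedit` are hypothetical helpers, not part of candle.

```python
import subprocess
from typing import Optional


def quantized_t5_command(
    prompt: str,
    weight_file: Optional[str] = None,
    config_file: Optional[str] = None,
    model_id: str = "jbochi/candle-coedit-quantized",
) -> list[str]:
    """Build the argv for candle's quantized-t5 example (hypothetical helper)."""
    cmd = [
        "cargo", "run", "--example", "quantized-t5", "--release", "--",
        "--model-id", model_id,
    ]
    # Leaving both files unset falls back to the example's default (CoEdIT-large, q6k).
    if weight_file is not None:
        cmd += ["--weight-file", weight_file]
    if config_file is not None:
        cmd += ["--config-file", config_file]
    # Temperature 0, as in the README examples, should give deterministic output.
    cmd += ["--prompt", prompt, "--temperature", "0"]
    return cmd


def run_coedit(candle_dir: str, prompt: str, **files: Optional[str]) -> str:
    """Run the example inside a local candle checkout and return what it prints to stdout."""
    result = subprocess.run(
        quantized_t5_command(prompt, **files),
        cwd=candle_dir,
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if cargo or the example fails
    )
    return result.stdout
```

For example, `run_coedit("/path/to/candle", "Fix the grammar: ...", weight_file="model-xl.gguf", config_file="config-xl.json")` mirrors the CoEdIT-xl command above.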
These are all the available formats. The weight file is named `{model}.gguf` and the config file is named `config-{base_model}.json`.
| Model | Base model | Quantization | # Params | Size |
|---|---|---|---|---|
| - | small (unofficial) | None | 77M | 308 MB |
| model-small | small | q6k | 77M | 78.2 MB |
| model-small-q4k | small | q4k | 77M | 59.6 MB |
| model-small-q4_0 | small | q4_0 | 77M | 43.4 MB |
| - | base (unofficial) | None | 248M | 990 MB |
| model-base | base | q6k | 248M | 194 MB |
| model-base-q4k | base | q4k | 248M | 133 MB |
| model-base-q4_0 | base | q4_0 | 248M | 133 MB |
| - | large | None | 770M | 3.13 GB |
| model | large | q6k | 770M | 643 MB |
| model-q4k | large | q4k | 770M | 441 MB |
| model-q4_0 | large | q4_0 | 770M | 441 MB |
| - | xl | None | 3B | 11.4 GB |
| model-xl | xl | q6k | 3B | 2.34 GB |
| model-xl-q4k | xl | q4k | 3B | 1.6 GB |
| model-xl-q4_0 | xl | q4_0 | 3B | 1.6 GB |
| - | xxl | None | 11B | 44.5 GB |
| model-xxl | xxl | q6k | 11B | 9.14 GB |
| model-xxl-q4k | xxl | q4k | 11B | 6.27 GB |
| model-xxl-q4_0 | xxl | q4_0 | 11B | 6.27 GB |
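To pick a file from the table programmatically, a minimal Python sketch could encode the quantized rows and apply the naming rule stated above. `Row`, `files_for`, and `largest_under` are hypothetical helpers. The sketch assumes the naming rule holds for every row, including the large rows that the default run loads without naming any files, so check the repository's file list before relying on the generated names. Choosing the largest file under a size budget is only a crude heuristic, and file size is a lower bound on the memory needed at inference time.

```python
from typing import NamedTuple, Optional


class Row(NamedTuple):
    model: str         # weight file stem, e.g. "model-xl-q4k"
    base_model: str    # CoEdIT size, e.g. "xl"
    quantization: str  # q6k, q4k or q4_0
    size_mb: float     # file size from the table, in MB (1 GB taken as 1000 MB)


# Quantized rows of the table; rows whose Model column is "-" are omitted.
ROWS = [
    Row("model-small", "small", "q6k", 78.2),
    Row("model-small-q4k", "small", "q4k", 59.6),
    Row("model-small-q4_0", "small", "q4_0", 43.4),
    Row("model-base", "base", "q6k", 194),
    Row("model-base-q4k", "base", "q4k", 133),
    Row("model-base-q4_0", "base", "q4_0", 133),
    Row("model", "large", "q6k", 643),
    Row("model-q4k", "large", "q4k", 441),
    Row("model-q4_0", "large", "q4_0", 441),
    Row("model-xl", "xl", "q6k", 2340),
    Row("model-xl-q4k", "xl", "q4k", 1600),
    Row("model-xl-q4_0", "xl", "q4_0", 1600),
    Row("model-xxl", "xxl", "q6k", 9140),
    Row("model-xxl-q4k", "xxl", "q4k", 6270),
    Row("model-xxl-q4_0", "xxl", "q4_0", 6270),
]


def files_for(row: Row) -> tuple[str, str]:
    """Apply the stated naming rule: {model}.gguf and config-{base_model}.json."""
    return f"{row.model}.gguf", f"config-{row.base_model}.json"


def largest_under(budget_mb: float) -> Optional[Row]:
    """Crude heuristic: the biggest file that fits the budget (ties keep table order)."""
    fitting = [r for r in ROWS if r.size_mb <= budget_mb]
    return max(fitting, key=lambda r: r.size_mb) if fitting else None
```

The two names returned by `files_for` are the values to pass to `--weight-file` and `--config-file`.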
The weights were quantized using candle:
```shell
cargo run --example tensor-tools --release -- quantize \
  --quantization q6k \
  /path/to/coedit-<version>/model.safetensors \
  --out-file model<version>.gguf
```
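The same command can be repeated for each quantization level in the table. This hypothetical Python helper, `quantize_commands`, only builds and prints the commands. It assumes that `tensor-tools` accepts `q4k` and `q4_0` the same way it accepts `q6k`, that the source checkpoint is a single `model.safetensors` in the directory you pass, and that output names should match the table (e.g. `model-xl-q4k.gguf`).

```python
import shlex

# Quantization value -> suffix used in the table's Model column (q6k has no suffix).
QUANTIZATIONS = {"q6k": "", "q4k": "-q4k", "q4_0": "-q4_0"}


def quantize_commands(size: str, src_dir: str) -> list[list[str]]:
    """Build one tensor-tools quantize command per quantization level (hypothetical helper)."""
    # The large model has no size infix in the table ("model", "model-q4k", ...).
    stem = "model" if size == "large" else f"model-{size}"
    cmds = []
    for quant, suffix in QUANTIZATIONS.items():
        cmds.append([
            "cargo", "run", "--example", "tensor-tools", "--release", "--", "quantize",
            "--quantization", quant,
            f"{src_dir}/model.safetensors",
            "--out-file", f"{stem}{suffix}.gguf",
        ])
    return cmds


# Print shell-ready commands; run them from the root of a candle checkout.
for cmd in quantize_commands("xl", "/path/to/coedit-xl"):
    print(shlex.join(cmd))
```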