Add/update the quantized ONNX model files and README.md for Transformers.js v3

## Applied Quantizations

### ✅ Based on `decoder_model.onnx` *with* slimming

↳ ✅ `fp16`: `decoder_model_fp16.onnx` (added)
↳ ✅ `int8`: `decoder_model_int8.onnx` (added)
↳ ✅ `uint8`: `decoder_model_uint8.onnx` (added)
↳ ✅ `q4`: `decoder_model_q4.onnx` (added)
↳ ✅ `q4f16`: `decoder_model_q4f16.onnx` (added)
↳ ✅ `bnb4`: `decoder_model_bnb4.onnx` (added)

### ✅ Based on `decoder_model.onnx` *with* slimming

↳ ✅ `fp16`: `decoder_model_fp16.onnx` (added)
↳ ✅ `int8`: `decoder_model_int8.onnx` (added)
↳ ✅ `uint8`: `decoder_model_uint8.onnx` (added)
↳ ✅ `q4`: `decoder_model_q4.onnx` (added)
↳ ✅ `q4f16`: `decoder_model_q4f16.onnx` (added)
↳ ✅ `bnb4`: `decoder_model_bnb4.onnx` (added)

### ✅ Based on `encoder_model.onnx` *with* slimming

↳ ✅ `int8`: `encoder_model_int8.onnx` (added)
↳ ✅ `uint8`: `encoder_model_uint8.onnx` (added)
↳ ✅ `q4`: `encoder_model_q4.onnx` (added)
↳ ✅ `q4f16`: `encoder_model_q4f16.onnx` (added)
↳ ✅ `bnb4`: `encoder_model_bnb4.onnx` (added)

### ✅ Based on `encoder_model.onnx` *with* slimming

↳ ✅ `int8`: `encoder_model_int8.onnx` (added)
↳ ✅ `uint8`: `encoder_model_uint8.onnx` (added)
↳ ✅ `q4`: `encoder_model_q4.onnx` (added)
↳ ✅ `q4f16`: `encoder_model_q4f16.onnx` (added)
↳ ✅ `bnb4`: `encoder_model_bnb4.onnx` (added)

### ✅ Based on `decoder_with_past_model.onnx` *with* slimming

↳ ✅ `fp16`: `decoder_with_past_model_fp16.onnx` (added)
↳ ✅ `int8`: `decoder_with_past_model_int8.onnx` (added)
↳ ✅ `uint8`: `decoder_with_past_model_uint8.onnx` (added)
↳ ✅ `q4`: `decoder_with_past_model_q4.onnx` (added)
↳ ✅ `q4f16`: `decoder_with_past_model_q4f16.onnx` (added)
↳ ✅ `bnb4`: `decoder_with_past_model_bnb4.onnx` (added)

### ✅ Based on `decoder_with_past_model.onnx` *with* slimming

↳ ✅ `fp16`: `decoder_with_past_model_fp16.onnx` (added)
↳ ✅ `int8`: `decoder_with_past_model_int8.onnx` (added)
↳ ✅ `uint8`: `decoder_with_past_model_uint8.onnx` (added)
↳ ✅ `q4`: `decoder_with_past_model_q4.onnx` (added)
↳ ✅ `q4f16`: `decoder_with_past_model_q4f16.onnx` (added)
↳ ✅ `bnb4`: `decoder_with_past_model_bnb4.onnx` (added)

### ❌ Based on `decoder_model_merged.onnx` *with* slimming

```
0%| | 0/1 [00:00<?, ?it/s]
Processing /tmp/tmpuvq7br1v/decoder_model_merged.onnx: 0%| | 0/1 [00:00<?, ?it/s]

0%| | 0/6 [00:00<?, ?it/s][A

- Quantizing to fp16: 0%| | 0/6 [00:00<?, ?it/s][A/home/ubuntu/src/tjsmigration/transformers.js/scripts/float16.py:73: UserWarning: the float32 number 1.8291345868504116e-11 will be truncated to 1e-07
warnings.warn(
/home/ubuntu/src/tjsmigration/transformers.js/scripts/float16.py:92: UserWarning: the float32 number -4.861599123273663e-09 will be truncated to -1e-07
warnings.warn(
/home/ubuntu/src/tjsmigration/transformers.js/scripts/float16.py:92: UserWarning: the float32 number -4.000130360282128e-08 will be truncated to -1e-07
warnings.warn(
/home/ubuntu/src/tjsmigration/transformers.js/scripts/float16.py:92: UserWarning: the float32 number -6.774890692895497e-08 will be truncated to -1e-07
warnings.warn(
/home/ubuntu/src/tjsmigration/transformers.js/scripts/float16.py:85: UserWarning: the float32 number -3.4028234663852886e+38 will be truncated to -10000.0
warnings.warn(
/home/ubuntu/src/tjsmigration/transformers.js/scripts/float16.py:92: UserWarning: the float32 number -8.337324430840454e-08 will be truncated to -1e-07
warnings.warn(
/home/ubuntu/src/tjsmigration/transformers.js/scripts/float16.py:92: UserWarning: the float32 number -9.715018478573256e-08 will be truncated to -1e-07
warnings.warn(
/home/ubuntu/src/tjsmigration/transformers.js/scripts/float16.py:92: UserWarning: the float32 number -2.690358336110421e-08 will be truncated to -1e-07
warnings.warn(
/home/ubuntu/src/tjsmigration/transformers.js/scripts/float16.py:73: UserWarning: the float32 number 1.158727425121242e-08 will be truncated to 1e-07
warnings.warn(
/home/ubuntu/src/tjsmigration/transformers.js/scripts/float16.py:73: UserWarning: the float32 number 2.0278964285580514e-08 will be truncated to 1e-07
warnings.warn(
/home/ubuntu/src/tjsmigration/transformers.js/scripts/float16.py:92: UserWarning: the float32 number -3.8145287817314966e-08 will be truncated to -1e-07
warnings.warn(
/home/ubuntu/src/tjsmigration/transformers.js/scripts/float16.py:73: UserWarning: the float32 number 4.8068901747910786e-08 will be truncated to 1e-07
warnings.warn(
/home/ubuntu/src/tjsmigration/transformers.js/scripts/float16.py:92: UserWarning: the float32 number -7.762974973957171e-08 will be truncated to -1e-07
warnings.warn(
/home/ubuntu/src/tjsmigration/transformers.js/scripts/float16.py:92: UserWarning: the float32 number -8.846985366517401e-08 will be truncated to -1e-07
warnings.warn(
/home/ubuntu/src/tjsmigration/transformers.js/scripts/float16.py:92: UserWarning: the float32 number -8.856412492264099e-09 will be truncated to -1e-07
warnings.warn(
/home/ubuntu/src/tjsmigration/transformers.js/scripts/float16.py:73: UserWarning: the float32 number 2.5772958878178542e-08 will be truncated to 1e-07
warnings.warn(
/home/ubuntu/src/tjsmigration/transformers.js/scripts/float16.py:92: UserWarning: the float32 number -5.76245504646522e-08 will be truncated to -1e-07
warnings.warn(
/home/ubuntu/src/tjsmigration/transformers.js/scripts/float16.py:92: UserWarning: the float32 number -5.932278224207721e-09 will be truncated to -1e-07
warnings.warn(
/home/ubuntu/src/tjsmigration/transformers.js/scripts/float16.py:73: UserWarning: the float32 number 3.034425333225954e-08 will be truncated to 1e-07
warnings.warn(
/home/ubuntu/src/tjsmigration/transformers.js/scripts/float16.py:92: UserWarning: the float32 number -1.0529447536100633e-08 will be truncated to -1e-07
warnings.warn(
/home/ubuntu/src/tjsmigration/transformers.js/scripts/float16.py:92: UserWarning: the float32 number -3.34704033377875e-08 will be truncated to -1e-07
warnings.warn(
/home/ubuntu/src/tjsmigration/transformers.js/scripts/float16.py:73: UserWarning: the float32 number 9.640467624194571e-08 will be truncated to 1e-07
warnings.warn(
/home/ubuntu/src/tjsmigration/transformers.js/scripts/float16.py:73: UserWarning: the float32 number 7.381922273452801e-08 will be truncated to 1e-07
warnings.warn(
/home/ubuntu/src/tjsmigration/transformers.js/scripts/float16.py:92: UserWarning: the float32 number -1.2450706243782861e-08 will be truncated to -1e-07
warnings.warn(
/home/ubuntu/src/tjsmigration/transformers.js/scripts/float16.py:73: UserWarning: the float32 number 3.574938745032341e-08 will be truncated to 1e-07
warnings.warn(
/home/ubuntu/src/tjsmigration/transformers.js/scripts/float16.py:73: UserWarning: the float32 number 8.586270894284098e-08 will be truncated to 1e-07
warnings.warn(
/home/ubuntu/src/tjsmigration/transformers.js/scripts/float16.py:73: UserWarning: the float32 number 9.717368953943151e-08 will be truncated to 1e-07
warnings.warn(
/home/ubuntu/src/tjsmigration/transformers.js/scripts/float16.py:92: UserWarning: the float32 number -3.032227358090722e-09 will be truncated to -1e-07
warnings.warn(
/home/ubuntu/src/tjsmigration/transformers.js/scripts/float16.py:73: UserWarning: the float32 number 3.808339954503026e-08 will be truncated to 1e-07
warnings.warn(

- Quantizing to fp16: 0%| | 0/6 [00:04<?, ?it/s]

Processing /tmp/tmpuvq7br1v/decoder_model_merged.onnx: 0%| | 0/1 [00:04<?, ?it/s]
Traceback (most recent call last):
File "<frozen runpy>", line 198, in _run_module_as_main
File "<frozen runpy>", line 88, in _run_code
File "/home/ubuntu/src/tjsmigration/transformers.js/scripts/quantize.py", line 377, in <module>
main()
File "/home/ubuntu/src/tjsmigration/transformers.js/scripts/quantize.py", line 374, in main
quantize(input_folder, output_folder, quantization_args)
File "/home/ubuntu/src/tjsmigration/transformers.js/scripts/quantize.py", line 309, in quantize
quantize_fp16(
File "/home/ubuntu/src/tjsmigration/transformers.js/scripts/quantize.py", line 223, in quantize_fp16
check_and_save_model(model_fp16, save_path)
File "/home/ubuntu/src/tjsmigration/transformers.js/scripts/utils.py", line 29, in check_and_save_model
strict_check_model(model)
File "/home/ubuntu/src/tjsmigration/transformers.js/scripts/utils.py", line 21, in strict_check_model
raise e
File "/home/ubuntu/src/tjsmigration/transformers.js/scripts/utils.py", line 16, in strict_check_model
onnx.checker.check_model(model_or_path, full_check=True)
File "/home/ubuntu/.cache/uv/archive-v0/7hYcxZ8pwavXeKpAYRaHY/lib/python3.12/site-packages/onnx/checker.py", line 179, in check_model
C.check_model(
onnx.onnx_cpp2py_export.shape_inference.InferenceError: [ShapeInferenceError] Inference error(s): (op_type:If, node name: optimum::if): [ShapeInferenceError] Inference error(s): (op_type:Add, node name: /model/decoder/embed_positions/Add): [ShapeInferenceError] Inferred shape and existing shape differ in rank: (1) vs (0)
```

### ✅ Based on `decoder_model_merged.onnx` *without* slimming

↳ ✅ `fp16`: `decoder_model_merged_fp16.onnx` (replaced because it was invalid)
↳ ✅ `int8`: `decoder_model_merged_int8.onnx` (added)
↳ ✅ `uint8`: `decoder_model_merged_uint8.onnx` (added)
↳ ✅ `q4`: `decoder_model_merged_q4.onnx` (added)
↳ ✅ `q4f16`: `decoder_model_merged_q4f16.onnx` (added)
↳ ✅ `bnb4`: `decoder_model_merged_bnb4.onnx` (added)

Files changed (23) hide show

onnx/decoder_model_bnb4.onnx +3 -0
onnx/decoder_model_fp16.onnx +3 -0
onnx/decoder_model_int8.onnx +3 -0
onnx/decoder_model_merged_bnb4.onnx +3 -0
onnx/decoder_model_merged_fp16.onnx +2 -2
onnx/decoder_model_merged_int8.onnx +3 -0
onnx/decoder_model_merged_q4.onnx +3 -0
onnx/decoder_model_merged_q4f16.onnx +3 -0
onnx/decoder_model_merged_uint8.onnx +3 -0
onnx/decoder_model_q4.onnx +3 -0
onnx/decoder_model_q4f16.onnx +3 -0
onnx/decoder_model_uint8.onnx +3 -0
onnx/decoder_with_past_model_bnb4.onnx +3 -0
onnx/decoder_with_past_model_fp16.onnx +3 -0
onnx/decoder_with_past_model_int8.onnx +3 -0
onnx/decoder_with_past_model_q4.onnx +3 -0
onnx/decoder_with_past_model_q4f16.onnx +3 -0
onnx/decoder_with_past_model_uint8.onnx +3 -0
onnx/encoder_model_bnb4.onnx +3 -0
onnx/encoder_model_int8.onnx +3 -0
onnx/encoder_model_q4.onnx +3 -0
onnx/encoder_model_q4f16.onnx +3 -0
onnx/encoder_model_uint8.onnx +3 -0

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:93ba8753ff3c9c8e1a567b6d184648ede759a15256300f71c3a567c0d78958df
-size 133773911

+version https://git-lfs.github.com/spec/v1
+oid sha256:8e76359400f6676644e3e5d889ab784ac8525e1660a9377a1abc49ae804594e0
+size 179940254

+version https://git-lfs.github.com/spec/v1
+oid sha256:600d7fcdecaaee863f2cd08dbce383c6dc56fc0a96f87fcd4341796a9867a1d4
+size 133351448

+version https://git-lfs.github.com/spec/v1
+oid sha256:052711ff0dcfc78f1237d25c0812e06ab3a68e4751b10856568ea01995b08a30
+size 231205646

+version https://git-lfs.github.com/spec/v1
+oid sha256:a4fd928b7e9843805c67b9eeab564b8202150af7adcd6783be4b8f017f732f10
+size 180197943

 version https://git-lfs.github.com/spec/v1
+oid sha256:54df323793a4b34bd4b2900573bdc620198f1992a3e41ed1154879aec66d9b2e
+size 133779505

+version https://git-lfs.github.com/spec/v1
+oid sha256:c0334a57c21714aac08c1229684e28013f9a290d7fbd5bd0d513e19c7a73bfc6
+size 67770561

+version https://git-lfs.github.com/spec/v1
+oid sha256:f32f8032d20b64fa61045aee21e35064656ecfbb6fe5976b555d135f84fbada1
+size 181769895

+version https://git-lfs.github.com/spec/v1
+oid sha256:ea38bcb5c208ea21015b21e98669e68118a0c12b78add1ea7526cd12ea3126a5
+size 97611056

+version https://git-lfs.github.com/spec/v1
+oid sha256:bd925a2c7efaa6b1a5777b9342f238dff5c557be76c42cdf392916d276423c27
+size 67770594

+version https://git-lfs.github.com/spec/v1
+oid sha256:88fb85030cfeed728308663acc7e0390ff1d5808f688b03968efd24076a6dd15
+size 181512638

	@@ -0,0 +1,3 @@

	@@ -0,0 +1,3 @@

	@@ -0,0 +1,3 @@

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:443b612845832ec754fb9574582f8eec2797b193730d9e465b2f9f47779ac918
+size 97184523

+version https://git-lfs.github.com/spec/v1
+oid sha256:aeb4554f426a25317beb2883ab70fc753dd4c2bb8c2f61cb56a0747e477843cc
+size 231205679

+version https://git-lfs.github.com/spec/v1
+oid sha256:9dc9289824f561021680ce3898e70e59f28d5767aae9f502793bfd5473d39dfe
+size 178077750

+version https://git-lfs.github.com/spec/v1
+oid sha256:a992cbe87720f7b7a242756bff8aeca96383c6c247e7528b14cbecf814a3039c
+size 126985820

+version https://git-lfs.github.com/spec/v1
+oid sha256:a2fba1f63bf022662786353115113151480eba300987f808bc7665b05b285713
+size 227956853

+version https://git-lfs.github.com/spec/v1
+oid sha256:7e9b914f399cfafb403a352544ef259f96a568d25e430ad1d7559ae7803209c4
+size 179453622

+version https://git-lfs.github.com/spec/v1
+oid sha256:afb77c19501cf444cd0eef010255f2b1d1ab4d8ce9d44452b05f2a8f9bc6cb3d
+size 95339091

+version https://git-lfs.github.com/spec/v1
+oid sha256:6a101bdd706ec43257394e00c667a3df4493c2e0bd4480041d4f891d53064299
+size 227956880

+version https://git-lfs.github.com/spec/v1
+oid sha256:f8b3412f1c83856296b15f37dd27fd23e7a55d15d14058825cacee769e92aa09
+size 175855549

+version https://git-lfs.github.com/spec/v1
+oid sha256:af0c1ef4fcd9c472e591cf7babd42ca3c3bae3f0cf2e1353c55b123638dc9bd0
+size 60423960

+version https://git-lfs.github.com/spec/v1
+oid sha256:f3cea818e78342e0246888753888e19b2ad3dc3f0a2ff6cea490814574e0d7ce
+size 177034909

+version https://git-lfs.github.com/spec/v1
+oid sha256:6de936a6571c42ec07f2a55adc25ba5ea3a07f51229ad7c7056a3bd8beb23775
+size 93295649

+version https://git-lfs.github.com/spec/v1
+oid sha256:72706d432e703d8d89bb4289b8a02a9752ec7630d34e37a5174c101d86b3b0d9
+size 60423974