Add/update the quantized ONNX model files and README.md for Transformers.js v3

## Applied Quantizations

### ✅ Based on `decoder_model.onnx` *with* slimming

↳ ✅ `fp16`: `decoder_model_fp16.onnx` (added)
↳ ✅ `int8`: `decoder_model_int8.onnx` (added)
↳ ✅ `uint8`: `decoder_model_uint8.onnx` (added)
↳ ✅ `q4`: `decoder_model_q4.onnx` (added)
↳ ✅ `q4f16`: `decoder_model_q4f16.onnx` (added)
↳ ✅ `bnb4`: `decoder_model_bnb4.onnx` (added)

### ✅ Based on `encoder_model.onnx` *with* slimming

↳ ✅ `int8`: `encoder_model_int8.onnx` (added)
↳ ✅ `uint8`: `encoder_model_uint8.onnx` (added)
↳ ✅ `q4`: `encoder_model_q4.onnx` (added)
↳ ✅ `q4f16`: `encoder_model_q4f16.onnx` (added)
↳ ✅ `bnb4`: `encoder_model_bnb4.onnx` (added)

### ✅ Based on `decoder_with_past_model.onnx` *with* slimming

↳ ✅ `fp16`: `decoder_with_past_model_fp16.onnx` (added)
↳ ✅ `int8`: `decoder_with_past_model_int8.onnx` (added)
↳ ✅ `uint8`: `decoder_with_past_model_uint8.onnx` (added)
↳ ✅ `q4`: `decoder_with_past_model_q4.onnx` (added)
↳ ✅ `q4f16`: `decoder_with_past_model_q4f16.onnx` (added)
↳ ✅ `bnb4`: `decoder_with_past_model_bnb4.onnx` (added)
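The file names in these lists follow one visible pattern: the quantization type is appended to the base model name before the `.onnx` extension. A minimal sketch of that naming rule follows; the function name `quantized_filename` and the `DTYPES` tuple are hypothetical and are not taken from the quantization scripts.

```python
# Suffixes exactly as they appear in the file names listed in this PR.
DTYPES = ("fp16", "int8", "uint8", "q4", "q4f16", "bnb4")


def quantized_filename(base_name: str, dtype: str) -> str:
    """Build the output name for a base model file and a quantization type.

    Hypothetical helper: it only mirrors the naming seen in the lists,
    e.g. ("decoder_model.onnx", "q4") -> "decoder_model_q4.onnx".
    """
    if dtype not in DTYPES:
        raise ValueError(f"unknown quantization type: {dtype!r}")
    stem, ext = base_name.rsplit(".", 1)
    return f"{stem}_{dtype}.{ext}"
```

For example, `quantized_filename("decoder_with_past_model.onnx", "bnb4")` gives `decoder_with_past_model_bnb4.onnx`, matching the entry listed for that model.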

### ❌ Based on `decoder_model_merged.onnx` *with* slimming

```
0%| | 0/1 [00:00<?, ?it/s]
Processing /tmp/tmpib281xyf/decoder_model_merged.onnx: 0%| | 0/1 [00:00<?, ?it/s]
0%| | 0/6 [00:00<?, ?it/s]
- Quantizing to fp16: 0%| | 0/6 [00:00<?, ?it/s]
/home/ubuntu/src/tjsmigration/transformers.js/scripts/float16.py:73: UserWarning: the float32 number 3.0047397903132378e-09 will be truncated to 1e-07
warnings.warn(
/home/ubuntu/src/tjsmigration/transformers.js/scripts/float16.py:92: UserWarning: the float32 number -1.2693035511546213e-09 will be truncated to -1e-07
warnings.warn(
/home/ubuntu/src/tjsmigration/transformers.js/scripts/float16.py:92: UserWarning: the float32 number -4.166123002136146e-09 will be truncated to -1e-07
warnings.warn(
/home/ubuntu/src/tjsmigration/transformers.js/scripts/float16.py:73: UserWarning: the float32 number 3.941624626691009e-08 will be truncated to 1e-07
warnings.warn(
/home/ubuntu/src/tjsmigration/transformers.js/scripts/float16.py:85: UserWarning: the float32 number -3.4028234663852886e+38 will be truncated to -10000.0
warnings.warn(
/home/ubuntu/src/tjsmigration/transformers.js/scripts/float16.py:92: UserWarning: the float32 number -2.9807503132417423e-09 will be truncated to -1e-07
warnings.warn(
/home/ubuntu/src/tjsmigration/transformers.js/scripts/float16.py:73: UserWarning: the float32 number 5.334725372563298e-08 will be truncated to 1e-07
warnings.warn(
/home/ubuntu/src/tjsmigration/transformers.js/scripts/float16.py:73: UserWarning: the float32 number 4.0370661480437775e-08 will be truncated to 1e-07
warnings.warn(
/home/ubuntu/src/tjsmigration/transformers.js/scripts/float16.py:73: UserWarning: the float32 number 3.5335105508238485e-08 will be truncated to 1e-07
warnings.warn(
/home/ubuntu/src/tjsmigration/transformers.js/scripts/float16.py:92: UserWarning: the float32 number -2.4782831786751558e-08 will be truncated to -1e-07
warnings.warn(
/home/ubuntu/src/tjsmigration/transformers.js/scripts/float16.py:73: UserWarning: the float32 number 9.57075414476094e-08 will be truncated to 1e-07
warnings.warn(
/home/ubuntu/src/tjsmigration/transformers.js/scripts/float16.py:92: UserWarning: the float32 number -7.643991217776147e-09 will be truncated to -1e-07
warnings.warn(
/home/ubuntu/src/tjsmigration/transformers.js/scripts/float16.py:92: UserWarning: the float32 number -1.443019037594695e-08 will be truncated to -1e-07
warnings.warn(
/home/ubuntu/src/tjsmigration/transformers.js/scripts/float16.py:73: UserWarning: the float32 number 2.609251481544561e-08 will be truncated to 1e-07
warnings.warn(
/home/ubuntu/src/tjsmigration/transformers.js/scripts/float16.py:92: UserWarning: the float32 number -2.3766582035733563e-09 will be truncated to -1e-07
warnings.warn(
/home/ubuntu/src/tjsmigration/transformers.js/scripts/float16.py:92: UserWarning: the float32 number -7.4339041589155386e-09 will be truncated to -1e-07
warnings.warn(
/home/ubuntu/src/tjsmigration/transformers.js/scripts/float16.py:73: UserWarning: the float32 number 4.379340268201304e-08 will be truncated to 1e-07
warnings.warn(
/home/ubuntu/src/tjsmigration/transformers.js/scripts/float16.py:92: UserWarning: the float32 number -4.9057206297220546e-08 will be truncated to -1e-07
warnings.warn(
/home/ubuntu/src/tjsmigration/transformers.js/scripts/float16.py:73: UserWarning: the float32 number 5.537021952051191e-09 will be truncated to 1e-07
warnings.warn(
/home/ubuntu/src/tjsmigration/transformers.js/scripts/float16.py:73: UserWarning: the float32 number 3.092543110483348e-08 will be truncated to 1e-07
warnings.warn(
/home/ubuntu/src/tjsmigration/transformers.js/scripts/float16.py:92: UserWarning: the float32 number -7.253074585378272e-08 will be truncated to -1e-07
warnings.warn(
/home/ubuntu/src/tjsmigration/transformers.js/scripts/float16.py:73: UserWarning: the float32 number 6.596820156801186e-08 will be truncated to 1e-07
warnings.warn(
/home/ubuntu/src/tjsmigration/transformers.js/scripts/float16.py:73: UserWarning: the float32 number 4.217941196316133e-08 will be truncated to 1e-07
warnings.warn(
/home/ubuntu/src/tjsmigration/transformers.js/scripts/float16.py:73: UserWarning: the float32 number 4.548723353536843e-08 will be truncated to 1e-07
warnings.warn(
/home/ubuntu/src/tjsmigration/transformers.js/scripts/float16.py:92: UserWarning: the float32 number -7.784755950979161e-08 will be truncated to -1e-07
warnings.warn(
/home/ubuntu/src/tjsmigration/transformers.js/scripts/float16.py:92: UserWarning: the float32 number -1.813730321487128e-08 will be truncated to -1e-07
warnings.warn(
/home/ubuntu/src/tjsmigration/transformers.js/scripts/float16.py:92: UserWarning: the float32 number -7.175447080953745e-08 will be truncated to -1e-07
warnings.warn(
/home/ubuntu/src/tjsmigration/transformers.js/scripts/float16.py:73: UserWarning: the float32 number 4.1332341993438604e-08 will be truncated to 1e-07
warnings.warn(
/home/ubuntu/src/tjsmigration/transformers.js/scripts/float16.py:73: UserWarning: the float32 number 3.185881425338266e-08 will be truncated to 1e-07
warnings.warn(
/home/ubuntu/src/tjsmigration/transformers.js/scripts/float16.py:92: UserWarning: the float32 number -2.7688393799962796e-08 will be truncated to -1e-07
warnings.warn(
/home/ubuntu/src/tjsmigration/transformers.js/scripts/float16.py:92: UserWarning: the float32 number -8.754327041060606e-08 will be truncated to -1e-07
warnings.warn(
/home/ubuntu/src/tjsmigration/transformers.js/scripts/float16.py:73: UserWarning: the float32 number 6.787500694827031e-08 will be truncated to 1e-07
warnings.warn(

- Quantizing to fp16: 0%| | 0/6 [00:04<?, ?it/s]

Processing /tmp/tmpib281xyf/decoder_model_merged.onnx: 0%| | 0/1 [00:04<?, ?it/s]
Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "/home/ubuntu/src/tjsmigration/transformers.js/scripts/quantize.py", line 377, in <module>
    main()
  File "/home/ubuntu/src/tjsmigration/transformers.js/scripts/quantize.py", line 374, in main
    quantize(input_folder, output_folder, quantization_args)
  File "/home/ubuntu/src/tjsmigration/transformers.js/scripts/quantize.py", line 309, in quantize
    quantize_fp16(
  File "/home/ubuntu/src/tjsmigration/transformers.js/scripts/quantize.py", line 223, in quantize_fp16
    check_and_save_model(model_fp16, save_path)
  File "/home/ubuntu/src/tjsmigration/transformers.js/scripts/utils.py", line 29, in check_and_save_model
    strict_check_model(model)
  File "/home/ubuntu/src/tjsmigration/transformers.js/scripts/utils.py", line 21, in strict_check_model
    raise e
  File "/home/ubuntu/src/tjsmigration/transformers.js/scripts/utils.py", line 16, in strict_check_model
    onnx.checker.check_model(model_or_path, full_check=True)
  File "/home/ubuntu/.cache/uv/archive-v0/7hYcxZ8pwavXeKpAYRaHY/lib/python3.12/site-packages/onnx/checker.py", line 179, in check_model
    C.check_model(
onnx.onnx_cpp2py_export.shape_inference.InferenceError: [ShapeInferenceError] Inference error(s): (op_type:If, node name: optimum::if): [ShapeInferenceError] Inference error(s): (op_type:Add, node name: /model/decoder/embed_positions/Add): [ShapeInferenceError] Inferred shape and existing shape differ in rank: (1) vs (0)
```
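The `UserWarning` lines in the log suggest that the fp16 conversion clamps float32 values whose magnitude falls outside a fixed range: tiny values are pulled to ±1e-07, and the most negative finite float32 (-3.4028234663852886e+38) is pulled to -10000.0. The sketch below reproduces only that observable behaviour. The thresholds are read off the warnings, and the function name `clamp_for_fp16` and the handling of zero, NaN and infinity are assumptions, not the actual code in `float16.py`.

```python
import math

# Assumptions: both thresholds are inferred from the warning messages in the log.
MIN_POSITIVE = 1e-7  # "... will be truncated to 1e-07" / "-1e-07"
MAX_FINITE = 1e4     # "... will be truncated to -10000.0"


def clamp_for_fp16(x: float) -> float:
    """Clamp a float32 value into the range implied by the log warnings.

    Zero, NaN and infinities are returned unchanged in this sketch.
    """
    if x == 0.0 or math.isnan(x) or math.isinf(x):
        return x
    if 0.0 < x < MIN_POSITIVE:
        return MIN_POSITIVE
    if -MIN_POSITIVE < x < 0.0:
        return -MIN_POSITIVE
    if x > MAX_FINITE:
        return MAX_FINITE
    if x < -MAX_FINITE:
        return -MAX_FINITE
    return x
```

These warnings are not what aborts the run. The failure comes afterwards, when `onnx.checker.check_model(..., full_check=True)` raises the `ShapeInferenceError` shown at the end of the traceback.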

### ✅ Based on `decoder_model_merged.onnx` *without* slimming

↳ ✅ `fp16`: `decoder_model_merged_fp16.onnx` (replaced because it was invalid)
↳ ✅ `int8`: `decoder_model_merged_int8.onnx` (added)
↳ ✅ `uint8`: `decoder_model_merged_uint8.onnx` (added)
↳ ✅ `q4`: `decoder_model_merged_q4.onnx` (added)
↳ ✅ `q4f16`: `decoder_model_merged_q4f16.onnx` (added)
↳ ✅ `bnb4`: `decoder_model_merged_bnb4.onnx` (added)
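For `decoder_model_merged.onnx`, the run with slimming failed the strict model check, while the run without slimming produced all six files. That outcome can be sketched as a simple fallback. How the migration tool actually sequences the two attempts is not shown in this PR, so the helper `quantize_with_fallback` and its `quantize` callback are hypothetical.

```python
from typing import Callable, Optional, Tuple


def quantize_with_fallback(
    quantize: Callable[[bool], str],
) -> Tuple[str, bool, Optional[Exception]]:
    """Try quantization with slimming first, then without it.

    `quantize` is a hypothetical callback taking `slim: bool` and returning
    the path of the written model, or raising if validation fails.
    Returns (output_path, slimming_used, error_from_slimmed_attempt).
    """
    try:
        return quantize(True), True, None
    except Exception as err:  # e.g. a shape inference error from the checker
        # Retry on the unslimmed model; an error here propagates to the caller.
        return quantize(False), False, err
```

Returning the first error alongside the result keeps the failure visible, in the same way this PR reports the ❌ attempt next to the ✅ one.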

Files changed (23)

onnx/decoder_model_bnb4.onnx +3 -0
onnx/decoder_model_fp16.onnx +3 -0
onnx/decoder_model_int8.onnx +3 -0
onnx/decoder_model_merged_bnb4.onnx +3 -0
onnx/decoder_model_merged_fp16.onnx +2 -2
onnx/decoder_model_merged_int8.onnx +3 -0
onnx/decoder_model_merged_q4.onnx +3 -0
onnx/decoder_model_merged_q4f16.onnx +3 -0
onnx/decoder_model_merged_uint8.onnx +3 -0
onnx/decoder_model_q4.onnx +3 -0
onnx/decoder_model_q4f16.onnx +3 -0
onnx/decoder_model_uint8.onnx +3 -0
onnx/decoder_with_past_model_bnb4.onnx +3 -0
onnx/decoder_with_past_model_fp16.onnx +3 -0
onnx/decoder_with_past_model_int8.onnx +3 -0
onnx/decoder_with_past_model_q4.onnx +3 -0
onnx/decoder_with_past_model_q4f16.onnx +3 -0
onnx/decoder_with_past_model_uint8.onnx +3 -0
onnx/encoder_model_bnb4.onnx +3 -0
onnx/encoder_model_int8.onnx +3 -0
onnx/encoder_model_q4.onnx +3 -0
onnx/encoder_model_q4f16.onnx +3 -0
onnx/encoder_model_uint8.onnx +3 -0

Git LFS pointers (all `version https://git-lfs.github.com/spec/v1`), matched to the files above by their position in the diff:

| File | Change | `oid sha256` | Size (bytes) |
| --- | --- | --- | --- |
| `onnx/decoder_model_bnb4.onnx` | added | `db408802c688d5ddbf2223107f41df80142942b54f6b08c021ab30dd21c53914` | 136840046 |
| `onnx/decoder_model_fp16.onnx` | added | `fbdcb04ec1348b9e63cc74300d1220fa4db55f02f2422d98b751b4182fdcc86a` | 111801344 |
| `onnx/decoder_model_int8.onnx` | added | `783a1d8a595f32ff22f6abca9310d3e706e7f52689f5a3f508ccd5b3fd67ae69` | 177351391 |
| `onnx/decoder_model_merged_bnb4.onnx` | added | `ddf4f09e4a463f09b53d872cab9854cf2e6d78cd87ee22b6908bd87f69887bf9` | 137097735 |
| `onnx/decoder_model_merged_fp16.onnx` | removed (old) | `a676609d22e44204847bd60d56494056806c7f41ee5033152802590ec9306b0e` | 112223807 |
| `onnx/decoder_model_merged_fp16.onnx` | added (new) | `0d0c9cf530de4308d319d08d29ef98f2113841b39d9f0e2a651d993c4842043a` | 112229401 |
| `onnx/decoder_model_merged_int8.onnx` | added | `11a3bb388acdcd2f22ee652e07f5f2b28a135600a8fd30d3fd351ca94464da51` | 56932498 |
| `onnx/decoder_model_merged_q4.onnx` | added | `342cf7ee96f3e7fde97f95c2426d04e3940bf1d466282f6e89dbef451480108f` | 138669687 |
| `onnx/decoder_model_merged_q4f16.onnx` | added | `5f904b4c151965bb8af156ee57a05740d6daa0b346e249ee4858fd312faaac57` | 76060952 |
| `onnx/decoder_model_merged_uint8.onnx` | added | `8d860d41195346de168b92a69d7d5c4f4253bb477add8477287244077bec2f51` | 56932523 |
| `onnx/decoder_model_q4.onnx` | added | `d689b99abb21f3e4f835ae7cf46a4e557453bdb91c1036bd1a2b5560bd419af0` | 138412430 |
| `onnx/decoder_model_q4f16.onnx` | added | `fa0afe12c41e89f7504902af83efa9d65888d062eb505bc693c3f300f9366536` | 75634419 |
| `onnx/decoder_model_uint8.onnx` | added | `8e1672191fdefcef738c3feb881fed634a92c9b38e6844f2428f1021738dd36a` | 177351416 |
| `onnx/decoder_with_past_model_bnb4.onnx` | added | `e2a2ebb0069afa690c8ffa7355f1c457fefbc09cd145fdfc366521472316504f` | 134977542 |
| `onnx/decoder_with_past_model_fp16.onnx` | added | `657dcf7192ceed71febbc66e81b4406ae3cf4e3d240b75d3965a0b237b9e5c51` | 105435716 |
| `onnx/decoder_with_past_model_int8.onnx` | added | `48ef8f79e3759e90c36537ecb600168bdee930d3d22ae05c3951232576227a22` | 174102598 |
| `onnx/decoder_with_past_model_q4.onnx` | added | `72025a1ffa599a794aa39d1442febfaac9d2e107c30f720156c5bb65ca047278` | 136353414 |
| `onnx/decoder_with_past_model_q4f16.onnx` | added | `fd15b9380c65d8cb671341885dbc1c2155b29888926f20f0b7fccda41bb79692` | 73788987 |
| `onnx/decoder_with_past_model_uint8.onnx` | added | `9103761bf24709af2eb1d0971cb032180accf36154a7a20d80fa2338d14a4897` | 174102616 |
| `onnx/encoder_model_bnb4.onnx` | added | `b48722f39f41f90fdc25da7717d09b1616b3b6ebac2fa76194f2bc2176f117c1` | 132839357 |
| `onnx/encoder_model_int8.onnx` | added | `8ce52ad2d78e124c74bb812c5db4eb1e06389affaac2dd1363e33f9ae71654b3` | 49669913 |
| `onnx/encoder_model_q4.onnx` | added | `8c21f39f09a07f8e28ad3f0985a814a7759c14e21c12fc43a541e2ec15f8974a` | 134018717 |
| `onnx/encoder_model_q4f16.onnx` | added | `0fa45b28ba93fc815b793355c64bb0f6422d532d4d5f0ca1a9a8c75dc71ae2bd` | 71787553 |
| `onnx/encoder_model_uint8.onnx` | added | `d4df7086e2ce569a7f9398d0c253a0b64df6523b32059e636b162a8676a42060` | 49669927 |
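Each entry above is a Git LFS pointer rather than the model itself: a `version` line, an `oid sha256:` line and a `size` line in bytes. A small parser makes it easy to compare pointers, for instance the old and new `decoder_model_merged_fp16.onnx`. This is a minimal sketch; the names `LfsPointer` and `parse_lfs_pointer` are hypothetical, and only the three keys seen in this diff are handled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LfsPointer:
    version: str
    oid: str   # hex digest without the "sha256:" prefix
    size: int  # size of the real file in bytes


def parse_lfs_pointer(text: str) -> LfsPointer:
    """Parse the text of a Git LFS pointer file (key, space, value per line)."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        key, _, value = line.partition(" ")
        fields[key] = value
    algo, _, digest = fields["oid"].partition(":")
    if algo != "sha256":
        raise ValueError(f"unsupported oid algorithm: {algo!r}")
    return LfsPointer(fields["version"], digest, int(fields["size"]))


# Usage: the replaced fp16 pointer from this PR.
new_fp16 = parse_lfs_pointer(
    "version https://git-lfs.github.com/spec/v1\n"
    "oid sha256:0d0c9cf530de4308d319d08d29ef98f2113841b39d9f0e2a651d993c4842043a\n"
    "size 112229401\n"
)
```

Here `new_fp16.size - 112223807` is 5594, so the replacement file is 5594 bytes larger than the invalid one it supersedes.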