Add/update the quantized ONNX model files and README.md for Transformers.js v3
## Applied Quantizations
### ✅ Based on `decoder_model.onnx` *with* slimming
↳ ✅ `fp16`: `decoder_model_fp16.onnx` (added)
↳ ✅ `int8`: `decoder_model_int8.onnx` (added)
↳ ✅ `uint8`: `decoder_model_uint8.onnx` (added)
↳ ✅ `q4`: `decoder_model_q4.onnx` (added)
↳ ✅ `q4f16`: `decoder_model_q4f16.onnx` (added)
↳ ✅ `bnb4`: `decoder_model_bnb4.onnx` (added)
### ✅ Based on `encoder_model.onnx` *with* slimming
↳ ✅ `int8`: `encoder_model_int8.onnx` (added)
↳ ✅ `uint8`: `encoder_model_uint8.onnx` (added)
↳ ✅ `q4`: `encoder_model_q4.onnx` (added)
↳ ✅ `q4f16`: `encoder_model_q4f16.onnx` (added)
↳ ✅ `bnb4`: `encoder_model_bnb4.onnx` (added)
### ✅ Based on `decoder_with_past_model.onnx` *with* slimming
↳ ✅ `fp16`: `decoder_with_past_model_fp16.onnx` (added)
↳ ✅ `int8`: `decoder_with_past_model_int8.onnx` (added)
↳ ✅ `uint8`: `decoder_with_past_model_uint8.onnx` (added)
↳ ✅ `q4`: `decoder_with_past_model_q4.onnx` (added)
↳ ✅ `q4f16`: `decoder_with_past_model_q4f16.onnx` (added)
↳ ✅ `bnb4`: `decoder_with_past_model_bnb4.onnx` (added)
### ❌ Based on `decoder_model_merged.onnx` *with* slimming
```
0%| | 0/1 [00:00<?, ?it/s]
Processing /tmp/tmpa34nu2qx/decoder_model_merged.onnx: 0%| | 0/1 [00:00<?, ?it/s]
0%| | 0/6 [00:00<?, ?it/s]
- Quantizing to fp16: 0%| | 0/6 [00:00<?, ?it/s]
/home/ubuntu/src/tjsmigration/transformers.js/scripts/float16.py:73: UserWarning: the float32 number 3.446366081405472e-09 will be truncated to 1e-07
warnings.warn(
/home/ubuntu/src/tjsmigration/transformers.js/scripts/float16.py:92: UserWarning: the float32 number -3.1080171769559684e-09 will be truncated to -1e-07
warnings.warn(
/home/ubuntu/src/tjsmigration/transformers.js/scripts/float16.py:92: UserWarning: the float32 number -7.953022418405453e-08 will be truncated to -1e-07
warnings.warn(
/home/ubuntu/src/tjsmigration/transformers.js/scripts/float16.py:85: UserWarning: the float32 number -3.4028234663852886e+38 will be truncated to -10000.0
warnings.warn(
/home/ubuntu/src/tjsmigration/transformers.js/scripts/float16.py:73: UserWarning: the float32 number 7.836803916916324e-08 will be truncated to 1e-07
warnings.warn(
/home/ubuntu/src/tjsmigration/transformers.js/scripts/float16.py:73: UserWarning: the float32 number 5.362019805943419e-08 will be truncated to 1e-07
warnings.warn(
/home/ubuntu/src/tjsmigration/transformers.js/scripts/float16.py:73: UserWarning: the float32 number 3.657325819972357e-08 will be truncated to 1e-07
warnings.warn(
/home/ubuntu/src/tjsmigration/transformers.js/scripts/float16.py:73: UserWarning: the float32 number 2.9455724526172844e-09 will be truncated to 1e-07
warnings.warn(
/home/ubuntu/src/tjsmigration/transformers.js/scripts/float16.py:73: UserWarning: the float32 number 1.700034957252683e-08 will be truncated to 1e-07
warnings.warn(
/home/ubuntu/src/tjsmigration/transformers.js/scripts/float16.py:92: UserWarning: the float32 number -9.196103434305769e-09 will be truncated to -1e-07
warnings.warn(
/home/ubuntu/src/tjsmigration/transformers.js/scripts/float16.py:73: UserWarning: the float32 number 7.955793535074918e-08 will be truncated to 1e-07
warnings.warn(
/home/ubuntu/src/tjsmigration/transformers.js/scripts/float16.py:92: UserWarning: the float32 number -5.602445440899828e-08 will be truncated to -1e-07
warnings.warn(
/home/ubuntu/src/tjsmigration/transformers.js/scripts/float16.py:73: UserWarning: the float32 number 3.788588642805735e-08 will be truncated to 1e-07
warnings.warn(
/home/ubuntu/src/tjsmigration/transformers.js/scripts/float16.py:73: UserWarning: the float32 number 5.940266945003714e-08 will be truncated to 1e-07
warnings.warn(
/home/ubuntu/src/tjsmigration/transformers.js/scripts/float16.py:92: UserWarning: the float32 number -7.774109178626532e-08 will be truncated to -1e-07
warnings.warn(
/home/ubuntu/src/tjsmigration/transformers.js/scripts/float16.py:92: UserWarning: the float32 number -1.7319222722633754e-09 will be truncated to -1e-07
warnings.warn(
/home/ubuntu/src/tjsmigration/transformers.js/scripts/float16.py:73: UserWarning: the float32 number 6.268315200230745e-09 will be truncated to 1e-07
warnings.warn(
/home/ubuntu/src/tjsmigration/transformers.js/scripts/float16.py:73: UserWarning: the float32 number 8.321229216790016e-08 will be truncated to 1e-07
warnings.warn(
/home/ubuntu/src/tjsmigration/transformers.js/scripts/float16.py:73: UserWarning: the float32 number 3.1365843256025983e-09 will be truncated to 1e-07
warnings.warn(
/home/ubuntu/src/tjsmigration/transformers.js/scripts/float16.py:92: UserWarning: the float32 number -1.698215967849137e-08 will be truncated to -1e-07
warnings.warn(
/home/ubuntu/src/tjsmigration/transformers.js/scripts/float16.py:73: UserWarning: the float32 number 9.575310144782634e-08 will be truncated to 1e-07
warnings.warn(
/home/ubuntu/src/tjsmigration/transformers.js/scripts/float16.py:92: UserWarning: the float32 number -8.730299327908142e-08 will be truncated to -1e-07
warnings.warn(
/home/ubuntu/src/tjsmigration/transformers.js/scripts/float16.py:73: UserWarning: the float32 number 5.570460714920955e-08 will be truncated to 1e-07
warnings.warn(
/home/ubuntu/src/tjsmigration/transformers.js/scripts/float16.py:92: UserWarning: the float32 number -8.719653266098248e-08 will be truncated to -1e-07
warnings.warn(
/home/ubuntu/src/tjsmigration/transformers.js/scripts/float16.py:92: UserWarning: the float32 number -6.365218752080182e-08 will be truncated to -1e-07
warnings.warn(
/home/ubuntu/src/tjsmigration/transformers.js/scripts/float16.py:73: UserWarning: the float32 number 1.1447780501327998e-08 will be truncated to 1e-07
warnings.warn(
/home/ubuntu/src/tjsmigration/transformers.js/scripts/float16.py:73: UserWarning: the float32 number 6.171065791704677e-08 will be truncated to 1e-07
warnings.warn(
/home/ubuntu/src/tjsmigration/transformers.js/scripts/float16.py:92: UserWarning: the float32 number -2.451006864134797e-08 will be truncated to -1e-07
warnings.warn(
/home/ubuntu/src/tjsmigration/transformers.js/scripts/float16.py:73: UserWarning: the float32 number 1.4866670561275441e-08 will be truncated to 1e-07
warnings.warn(
/home/ubuntu/src/tjsmigration/transformers.js/scripts/float16.py:73: UserWarning: the float32 number 3.4402241055886407e-08 will be truncated to 1e-07
warnings.warn(
/home/ubuntu/src/tjsmigration/transformers.js/scripts/float16.py:92: UserWarning: the float32 number -8.671668894066897e-08 will be truncated to -1e-07
warnings.warn(
- Quantizing to fp16: 0%| | 0/6 [00:04<?, ?it/s]
Processing /tmp/tmpa34nu2qx/decoder_model_merged.onnx: 0%| | 0/1 [00:04<?, ?it/s]
Traceback (most recent call last):
File "<frozen runpy>", line 198, in _run_module_as_main
File "<frozen runpy>", line 88, in _run_code
File "/home/ubuntu/src/tjsmigration/transformers.js/scripts/quantize.py", line 377, in <module>
main()
File "/home/ubuntu/src/tjsmigration/transformers.js/scripts/quantize.py", line 374, in main
quantize(input_folder, output_folder, quantization_args)
File "/home/ubuntu/src/tjsmigration/transformers.js/scripts/quantize.py", line 309, in quantize
quantize_fp16(
File "/home/ubuntu/src/tjsmigration/transformers.js/scripts/quantize.py", line 223, in quantize_fp16
check_and_save_model(model_fp16, save_path)
File "/home/ubuntu/src/tjsmigration/transformers.js/scripts/utils.py", line 29, in check_and_save_model
strict_check_model(model)
File "/home/ubuntu/src/tjsmigration/transformers.js/scripts/utils.py", line 21, in strict_check_model
raise e
File "/home/ubuntu/src/tjsmigration/transformers.js/scripts/utils.py", line 16, in strict_check_model
onnx.checker.check_model(model_or_path, full_check=True)
File "/home/ubuntu/.cache/uv/archive-v0/7hYcxZ8pwavXeKpAYRaHY/lib/python3.12/site-packages/onnx/checker.py", line 179, in check_model
C.check_model(
onnx.onnx_cpp2py_export.shape_inference.InferenceError: [ShapeInferenceError] Inference error(s): (op_type:If, node name: optimum::if): [ShapeInferenceError] Inference error(s): (op_type:Add, node name: /model/decoder/embed_positions/Add): [ShapeInferenceError] Inferred shape and existing shape differ in rank: (1) vs (0)
```
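The `UserWarning` lines in the log above report float32 values being clamped during the fp16 conversion. A minimal sketch of that kind of clamping, assuming the bounds `1e-07` and `10000.0` that appear in the warnings; the function name `clamp_for_fp16` and its parameters are hypothetical and are not taken from `float16.py`:

```python
import math


def clamp_for_fp16(value: float, min_positive: float = 1e-7, max_finite: float = 1e4) -> float:
    """Clamp a float32 value into the range suggested by the warnings in the log.

    The default bounds are assumptions read off the log messages, not constants
    copied from float16.py.
    """
    if value == 0.0 or math.isnan(value):
        return value  # zero and NaN pass through unchanged
    magnitude = abs(value)
    sign = 1.0 if value > 0 else -1.0
    if magnitude < min_positive:
        return sign * min_positive  # tiny values are raised to +/- min_positive
    if magnitude > max_finite:
        return sign * max_finite  # huge values are capped at +/- max_finite
    return value


# Sample inputs copied from the warnings in the log, plus one in-range value.
for v in (3.446366081405472e-09, -7.953022418405453e-08, -3.4028234663852886e+38, 0.5):
    print(v, "->", clamp_for_fp16(v))
```

These warnings did not abort the run. The traceback ends in the `ShapeInferenceError` raised by `onnx.checker.check_model`, which is why the slimmed `decoder_model_merged.onnx` produced no output.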
### ✅ Based on `decoder_model_merged.onnx` *without* slimming
↳ ✅ `fp16`: `decoder_model_merged_fp16.onnx` (replaced because it was invalid)
↳ ✅ `int8`: `decoder_model_merged_int8.onnx` (added)
↳ ✅ `uint8`: `decoder_model_merged_uint8.onnx` (added)
↳ ✅ `q4`: `decoder_model_merged_q4.onnx` (added)
↳ ✅ `q4f16`: `decoder_model_merged_q4f16.onnx` (added)
↳ ✅ `bnb4`: `decoder_model_merged_bnb4.onnx` (added)
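
The file names in the lists above follow a `<base>_<dtype>.onnx` pattern. A short sketch that derives the expected paths from that pattern, which may help when checking that the `onnx/` folder contains every variant. The helper names `quantized_filename` and `expected_files` and the `EXPECTED` table are illustrative; the table only restates the lists in this description:

```python
from pathlib import PurePosixPath

# dtypes produced per base model, as listed in this description
# (the encoder list has no fp16 entry).
EXPECTED = {
    "decoder_model": ["fp16", "int8", "uint8", "q4", "q4f16", "bnb4"],
    "encoder_model": ["int8", "uint8", "q4", "q4f16", "bnb4"],
    "decoder_with_past_model": ["fp16", "int8", "uint8", "q4", "q4f16", "bnb4"],
    "decoder_model_merged": ["fp16", "int8", "uint8", "q4", "q4f16", "bnb4"],
}


def quantized_filename(base: str, dtype: str, folder: str = "onnx") -> str:
    """Build the path of a quantized variant, e.g. onnx/decoder_model_q4.onnx."""
    return str(PurePosixPath(folder) / f"{base}_{dtype}.onnx")


def expected_files() -> list[str]:
    """Return every expected path, sorted alphabetically."""
    return sorted(
        quantized_filename(base, dtype)
        for base, dtypes in EXPECTED.items()
        for dtype in dtypes
    )


if __name__ == "__main__":
    for path in expected_files():
        print(path)
```

The sketch yields 23 paths, the same number of files as this commit touches.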
- onnx/decoder_model_bnb4.onnx +3 -0
- onnx/decoder_model_fp16.onnx +3 -0
- onnx/decoder_model_int8.onnx +3 -0
- onnx/decoder_model_merged_bnb4.onnx +3 -0
- onnx/decoder_model_merged_fp16.onnx +2 -2
- onnx/decoder_model_merged_int8.onnx +3 -0
- onnx/decoder_model_merged_q4.onnx +3 -0
- onnx/decoder_model_merged_q4f16.onnx +3 -0
- onnx/decoder_model_merged_uint8.onnx +3 -0
- onnx/decoder_model_q4.onnx +3 -0
- onnx/decoder_model_q4f16.onnx +3 -0
- onnx/decoder_model_uint8.onnx +3 -0
- onnx/decoder_with_past_model_bnb4.onnx +3 -0
- onnx/decoder_with_past_model_fp16.onnx +3 -0
- onnx/decoder_with_past_model_int8.onnx +3 -0
- onnx/decoder_with_past_model_q4.onnx +3 -0
- onnx/decoder_with_past_model_q4f16.onnx +3 -0
- onnx/decoder_with_past_model_uint8.onnx +3 -0
- onnx/encoder_model_bnb4.onnx +3 -0
- onnx/encoder_model_int8.onnx +3 -0
- onnx/encoder_model_q4.onnx +3 -0
- onnx/encoder_model_q4f16.onnx +3 -0
- onnx/encoder_model_uint8.onnx +3 -0
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:9c531fb2d9d5dd2d782d6a8c6a35a8dc897a840817e292f2a54512a6538f2c5a
+size 137496686
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:2ad24186687eede810770f7912dfff5a759df58965f1a0ea629b21908f098fe3
+size 112129664
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:41d2689cd22703a7e79b0cb10dc203560b1637f346fb8c047b0b5537c8769528
+size 178171870
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:2baec3a4bcbe46b00c649ebd0ae20ead199f758866f8ce8e960263ed8def4204
+size 137754375
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
-size
+oid sha256:b14bf9a027fe8644c61f8d6f200c755de7e945bc2d190d41b3c1fd59c91961e8
+size 112557721
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:bb817165dc950031733734a65604a519770cc0f033d41689a353cf2e4f70d220
+size 57097617
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:edfa222a2fd06f656d376258743b2c1b434ded32e8966ecba0d9473706691f15
+size 139326327
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:20067b2403b93389693c6da10200c637c0a16974693182df0c3183bd1f0013c4
+size 76389272
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:99ceaddc3dd3e04d1f628fad1f4f42e381872ab472cc4fcb25f8ddae9d1e664d
+size 57097649
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:7dc58151ae7079fec0f4cb353f7dccb7724ffdae84ff4b2d0b1a809cc50479b1
+size 139069070
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:8deff515dd4907593a57b1ba54511e9ce3f2279c7884d289afb4e36e065da1c8
+size 75962739
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:925707f810d8f07e70389949916bca3d17fc28f061082599c0e80a6e2abdbc76
+size 178171902
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:b9ea2817217a69a54204a4757e2d6b0ff94461e7cd5d6b3242f33684a9a14faf
+size 135633914
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:19d46837d3518b83a8a1076474db2f7e11e067d4f85513cddc7aa1271e270823
+size 105763712
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:f93e2ac2b8f3fe1560e26ddada37c6fe4101d58a575ff043f8f72bdd00d238b2
+size 174922809
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:2174162dce7eccd70707e5816427e94af58547f4398e8333ef831b831ad9073c
+size 137009786
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:b7c60b54ccd7f140298b2193cde2a26890c012ae191af31be7f45ca58742a942
+size 74116983
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:7a285bbe47abcf2924e7b45fac18f5ee63d8feb459eeff4397f48773800f9dd8
+size 174922834
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:7be88f8b13dc393261b676b8b2e0e553a58c8d7e343c84c5e3dee71901b35399
+size 133494717
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:1f58d5eaa606886b5a4eafeee7617f01677ac929c2cb4015a363a951e123d93e
+size 49833752
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:8601287665e28f1eff0cef4bb239e03122e8bdf5420618ce5d2a44c8d265ccd2
+size 134674077
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:d1114bb2fde482d5ac36facef6d118f30992340c22a23ca95123de1f0a4b6454
+size 72115233
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:46e7271e1741af9ea99c8627d0e4aad37287214d5efdadba0652a270ea19e772
+size 49833772
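
Each hunk above adds or updates a Git LFS pointer, not the model weights themselves: three `key value` lines giving the spec version, the SHA-256 object id, and the size in bytes. A small sketch that parses such a pointer; the function name `parse_lfs_pointer` is hypothetical, and the sample text is the first pointer in this diff:

```python
def parse_lfs_pointer(text: str) -> dict:
    """Parse a Git LFS pointer file into its fields.

    Returns a dict with 'version' (str), 'oid' (hex digest without the
    'sha256:' prefix) and 'size' (int, in bytes).
    """
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    algo, _, digest = fields["oid"].partition(":")
    if algo != "sha256" or len(digest) != 64:
        raise ValueError(f"unexpected oid: {fields['oid']!r}")
    return {"version": fields["version"], "oid": digest, "size": int(fields["size"])}


pointer = """\
version https://git-lfs.github.com/spec/v1
oid sha256:9c531fb2d9d5dd2d782d6a8c6a35a8dc897a840817e292f2a54512a6538f2c5a
size 137496686
"""

info = parse_lfs_pointer(pointer)
# Print a short digest prefix and the size in megabytes.
print(info["oid"][:12], f"{info['size'] / 1e6:.1f} MB")
```

Comparing the parsed `size` values is a quick way to see how much each quantization shrinks a given base model without downloading the weights.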