Add/update the quantized ONNX model files and README.md for Transformers.js v3
Browse files## Applied Quantizations
### β
Based on `decoder_model.onnx` *with* slimming
β³ β
`fp16`: `decoder_model_fp16.onnx` (added)
β³ β
`int8`: `decoder_model_int8.onnx` (added)
β³ β
`uint8`: `decoder_model_uint8.onnx` (added)
β³ β
`q4`: `decoder_model_q4.onnx` (added)
β³ β
`q4f16`: `decoder_model_q4f16.onnx` (added)
β³ β
`bnb4`: `decoder_model_bnb4.onnx` (added)
### β
Based on `decoder_model.onnx` *with* slimming
β³ β
`fp16`: `decoder_model_fp16.onnx` (added)
β³ β
`int8`: `decoder_model_int8.onnx` (added)
β³ β
`uint8`: `decoder_model_uint8.onnx` (added)
β³ β
`q4`: `decoder_model_q4.onnx` (added)
β³ β
`q4f16`: `decoder_model_q4f16.onnx` (added)
β³ β
`bnb4`: `decoder_model_bnb4.onnx` (added)
### β
Based on `encoder_model.onnx` *with* slimming
β³ β
`int8`: `encoder_model_int8.onnx` (added)
β³ β
`uint8`: `encoder_model_uint8.onnx` (added)
β³ β
`q4`: `encoder_model_q4.onnx` (added)
β³ β
`q4f16`: `encoder_model_q4f16.onnx` (added)
β³ β
`bnb4`: `encoder_model_bnb4.onnx` (added)
### β
Based on `encoder_model.onnx` *with* slimming
β³ β
`int8`: `encoder_model_int8.onnx` (added)
β³ β
`uint8`: `encoder_model_uint8.onnx` (added)
β³ β
`q4`: `encoder_model_q4.onnx` (added)
β³ β
`q4f16`: `encoder_model_q4f16.onnx` (added)
β³ β
`bnb4`: `encoder_model_bnb4.onnx` (added)
### β
Based on `decoder_with_past_model.onnx` *with* slimming
β³ β
`fp16`: `decoder_with_past_model_fp16.onnx` (added)
β³ β
`int8`: `decoder_with_past_model_int8.onnx` (added)
β³ β
`uint8`: `decoder_with_past_model_uint8.onnx` (added)
β³ β
`q4`: `decoder_with_past_model_q4.onnx` (added)
β³ β
`q4f16`: `decoder_with_past_model_q4f16.onnx` (added)
β³ β
`bnb4`: `decoder_with_past_model_bnb4.onnx` (added)
### β
Based on `decoder_with_past_model.onnx` *with* slimming
β³ β
`fp16`: `decoder_with_past_model_fp16.onnx` (added)
β³ β
`int8`: `decoder_with_past_model_int8.onnx` (added)
β³ β
`uint8`: `decoder_with_past_model_uint8.onnx` (added)
β³ β
`q4`: `decoder_with_past_model_q4.onnx` (added)
β³ β
`q4f16`: `decoder_with_past_model_q4f16.onnx` (added)
β³ β
`bnb4`: `decoder_with_past_model_bnb4.onnx` (added)
### β Based on `decoder_model_merged.onnx` *with* slimming
```
0%| | 0/1 [00:00<?, ?it/s]
Processing /tmp/tmp59xkksu4/decoder_model_merged.onnx: 0%| | 0/1 [00:00<?, ?it/s]
0%| | 0/6 [00:00<?, ?it/s][A
- Quantizing to fp16: 0%| | 0/6 [00:00<?, ?it/s][A/home/ubuntu/src/tjsmigration/transformers.js/scripts/float16.py:73: UserWarning: the float32 number 1.6495648180736566e-09 will be truncated to 1e-07
warnings.warn(
/home/ubuntu/src/tjsmigration/transformers.js/scripts/float16.py:92: UserWarning: the float32 number -1.3363538053923207e-09 will be truncated to -1e-07
warnings.warn(
/home/ubuntu/src/tjsmigration/transformers.js/scripts/float16.py:92: UserWarning: the float32 number -1.7515723982342024e-08 will be truncated to -1e-07
warnings.warn(
/home/ubuntu/src/tjsmigration/transformers.js/scripts/float16.py:73: UserWarning: the float32 number 4.4199801152444707e-08 will be truncated to 1e-07
warnings.warn(
/home/ubuntu/src/tjsmigration/transformers.js/scripts/float16.py:92: UserWarning: the float32 number -4.350173909983823e-08 will be truncated to -1e-07
warnings.warn(
/home/ubuntu/src/tjsmigration/transformers.js/scripts/float16.py:85: UserWarning: the float32 number -3.4028234663852886e+38 will be truncated to -10000.0
warnings.warn(
/home/ubuntu/src/tjsmigration/transformers.js/scripts/float16.py:73: UserWarning: the float32 number 8.183863542399195e-08 will be truncated to 1e-07
warnings.warn(
/home/ubuntu/src/tjsmigration/transformers.js/scripts/float16.py:73: UserWarning: the float32 number 4.766229011465839e-08 will be truncated to 1e-07
warnings.warn(
/home/ubuntu/src/tjsmigration/transformers.js/scripts/float16.py:73: UserWarning: the float32 number 4.0666400025202165e-08 will be truncated to 1e-07
warnings.warn(
/home/ubuntu/src/tjsmigration/transformers.js/scripts/float16.py:92: UserWarning: the float32 number -8.20791115074826e-08 will be truncated to -1e-07
warnings.warn(
/home/ubuntu/src/tjsmigration/transformers.js/scripts/float16.py:92: UserWarning: the float32 number -6.626061122005922e-08 will be truncated to -1e-07
warnings.warn(
/home/ubuntu/src/tjsmigration/transformers.js/scripts/float16.py:92: UserWarning: the float32 number -1.7874123514616258e-08 will be truncated to -1e-07
warnings.warn(
/home/ubuntu/src/tjsmigration/transformers.js/scripts/float16.py:73: UserWarning: the float32 number 5.131577651695807e-08 will be truncated to 1e-07
warnings.warn(
/home/ubuntu/src/tjsmigration/transformers.js/scripts/float16.py:92: UserWarning: the float32 number -2.5224631272635634e-08 will be truncated to -1e-07
warnings.warn(
/home/ubuntu/src/tjsmigration/transformers.js/scripts/float16.py:73: UserWarning: the float32 number 4.904357808754867e-09 will be truncated to 1e-07
warnings.warn(
/home/ubuntu/src/tjsmigration/transformers.js/scripts/float16.py:73: UserWarning: the float32 number 5.929094015755254e-08 will be truncated to 1e-07
warnings.warn(
/home/ubuntu/src/tjsmigration/transformers.js/scripts/float16.py:92: UserWarning: the float32 number -9.692475799738531e-08 will be truncated to -1e-07
warnings.warn(
/home/ubuntu/src/tjsmigration/transformers.js/scripts/float16.py:92: UserWarning: the float32 number -6.248178152645778e-08 will be truncated to -1e-07
warnings.warn(
/home/ubuntu/src/tjsmigration/transformers.js/scripts/float16.py:73: UserWarning: the float32 number 3.684704763529112e-09 will be truncated to 1e-07
warnings.warn(
/home/ubuntu/src/tjsmigration/transformers.js/scripts/float16.py:73: UserWarning: the float32 number 7.506191224138092e-09 will be truncated to 1e-07
warnings.warn(
/home/ubuntu/src/tjsmigration/transformers.js/scripts/float16.py:92: UserWarning: the float32 number -3.7239978212255664e-08 will be truncated to -1e-07
warnings.warn(
/home/ubuntu/src/tjsmigration/transformers.js/scripts/float16.py:73: UserWarning: the float32 number 6.604170721402625e-08 will be truncated to 1e-07
warnings.warn(
/home/ubuntu/src/tjsmigration/transformers.js/scripts/float16.py:73: UserWarning: the float32 number 9.33914350298437e-08 will be truncated to 1e-07
warnings.warn(
/home/ubuntu/src/tjsmigration/transformers.js/scripts/float16.py:92: UserWarning: the float32 number -4.449031720810126e-08 will be truncated to -1e-07
warnings.warn(
/home/ubuntu/src/tjsmigration/transformers.js/scripts/float16.py:73: UserWarning: the float32 number 9.942363021764322e-08 will be truncated to 1e-07
warnings.warn(
/home/ubuntu/src/tjsmigration/transformers.js/scripts/float16.py:92: UserWarning: the float32 number -3.938788140089855e-08 will be truncated to -1e-07
warnings.warn(
/home/ubuntu/src/tjsmigration/transformers.js/scripts/float16.py:73: UserWarning: the float32 number 9.286748081649421e-08 will be truncated to 1e-07
warnings.warn(
/home/ubuntu/src/tjsmigration/transformers.js/scripts/float16.py:92: UserWarning: the float32 number -4.0659259070707776e-08 will be truncated to -1e-07
warnings.warn(
/home/ubuntu/src/tjsmigration/transformers.js/scripts/float16.py:73: UserWarning: the float32 number 3.205246557058672e-08 will be truncated to 1e-07
warnings.warn(
/home/ubuntu/src/tjsmigration/transformers.js/scripts/float16.py:92: UserWarning: the float32 number -6.007522301842982e-08 will be truncated to -1e-07
warnings.warn(
- Quantizing to fp16: 0%| | 0/6 [00:04<?, ?it/s]
Processing /tmp/tmp59xkksu4/decoder_model_merged.onnx: 0%| | 0/1 [00:04<?, ?it/s]
Traceback (most recent call last):
File "<frozen runpy>", line 198, in _run_module_as_main
File "<frozen runpy>", line 88, in _run_code
File "/home/ubuntu/src/tjsmigration/transformers.js/scripts/quantize.py", line 377, in <module>
main()
File "/home/ubuntu/src/tjsmigration/transformers.js/scripts/quantize.py", line 374, in main
quantize(input_folder, output_folder, quantization_args)
File "/home/ubuntu/src/tjsmigration/transformers.js/scripts/quantize.py", line 309, in quantize
quantize_fp16(
File "/home/ubuntu/src/tjsmigration/transformers.js/scripts/quantize.py", line 223, in quantize_fp16
check_and_save_model(model_fp16, save_path)
File "/home/ubuntu/src/tjsmigration/transformers.js/scripts/utils.py", line 29, in check_and_save_model
strict_check_model(model)
File "/home/ubuntu/src/tjsmigration/transformers.js/scripts/utils.py", line 21, in strict_check_model
raise e
File "/home/ubuntu/src/tjsmigration/transformers.js/scripts/utils.py", line 16, in strict_check_model
onnx.checker.check_model(model_or_path, full_check=True)
File "/home/ubuntu/.cache/uv/archive-v0/7hYcxZ8pwavXeKpAYRaHY/lib/python3.12/site-packages/onnx/checker.py", line 179, in check_model
C.check_model(
onnx.onnx_cpp2py_export.shape_inference.InferenceError: [ShapeInferenceError] Inference error(s): (op_type:If, node name: optimum::if): [ShapeInferenceError] Inference error(s): (op_type:Add, node name: /model/decoder/embed_positions/Add): [ShapeInferenceError] Inferred shape and existing shape differ in rank: (1) vs (0)
```
### β
Based on `decoder_model_merged.onnx` *without* slimming
β³ β
`fp16`: `decoder_model_merged_fp16.onnx` (replaced because it was invalid)
β³ β
`int8`: `decoder_model_merged_int8.onnx` (added)
β³ β
`uint8`: `decoder_model_merged_uint8.onnx` (added)
β³ β
`q4`: `decoder_model_merged_q4.onnx` (added)
β³ β
`q4f16`: `decoder_model_merged_q4f16.onnx` (added)
β³ β
`bnb4`: `decoder_model_merged_bnb4.onnx` (added)
- onnx/decoder_model_bnb4.onnx +3 -0
- onnx/decoder_model_fp16.onnx +3 -0
- onnx/decoder_model_int8.onnx +3 -0
- onnx/decoder_model_merged_bnb4.onnx +3 -0
- onnx/decoder_model_merged_fp16.onnx +2 -2
- onnx/decoder_model_merged_int8.onnx +3 -0
- onnx/decoder_model_merged_q4.onnx +3 -0
- onnx/decoder_model_merged_q4f16.onnx +3 -0
- onnx/decoder_model_merged_uint8.onnx +3 -0
- onnx/decoder_model_q4.onnx +3 -0
- onnx/decoder_model_q4f16.onnx +3 -0
- onnx/decoder_model_uint8.onnx +3 -0
- onnx/decoder_with_past_model_bnb4.onnx +3 -0
- onnx/decoder_with_past_model_fp16.onnx +3 -0
- onnx/decoder_with_past_model_int8.onnx +3 -0
- onnx/decoder_with_past_model_q4.onnx +3 -0
- onnx/decoder_with_past_model_q4f16.onnx +3 -0
- onnx/decoder_with_past_model_uint8.onnx +3 -0
- onnx/encoder_model_bnb4.onnx +3 -0
- onnx/encoder_model_int8.onnx +3 -0
- onnx/encoder_model_q4.onnx +3 -0
- onnx/encoder_model_q4f16.onnx +3 -0
- onnx/encoder_model_uint8.onnx +3 -0
|
@@ -0,0 +1,3 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
version https://git-lfs.github.com/spec/v1
|
| 2 |
+
oid sha256:a808360400e5f76f3de53221d1150206b65a2137ec73d501d534ebc140e942d8
|
| 3 |
+
size 143562398
|
|
@@ -0,0 +1,3 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
version https://git-lfs.github.com/spec/v1
|
| 2 |
+
oid sha256:1a09d9745712995816e7914a68702b32a8f614b9395b907f823b75b5a3f8725e
|
| 3 |
+
size 115162520
|
|
@@ -0,0 +1,3 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
version https://git-lfs.github.com/spec/v1
|
| 2 |
+
oid sha256:19c53f9c92b0dc0782b0c3355712d05a41594e70b205de1ea905c2783995c20c
|
| 3 |
+
size 185751054
|
|
@@ -0,0 +1,3 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
version https://git-lfs.github.com/spec/v1
|
| 2 |
+
oid sha256:f3759082abff98e02500eb0322f6136588924933655cd5708afa5838034f6460
|
| 3 |
+
size 143820087
|
|
@@ -1,3 +1,3 @@
|
|
| 1 |
version https://git-lfs.github.com/spec/v1
|
| 2 |
-
oid sha256:
|
| 3 |
-
size
|
|
|
|
| 1 |
version https://git-lfs.github.com/spec/v1
|
| 2 |
+
oid sha256:21118338890d113b1c0f13cbddc10851667ba8ea61d8f1c8560505e7bf596b11
|
| 3 |
+
size 115590577
|
|
@@ -0,0 +1,3 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
version https://git-lfs.github.com/spec/v1
|
| 2 |
+
oid sha256:3951f7c80286b6fbfa932665c018653bbe8c554f2cbd33011d3b36dead64bbd4
|
| 3 |
+
size 58622913
|
|
@@ -0,0 +1,3 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
version https://git-lfs.github.com/spec/v1
|
| 2 |
+
oid sha256:8fd202a712a6b43b1dd8bf24115cc007144eaa8ad1d23bf52c31cc2998bc4412
|
| 3 |
+
size 145392039
|
|
@@ -0,0 +1,3 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
version https://git-lfs.github.com/spec/v1
|
| 2 |
+
oid sha256:8498f247bd7fc33d36fcc028c8b3f426a797574467fac560b400fac0ce506dc8
|
| 3 |
+
size 79422128
|
|
@@ -0,0 +1,3 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
version https://git-lfs.github.com/spec/v1
|
| 2 |
+
oid sha256:4f5cc5984d89a5161483d1c85238648944ffb45c71f625b9e743b3b5abd665ef
|
| 3 |
+
size 58622938
|
|
@@ -0,0 +1,3 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
version https://git-lfs.github.com/spec/v1
|
| 2 |
+
oid sha256:a35f6899a7522f7b707c00e30efd71a920e0ec1ba84c1489730b3c7dd4e0ce2b
|
| 3 |
+
size 145134782
|
|
@@ -0,0 +1,3 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
version https://git-lfs.github.com/spec/v1
|
| 2 |
+
oid sha256:fffc98331e67e1b57110f6c610ed2552b4b10bfda5fc8b32ac7bd3a34ac3c2cb
|
| 3 |
+
size 78995595
|
|
@@ -0,0 +1,3 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
version https://git-lfs.github.com/spec/v1
|
| 2 |
+
oid sha256:5043dce35f8e699a922e51b3ef88a13ecc008b732e266a5d631d7e8adb8003ae
|
| 3 |
+
size 185751079
|
|
@@ -0,0 +1,3 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
version https://git-lfs.github.com/spec/v1
|
| 2 |
+
oid sha256:d606ae777de8b76d1c052e47d797b24985e2070e72e9ae12db6758641ca1c4c3
|
| 3 |
+
size 141699626
|
|
@@ -0,0 +1,3 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
version https://git-lfs.github.com/spec/v1
|
| 2 |
+
oid sha256:080a2e45db1f7635d08d48a0a18383af03a10bda3485e3a224032dd5a8accf79
|
| 3 |
+
size 108796568
|
|
@@ -0,0 +1,3 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
version https://git-lfs.github.com/spec/v1
|
| 2 |
+
oid sha256:ae8d4c0632f683a6442572b020661f9ae0800fed5487dae0c45ef8a116e9066c
|
| 3 |
+
size 182501993
|
|
@@ -0,0 +1,3 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
version https://git-lfs.github.com/spec/v1
|
| 2 |
+
oid sha256:8d6b00255a2d978a366618512ac8c83c99c4740e33b4df77f62019edf4034bc9
|
| 3 |
+
size 143075498
|
|
@@ -0,0 +1,3 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
version https://git-lfs.github.com/spec/v1
|
| 2 |
+
oid sha256:c55879770611111312cba620b3a98640ae947120b0d646382229d2af69a6c5da
|
| 3 |
+
size 77149839
|
|
@@ -0,0 +1,3 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
version https://git-lfs.github.com/spec/v1
|
| 2 |
+
oid sha256:6475b98fef623da8623492ebd8296a85fe1b5c29c6c1bfc5c508ba153b612340
|
| 3 |
+
size 182502016
|
|
@@ -0,0 +1,3 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
version https://git-lfs.github.com/spec/v1
|
| 2 |
+
oid sha256:399e7a485834e1287ef27dff2e5f9ab57e524aa5c97a5cc915e1da86b4146f15
|
| 3 |
+
size 139548605
|
|
@@ -0,0 +1,3 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
version https://git-lfs.github.com/spec/v1
|
| 2 |
+
oid sha256:c576f57d3d4ccbbb15a4f27068f160edf0e421ca597667771e933be99f269b63
|
| 3 |
+
size 51347224
|
|
@@ -0,0 +1,3 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
version https://git-lfs.github.com/spec/v1
|
| 2 |
+
oid sha256:5f556893d1f0e0e1fb7fc683516e4f2639ffad2ce001ec616e8db57a1e5282a2
|
| 3 |
+
size 140727965
|
|
@@ -0,0 +1,3 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
version https://git-lfs.github.com/spec/v1
|
| 2 |
+
oid sha256:94babbc030b48d75eced0ea03aacf0e94ea47e0457cc901a1980fa93134b0298
|
| 3 |
+
size 75142177
|
|
@@ -0,0 +1,3 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
version https://git-lfs.github.com/spec/v1
|
| 2 |
+
oid sha256:94e3ac7206f1826d703839eed70beb5ce965b063f9d89e0dc4993874fb8e8f78
|
| 3 |
+
size 51347243
|