Add/update the quantized ONNX model files and README.md for Transformers.js v3

#1
by whitphx - opened

Applied Quantizations

✅ Based on decoder_model.onnx with slimming

↳ ✅ fp16: decoder_model_fp16.onnx (added)
↳ ✅ int8: decoder_model_int8.onnx (added)
↳ ✅ uint8: decoder_model_uint8.onnx (added)
↳ ✅ q4: decoder_model_q4.onnx (added)
↳ ✅ q4f16: decoder_model_q4f16.onnx (added)
↳ ✅ bnb4: decoder_model_bnb4.onnx (added)
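Every variant in these lists is named by appending `_<dtype>` to the source file's stem. Below is a minimal Python sketch of that convention, which can be used to check that a repo contains the expected files. `variant_filenames` and `missing_variants` are hypothetical helpers. They are not part of the transformers.js scripts.

```python
from pathlib import Path

# dtype suffixes as they appear in this report
ALL_DTYPES = ["fp16", "int8", "uint8", "q4", "q4f16", "bnb4"]


def variant_filenames(source: str, dtypes: list[str]) -> list[str]:
    """Return quantized-variant filenames, assuming the `<stem>_<dtype>.onnx` pattern."""
    stem = Path(source).stem  # "decoder_model.onnx" -> "decoder_model"
    return [f"{stem}_{dtype}.onnx" for dtype in dtypes]


def missing_variants(existing: set[str], source: str, dtypes: list[str]) -> list[str]:
    """List the expected variant files that are not in `existing`."""
    return [name for name in variant_filenames(source, dtypes) if name not in existing]


if __name__ == "__main__":
    print(variant_filenames("decoder_model.onnx", ALL_DTYPES))
    # encoder_model.onnx has no fp16 variant in this report
    print(variant_filenames("encoder_model.onnx", ALL_DTYPES[1:]))
```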

✅ Based on encoder_model.onnx with slimming

↳ ✅ int8: encoder_model_int8.onnx (added)
↳ ✅ uint8: encoder_model_uint8.onnx (added)
↳ ✅ q4: encoder_model_q4.onnx (added)
↳ ✅ q4f16: encoder_model_q4f16.onnx (added)
↳ ✅ bnb4: encoder_model_bnb4.onnx (added)

✅ Based on decoder_with_past_model.onnx with slimming

↳ ✅ fp16: decoder_with_past_model_fp16.onnx (added)
↳ ✅ int8: decoder_with_past_model_int8.onnx (added)
↳ ✅ uint8: decoder_with_past_model_uint8.onnx (added)
↳ ✅ q4: decoder_with_past_model_q4.onnx (added)
↳ ✅ q4f16: decoder_with_past_model_q4f16.onnx (added)
↳ ✅ bnb4: decoder_with_past_model_bnb4.onnx (added)

❌ Based on decoder_model_merged.onnx with slimming

Processing /tmp/tmpqopnw9c8/decoder_model_merged.onnx
 - Quantizing to fp16:
/home/ubuntu/src/tjsmigration/transformers.js/scripts/float16.py:73: UserWarning: the float32 number 6.1294223030472494e-09 will be truncated to 1e-07
  warnings.warn(
/home/ubuntu/src/tjsmigration/transformers.js/scripts/float16.py:92: UserWarning: the float32 number -5.751064069414724e-09 will be truncated to -1e-07
  warnings.warn(
/home/ubuntu/src/tjsmigration/transformers.js/scripts/float16.py:92: UserWarning: the float32 number -9.619665775062458e-08 will be truncated to -1e-07
  warnings.warn(
/home/ubuntu/src/tjsmigration/transformers.js/scripts/float16.py:85: UserWarning: the float32 number -3.4028234663852886e+38 will be truncated to -10000.0
  warnings.warn(
/home/ubuntu/src/tjsmigration/transformers.js/scripts/float16.py:92: UserWarning: the float32 number -1.2204069754773172e-08 will be truncated to -1e-07
  warnings.warn(
/home/ubuntu/src/tjsmigration/transformers.js/scripts/float16.py:92: UserWarning: the float32 number -3.845694607207406e-08 will be truncated to -1e-07
  warnings.warn(
/home/ubuntu/src/tjsmigration/transformers.js/scripts/float16.py:73: UserWarning: the float32 number 5.351443022050262e-08 will be truncated to 1e-07
  warnings.warn(
/home/ubuntu/src/tjsmigration/transformers.js/scripts/float16.py:92: UserWarning: the float32 number -1.2672708216143747e-08 will be truncated to -1e-07
  warnings.warn(
/home/ubuntu/src/tjsmigration/transformers.js/scripts/float16.py:73: UserWarning: the float32 number 1.4000772097233494e-08 will be truncated to 1e-07
  warnings.warn(
/home/ubuntu/src/tjsmigration/transformers.js/scripts/float16.py:92: UserWarning: the float32 number -2.3795612591470672e-08 will be truncated to -1e-07
  warnings.warn(
/home/ubuntu/src/tjsmigration/transformers.js/scripts/float16.py:73: UserWarning: the float32 number 5.06032336033968e-08 will be truncated to 1e-07
  warnings.warn(
/home/ubuntu/src/tjsmigration/transformers.js/scripts/float16.py:73: UserWarning: the float32 number 8.698530251649572e-08 will be truncated to 1e-07
  warnings.warn(
/home/ubuntu/src/tjsmigration/transformers.js/scripts/float16.py:73: UserWarning: the float32 number 4.554648569410347e-08 will be truncated to 1e-07
  warnings.warn(
/home/ubuntu/src/tjsmigration/transformers.js/scripts/float16.py:92: UserWarning: the float32 number -4.152644805799355e-08 will be truncated to -1e-07
  warnings.warn(
/home/ubuntu/src/tjsmigration/transformers.js/scripts/float16.py:73: UserWarning: the float32 number 3.445883933750338e-08 will be truncated to 1e-07
  warnings.warn(
/home/ubuntu/src/tjsmigration/transformers.js/scripts/float16.py:92: UserWarning: the float32 number -4.296316191698679e-08 will be truncated to -1e-07
  warnings.warn(
/home/ubuntu/src/tjsmigration/transformers.js/scripts/float16.py:92: UserWarning: the float32 number -8.630758685512774e-08 will be truncated to -1e-07
  warnings.warn(
/home/ubuntu/src/tjsmigration/transformers.js/scripts/float16.py:73: UserWarning: the float32 number 3.635313916561245e-08 will be truncated to 1e-07
  warnings.warn(
/home/ubuntu/src/tjsmigration/transformers.js/scripts/float16.py:73: UserWarning: the float32 number 1.0497694269417934e-08 will be truncated to 1e-07
  warnings.warn(
/home/ubuntu/src/tjsmigration/transformers.js/scripts/float16.py:73: UserWarning: the float32 number 3.5891036809232446e-09 will be truncated to 1e-07
  warnings.warn(
/home/ubuntu/src/tjsmigration/transformers.js/scripts/float16.py:73: UserWarning: the float32 number 4.3217813328055854e-08 will be truncated to 1e-07
  warnings.warn(
/home/ubuntu/src/tjsmigration/transformers.js/scripts/float16.py:73: UserWarning: the float32 number 7.715105709849013e-08 will be truncated to 1e-07
  warnings.warn(
/home/ubuntu/src/tjsmigration/transformers.js/scripts/float16.py:92: UserWarning: the float32 number -2.0817747525825325e-08 will be truncated to -1e-07
  warnings.warn(
/home/ubuntu/src/tjsmigration/transformers.js/scripts/float16.py:73: UserWarning: the float32 number 5.714840511927832e-08 will be truncated to 1e-07
  warnings.warn(
/home/ubuntu/src/tjsmigration/transformers.js/scripts/float16.py:73: UserWarning: the float32 number 5.580512763003753e-08 will be truncated to 1e-07
  warnings.warn(
/home/ubuntu/src/tjsmigration/transformers.js/scripts/float16.py:92: UserWarning: the float32 number -3.0644184079164916e-08 will be truncated to -1e-07
  warnings.warn(
/home/ubuntu/src/tjsmigration/transformers.js/scripts/float16.py:92: UserWarning: the float32 number -5.232065092286575e-08 will be truncated to -1e-07
  warnings.warn(
/home/ubuntu/src/tjsmigration/transformers.js/scripts/float16.py:73: UserWarning: the float32 number 5.14776914428694e-08 will be truncated to 1e-07
  warnings.warn(
/home/ubuntu/src/tjsmigration/transformers.js/scripts/float16.py:73: UserWarning: the float32 number 5.816076509290724e-08 will be truncated to 1e-07
  warnings.warn(
/home/ubuntu/src/tjsmigration/transformers.js/scripts/float16.py:73: UserWarning: the float32 number 3.312242924380371e-09 will be truncated to 1e-07
  warnings.warn(
/home/ubuntu/src/tjsmigration/transformers.js/scripts/float16.py:73: UserWarning: the float32 number 8.488490266245208e-08 will be truncated to 1e-07
  warnings.warn(
/home/ubuntu/src/tjsmigration/transformers.js/scripts/float16.py:92: UserWarning: the float32 number -5.631318522603124e-09 will be truncated to -1e-07
  warnings.warn(
/home/ubuntu/src/tjsmigration/transformers.js/scripts/float16.py:92: UserWarning: the float32 number -2.870932469534182e-08 will be truncated to -1e-07
  warnings.warn(
/home/ubuntu/src/tjsmigration/transformers.js/scripts/float16.py:73: UserWarning: the float32 number 1.8263493828385435e-08 will be truncated to 1e-07
  warnings.warn(
Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "/home/ubuntu/src/tjsmigration/transformers.js/scripts/quantize.py", line 377, in <module>
    main()
  File "/home/ubuntu/src/tjsmigration/transformers.js/scripts/quantize.py", line 374, in main
    quantize(input_folder, output_folder, quantization_args)
  File "/home/ubuntu/src/tjsmigration/transformers.js/scripts/quantize.py", line 309, in quantize
    quantize_fp16(
  File "/home/ubuntu/src/tjsmigration/transformers.js/scripts/quantize.py", line 223, in quantize_fp16
    check_and_save_model(model_fp16, save_path)
  File "/home/ubuntu/src/tjsmigration/transformers.js/scripts/utils.py", line 29, in check_and_save_model
    strict_check_model(model)
  File "/home/ubuntu/src/tjsmigration/transformers.js/scripts/utils.py", line 21, in strict_check_model
    raise e
  File "/home/ubuntu/src/tjsmigration/transformers.js/scripts/utils.py", line 16, in strict_check_model
    onnx.checker.check_model(model_or_path, full_check=True)
  File "/home/ubuntu/.cache/uv/archive-v0/7hYcxZ8pwavXeKpAYRaHY/lib/python3.12/site-packages/onnx/checker.py", line 179, in check_model
    C.check_model(
onnx.onnx_cpp2py_export.shape_inference.InferenceError: [ShapeInferenceError] Inference error(s): (op_type:If, node name: optimum::if): [ShapeInferenceError] Inference error(s): (op_type:Add, node name: /model/decoder/embed_positions/Add): [ShapeInferenceError] Inferred shape and existing shape differ in rank: (1) vs (0)
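The `UserWarning`s above come from the fp16 conversion step and did not stop the run. The run was stopped by the `InferenceError` that `onnx.checker.check_model(..., full_check=True)` raised on the `optimum::if` node. Judging from the messages, the converter pushes non-zero values whose magnitude is below 1e-7 out to ±1e-7. It also clamps large finite values to 10000 in magnitude. In this log, -3.4028234663852886e+38, the most negative finite float32, becomes -10000.0.

Below is a minimal NumPy sketch of that clamping. It assumes exactly those two thresholds and assumes that positive overflow is handled the same way as the negative case shown in the log. `clamp_for_fp16` is a hypothetical name and is not the function in `float16.py`.

```python
import numpy as np

# Thresholds inferred from the warning messages (assumption, not read from float16.py)
MIN_POSITIVE = 1e-7  # "... will be truncated to 1e-07"
MAX_FINITE = 1e4     # "-3.4028234663852886e+38 will be truncated to -10000.0"


def clamp_for_fp16(values: np.ndarray) -> np.ndarray:
    """Clamp float32 values as the warnings describe, then cast to float16.

    Exact zeros and infinities are left alone. Tiny non-zero values are pushed
    out to +/-MIN_POSITIVE so they do not underflow to zero, and large finite
    values are pulled in to +/-MAX_FINITE.
    """
    x = values.astype(np.float32)
    x = np.where((x > 0) & (x < MIN_POSITIVE), MIN_POSITIVE, x)
    x = np.where((x < 0) & (x > -MIN_POSITIVE), -MIN_POSITIVE, x)
    x = np.where(np.isfinite(x) & (x > MAX_FINITE), MAX_FINITE, x)
    x = np.where(np.isfinite(x) & (x < -MAX_FINITE), -MAX_FINITE, x)
    return x.astype(np.float16)
```

After the cast, 1e-7 is stored as the nearest float16 value (about 1.19e-7). The point of the clamp is that such weights stay non-zero.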

✅ Based on decoder_model_merged.onnx without slimming

↳ ✅ fp16: decoder_model_merged_fp16.onnx (replaced because it was invalid)
↳ ✅ int8: decoder_model_merged_int8.onnx (added)
↳ ✅ uint8: decoder_model_merged_uint8.onnx (added)
↳ ✅ q4: decoder_model_merged_q4.onnx (added)
↳ ✅ q4f16: decoder_model_merged_q4f16.onnx (added)
↳ ✅ bnb4: decoder_model_merged_bnb4.onnx (added)
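The report shows a fallback. The slimmed `decoder_model_merged.onnx` produced an fp16 model that failed the strict check, so every variant was regenerated from the unslimmed file instead. Below is a hedged sketch of that control flow. It assumes a hypothetical `quantize(source, dtype, slim)` callback that returns the output filename or raises when the check fails. It does not reproduce the actual logic of `quantize.py`.

```python
from typing import Callable


def quantize_with_fallback(
    source: str,
    dtypes: list[str],
    quantize: Callable[[str, str, bool], str],
) -> tuple[bool, dict[str, str]]:
    """Quantize every dtype from the slimmed model; on any failure, redo all unslimmed.

    `quantize` is a hypothetical callback, not a transformers.js API.
    Returns (used_slimming, {dtype: output_filename}).
    """
    try:
        return True, {dtype: quantize(source, dtype, True) for dtype in dtypes}
    except Exception:
        # Broad catch for the sketch: any failed check on the slimmed model
        # triggers a full retry from the unslimmed source.
        return False, {dtype: quantize(source, dtype, False) for dtype in dtypes}
```

Regenerating every dtype, not just the failing one, keeps all variants of one file derived from the same graph. That matches the "without slimming" section above, where all six files come from the unslimmed model.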

Xenova changed pull request status to merged