- **ONNX Export:** Exported with `torch.onnx.export(dynamo=True)` + `onnxscript`. The old JIT tracer (`dynamo=False`) is incompatible with the new masking utilities in `transformers >= 5.x`.
- **INT8 Quantization:** Symmetric per-tensor quantization applied directly to the ONNX graph initializers (numpy-based). PyTorch `quantize_dynamic` models are not exportable via the dynamo exporter (`LinearPackedParamsBase` is not serializable by `torch.export`).
- **ONNX Architecture:** To work around ByT5's relative positional embeddings broadcasting dynamically at runtime, the model is exported as a **separated Encoder and Decoder**. The Decoder expects a fixed-length sequence of 16, which is updated sequentially using a padding mask during the autoregressive loop (see the Python and Rust examples above).
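The symmetric per-tensor scheme described above can be sketched in a few lines of numpy. This is an illustrative example of the quantization math only, not the repository's actual export script (which operates on ONNX graph initializers via the `onnx` package); the function names here are hypothetical.

```python
import numpy as np

def quantize_symmetric_per_tensor(weights: np.ndarray):
    """Symmetric per-tensor INT8 quantization: one scale for the whole
    tensor, zero-point fixed at 0. Illustrative sketch only."""
    # Map the largest absolute value to 127 so the range is symmetric.
    scale = float(np.abs(weights).max()) / 127.0
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover an approximate float32 tensor from INT8 values."""
    return q.astype(np.float32) * scale

# Toy weight tensor standing in for an ONNX initializer.
w = np.array([[0.5, -1.27], [0.01, 1.0]], dtype=np.float32)
q, s = quantize_symmetric_per_tensor(w)
w_hat = dequantize(q, s)  # per-element error is bounded by scale / 2
```

Because the scheme is symmetric (zero-point 0), dequantization is a single multiply, which keeps the quantized `MatMul` paths cheap at inference time.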

## Author & Contact

- **Author:** Dario Finardi
- **Company:** [Semplifica](https://semplifica.ai)
- **Email:** hf@semplifica.ai