| # ONNX export |
|
|
| This repo uses **split environments**: |
|
|
| - main env: latest training stack (`transformers` 5.x) |
| - ONNX env: exporter-compatible stack from `training/requirements-export.txt` |
|
|
Reason: the current `optimum-onnx` release still requires `transformers <4.58`, while training uses the latest `transformers` 5.x.
|
|
## Why a split env is acceptable
|
|
For stable architectures such as DistilBERT token classification, exporting from a saved checkpoint loaded in a separate environment is normal practice. Safety comes from validating the exported ONNX outputs against the PyTorch outputs (see Validation below).
|
|
| ## Training |
|
|
| ```bash |
| uv sync |
| python -m training.train_ner |
| ``` |
|
|
The best checkpoint is saved under:
|
|
| ```text |
| training/output/resume-ner/distilbert/best |
| ``` |
|
|
| ## ONNX export environment |
|
|
| ```bash |
| uv venv .venv-onnx-export |
| uv pip install --python .venv-onnx-export/bin/python -r training/requirements-export.txt |
| source .venv-onnx-export/bin/activate |
| ``` |
|
|
| ## Export command |
|
|
| ```bash |
| optimum-cli export onnx \ |
| --model training/output/resume-ner/distilbert/best \ |
| --task token-classification \ |
| onnx/ |
| ``` |
|
|
| This writes: |
|
|
| - `onnx/model.onnx` |
| - tokenizer/config files in `onnx/` |
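

Before quantizing, the graph can be structurally validated with the `onnx` checker (a minimal sketch, assuming the `onnx` package is available in the export env):


```python
import onnx

m = onnx.load("onnx/model.onnx")
onnx.checker.check_model(m)  # raises ValidationError if the graph is malformed
print("opset:", m.opset_import[0].version)
```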
|
|
| ## Quantization |
|
|
| ```bash |
| python - <<'PY' |
| from optimum.onnxruntime import ORTModelForTokenClassification, ORTQuantizer |
| from optimum.onnxruntime.configuration import AutoQuantizationConfig |
| |
| model = ORTModelForTokenClassification.from_pretrained("onnx") |
| quantizer = ORTQuantizer.from_pretrained(model) |
| qconfig = AutoQuantizationConfig.arm64(is_static=False, per_channel=False) |
| quantizer.quantize(save_dir="onnx", quantization_config=qconfig) |
| PY |
| ``` |
|
|
| This writes: |
|
|
| - `onnx/model_quantized.onnx` |
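

Dynamic INT8 quantization typically cuts the file size to roughly a quarter of FP32; a quick comparison (sketch):


```python
import os

for name in ("model.onnx", "model_quantized.onnx"):
    size_mb = os.path.getsize(os.path.join("onnx", name)) / 1e6
    print(f"{name}: {size_mb:.1f} MB")
```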
|
|
| ## Validation |
|
|
| Checked-in helper scripts: |
|
|
| - `python -m training.export_onnx` |
| - `python -m training.validate_onnx` |
| - `python -m training.quantize_onnx` |
| - `python -m training.benchmark_structured --model-dir .` — internal structured benchmark |
|
|
Always compare PyTorch and ONNX outputs on the same tokenized input.
|
|
| Recommended checks: |
|
|
1. output shapes match
2. `np.allclose(..., rtol=1e-3, atol=1e-5)` passes for the non-quantized ONNX model
3. logits argmax matches token-for-token (`argmax_equal=True`)
4. the quantized ONNX model at least preserves argmax predictions on smoke-test inputs
|
|
| Minimal validation example: |
|
|
| ```python |
| import numpy as np |
| import torch |
| from transformers import AutoModelForTokenClassification, AutoTokenizer |
| from optimum.onnxruntime import ORTModelForTokenClassification |
| |
| model_dir = "training/output/resume-ner/distilbert/best" |
| text = "Rajesh Kumar worked at Infosys in Bangalore from April 2020 to Present using Python and AWS." |
| |
| tokenizer = AutoTokenizer.from_pretrained(model_dir) |
| pt_model = AutoModelForTokenClassification.from_pretrained(model_dir) |
| pt_model.eval() |
| ort_model = ORTModelForTokenClassification.from_pretrained("onnx") |
| |
| inputs_pt = {k: v for k, v in tokenizer(text, return_tensors="pt", truncation=True).items() if k in {"input_ids", "attention_mask"}} |
| inputs_np = {k: v for k, v in tokenizer(text, return_tensors="np", truncation=True).items() if k in {"input_ids", "attention_mask"}} |
| |
| with torch.no_grad(): |
| pt_logits = pt_model(**inputs_pt).logits.cpu().numpy() |
| ort_logits = ort_model(**inputs_np).logits |
| |
| print(np.allclose(pt_logits, ort_logits, rtol=1e-3, atol=1e-5)) |
| print(np.array_equal(pt_logits.argmax(-1), ort_logits.argmax(-1))) |
| ``` |
|
|
| ## Hugging Face artifact policy |
|
|
Upload both sets of artifacts:
|
|
| ### Root |
| - `model.safetensors` |
| - `config.json` |
| - tokenizer files |
| - `companies.json` |
| - `label_config.json` |
| - `resume_config.json` |
|
|
| ### `onnx/` |
| - `model.onnx` |
| - `model_quantized.onnx` |
| - ONNX tokenizer/config files |
|
|
This keeps the repo usable for:
| - Transformers users |
| - ONNX users |
| - future re-export or debugging |
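

A minimal upload sketch with `huggingface_hub` (the `repo_id` is a placeholder, and this assumes the root files listed above live in the `best/` checkpoint directory):


```python
from huggingface_hub import HfApi

api = HfApi()
repo_id = "your-org/resume-ner-distilbert"  # placeholder; substitute your own

api.create_repo(repo_id, exist_ok=True)
# PyTorch checkpoint, tokenizer, and config files at the repo root
api.upload_folder(repo_id=repo_id, folder_path="training/output/resume-ner/distilbert/best")
# ONNX artifacts (fp32 + quantized) under onnx/
api.upload_folder(repo_id=repo_id, folder_path="onnx", path_in_repo="onnx")
```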
|
|