ONNX export
This repo uses split environments:
- main env: latest training stack (`transformers` 5.x)
- ONNX env: exporter-compatible stack from `training/requirements-export.txt`
Reason: the current `optimum-onnx` release still requires `transformers` <4.58, while training uses the latest `transformers` 5.x.
Why split env is acceptable
For stable architectures like DistilBERT token classification, loading a saved checkpoint in a separate environment and exporting it from there is standard practice. Safety comes from validating the exported ONNX outputs against the PyTorch outputs.
Training
uv sync
python -m training.train_ner
Best checkpoint is saved under:
training/output/resume-ner/distilbert/best
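As a quick sanity check (a minimal sketch, run from the repo root in the main env), the saved checkpoint should load cleanly:

# Sanity check: the best checkpoint loads and exposes the expected label set.
from transformers import AutoModelForTokenClassification, AutoTokenizer

ckpt = "training/output/resume-ner/distilbert/best"
tokenizer = AutoTokenizer.from_pretrained(ckpt)
model = AutoModelForTokenClassification.from_pretrained(ckpt)
print(model.config.num_labels, model.config.id2label)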
ONNX export environment
uv venv .venv-onnx-export
uv pip install --python .venv-onnx-export/bin/python -r training/requirements-export.txt
source .venv-onnx-export/bin/activate
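Before exporting, it is worth confirming that the activated export env actually resolved an exporter-compatible stack. A minimal sketch; the <4.58 bound comes from the constraint noted above:

# Confirm the export env stayed below the transformers version the exporter supports.
import transformers
from packaging.version import Version

print("transformers", transformers.__version__)
assert Version(transformers.__version__) < Version("4.58"), "export env must use transformers <4.58"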
Export command
optimum-cli export onnx \
  --model training/output/resume-ner/distilbert/best \
  --task token-classification \
  onnx/
This writes:
- `onnx/model.onnx`
- tokenizer/config files in `onnx/`
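A quick structural check on the exported graph, assuming the `onnx` package is available in the export env:

# Verify the exported graph is well-formed and has the expected inputs.
import os
import onnx

model_proto = onnx.load("onnx/model.onnx")
onnx.checker.check_model(model_proto)  # raises if the graph is malformed
print([inp.name for inp in model_proto.graph.input])  # expect input_ids / attention_mask
print(sorted(os.listdir("onnx")))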
Quantization
python - <<'PY'
from optimum.onnxruntime import ORTModelForTokenClassification, ORTQuantizer
from optimum.onnxruntime.configuration import AutoQuantizationConfig

# Load the exported ONNX model and build a quantizer for it.
model = ORTModelForTokenClassification.from_pretrained("onnx")
quantizer = ORTQuantizer.from_pretrained(model)

# Dynamic int8 quantization targeting arm64.
qconfig = AutoQuantizationConfig.arm64(is_static=False, per_channel=False)
quantizer.quantize(save_dir="onnx", quantization_config=qconfig)
PY
This writes:
onnx/model_quantized.onnx
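To confirm the quantized graph loads and to see the size reduction, something like the following works (a sketch; `file_name` selects which ONNX file in the directory to load):

# Load the quantized graph explicitly and compare on-disk sizes.
import os
from optimum.onnxruntime import ORTModelForTokenClassification

q_model = ORTModelForTokenClassification.from_pretrained("onnx", file_name="model_quantized.onnx")
for name in ("model.onnx", "model_quantized.onnx"):
    print(name, round(os.path.getsize(os.path.join("onnx", name)) / 1e6, 1), "MB")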
Validation
Checked-in helper scripts:
- `python -m training.export_onnx`
- `python -m training.validate_onnx`
- `python -m training.quantize_onnx`
- `python -m training.benchmark_structured --model-dir .` (internal structured benchmark)
Always compare PyTorch and ONNX outputs on the same tokenized input.
Recommended checks:
- output shapes match
- `np.allclose(..., rtol=1e-3, atol=1e-5)` for non-quantized ONNX
- `argmax_equal=True`: quantized ONNX must at least preserve argmax predictions on smoke-test inputs
Minimal validation example:
import numpy as np
import torch
from transformers import AutoModelForTokenClassification, AutoTokenizer
from optimum.onnxruntime import ORTModelForTokenClassification
model_dir = "training/output/resume-ner/distilbert/best"
text = "Rajesh Kumar worked at Infosys in Bangalore from April 2020 to Present using Python and AWS."
tokenizer = AutoTokenizer.from_pretrained(model_dir)
pt_model = AutoModelForTokenClassification.from_pretrained(model_dir)
pt_model.eval()
ort_model = ORTModelForTokenClassification.from_pretrained("onnx")
# Keep only the inputs the exported graph expects.
inputs_pt = {k: v for k, v in tokenizer(text, return_tensors="pt", truncation=True).items() if k in {"input_ids", "attention_mask"}}
inputs_np = {k: v for k, v in tokenizer(text, return_tensors="np", truncation=True).items() if k in {"input_ids", "attention_mask"}}
with torch.no_grad():
    pt_logits = pt_model(**inputs_pt).logits.cpu().numpy()
ort_logits = ort_model(**inputs_np).logits
print(np.allclose(pt_logits, ort_logits, rtol=1e-3, atol=1e-5))
print(np.array_equal(pt_logits.argmax(-1), ort_logits.argmax(-1)))
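For the quantized model, exact logits drift, so the check relaxes to argmax agreement. A sketch continuing the example above (it reuses `pt_logits` and `inputs_np`; `file_name` picks the quantized graph):

# Quantized ONNX: only require that argmax predictions match PyTorch.
q_model = ORTModelForTokenClassification.from_pretrained("onnx", file_name="model_quantized.onnx")
q_logits = q_model(**inputs_np).logits
print(np.array_equal(pt_logits.argmax(-1), q_logits.argmax(-1)))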
Hugging Face artifact policy
Upload both:
Root:
- `model.safetensors`
- `config.json`
- tokenizer files
- `companies.json`, `label_config.json`, `resume_config.json`

onnx/:
- `model.onnx`
- `model_quantized.onnx`
- ONNX tokenizer/config files
This keeps the repo usable for:
- Transformers users
- ONNX users
- future re-export or debugging
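One way to push both sets of artifacts, assuming `huggingface_hub` is installed; the repo id below is a placeholder:

# Upload the PyTorch checkpoint/configs at the repo root and the ONNX files under onnx/.
from huggingface_hub import HfApi

api = HfApi()
repo_id = "your-org/resume-ner-distilbert"  # placeholder, not the real repo id

api.create_repo(repo_id, repo_type="model", exist_ok=True)
api.upload_folder(repo_id=repo_id, folder_path="training/output/resume-ner/distilbert/best", repo_type="model")
api.upload_folder(repo_id=repo_id, folder_path="onnx", path_in_repo="onnx", repo_type="model")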