
ONNX export

This repo uses split environments:

  • main env: latest training stack (transformers 5.x)
  • ONNX env: exporter-compatible stack from training/requirements-export.txt

Reason: the current optimum-onnx release still requires transformers <4.58, while training uses the latest transformers 5.x.
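
For reference, a sketch of the kind of pins training/requirements-export.txt carries; the checked-in file is authoritative, and the entries below are illustrative assumptions only:

# illustrative pins; see training/requirements-export.txt for the real ones
transformers<4.58
optimum[onnxruntime]
onnxruntime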

Why split env is acceptable

For stable architectures like DistilBERT token classification, exporting from a saved checkpoint loaded in a separate environment is standard practice. Safety comes from validating the exported ONNX outputs against the PyTorch outputs.

Training

uv sync
python -m training.train_ner

The best checkpoint is saved under:

training/output/resume-ner/distilbert/best

ONNX export environment

uv venv .venv-onnx-export
uv pip install --python .venv-onnx-export/bin/python -r training/requirements-export.txt
source .venv-onnx-export/bin/activate
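
Before exporting, it is worth confirming the export env actually resolved the pinned stack; a minimal check (both packages expose __version__, and packaging ships as a transformers dependency):

python - <<'PY'
from packaging.version import Version
import optimum
import transformers

# The exporter requires transformers < 4.58; fail fast if the main env leaked in.
print("transformers:", transformers.__version__)
print("optimum:", optimum.__version__)
assert Version(transformers.__version__) < Version("4.58")
PY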

Export command

optimum-cli export onnx \
  --model training/output/resume-ner/distilbert/best \
  --task token-classification \
  onnx/

This writes:

  • onnx/model.onnx
  • tokenizer/config files in onnx/
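
A quick sanity check that the exported graph loads and has the expected signature, using plain onnxruntime (already present in the export env):

python - <<'PY'
import onnxruntime as ort

# Load the exported graph on CPU and print its input/output names.
sess = ort.InferenceSession("onnx/model.onnx", providers=["CPUExecutionProvider"])
print([i.name for i in sess.get_inputs()])   # expect input_ids, attention_mask
print([o.name for o in sess.get_outputs()])  # expect logits
PY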

Quantization

python - <<'PY'
from optimum.onnxruntime import ORTModelForTokenClassification, ORTQuantizer
from optimum.onnxruntime.configuration import AutoQuantizationConfig

# Load the exported ONNX model and attach a quantizer to it.
model = ORTModelForTokenClassification.from_pretrained("onnx")
quantizer = ORTQuantizer.from_pretrained(model)

# Dynamic (runtime) quantization targeting ARM64; no calibration data needed.
qconfig = AutoQuantizationConfig.arm64(is_static=False, per_channel=False)
quantizer.quantize(save_dir="onnx", quantization_config=qconfig)
PY

This writes:

  • onnx/model_quantized.onnx
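
Dynamic INT8 roughly quarters the weight payload; a quick on-disk comparison:

python - <<'PY'
import os

# Compare exported vs quantized file sizes.
for name in ("model.onnx", "model_quantized.onnx"):
    size_mb = os.path.getsize(os.path.join("onnx", name)) / 1e6
    print(f"{name}: {size_mb:.1f} MB")
PY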

Validation

Checked-in helper scripts:

  • python -m training.export_onnx
  • python -m training.validate_onnx
  • python -m training.quantize_onnx
  • python -m training.benchmark_structured --model-dir . (internal structured benchmark)

Always compare PyTorch and ONNX outputs on the same tokenized input.

Recommended checks:

  1. output shapes match
  2. np.allclose(..., rtol=1e-3, atol=1e-5) passes for the non-quantized ONNX model
  3. argmax predictions match (argmax_equal=True)
  4. the quantized ONNX model at least preserves argmax predictions on smoke-test inputs

Minimal validation example:

import numpy as np
import torch
from transformers import AutoModelForTokenClassification, AutoTokenizer
from optimum.onnxruntime import ORTModelForTokenClassification

model_dir = "training/output/resume-ner/distilbert/best"
text = "Rajesh Kumar worked at Infosys in Bangalore from April 2020 to Present using Python and AWS."

tokenizer = AutoTokenizer.from_pretrained(model_dir)
pt_model = AutoModelForTokenClassification.from_pretrained(model_dir)
pt_model.eval()
ort_model = ORTModelForTokenClassification.from_pretrained("onnx")

# Tokenize once per backend; keep only the inputs DistilBERT consumes.
inputs_pt = {k: v for k, v in tokenizer(text, return_tensors="pt", truncation=True).items() if k in {"input_ids", "attention_mask"}}
inputs_np = {k: v for k, v in tokenizer(text, return_tensors="np", truncation=True).items() if k in {"input_ids", "attention_mask"}}

with torch.no_grad():
    pt_logits = pt_model(**inputs_pt).logits.cpu().numpy()
ort_logits = ort_model(**inputs_np).logits

# Logits should be numerically close; argmax predictions must match exactly.
print(np.allclose(pt_logits, ort_logits, rtol=1e-3, atol=1e-5))
print(np.array_equal(pt_logits.argmax(-1), ort_logits.argmax(-1)))
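
Check 4 extends the same pattern to the quantized model; a sketch that reuses inputs_np and pt_logits from above (file_name selects the quantized graph inside onnx/):

q_model = ORTModelForTokenClassification.from_pretrained("onnx", file_name="model_quantized.onnx")
q_logits = q_model(**inputs_np).logits

# Quantized logits drift, so only argmax agreement is required here.
print(np.array_equal(pt_logits.argmax(-1), q_logits.argmax(-1)))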

Hugging Face artifact policy

Upload both sets of artifacts (an upload sketch follows the lists):

Root

  • model.safetensors
  • config.json
  • tokenizer files
  • companies.json
  • label_config.json
  • resume_config.json

onnx/

  • model.onnx
  • model_quantized.onnx
  • ONNX tokenizer/config files

This keeps the repo usable for:

  • Transformers users
  • ONNX users
  • future re-export or debugging
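
A minimal upload sketch with huggingface_hub; the repo id is a placeholder, and it assumes the extra JSON configs (companies.json etc.) already sit in the checkpoint directory:

python - <<'PY'
from huggingface_hub import HfApi

api = HfApi()
repo_id = "<user>/resume-ner"  # placeholder; substitute the real repo id

# Root artifacts: weights, config, tokenizer, and the extra JSON configs.
api.upload_folder(folder_path="training/output/resume-ner/distilbert/best",
                  repo_id=repo_id, repo_type="model")

# ONNX artifacts go under onnx/ in the repo.
api.upload_folder(folder_path="onnx", path_in_repo="onnx",
                  repo_id=repo_id, repo_type="model")
PY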