# ONNX export
This repo uses **split environments**:
- main env: latest training stack (`transformers` 5.x)
- ONNX env: exporter-compatible stack from `training/requirements-export.txt`

The split exists because the current `optimum-onnx` release still requires `transformers <4.58`, while training uses the latest `transformers` 5.x.
## Why split env is acceptable
For a stable architecture like DistilBERT token classification, exporting from a saved checkpoint loaded in a separate environment is standard practice. Safety comes from validating the exported ONNX outputs against the PyTorch outputs.
## Training
```bash
uv sync
python -m training.train_ner
```
The best checkpoint is saved under:
```text
training/output/resume-ner/distilbert/best
```
## ONNX export environment
```bash
uv venv .venv-onnx-export
uv pip install --python .venv-onnx-export/bin/python -r training/requirements-export.txt
source .venv-onnx-export/bin/activate
```
## Export command
```bash
optimum-cli export onnx \
  --model training/output/resume-ner/distilbert/best \
  --task token-classification \
  onnx/
```
This writes:
- `onnx/model.onnx`
- tokenizer/config files in `onnx/`
## Quantization
```bash
python - <<'PY'
from optimum.onnxruntime import ORTModelForTokenClassification, ORTQuantizer
from optimum.onnxruntime.configuration import AutoQuantizationConfig

# Load the exported ONNX model and apply dynamic (weight-only) int8 quantization.
model = ORTModelForTokenClassification.from_pretrained("onnx")
quantizer = ORTQuantizer.from_pretrained(model)
qconfig = AutoQuantizationConfig.arm64(is_static=False, per_channel=False)
quantizer.quantize(save_dir="onnx", quantization_config=qconfig)
PY
```
This writes:
- `onnx/model_quantized.onnx`
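Dynamic quantization stores weights as int8 (4x smaller than float32) and dequantizes them at runtime. A minimal numpy sketch of a symmetric 8-bit round-trip, illustrative only and not optimum's exact internals:

```python
import numpy as np

def quantize_dequantize(w: np.ndarray) -> np.ndarray:
    """Symmetric int8 round-trip: what a weight matrix loses to quantization."""
    scale = float(np.abs(w).max()) / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)  # stored form
    return q.astype(np.float32) * scale                          # runtime form

rng = np.random.default_rng(0)
w = rng.normal(0.0, 0.05, size=(768, 768)).astype(np.float32)
err = float(np.abs(w - quantize_dequantize(w)).max())
# Per-weight error is bounded by half a quantization step (scale / 2),
# which is why logits shift slightly but argmax usually survives.
print(err)
```

This is why the validation section below holds the quantized model to a weaker bar (argmax preserved) than the float export (near-exact logits).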
## Validation
Checked-in helper scripts:
- `python -m training.export_onnx`
- `python -m training.validate_onnx`
- `python -m training.quantize_onnx`
- `python -m training.benchmark_structured --model-dir .` — internal structured benchmark
Always compare PyTorch and ONNX outputs on the same tokenized input.
Recommended checks:
1. output shape match
2. `np.allclose(..., rtol=1e-3, atol=1e-5)` for non-quantized ONNX
3. argmax predictions match exactly (`argmax_equal=True`)
4. quantized ONNX at least preserves argmax predictions on smoke test inputs
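The checks above can be wrapped in one helper; a sketch, where `check_outputs` is our name for illustration, not one of the checked-in scripts:

```python
import numpy as np

def check_outputs(pt_logits: np.ndarray, ort_logits: np.ndarray,
                  quantized: bool = False) -> dict:
    """Apply the recommended PyTorch-vs-ONNX checks to a pair of logit arrays."""
    result = {
        "shape_match": pt_logits.shape == ort_logits.shape,
        "argmax_equal": bool(
            np.array_equal(pt_logits.argmax(-1), ort_logits.argmax(-1))
        ),
    }
    # Tight numeric agreement is only expected for the non-quantized export.
    if not quantized:
        result["allclose"] = bool(
            np.allclose(pt_logits, ort_logits, rtol=1e-3, atol=1e-5)
        )
    return result
```

Pass `quantized=True` for `model_quantized.onnx` so only shape and argmax are enforced.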
Minimal validation example:
```python
import numpy as np
import torch
from transformers import AutoModelForTokenClassification, AutoTokenizer
from optimum.onnxruntime import ORTModelForTokenClassification

model_dir = "training/output/resume-ner/distilbert/best"
text = "Rajesh Kumar worked at Infosys in Bangalore from April 2020 to Present using Python and AWS."

tokenizer = AutoTokenizer.from_pretrained(model_dir)
pt_model = AutoModelForTokenClassification.from_pretrained(model_dir)
pt_model.eval()
ort_model = ORTModelForTokenClassification.from_pretrained("onnx")

# Same tokenized input for both backends; keep only the model inputs.
keep = {"input_ids", "attention_mask"}
inputs_pt = {k: v for k, v in tokenizer(text, return_tensors="pt", truncation=True).items() if k in keep}
inputs_np = {k: v for k, v in tokenizer(text, return_tensors="np", truncation=True).items() if k in keep}

with torch.no_grad():
    pt_logits = pt_model(**inputs_pt).logits.cpu().numpy()
ort_logits = ort_model(**inputs_np).logits

print(np.allclose(pt_logits, ort_logits, rtol=1e-3, atol=1e-5))
print(np.array_equal(pt_logits.argmax(-1), ort_logits.argmax(-1)))
```
## Hugging Face artifact policy
Upload both sets of files:
### Root
- `model.safetensors`
- `config.json`
- tokenizer files
- `companies.json`
- `label_config.json`
- `resume_config.json`
### `onnx/`
- `model.onnx`
- `model_quantized.onnx`
- ONNX tokenizer/config files
This keeps the repo usable for:
- Transformers users
- ONNX users
- future re-export or debugging
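Before uploading, the layout above can be sanity-checked locally. A sketch; the helper name and file lists mirror this document, not an existing script (tokenizer files are omitted because their exact names vary):

```python
from pathlib import Path

# Artifact policy from this doc: root files for Transformers users,
# onnx/ files for ONNX users.
ROOT_FILES = ["model.safetensors", "config.json", "companies.json",
              "label_config.json", "resume_config.json"]
ONNX_FILES = ["model.onnx", "model_quantized.onnx"]

def missing_artifacts(repo_dir: str) -> list[str]:
    """Return the artifact paths from the policy above that are absent."""
    root = Path(repo_dir)
    missing = [f for f in ROOT_FILES if not (root / f).is_file()]
    missing += [f"onnx/{f}" for f in ONNX_FILES
                if not (root / "onnx" / f).is_file()]
    return missing
```

An empty return value means both audiences are covered and the repo is ready to push.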