| # ONNX export |
|
|
| This repo uses **split environments**: |
|
|
| - main env: latest training stack (`transformers` 5.x) |
| - ONNX env: exporter-compatible stack from `training/requirements-export.txt` |
|
|
Reason: the current `optimum-onnx` release still requires `transformers <4.58`, while training uses the latest `transformers` 5.x.
|
|
## Why a split env is acceptable
|
|
For stable architectures such as DistilBERT token classification, exporting from a saved checkpoint loaded in a separate environment is normal practice. Safety comes from validating the exported ONNX outputs against the PyTorch outputs (see Validation below).
|
|
| ## Training |
|
|
| ```bash |
| uv sync |
| python -m training.train_ner |
| ``` |
|
|
The best checkpoint is saved under:
|
|
| ```text |
| training/output/resume-ner/distilbert/best |
| ``` |
|
|
| ## ONNX export environment |
|
|
| ```bash |
| uv venv .venv-onnx-export |
| uv pip install --python .venv-onnx-export/bin/python -r training/requirements-export.txt |
| source .venv-onnx-export/bin/activate |
| ``` |
|
|
| ## Export command |
|
|
| ```bash |
| optimum-cli export onnx \ |
| --model training/output/resume-ner/distilbert/best \ |
| --task token-classification \ |
| onnx/ |
| ``` |
|
|
| This writes: |
|
|
| - `onnx/model.onnx` |
| - tokenizer/config files in `onnx/` |
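

Before quantizing, the graph can be structurally validated with the `onnx` checker (a minimal sketch, assuming the `onnx` package is available in the export env):


```python
import onnx

m = onnx.load("onnx/model.onnx")
onnx.checker.check_model(m)  # raises ValidationError if the graph is malformed
print("opset:", m.opset_import[0].version)
```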
|
|
| ## Quantization |
|
|
| ```bash |
| python - <<'PY' |
| from optimum.onnxruntime import ORTModelForTokenClassification, ORTQuantizer |
| from optimum.onnxruntime.configuration import AutoQuantizationConfig |
| |
| model = ORTModelForTokenClassification.from_pretrained("onnx") |
| quantizer = ORTQuantizer.from_pretrained(model) |
| qconfig = AutoQuantizationConfig.arm64(is_static=False, per_channel=False) |
| quantizer.quantize(save_dir="onnx", quantization_config=qconfig) |
| PY |
| ``` |
|
|
| This writes: |
|
|
| - `onnx/model_quantized.onnx` |
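

Dynamic INT8 quantization typically cuts the file size to roughly a quarter of FP32; a quick comparison (sketch):


```python
import os

for name in ("model.onnx", "model_quantized.onnx"):
    size_mb = os.path.getsize(os.path.join("onnx", name)) / 1e6
    print(f"{name}: {size_mb:.1f} MB")
```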
|
|
| ## Validation |
|
|
| Checked-in helper scripts: |
|
|
| - `python -m training.export_onnx` |
| - `python -m training.validate_onnx` |
| - `python -m training.quantize_onnx` |
| - `python -m training.benchmark_structured --model-dir .` — internal structured benchmark |
|
|
Always compare PyTorch and ONNX outputs on the same tokenized input.
|
|
| Recommended checks: |
|
|
1. output shapes match
2. `np.allclose(..., rtol=1e-3, atol=1e-5)` passes for the non-quantized ONNX model
3. logits argmax matches token-for-token (`argmax_equal=True`)
4. the quantized ONNX model at least preserves argmax predictions on smoke-test inputs
|
|
| Minimal validation example: |
|
|
| ```python |
| import numpy as np |
| import torch |
| from transformers import AutoModelForTokenClassification, AutoTokenizer |
| from optimum.onnxruntime import ORTModelForTokenClassification |
| |
| model_dir = "training/output/resume-ner/distilbert/best" |
| text = "Rajesh Kumar worked at Infosys in Bangalore from April 2020 to Present using Python and AWS." |
| |
| tokenizer = AutoTokenizer.from_pretrained(model_dir) |
| pt_model = AutoModelForTokenClassification.from_pretrained(model_dir) |
| pt_model.eval() |
| ort_model = ORTModelForTokenClassification.from_pretrained("onnx") |
| |
| inputs_pt = {k: v for k, v in tokenizer(text, return_tensors="pt", truncation=True).items() if k in {"input_ids", "attention_mask"}} |
| inputs_np = {k: v for k, v in tokenizer(text, return_tensors="np", truncation=True).items() if k in {"input_ids", "attention_mask"}} |
| |
| with torch.no_grad(): |
| pt_logits = pt_model(**inputs_pt).logits.cpu().numpy() |
| ort_logits = ort_model(**inputs_np).logits |
| |
| print(np.allclose(pt_logits, ort_logits, rtol=1e-3, atol=1e-5)) |
| print(np.array_equal(pt_logits.argmax(-1), ort_logits.argmax(-1))) |
| ``` |
|
|
| ## Hugging Face artifact policy |
|
|
Upload both sets of artifacts:
|
|
| ### Root |
| - `model.safetensors` |
| - `config.json` |
| - tokenizer files |
| - `companies.json` |
| - `label_config.json` |
| - `resume_config.json` |
|
|
| ### `onnx/` |
| - `model.onnx` |
| - `model_quantized.onnx` |
| - ONNX tokenizer/config files |
|
|
This keeps the repo usable for:
| - Transformers users |
| - ONNX users |
| - future re-export or debugging |
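

A minimal upload sketch with `huggingface_hub` (the `repo_id` is a placeholder, and this assumes the root files listed above live in the `best/` checkpoint directory):


```python
from huggingface_hub import HfApi

api = HfApi()
repo_id = "your-org/resume-ner-distilbert"  # placeholder; substitute your own

api.create_repo(repo_id, exist_ok=True)
# PyTorch checkpoint, tokenizer, and config files at the repo root
api.upload_folder(repo_id=repo_id, folder_path="training/output/resume-ner/distilbert/best")
# ONNX artifacts (fp32 + quantized) under onnx/
api.upload_folder(repo_id=repo_id, folder_path="onnx", path_in_repo="onnx")
```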
|
|