|
|
--- |
|
|
base_model: broadfield-dev/bert-small-ner-pii-tuned-12261022 |
|
|
library_name: transformers |
|
|
tags: |
|
|
- onnx |
|
|
- transformers |
|
|
- optimum |
|
|
- onnxruntime |
|
|
- token-classification |
|
|
- int8 |
|
|
- quantized |
|
|
- mobile |
|
|
language: en |
|
|
pipeline_tag: token-classification |
|
|
--- |
|
|
|
|
|
# ONNX Export: broadfield-dev/bert-small-ner-pii-tuned-12261022 |
|
|
|
|
|
This is a version of [broadfield-dev/bert-small-ner-pii-tuned-12261022](https://huggingface.co/broadfield-dev/bert-small-ner-pii-tuned-12261022) that has been converted to the ONNX format and quantized to INT8 for mobile (ARM64) deployment.
|
|
|
|
|
## Model Details |
|
|
- **Base Model:** `broadfield-dev/bert-small-ner-pii-tuned-12261022` |
|
|
- **Task:** `token-classification` |
|
|
- **Opset Version:** `17` |
|
|
- **Optimization:** `INT8 - Optimized for Mobile (ARM64)` |
|
|
|
|
|
## Usage |
|
|
|
|
|
### Installation |
|
|
```bash |
|
|
pip install onnxruntime tokenizers numpy
|
|
``` |
|
|
|
|
|
### Python Example |
|
|
```python |
|
|
from tokenizers import Tokenizer |
|
|
import onnxruntime as ort |
|
|
import numpy as np |
|
|
|
|
|
# 1. Load the lightweight tokenizer (No Transformers dependency needed) |
|
|
tokenizer = Tokenizer.from_pretrained("broadfield-dev/bert-small-ner-pii-tuned-12261022-onnx") |
|
|
|
|
|
# 2. Load the ONNX model |
|
|
session = ort.InferenceSession("model.onnx")  # path to the downloaded model.onnx file
|
|
|
|
|
# 3. Preprocess (Simple text encoding) |
|
|
text = "Run inference on mobile!" |
|
|
encoding = tokenizer.encode(text) |
|
|
|
|
|
# Prepare inputs (exact names vary by model; check session.get_inputs() — usually input_ids + attention_mask)
|
|
inputs = {
|
|
"input_ids": np.array([encoding.ids], dtype=np.int64), |
|
|
"attention_mask": np.array([encoding.attention_mask], dtype=np.int64) |
|
|
}
|
|
|
|
|
# 4. Run Inference |
|
|
outputs = session.run(None, inputs) |
|
|
print("Output logits shape:", outputs[0].shape) |
|
|
|
|
|
``` |
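The logits from step 4 can be decoded into per-token labels with a simple argmax. Below is a minimal sketch using NumPy; the `id2label` map and the logits here are hypothetical placeholders — the real label mapping lives in the model's `config.json`, and the real logits come from `outputs[0]` above:

```python
import numpy as np

# Hypothetical id -> tag map; the real one is the id2label entry in config.json.
id2label = {0: "O", 1: "B-PER", 2: "I-PER"}

# Dummy logits shaped (batch, seq_len, num_labels), standing in for outputs[0].
logits = np.array([[[2.0, 0.1, 0.0],
                    [0.1, 3.0, 0.2],
                    [0.0, 0.5, 2.5]]])

pred_ids = logits.argmax(axis=-1)[0]           # best label id for each token
labels = [id2label[int(i)] for i in pred_ids]  # map ids to tag strings
print(labels)  # → ['O', 'B-PER', 'I-PER']
```

Tokens tagged with special markers (e.g. `[CLS]`, `[SEP]`) and sub-word pieces typically need to be filtered or merged before presenting results.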
|
|
|
|
|
## About this Export |
|
|
This model was exported using [Optimum](https://huggingface.co/docs/optimum/index) and `onnxruntime`. |
|
|
It was quantized with the `INT8 - Optimized for Mobile (ARM64)` settings.
|
|
|