vivekkaushal committed on
Commit 5d0f073 · verified · 1 Parent(s): 186406f

Upload folder using huggingface_hub

Files changed (3)
  1. README.md +100 -90
  2. invoice_classifier_int8_qdq.onnx +2 -2
  3. sha256.txt +1 -1
README.md CHANGED
@@ -8,18 +8,17 @@ tags:
8
  - mobile
9
  - on-device
10
  - document-classification
11
- - quantized
12
- - int8
13
- - qdq
14
  library_name: onnx
15
  pipeline_tag: image-classification
16
  metrics:
17
  - accuracy
 
18
  base_model: timm/mobilenetv3_small_100.lamb_in1k
19
  datasets: []
20
  ---
21
 
22
- # tally-ocr-document-classifier
23
 
24
  A small on-device document classifier that sorts a single image into one of:
25
 
@@ -27,17 +26,20 @@ A small on-device document classifier that sorts a single image into one of:
27
  - `invoice`
28
  - `other`
29
 
30
- It is the first-stage triage model in the [Tally OCR](https://github.com/) Flutter
31
- app: every uploaded or scanned page hits this model before any OCR or
32
- downstream extraction is attempted, so it has to be **fast, small, and run
33
- fully offline**.
34
 
35
- The repo ships two artifacts:
36
 
37
- | File | Format | Size | Use |
38
- |------|--------|-----:|------|
39
- | `invoice_classifier_int8_qdq.onnx` | ONNX, **QDQ static int8** | ~1.7 MB | **Ship this on-device.** Runs on ONNX Runtime Mobile. |
40
- | `invoice_classifier_fp32.onnx` | ONNX, fp32 | ~5.8 MB | Reference / desktop / accuracy comparisons. |
 
 
 
 
 
41
 
42
  ## Model details
43
 
@@ -57,50 +59,32 @@ The repo ships two artifacts:
57
  ```
58
 
59
  - **Opset**: 18.
60
- - **Quantization**: static, **QDQ format**, per-channel,
61
- `QuantType.QUInt8` activations / `QuantType.QInt8` weights, calibrated
62
- on ~200 in-domain images.
63
-
64
- ### Why QDQ?
65
-
66
- ONNX Runtime Mobile (the kernel set used by the
67
- [`onnxruntime` Flutter package](https://pub.dev/packages/onnxruntime))
68
- does **not** include `ConvInteger` / `MatMulInteger` operators. A model
69
- quantized with `QuantFormat.QOperator` or `quantize_dynamic` will load
70
- fine on desktop ORT and then fail at runtime on mobile with
71
- `code=9 (NOT_IMPLEMENTED)`. QDQ keeps the original `Conv` / `MatMul`
72
- nodes and surrounds them with `QuantizeLinear` / `DequantizeLinear`,
73
- which is the path ORT Mobile actually executes. Use the QDQ build for
74
- any phone deployment.
75
 
76
  ## Intended use
77
 
78
- - Triage page on whether an uploaded document is worth running heavyweight
79
- invoice / statement extraction on.
80
- - Lightweight client-side filtering before backend OCR to save round-trips.
81
 
82
  ### Out of scope
83
 
84
- - **Not an OCR model**: it doesn't extract text, totals, dates, or
85
- account numbers. Pair it with a downstream OCR stage.
86
  - **Not a fraud / authenticity detector.**
87
- - **Not a layout analyzer.** It looks at the page as a whole, not at
88
- regions.
89
- - Any class outside `{bank_statement, invoice}` collapses into `other`.
90
- Don't expect meaningful gradients between `other` sub-types
91
- (receipts vs IDs vs photos).
92
 
93
  ## How to use
94
 
95
- ### Python (ONNX Runtime)
96
-
97
  ```python
98
  import json
99
  import numpy as np
100
  import onnxruntime as ort
101
  from PIL import Image
102
 
103
- session = ort.InferenceSession("invoice_classifier_int8_qdq.onnx")
104
  labels = json.load(open("labels.json"))
105
  mean = np.array([0.485, 0.456, 0.406], dtype=np.float32).reshape(3, 1, 1)
106
  std = np.array([0.229, 0.224, 0.225], dtype=np.float32).reshape(3, 1, 1)
@@ -117,13 +101,6 @@ probs /= probs.sum()
117
  print(labels[int(probs.argmax())], float(probs.max()))
118
  ```
119
 
120
- ### Flutter (ONNX Runtime Mobile)
121
-
122
- The companion Flutter app loads the model at startup, verifies its SHA-256,
123
- and runs inference per uploaded image / first PDF page. See `pinned_model.dart`
124
- in the app repo. The preprocessing pipeline (resize 256 → center-crop 224 →
125
- ImageNet normalize → NCHW) matches the Python snippet above byte-for-byte.
126
-
127
  ## Preprocessing
128
 
129
  | Step | Value |
@@ -137,8 +114,8 @@ ImageNet normalize → NCHW) matches the Python snippet above byte-for-byte.
137
  | Layout | NCHW |
138
  | Dtype | float32 |
139
 
140
- These are the standard ImageNet stats, also captured in
141
- `preprocess.json` for programmatic loading.
142
 
143
  ## Training
144
 
@@ -152,39 +129,78 @@ These are the standard ImageNet stats, also captured in
152
  (discriminative learning rates).
153
  - **Loss**: `CrossEntropyLoss` with inverse-frequency class weights and
154
  label smoothing 0.05.
155
- - **Augmentation**: Resize(256) → RandomResizedCrop(224, scale 0.7–1.0)
156
- → ColorJitter (brightness/contrast/saturation/hue) → small RandomRotation
157
  → occasional grayscale → ImageNet normalize.
158
  - **Best checkpoint**: selected by validation accuracy.
159
 
160
- The training, export, and quantization scripts are open-sourced in the
161
- [Tally OCR Flutter repo](https://github.com/) under `training/`.
162
-
163
  ## Evaluation
164
 
165
- > **TODO**: replace with measured numbers from your held-out test set.
166
 
167
- Recommended metrics to fill in before publishing a v1.0 model card:
168
 
169
- | Metric | fp32 | int8 (QDQ) |
170
- |--------|-----:|-----------:|
171
- | Top-1 accuracy (val) | _–_ | _–_ |
172
- | Macro F1 (val) | _–_ | _–_ |
173
- | Per-class F1 | _–_ | _–_ |
174
- | Top-1 disagreement vs fp32 | n/a | _–_ |
175
 
176
- ## Quantization quality check
 
 
 
 
177
 
178
- Always validate the int8 build before shipping:
179
 
180
- ```bash
181
- python -m src.infer --model outputs/invoice_classifier_fp32.onnx --image test/...
182
- python -m src.infer --model outputs/invoice_classifier_int8_qdq.onnx --image test/...
183
- ```
 
 
 
 
 
 
 
 
 
184
 
185
- If int8 disagrees with fp32 on more than ~1–2% of held-out test images,
186
- retry with more calibration data, switch to per-tensor weights, or fall
187
- back to fp32 (still only ~6 MB).
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
188
 
189
  ## Limitations and bias
190
 
@@ -194,20 +210,19 @@ back to fp32 (still only ~6 MB).
194
  - **Photo conditions matter.** Heavy glare, motion blur, extreme skew
195
  (>~15°), or occlusion shifts predictions toward `other`.
196
  - **`other` is an open set.** Its decision boundary is determined entirely
197
- by what is present in the training data's `other/` folder. Receipts,
198
- IDs, screenshots, and shipping labels were included; any class not seen
199
- in training may be classified inconsistently.
200
  - **No PII handling.** Documents are processed as opaque pixels; the model
201
- does not redact or filter sensitive fields. Add your own redaction layer
202
- if uploading user data anywhere downstream.
203
 
204
  ## Files
205
 
206
  | File | Purpose |
207
  |------|---------|
208
- | `invoice_classifier_int8_qdq.onnx` | Mobile-ready int8 model (ship this). |
209
- | `invoice_classifier_fp32.onnx` | fp32 reference model. |
210
- | `labels.json` | Class name list, in model index order. |
211
  | `preprocess.json` | Input shape + ImageNet mean/std. |
212
  | `sha256.txt` | SHA-256 hashes + file sizes for pinned downloads. |
213
 
@@ -215,12 +230,9 @@ back to fp32 (still only ~6 MB).
215
 
216
  ```
217
  8f006366fcd633caae958ce511cdba87eb4a6d9d5de302e3d0cb8dd070d774dc invoice_classifier_fp32.onnx 6084524
218
- c39c3352d38379ee707642a056e55926719d7940f3e886be40e7afcc05526687 invoice_classifier_int8_qdq.onnx 1779282
219
  ```
220
 
221
- These are referenced verbatim in the Flutter app's `pinned_model.dart`
222
- to refuse any downloaded model whose hash doesn't match.
223
-
224
  ## License
225
 
226
  Apache-2.0. The pretrained ImageNet backbone is also Apache-2.0
@@ -228,13 +240,11 @@ Apache-2.0. The pretrained ImageNet backbone is also Apache-2.0
228
 
229
  ## Citation
230
 
231
- If you use this model, please cite:
232
-
233
  ```bibtex
234
- @software{tally_ocr_document_classifier,
235
- title = {Tally OCR Document Classifier (MobileNetV3-Small, QDQ int8)},
236
- author = {Tally OCR contributors},
237
  year = {2026},
238
- url = {https://huggingface.co/<your-username>/tally-ocr-document-classifier}
239
  }
240
  ```
 
8
  - mobile
9
  - on-device
10
  - document-classification
11
+ - tally
 
 
12
  library_name: onnx
13
  pipeline_tag: image-classification
14
  metrics:
15
  - accuracy
16
+ - f1
17
  base_model: timm/mobilenetv3_small_100.lamb_in1k
18
  datasets: []
19
  ---
20
 
21
+ # DocRex
22
 
23
  A small on-device document classifier that sorts a single image into one of:
24
 
 
26
  - `invoice`
27
  - `other`
28
 
29
+ Designed as a first-stage triage step before any heavyweight OCR or
30
+ extraction, and small enough to ship inside a mobile app and run fully offline.
 
 
31
 
32
+ ## Recommended artifact
33
 
34
+ | File | Format | Size | Top-1 acc | Use |
35
+ |------|--------|-----:|----------:|------|
36
+ | **`invoice_classifier_fp32.onnx`** | ONNX, fp32 | ~5.8 MB | **98.35%** | **Ship this.** |
37
+ | `invoice_classifier_int8_qdq.onnx` | ONNX, QDQ static int8 | ~1.7 MB | 58.85% | ⚠️ Experimental; see _Quantization notes_. |
38
+
39
+ **TL;DR: use the fp32 model.** It's only ~6 MB, runs in well under
40
+ 100 ms per image on modern phone CPUs, and has no accuracy drop. The int8
41
+ build is included for reference but is **not recommended for deployment**
42
+ (details below).
43
 
44
  ## Model details
45
 
 
59
  ```
60
 
61
  - **Opset**: 18.
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
62
 
63
  ## Intended use
64
 
65
+ - Triage classifier deciding whether a page is worth running invoice /
66
+ statement extraction on.
67
+ - Lightweight client-side filtering before backend OCR.
68
 
69
  ### Out of scope
70
 
71
+ - **Not an OCR model**: does not extract text, totals, dates, or account
72
+ numbers.
73
  - **Not a fraud / authenticity detector.**
74
+ - **Not a layout analyzer**: looks at the page as a whole.
75
+ - Anything outside `{bank_statement, invoice}` collapses into `other`. The
76
+ model does not distinguish sub-types of `other` (receipts vs IDs vs
77
+ photos).
 
78
 
79
  ## How to use
80
 
 
 
81
  ```python
82
  import json
83
  import numpy as np
84
  import onnxruntime as ort
85
  from PIL import Image
86
 
87
+ session = ort.InferenceSession("invoice_classifier_fp32.onnx")
88
  labels = json.load(open("labels.json"))
89
  mean = np.array([0.485, 0.456, 0.406], dtype=np.float32).reshape(3, 1, 1)
90
  std = np.array([0.229, 0.224, 0.225], dtype=np.float32).reshape(3, 1, 1)
 
101
  print(labels[int(probs.argmax())], float(probs.max()))
102
  ```
103
 
 
 
 
 
 
 
 
104
  ## Preprocessing
105
 
106
  | Step | Value |
 
114
  | Layout | NCHW |
115
  | Dtype | float32 |
116
 
117
+ Standard ImageNet stats, also captured in `preprocess.json` for
118
+ programmatic loading.
119
 
120
  ## Training
121
 
 
129
  (discriminative learning rates).
130
  - **Loss**: `CrossEntropyLoss` with inverse-frequency class weights and
131
  label smoothing 0.05.
132
+ - **Augmentation**: Resize(256) → RandomResizedCrop(224, scale 0.7–1.0) →
133
+ ColorJitter (brightness/contrast/saturation/hue) → small RandomRotation
134
  → occasional grayscale → ImageNet normalize.
135
  - **Best checkpoint**: selected by validation accuracy.
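
The "inverse-frequency class weights" in the loss bullet can be illustrated with a small sketch. The README does not state the exact weighting formula, so the common choice `N / (K * n_c)` is an assumption here, and the class counts are hypothetical, not the real training-set sizes.

```python
def inverse_frequency_weights(class_counts):
    """Map each class to N / (K * n_c); rarer classes get larger weights.

    N is the total number of samples, K the number of classes and n_c the
    number of samples in class c. This formula is an assumed convention.
    """
    total = sum(class_counts.values())
    num_classes = len(class_counts)
    return {
        name: total / (num_classes * count)
        for name, count in class_counts.items()
    }


# Hypothetical training-set counts, for illustration only.
counts = {"bank_statement": 300, "invoice": 500, "other": 1200}
weights = inverse_frequency_weights(counts)

for name, weight in weights.items():
    print(f"{name}: {weight:.3f}")
```

In PyTorch, weights like these would typically be passed as the `weight` tensor of `torch.nn.CrossEntropyLoss`, together with `label_smoothing=0.05`.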
136
 
 
 
 
137
  ## Evaluation
138
 
139
+ Held-out test set: **243 images** across the three classes.
140
 
141
+ ### fp32
142
 
143
+ | Metric | Value |
144
+ |--------|------:|
145
+ | Top-1 accuracy | **98.35%** |
146
+ | Macro F1 | 0.9801 |
 
 
147
 
148
+ | Class | F1 |
149
+ |-------|---:|
150
+ | `bank_statement` | 0.9783 |
151
+ | `invoice` | 0.9697 |
152
+ | `other` | 0.9924 |
153
 
154
+ Confusion matrix (rows = true, cols = predicted):
155
 
156
+ | | bank_statement | invoice | other |
157
+ |--------------------|---------------:|--------:|------:|
158
+ | **bank_statement** | 45 | 2 | 0 |
159
+ | **invoice** | 0 | 64 | 0 |
160
+ | **other** | 0 | 2 | 130 |
161
+
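
The fp32 accuracy and F1 figures can be re-derived from the confusion matrix alone. The sketch below copies the matrix from the table above; the function names are illustrative and not part of the repository.

```python
# Confusion matrix from the fp32 table (rows = true, cols = predicted).
LABELS = ["bank_statement", "invoice", "other"]
CM = [
    [45, 2, 0],
    [0, 64, 0],
    [0, 2, 130],
]


def per_class_f1(cm):
    """Return one F1 score per class from a square confusion matrix."""
    n = len(cm)
    scores = []
    for k in range(n):
        tp = cm[k][k]
        fn = sum(cm[k]) - tp                       # true k, predicted elsewhere
        fp = sum(cm[r][k] for r in range(n)) - tp  # predicted k, true elsewhere
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return scores


def accuracy(cm):
    """Fraction of samples on the diagonal."""
    total = sum(sum(row) for row in cm)
    correct = sum(cm[k][k] for k in range(len(cm)))
    return correct / total


f1 = per_class_f1(CM)
macro_f1 = sum(f1) / len(f1)
acc = accuracy(CM)

for name, score in zip(LABELS, f1):
    print(f"{name}: {score:.4f}")
print(f"accuracy={acc:.4f} macro_f1={macro_f1:.4f}")
```

The four misclassified pages are all predicted as `invoice`, which is why `invoice` has perfect recall but the lowest F1.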
162
+ ### int8 (QDQ), not recommended
163
+
164
+ | Metric | Value |
165
+ |--------|------:|
166
+ | Top-1 accuracy | 58.85% |
167
+ | Macro F1 | 0.4517 |
168
+ | Top-1 disagreement vs fp32 | **40.74% (99/243)** |
169
 
170
+ Best result observed across `MinMax` / `Entropy` / `Percentile` calibration
171
+ × per-channel / per-tensor weights. All configurations produce a similar
172
+ collapse (45–58% accuracy).
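
Top-1 disagreement compares the two models with each other rather than with the ground truth, so it needs only the predicted class indices from each build. A minimal sketch follows; the prediction lists are synthetic stand-ins that reproduce the reported 99/243 count, not the real model outputs.

```python
def top1_disagreement(preds_a, preds_b):
    """Fraction of samples on which two models pick a different top-1 class."""
    if len(preds_a) != len(preds_b):
        raise ValueError("prediction lists must have the same length")
    if not preds_a:
        raise ValueError("prediction lists must not be empty")
    differing = sum(1 for a, b in zip(preds_a, preds_b) if a != b)
    return differing / len(preds_a)


# Synthetic stand-ins: 243 samples, 99 of which differ between the builds.
fp32_preds = [0] * 243
int8_preds = [1] * 99 + [0] * 144

rate = top1_disagreement(fp32_preds, int8_preds)
print(f"top-1 disagreement: {rate:.2%}")
```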
173
+
174
+ ## Quantization notes
175
+
176
+ Post-training static quantization of MobileNetV3-Small is a known-difficult
177
+ problem. The architecture's **Hardswish** activations and
178
+ **Squeeze-and-Excitation** blocks produce activation distributions with
179
+ extreme outliers that don't fit cleanly into INT8 scales. PTQ (regardless
180
+ of QDQ vs QOperator format, calibration method, or per-channel vs
181
+ per-tensor) accumulates enough error across ~140 tensors to collapse one
182
+ or more classes.
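
The effect of outliers on an 8-bit scale can be seen in a toy example. This sketch uses plain min/max affine quantization on made-up values; it is not the ONNX Runtime calibrator and the numbers are not taken from the model's activations.

```python
def quantize_uint8(values):
    """Affine-quantize floats to uint8 with min/max calibration, then dequantize.

    Returns the dequantized values and the scale that was used.
    """
    lo = min(min(values), 0.0)  # keep 0.0 inside the representable range
    hi = max(max(values), 0.0)
    scale = (hi - lo) / 255.0 or 1.0
    zero_point = round(-lo / scale)
    restored = []
    for v in values:
        q = max(0, min(255, round(v / scale) + zero_point))
        restored.append((q - zero_point) * scale)
    return restored, scale


def mean_abs_error(original, restored, count):
    """Mean absolute error over the first `count` values."""
    pairs = zip(original[:count], restored[:count])
    return sum(abs(a - b) for a, b in pairs) / count


# Made-up activations: 201 typical values in [-1, 1], plus one large outlier.
typical = [i / 100.0 for i in range(-100, 101)]
with_outlier = typical + [60.0]

clean_restored, clean_scale = quantize_uint8(typical)
outlier_restored, outlier_scale = quantize_uint8(with_outlier)

# Compare the error on the typical values only.
clean_err = mean_abs_error(typical, clean_restored, len(typical))
outlier_err = mean_abs_error(with_outlier, outlier_restored, len(typical))

print(f"scale without outlier: {clean_scale:.5f}, error: {clean_err:.5f}")
print(f"scale with outlier:    {outlier_scale:.5f}, error: {outlier_err:.5f}")
```

A single outlier stretches the scale so that the typical values share only a handful of the 256 levels, and the rounding error on them grows accordingly.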
183
+
184
+ If you need a smaller model, in increasing order of effort:
185
+
186
+ 1. **FP16**: usually within rounding error of fp32. Simplest path to ~3 MB.
187
+ 2. **Quantization-aware training (QAT)**: torchvision's `models.quantization`
188
+ ships a quantizable `mobilenet_v3_large` (no Small variant), so MNV3-Small needs its own QAT setup. Requires a retraining run but
189
+ typically lands within 1–2 points of fp32.
190
+ 3. **Switch architectures**: MobileNetV2, EfficientNet-Lite0, or a small
191
+ ConvNeXt variant all post-train-quantize more reliably than MNV3.
192
+
193
+ The shipped int8 file is left in this repo only as evidence of the failure
194
+ mode, not as a deployable artifact.
195
+
196
+ > **Why QDQ format anyway?** ONNX Runtime Mobile does not include
197
+ > `ConvInteger` / `MatMulInteger` operators. A model quantized with
198
+ > `QuantFormat.QOperator` or `quantize_dynamic` will load on desktop ORT
199
+ > and then fail at runtime on mobile with `code=9 (NOT_IMPLEMENTED)`. QDQ
200
+ > keeps standard `Conv` / `MatMul` nodes surrounded by
201
+ > `QuantizeLinear` / `DequantizeLinear`, which is the path ORT Mobile
202
+ > executes. So if you do produce a working int8 build (e.g. via QAT),
203
+ > export it as QDQ.
204
 
205
  ## Limitations and bias
206
 
 
210
  - **Photo conditions matter.** Heavy glare, motion blur, extreme skew
211
  (>~15°), or occlusion shifts predictions toward `other`.
212
  - **`other` is an open set.** Its decision boundary is determined entirely
213
+ by the contents of the training data's `other/` folder. Receipts, IDs,
214
+ screenshots, and shipping labels were included; any class not seen in
215
+ training may be classified inconsistently.
216
  - **No PII handling.** Documents are processed as opaque pixels; the model
217
+ does not redact or filter sensitive fields.
 
218
 
219
  ## Files
220
 
221
  | File | Purpose |
222
  |------|---------|
223
+ | `invoice_classifier_fp32.onnx` | **Recommended**: fp32 ONNX model. |
224
+ | `invoice_classifier_int8_qdq.onnx` | Experimental int8 build (not recommended). |
225
+ | `labels.json` | Class names in model index order. |
226
  | `preprocess.json` | Input shape + ImageNet mean/std. |
227
  | `sha256.txt` | SHA-256 hashes + file sizes for pinned downloads. |
228
 
 
230
 
231
  ```
232
  8f006366fcd633caae958ce511cdba87eb4a6d9d5de302e3d0cb8dd070d774dc invoice_classifier_fp32.onnx 6084524
233
+ 4190fa4b171544ea667089383af1a6e5747fa7229ed3d21344ef979bf6491a67 invoice_classifier_int8_qdq.onnx 1795776
234
  ```
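
A downloaded model can be checked against a `sha256.txt` entry before it is loaded. The sketch below assumes each line has the form `<sha256> <filename> <size-in-bytes>`, as in the listing above; the helper names are hypothetical and not part of the repository.

```python
import hashlib
import os


def parse_manifest_line(line):
    """Split '<sha256> <filename> <size>' into (digest, name, size).

    Assumes the filename contains no whitespace.
    """
    digest, name, size = line.split()
    return digest.lower(), name, int(size)


def file_sha256(path, chunk_size=1 << 20):
    """Hash a file in chunks so large models are not read into memory at once."""
    hasher = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


def verify(path, expected_digest, expected_size):
    """Return True only if both the size and the SHA-256 digest match."""
    if os.path.getsize(path) != expected_size:
        return False  # cheap check first, skips hashing on a size mismatch
    return file_sha256(path) == expected_digest


entry = parse_manifest_line(
    "8f006366fcd633caae958ce511cdba87eb4a6d9d5de302e3d0cb8dd070d774dc "
    "invoice_classifier_fp32.onnx 6084524"
)
print(entry)
# With the file downloaded locally:
#   digest, name, size = entry
#   ok = verify(name, digest, size)
```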
235
 
 
 
 
236
  ## License
237
 
238
  Apache-2.0. The pretrained ImageNet backbone is also Apache-2.0
 
240
 
241
  ## Citation
242
 
 
 
243
  ```bibtex
244
+ @software{DocRex,
245
+ title = {DocRex (MobileNetV3-Small)},
246
+ author = {Vivek Kaushal},
247
  year = {2026},
248
+ url = {https://huggingface.co/vivekkaushal/DocRex}
249
  }
250
  ```
invoice_classifier_int8_qdq.onnx CHANGED
@@ -1,3 +1,3 @@
1
  version https://git-lfs.github.com/spec/v1
2
- oid sha256:c39c3352d38379ee707642a056e55926719d7940f3e886be40e7afcc05526687
3
- size 1779282
 
1
  version https://git-lfs.github.com/spec/v1
2
+ oid sha256:4190fa4b171544ea667089383af1a6e5747fa7229ed3d21344ef979bf6491a67
3
+ size 1795776
sha256.txt CHANGED
@@ -1,2 +1,2 @@
1
  8f006366fcd633caae958ce511cdba87eb4a6d9d5de302e3d0cb8dd070d774dc invoice_classifier_fp32.onnx 6084524
2
- c39c3352d38379ee707642a056e55926719d7940f3e886be40e7afcc05526687 invoice_classifier_int8_qdq.onnx 1779282
 
1
  8f006366fcd633caae958ce511cdba87eb4a6d9d5de302e3d0cb8dd070d774dc invoice_classifier_fp32.onnx 6084524
2
+ 4190fa4b171544ea667089383af1a6e5747fa7229ed3d21344ef979bf6491a67 invoice_classifier_int8_qdq.onnx 1795776