Upload folder using huggingface_hub


Files changed (3)

README.md +100 -90
invoice_classifier_int8_qdq.onnx +2 -2
sha256.txt +1 -1

README.md CHANGED

@@ -8,18 +8,17 @@ tags:
   - mobile
   - on-device
   - document-classification
-  - quantized
-  - int8
-  - qdq
 library_name: onnx
 pipeline_tag: image-classification
 metrics:
   - accuracy
 base_model: timm/mobilenetv3_small_100.lamb_in1k
 datasets: []
 ---
-# tally-ocr-document-classifier
 A small on-device document classifier that sorts a single image into one of:
@@ -27,17 +26,20 @@ A small on-device document classifier that sorts a single image into one of:
 - `invoice`
 - `other`
-It is the first-stage triage model in the [Tally OCR](https://github.com/) Flutter
-app — every uploaded or scanned page hits this model before any OCR or
-downstream extraction is attempted, so it has to be **fast, small, and run
-fully offline**.
-The repo ships two artifacts:
-| File | Format | Size | Use |
-|------|--------|-----:|------|
-| `invoice_classifier_int8_qdq.onnx` | ONNX, **QDQ static int8** | ~1.7 MB | **Ship this on-device.** Runs on ONNX Runtime Mobile. |
-| `invoice_classifier_fp32.onnx` | ONNX, fp32 | ~5.8 MB | Reference / desktop / accuracy comparisons. |
 ## Model details
@@ -57,50 +59,32 @@ The repo ships two artifacts:
   ```
 - **Opset**: 18.
-- **Quantization**: static, **QDQ format**, per-channel,
-  `QuantType.QUInt8` activations / `QuantType.QInt8` weights, calibrated
-  on ~200 in-domain images.
-### Why QDQ?
-ONNX Runtime Mobile (the kernel set used by the
-[`onnxruntime` Flutter package](https://pub.dev/packages/onnxruntime))
-does **not** include `ConvInteger` / `MatMulInteger` operators. A model
-quantized with `QuantFormat.QOperator` or `quantize_dynamic` will load
-fine on desktop ORT and then fail at runtime on mobile with
-`code=9 (NOT_IMPLEMENTED)`. QDQ keeps the original `Conv` / `MatMul`
-nodes and surrounds them with `QuantizeLinear` / `DequantizeLinear`,
-which is the path ORT Mobile actually executes. Use the QDQ build for
-any phone deployment.
 ## Intended use
-- Triage page on whether an uploaded document is worth running heavyweight
-  invoice / statement extraction on.
-- Lightweight client-side filtering before backend OCR to save round-trips.
 ### Out of scope
-- **Not an OCR model** — it doesn't extract text, totals, dates, or
-  account numbers. Pair it with a downstream OCR stage.
 - **Not a fraud / authenticity detector.**
-- **Not a layout analyzer.** It looks at the page as a whole, not at
-  regions.
-- Any class outside `{bank_statement, invoice}` collapses into `other`.
-  Don't expect meaningful gradients between `other` sub-types
-  (receipts vs IDs vs photos).
 ## How to use
-### Python (ONNX Runtime)
 ```python
 import json
 import numpy as np
 import onnxruntime as ort
 from PIL import Image
-session = ort.InferenceSession("invoice_classifier_int8_qdq.onnx")
 labels = json.load(open("labels.json"))
 mean = np.array([0.485, 0.456, 0.406], dtype=np.float32).reshape(3, 1, 1)
 std  = np.array([0.229, 0.224, 0.225], dtype=np.float32).reshape(3, 1, 1)
@@ -117,13 +101,6 @@ probs /= probs.sum()
 print(labels[int(probs.argmax())], float(probs.max()))
 ```
-### Flutter (ONNX Runtime Mobile)
-The companion Flutter app loads the model at startup, verifies its SHA-256,
-and runs inference per uploaded image / first PDF page. See `pinned_model.dart`
-in the app repo. The preprocessing pipeline (resize 256 → center-crop 224 →
-ImageNet normalize → NCHW) matches the Python snippet above byte-for-byte.
 ## Preprocessing
 | Step | Value |
@@ -137,8 +114,8 @@ ImageNet normalize → NCHW) matches the Python snippet above byte-for-byte.
 | Layout | NCHW |
 | Dtype | float32 |
-These are the standard ImageNet stats — also captured in
-`preprocess.json` for programmatic loading.
 ## Training
@@ -152,39 +129,78 @@ These are the standard ImageNet stats — also captured in
   (discriminative learning rates).
 - **Loss**: `CrossEntropyLoss` with inverse-frequency class weights and
   label smoothing 0.05.
-- **Augmentation**: Resize(256) → RandomResizedCrop(224, scale 0.7–1.0)
-  → ColorJitter (brightness/contrast/saturation/hue) → small RandomRotation
   → occasional grayscale → ImageNet normalize.
 - **Best checkpoint**: selected by validation accuracy.
-The training, export, and quantization scripts are open-sourced in the
-[Tally OCR Flutter repo](https://github.com/) under `training/`.
 ## Evaluation
-> **TODO**: replace with measured numbers from your held-out test set.
-Recommended metrics to fill in before publishing a v1.0 model card:
-| Metric | fp32 | int8 (QDQ) |
-|--------|-----:|-----------:|
-| Top-1 accuracy (val) | _–_ | _–_ |
-| Macro F1 (val) | _–_ | _–_ |
-| Per-class F1 | _–_ | _–_ |
-| Top-1 disagreement vs fp32 | n/a | _–_ |
-## Quantization quality check
-Always validate the int8 build before shipping:
-```bash
-python -m src.infer --model outputs/invoice_classifier_fp32.onnx     --image test/...
-python -m src.infer --model outputs/invoice_classifier_int8_qdq.onnx --image test/...
-```
-If int8 disagrees with fp32 on more than ~1–2% of held-out test images,
-retry with more calibration data, switch to per-tensor weights, or fall
-back to fp32 (still only ~6 MB).
 ## Limitations and bias
@@ -194,20 +210,19 @@ back to fp32 (still only ~6 MB).
 - **Photo conditions matter.** Heavy glare, motion blur, extreme skew
   (>~15°), or occlusion shifts predictions toward `other`.
 - **`other` is an open set.** Its decision boundary is determined entirely
-  by what is present in the training data's `other/` folder. Receipts,
-  IDs, screenshots, and shipping labels were included; any class not seen
-  in training may be classified inconsistently.
 - **No PII handling.** Documents are processed as opaque pixels; the model
-  does not redact or filter sensitive fields. Add your own redaction layer
-  if uploading user data anywhere downstream.
 ## Files
 | File | Purpose |
 |------|---------|
-| `invoice_classifier_int8_qdq.onnx` | Mobile-ready int8 model (ship this). |
-| `invoice_classifier_fp32.onnx` | fp32 reference model. |
-| `labels.json` | Class name list, in model index order. |
 | `preprocess.json` | Input shape + ImageNet mean/std. |
 | `sha256.txt` | SHA-256 hashes + file sizes for pinned downloads. |
@@ -215,12 +230,9 @@ back to fp32 (still only ~6 MB).
 ```
 8f006366fcd633caae958ce511cdba87eb4a6d9d5de302e3d0cb8dd070d774dc  invoice_classifier_fp32.onnx     6084524
-c39c3352d38379ee707642a056e55926719d7940f3e886be40e7afcc05526687  invoice_classifier_int8_qdq.onnx 1779282
 ```
-These are referenced verbatim in the Flutter app's `pinned_model.dart`
-to refuse any downloaded model whose hash doesn't match.
 ## License
 Apache-2.0. The pretrained ImageNet backbone is also Apache-2.0
@@ -228,13 +240,11 @@ Apache-2.0. The pretrained ImageNet backbone is also Apache-2.0
 ## Citation
-If you use this model, please cite:
 ```bibtex
-@software{tally_ocr_document_classifier,
-  title  = {Tally OCR Document Classifier (MobileNetV3-Small, QDQ int8)},
-  author = {Tally OCR contributors},
   year   = {2026},
-  url    = {https://huggingface.co/<your-username>/tally-ocr-document-classifier}
 }
 ```

   - mobile
   - on-device
   - document-classification
+  - tally
 library_name: onnx
 pipeline_tag: image-classification
 metrics:
   - accuracy
+  - f1
 base_model: timm/mobilenetv3_small_100.lamb_in1k
 datasets: []
 ---
+# DocRex
 A small on-device document classifier that sorts a single image into one of:
 - `invoice`
 - `other`
+Designed as a first-stage triage step before any heavyweight OCR or
+extraction — small enough to ship inside a mobile app and run fully offline.
+## Recommended artifact
+| File | Format | Size | Top-1 acc (test) | Use |
+|------|--------|-----:|----------:|------|
+| **`invoice_classifier_fp32.onnx`** | ONNX, fp32 | ~5.8 MB | **98.35%** | **Ship this.** |
+| `invoice_classifier_int8_qdq.onnx` | ONNX, QDQ static int8 | ~1.7 MB | 58.85% | ⚠️ Experimental — see _Quantization notes_. |
+**TL;DR: use the fp32 model.** It's only ~6 MB, runs in well under
+100 ms per image on modern phone CPUs, and has none of the int8 build's
+accuracy loss. The int8 build is included for reference but is **not
+recommended for deployment** (details below).
 ## Model details
   ```
 - **Opset**: 18.
 ## Intended use
+- Triage classifier deciding whether a page is worth running invoice /
+  statement extraction on.
+- Lightweight client-side filtering before backend OCR.
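The triage role above can be sketched as a small routing check on the classifier's output. This is a minimal sketch, not code from this repo: the helper name `should_extract` and the `0.8` confidence threshold are hypothetical and should be tuned on your own data, and the label list must follow the order in `labels.json`.

```python
from typing import Sequence

# Classes worth sending to heavyweight extraction (everything else is `other`).
EXTRACTABLE = {"bank_statement", "invoice"}


def should_extract(probs: Sequence[float], labels: Sequence[str],
                   min_confidence: float = 0.8) -> bool:
    """Return True if the top-1 class is extractable and confident enough.

    `min_confidence` is a hypothetical threshold, not a value from this card.
    """
    if len(probs) != len(labels):
        raise ValueError("probs and labels must have the same length")
    best = max(range(len(probs)), key=lambda i: probs[i])
    return labels[best] in EXTRACTABLE and probs[best] >= min_confidence
```

Low-confidence pages fall through to the "don't extract" branch, which keeps expensive OCR off ambiguous inputs; lowering the threshold trades more OCR calls for fewer missed documents.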
 ### Out of scope
+- **Not an OCR model** — does not extract text, totals, dates, or account
+  numbers.
 - **Not a fraud / authenticity detector.**
+- **Not a layout analyzer** — looks at the page as a whole.
+- Anything outside `{bank_statement, invoice}` collapses into `other`. The
+  model does not distinguish sub-types of `other` (receipts vs IDs vs
+  photos).
 ## How to use
 ```python
 import json
 import numpy as np
 import onnxruntime as ort
 from PIL import Image
+session = ort.InferenceSession("invoice_classifier_fp32.onnx")
 labels = json.load(open("labels.json"))
 mean = np.array([0.485, 0.456, 0.406], dtype=np.float32).reshape(3, 1, 1)
 std  = np.array([0.229, 0.224, 0.225], dtype=np.float32).reshape(3, 1, 1)
 print(labels[int(probs.argmax())], float(probs.max()))
 ```
 ## Preprocessing
 | Step | Value |
 | Layout | NCHW |
 | Dtype | float32 |
+These are the standard ImageNet statistics, also captured in
+`preprocess.json` for programmatic loading.
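A self-contained version of the preprocessing pipeline (resize 256 → center-crop 224 → ImageNet normalize → NCHW) might look like the sketch below. It assumes Pillow and NumPy, and it assumes the resize scales the shorter side to 256 with bilinear filtering and rounds crop offsets the way torchvision's `Resize` / `CenterCrop` do. Check it against the repo's own snippet before relying on exact agreement; the function name `preprocess` is illustrative.

```python
import numpy as np
from PIL import Image

# ImageNet statistics, matching the mean/std used in the "How to use" snippet.
MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32).reshape(3, 1, 1)
STD = np.array([0.229, 0.224, 0.225], dtype=np.float32).reshape(3, 1, 1)


def preprocess(img: Image.Image, resize: int = 256, crop: int = 224) -> np.ndarray:
    """Resize shorter side -> center crop -> normalize -> (1, 3, crop, crop) float32."""
    img = img.convert("RGB")
    w, h = img.size
    # Assumption: shorter side scaled to `resize`, longer side truncated to int,
    # bilinear filter (mirrors torchvision's Resize(256) on PIL images).
    if w <= h:
        new_w, new_h = resize, int(resize * h / w)
    else:
        new_w, new_h = int(resize * w / h), resize
    img = img.resize((new_w, new_h), Image.BILINEAR)
    # Center crop with offsets rounded like torchvision's CenterCrop.
    left = int(round((new_w - crop) / 2.0))
    top = int(round((new_h - crop) / 2.0))
    img = img.crop((left, top, left + crop, top + crop))
    x = np.asarray(img, dtype=np.float32) / 255.0  # HWC in [0, 1]
    x = x.transpose(2, 0, 1)                       # HWC -> CHW
    x = (x - MEAN) / STD                           # per-channel normalize
    return x[np.newaxis, ...]                      # add batch dim -> NCHW
```

The returned array can be passed straight to an ONNX Runtime session as the model input.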
 ## Training
   (discriminative learning rates).
 - **Loss**: `CrossEntropyLoss` with inverse-frequency class weights and
   label smoothing 0.05.
+- **Augmentation**: Resize(256) → RandomResizedCrop(224, scale 0.7–1.0) →
+  ColorJitter (brightness/contrast/saturation/hue) → small RandomRotation
   → occasional grayscale → ImageNet normalize.
 - **Best checkpoint**: selected by validation accuracy.
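The class weighting and label smoothing described above can be illustrated as follows. This is a sketch under assumptions: the card doesn't state how the inverse-frequency weights were normalized, so `inverse_frequency_weights` uses the common `N / (K * n_c)` convention, and the class counts in the usage comment are made up. Both helper names are hypothetical.

```python
from typing import Dict, List


def inverse_frequency_weights(counts: Dict[str, int]) -> Dict[str, float]:
    """w_c = N / (K * n_c): rare classes get weights > 1, common ones < 1."""
    total = sum(counts.values())
    k = len(counts)
    return {label: total / (k * n) for label, n in counts.items()}


def smoothed_target(num_classes: int, true_idx: int, eps: float = 0.05) -> List[float]:
    """Label-smoothed target distribution: (1 - eps) * one_hot + eps / K."""
    base = eps / num_classes
    return [base + (1.0 - eps if i == true_idx else 0.0) for i in range(num_classes)]


# Hypothetical usage with PyTorch (made-up counts; keep labels.json order):
#   import torch
#   w = inverse_frequency_weights({"bank_statement": 200, "invoice": 300, "other": 600})
#   loss_fn = torch.nn.CrossEntropyLoss(
#       weight=torch.tensor(list(w.values()), dtype=torch.float32),
#       label_smoothing=0.05,
#   )
```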
 ## Evaluation
+Held-out test set: **243 images** across the three classes.
+### fp32
+| Metric | Value |
+|--------|------:|
+| Top-1 accuracy | **98.35%** |
+| Macro F1 | 0.9801 |
+| Class | F1 |
+|-------|---:|
+| `bank_statement` | 0.9783 |
+| `invoice` | 0.9697 |
+| `other` | 0.9924 |
+Confusion matrix (rows = true, cols = predicted):
+|                    | bank_statement | invoice | other |
+|--------------------|---------------:|--------:|------:|
+| **bank_statement** | 45 | 2 | 0 |
+| **invoice**        | 0  | 64 | 0 |
+| **other**          | 0  | 2 | 130 |
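The headline fp32 numbers can be re-derived from the confusion matrix, which is a useful sanity check whenever the card is updated. The helper below is an illustrative sketch (its name is not from the repo); it computes per-class F1 as 2·TP / (2·TP + FP + FN), which equals the harmonic mean of precision and recall.

```python
from typing import List, Sequence, Tuple


def metrics_from_confusion(cm: Sequence[Sequence[int]]) -> Tuple[float, List[float], float]:
    """Return (accuracy, per-class F1, macro F1) from a square confusion matrix
    with rows = true class and columns = predicted class."""
    k = len(cm)
    total = sum(sum(row) for row in cm)
    correct = sum(cm[i][i] for i in range(k))
    f1s = []
    for c in range(k):
        tp = cm[c][c]
        fp = sum(cm[r][c] for r in range(k)) - tp  # predicted c, true class differs
        fn = sum(cm[c]) - tp                       # true c, predicted something else
        denom = 2 * tp + fp + fn
        f1s.append(2 * tp / denom if denom else 0.0)
    return correct / total, f1s, sum(f1s) / k


# fp32 confusion matrix from the table above (bank_statement, invoice, other).
CM_FP32 = [
    [45, 2, 0],
    [0, 64, 0],
    [0, 2, 130],
]
acc, f1s, macro = metrics_from_confusion(CM_FP32)
print(f"accuracy={acc:.4f} macro_f1={macro:.4f} per_class={[round(f, 4) for f in f1s]}")
# → accuracy=0.9835 macro_f1=0.9801 per_class=[0.9783, 0.9697, 0.9924]
```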
+### int8 (QDQ) — not recommended
+| Metric | Value |
+|--------|------:|
+| Top-1 accuracy | 58.85% |
+| Macro F1 | 0.4517 |
+| Top-1 disagreement vs fp32 | **40.74% (99/243)** |
+These are the best results observed across `MinMax` / `Entropy` /
+`Percentile` calibration × per-channel / per-tensor weights. Every
+configuration produced a similar collapse (about 45% to 58.85% accuracy).
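Top-1 disagreement is the fraction of test images on which the two builds pick different classes; for example, 99 differing images out of 243 gives 40.74%. A minimal sketch, assuming you already have aligned lists of predicted class indices from each model (the function name is hypothetical):

```python
from typing import Sequence


def top1_disagreement(preds_a: Sequence[int], preds_b: Sequence[int]) -> float:
    """Fraction of images on which two models' top-1 predictions differ."""
    if len(preds_a) != len(preds_b):
        raise ValueError("prediction lists must be aligned image-for-image")
    if not preds_a:
        raise ValueError("need at least one prediction")
    differ = sum(a != b for a, b in zip(preds_a, preds_b))
    return differ / len(preds_a)
```

Unlike accuracy, this metric needs no labels, so it can also be run on unlabeled in-domain images before shipping a compressed build.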
+## Quantization notes
+Post-training static quantization of MobileNetV3-Small is a known-difficult
+problem. The architecture's **Hardswish** activations and
+**Squeeze-and-Excitation** blocks produce activation distributions with
+extreme outliers that don't fit cleanly into INT8 scales. PTQ — regardless
+of QDQ vs QOperator format, calibration method, or per-channel vs
+per-tensor — accumulates enough error across ~140 tensors to collapse one
+or more classes.
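A toy example helps show why outliers hurt. The sketch below applies asymmetric uint8 quantization with a min/max range, using the ONNX `QuantizeLinear` / `DequantizeLinear` formulas, to a synthetic activation vector. Adding one large value stretches the range, so every ordinary value gets a much coarser step. The values are synthetic, not measured from this model, and the helper names are illustrative.

```python
from typing import List, Tuple


def minmax_qparams(values: List[float]) -> Tuple[float, int]:
    """uint8 scale and zero point from the observed range (widened to include 0)."""
    rmin, rmax = min(min(values), 0.0), max(max(values), 0.0)
    if rmax == rmin:  # all-zero input: any scale works
        return 1.0, 0
    scale = (rmax - rmin) / 255.0
    zero_point = int(round(-rmin / scale))
    return scale, zero_point


def fake_quant(x: float, scale: float, zero_point: int) -> float:
    """QuantizeLinear (round half to even, saturate to [0, 255]) then DequantizeLinear."""
    q = min(255, max(0, round(x / scale) + zero_point))
    return (q - zero_point) * scale


def mean_abs_error(calib: List[float], probe: List[float]) -> float:
    """Calibrate on `calib`, then measure the round-trip error on `probe`."""
    scale, zp = minmax_qparams(calib)
    return sum(abs(fake_quant(v, scale, zp) - v) for v in probe) / len(probe)


typical = [i / 100.0 for i in range(-300, 301)]  # ordinary activations in [-3, 3]
with_outlier = typical + [200.0]                 # plus one extreme activation
err_clean = mean_abs_error(typical, typical)
err_outlier = mean_abs_error(with_outlier, typical)
print(f"mean |error| without outlier: {err_clean:.4f}")
print(f"mean |error| with one outlier: {err_outlier:.4f}")
```

Per-channel weights and Entropy / Percentile calibration try to soften exactly this effect, which is consistent with the observation above that they were not enough here.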
+If you need a smaller model, in increasing order of effort:
+1. **FP16**: usually within rounding error of fp32. Simplest path to ~3 MB.
+2. **Quantization-aware training (QAT)**: torchvision ships quantizable
+   MobileNetV3 support (`models.quantization.QuantizableMobileNetV3`), but
+   its ready-made builder is `mobilenet_v3_large` only, so the Small variant
+   needs a custom build. Requires a retraining run but typically lands
+   within 1–2 points of fp32.
+3. **Switch architectures**: MobileNetV2, EfficientNet-Lite0, or a small
+   ConvNeXt variant all post-train-quantize more reliably than MobileNetV3.
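For intuition on option 1: rounding a value to fp16 (11 significant bits) changes it by at most 2⁻¹¹ ≈ 0.05% relative error for values in fp16's normal range, which is why FP16 models usually track fp32 closely. The stdlib-only sketch below checks that bound on synthetic weights. It says nothing about this model's actual fp16 accuracy, which would still need to be measured on the test set.

```python
import random
import struct


def to_fp16(x: float) -> float:
    """Round-trip a float through IEEE 754 binary16 (struct format 'e')."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


random.seed(0)
# Synthetic stand-in weights with magnitudes in [0.01, 2.0] (an assumption,
# chosen to stay inside fp16's normal range).
weights = [random.choice((-1.0, 1.0)) * random.uniform(0.01, 2.0) for _ in range(10_000)]
max_rel_err = max(abs(to_fp16(w) - w) / abs(w) for w in weights)
print(f"max relative error: {max_rel_err:.2e} (bound 2**-11 = {2**-11:.2e})")
```

To convert the ONNX file itself, one commonly used tool is the `onnxconverter-common` package (`float16.convert_float_to_float16`); it is not part of this repo, so treat it as a suggestion rather than the tested path.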
+The shipped int8 file is left in this repo only as evidence of the failure
+mode, not as a deployable artifact.
+> **Why QDQ format anyway?** ONNX Runtime Mobile does not include
+> `ConvInteger` / `MatMulInteger` operators. A model quantized with
+> `QuantFormat.QOperator` or `quantize_dynamic` will load on desktop ORT
+> and then fail at runtime on mobile with `code=9 (NOT_IMPLEMENTED)`. QDQ
+> keeps standard `Conv` / `MatMul` nodes surrounded by
+> `QuantizeLinear` / `DequantizeLinear`, which is the path ORT Mobile
+> executes. So if you do produce a working int8 build (e.g. via QAT),
+> export it as QDQ.
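Before shipping any quantized build, it can be worth scanning the graph for the operators mentioned above. The sketch below is illustrative: the helper name is hypothetical, the op set contains only the two operators this card names (it is not a full list of what ORT Mobile lacks), and the commented usage relies on the `onnx` package.

```python
from typing import Dict, Iterable, List, Union

# Only the two operators this card names as missing from ORT Mobile;
# NOT an exhaustive list of unsupported operators.
MOBILE_MISSING_OPS = {"ConvInteger", "MatMulInteger"}
QDQ_OPS = {"QuantizeLinear", "DequantizeLinear"}


def audit_op_types(op_types: Iterable[str]) -> Dict[str, Union[List[str], int]]:
    """Report mobile-incompatible integer ops and count QDQ nodes."""
    ops = list(op_types)
    return {
        "mobile_missing": sorted(set(ops) & MOBILE_MISSING_OPS),
        "qdq_nodes": sum(op in QDQ_OPS for op in ops),
    }


# Usage with the `onnx` package (not executed here):
#   import onnx
#   model = onnx.load("invoice_classifier_int8_qdq.onnx")
#   report = audit_op_types(node.op_type for node in model.graph.node)
#   # A mobile-friendly QDQ build should give report["mobile_missing"] == []
#   # and report["qdq_nodes"] > 0.
```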
 ## Limitations and bias
 - **Photo conditions matter.** Heavy glare, motion blur, extreme skew
   (>~15°), or occlusion shifts predictions toward `other`.
 - **`other` is an open set.** Its decision boundary is determined entirely
+  by the contents of the training data's `other/` folder. Receipts, IDs,
+  screenshots, and shipping labels were included; any class not seen in
+  training may be classified inconsistently.
 - **No PII handling.** Documents are processed as opaque pixels; the model
+  does not redact or filter sensitive fields.
 ## Files
 | File | Purpose |
 |------|---------|
+| `invoice_classifier_fp32.onnx` | **Recommended** — fp32 ONNX model. |
+| `invoice_classifier_int8_qdq.onnx` | Experimental int8 build (not recommended). |
+| `labels.json` | Class names in model index order. |
 | `preprocess.json` | Input shape + ImageNet mean/std. |
 | `sha256.txt` | SHA-256 hashes + file sizes for pinned downloads. |
 ```
 8f006366fcd633caae958ce511cdba87eb4a6d9d5de302e3d0cb8dd070d774dc  invoice_classifier_fp32.onnx     6084524
+4190fa4b171544ea667089383af1a6e5747fa7229ed3d21344ef979bf6491a67  invoice_classifier_int8_qdq.onnx 1795776
 ```
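Pinned downloads can be verified against `sha256.txt` with the standard library alone. This sketch assumes each manifest line has the whitespace-separated form `<sha256> <filename> <size>` shown above; the function names are illustrative.

```python
import hashlib
from pathlib import Path
from typing import Dict, Tuple


def parse_manifest(text: str) -> Dict[str, Tuple[str, int]]:
    """Map filename -> (sha256 hex digest, size in bytes)."""
    entries = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 3:
            continue  # skip blank or malformed lines
        digest, name, size = parts
        entries[name] = (digest.lower(), int(size))
    return entries


def verify_file(path: Path, expected_digest: str, expected_size: int) -> bool:
    """Check the size first (cheap), then stream the file through SHA-256."""
    if path.stat().st_size != expected_size:
        return False
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest() == expected_digest


# Usage:
#   manifest = parse_manifest(Path("sha256.txt").read_text())
#   digest, size = manifest["invoice_classifier_fp32.onnx"]
#   if not verify_file(Path("invoice_classifier_fp32.onnx"), digest, size):
#       raise RuntimeError("model file does not match pinned hash")
```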
 ## License
 Apache-2.0. The pretrained ImageNet backbone is also Apache-2.0
 ## Citation
 ```bibtex
+@software{DocRex,
+  title  = {DocRex (MobileNetV3-Small)},
+  author = {Vivek Kaushal},
   year   = {2026},
+  url    = {https://huggingface.co/vivekkaushal/DocRex}
 }
 ```

invoice_classifier_int8_qdq.onnx CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:c39c3352d38379ee707642a056e55926719d7940f3e886be40e7afcc05526687
-size 1779282

 version https://git-lfs.github.com/spec/v1
+oid sha256:4190fa4b171544ea667089383af1a6e5747fa7229ed3d21344ef979bf6491a67
+size 1795776

sha256.txt CHANGED

@@ -1,2 +1,2 @@
 8f006366fcd633caae958ce511cdba87eb4a6d9d5de302e3d0cb8dd070d774dc invoice_classifier_fp32.onnx 6084524
-c39c3352d38379ee707642a056e55926719d7940f3e886be40e7afcc05526687 invoice_classifier_int8_qdq.onnx 1779282

 8f006366fcd633caae958ce511cdba87eb4a6d9d5de302e3d0cb8dd070d774dc invoice_classifier_fp32.onnx 6084524
+4190fa4b171544ea667089383af1a6e5747fa7229ed3d21344ef979bf6491a67 invoice_classifier_int8_qdq.onnx 1795776