---
library_name: transformers.js
pipeline_tag: token-classification
base_model: microsoft/xtremedistil-l6-h256-uncased
language:
- en
license: cc-by-4.0
tags:
- token-classification
- slot-filling
- ner
- transformers.js
- onnx
- quantized
datasets:
- AmazonScience/massive
metrics:
- f1
- precision
- recall
- accuracy
model-index:
- name: notes-slots
  results:
  - task:
      type: token-classification
      name: Slot Extraction
    dataset:
      name: MASSIVE en-US (+ synthetic productivity/realistic)
      type: AmazonScience/massive
    metrics:
    - type: f1
      value: 0.8535
      name: Token F1 (fp32)
    - type: f1
      value: 0.7392
      name: Token F1 (q8, shipped)
    - type: accuracy
      value: 0.9550
      name: Accuracy (fp32)
---

# notes-slots

Compact token-classification model that extracts scheduling/task slots from short English notes (**participants**, **datetimes**, **priorities**, and **recurrences**) and runs fully client-side via [Transformers.js](https://huggingface.co/docs/transformers.js). The shipped artifact is an **INT8-quantized ONNX** bundle (~13 MB) intended for in-browser WASM inference, not a PyTorch checkpoint.

## Model details

| Property | Value |
|---|---|
| Base model | [`microsoft/xtremedistil-l6-h256-uncased`](https://huggingface.co/microsoft/xtremedistil-l6-h256-uncased) (MIT) |
| Architecture | `BertForTokenClassification`: 6 layers, hidden size 256, 8 heads, intermediate 1024, vocab 30522, max positions 512 (~13M params) |
| Task | Token classification (BIO slot tagging) |
| Schema version | `slot-labels-v0.3.0` |
| Model version | 0.1.0 |
| Languages | English |
| Runtime | Transformers.js v4, WASM device, dtype `q8` |
| Bundle size | 13.32 MB |
| `transformers` (training) | 4.57.6 |
| License | CC BY 4.0 |

### Labels (9, BIO)

`O`, `B-PARTICIPANT`, `I-PARTICIPANT`, `B-PRIORITY`, `I-PRIORITY`, `B-DATETIME`, `I-DATETIME`, `B-RECURRENCE`, `I-RECURRENCE`

A bundled `transitions.json` carries empirical BIO transition log-probabilities (Laplace-smoothed, invalid transitions hard-zeroed) for optional Viterbi-style decoding on top of the raw token logits.

## Intended use

- **In scope:** extracting participants / datetimes / priorities / recurrence cues from short, informal English notes and reminders (calendar, to-do, email intent style text).
- **Out of scope:** long documents, languages other than English, normalization of extracted spans into structured datetimes (use a downstream parser such as `chrono-node` for that), and any high-stakes decisioning.

## Usage (Transformers.js)

```js
import { pipeline } from "@huggingface/transformers";

// Load the quantized (q8) ONNX weights from the Hub.
const tagger = await pipeline("token-classification", "jottypro/notes-slots", {
  dtype: "q8",
});

// Tag a short note and print the per-token predictions.
const out = await tagger("call Sarah next Friday at 5pm, high priority, every week");
console.log(out);
```

The ONNX weights live at `onnx/model_quantized.onnx`, which is the layout Transformers.js expects when loading from the Hub.

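
The tagger emits per-token BIO labels, so most callers will want to merge them into whole spans before handing them to a downstream parser. The following is a minimal sketch of that step, not part of this repo. It assumes each pipeline result is an object with `entity`, `word`, and `index` fields and that subword pieces are prefixed with `##`; the helper name `groupBioSpans` and the sample tokens are hypothetical and hand-written for illustration, not actual model output.

```js
// Merge per-token BIO predictions into labeled text spans.
// Assumed token shape: { entity: "B-DATETIME", word: "next", index: 3 }.
function groupBioSpans(tokens) {
  const spans = [];
  let current = null;

  for (const tok of tokens) {
    // "O" tokens (if present at all) end whatever span is open.
    if (tok.entity === "O") {
      current = null;
      continue;
    }
    const prefix = tok.entity.slice(0, 1); // "B" or "I"
    const label = tok.entity.slice(2); // e.g. "DATETIME"
    const isPiece = tok.word.startsWith("##");
    // Only tokens at adjacent positions can belong to the same span.
    const contiguous = current !== null && tok.index === current.lastIndex + 1;

    if (contiguous && (isPiece || (prefix === "I" && label === current.label))) {
      // Subword pieces are glued on; full words are joined with a space.
      current.text += isPiece ? tok.word.slice(2) : " " + tok.word;
      current.lastIndex = tok.index;
    } else {
      // A "B-" tag, a label change, or a position gap starts a new span.
      current = { label, text: tok.word.replace(/^##/, ""), lastIndex: tok.index };
      spans.push(current);
    }
  }
  return spans.map(({ label, text }) => ({ label, text }));
}

// Hand-written sample tokens (illustrative only, not real model output).
const sample = [
  { entity: "B-PARTICIPANT", word: "sarah", index: 2 },
  { entity: "B-DATETIME", word: "next", index: 3 },
  { entity: "I-DATETIME", word: "friday", index: 4 },
  { entity: "I-DATETIME", word: "at", index: 5 },
  { entity: "I-DATETIME", word: "5", index: 6 },
  { entity: "I-DATETIME", word: "##pm", index: 7 },
  { entity: "B-PRIORITY", word: "high", index: 9 },
  { entity: "I-PRIORITY", word: "priority", index: 10 },
];

console.log(groupBioSpans(sample));
// [ { label: "PARTICIPANT", text: "sarah" },
//   { label: "DATETIME", text: "next friday at 5pm" },
//   { label: "PRIORITY", text: "high priority" } ]
```

Because the base model is uncased, the merged text is lowercased wordpiece text rather than the original substring; if exact source text matters, mapping spans back to the input by position is likely a better choice than rejoining pieces.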
## Training

- **Data:** [AmazonScience/MASSIVE](https://huggingface.co/datasets/AmazonScience/massive) `en-US` (config `en-US`, revision `d2362678…`), filtered to the `calendar` / `datetime` / `email` / `lists` scenarios with MASSIVE slots remapped onto the local 4-slot schema (e.g. `person`/`relation`/`email_address` → `PARTICIPANT`, `date`/`time`/`time_zone` → `DATETIME`, `general_frequency` → `RECURRENCE`), combined with synthetic *productivity* and *realistic* note generators.
- **Augmentation:** light, training-split only (`AUGMENT_FACTOR=2`): random filler-word prefix, trailing punctuation, occasional `O`-token dropout; deduplicated against originals.
- **Hyperparameters:** 10 epochs with early stopping (patience 2, restore best by F1), batch size 64, learning rate 5e-5, cosine schedule, warmup ratio 0.1, weight decay 0.01, label smoothing 0.1, max sequence length 128, seed 42.
- **Quantization:** dynamic, per-channel `QInt8`, applied to `MatMul` and `Gather` ops via ONNX Runtime.

## Evaluation

Token-level metrics (seqeval) on the held-out test split (n ≈ 559). **The q8 column reflects the artifact actually shipped in this repo.**

| Metric | fp32 | q8 (shipped) |
|---|---|---|
| Accuracy | 0.9550 | 0.9050 |
| Precision | 0.8283 | 0.8718 |
| Recall | 0.8802 | 0.6416 |
| **F1** | **0.8535** | **0.7392** |
| DATETIME F1 | 0.8191 | 0.6724 |
| PARTICIPANT F1 | 0.9208 | 0.8943 |
| PRIORITY F1 | 0.7979 | 0.6316 |
| RECURRENCE F1 | 0.8981 | 0.7093 |

## Limitations and bias

- **Quantization cost:** INT8 quantization raises precision slightly but cuts recall substantially (0.88 → 0.64; F1 0.85 → 0.74). The model misses more true spans than the fp32 model; tune downstream thresholds accordingly.
- **Domain:** trained on short calendar/task-style English notes plus synthetic data; expect degradation on long-form text, other domains, or other languages.
- **Synthetic data:** part of the training distribution is generated, so phrasing diversity and demographic coverage of names/relations are limited and may carry generator biases.
- **No span normalization:** the model tags spans only; converting a `DATETIME` span to an actual timestamp is a downstream concern.

## License and attribution

Released under **CC BY 4.0**, consistent with the MASSIVE training data (CC BY 4.0). The base model `microsoft/xtremedistil-l6-h256-uncased` is MIT.

Part of the training data is derived from the MASSIVE dataset; CC BY 4.0 requires attribution to that source:

```bibtex
@misc{fitzgerald2022massive,
  title         = {MASSIVE: A 1M-Example Multilingual Natural Language Understanding Dataset with 51 Typologically-Diverse Languages},
  author        = {FitzGerald, Jack and others},
  year          = {2022},
  eprint        = {2204.08582},
  archivePrefix = {arXiv}
}
```