notes-slots

Compact token-classification model that extracts scheduling and task slots (participants, datetimes, priorities, and recurrences) from short English notes and runs fully client-side via Transformers.js.

The shipped artifact is an INT8-quantized ONNX bundle (~13 MB) intended for in-browser WASM inference, not a PyTorch checkpoint.

Model details

Base model: microsoft/xtremedistil-l6-h256-uncased (MIT)
Architecture: BertForTokenClassification, 6 layers, hidden size 256, 8 heads, intermediate size 1024, vocab size 30522, max positions 512 (~13M params)
Task: Token classification (BIO slot tagging)
Schema version: slot-labels-v0.3.0
Model version: 0.1.0
Languages: English
Runtime: Transformers.js v4, WASM device, dtype q8
Bundle size: 13.32 MB
transformers (training): 4.57.6
License: CC BY 4.0

Labels (9, BIO)

O, B-PARTICIPANT, I-PARTICIPANT, B-PRIORITY, I-PRIORITY, B-DATETIME, I-DATETIME, B-RECURRENCE, I-RECURRENCE

A bundled transitions.json carries empirical BIO transition log-probabilities (Laplace-smoothed, with invalid transitions such as O → I-DATETIME hard-zeroed) for optional Viterbi-style decoding on top of the raw token logits.
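The decoding step mentioned above can be sketched as follows. This is a minimal illustration, not the project's decoder: the layout of transitions.json is not documented here, so the sketch assumes a hypothetical square matrix `transitions[i][j]` of log-probabilities indexed in label order, represents disallowed transitions as `-Infinity`, and ignores start-of-sequence constraints. The three-label set and all numbers are made up for the example.

```javascript
// Viterbi decoding over per-token label scores.
// emissions[t][k]: log-score of label k at token t (e.g. log-softmax of the logits).
// transitions[i][j]: log-probability of label j following label i (assumed layout).
function viterbiDecode(emissions, transitions) {
  const T = emissions.length;
  if (T === 0) return [];
  const K = emissions[0].length;
  let score = emissions[0].slice(); // best path score ending in each label
  const backptr = [];               // backptr[t-1][j]: best previous label for j at t
  for (let t = 1; t < T; t++) {
    const next = new Array(K).fill(-Infinity);
    const ptr = new Array(K).fill(0);
    for (let j = 0; j < K; j++) {
      for (let i = 0; i < K; i++) {
        const s = score[i] + transitions[i][j];
        if (s > next[j]) {
          next[j] = s;
          ptr[j] = i;
        }
      }
      next[j] += emissions[t][j];
    }
    score = next;
    backptr.push(ptr);
  }
  // Pick the best final label, then walk the back-pointers to recover the path.
  let best = 0;
  for (let k = 1; k < K; k++) {
    if (score[k] > score[best]) best = k;
  }
  const path = [best];
  for (let t = backptr.length - 1; t >= 0; t--) {
    best = backptr[t][best];
    path.push(best);
  }
  return path.reverse();
}

// Toy example with a reduced label set (illustrative values only).
const labels = ["O", "B-DATETIME", "I-DATETIME"];
const transitions = [
  [Math.log(0.6), Math.log(0.4), -Infinity],     // from O: O -> I-DATETIME is invalid
  [Math.log(0.3), Math.log(0.1), Math.log(0.6)], // from B-DATETIME
  [Math.log(0.3), Math.log(0.1), Math.log(0.6)], // from I-DATETIME
];
const emissions = [
  [0, -5, -5],       // token 0: clearly O
  [-3, -1.2, -0.5],  // token 1: greedy argmax would pick I-DATETIME
];
const path = viterbiDecode(emissions, transitions);
console.log(path.map((k) => labels[k])); // [ 'O', 'B-DATETIME' ]
```

In the toy input, a per-token argmax would emit the invalid sequence O, I-DATETIME; the transition constraint moves the second token to B-DATETIME instead.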

Intended use

  • In scope: extracting participants, datetimes, priorities, and recurrence cues from short, informal English notes and reminders (calendar, to-do, and email-intent style text).
  • Out of scope: long documents, languages other than English, normalization of extracted spans into structured datetimes (use a downstream parser such as chrono-node for that), and any high-stakes decision-making.

Usage (Transformers.js)

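import { pipeline } from "@huggingface/transformers";

// Load the quantized (q8) ONNX weights from the Hub.
const tagger = await pipeline("token-classification", "jottypro/notes-slots", {
  dtype: "q8",
});

// Tag a short note; the result is an array of labeled tokens with scores.
const out = await tagger("call Sarah next Friday at 5pm, high priority, every week");
console.log(out);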

The ONNX weights live at onnx/model_quantized.onnx, which is the layout Transformers.js expects when loading from the Hub.
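The tagger returns per-token labels, so a caller usually wants to merge them into slot spans. The helper below is a sketch under assumptions: it expects each item to carry `entity` (a BIO label), `word`, and `index` (token position), treats a leading `##` as a WordPiece continuation, and the name `groupSlots` is hypothetical. The input in the example is hand-written for illustration and is not actual model output.

```javascript
// Merge BIO-tagged tokens into { type, text } spans.
// A span continues only if the token is adjacent to the previous one, has the
// same slot type, and is either an I- tag or a "##" word-piece continuation.
function groupSlots(tokens) {
  const spans = [];
  let current = null;
  let prevIndex = null;
  for (const tok of tokens) {
    if (tok.entity === "O") {
      current = null;
      prevIndex = tok.index;
      continue;
    }
    const prefix = tok.entity.slice(0, 1); // "B" or "I"
    const type = tok.entity.slice(2);      // e.g. "DATETIME"
    const isPiece = tok.word.startsWith("##");
    const adjacent = prevIndex !== null && tok.index === prevIndex + 1;
    const continues =
      current !== null && adjacent && type === current.type && (prefix === "I" || isPiece);
    if (continues) {
      current.text += isPiece ? tok.word.slice(2) : " " + tok.word;
    } else {
      current = { type, text: isPiece ? tok.word.slice(2) : tok.word };
      spans.push(current);
    }
    prevIndex = tok.index;
  }
  return spans;
}

// Hand-written example input (not real model output).
const exampleTokens = [
  { entity: "B-PARTICIPANT", word: "sarah", index: 2 },
  { entity: "B-DATETIME", word: "next", index: 3 },
  { entity: "I-DATETIME", word: "friday", index: 4 },
];
console.log(groupSlots(exampleTokens));
```

The merged text is rebuilt from lowercased word pieces, so it may not match the original note character for character; mapping spans back to character offsets would need the tokenizer's offsets.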

Training

  • Data: AmazonScience/MASSIVE en-US (config en-US, revision d2362678…), filtered to the calendar / datetime / email / lists scenarios with MASSIVE slots remapped onto the local 4-slot schema (e.g. person/relation/email_address → PARTICIPANT, date/time/time_zone → DATETIME, general_frequency → RECURRENCE), combined with synthetic productivity and realistic note generators.
  • Augmentation: light, training-split only (AUGMENT_FACTOR=2): random filler-word prefix, trailing punctuation, occasional O-token dropout; deduplicated against originals.
  • Hyperparameters: 10 epochs with early stopping (patience 2, restore best by F1), batch size 64, learning rate 5e-5, cosine schedule, warmup ratio 0.1, weight decay 0.01, label smoothing 0.1, max sequence length 128, seed 42.
  • Quantization: dynamic, per-channel QInt8, applied to MatMul and Gather ops via ONNX Runtime.
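The slot remapping described in the Data item can be illustrated with a short sketch. It covers only the example mappings listed above (the full table is not given here), assumes the MASSIVE annotations have already been converted to BIO tags over MASSIVE slot names, and assumes that unmapped slots fall back to O. `SLOT_MAP` and `remapTags` are hypothetical names, and the actual training pipeline is in Python; JavaScript is used here only to match the usage example.

```javascript
// Partial MASSIVE -> local slot mapping, taken from the examples in the text.
const SLOT_MAP = new Map([
  ["person", "PARTICIPANT"],
  ["relation", "PARTICIPANT"],
  ["email_address", "PARTICIPANT"],
  ["date", "DATETIME"],
  ["time", "DATETIME"],
  ["time_zone", "DATETIME"],
  ["general_frequency", "RECURRENCE"],
]);

// Remap BIO tags over MASSIVE slot names (e.g. "B-date") to the local schema.
// Assumption: slots without a mapping are relabeled as "O".
function remapTags(tags) {
  return tags.map((tag) => {
    if (tag === "O") return "O";
    const prefix = tag.slice(0, 1); // "B" or "I"
    const slot = tag.slice(2);      // MASSIVE slot name
    const mapped = SLOT_MAP.get(slot);
    return mapped === undefined ? "O" : `${prefix}-${mapped}`;
  });
}

console.log(remapTags(["O", "B-person", "B-date", "I-date", "B-place_name", "I-place_name"]));
```

Because date, time, and time_zone all collapse onto DATETIME, adjacent source spans such as a date followed by a time stay as two DATETIME spans unless they are merged in a separate step.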

Evaluation

Token-level metrics (seqeval) on the held-out test split (n ≈ 559). The q8 column reflects the artifact actually shipped in this repo.

Metric           fp32     q8 (shipped)
Accuracy         0.9550   0.9050
Precision        0.8283   0.8718
Recall           0.8802   0.6416
F1               0.8535   0.7392
DATETIME F1      0.8191   0.6724
PARTICIPANT F1   0.9208   0.8943
PRIORITY F1      0.7979   0.6316
RECURRENCE F1    0.8981   0.7093
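As a quick consistency check, the overall F1 values in the table match the harmonic mean of the reported precision and recall, F1 = 2PR / (P + R). The snippet below is only arithmetic on the published numbers; the function name `f1` is chosen for the example.

```javascript
// F1 as the harmonic mean of precision and recall (0 when both are 0).
function f1(precision, recall) {
  const denom = precision + recall;
  return denom === 0 ? 0 : (2 * precision * recall) / denom;
}

console.log(f1(0.8283, 0.8802).toFixed(4)); // fp32: 0.8535
console.log(f1(0.8718, 0.6416).toFixed(4)); // q8:   0.7392
```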

Limitations and bias

  • Quantization cost: INT8 quantization raises precision slightly but cuts recall substantially (0.88 → 0.64; F1 0.85 → 0.74). The shipped q8 model misses more true spans than the fp32 model; tune downstream thresholds accordingly.
  • Domain: trained on short calendar/task-style English notes plus synthetic data; expect degradation on long-form text, other domains, or other languages.
  • Synthetic data: part of the training distribution is generated, so phrasing diversity and demographic coverage of names/relations is limited and may carry generator biases.
  • No span normalization: the model tags spans only; converting a DATETIME span to an actual timestamp is a downstream concern.
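One way to act on the recall loss noted in the first item is to bias decoding away from O. The sketch below is an assumption-laden illustration rather than a recommended setting: it presumes access to the raw per-token logits (the pipeline call shown under Usage returns labels and scores, not logits), assumes O sits at index 0, and uses a hypothetical `minSlotProb` threshold that would need tuning on held-out data.

```javascript
// Numerically stable softmax over one token's logits.
function softmax(logits) {
  const m = Math.max(...logits);
  const exps = logits.map((x) => Math.exp(x - m));
  const sum = exps.reduce((a, b) => a + b, 0);
  return exps.map((e) => e / sum);
}

// Pick a label index for one token. If O wins the argmax but the best slot
// label still reaches minSlotProb, prefer the slot label (trades precision
// for recall). minSlotProb and oIndex are illustrative parameters.
function pickLabel(logits, minSlotProb = 0.3, oIndex = 0) {
  const probs = softmax(logits);
  let argmax = 0;
  let bestSlot = -1;
  for (let k = 0; k < probs.length; k++) {
    if (probs[k] > probs[argmax]) argmax = k;
    if (k !== oIndex && (bestSlot === -1 || probs[k] > probs[bestSlot])) bestSlot = k;
  }
  if (argmax !== oIndex) return argmax;
  return bestSlot !== -1 && probs[bestSlot] >= minSlotProb ? bestSlot : oIndex;
}

// O narrowly wins the argmax here, but label 1 is above the default threshold.
console.log(pickLabel([1.0, 0.8, -2.0]));      // 1
console.log(pickLabel([1.0, 0.8, -2.0], 0.5)); // 0
```

Lowering the threshold recovers more spans at the cost of more false positives, so any chosen value should be checked against both precision and recall.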

License and attribution

Released under CC BY 4.0, consistent with the MASSIVE training data (CC BY 4.0). The base model microsoft/xtremedistil-l6-h256-uncased is MIT.

Part of the training data is derived from the MASSIVE dataset; CC BY 4.0 requires attribution to that source:

@misc{fitzgerald2022massive,
  title  = {MASSIVE: A 1M-Example Multilingual Natural Language Understanding Dataset with 51 Typologically-Diverse Languages},
  author = {FitzGerald, Jack and others},
  year   = {2022},
  eprint = {2204.08582},
  archivePrefix = {arXiv}
}