---
library_name: transformers.js
pipeline_tag: token-classification
base_model: microsoft/xtremedistil-l6-h256-uncased
language:
  - en
license: cc-by-4.0
tags:
  - token-classification
  - slot-filling
  - ner
  - transformers.js
  - onnx
  - quantized
datasets:
  - AmazonScience/massive
metrics:
  - f1
  - precision
  - recall
  - accuracy
model-index:
  - name: notes-slots
    results:
      - task:
          type: token-classification
          name: Slot Extraction
        dataset:
          name: MASSIVE en-US (+ synthetic productivity/realistic)
          type: AmazonScience/massive
        metrics:
          - type: f1
            value: 0.8535
            name: Token F1 (fp32)
          - type: f1
            value: 0.7392
            name: Token F1 (q8, shipped)
          - type: accuracy
            value: 0.9550
            name: Accuracy (fp32)
---

# notes-slots

Compact token-classification model that extracts scheduling/task slots from short
English notes — **participants**, **datetimes**, **priorities**, and
**recurrences** — and runs fully client-side via [Transformers.js](https://huggingface.co/docs/transformers.js).

The shipped artifact is an **INT8-quantized ONNX** bundle (~13 MB) intended for
in-browser WASM inference, not a PyTorch checkpoint.

## Model details

| | |
|---|---|
| Base model | [`microsoft/xtremedistil-l6-h256-uncased`](https://huggingface.co/microsoft/xtremedistil-l6-h256-uncased) (MIT) |
| Architecture | `BertForTokenClassification` — 6 layers, hidden size 256, 8 heads, intermediate 1024, vocab 30522, max positions 512 (~13M params) |
| Task | Token classification (BIO slot tagging) |
| Schema version | `slot-labels-v0.3.0` |
| Model version | 0.1.0 |
| Languages | English |
| Runtime | Transformers.js v4, WASM device, dtype `q8` |
| Bundle size | 13.32 MB |
| `transformers` (training) | 4.57.6 |
| License | CC BY 4.0 |

### Labels (9, BIO)

`O`, `B-PARTICIPANT`, `I-PARTICIPANT`, `B-PRIORITY`, `I-PRIORITY`,
`B-DATETIME`, `I-DATETIME`, `B-RECURRENCE`, `I-RECURRENCE`

A bundled `transitions.json` carries empirical BIO transition log-probabilities
(Laplace-smoothed, with invalid transitions such as `O` → `I-*` forced to zero
probability) for optional Viterbi-style decoding on top of the raw token logits.
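
The decoding step described above can be sketched as follows. This is a minimal illustration, not the project's own decoder: it assumes the transitions have already been loaded into a square matrix where `transitions[i][j]` is the log-probability of moving from label `i` to label `j`, and that `emissions` holds per-token log-scores (for example a log-softmax of the logits) in the same label order. The actual layout of `transitions.json` is not documented here, so `viterbiDecode` and the toy numbers are hypothetical.

```js
// Viterbi decoding: pick the label sequence with the best combined
// emission + transition score, instead of the best label per token.
// ASSUMPTION: transitions[i][j] = log P(label j | previous label i),
// indexed in the same order as `labels`. Adapt to the real file layout.
function viterbiDecode(emissions, transitions, labels) {
  const T = emissions.length;
  if (T === 0) return [];
  const K = labels.length;
  // score[t][j]: best total score of any path that ends in label j at token t
  const score = [emissions[0].slice()];
  const backptr = [new Array(K).fill(-1)];
  for (let t = 1; t < T; t++) {
    const row = new Array(K).fill(-Infinity);
    const ptr = new Array(K).fill(0);
    for (let j = 0; j < K; j++) {
      for (let i = 0; i < K; i++) {
        const s = score[t - 1][i] + transitions[i][j] + emissions[t][j];
        if (s > row[j]) {
          row[j] = s;
          ptr[j] = i;
        }
      }
    }
    score.push(row);
    backptr.push(ptr);
  }
  // Start from the best final label and follow the back-pointers.
  let best = 0;
  for (let j = 1; j < K; j++) {
    if (score[T - 1][j] > score[T - 1][best]) best = j;
  }
  const path = new Array(T);
  for (let t = T - 1; t >= 0; t--) {
    path[t] = labels[best];
    best = backptr[t][best];
  }
  return path;
}

// Toy example with three labels (hand-written numbers, not model output).
const labels = ["O", "B-DATETIME", "I-DATETIME"];
const transitions = [
  [Math.log(0.8), Math.log(0.2), -Infinity], // from O: O -> I-* is invalid
  [Math.log(0.3), Math.log(0.1), Math.log(0.6)], // from B-DATETIME
  [Math.log(0.4), Math.log(0.1), Math.log(0.5)], // from I-DATETIME
];
const emissions = [
  [Math.log(0.6), Math.log(0.3), Math.log(0.1)], // greedy pick: O
  [Math.log(0.1), Math.log(0.2), Math.log(0.7)], // greedy pick: I-DATETIME
];
console.log(viterbiDecode(emissions, transitions, labels));
// → [ 'B-DATETIME', 'I-DATETIME' ]
```

Greedy per-token decoding would return the invalid sequence `O`, `I-DATETIME` here; the transition scores steer the decoder to a well-formed span instead. The sketch does not constrain the first token, so a sequence may still start with an `I-*` label unless a start penalty is added. JSON cannot encode `-Infinity`, so a real file would need some finite or sentinel value for forbidden transitions.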

## Intended use

- **In scope:** extracting participants / datetimes / priorities / recurrence
  cues from short, informal English notes and reminders (calendar, to-do, and
  email-intent style text).
- **Out of scope:** long documents, languages other than English, normalization
  of extracted spans into structured datetimes (use a downstream parser such as
  `chrono-node` for that), and any high-stakes decisioning.

## Usage (Transformers.js)

```js
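import { pipeline } from "@huggingface/transformers";

// Load the shipped INT8 bundle; dtype "q8" selects the quantized ONNX weights.
const tagger = await pipeline("token-classification", "jottypro/notes-slots", {
  dtype: "q8",
});

// Each returned item describes one token: its BIO label (`entity`), `score`, and `word`.
const out = await tagger("call Sarah next Friday at 5pm, high priority, every week");
console.log(out);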
```

The ONNX weights live at `onnx/model_quantized.onnx`, which is the path
Transformers.js resolves for `dtype: "q8"` when loading from the Hub.
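
The pipeline returns per-token predictions rather than whole slots, so most callers will want to merge them into spans. The sketch below is one way to do that, assuming each item has `entity`, `word`, `index`, and `score` fields (the shape the Transformers.js token-classification pipeline returns at the time of writing, typically with `O` tokens already dropped). `groupSlots` is a hypothetical helper, and the sample tokens are hand-written for illustration, not real model output.

```js
// Merge BIO-tagged tokens into slot spans.
// ASSUMPTION: tokens look like { entity: "B-DATETIME", word: "next", index: 3, score: 0.95 }.
function groupSlots(tokens) {
  const spans = [];
  let current = null;
  for (const tok of tokens) {
    if (tok.entity === "O") {
      current = null;
      continue;
    }
    const prefix = tok.entity.slice(0, 1); // "B" or "I"
    const type = tok.entity.slice(2); // e.g. "DATETIME"
    const isPiece = tok.word.startsWith("##"); // WordPiece continuation
    const adjacent = current !== null && tok.index === current.lastIndex + 1;
    const continues =
      adjacent && type === current.type && (prefix === "I" || isPiece);
    if (continues) {
      // Word pieces are glued on directly; whole words get a space.
      current.text += isPiece ? tok.word.slice(2) : " " + tok.word;
      current.lastIndex = tok.index;
      current.score = Math.min(current.score, tok.score);
    } else {
      // A "B-" tag, a type change, a gap, or an orphan "I-" tag starts a new span.
      current = {
        type,
        text: isPiece ? tok.word.slice(2) : tok.word,
        lastIndex: tok.index,
        score: tok.score,
      };
      spans.push(current);
    }
  }
  // Report the weakest token score as the span score.
  return spans.map(({ type, text, score }) => ({ type, text, score }));
}

// Hand-written sample input.
const sample = [
  { entity: "B-PARTICIPANT", word: "sarah", index: 2, score: 0.98 },
  { entity: "B-DATETIME", word: "next", index: 3, score: 0.95 },
  { entity: "I-DATETIME", word: "friday", index: 4, score: 0.93 },
  { entity: "B-RECURRENCE", word: "every", index: 12, score: 0.9 },
  { entity: "I-RECURRENCE", word: "week", index: 13, score: 0.88 },
];
console.log(groupSlots(sample).map((s) => `${s.type}: ${s.text}`));
// → [ 'PARTICIPANT: sarah', 'DATETIME: next friday', 'RECURRENCE: every week' ]
```

Because the base model is uncased, the merged text is lowercase; recovering the original casing or character offsets would require mapping the spans back onto the input string, which this sketch does not attempt.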

## Training

- **Data:** [AmazonScience/MASSIVE](https://huggingface.co/datasets/AmazonScience/massive)
  (config `en-US`, revision `d2362678…`), filtered to the
  `calendar` / `datetime` / `email` / `lists` scenarios with MASSIVE slots
  remapped onto the local 4-slot schema (e.g. `person`/`relation`/`email_address`
  → `PARTICIPANT`, `date`/`time`/`time_zone` → `DATETIME`,
  `general_frequency` → `RECURRENCE`), combined with synthetic
  *productivity* and *realistic* note generators.
- **Augmentation:** light, training-split only (`AUGMENT_FACTOR=2`) — random
  filler-word prefix, trailing punctuation, occasional `O`-token dropout;
  deduplicated against originals.
- **Hyperparameters:** 10 epochs with early stopping (patience 2, restore best
  by F1), batch size 64, learning rate 5e-5, cosine schedule, warmup ratio 0.1,
  weight decay 0.01, label smoothing 0.1, max sequence length 128, seed 42.
- **Quantization:** dynamic, per-channel `QInt8`, applied to `MatMul` and
  `Gather` ops via ONNX Runtime.
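
The slot remapping in the Data bullet can be pictured as a lookup over BIO tags. The sketch below is illustrative only: it lists just the mappings named above (the card gives them as examples, so the real table is likely larger), it assumes the MASSIVE annotations have already been converted to BIO tags over MASSIVE slot names, and it assumes unmapped slots fall back to `O`. `MASSIVE_TO_LOCAL` and `remapTag` are hypothetical names, not the training code.

```js
// ASSUMPTION: partial table containing only the examples named in this card.
const MASSIVE_TO_LOCAL = new Map([
  ["person", "PARTICIPANT"],
  ["relation", "PARTICIPANT"],
  ["email_address", "PARTICIPANT"],
  ["date", "DATETIME"],
  ["time", "DATETIME"],
  ["time_zone", "DATETIME"],
  ["general_frequency", "RECURRENCE"],
]);

// Remap one BIO tag such as "B-date" onto the local schema.
// ASSUMPTION: slots without a mapping are treated as outside ("O").
function remapTag(tag) {
  if (tag === "O") return "O";
  const prefix = tag.slice(0, 2); // "B-" or "I-"
  const local = MASSIVE_TO_LOCAL.get(tag.slice(2));
  return local === undefined ? "O" : prefix + local;
}

const massiveTags = ["O", "B-person", "O", "B-date", "I-date", "B-place_name"];
console.log(massiveTags.map(remapTag));
// → [ 'O', 'B-PARTICIPANT', 'O', 'B-DATETIME', 'I-DATETIME', 'O' ]
```

One design question this sketch leaves open is what happens when two adjacent MASSIVE slots collapse into the same local type (for example a `date` directly followed by a `time`): as written they stay two separate `DATETIME` spans.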

## Evaluation

Token-level metrics (seqeval) on the held-out test split (n ≈ 559).
**The q8 column reflects the artifact actually shipped in this repo.**

| Metric | fp32 | q8 (shipped) |
|---|---|---|
| Accuracy | 0.9550 | 0.9050 |
| Precision | 0.8283 | 0.8718 |
| Recall | 0.8802 | 0.6416 |
| **F1** | **0.8535** | **0.7392** |
| DATETIME F1 | 0.8191 | 0.6724 |
| PARTICIPANT F1 | 0.9208 | 0.8943 |
| PRIORITY F1 | 0.7979 | 0.6316 |
| RECURRENCE F1 | 0.8981 | 0.7093 |

## Limitations and bias

- **Quantization cost:** INT8 quantization raises precision slightly
  (0.83 → 0.87) but cuts recall substantially (0.88 → 0.64), lowering F1 from
  0.85 to 0.74. The shipped q8 model therefore misses more true spans than the
  fp32 model; tune downstream thresholds accordingly.
- **Domain:** trained on short calendar/task-style English notes plus synthetic
  data; expect degradation on long-form text, other domains, or other languages.
- **Synthetic data:** part of the training distribution is generated, so phrasing
  diversity and demographic coverage of names/relations are limited and may
  carry generator biases.
- **No span normalization:** the model tags spans only; converting a `DATETIME`
  span to an actual timestamp is a downstream concern.

## License and attribution

Released under **CC BY 4.0**, consistent with the MASSIVE training data
(CC BY 4.0). The base model `microsoft/xtremedistil-l6-h256-uncased` is MIT.

Part of the training data is derived from the MASSIVE dataset; CC BY 4.0
requires attribution to that source:

```bibtex
@misc{fitzgerald2022massive,
  title  = {MASSIVE: A 1M-Example Multilingual Natural Language Understanding Dataset with 51 Typologically-Diverse Languages},
  author = {FitzGerald, Jack and others},
  year   = {2022},
  eprint = {2204.08582},
  archivePrefix = {arXiv}
}
```