---
library_name: transformers.js
pipeline_tag: token-classification
base_model: microsoft/xtremedistil-l6-h256-uncased
language:
- en
license: cc-by-4.0
tags:
- token-classification
- slot-filling
- ner
- transformers.js
- onnx
- quantized
datasets:
- AmazonScience/massive
metrics:
- f1
- precision
- recall
- accuracy
model-index:
- name: notes-slots
  results:
  - task:
      type: token-classification
      name: Slot Extraction
    dataset:
      name: MASSIVE en-US (+ synthetic productivity/realistic)
      type: AmazonScience/massive
    metrics:
    - type: f1
      value: 0.8535
      name: Token F1 (fp32)
    - type: f1
      value: 0.7392
      name: Token F1 (q8, shipped)
    - type: accuracy
      value: 0.9550
      name: Accuracy (fp32)
---
# notes-slots
Compact token-classification model that extracts scheduling/task slots
(**participants**, **datetimes**, **priorities**, and **recurrences**) from short
English notes and runs fully client-side via [Transformers.js](https://huggingface.co/docs/transformers.js).
The shipped artifact is an **INT8-quantized ONNX** bundle (~13 MB) intended for
in-browser WASM inference, not a PyTorch checkpoint.
## Model details
| | |
|---|---|
| Base model | [`microsoft/xtremedistil-l6-h256-uncased`](https://huggingface.co/microsoft/xtremedistil-l6-h256-uncased) (MIT) |
| Architecture | `BertForTokenClassification`: 6 layers, hidden size 256, 8 heads, intermediate 1024, vocab 30522, max positions 512 (~13M params) |
| Task | Token classification (BIO slot tagging) |
| Schema version | `slot-labels-v0.3.0` |
| Model version | 0.1.0 |
| Languages | English |
| Runtime | Transformers.js v4, WASM device, dtype `q8` |
| Bundle size | 13.32 MB |
| `transformers` (training) | 4.57.6 |
| License | CC BY 4.0 |
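As a rough sanity check on the ~13M figure, the sketch below counts the weights of a BERT token classifier from the config values in the table. It assumes the standard Hugging Face `BertForTokenClassification` layout (no pooler, `type_vocab_size` of 2, a bias on every linear layer, one LayerNorm after the embeddings and two per encoder layer); these details are assumptions, not read from this repo's `config.json`.

```javascript
// Rough parameter count for a BERT encoder with a token-classification head.
// Assumes the standard Hugging Face layout: no pooler, type_vocab_size = 2,
// biases on every linear layer, LayerNorm (weight + bias) after the
// embeddings and twice per encoder layer.
function countBertTokenClassifierParams({
  vocabSize,
  hiddenSize,
  numLayers,
  intermediateSize,
  maxPositions,
  numLabels,
  typeVocabSize = 2,
}) {
  const H = hiddenSize;
  const linear = (inDim, outDim) => inDim * outDim + outDim; // weight + bias
  const layerNorm = 2 * H; // gamma + beta

  const embeddings =
    vocabSize * H + maxPositions * H + typeVocabSize * H + layerNorm;

  const perLayer =
    3 * linear(H, H) + // query, key, value projections
    linear(H, H) + // attention output projection
    layerNorm +
    linear(H, intermediateSize) + // feed-forward up-projection
    linear(intermediateSize, H) + // feed-forward down-projection
    layerNorm;

  const classifier = linear(H, numLabels); // one logit per BIO label
  return embeddings + numLayers * perLayer + classifier;
}

const total = countBertTokenClassifierParams({
  vocabSize: 30522,
  hiddenSize: 256,
  numLayers: 6,
  intermediateSize: 1024,
  maxPositions: 512,
  numLabels: 9,
});
console.log(total); // 12686601 (about 12.7M)
```

More than half of the weights sit in the word-embedding table. At roughly one byte per INT8 weight, the count is broadly consistent with the ~13 MB bundle.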
### Labels (9, BIO)
`O`, `B-PARTICIPANT`, `I-PARTICIPANT`, `B-PRIORITY`, `I-PRIORITY`,
`B-DATETIME`, `I-DATETIME`, `B-RECURRENCE`, `I-RECURRENCE`
A bundled `transitions.json` carries empirical BIO transition log-probabilities
(Laplace-smoothed, invalid transitions hard-zeroed) for optional Viterbi-style
decoding on top of the raw token logits.
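The transition scores can be combined with per-token log-probabilities in a standard Viterbi pass. The sketch below assumes the emissions are the log-softmax of the logits and that `transitions.json` has been loaded as a 9×9 array of log-probabilities indexed `[from][to]` in the label order above. The real file layout is not documented here, so the `transitions` and optional `start` arguments are hypothetical. A hard-zeroed transition appears as `-Infinity` in log space, so a path such as `O → I-DATETIME` can never win.

```javascript
// Viterbi decoding over BIO tags: returns the label-index sequence maximizing
// sum(emissions[t][y_t]) + sum(transitions[y_(t-1)][y_t]) (+ start[y_0]).
// emissions:   T x L log-probabilities (log-softmax of the token logits).
// transitions: L x L log-probabilities indexed [from][to]; -Infinity = forbidden.
// start:       optional length-L log-probabilities for the first tag (hypothetical).
function viterbi(emissions, transitions, start = null) {
  const T = emissions.length;
  const L = transitions.length;
  if (T === 0) return [];

  let score = emissions[0].map((e, j) => e + (start ? start[j] : 0));
  const backptr = []; // backptr[t - 1][j] = best previous tag for tag j at step t

  for (let t = 1; t < T; t++) {
    const next = new Array(L).fill(-Infinity);
    const bp = new Array(L).fill(0);
    for (let j = 0; j < L; j++) {
      for (let i = 0; i < L; i++) {
        const s = score[i] + transitions[i][j];
        if (s > next[j]) {
          next[j] = s;
          bp[j] = i;
        }
      }
      next[j] += emissions[t][j];
    }
    backptr.push(bp);
    score = next;
  }

  // Best final tag, then follow the back-pointers to the start.
  let best = 0;
  for (let j = 1; j < L; j++) if (score[j] > score[best]) best = j;
  const path = [best];
  for (let t = T - 2; t >= 0; t--) {
    best = backptr[t][best];
    path.push(best);
  }
  return path.reverse();
}
```

Map the returned indices back through the label list to obtain BIO tags. Unlike a per-token argmax, this can never emit an `I-` tag that the transition table forbids.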
## Intended use
- **In scope:** extracting participants / datetimes / priorities / recurrence
cues from short, informal English notes and reminders (calendar, to-do, and
email-intent-style text).
- **Out of scope:** long documents, languages other than English, normalization
of extracted spans into structured datetimes (use a downstream parser such as
`chrono-node` for that), and any high-stakes decisioning.
## Usage (Transformers.js)
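```js
import { pipeline } from "@huggingface/transformers";

// Top-level await requires an ES module context (e.g. <script type="module">).
// dtype "q8" selects onnx/model_quantized.onnx; device "wasm" matches the runtime above.
const tagger = await pipeline("token-classification", "jottypro/notes-slots", {
  dtype: "q8",
  device: "wasm",
});

// By default, tokens labeled "O" are omitted; each entry carries a BIO label and its score.
const out = await tagger("call Sarah next Friday at 5pm, high priority, every week");
console.log(out);
```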
The ONNX weights live at `onnx/model_quantized.onnx`, which is the layout
Transformers.js expects when loading from the Hub.
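The pipeline returns per-token labels rather than merged spans. A minimal sketch for merging them, assuming each entry has the `entity`, `index`, `word`, and `score` fields that the Transformers.js token-classification pipeline emits, and that `O` tokens were dropped by its default `ignore_labels`. `groupSpans` is a hypothetical helper, not part of the library.

```javascript
// Groups per-token pipeline entries into labeled spans.
// Assumed entry shape: { entity: "B-DATETIME", index: 4, word: "friday", score: 0.93 }.
// "O" tokens are normally filtered out, so a gap in `index` ends the current
// span; WordPiece continuation pieces start with "##".
function groupSpans(entries) {
  const spans = [];
  let current = null;
  let lastIndex = null;
  for (const { entity, index, word, score } of entries) {
    if (entity === "O") {
      current = null;
      lastIndex = index;
      continue;
    }
    const tag = entity[0]; // "B" or "I"
    const type = entity.slice(2); // e.g. "DATETIME"
    const isPiece = word.startsWith("##");
    const text = isPiece ? word.slice(2) : word;
    const continuesSpan =
      current !== null &&
      index === lastIndex + 1 &&
      current.type === type &&
      (tag === "I" || isPiece); // a "##" piece always belongs to the previous word
    if (continuesSpan) {
      current.text += isPiece ? text : " " + text;
      current.scores.push(score);
    } else {
      // A "B-" tag, a type change, or an index gap starts a new span;
      // a stray "I-" is treated leniently as a span start.
      current = { type, text, scores: [score] };
      spans.push(current);
    }
    lastIndex = index;
  }
  // Use the weakest token score as a conservative span confidence.
  return spans.map(({ type, text, scores }) => ({ type, text, score: Math.min(...scores) }));
}
```

Calling `groupSpans(out)` on the result of the example above yields `{ type, text, score }` objects that can be handed to a downstream parser.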
## Training
- **Data:** [AmazonScience/MASSIVE](https://huggingface.co/datasets/AmazonScience/massive)
`en-US` (config `en-US`, revision `d2362678…`), filtered to the
`calendar` / `datetime` / `email` / `lists` scenarios with MASSIVE slots
remapped onto the local 4-slot schema (e.g. `person`/`relation`/`email_address`
→ `PARTICIPANT`, `date`/`time`/`time_zone` → `DATETIME`,
`general_frequency` β†’ `RECURRENCE`), combined with synthetic
*productivity* and *realistic* note generators.
- **Augmentation:** light and applied to the training split only (`AUGMENT_FACTOR=2`):
random filler-word prefix, trailing punctuation, and occasional `O`-token dropout,
with augmented copies deduplicated against the originals.
- **Hyperparameters:** 10 epochs with early stopping (patience 2, restore best
by F1), batch size 64, learning rate 5e-5, cosine schedule, warmup ratio 0.1,
weight decay 0.01, label smoothing 0.1, max sequence length 128, seed 42.
- **Quantization:** dynamic, per-channel `QInt8`, applied to `MatMul` and
`Gather` ops via ONNX Runtime.
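The slot remapping in the **Data** bullet can be sketched as a conversion from MASSIVE's inline `annot_utt` markup (slots written as `[slot : words]`) to word-level BIO tags. `SLOT_MAP` contains only the example pairs listed above; the full mapping used for training is not published in this card, so treat the map as partial and the fallback of every other slot to `O` as an assumption. Aligning word-level tags to WordPiece sub-tokens is a separate step not shown here.

```javascript
// Partial MASSIVE-slot -> local-slot map, limited to the examples in this card
// (hypothetical: the complete training mapping is not published).
const SLOT_MAP = {
  person: "PARTICIPANT",
  relation: "PARTICIPANT",
  email_address: "PARTICIPANT",
  date: "DATETIME",
  time: "DATETIME",
  time_zone: "DATETIME",
  general_frequency: "RECURRENCE",
};

// Converts an annotated utterance such as
// "remind me [date : tomorrow] at [time : nine am]"
// into whitespace tokens plus BIO tags. Unmapped slots become "O".
function annotToBio(annotUtt, slotMap = SLOT_MAP) {
  const tokens = [];
  const tags = [];
  const re = /\[(\w+)\s*:\s*([^\]]+)\]|(\S+)/g; // slot group, or a plain word
  let m;
  while ((m = re.exec(annotUtt)) !== null) {
    if (m[3] !== undefined) {
      tokens.push(m[3]);
      tags.push("O");
      continue;
    }
    const label = slotMap[m[1]];
    m[2].trim().split(/\s+/).forEach((word, i) => {
      tokens.push(word);
      tags.push(label ? (i === 0 ? "B-" : "I-") + label : "O");
    });
  }
  return { tokens, tags };
}
```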
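A sketch of the **Augmentation** bullet. The filler words, punctuation set, and probabilities are hypothetical, and reading `AUGMENT_FACTOR=2` as "up to two variants per training example" is an assumption. Because `rng` is injected, a seeded generator makes the output reproducible.

```javascript
// Hypothetical filler words, punctuation, and probabilities; the card does
// not publish the actual values.
const FILLERS = ["hey", "ok", "so", "um", "please"];
const TRAILING = [".", "!", "?"];

// One augmented variant of a BIO-tagged example. `rng` returns values in [0, 1).
function augmentOne({ tokens, tags }, rng, { pFiller = 0.5, pPunct = 0.5, pDrop = 0.1 } = {}) {
  const out = { tokens: [], tags: [] };
  // Occasional O-token dropout; slot tokens are always kept.
  tokens.forEach((tok, i) => {
    if (tags[i] === "O" && rng() < pDrop) return;
    out.tokens.push(tok);
    out.tags.push(tags[i]);
  });
  // Random filler-word prefix, tagged O.
  if (rng() < pFiller) {
    out.tokens.unshift(FILLERS[Math.floor(rng() * FILLERS.length)]);
    out.tags.unshift("O");
  }
  // Trailing punctuation as its own O token.
  if (rng() < pPunct) {
    out.tokens.push(TRAILING[Math.floor(rng() * TRAILING.length)]);
    out.tags.push("O");
  }
  return out;
}

// Up to `factor` variants per training example, skipping any variant whose
// token sequence matches an original or an earlier variant.
function augmentSplit(examples, factor = 2, rng = Math.random) {
  const seen = new Set(examples.map((ex) => ex.tokens.join(" ")));
  const augmented = [];
  for (const ex of examples) {
    for (let k = 0; k < factor; k++) {
      const variant = augmentOne(ex, rng);
      const key = variant.tokens.join(" ");
      if (seen.has(key)) continue;
      seen.add(key);
      augmented.push(variant);
    }
  }
  return augmented;
}
```

Dropping only `O` tokens keeps every slot span intact, so the BIO tags of the variant stay valid without relabeling.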
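The schedule in the **Hyperparameters** bullet (linear warmup over the first 10% of steps, then cosine decay) can be written as a function of the optimizer step. This sketch mirrors the shape of Hugging Face's `get_cosine_schedule_with_warmup` with its default half cycle, which is what `lr_scheduler_type="cosine"` selects in `Trainer`; whether training used exactly that scheduler is an assumption. Since early stopping can end training before the last planned step, the learning rate may never reach zero.

```javascript
// Learning rate at a given optimizer step: linear warmup, then cosine decay
// to zero over the remaining steps (half a cosine cycle).
function cosineWithWarmup(step, totalSteps, { baseLr = 5e-5, warmupRatio = 0.1 } = {}) {
  const warmupSteps = Math.ceil(totalSteps * warmupRatio);
  if (step < warmupSteps) {
    return baseLr * (step / Math.max(1, warmupSteps));
  }
  const progress = (step - warmupSteps) / Math.max(1, totalSteps - warmupSteps);
  return baseLr * Math.max(0, 0.5 * (1 + Math.cos(Math.PI * progress)));
}
```

With a hypothetical 1000-step run, the rate climbs to 5e-5 at step 100, is back to half of that at step 550, and reaches 0 at step 1000.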
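In dynamic quantization, ONNX Runtime stores the weights as INT8 ahead of time and quantizes activations on the fly at inference. The sketch below shows only the weight side of the **Quantization** bullet, with one scale per channel. Symmetric scaling to [-127, 127] and treating each row as a channel are assumptions about the exact scheme, not read from the exported graph.

```javascript
// Symmetric per-channel INT8 quantization of a 2-D weight matrix:
// each row (treated as one channel) gets its own scale, and values are
// mapped to integers in [-127, 127].
function quantizePerChannel(weights) {
  return weights.map((row) => {
    const maxAbs = Math.max(...row.map((w) => Math.abs(w)));
    const scale = maxAbs > 0 ? maxAbs / 127 : 1; // avoid dividing by zero for all-zero rows
    const q = Int8Array.from(row, (w) => Math.max(-127, Math.min(127, Math.round(w / scale))));
    return { scale, q };
  });
}

// Approximate reconstruction, useful for inspecting quantization error.
function dequantize(channels) {
  return channels.map(({ scale, q }) => Array.from(q, (v) => v * scale));
}
```

Per-channel scales keep small-magnitude rows from being crushed by one large value elsewhere in the tensor: with a single per-tensor scale, a row like `[0.01, 0.02, -0.04]` next to a row containing `-1.0` would be represented by only a handful of integer levels.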
## Evaluation
Token-level metrics (seqeval) on the held-out test split (n ≈ 559).
**The q8 column reflects the artifact actually shipped in this repo.**
| Metric | fp32 | q8 (shipped) |
|---|---|---|
| Accuracy | 0.9550 | 0.9050 |
| Precision | 0.8283 | 0.8718 |
| Recall | 0.8802 | 0.6416 |
| **F1** | **0.8535** | **0.7392** |
| DATETIME F1 | 0.8191 | 0.6724 |
| PARTICIPANT F1 | 0.9208 | 0.8943 |
| PRIORITY F1 | 0.7979 | 0.6316 |
| RECURRENCE F1 | 0.8981 | 0.7093 |
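As a consistency check, the headline F1 values are the harmonic mean of the precision and recall rows; recomputing them from the rounded table values reproduces both columns. The harmonic mean is pulled toward the smaller input, which is why the q8 recall drop outweighs its precision gain.

```javascript
// F1 is the harmonic mean of precision and recall.
function f1(precision, recall) {
  if (precision + recall === 0) return 0;
  return (2 * precision * recall) / (precision + recall);
}

console.log(f1(0.8283, 0.8802).toFixed(4)); // 0.8535 (fp32)
console.log(f1(0.8718, 0.6416).toFixed(4)); // 0.7392 (q8)
```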
## Limitations and bias
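- **Quantization cost:** INT8 quantization raises precision slightly (0.83 → 0.87)
but cuts recall substantially (0.88 → 0.64; F1 0.85 → 0.74), so the shipped model
misses more true spans than the fp32 model. The F1 drop is largest for DATETIME,
PRIORITY, and RECURRENCE, while PARTICIPANT is barely affected; set downstream
confidence thresholds with this recall loss in mind.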
- **Domain:** trained on short calendar/task-style English notes plus synthetic
data; expect degradation on long-form text, other domains, or other languages.
- **Synthetic data:** part of the training distribution is generated, so phrasing
diversity and demographic coverage of names/relations is limited and may carry
generator biases.
- **No span normalization:** the model tags spans only; converting a `DATETIME`
span to an actual timestamp is a downstream concern.
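The **Quantization cost** bullet recommends accounting for lost recall downstream. Because the pipeline reports a single argmax label per token, that recall cannot be recovered from its output alone. One option, sketched below, is to read raw logits (for example by running the model through `AutoModelForTokenClassification` instead of `pipeline`) and accept the best non-`O` label whenever its probability clears a threshold, even if `O` scores higher. The 0.3 default is hypothetical and should be tuned on held-out data. The result can contain `I-` tags without a preceding `B-`, so downstream span grouping needs to be lenient.

```javascript
// Numerically stable softmax over one token's logits.
function softmax(logits) {
  const m = Math.max(...logits);
  const exps = logits.map((x) => Math.exp(x - m));
  const z = exps.reduce((a, b) => a + b, 0);
  return exps.map((e) => e / z);
}

// Decodes one label per token, but replaces "O" with the most probable
// non-O label whenever that label's probability is at least `threshold`.
// `labels` must follow the model's id2label order (verify against config.json).
function decodeWithThreshold(tokenLogits, labels, threshold = 0.3) {
  const oIndex = labels.indexOf("O");
  return tokenLogits.map((logits) => {
    const probs = softmax(logits);
    let best = 0;
    let bestEntity = -1;
    probs.forEach((p, j) => {
      if (p > probs[best]) best = j;
      if (j !== oIndex && (bestEntity < 0 || p > probs[bestEntity])) bestEntity = j;
    });
    if (best !== oIndex) return labels[best];
    return bestEntity >= 0 && probs[bestEntity] >= threshold ? labels[bestEntity] : "O";
  });
}
```

Lowering the threshold trades precision for recall, so it is worth sweeping a few values against the test split before shipping a setting.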
## License and attribution
Released under **CC BY 4.0**, consistent with the MASSIVE training data
(CC BY 4.0). The base model `microsoft/xtremedistil-l6-h256-uncased` is MIT.
Part of the training data is derived from the MASSIVE dataset; CC BY 4.0
requires attribution to that source:
```bibtex
@misc{fitzgerald2022massive,
title = {MASSIVE: A 1M-Example Multilingual Natural Language Understanding Dataset with 51 Typologically-Diverse Languages},
author = {FitzGerald, Jack and others},
year = {2022},
eprint = {2204.08582},
archivePrefix = {arXiv}
}
```