library_name: transformers.js
pipeline_tag: token-classification
base_model: microsoft/xtremedistil-l6-h256-uncased
language:
  - en
license: cc-by-4.0
tags:
  - token-classification
  - slot-filling
  - ner
  - transformers.js
  - onnx
  - quantized
datasets:
  - AmazonScience/massive
metrics:
  - f1
  - precision
  - recall
  - accuracy
model-index:
  - name: notes-slots
    results:
      - task:
          type: token-classification
          name: Slot Extraction
        dataset:
          name: MASSIVE en-US (+ synthetic productivity/realistic)
          type: AmazonScience/massive
        metrics:
          - type: f1
            value: 0.8535
            name: Token F1 (fp32)
          - type: f1
            value: 0.7392
            name: Token F1 (q8, shipped)
          - type: accuracy
            value: 0.955
            name: Accuracy (fp32)

notes-slots

Compact token-classification model that extracts scheduling/task slots from short English notes — participants, datetimes, priorities, and recurrences — and runs fully client-side via Transformers.js.

The shipped artifact is not a PyTorch checkpoint but an INT8-quantized ONNX bundle (~13 MB) intended for in-browser WASM inference.

Model details


Base model	`microsoft/xtremedistil-l6-h256-uncased` (MIT)
Architecture	`BertForTokenClassification` — 6 layers, hidden size 256, 8 heads, intermediate 1024, vocab 30522, max positions 512 (~13M params)
Task	Token classification (BIO slot tagging)
Schema version	`slot-labels-v0.3.0`
Model version	0.1.0
Languages	English
Runtime	Transformers.js v4, WASM device, dtype `q8`
Bundle size	13.32 MB
`transformers` (training)	4.57.6
License	CC BY 4.0

Labels (9, BIO)

O, B-PARTICIPANT, I-PARTICIPANT, B-PRIORITY, I-PRIORITY, B-DATETIME, I-DATETIME, B-RECURRENCE, I-RECURRENCE

A bundled transitions.json carries empirical BIO transition log-probabilities for optional Viterbi-style decoding on top of the raw token logits. Valid transitions are Laplace-smoothed; invalid ones, such as O → I-PARTICIPANT, are hard-zeroed rather than smoothed.
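The optional decoding step can be sketched as follows. This is a minimal sketch, not the project's own decoder: it assumes the contents of transitions.json can be loaded into a square matrix `trans[i][j]` holding the log-probability of label `j` following label `i`, with invalid transitions stored as `-Infinity`. The file's actual layout is not documented in this card, the function name `viterbiDecode` is hypothetical, and start-of-sequence constraints are left out for brevity. The toy example uses three labels and unnormalized transition scores only to keep the arithmetic readable.

```javascript
// Hypothetical helper: Viterbi decoding over per-token label scores.
// emissions: number[T][K], log-scores per token and label (e.g. log-softmaxed logits).
// trans:     number[K][K], log-probability of label j following label i (assumed layout).
function viterbiDecode(emissions, trans) {
  const T = emissions.length;
  if (T === 0) return [];
  const K = emissions[0].length;

  let score = emissions[0].slice(); // best path score ending in each label so far
  const back = [];                  // back[t - 1][j] = best previous label for label j at token t

  for (let t = 1; t < T; t++) {
    const next = new Array(K).fill(-Infinity);
    const ptr = new Array(K).fill(0);
    for (let j = 0; j < K; j++) {
      for (let i = 0; i < K; i++) {
        const s = score[i] + trans[i][j] + emissions[t][j];
        if (s > next[j]) {
          next[j] = s;
          ptr[j] = i;
        }
      }
    }
    back.push(ptr);
    score = next;
  }

  // Pick the best final label, then follow the back-pointers to the start.
  let best = 0;
  for (let j = 1; j < K; j++) {
    if (score[j] > score[best]) best = j;
  }
  const path = [best];
  for (let t = back.length - 1; t >= 0; t--) {
    best = back[t][best];
    path.unshift(best);
  }
  return path;
}

// Toy example (made-up numbers): the per-token argmax would be O, O, I-DATETIME,
// which is not a valid BIO sequence because O cannot be followed by I-DATETIME.
const labels = ["O", "B-DATETIME", "I-DATETIME"];
const toyTrans = [
  [0, 0, -Infinity], // O -> I-DATETIME is forbidden
  [0, 0, 0],
  [0, 0, 0],
];
const toyEmissions = [
  [-0.1, -3.0, -4.0],
  [-0.6, -1.0, -3.0],
  [-3.0, -2.5, -0.2],
];
const decoded = viterbiDecode(toyEmissions, toyTrans);
console.log(decoded.map((i) => labels[i]).join(", ")); // O, B-DATETIME, I-DATETIME
```

Because the forbidden transition contributes `-Infinity`, any path through it can never win, so the decoder relabels the second token as B-DATETIME instead of emitting an orphan I- tag.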

Intended use

In scope: extracting participants / datetimes / priorities / recurrence cues from short, informal English notes and reminders (calendar, to-do, and email-intent style text).
Out of scope: long documents, languages other than English, normalization of extracted spans into structured datetimes (use a downstream parser such as chrono-node for that), and any high-stakes decision-making.

Usage (Transformers.js)

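import { pipeline } from "@huggingface/transformers";

// Load the token-classification pipeline with the INT8 (q8) ONNX weights.
const tagger = await pipeline("token-classification", "jottypro/notes-slots", {
  dtype: "q8",
});

// Tag a short note and print the raw token-level predictions.
const out = await tagger("call Sarah next Friday at 5pm, high priority, every week");
console.log(out);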

The ONNX weights live at onnx/model_quantized.onnx, which is the layout Transformers.js expects when loading from the Hub.
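The pipeline output is token-level, so most applications will want to merge consecutive B-/I- tags into slot spans. The sketch below shows one way to do that. It assumes each entry has the shape `{ entity, score, index, word }` and that WordPiece continuation pieces carry a `##` prefix, as is usual for uncased BERT tokenizers; verify both against the output you actually get. `groupSlots` is a hypothetical helper, not part of Transformers.js, and the sample input is made up rather than real model output.

```javascript
// Hypothetical helper: merge token-level BIO tags into slot spans.
// Assumes entries like { entity: "B-DATETIME", score: 0.98, index: 3, word: "next" }.
function groupSlots(tokens) {
  const spans = [];
  let cur = null; // span currently being extended
  for (const tok of tokens) {
    if (tok.entity === "O") {
      cur = null;
      continue;
    }
    const dash = tok.entity.indexOf("-");
    const prefix = tok.entity.slice(0, dash); // "B" or "I"
    const type = tok.entity.slice(dash + 1);  // e.g. "DATETIME"
    const isPiece = tok.word.startsWith("##");
    // Only extend a span when the token directly follows it; an index gap
    // means an untagged token sat in between.
    const adjacent = cur !== null && tok.index === cur.end + 1;

    if (adjacent && type === cur.type && (prefix === "I" || isPiece)) {
      cur.text += isPiece ? tok.word.slice(2) : " " + tok.word;
      cur.end = tok.index;
      cur.score = Math.min(cur.score, tok.score); // report the weakest token score
    } else {
      // A B- tag, a type change, or a stray I- tag all start a new span.
      cur = {
        type,
        text: isPiece ? tok.word.slice(2) : tok.word,
        start: tok.index,
        end: tok.index,
        score: tok.score,
      };
      spans.push(cur);
    }
  }
  return spans;
}

// Made-up input in the assumed shape.
const sample = [
  { entity: "B-PARTICIPANT", score: 0.99, index: 2, word: "sarah" },
  { entity: "B-DATETIME", score: 0.97, index: 3, word: "next" },
  { entity: "I-DATETIME", score: 0.95, index: 4, word: "friday" },
  { entity: "I-DATETIME", score: 0.9, index: 5, word: "at" },
  { entity: "I-DATETIME", score: 0.96, index: 6, word: "5pm" },
];
for (const span of groupSlots(sample)) {
  console.log(span.type, JSON.stringify(span.text), span.score);
}
```

Treating a stray I- tag as the start of a new span is a lenient choice that favors recall; a stricter variant could drop such tokens instead.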

Training

Data: AmazonScience/MASSIVE en-US (config en-US, revision d2362678…), filtered to the calendar / datetime / email / lists scenarios, with MASSIVE slots remapped onto the local 4-slot schema (e.g. person/relation/email_address → PARTICIPANT, date/time/time_zone → DATETIME, general_frequency → RECURRENCE). This is combined with examples from the synthetic productivity and realistic note generators.
Augmentation: light, training-split only (AUGMENT_FACTOR=2) — random filler-word prefix, trailing punctuation, occasional O-token dropout; deduplicated against originals.
Hyperparameters: 10 epochs with early stopping (patience 2, restore best by F1), batch size 64, learning rate 5e-5, cosine schedule, warmup ratio 0.1, weight decay 0.01, label smoothing 0.1, max sequence length 128, seed 42.
Quantization: dynamic, per-channel QInt8, applied to MatMul and Gather ops via ONNX Runtime.
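For reference, the learning-rate schedule named in the hyperparameters (warmup over the first 10% of steps, then cosine decay from 5e-5) can be written as a small function. This is a sketch of the common warmup-plus-cosine shape, assuming linear warmup and decay to zero at the final step; the training script is not part of this card, and `totalSteps` below is an illustrative value, not the real run length.

```javascript
// Sketch: linear warmup followed by cosine decay to zero (assumed end value).
function learningRateAt(step, totalSteps, baseLr = 5e-5, warmupRatio = 0.1) {
  const warmupSteps = Math.floor(totalSteps * warmupRatio);
  if (step < warmupSteps) {
    // Ramp linearly from 0 up to baseLr.
    return baseLr * (step / Math.max(1, warmupSteps));
  }
  // Fraction of the post-warmup phase completed, clamped to [0, 1].
  const progress = Math.min(1, (step - warmupSteps) / Math.max(1, totalSteps - warmupSteps));
  return baseLr * 0.5 * (1 + Math.cos(Math.PI * progress));
}

// Illustrative run length only; the real step count depends on dataset and batch size.
const totalSteps = 1000;
for (const step of [0, 50, 100, 550, 1000]) {
  console.log(step, learningRateAt(step, totalSteps).toExponential(2));
}
```

The rate peaks at the end of warmup, passes through half the base rate at the midpoint of the decay phase, and reaches zero on the last step.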

Evaluation

Token-level metrics (seqeval) on the held-out test split (n ≈ 559). The q8 column reflects the artifact actually shipped in this repo.

Metric	fp32	q8 (shipped)
Accuracy	0.9550	0.9050
Precision	0.8283	0.8718
Recall	0.8802	0.6416
F1	0.8535	0.7392
DATETIME F1	0.8191	0.6724
PARTICIPANT F1	0.9208	0.8943
PRIORITY F1	0.7979	0.6316
RECURRENCE F1	0.8981	0.7093
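As a consistency check, the overall F1 row is the harmonic mean of the precision and recall rows. The snippet below recomputes it from the table values; the function name is illustrative.

```javascript
// F1 is the harmonic mean of precision and recall.
function f1(precision, recall) {
  const denom = precision + recall;
  return denom === 0 ? 0 : (2 * precision * recall) / denom;
}

console.log(f1(0.8283, 0.8802).toFixed(4)); // fp32 column: 0.8535
console.log(f1(0.8718, 0.6416).toFixed(4)); // q8 column:   0.7392
```

The q8 column shows why F1 drops even though precision rises: the harmonic mean is pulled toward the smaller of the two values, here recall.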

Limitations and bias

Quantization cost: INT8 quantization raises precision slightly (0.83 → 0.87) but cuts recall substantially (0.88 → 0.64), lowering F1 from 0.85 to 0.74 and accuracy from 0.955 to 0.905. The shipped q8 model therefore misses more true spans than the fp32 model; tune downstream thresholds accordingly.
Domain: trained on short calendar/task-style English notes plus synthetic data; expect degradation on long-form text, other domains, or other languages.
Synthetic data: part of the training distribution is generated, so phrasing diversity and demographic coverage of names/relations are limited and may carry generator biases.
No span normalization: the model tags spans only; converting a DATETIME span to an actual timestamp is a downstream concern.

License and attribution

Released under CC BY 4.0, consistent with the MASSIVE training data (CC BY 4.0). The base model microsoft/xtremedistil-l6-h256-uncased is MIT.

Part of the training data is derived from the MASSIVE dataset; CC BY 4.0 requires attribution to that source:

@misc{fitzgerald2022massive,
  title  = {MASSIVE: A 1M-Example Multilingual Natural Language Understanding Dataset with 51 Typologically-Diverse Languages},
  author = {FitzGerald, Jack and others},
  year   = {2022},
  eprint = {2204.08582},
  archivePrefix = {arXiv}
}