	---
	library_name: transformers.js
	pipeline_tag: token-classification
	base_model: microsoft/xtremedistil-l6-h256-uncased
	language:
	- en
	license: cc-by-4.0
	tags:
	- token-classification
	- slot-filling
	- ner
	- transformers.js
	- onnx
	- quantized
	datasets:
	- AmazonScience/massive
	metrics:
	- f1
	- precision
	- recall
	- accuracy
	model-index:
	- name: notes-slots
	  results:
	  - task:
	      type: token-classification
	      name: Slot Extraction
	    dataset:
	      name: MASSIVE en-US (+ synthetic productivity/realistic)
	      type: AmazonScience/massive
	    metrics:
	    - type: f1
	      value: 0.8535
	      name: Token F1 (fp32)
	    - type: f1
	      value: 0.7392
	      name: Token F1 (q8, shipped)
	    - type: accuracy
	      value: 0.9550
	      name: Accuracy (fp32)
	---

	# notes-slots

	A compact token-classification model that extracts scheduling and task slots
	(participants, datetimes, priorities, and recurrences) from short English notes.
	It runs fully client-side via [Transformers.js](https://huggingface.co/docs/transformers.js).

	The shipped artifact is an INT8-quantized ONNX bundle (~13 MB) intended for
	in-browser WASM inference, not a PyTorch checkpoint.

	## Model details

	| Property | Value |
	|---|---|
	| Base model | [`microsoft/xtremedistil-l6-h256-uncased`](https://huggingface.co/microsoft/xtremedistil-l6-h256-uncased) (MIT) |
	| Architecture | `BertForTokenClassification`: 6 layers, hidden size 256, 8 heads, intermediate size 1024, vocab 30522, max positions 512 (~13M params) |
	| Task | Token classification (BIO slot tagging) |
	| Schema version | `slot-labels-v0.3.0` |
	| Model version | 0.1.0 |
	| Languages | English |
	| Runtime | Transformers.js v4, WASM device, dtype `q8` |
	| Bundle size | 13.32 MB |
	| `transformers` (training) | 4.57.6 |
	| License | CC BY 4.0 |

	### Labels (9, BIO)

	`O`, `B-PARTICIPANT`, `I-PARTICIPANT`, `B-PRIORITY`, `I-PRIORITY`,
	`B-DATETIME`, `I-DATETIME`, `B-RECURRENCE`, `I-RECURRENCE`
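
	To make the BIO constraint concrete, here is a minimal sketch of which label-to-label moves are legal. The `LABELS` order follows the list above, which is an assumption (the authoritative `id2label` mapping lives in the model's config), and `isAllowedTransition` / `isValidBIO` are hypothetical helpers, not part of the shipped bundle.

	```javascript
	// Label order as listed in this card (assumed; the real id2label
	// order is defined in the model's config.json).
	const LABELS = [
	  "O",
	  "B-PARTICIPANT", "I-PARTICIPANT",
	  "B-PRIORITY", "I-PRIORITY",
	  "B-DATETIME", "I-DATETIME",
	  "B-RECURRENCE", "I-RECURRENCE",
	];

	// BIO rule: O and B-X may appear anywhere; I-X may only follow B-X or I-X
	// of the same slot type. `prev` is null at the start of a sequence.
	function isAllowedTransition(prev, next) {
	  if (!next.startsWith("I-")) return true;
	  if (prev === null || prev === "O") return false;
	  return prev.slice(2) === next.slice(2);
	}

	// True if every adjacent pair in the tag sequence obeys the BIO rule.
	function isValidBIO(tags) {
	  let prev = null;
	  for (const tag of tags) {
	    if (!isAllowedTransition(prev, tag)) return false;
	    prev = tag;
	  }
	  return true;
	}
	```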

	A bundled `transitions.json` carries empirical BIO transition log-probabilities
	(Laplace-smoothed, invalid transitions hard-zeroed) for optional Viterbi-style
	decoding on top of the raw token logits.
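
	A minimal Viterbi sketch for that optional decoding step follows. The layout of `transitions.json` is not documented here, so the code assumes it has already been turned into a square matrix `trans[prev][next]` of log-probabilities (invalid moves as `-Infinity`) plus a vector `start` of start log-probabilities; `viterbi` and `logSoftmax` are illustrative names. Emissions are per-token log-softmaxed logits, which you would obtain by running the model directly, since the pipeline output carries only the top label per token.

	```javascript
	// Log-softmax over one row of logits, so emissions and transitions
	// are both log-probabilities and can be summed.
	function logSoftmax(row) {
	  const max = Math.max(...row);
	  const logSum = Math.log(row.reduce((s, x) => s + Math.exp(x - max), 0)) + max;
	  return row.map((x) => x - logSum);
	}

	// Most likely label path given emissions (T x L log-probs),
	// trans[prev][next] (L x L log-probs) and start (L log-probs).
	function viterbi(emissions, trans, start) {
	  const T = emissions.length;
	  if (T === 0) return [];
	  const L = start.length;
	  let score = start.map((s, j) => s + emissions[0][j]);
	  const back = []; // back[t - 1][j] = best previous label for label j at step t
	  for (let t = 1; t < T; t++) {
	    const next = new Array(L).fill(-Infinity);
	    const ptr = new Array(L).fill(0);
	    for (let j = 0; j < L; j++) {
	      for (let i = 0; i < L; i++) {
	        const cand = score[i] + trans[i][j];
	        if (cand > next[j]) { next[j] = cand; ptr[j] = i; }
	      }
	      next[j] += emissions[t][j];
	    }
	    back.push(ptr);
	    score = next;
	  }
	  // Start from the best final label and follow the back-pointers.
	  let best = 0;
	  for (let j = 1; j < L; j++) if (score[j] > score[best]) best = j;
	  const path = [best];
	  for (let t = T - 2; t >= 0; t--) {
	    best = back[t][best];
	    path.unshift(best);
	  }
	  return path;
	}
	```

	Because hard-zeroed transitions score `-Infinity`, a path such as `O → I-DATETIME` can never win, even where greedy per-token argmax would produce it.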

	## Intended use

	- In scope: extracting participants / datetimes / priorities / recurrence
	cues from short, informal English notes and reminders (calendar, to-do, email
	intent style text).
	- Out of scope: long documents, languages other than English, normalization
	of extracted spans into structured datetimes (use a downstream parser such as
	`chrono-node` for that), and any high-stakes decisioning.

	## Usage (Transformers.js)

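	```js
	// Run as an ES module (uses top-level await).
	import { pipeline } from "@huggingface/transformers";

	// dtype "q8" selects the INT8 weights at onnx/model_quantized.onnx.
	const tagger = await pipeline("token-classification", "jottypro/notes-slots", {
	  dtype: "q8",
	});

	// Logs an array of per-token predictions.
	const out = await tagger("call Sarah next Friday at 5pm, high priority, every week");
	console.log(out);
	```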

	The ONNX weights live at `onnx/model_quantized.onnx`, which is the layout
	Transformers.js expects when loading from the Hub.
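
	The pipeline yields one prediction per WordPiece token, so assembling slot spans is left to the caller. A minimal sketch, assuming each item carries `entity`, `word`, and a token `index`, and that `O` tokens are usually already dropped (check this against the output of your installed Transformers.js version); `groupSpans` is a hypothetical helper:

	```javascript
	// Merge per-token BIO predictions into slot spans. Items are assumed to look
	// like { entity: "B-DATETIME", word: "next", index: 4 }, ordered by index.
	function groupSpans(tokens) {
	  const spans = [];
	  let cur = null;
	  for (const { entity, word, index } of tokens) {
	    if (entity === "O") { cur = null; continue; } // in case O was not filtered
	    const [prefix, type] = entity.split("-");
	    // Continue only on I-X of the same type at the very next token position.
	    const continues =
	      cur !== null && prefix === "I" && type === cur.type && index === cur.lastIndex + 1;
	    if (!continues) {
	      cur = { type, pieces: [], lastIndex: index };
	      spans.push(cur);
	    }
	    cur.pieces.push(word);
	    cur.lastIndex = index;
	  }
	  // Re-join WordPiece pieces: a "##" prefix marks a continuation of the previous word.
	  return spans.map(({ type, pieces }) => ({
	    type,
	    text: pieces.reduce(
	      (acc, p) => (p.startsWith("##") ? acc + p.slice(2) : acc ? acc + " " + p : p),
	      ""
	    ),
	  }));
	}
	```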

	## Training

	- Data: [AmazonScience/MASSIVE](https://huggingface.co/datasets/AmazonScience/massive)
	`en-US` config (revision `d2362678…`), filtered to the
	`calendar` / `datetime` / `email` / `lists` scenarios, with MASSIVE slots
	remapped onto the local 4-slot schema (e.g. `person`/`relation`/`email_address`
	→ `PARTICIPANT`, `date`/`time`/`time_zone` → `DATETIME`,
	`general_frequency` → `RECURRENCE`), combined with synthetic notes from the
	"productivity" and "realistic" generators.
	- Augmentation: light and applied to the training split only (`AUGMENT_FACTOR=2`):
	a random filler-word prefix, trailing punctuation, and occasional `O`-token
	dropout, with augmented examples deduplicated against the originals.
	- Hyperparameters: up to 10 epochs with early stopping (patience 2, restoring
	the best checkpoint by F1), batch size 64, learning rate 5e-5, cosine schedule,
	warmup ratio 0.1, weight decay 0.01, label smoothing 0.1, max sequence length
	128, seed 42.
	- Quantization: dynamic, per-channel `QInt8`, applied to `MatMul` and
	`Gather` ops via ONNX Runtime.
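
	The slot remapping can be sketched against MASSIVE's `annot_utt` field, which marks slots inline as `[slot : value]`. The `SLOT_MAP` below holds only the example pairs listed above; the full mapping, including whichever source slots feed `PRIORITY`, is not published in this card. Tokenization is plain whitespace splitting, a simplification of a real pipeline that aligns labels to WordPiece tokens; `annotToBIO` is a hypothetical name.

	```javascript
	// Partial slot map: only the examples given in this card.
	const SLOT_MAP = {
	  person: "PARTICIPANT",
	  relation: "PARTICIPANT",
	  email_address: "PARTICIPANT",
	  date: "DATETIME",
	  time: "DATETIME",
	  time_zone: "DATETIME",
	  general_frequency: "RECURRENCE",
	};

	// Convert an annotated utterance such as
	// "remind me [date : tomorrow] to call [person : sarah]"
	// into whitespace tokens with BIO tags. Unmapped slots become "O".
	function annotToBIO(annotUtt) {
	  const tokens = [];
	  const tags = [];
	  // Either a bracketed "[slot : value]" group or a single plain word.
	  const re = /\[\s*(\w+)\s*:\s*([^\]]+?)\s*\]|(\S+)/g;
	  let m;
	  while ((m = re.exec(annotUtt)) !== null) {
	    if (m[3] !== undefined) {
	      tokens.push(m[3]);
	      tags.push("O");
	      continue;
	    }
	    const target = SLOT_MAP[m[1]];
	    m[2].split(/\s+/).forEach((word, k) => {
	      tokens.push(word);
	      tags.push(target ? `${k === 0 ? "B" : "I"}-${target}` : "O");
	    });
	  }
	  return { tokens, tags };
	}
	```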
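
	For intuition about the quantization step, here is a toy version of symmetric per-channel INT8 quantization: each output channel (row) gets its own scale `max|w| / 127`, and every weight rounds to the nearest step of that scale. It illustrates the general idea only and is not ONNX Runtime's code; the exact settings used for this bundle (symmetric vs. asymmetric weights, reduced range) are not stated. `quantizePerChannel` and `dequantize` are hypothetical names.

	```javascript
	// Symmetric per-channel INT8 quantization of a weight matrix W
	// (rows = output channels). A per-row scale keeps one large-magnitude
	// channel from wasting the integer range of the others.
	function quantizePerChannel(W) {
	  return W.map((row) => {
	    const maxAbs = Math.max(...row.map(Math.abs));
	    const scale = maxAbs > 0 ? maxAbs / 127 : 1; // avoid dividing by zero
	    const q = Int8Array.from(row, (w) =>
	      Math.max(-127, Math.min(127, Math.round(w / scale)))
	    );
	    return { scale, q };
	  });
	}

	// Map the integers back to approximate floating-point weights.
	function dequantize(channels) {
	  return channels.map(({ scale, q }) => Array.from(q, (v) => v * scale));
	}
	```

	Each weight comes back within half a step (`scale / 2`) of its original value; small per-weight errors like these, across every layer, are one reason the q8 scores trail fp32.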

	## Evaluation

	Token-level metrics (seqeval) on the held-out test split (n ≈ 559).
	The q8 column reflects the artifact actually shipped in this repo.

	| Metric | fp32 | q8 (shipped) |
	|---|---|---|
	| Accuracy | 0.9550 | 0.9050 |
	| Precision | 0.8283 | 0.8718 |
	| Recall | 0.8802 | 0.6416 |
	| F1 | 0.8535 | 0.7392 |
	| DATETIME F1 | 0.8191 | 0.6724 |
	| PARTICIPANT F1 | 0.9208 | 0.8943 |
	| PRIORITY F1 | 0.7979 | 0.6316 |
	| RECURRENCE F1 | 0.8981 | 0.7093 |
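
	As a consistency check, F1 is the harmonic mean of precision and recall, and the overall rows above satisfy it. The sketch also includes exact-match span scoring (a predicted span counts only if its type and both boundaries match a gold span); `spanPRF` and the `{ type, start, end }` span shape are illustrative assumptions, not the evaluation script behind this card.

	```javascript
	// Harmonic mean of precision and recall.
	function f1(precision, recall) {
	  return precision + recall === 0 ? 0 : (2 * precision * recall) / (precision + recall);
	}

	// Exact-match span scoring. Spans are { type, start, end } token offsets.
	function spanPRF(gold, pred) {
	  const key = (s) => `${s.type}:${s.start}:${s.end}`;
	  const goldKeys = new Set(gold.map(key));
	  const tp = pred.filter((s) => goldKeys.has(key(s))).length;
	  const precision = pred.length ? tp / pred.length : 0;
	  const recall = gold.length ? tp / gold.length : 0;
	  return { precision, recall, f1: f1(precision, recall) };
	}
	```

	`f1(0.8283, 0.8802)` and `f1(0.8718, 0.6416)` round to 0.8535 and 0.7392, matching the fp32 and q8 F1 rows.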

	## Limitations and bias

	- Quantization cost: INT8 quantization raises precision slightly but cuts
	recall substantially (0.88 → 0.64; F1 0.85 → 0.74). The model misses more
	true spans than the fp32 model; tune downstream thresholds accordingly.
	- Domain: trained on short calendar/task-style English notes plus synthetic
	data; expect degradation on long-form text, other domains, or other languages.
	- Synthetic data: part of the training distribution is generated, so phrasing
	diversity and demographic coverage of names/relations is limited and may carry
	generator biases.
	- No span normalization: the model tags spans only; converting a `DATETIME`
	span to an actual timestamp is a downstream concern.
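
	One way to act on the recall loss, if you can access full per-token probabilities (for example by running the model directly instead of through the pipeline), is to let a slot label win whenever its probability clears a threshold, even when `O` scores higher. This is a generic decoding heuristic, not something shipped with the model; `decodeWithThreshold` and any threshold value are hypothetical and need tuning on your own data.

	```javascript
	// Recall-oriented decoding. probs is T x L (softmax outputs per token);
	// labels[j] names column j. A slot label wins if it beats "O" (plain
	// argmax) or if its probability reaches `threshold`.
	function decodeWithThreshold(probs, labels, threshold) {
	  const o = labels.indexOf("O");
	  return probs.map((row) => {
	    // Best non-O label for this token.
	    let best = -1;
	    row.forEach((p, j) => {
	      if (j !== o && (best === -1 || p > row[best])) best = j;
	    });
	    return row[best] > row[o] || row[best] >= threshold ? labels[best] : "O";
	  });
	}
	```

	A threshold of 1 effectively reduces this to ordinary argmax. The output can contain invalid BIO sequences, so pair it with a constrained decoder if that matters downstream.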

	## License and attribution

	Released under CC BY 4.0, consistent with the MASSIVE training data
	(CC BY 4.0). The base model `microsoft/xtremedistil-l6-h256-uncased` is MIT.

	Part of the training data is derived from the MASSIVE dataset; CC BY 4.0
	requires attribution to that source:

	```bibtex
	@misc{fitzgerald2022massive,
	  title = {MASSIVE: A 1M-Example Multilingual Natural Language Understanding Dataset with 51 Typologically-Diverse Languages},
	  author = {FitzGerald, Jack and others},
	  year = {2022},
	  eprint = {2204.08582},
	  archivePrefix = {arXiv}
	}
	```