---
language:
- en
- ga
license: apache-2.0
library_name: transformers
pipeline_tag: token-classification
tags:
- pii
- de-identification
- token-classification
- ireland
- irish
- gaelic
- diffusion-style
- denoising
- ppsn
- eircode
- onnx
- int8
- dynamic-quantization
- cpu
base_model:
- OpenMed/OpenMed-PII-mLiteClinical-Base-135M-v1
datasets:
- temsa/OpenMed-Irish-CorePII-TrainMix-v1
- temsa/OpenMed-Irish-PPSN-Eircode-Spec-v1
- joelniklaus/mapa
- gretelai/synthetic_pii_finance_multilingual
model-index:
- name: IrishCore-DiffMask-135M-v1-rc2
  results:
  - task:
      type: token-classification
      name: Irish core PII masking
    dataset:
      type: custom
      name: irish_core_pii_v1
    metrics:
    - type: f1
      name: Overall F1
      value: 0.9664
  - task:
      type: token-classification
      name: Multilingual PPSN masking
    dataset:
      type: custom
      name: multilingual_ppsn_v1_all
    metrics:
    - type: f1
      name: Overall F1
      value: 0.9212
  - task:
      type: token-classification
      name: Hardening exact suite
    dataset:
      type: custom
      name: irish_dllm_hardening_exact_v1
    metrics:
    - type: f1
      name: Overall F1
      value: 0.9744
  - task:
      type: token-classification
      name: UAT replay exact suite
    dataset:
      type: custom
      name: diffmask_gap_uat_exact_v1
    metrics:
    - type: f1
      name: Overall F1
      value: 0.8276
---

# IrishCore-DiffMask-135M-v1-rc2

`IrishCore-DiffMask-135M-v1-rc2` is a raw-only Irish PII masking model derived from `OpenMed/OpenMed-PII-mLiteClinical-Base-135M-v1`.

It is a small, scanner-free span extractor tuned for:

- `PPSN`
- `ACCOUNT_NUMBER`
- `BANK_ROUTING_NUMBER`
- `CREDIT_DEBIT_CARD`
- `PASSPORT_NUMBER`
- `POSTCODE`
- `PHONE_NUMBER`
- `EMAIL`
- `FIRST_NAME`
- `LAST_NAME`
- `SWIFT_BIC`

The main targets are English and Irish Gaelic text in citizen-support, public-sector, and HSE-style flows. The repo ships both the full `transformers` checkpoint and a dynamically quantized int8 (q8) ONNX artifact for CPU deployment.

## What "DiffMask" Means Here

This release is not a generative diffusion language model. It is a compact discriminative token-span model trained with a diffusion-style denoising schedule.

The short version:

- **Base OpenMed**: plain BIO token classification
- **DiffMask**: token-span extraction with token-presence and boundary heads
- **DiffMask training**: repeated masked denoising over the same sentence
- **DiffMask inference**: one forward pass, no iterative refinement, no text generation

Concretely:

- The encoder starts from the DistilBERT-family weights inside `OpenMed/OpenMed-PII-mLiteClinical-Base-135M-v1`.
- The model adds three task heads over the encoder hidden states:
  - a per-label token-presence head
  - a typed start-boundary head
  - a typed end-boundary head
- During training, each input sentence is corrupted multiple times by replacing a random fraction of visible tokens with `[MASK]`.
- The corruption level follows a short noise schedule from heavy masking to light masking.
- The same gold spans are learned at every noise level, and the losses are averaged across the denoising passes.
- At inference time there is no diffusion loop and no rewrite step: the model runs once and a score-only span decoder reconstructs spans from token scores plus typed boundaries.

So the "DLLM" aspect here is the training recipe: repeated masked denoising over text, not autoregressive generation.
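
The denoising recipe can be sketched in a few lines. The mask id and the three-step schedule below are illustrative assumptions, not the release's actual hyperparameters:

```python
import random

MASK_ID = 103  # assumed [MASK] id for a DistilBERT-family tokenizer

def noised_views(token_ids, schedule=(0.45, 0.25, 0.10), seed=0):
    """Corrupt one sentence several times, from heavy to light masking.

    Every view keeps the same gold span targets; only the visible
    tokens change. The per-view losses are then averaged.
    """
    rng = random.Random(seed)
    views = []
    for rate in schedule:
        views.append([MASK_ID if rng.random() < rate else t for t in token_ids])
    return views

views = noised_views([101, 7592, 2026, 6887, 2015, 2078, 102])
```

Each view is one denoising pass over the same sentence; averaging the losses across views is what makes the single-pass extractor robust to partially degraded context.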

## What It Is Not

This model is **not** a full discrete diffusion language model in the LLaDA sense.

A true DLLM would usually have:

- timestep or noise conditioning inside the model
- iterative denoising at inference time
- multi-step sequence refinement at runtime
- text generation or full-sequence reconstruction as a first-class objective

This release does **not** do that.

Instead, it uses the diffusion idea only as a **training-time robustness trick**:

- corrupt the sentence with `[MASK]` at several noise levels
- train on the same target spans each time
- average those losses

At runtime, it behaves like a normal fast discriminative extractor.

## Architecture

- Encoder: DistilBERT-size encoder from the OpenMed mLiteClinical 135M base
- Heads:
  - token presence per released label
  - typed start boundary per released label
  - typed end boundary per released label
- Decoder:
  - score-only span decoding from offsets, token continuity, label-specific thresholds, and typed boundaries
  - no regex candidate extractor
  - no checksum validator
  - no scanner layer

The release behavior is fully defined by the weights plus the bundled decoder in `common.py`.
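
A minimal sketch of what score-only decoding looks like for a single label. The threshold value and the boundary-picking rule here are illustrative; the actual release behavior is defined by the bundled decoder in `common.py`:

```python
def decode_spans(presence, starts, ends, thresh=0.5):
    """Greedy score-only span decoding for one label.

    presence/starts/ends are per-token scores in [0, 1]. Contiguous
    runs of above-threshold presence become candidate spans; the
    typed boundary scores then pick the best start and end token
    inside each run.
    """
    spans, i, n = [], 0, len(presence)
    while i < n:
        if presence[i] < thresh:
            i += 1
            continue
        j = i
        while j + 1 < n and presence[j + 1] >= thresh:
            j += 1
        run = range(i, j + 1)
        start = max(run, key=lambda k: starts[k])
        end = max(run, key=lambda k: ends[k])
        if start <= end:
            spans.append((start, end))  # inclusive token indices
        i = j + 1
    return spans
```

Because the decoder works purely on scores, there is no regex, checksum, or scanner pass anywhere in this loop.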

## Training And Inference Flow

Training:

1. tokenize a sentence with gold BIO spans
2. convert spans into:
   - token-presence targets
   - typed start targets
   - typed end targets
3. create several noised copies of the same tokenized sentence by masking random visible tokens
4. run the same encoder and heads on each noised copy
5. average the losses across those denoising passes

Inference:

1. tokenize the raw text once
2. run a single forward pass
3. predict:
   - which labels are present on each token
   - where each labeled span starts
   - where each labeled span ends
4. decode spans with label-aware thresholds and boundary rules
5. replace the detected spans with placeholders such as `[PII:PPSN]`

There is no multi-step refinement loop in deployment.
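
The final masking step reduces to a right-to-left replacement over character offsets. A minimal sketch, assuming non-overlapping `(start, end, label)` spans with exclusive ends:

```python
def mask_text(text, spans):
    """Replace character spans with [PII:<LABEL>] placeholders.

    spans: (start, end, label) tuples, end exclusive, non-overlapping.
    Replacing right-to-left keeps earlier offsets valid as the
    string length changes.
    """
    for start, end, label in sorted(spans, reverse=True):
        text = text[:start] + f"[PII:{label}]" + text[end:]
    return text

masked = mask_text("My PPSN is 1234567TW.", [(11, 20, "PPSN")])
# masked == "My PPSN is [PII:PPSN]."
```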

## How It Differs From The Original OpenMed Model

The original `OpenMed/OpenMed-PII-mLiteClinical-Base-135M-v1` is a standard `DistilBertForTokenClassification` model:

- one encoder
- one token-classification head
- BIO labels such as `B-email`, `I-email`, `B-phone_number`
- generic token aggregation to recover spans

DiffMask changes two things:

1. **Different supervision**
   - base OpenMed learns only BIO token labels
   - DiffMask learns token presence plus typed span boundaries
2. **Different training recipe**
   - base OpenMed is trained as a standard token classifier
   - DiffMask is trained on multiple masked-noised views of the same sentence

That makes DiffMask better suited to structured Irish identifiers and mixed PII masking, while still keeping a small encoder and a fast CPU path.
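
The supervision change can be illustrated by how BIO tags translate into the three target views. This is a hedged sketch; the release's exact target construction lives in the training code, not in this repo:

```python
def bio_to_targets(bio_tags, labels):
    """Convert BIO tags into token-presence plus typed start/end targets."""
    n = len(bio_tags)
    presence = {lab: [0] * n for lab in labels}
    starts = {lab: [0] * n for lab in labels}
    ends = {lab: [0] * n for lab in labels}
    i = 0
    while i < n:
        tag = bio_tags[i]
        if tag.startswith("B-"):
            lab = tag[2:]
            j = i
            # extend the span over consecutive I- tags of the same label
            while j + 1 < n and bio_tags[j + 1] == f"I-{lab}":
                j += 1
            for k in range(i, j + 1):
                presence[lab][k] = 1
            starts[lab][i] = 1
            ends[lab][j] = 1
            i = j + 1
        else:
            i += 1
    return presence, starts, ends
```

Where plain BIO training supervises one tag per token, this shape gives the model three complementary signals per label: is the token inside a span, does a span start here, does one end here.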

## How It Differs From `rc5` And `rc8`

| Model | Core idea | External scanner/validator | Runtime shape |
|---|---|---|---|
| `rc5` | token classifier + repair logic | yes | heavier, decoder-assisted |
| `rc8` | raw-only token-span model | no | one pass + span decoder |
| `DiffMask` | raw-only token-span model + denoising training | no | one pass + span decoder |

So DiffMask is operationally closest to `rc8`, but with a stronger training recipe.

## Why This Exists

The older `rc5` release still depended on a repair-oriented decoder stack. The public `rc8` release removed that external logic, but it regressed on several structured Irish identifiers. This release keeps the raw-only deployment shape while re-hardening the model on Irish numeric and mixed-PII cases.

The selected `rc2` checkpoint is an interpolation blend between the stronger broad-coverage DiffMask candidate and a cleaned v5 continuation trained after fixing label contamination in the training mix. The goal was to recover real UAT cases without giving back too much Irish-core coverage.
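
For readers unfamiliar with checkpoint interpolation, the blend amounts to a per-tensor linear mix of two state dicts. A minimal sketch with plain floats standing in for tensors; the actual blend weight used for `rc2` is not published here:

```python
def interpolate_state_dicts(sd_a, sd_b, alpha=0.5):
    """Linear blend of two checkpoints: (1 - alpha) * a + alpha * b.

    sd_a and sd_b must share the same keys and shapes. alpha = 0
    returns checkpoint a, alpha = 1 returns checkpoint b.
    """
    return {k: (1 - alpha) * sd_a[k] + alpha * sd_b[k] for k in sd_a}

blended = interpolate_state_dicts({"w": 0.0}, {"w": 1.0}, alpha=0.25)
```

The appeal of this trick is that a single scalar lets you trade coverage from one candidate against hardening from the other, then pick the blend that best balances the eval suites.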

## References

Direct implementation references:

- Devlin et al., *BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding*
  https://arxiv.org/abs/1810.04805
- Sanh et al., *DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter*
  https://arxiv.org/abs/1910.01108
- Zhu and Li, *Boundary Smoothing for Named Entity Recognition*
  https://aclanthology.org/2022.acl-long.490/
- Fu et al., *SpanNer: Named Entity Re-/Recognition as Span Prediction*
  https://aclanthology.org/2021.acl-long.558/

Conceptual diffusion-style training references:

- Nie et al., *LLaDA 2.0: Scaling Up Diffusion Language Models to 100B*
  https://arxiv.org/abs/2512.15745
- Gong et al., *Scaling Diffusion Language Models via Adaptation from Autoregressive Models*
  https://arxiv.org/abs/2410.17891

These diffusion papers served as conceptual inspiration for the masked noising schedule. This release does **not** implement a generative text diffusion runtime.

## Included Artifacts

- Full `transformers` checkpoint in the repo root
- Dynamic q8 ONNX export in `onnx/model_quantized.onnx`
- Unquantized ONNX export in `onnx/model.onnx`
- `inference_mask.py` for the full checkpoint
- `inference_mask_onnx.py` for the ONNX q8 path
- `common.py`, `model.py`, and `multitask_model.py` implementing the release decoder
- Benchmark files in `eval/`

Artifact sizes:

- Full checkpoint: `514 MB` (`model.safetensors`)
- Dynamic q8 ONNX: `393 MB` (`onnx/model_quantized.onnx`)

## How To Use It

Full checkpoint:

```bash
uv run python inference_mask.py \
  --model temsa/IrishCore-DiffMask-135M-v1-rc2 \
  --min-score 0.5 \
  --text "My PPSN is 1234567TW, my Eircode is D02 X285, and my phone is 087 123 4567." \
  --json
```

Dynamic q8 ONNX:

```bash
uv run python inference_mask_onnx.py \
  --model temsa/IrishCore-DiffMask-135M-v1-rc2 \
  --min-score 0.5 \
  --text "Please provide your passport NN5123456 and call me on 0851234567." \
  --json
```

Both scripts emit explicit placeholders like `[PII:PPSN]` in `masked_text`.
## Q8 Comparison

Deployment-relevant comparison on CPU:

| Model | Core F1 | Edge F1 | Finance F1 | Finance-boundary F1 | User PPSN F1 | GA weak PPSN F1 | Multilingual PPSN F1 | Hardening F1 |
|---|---:|---:|---:|---:|---:|---:|---:|---:|
| `rc5` ONNX q8 | 0.9669 | 0.9744 | 0.9362 | 0.8750 | 1.0000 | 1.0000 | 0.9333 | - |
| `rc8` ONNX q8 | 0.9737 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 0.9176 | 0.7059 |
| `IrishCore-DiffMask-135M-v1-rc2` ONNX q8 | 0.9664 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 0.9212 | 0.9744 |

UAT replay exact suite used for the latest hardening pass:

| Model | UAT replay exact F1 | Precision | Recall |
|---|---:|---:|---:|
| `IrishCore-DiffMask-135M-v1-rc1` ONNX q8 | 0.4545 | 1.0000 | 0.2941 |
| `rc8` ONNX q8 | 0.3636 | 0.3750 | 0.3529 |
| `IrishCore-DiffMask-135M-v1-rc2` ONNX q8 | 0.8276 | 1.0000 | 0.7059 |

CPU throughput references:

| Suite | `rc5` q8 | `rc8` q8 | `IrishCore-DiffMask-135M-v1-rc2` q8 |
|---|---:|---:|---:|
| Irish core short-text path | 33.6193 ex/s | 257.3756 ex/s | 247.0809 ex/s |
| Multilingual PPSN short-text path | 35.5561 ex/s | 230.5181 ex/s | 256.1316 ex/s |
| Runtime profile source | 23.8338 ex/s | 179.4708 ex/s | 173.0852 ex/s |

Notes:

- The `rc5` speed references come from its published q8 end-to-end inference stack, which includes its older repair decoder.
- The `rc8` and `IrishCore-DiffMask-135M-v1-rc2` numbers use the same raw-only token-span ONNX path.
- A weight-only q4 ONNX experiment was also tried during development, but it was slower than q8 on this CPU and is not shipped.

## Additional Training Data Used For This RC

Published training sources:

- `temsa/OpenMed-Irish-CorePII-TrainMix-v1`
- `temsa/OpenMed-Irish-PPSN-Eircode-Spec-v1`
- `joelniklaus/mapa`
- `gretelai/synthetic_pii_finance_multilingual`

Additional local synthetic hardening and replay sets used during checkpoint selection:

- `irish_core_diffmask_v5_mix`: cleaned blend after removing unlabeled PPSN+phone and hidden Eircode/phone contamination
- `dllm_uat_replay_v1`: replay of real UAT-style citizen-support blocks
- `dllm_gap_patch_v4`: targeted synthetic patch set for bare PPSNs, spaced phone numbers, Eircodes, and mixed messages
- `irish_core_diffmask_focus_v1` and `dllm_uat_patch_v2`: explored during later continuation runs but not selected for the published checkpoint

## Limits

- This is still a compact model. The hardest remaining errors are multilingual PPSN near-miss cases rather than Irish core numeric formats.
- The release path is intentionally scanner-free. If you need deterministic validation of individual identifier types, add that in your application layer.
- If you rely on release behavior, use the bundled inference scripts or import `decode_token_presence_segments` from `common.py`.
- Known remaining misses from the current UAT replay suite include a second phone number inside a long support sentence (`071 967 2616`), `R93 EC57` inside a longer centre block, `EPStamp4@enterprise.gov.ie`, and one `D02 XY45` address form.
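
As an example of application-layer validation, here is a sketch of the commonly documented modulus-23 PPSN check character; verify the exact rules against official Department of Social Protection guidance before relying on it:

```python
def ppsn_check_char(digits, second_letter=None):
    """Compute the PPSN check character (modulus-23 scheme).

    digits: the seven leading digits as a string. The optional second
    letter (post-2013 format) contributes 9 x its alphabetic value;
    a trailing 'W' in the older format is ignored. Remainder 0 maps
    to 'W', 1 to 'A', ..., 22 to 'V'.
    """
    total = sum(int(d) * w for d, w in zip(digits, range(8, 1, -1)))
    if second_letter and second_letter != "W":
        total += (ord(second_letter) - ord("A") + 1) * 9
    r = total % 23
    return "W" if r == 0 else chr(ord("A") + r - 1)

# The card's example "1234567TW" checks out: 'T' is the expected letter.
assert ppsn_check_char("1234567", "W") == "T"
```

A validator like this can sit downstream of the model: the extractor proposes spans, and the application accepts or flags them deterministically.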

## License And Attribution

- Release license: Apache-2.0
- Base model: `OpenMed/OpenMed-PII-mLiteClinical-Base-135M-v1`
- The derivative release remains subject to the attribution terms of the upstream datasets listed above.
- See `NOTICE`, `training_sources.json`, and `eval/benchmark_summary.json` for provenance and benchmark details.

<!-- portfolio-comparison:start -->
## Portfolio Comparison

Updated: `2026-03-16`.

Use this section for the fastest public comparison across the `temsa` PII masking portfolio.

- The first core table only includes public checkpoints that ship both comparable q8 accuracy and q8 CPU throughput.
- The first PPSN table only includes public artifacts that ship comparable PPSN accuracy and CPU throughput.
- Missing cells in the archive tables mean the older release did not ship that metric in its public bundle.
- DiffMask rows use the reconciled `clean_single_pass` harness that matches the deployed runtime.
- GlobalPointer rows use the public raw-only span-matrix release bundle and its packaged q8 ONNX artifact.
- The same content is shipped as `PORTFOLIO_COMPARISON.md` inside each public model repo.
### Irish Core PII: Comparable Public Checkpoints

| Repo | Stack | Full Core F1 | Q8 Core F1 | Q8 Multilingual PPSN F1 | Q8 Core ex/s |
|---|---|---:|---:|---:|---:|
| [`temsa/IrishCore-GlobalPointer-ContextPII-4L-122M-v1-rc4`](https://huggingface.co/temsa/IrishCore-GlobalPointer-ContextPII-4L-122M-v1-rc4) | 4-layer GlobalPointer distilled fast student | 1.0000 | 1.0000 | 0.9333 | 299.0 |
| [`temsa/IrishCore-GlobalPointer-ContextPII-4L-122M-v1-rc3`](https://huggingface.co/temsa/IrishCore-GlobalPointer-ContextPII-4L-122M-v1-rc3) | 4-layer GlobalPointer distilled fast student | 1.0000 | 1.0000 | 0.9333 | 317.9 |
| [`temsa/IrishCore-GlobalPointer-ContextPII-4L-122M-v1-rc2`](https://huggingface.co/temsa/IrishCore-GlobalPointer-ContextPII-4L-122M-v1-rc2) | 4-layer GlobalPointer distilled fast student | 1.0000 | 1.0000 | 0.9333 | 292.5 |
| [`temsa/IrishCore-GlobalPointer-ContextPII-4L-122M-v1-rc1`](https://huggingface.co/temsa/IrishCore-GlobalPointer-ContextPII-4L-122M-v1-rc1) | 4-layer GlobalPointer distilled fast student | 1.0000 | 1.0000 | 0.9333 | 337.3 |
| [`temsa/IrishCore-GlobalPointer-ContextPII-135M-v1-rc27`](https://huggingface.co/temsa/IrishCore-GlobalPointer-ContextPII-135M-v1-rc27) | GlobalPointer raw-only + context labels | 1.0000 | 1.0000 | 0.9333 | 270.0 |
| [`temsa/IrishCore-GlobalPointer-ContextPII-135M-v1-rc25`](https://huggingface.co/temsa/IrishCore-GlobalPointer-ContextPII-135M-v1-rc25) | GlobalPointer raw-only + context labels | 1.0000 | 1.0000 | 0.9333 | 212.1 |
| [`temsa/IrishCore-GlobalPointer-ContextPII-135M-v1-rc24`](https://huggingface.co/temsa/IrishCore-GlobalPointer-ContextPII-135M-v1-rc24) | GlobalPointer raw-only + context labels | 1.0000 | 1.0000 | 0.9333 | 278.9 |
| [`temsa/IrishCore-GlobalPointer-ContextPII-135M-v1-rc23`](https://huggingface.co/temsa/IrishCore-GlobalPointer-ContextPII-135M-v1-rc23) | GlobalPointer raw-only + context labels | 1.0000 | 1.0000 | 0.9333 | 237.6 |
| [`temsa/IrishCore-GlobalPointer-ContextPII-135M-v1-rc22`](https://huggingface.co/temsa/IrishCore-GlobalPointer-ContextPII-135M-v1-rc22) | GlobalPointer raw-only + context labels | 1.0000 | 1.0000 | 0.9333 | 106.8 |
| [`temsa/IrishCore-GlobalPointer-ContextPII-135M-v1-rc21`](https://huggingface.co/temsa/IrishCore-GlobalPointer-ContextPII-135M-v1-rc21) | GlobalPointer raw-only + context labels | 1.0000 | 1.0000 | 0.9333 | 150.8 |
| [`temsa/IrishCore-GlobalPointer-ContextPII-135M-v1-rc20`](https://huggingface.co/temsa/IrishCore-GlobalPointer-ContextPII-135M-v1-rc20) | GlobalPointer raw-only + context labels | 1.0000 | 1.0000 | 0.9333 | 181.9 |
| [`temsa/IrishCore-GlobalPointer-ContextPII-135M-v1-rc19`](https://huggingface.co/temsa/IrishCore-GlobalPointer-ContextPII-135M-v1-rc19) | GlobalPointer raw-only + context labels | 1.0000 | 1.0000 | 0.9333 | 73.1 |
| [`temsa/IrishCore-GlobalPointer-ContextPII-135M-v1-rc18`](https://huggingface.co/temsa/IrishCore-GlobalPointer-ContextPII-135M-v1-rc18) | GlobalPointer raw-only + context labels | 1.0000 | 1.0000 | 0.9333 | 126.2 |
| [`temsa/IrishCore-GlobalPointer-ContextPII-135M-v1-rc17`](https://huggingface.co/temsa/IrishCore-GlobalPointer-ContextPII-135M-v1-rc17) | GlobalPointer raw-only + context labels | 1.0000 | 1.0000 | 0.9333 | 125.5 |
| [`temsa/IrishCore-GlobalPointer-ContextPII-135M-v1-rc16`](https://huggingface.co/temsa/IrishCore-GlobalPointer-ContextPII-135M-v1-rc16) | GlobalPointer raw-only + context labels | 1.0000 | 1.0000 | 0.9333 | 125.5 |
| [`temsa/IrishCore-GlobalPointer-ContextPII-135M-v1-rc15`](https://huggingface.co/temsa/IrishCore-GlobalPointer-ContextPII-135M-v1-rc15) | GlobalPointer raw-only + context labels | 1.0000 | 1.0000 | 0.9333 | 125.5 |
| [`temsa/IrishCore-GlobalPointer-ContextPII-135M-v1-rc14`](https://huggingface.co/temsa/IrishCore-GlobalPointer-ContextPII-135M-v1-rc14) | GlobalPointer raw-only + context labels | 1.0000 | 1.0000 | 0.9333 | 119.2 |
| [`temsa/IrishCore-GlobalPointer-ContextPII-135M-v1-rc13`](https://huggingface.co/temsa/IrishCore-GlobalPointer-ContextPII-135M-v1-rc13) | GlobalPointer raw-only + context labels | 1.0000 | 1.0000 | 0.9333 | 126.1 |
| [`temsa/IrishCore-GlobalPointer-ContextPII-135M-v1-rc12`](https://huggingface.co/temsa/IrishCore-GlobalPointer-ContextPII-135M-v1-rc12) | GlobalPointer raw-only + context labels | 1.0000 | 1.0000 | 0.9333 | 73.6 |
| [`temsa/IrishCore-GlobalPointer-ContextPII-135M-v1-rc11`](https://huggingface.co/temsa/IrishCore-GlobalPointer-ContextPII-135M-v1-rc11) | GlobalPointer raw-only + context labels | 1.0000 | 1.0000 | 0.9333 | 94.1 |
| [`temsa/IrishCore-GlobalPointer-ContextPII-135M-v1-rc10`](https://huggingface.co/temsa/IrishCore-GlobalPointer-ContextPII-135M-v1-rc10) | GlobalPointer raw-only + context labels | 1.0000 | 1.0000 | 0.9333 | 125.8 |
| [`temsa/IrishCore-GlobalPointer-ContextPII-135M-v1-rc9`](https://huggingface.co/temsa/IrishCore-GlobalPointer-ContextPII-135M-v1-rc9) | GlobalPointer raw-only + context labels | 1.0000 | 1.0000 | 0.9333 | 119.8 |
| [`temsa/IrishCore-GlobalPointer-ContextPII-135M-v1-rc8`](https://huggingface.co/temsa/IrishCore-GlobalPointer-ContextPII-135M-v1-rc8) | GlobalPointer raw-only + context labels | 1.0000 | 1.0000 | 0.9333 | 128.9 |
| [`temsa/IrishCore-GlobalPointer-ContextPII-135M-v1-rc7`](https://huggingface.co/temsa/IrishCore-GlobalPointer-ContextPII-135M-v1-rc7) | GlobalPointer raw-only + context labels | 1.0000 | 1.0000 | 0.9333 | 89.0 |
| [`temsa/IrishCore-GlobalPointer-ContextPII-135M-v1-rc6`](https://huggingface.co/temsa/IrishCore-GlobalPointer-ContextPII-135M-v1-rc6) | GlobalPointer raw-only + context labels | 1.0000 | 1.0000 | 0.9333 | 89.0 |
| [`temsa/IrishCore-GlobalPointer-ContextPII-135M-v1-rc5`](https://huggingface.co/temsa/IrishCore-GlobalPointer-ContextPII-135M-v1-rc5) | GlobalPointer raw-only + context labels | 1.0000 | 1.0000 | 0.9333 | 84.5 |
| [`temsa/IrishCore-GlobalPointer-ContextPII-135M-v1-rc4`](https://huggingface.co/temsa/IrishCore-GlobalPointer-ContextPII-135M-v1-rc4) | GlobalPointer raw-only + context labels | 0.9935 | 0.9935 | 0.9333 | 61.5 |
| [`temsa/IrishCore-GlobalPointer-ContextPII-135M-v1-rc3`](https://huggingface.co/temsa/IrishCore-GlobalPointer-ContextPII-135M-v1-rc3) | GlobalPointer raw-only + context labels | 0.9935 | 0.9935 | 0.9333 | 61.5 |
| [`temsa/IrishCore-GlobalPointer-ContextPII-135M-v1-rc2`](https://huggingface.co/temsa/IrishCore-GlobalPointer-ContextPII-135M-v1-rc2) | GlobalPointer raw-only + context labels | 0.9935 | 0.9935 | 0.9222 | 61.5 |
| [`temsa/IrishCore-GlobalPointer-ContextPII-135M-v1-rc1`](https://huggingface.co/temsa/IrishCore-GlobalPointer-ContextPII-135M-v1-rc1) | GlobalPointer raw-only + context labels | 0.9935 | 0.9935 | 0.9222 | 61.5 |
| [`temsa/IrishCore-GlobalPointer-135M-v1-rc4`](https://huggingface.co/temsa/IrishCore-GlobalPointer-135M-v1-rc4) | GlobalPointer raw-only span-matrix | 1.0000 | 1.0000 | 0.9333 | 221.6 |
| [`temsa/IrishCore-GlobalPointer-135M-v1-rc3`](https://huggingface.co/temsa/IrishCore-GlobalPointer-135M-v1-rc3) | GlobalPointer raw-only span-matrix | 1.0000 | 1.0000 | 0.9213 | 204.9 |
| [`temsa/IrishCore-GlobalPointer-135M-v1-rc2`](https://huggingface.co/temsa/IrishCore-GlobalPointer-135M-v1-rc2) | GlobalPointer raw-only span-matrix | 0.9934 | 0.9934 | 0.9326 | 231.2 |
| [`temsa/OpenMed-mLiteClinical-IrishCorePII-135M-v2-rc8`](https://huggingface.co/temsa/OpenMed-mLiteClinical-IrishCorePII-135M-v2-rc8) | Raw-only token-span | 0.9737 | 0.9737 | 0.9176 | 46.1 |
| [`temsa/OpenMed-mLiteClinical-IrishCorePII-135M-v2-rc7`](https://huggingface.co/temsa/OpenMed-mLiteClinical-IrishCorePII-135M-v2-rc7) | Hybrid classifier + generated scanner spec | 1.0000 | 0.9934 | 1.0000 | 30.0 |
| [`temsa/OpenMed-mLiteClinical-IrishCorePII-135M-v2-rc6`](https://huggingface.co/temsa/OpenMed-mLiteClinical-IrishCorePII-135M-v2-rc6) | Hybrid classifier + repair decoders | 1.0000 | 0.9934 | 1.0000 | 29.5 |
| [`temsa/OpenMed-mLiteClinical-IrishCorePII-135M-v2-rc5`](https://huggingface.co/temsa/OpenMed-mLiteClinical-IrishCorePII-135M-v2-rc5) | Hybrid classifier + repair decoders | 0.9737 | 0.9669 | 0.9333 | 34.4 |
| [`temsa/OpenMed-mLiteClinical-IrishCorePII-135M-v2-rc4`](https://huggingface.co/temsa/OpenMed-mLiteClinical-IrishCorePII-135M-v2-rc4) | Hybrid classifier + repair decoders | 0.9870 | 0.9740 | 0.9600 | 114.2 |
| [`temsa/OpenMed-mLiteClinical-IrishCorePII-135M-v2-rc3`](https://huggingface.co/temsa/OpenMed-mLiteClinical-IrishCorePII-135M-v2-rc3) | Hybrid classifier + repair decoders | 0.9806 | 0.9677 | 0.9333 | 44.9 |
| [`temsa/OpenMed-mLiteClinical-IrishCorePII-135M-v2-rc2`](https://huggingface.co/temsa/OpenMed-mLiteClinical-IrishCorePII-135M-v2-rc2) | Hybrid classifier + repair decoders | 0.9554 | 0.9615 | 0.7887 | 119.1 |
| [`temsa/OpenMed-mLiteClinical-IrishCorePII-135M-v1`](https://huggingface.co/temsa/OpenMed-mLiteClinical-IrishCorePII-135M-v1) | Hybrid classifier baseline | 0.9530 | 0.9333 | 0.9882 | 103.3 |
| [`temsa/IrishCore-DiffMask-135M-v1-rc6`](https://huggingface.co/temsa/IrishCore-DiffMask-135M-v1-rc6) | DiffMask token-span, scanner-free | 0.9801 | 0.9733 | 0.9274 | 130.3 |
| [`temsa/IrishCore-DiffMask-135M-v1-rc5`](https://huggingface.co/temsa/IrishCore-DiffMask-135M-v1-rc5) | DiffMask token-span, scanner-free | 0.9733 | 0.9733 | 0.9379 | 249.2 |
| [`temsa/IrishCore-DiffMask-135M-v1-rc4`](https://huggingface.co/temsa/IrishCore-DiffMask-135M-v1-rc4) | DiffMask token-span, scanner-free | 0.9733 | 0.9733 | 0.9371 | 29.5 |
| [`temsa/IrishCore-DiffMask-135M-v1-rc3`](https://huggingface.co/temsa/IrishCore-DiffMask-135M-v1-rc3) | DiffMask token-span, scanner-free | 0.9664 | 0.9664 | 0.9591 | 30.0 |
| [`temsa/IrishCore-DiffMask-135M-v1-rc2`](https://huggingface.co/temsa/IrishCore-DiffMask-135M-v1-rc2) | DiffMask token-span, scanner-free | 0.9664 | 0.9664 | 0.9212 | 247.1 |
| [`temsa/IrishCore-DiffMask-135M-v1-rc1`](https://huggingface.co/temsa/IrishCore-DiffMask-135M-v1-rc1) | DiffMask token-span, scanner-free | 0.9801 | 0.9934 | 0.9412 | 251.2 |

### Irish Core PII: Other Public Checkpoints

| Repo | Stack | Full Core F1 | Q8 Core F1 | Q8 Multilingual PPSN F1 | Notes |
|---|---|---:|---:|---:|---|
| [`temsa/OpenMed-mLiteClinical-IrishCorePII-135M-v2-rc1`](https://huggingface.co/temsa/OpenMed-mLiteClinical-IrishCorePII-135M-v2-rc1) | Hybrid classifier prototype | 0.9487 | — | — | Predates the public q8 artifact. |

Finance-boundary q8 F1 is `1.0000` for `OpenMed-mLiteClinical-IrishCorePII-135M-v2-rc6`, `OpenMed-mLiteClinical-IrishCorePII-135M-v2-rc7`, `OpenMed-mLiteClinical-IrishCorePII-135M-v2-rc8`, and all public `IrishCore-DiffMask` releases from `rc1` to `rc6`. `OpenMed-mLiteClinical-IrishCorePII-135M-v2-rc5` ships `0.8750` on that public q8 suite.

### PPSN-Only: Comparable Public Artifacts

| Repo | Artifact | Irish Large F1 | Multilingual PPSN F1 | User Raw F1 | QA v8 F1 | CPU ex/s |
|---|---|---:|---:|---:|---:|---:|
| [`temsa/OpenMed-mLiteClinical-IrishPPSN-135M-v1`](https://huggingface.co/temsa/OpenMed-mLiteClinical-IrishPPSN-135M-v1) | fp32 canonical checkpoint | 0.8979 | 0.9704 | 0.8000 | 0.7385 | 57.4 |
| [`temsa/OpenMed-mLiteClinical-IrishPPSN-135M-v1-fp16`](https://huggingface.co/temsa/OpenMed-mLiteClinical-IrishPPSN-135M-v1-fp16) | fp16 CPU/GPU artifact | — | 0.9704 | 0.8000 | 0.7385 | 45.8 |
| [`temsa/OpenMed-mLiteClinical-IrishPPSN-135M-v1-q8`](https://huggingface.co/temsa/OpenMed-mLiteClinical-IrishPPSN-135M-v1-q8) | dynamic int8 CPU artifact | — | 0.9040 | — | — | 132.1 |

### PPSN-Only: Historical Public Checkpoints

| Repo | Main Published Metrics | Notes |
|---|---|---|
| [`temsa/OpenMed-PPSN-mLiteClinical-v1`](https://huggingface.co/temsa/OpenMed-PPSN-mLiteClinical-v1) | same as canonical fp32 repo: multilingual 0.9704, user raw 0.8000 | Legacy alias; prefer `temsa/OpenMed-mLiteClinical-IrishPPSN-135M-v1`. |
| [`temsa/OpenMed-PPSN-v6-raw-rc2`](https://huggingface.co/temsa/OpenMed-PPSN-v6-raw-rc2) | irish_reg_v5 0.8750; user_raw 0.8000; qa_v8 0.7385 | Raw PPSN-only research checkpoint; no packaged multilingual CPU benchmark row. |
| [`temsa/OpenMed-PPSN-v5_1`](https://huggingface.co/temsa/OpenMed-PPSN-v5_1) | irish_large_v2 raw 0.9285; qa_v6 hybrid strict 1.0000 | Hybrid PPSN-only checkpoint; predates the canonical multilingual suite packaging. |
| [`temsa/OpenMed-PPSN-v5`](https://huggingface.co/temsa/OpenMed-PPSN-v5) | irish_reg_v5 raw 0.8235; irish_reg_v5 hybrid strict 1.0000 | Hybrid PPSN-only checkpoint; predates the canonical multilingual suite packaging. |
| [`temsa/OpenMed-PPSN-v4`](https://huggingface.co/temsa/OpenMed-PPSN-v4) | synthetic non-PPSN drift check only | Predates the current PPSN eval suite; no packaged apples-to-apples multilingual CPU row. |

If you need the strongest current raw-only Irish core model, start with `IrishCore-GlobalPointer-135M-v1-rc4`. If you need the fastest CPU-first raw-only line, compare it against `IrishCore-DiffMask-135M-v1-rc6`. If you need a PPSN-only artifact, compare the canonical `fp32`, `fp16`, and `q8` variants of `OpenMed-mLiteClinical-IrishPPSN-135M-v1` directly in the table above.
<!-- portfolio-comparison:end -->