	---
	language:
	- en
	tags:
	- text-classification
	- moderation
	- safety
	- meridian
	pipeline_tag: text-classification
	library_name: pytorch
	---

	# MERIT-XS (Research Preview)

	MERIT-XS is an early multilingual moderation encoder developed by Meridian Safety for research into compact, Unicode-aware safety classification systems.

	This release packages a binary toxicity research preview built from:

	- `MERIT-XS` encoder pretraining
	- a top-2-layer moderation adaptation run
	- a binary moderation head

	This artifact is not production-ready and should not be used as a standalone safety system.

	## Included files

	- `merit_xs_preview.pt`: exported moderation artifact with adapted encoder weights and binary head
	- `infer_merit_xs.py`: CLI inference entrypoint
	- `load_merit_xs.py`: simple Python loader for local use
	- `metrics_summary.json`: dev/test metrics and threshold sweep summary
	- `profile_summary.json`: lightweight export-run timing summary
	- `merit/`: local model package
	- `assets/tokenizers/merit/`: tokenizer files
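
The schema of `metrics_summary.json` is not documented in this card, so the sketch below does not read that file. It only illustrates, under stated assumptions, what a threshold sweep for a binary classifier typically computes: precision, recall, and F1 at each candidate threshold. The function name `sweep_thresholds`, the toy scores and labels, and the threshold grid are all hypothetical and are not taken from the package.

```python
def sweep_thresholds(scores, labels, thresholds):
    """Compute precision, recall, and F1 at each threshold (hypothetical helper)."""
    rows = []
    for t in thresholds:
        # A message is flagged when its score reaches the threshold.
        preds = [s >= t for s in scores]
        tp = sum(p and y == 1 for p, y in zip(preds, labels))
        fp = sum(p and y == 0 for p, y in zip(preds, labels))
        fn = sum((not p) and y == 1 for p, y in zip(preds, labels))
        precision = tp / (tp + fp) if (tp + fp) else 0.0
        recall = tp / (tp + fn) if (tp + fn) else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        rows.append(
            {"threshold": t, "precision": precision, "recall": recall, "f1": f1}
        )
    return rows


# Toy data for illustration only; these are not MERIT-XS outputs.
scores = [0.95, 0.80, 0.60, 0.40, 0.20, 0.05]
labels = [1, 1, 0, 1, 0, 0]

sweep = sweep_thresholds(scores, labels, [0.3, 0.5, 0.7])
for row in sweep:
    print(row)
```

Raising the threshold in this toy example trades recall for precision, which is the trade-off a sweep summary is usually meant to expose.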

	## Setup

	```bash
	pip install -r requirements.txt
	```

	## License

	This package is distributed under the included [LICENSE.txt](LICENSE.txt):

	- `MERIT Research Preview License (MRPL v1.0)`
	- research, evaluation, and benchmarking use are allowed
	- commercial deployment and hosted/public API use require separate permission

	## CLI usage

	```bash
	python infer_merit_xs.py \
	  --text "you are awful" \
	  --text "thanks for your help"
	```

	You can also pass an explicit checkpoint path:

	```bash
	python infer_merit_xs.py \
	  --checkpoint merit_xs_preview.pt \
	  --text "you are a stupid idiot"
	```
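
The commands above pass `--text` more than once. The following is a minimal sketch of how a repeatable flag like this can be parsed with the standard library's `argparse`; it is not the actual source of `infer_merit_xs.py`. The `build_parser` helper and the default checkpoint value are assumptions made for illustration.

```python
import argparse


def build_parser():
    """Build a parser that mirrors the flags shown above (illustrative only)."""
    parser = argparse.ArgumentParser(description="Sketch of a moderation CLI")
    # action="append" collects every --text occurrence into one list.
    parser.add_argument(
        "--text", action="append", required=True, help="Message to classify"
    )
    # Assumption: the bundled checkpoint is used when no path is given.
    parser.add_argument(
        "--checkpoint", default="merit_xs_preview.pt", help="Path to checkpoint"
    )
    return parser


# Parse an explicit argument list so the sketch runs without a real command line.
args = build_parser().parse_args(
    ["--text", "you are awful", "--text", "thanks for your help"]
)
print(args.text)
print(args.checkpoint)
```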

	## Python usage

	```python
	from load_merit_xs import load_merit_xs

	model = load_merit_xs()
	results = model.predict(
	    [
	        "you are awful",
	        "thanks for your help",
	        "you are a stupid idiot",
	    ]
	)
	print(results)
	```
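
The return type of `predict` is not specified in this card. Assuming it returns one dictionary per input, in input order, with the keys listed under "Output schema", results could be grouped by decision as in the sketch below. The `group_by_decision` helper and the placeholder values are hypothetical; the numbers are invented stand-ins, not real model outputs.

```python
def group_by_decision(texts, results):
    """Group (text, score) pairs by decision label (hypothetical helper)."""
    grouped = {"allow": [], "review": [], "action": []}
    for text, result in zip(texts, results):
        grouped[result["decision"]].append((text, result["score"]))
    return grouped


texts = ["you are awful", "thanks for your help", "you are a stupid idiot"]

# Stand-in for model.predict(texts). All values are made up for illustration.
placeholder_results = [
    {"score": 0.60, "decision": "review", "confidence": 0.10, "decision_band": "review"},
    {"score": 0.03, "decision": "allow", "confidence": 0.47, "decision_band": "allow"},
    {"score": 0.90, "decision": "action", "confidence": 0.10, "decision_band": "action"},
]

grouped = group_by_decision(texts, placeholder_results)
for decision, items in grouped.items():
    print(decision, items)
```

Because this preview is not intended for fully automated policy decisions, a grouping like this is better suited to analysis or to feeding a human review queue than to direct enforcement.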

	## Output schema

	Each prediction returns:

	- `score`: `sigmoid(logit)`
	- `decision`: one of `allow`, `review`, or `action`
	- `confidence`: threshold-distance heuristic only
	- `decision_band`: same band label used for the decision

	Important: `confidence` is a preview-time heuristic based on the decision margin (distance from the threshold). It is not a calibrated probability.
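
To make the schema concrete, here is a minimal sketch of how a logit could be mapped to these four fields. The two threshold values, the `decide` function, and the exact margin formula (distance to the nearest threshold) are assumptions chosen for illustration; the package's actual thresholds and heuristic are not documented in this card and may differ.

```python
import math


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def decide(logit, review_threshold=0.5, action_threshold=0.8):
    """Map a logit to the output fields (hypothetical thresholds and margin)."""
    score = sigmoid(logit)
    if score >= action_threshold:
        band = "action"
    elif score >= review_threshold:
        band = "review"
    else:
        band = "allow"
    # Assumed heuristic: distance from the score to the nearest threshold.
    # This is a margin, not a calibrated probability.
    margin = min(abs(score - review_threshold), abs(score - action_threshold))
    return {
        "score": score,
        "decision": band,
        "confidence": margin,
        "decision_band": band,
    }


for logit in (-3.0, 0.0, 3.0):
    print(logit, decide(logit))
```

A score sitting exactly on a threshold yields a margin of zero under this formula, which illustrates why the value should be read as "how close to a band boundary" rather than "how likely to be correct".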

	## Current limitations

	- Binary toxicity preview only
	- Not a full moderation taxonomy
	- Weak coverage for some safety categories, including self-harm / threat-style language
	- Message-level only
	- Incomplete multilingual and adversarial evaluation

	## Research note

	This package is intended for:

	- research
	- benchmarking
	- representation-transfer experiments
	- moderation evaluation

	It is not intended for:

	- production moderation
	- safety-critical enforcement
	- fully automated policy decisions

	## Existing model card

	This package is being prepared for the existing Hugging Face repo:

	[MeridianSafety/MERIT-XS-Preview](https://huggingface.co/MeridianSafety/MERIT-XS-Preview)