metadata
language:
- en
tags:
- text-classification
- moderation
- safety
- meridian
pipeline_tag: text-classification
library_name: pytorch
MERIT-XS (Research Preview)
MERIT-XS is an early multilingual moderation encoder developed by Meridian Safety for research into compact, Unicode-aware safety classification systems.
This release packages a binary toxicity research preview built from:
- MERIT-XS encoder pretraining
- a top-2-layer moderation adaptation run
- a binary moderation head
This artifact is not production-ready and should not be used as a standalone safety system.
Included files
- merit_xs_preview.pt: exported moderation artifact with adapted encoder weights and binary head
- infer_merit_xs.py: CLI inference entrypoint
- load_merit_xs.py: simple Python loader for local use
- metrics_summary.json: dev/test metrics and threshold sweep summary
- profile_summary.json: lightweight export-run timing summary
- merit/: local model package
- assets/tokenizers/merit/: tokenizer files
Setup
pip install -r requirements.txt
License
This package uses the included LICENSE.txt:
- MERIT Research Preview License (MRPL v1.0)
- research, evaluation, and benchmarking use are allowed
- commercial deployment and hosted/public API use require separate permission
CLI usage
python infer_merit_xs.py \
  --text "you are awful" \
  --text "thanks for your help"
You can also pass an explicit checkpoint path:
python infer_merit_xs.py \
  --checkpoint merit_xs_preview.pt \
  --text "you are a stupid idiot"
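For batch experiments it can be convenient to drive the CLI from Python. The following is a minimal sketch, not part of the package: `build_command` and `run_cli` are hypothetical helper names, only the `--text` and `--checkpoint` flags shown above are used, and it assumes the script is run from the package directory and writes its predictions to stdout.

```python
import subprocess
import sys
from typing import List, Optional


def build_command(texts: List[str], checkpoint: Optional[str] = None) -> List[str]:
    """Build the argument list for infer_merit_xs.py (hypothetical helper)."""
    cmd = [sys.executable, "infer_merit_xs.py"]
    if checkpoint is not None:
        cmd += ["--checkpoint", checkpoint]
    for text in texts:
        # One --text flag per message, mirroring the CLI examples above.
        cmd += ["--text", text]
    return cmd


def run_cli(texts: List[str], checkpoint: Optional[str] = None) -> str:
    """Run the CLI and return whatever it prints.

    Assumption: predictions are written to stdout; the output format
    is not documented here, so it is returned as raw text.
    """
    completed = subprocess.run(
        build_command(texts, checkpoint),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    return completed.stdout
```

Passing the arguments as a list rather than a single shell string avoids quoting problems when messages contain quotes or other shell metacharacters.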
Python usage
from load_merit_xs import load_merit_xs

model = load_merit_xs()
results = model.predict(
    [
        "you are awful",
        "thanks for your help",
        "you are a stupid idiot",
    ]
)
print(results)
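A common next step is to bucket messages by their decision. The sketch below assumes that `predict` returns one dict-like record per input, in input order, with the `decision` and `score` fields listed under Output schema; the return type is not documented here, so treat that as an assumption. `group_by_decision` is a hypothetical helper.

```python
from collections import defaultdict
from typing import Dict, List, Mapping, Sequence, Tuple


def group_by_decision(
    texts: Sequence[str],
    results: Sequence[Mapping[str, object]],
) -> Dict[str, List[Tuple[str, object]]]:
    """Group (text, score) pairs under their decision label.

    Assumption: results[i] describes texts[i] and exposes
    "decision" and "score" keys.
    """
    grouped: Dict[str, List[Tuple[str, object]]] = defaultdict(list)
    for text, result in zip(texts, results):
        grouped[str(result["decision"])].append((text, result["score"]))
    return dict(grouped)
```

With the example above, `group_by_decision(texts, results)["review"]` would hold the messages routed to review, if any.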
Output schema
Each prediction returns:
- score: sigmoid(logit)
- decision: allow | review | action
- confidence: threshold-distance heuristic only
- decision_band: same band label used for the decision
Important: confidence is a preview-time, decision-margin-style heuristic, not a calibrated probability.
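To make the schema concrete, here is one way such a record could be derived from a raw logit. This is an illustrative sketch only: the two thresholds are hypothetical values, and the confidence formula (distance from the score to the nearest band boundary) is an assumed reading of "threshold-distance heuristic", not the package's actual computation.

```python
import math
from typing import Dict, Union

# Hypothetical band boundaries, chosen only for illustration.
REVIEW_THRESHOLD = 0.5
ACTION_THRESHOLD = 0.8


def sigmoid(logit: float) -> float:
    """Numerically stable logistic function."""
    if logit >= 0:
        return 1.0 / (1.0 + math.exp(-logit))
    z = math.exp(logit)
    return z / (1.0 + z)


def to_prediction(logit: float) -> Dict[str, Union[float, str]]:
    """Map a logit to the four output fields (assumed logic)."""
    score = sigmoid(logit)
    if score >= ACTION_THRESHOLD:
        band = "action"
    elif score >= REVIEW_THRESHOLD:
        band = "review"
    else:
        band = "allow"
    # Assumed heuristic: distance to the nearest band boundary.
    # Small values mean the score sits close to a decision edge.
    confidence = min(
        abs(score - REVIEW_THRESHOLD),
        abs(score - ACTION_THRESHOLD),
    )
    return {
        "score": score,
        "decision": band,
        "confidence": confidence,
        "decision_band": band,
    }
```

Under this reading, a score that lands exactly on a boundary gets a confidence of 0 even though the score itself may be high, which is why the value should not be interpreted as a probability.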
Current limitations
- Binary toxicity preview only
- Not a full moderation taxonomy
- Weak coverage for some safety categories, including self-harm / threat-style language
- Message-level only
- Incomplete multilingual and adversarial evaluation
Research note
This package is intended for:
- research
- benchmarking
- representation-transfer experiments
- moderation evaluation
It is not intended for:
- production moderation
- safety-critical enforcement
- fully automated policy decisions
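For the benchmarking and moderation evaluation uses listed above, a threshold sweep over model scores is a typical starting point. The sketch below is a generic, dependency-free version; `sweep_thresholds` is a hypothetical helper, the binary labels (1 = toxic, 0 = not toxic) are an assumed convention, and it does not reproduce the format of metrics_summary.json, whose schema is not described here.

```python
from typing import Dict, List, Sequence


def sweep_thresholds(
    scores: Sequence[float],
    labels: Sequence[int],
    thresholds: Sequence[float],
) -> List[Dict[str, float]]:
    """Compute precision, recall, and F1 at each threshold.

    A message is flagged when score >= threshold.
    Assumption: labels are 1 for toxic and 0 otherwise.
    """
    rows: List[Dict[str, float]] = []
    for t in thresholds:
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        fn = sum(1 for s, y in zip(scores, labels) if s < t and y == 1)
        precision = tp / (tp + fp) if (tp + fp) else 0.0
        recall = tp / (tp + fn) if (tp + fn) else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        rows.append(
            {"threshold": t, "precision": precision, "recall": recall, "f1": f1}
        )
    return rows
```

Feeding it the `score` values from `model.predict` alongside labels from your own evaluation set gives a per-threshold table to compare against other moderation baselines.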
Existing model card
This package is being prepared for the existing Hugging Face repo: