language:
  - en
tags:
  - text-classification
  - moderation
  - safety
  - meridian
pipeline_tag: text-classification
library_name: pytorch

MERIT-XS (Research Preview)

MERIT-XS is an early multilingual moderation encoder developed by Meridian Safety for research into compact, Unicode-aware safety classification systems.

This release packages a binary toxicity research preview built from:

MERIT-XS encoder pretraining
a top-2-layer moderation adaptation run
a binary moderation head

This artifact is not production-ready and should not be used as a standalone safety system.

Included files

merit_xs_preview.pt
  - exported moderation artifact with adapted encoder weights and binary head
infer_merit_xs.py
  - CLI inference entrypoint
load_merit_xs.py
  - simple Python loader for local use
metrics_summary.json
  - dev/test metrics and threshold sweep summary
profile_summary.json
  - lightweight export-run timing summary
merit/
  - local model package
assets/tokenizers/merit/
  - tokenizer files
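
The layout of the two JSON summaries is not documented in this card, so a cautious first step is to list their top-level keys before relying on any field. The sketch below assumes only that each file is valid JSON; the helper name `top_level_keys` is hypothetical and not part of this package.

```python
import json
from pathlib import Path


def top_level_keys(path):
    """Return the sorted top-level keys of a JSON file, or [] if the root is not an object."""
    with Path(path).open(encoding="utf-8") as handle:
        data = json.load(handle)
    return sorted(data) if isinstance(data, dict) else []


# Example use from the package directory (file names taken from the list above):
# for name in ("metrics_summary.json", "profile_summary.json"):
#     print(name, top_level_keys(name))
```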

Setup

pip install -r requirements.txt

License

This package uses the included LICENSE.txt:

MERIT Research Preview License (MRPL v1.0)
research, evaluation, and benchmarking use are allowed
commercial deployment and hosted/public API use require separate permission

CLI usage

python infer_merit_xs.py \
  --text "you are awful" \
  --text "thanks for your help"

You can also pass an explicit checkpoint path:

python infer_merit_xs.py \
  --checkpoint merit_xs_preview.pt \
  --text "you are a stupid idiot"

Python usage

from load_merit_xs import load_merit_xs

model = load_merit_xs()
results = model.predict(
    [
        "you are awful",
        "thanks for your help",
        "you are a stupid idiot",
    ]
)
print(results)
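
Because this preview is not meant for fully automated decisions, it can help to group predictions by decision band so that review and action items go to a human. The sketch below assumes each prediction is a dict-like object keyed by the field names under "Output schema" (`decision`, `score`); the actual return type of `model.predict` is not documented here, and `group_by_decision` is a hypothetical helper.

```python
# Band names as listed under "Output schema".
DECISIONS = ("allow", "review", "action")


def group_by_decision(texts, results):
    """Group (text, score) pairs by decision band.

    Assumes each result supports result["decision"] and result["score"].
    """
    groups = {name: [] for name in DECISIONS}
    for text, result in zip(texts, results):
        # setdefault keeps any unexpected band label instead of raising KeyError.
        groups.setdefault(result["decision"], []).append((text, result["score"]))
    return groups


# Example use with the loader shown above:
# texts = ["you are awful", "thanks for your help"]
# groups = group_by_decision(texts, model.predict(texts))
# for text, score in groups["review"] + groups["action"]:
#     print("needs human review:", text, score)
```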

Output schema

Each prediction returns:

score
  - sigmoid(logit)
decision
  - allow | review | action
confidence
  - threshold-distance heuristic only
decision_band
  - same band label used for the decision

Important: confidence is a preview-time decision-margin heuristic, not a calibrated probability.
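
To make the schema concrete, here is a minimal sketch of how a logit could map to these four fields. It is an illustration under stated assumptions, not the packaged implementation: the two thresholds are hypothetical placeholders (the card does not state the real values), and the margin formula is only one plausible reading of "threshold-distance heuristic".

```python
import math

# Hypothetical thresholds for illustration only; the real values are not
# given in this card.
REVIEW_THRESHOLD = 0.50
ACTION_THRESHOLD = 0.80


def sigmoid(logit):
    """Numerically stable logistic function."""
    if logit >= 0:
        return 1.0 / (1.0 + math.exp(-logit))
    z = math.exp(logit)
    return z / (1.0 + z)


def decide(logit):
    """Map a raw logit to the four output fields under the assumptions above."""
    score = sigmoid(logit)
    if score >= ACTION_THRESHOLD:
        band = "action"
    elif score >= REVIEW_THRESHOLD:
        band = "review"
    else:
        band = "allow"
    # Assumed heuristic: distance from the score to the nearest threshold.
    # A small margin means the score sits close to a band boundary.
    margin = min(abs(score - REVIEW_THRESHOLD), abs(score - ACTION_THRESHOLD))
    return {
        "score": score,
        "decision": band,
        "confidence": margin,
        "decision_band": band,
    }
```

Under this reading, a score that lands exactly on a threshold gets a margin of zero, which shows why the value should not be read as a probability that the decision is correct.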

Current limitations

Binary toxicity preview only
Not a full moderation taxonomy
Weak coverage for some safety categories, including self-harm / threat-style language
Message-level only
Incomplete multilingual and adversarial evaluation

Research note

This package is intended for:

research
benchmarking
representation-transfer experiments
moderation evaluation

It is not intended for:

production moderation
safety-critical enforcement
fully automated policy decisions

Existing model card

This package is being prepared for the existing Hugging Face repo:

MeridianSafety/MERIT-XS-Preview