---
language:
  - en
tags:
  - text-classification
  - moderation
  - safety
  - meridian
pipeline_tag: text-classification
library_name: pytorch
---

MERIT-XS (Research Preview)

MERIT-XS is an early multilingual moderation encoder developed by Meridian Safety for research into compact, Unicode-aware safety classification systems.

This release packages a binary toxicity research preview built from:

  • the pretrained MERIT-XS encoder
  • a moderation adaptation run over the top two encoder layers
  • a binary moderation head

This artifact is not production-ready and should not be used as a standalone safety system.

Included files

  • merit_xs_preview.pt
    • exported moderation artifact with adapted encoder weights and binary head
  • infer_merit_xs.py
    • CLI inference entrypoint
  • load_merit_xs.py
    • simple Python loader for local use
  • metrics_summary.json
    • dev/test metrics and threshold sweep summary
  • profile_summary.json
    • lightweight export-run timing summary
  • merit/
    • local model package
  • assets/tokenizers/merit/
    • tokenizer files
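metrics_summary.json is described above as holding a threshold sweep summary, but its exact schema is not documented here. The sketch below shows roughly what a threshold sweep computes for a binary toxicity scorer: precision, recall, and F1 at each candidate cut-off. The function name and row layout are hypothetical and do not describe the file's actual format.

```python
from typing import Dict, List, Sequence


def threshold_sweep(
    scores: Sequence[float],
    labels: Sequence[int],
    thresholds: Sequence[float],
) -> List[Dict[str, float]]:
    """Hypothetical helper: precision/recall/F1 at each threshold.

    A text counts as predicted toxic when score >= threshold.
    labels use 1 for toxic and 0 for non-toxic.
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    rows = []
    for t in thresholds:
        tp = fp = fn = 0
        for s, y in zip(scores, labels):
            predicted_toxic = s >= t
            if predicted_toxic and y == 1:
                tp += 1
            elif predicted_toxic and y == 0:
                fp += 1
            elif not predicted_toxic and y == 1:
                fn += 1
        # Convention: report 0.0 when a ratio's denominator is empty.
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        rows.append({"threshold": t, "precision": precision, "recall": recall, "f1": f1})
    return rows
```

Choosing the row with the highest F1, for example `max(rows, key=lambda r: r["f1"])`, is one common way to pick an operating point. Moderation setups often target a minimum recall instead.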

Setup

pip install -r requirements.txt

License

This package is covered by the license in the included LICENSE.txt:

  • MERIT Research Preview License (MRPL v1.0)
  • research, evaluation, and benchmarking uses are allowed
  • commercial deployment and hosted/public API use require separate permission

CLI usage

python infer_merit_xs.py \
  --text "you are awful" \
  --text "thanks for your help"

You can also pass an explicit checkpoint path:

python infer_merit_xs.py \
  --checkpoint merit_xs_preview.pt \
  --text "you are a stupid idiot"
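The examples above show that --text can be repeated and --checkpoint is optional. The following is a rough sketch of an argparse parser exposing the same two flags. It is not the actual contents of infer_merit_xs.py, and the default checkpoint name is an assumption based on the file list.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical parser mirroring the documented flags only.
    parser = argparse.ArgumentParser(description="MERIT-XS preview inference (sketch)")
    parser.add_argument(
        "--text",
        action="append",  # each --text occurrence adds one input to a list
        required=True,
        help="Text to classify; repeat the flag for several inputs.",
    )
    parser.add_argument(
        "--checkpoint",
        default="merit_xs_preview.pt",  # assumed default, not confirmed
        help="Path to the exported moderation checkpoint.",
    )
    return parser
```

With action="append", `--text "you are awful" --text "thanks for your help"` arrives as a two-element list. This lets one invocation score a small batch.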

Python usage

from load_merit_xs import load_merit_xs

model = load_merit_xs()
results = model.predict(
    [
        "you are awful",
        "thanks for your help",
        "you are a stupid idiot",
    ]
)
print(results)
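This package is not intended for fully automated policy decisions, so one way to use the output is to send anything not marked "allow" to human review. The sketch below makes several assumptions that this README does not confirm: predict() returns one mapping per input, in input order, and each mapping has a "decision" key. The helper name is hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Mapping, Sequence


def group_by_decision(
    texts: Sequence[str],
    predictions: Iterable[Mapping[str, object]],
) -> Dict[str, List[str]]:
    """Hypothetical helper: pair inputs with predictions and bucket by decision.

    Assumes one prediction per text, in the same order as the inputs.
    """
    preds = list(predictions)
    if len(preds) != len(texts):
        raise ValueError("expected one prediction per input text")
    buckets: Dict[str, List[str]] = defaultdict(list)
    for text, pred in zip(texts, preds):
        # Expected values: "allow", "review", or "action".
        buckets[str(pred["decision"])].append(text)
    return dict(buckets)
```

With the loader above, this would be called as `group_by_decision(texts, model.predict(texts))`. The "review" and "action" buckets would then go to a human queue.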

Output schema

Each prediction contains the following fields:

  • score
    • sigmoid(logit)
  • decision
    • allow | review | action
  • confidence
    • threshold-distance heuristic only
  • decision_band
    • same band label used for the decision

Important: confidence here is a preview-time decision-margin heuristic, not a calibrated probability.
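The schema above implies a path from a raw logit to a score, then to a band, then to a margin. The sketch below assumes two cut-offs, review_threshold and action_threshold. This README does not state the preview's actual thresholds, and it does not say whether the shipped confidence is normalized, so both parameters and the margin formula here are illustrative.

```python
import math
from typing import Dict, Union


def sigmoid(logit: float) -> float:
    """Numerically stable logistic function."""
    if logit >= 0:
        return 1.0 / (1.0 + math.exp(-logit))
    z = math.exp(logit)
    return z / (1.0 + z)


def band_prediction(
    logit: float,
    review_threshold: float,
    action_threshold: float,
) -> Dict[str, Union[float, str]]:
    """Hypothetical mapping from a logit to the documented output fields."""
    if not 0.0 <= review_threshold < action_threshold <= 1.0:
        raise ValueError("need 0 <= review_threshold < action_threshold <= 1")
    score = sigmoid(logit)
    if score >= action_threshold:
        decision = "action"
    elif score >= review_threshold:
        decision = "review"
    else:
        decision = "allow"
    # Threshold-distance heuristic: distance from the score to the nearest
    # band boundary. This is a margin, not a calibrated probability.
    confidence = min(abs(score - review_threshold), abs(score - action_threshold))
    return {
        "score": score,
        "decision": decision,
        "confidence": confidence,
        "decision_band": decision,  # same label as the decision
    }
```

For example, `band_prediction(0.0, review_threshold=0.4, action_threshold=0.7)` gives a score of 0.5 in the "review" band, with a margin of about 0.1 to the nearer boundary. A small margin means a small change in the logit could flip the band. It does not mean the model is "10% sure".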

Current limitations

  • Binary toxicity preview only
  • Not a full moderation taxonomy
  • Weak coverage for some safety categories, including self-harm and threat-style language
  • Message-level only
  • Incomplete multilingual and adversarial evaluation

Research note

This package is intended for:

  • research
  • benchmarking
  • representation-transfer experiments
  • moderation evaluation

It is not intended for:

  • production moderation
  • safety-critical enforcement
  • fully automated policy decisions

Existing model card

This package is being prepared for the existing Hugging Face repo:

MeridianSafety/MERIT-XS-Preview