---
language:
- en
tags:
- text-classification
- moderation
- safety
- meridian
pipeline_tag: text-classification
library_name: pytorch
---

# MERIT-XS (Research Preview)

MERIT-XS is an early multilingual moderation encoder developed by Meridian Safety for research into compact, Unicode-aware safety classification systems.

This release packages a **binary toxicity research preview** built from:

- `MERIT-XS` encoder pretraining
- a moderation adaptation run over the top two encoder layers
- a binary moderation head

This artifact is **not production-ready** and should not be used as a standalone safety system.

## Included files

- `merit_xs_preview.pt`
  - exported moderation artifact with adapted encoder weights and binary head
- `infer_merit_xs.py`
  - CLI inference entrypoint
- `load_merit_xs.py`
  - simple Python loader for local use
- `metrics_summary.json`
  - dev/test metrics and threshold sweep summary
- `profile_summary.json`
  - lightweight export-run timing summary
- `merit/`
  - local model package
- `assets/tokenizers/merit/`
  - tokenizer files

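`metrics_summary.json` includes a threshold sweep. That file's format is not documented here, so the following is only a generic sketch of what such a sweep computes: precision, recall, and F1 of the binary decision `score >= threshold` on labeled dev data, for each candidate threshold. The `threshold_sweep` function and its inputs are hypothetical and are not part of this package.

```python
from typing import Dict, List, Sequence


def threshold_sweep(
    scores: Sequence[float],
    labels: Sequence[int],
    thresholds: Sequence[float],
) -> List[Dict[str, float]]:
    """Hypothetical helper: precision/recall/F1 of `score >= t` for each t.

    `scores` are sigmoid outputs in [0, 1]; `labels` are 1 (toxic) or 0.
    This is not the script that produced metrics_summary.json.
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    rows = []
    for t in thresholds:
        tp = fp = fn = 0
        for score, label in zip(scores, labels):
            predicted = score >= t
            if predicted and label == 1:
                tp += 1
            elif predicted and label == 0:
                fp += 1
            elif not predicted and label == 1:
                fn += 1
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        rows.append({"threshold": t, "precision": precision, "recall": recall, "f1": f1})
    return rows
```

Lowering the threshold generally trades precision for recall. A sweep makes that trade-off explicit when choosing decision cut-offs.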
## Setup

```bash
pip install -r requirements.txt
```

## License

This package is released under the included [LICENSE.txt](LICENSE.txt):

- `MERIT Research Preview License (MRPL v1.0)`
- research, evaluation, and benchmarking use is allowed
- commercial deployment and hosted/public API use require separate permission

## CLI usage

```bash
python infer_merit_xs.py \
  --text "you are awful" \
  --text "thanks for your help"
```

You can also pass an explicit checkpoint path:

```bash
python infer_merit_xs.py \
  --checkpoint merit_xs_preview.pt \
  --text "you are a stupid idiot"
```

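To score messages from another Python program through the CLI, you can pass an argument list instead of a shell string so that arbitrary user text never goes through shell quoting. This is a minimal sketch. It assumes `infer_merit_xs.py` is in the current working directory and accepts repeated `--text` flags and an optional `--checkpoint`, as in the examples above. `build_cli_args` and `run_cli` are hypothetical helpers. Because the CLI's output format is not documented here, stdout is returned unparsed.

```python
import subprocess
import sys
from typing import List, Optional, Sequence


def build_cli_args(texts: Sequence[str], checkpoint: Optional[str] = None) -> List[str]:
    """Hypothetical helper: argv for infer_merit_xs.py with one --text per message."""
    args = [sys.executable, "infer_merit_xs.py"]
    if checkpoint is not None:
        args += ["--checkpoint", checkpoint]
    for text in texts:
        args += ["--text", text]
    return args


def run_cli(texts: Sequence[str], checkpoint: Optional[str] = None) -> str:
    """Run the CLI without a shell and return its raw stdout (format not documented)."""
    result = subprocess.run(
        build_cli_args(texts, checkpoint),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit status
    )
    return result.stdout
```

For repeated scoring inside one process, the `load_merit_xs` loader is probably the better fit, because each CLI call presumably loads the checkpoint again.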
## Python usage

```python
from load_merit_xs import load_merit_xs

# Load the preview model with the bundled local loader.
model = load_merit_xs()

# Score a batch of messages; each prediction follows the output schema below.
results = model.predict(
    [
        "you are awful",
        "thanks for your help",
        "you are a stupid idiot",
    ]
)
print(results)
```

## Output schema

Each prediction returns:

- `score`
  - `sigmoid(logit)` of the binary moderation head
- `decision`
  - one of `allow`, `review`, or `action`
- `confidence`
  - threshold-distance heuristic only
- `decision_band`
  - the same band label used for the decision

Important: `confidence` is a preview-time decision-margin heuristic, **not a calibrated probability**.

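This card does not document the exact thresholds or margin formula the preview uses. As a rough illustration of how the fields above can relate to each other, the sketch below does three things:

- maps a logit to `score` with a numerically stable sigmoid
- assigns `allow`, `review`, or `action` using two cut-offs
- reports `confidence` as the distance from `score` to the nearest cut-off

`REVIEW_THRESHOLD`, `ACTION_THRESHOLD`, and `predict_from_logit` are hypothetical. The threshold values are placeholders, not the shipped ones.

```python
import math
from dataclasses import dataclass

# Placeholder cut-offs for illustration only, not the shipped preview values.
REVIEW_THRESHOLD = 0.5
ACTION_THRESHOLD = 0.8


@dataclass
class Prediction:
    score: float
    decision: str
    confidence: float
    decision_band: str


def sigmoid(logit: float) -> float:
    """Numerically stable logistic function (avoids overflow for large |logit|)."""
    if logit >= 0:
        return 1.0 / (1.0 + math.exp(-logit))
    z = math.exp(logit)
    return z / (1.0 + z)


def predict_from_logit(
    logit: float,
    review_threshold: float = REVIEW_THRESHOLD,
    action_threshold: float = ACTION_THRESHOLD,
) -> Prediction:
    """Hypothetical mapping from a binary-head logit to the documented fields."""
    score = sigmoid(logit)
    if score >= action_threshold:
        band = "action"
    elif score >= review_threshold:
        band = "review"
    else:
        band = "allow"
    # Margin to the nearest cut-off: larger means further from flipping bands.
    # This is a heuristic, not a calibrated probability.
    margin = min(abs(score - review_threshold), abs(score - action_threshold))
    return Prediction(score=score, decision=band, confidence=margin, decision_band=band)
```

With these placeholder cut-offs, a `score` of about 0.99 gets a margin of only about 0.19. This is why a threshold-distance value should not be read as the probability that the decision is correct.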
## Current limitations

- Binary toxicity preview only
- Not a full moderation taxonomy
- Weak coverage for some safety categories, including self-harm / threat-style language
- Message-level only
- Incomplete multilingual and adversarial evaluation

## Research note

This package is intended for:

- research
- benchmarking
- representation-transfer experiments
- moderation evaluation

It is not intended for:

- production moderation
- safety-critical enforcement
- fully automated policy decisions

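Because the preview is not meant for fully automated policy decisions, one way to use its output is to let only `allow` pass automatically and send everything else to a person. The sketch below assumes `predict` returns one mapping per input text, with the `decision` and `score` fields described under Output schema. That return type is an assumption, and `route_predictions` and the queue names are hypothetical.

```python
from typing import Dict, List, Mapping, Sequence, Tuple


def route_predictions(
    texts: Sequence[str],
    predictions: Sequence[Mapping],
) -> Dict[str, List[Tuple[str, float]]]:
    """Hypothetical routing: nothing except `allow` is handled without a human.

    Assumes each prediction is a mapping with `decision` and `score` keys.
    """
    if len(texts) != len(predictions):
        raise ValueError("expected one prediction per text")
    queues: Dict[str, List[Tuple[str, float]]] = {"auto_allow": [], "human_review": []}
    for text, pred in zip(texts, predictions):
        # `review`, `action`, and any unexpected label are escalated, never enforced.
        key = "auto_allow" if pred["decision"] == "allow" else "human_review"
        queues[key].append((text, float(pred["score"])))
    # Show reviewers the highest-scoring messages first.
    queues["human_review"].sort(key=lambda item: item[1], reverse=True)
    return queues
```

Escalating `action` instead of enforcing it keeps the final call with a reviewer, which matches the research-only scope above.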
## Existing model card

This package is being prepared for the existing Hugging Face repo:

[MeridianSafety/MERIT-XS-Preview](https://huggingface.co/MeridianSafety/MERIT-XS-Preview)