---
language:
- en
tags:
- text-classification
- moderation
- safety
- meridian
pipeline_tag: text-classification
library_name: pytorch
---
# MERIT-XS (Research Preview)
MERIT-XS is an early multilingual moderation encoder developed by Meridian Safety for research into compact, Unicode-aware safety classification systems.
This release packages a **binary toxicity research preview** built from:
- `MERIT-XS` encoder pretraining
- a top-2-layer moderation adaptation run
- a binary moderation head
This artifact is **not production-ready** and should not be used as a standalone safety system.
## Included files
- `merit_xs_preview.pt`
  - exported moderation artifact with adapted encoder weights and binary head
- `infer_merit_xs.py`
  - CLI inference entrypoint
- `load_merit_xs.py`
  - simple Python loader for local use
- `metrics_summary.json`
  - dev/test metrics and threshold sweep summary
- `profile_summary.json`
  - lightweight export-run timing summary
- `merit/`
  - local model package
- `assets/tokenizers/merit/`
  - tokenizer files
## Setup
```bash
pip install -r requirements.txt
```
## License
This package uses the included [LICENSE.txt](LICENSE.txt):
- `MERIT Research Preview License (MRPL v1.0)`
- research, evaluation, and benchmarking use are allowed
- commercial deployment and hosted/public API use require separate permission
## CLI usage
```bash
python infer_merit_xs.py \
  --text "you are awful" \
  --text "thanks for your help"
```
You can also pass an explicit checkpoint path:
```bash
python infer_merit_xs.py \
  --checkpoint merit_xs_preview.pt \
  --text "you are a stupid idiot"
```
## Python usage
```python
from load_merit_xs import load_merit_xs

# Load the preview model with its default settings.
model = load_merit_xs()

# Score a small batch of messages.
results = model.predict(
    [
        "you are awful",
        "thanks for your help",
        "you are a stupid idiot",
    ]
)
print(results)
```
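The returned predictions can be grouped by decision for quick inspection. The sketch below is illustrative only: it assumes each prediction is a dict-like record with a `decision` field, and the README does not confirm the container type that `predict` returns. The `summarize` helper and the stand-in records are hypothetical and are not part of this package.

```python
from collections import Counter

def summarize(predictions):
    """Count predictions per decision and collect the non-allow ones.

    Assumes each prediction is a mapping with a "decision" key
    (an assumption; check the real return type of model.predict).
    """
    counts = Counter(p["decision"] for p in predictions)
    flagged = [p for p in predictions if p["decision"] != "allow"]
    return counts, flagged

# Stand-in records with made-up values, used instead of real model output.
sample = [
    {"score": 0.91, "decision": "action"},
    {"score": 0.04, "decision": "allow"},
    {"score": 0.62, "decision": "review"},
]

counts, flagged = summarize(sample)
print(dict(counts))
print(len(flagged), "messages need a closer look")
```

With real output, `sample` would be replaced by the value returned from `model.predict(...)`, provided the records expose the same field.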
## Output schema
Each prediction returns:
- `score`
  - `sigmoid(logit)`
- `decision`
  - `allow | review | action`
- `confidence`
  - threshold-distance heuristic only
- `decision_band`
  - same band label used for the decision
Important: `confidence` is a preview-time heuristic based on the decision margin; it is **not a calibrated probability**.
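As a rough illustration of how these fields relate, the sketch below maps a logit to a score and a decision band. The two thresholds and the exact form of the margin are assumptions made for this example: the package's real threshold values and its `confidence` formula are not documented here, and `to_prediction` is a hypothetical helper.

```python
import math

# Hypothetical thresholds for illustration; not the values used by the package.
REVIEW_THRESHOLD = 0.5
ACTION_THRESHOLD = 0.8

def sigmoid(logit: float) -> float:
    """Numerically stable logistic function."""
    if logit >= 0:
        return 1.0 / (1.0 + math.exp(-logit))
    e = math.exp(logit)
    return e / (1.0 + e)

def to_prediction(logit: float) -> dict:
    """Build a record shaped like the output schema from a raw logit."""
    score = sigmoid(logit)
    if score >= ACTION_THRESHOLD:
        band = "action"
    elif score >= REVIEW_THRESHOLD:
        band = "review"
    else:
        band = "allow"
    # Assumed heuristic: distance from the score to the nearest threshold.
    margin = min(abs(score - REVIEW_THRESHOLD), abs(score - ACTION_THRESHOLD))
    return {
        "score": score,
        "decision": band,
        "confidence": margin,
        "decision_band": band,
    }

for logit in (-5.0, 0.0, 5.0):
    print(logit, to_prediction(logit))
```

Under this assumed formula, a score sitting exactly on a threshold has a margin of zero, which shows why a margin of this kind should not be read as a probability.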
## Current limitations
- Binary toxicity preview only
- Not a full moderation taxonomy
- Weak coverage for some safety categories, including self-harm / threat-style language
- Message-level only
- Incomplete multilingual and adversarial evaluation
## Research note
This package is intended for:
- research
- benchmarking
- representation-transfer experiments
- moderation evaluation
It is not intended for:
- production moderation
- safety-critical enforcement
- fully automated policy decisions
## Existing model card
This package is being prepared for the existing Hugging Face repo:
[MeridianSafety/MERIT-XS-Preview](https://huggingface.co/MeridianSafety/MERIT-XS-Preview)