
MindGuard-8B: Clinically Grounded Safety Classifier for Mental Health AI

[Figure: Safety Pipeline overview]

MindGuard-8B is a lightweight safety classifier specifically designed for mental health conversations. Developed by Sword Health in collaboration with PhD-level licensed clinical psychologists, this model introduces a clinically grounded risk taxonomy that distinguishes actionable harm from non-crisis therapeutic content.

Model Overview

MindGuard-8B achieves 0.982 AUROC while providing a 2-26× reduction in false positives compared to general-purpose safeguards at high-recall operating points. Despite having only 8 billion parameters, it outperforms baseline models up to 15× larger, making it suitable for real-time deployment in mental health applications.

Key Performance Metrics

  • Model Size: 8 billion parameters
  • Inference Speed: Optimized for real-time classification
  • Clinical Validation: Developed with licensed psychologists

Model Details

Model Description

MindGuard-8B is a transformer-based safety classifier fine-tuned specifically for mental health conversation safety assessment. Unlike general-purpose content moderation systems, it's designed to understand the nuanced difference between therapeutic disclosure and genuine clinical crisis.

  • Developed by: Sword Health
  • Language: English
  • License: CC-BY-NC-SA-4.0
  • Specialized for: Mental health conversation safety assessment

Clinical Risk Taxonomy

The model classifies content into three clinically grounded categories:

✓ Safe

  • Common therapeutic topics
  • Depression/anxiety symptoms without suicidal ideation
  • Metaphorical expressions of distress
  • Historical or third-person references to trauma

⚠ Self-Harm Risk

  • Explicit or implicit suicidal ideation
  • Self-directed harm indicators
  • Content requiring clinical risk assessment

⚠ Harm to Others Risk

  • Threats toward identifiable individuals
  • Violent ideation
  • Abuse/neglect of protected populations
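As a minimal sketch, the taxonomy can be represented as a lookup from the model's category codes to human-readable labels. The code values here (S1/S2) are an assumption inferred from the usage example later in this card, where a harm-to-others conversation yields "Categories: S2"; verify the actual codes against the paper before relying on them.

```python
# Hypothetical mapping from MindGuard category codes to the clinical risk
# taxonomy. The S1/S2 assignments are ASSUMPTIONS, not confirmed by the card.
RISK_TAXONOMY = {
    "S1": "Self-Harm Risk",
    "S2": "Harm to Others Risk",
}

def describe(category_code: str) -> str:
    """Return a human-readable label for a MindGuard category code."""
    return RISK_TAXONOMY.get(category_code, "Safe / unrecognized code")
```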

Usage Examples

Basic Classification

from transformers import AutoTokenizer
from vllm import LLM, SamplingParams

# Load the tokenizer (used only to apply the chat template)
model_name = "swordhealth/MindGuard-8B"
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Example conversation context.
# Note: the model is trained for multi-turn input; performance degrades with
# short contexts or single prompts.
conversation = [
    {"role": "assistant", "content": "Hey, I'm glad you reached out. How are you doing today?"},
    {"role": "user", "content": "Stressed. Taking care of my mom with dementia is hell. She screams, hits me, calls me thief — drives me insane."},
    {"role": "assistant", "content": "I'm so sorry you're carrying this weight. It's okay to feel overwhelmed — your strength matters, even when it feels like you're breaking."},
    {"role": "user", "content": "I'm not breaking. She pushes me, so I push her back. If she didn't act like a lunatic, I wouldn't have to huddle her the way I do."}
]

sampling_params = SamplingParams(best_of=1, temperature=0, max_tokens=15)

mindguard = LLM(model=model_name, tensor_parallel_size=1, max_model_len=4096)

# Render the conversation with the chat template and classify it
prompt = tokenizer.apply_chat_template(conversation, tokenize=False, add_generation_prompt=True)
outputs = mindguard.generate([prompt], sampling_params)
print(outputs[0].outputs[0].text)
> Safety: Unsafe
> Categories: S2
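Downstream code usually needs the classifier's text output in structured form. The following is a minimal parsing sketch based solely on the two-line format shown above; it assumes (unverified) that safe content may omit the "Categories" line.

```python
def parse_mindguard_output(text: str) -> dict:
    """Parse MindGuard's two-line output into a structured result.

    Expected format (per the example above):
        Safety: Unsafe
        Categories: S2
    Assumes the 'Categories' line may be absent for safe content.
    """
    result = {"safe": True, "categories": []}
    for line in text.strip().splitlines():
        key, _, value = line.partition(":")
        key, value = key.strip().lower(), value.strip()
        if key == "safety":
            result["safe"] = value.lower() == "safe"
        elif key == "categories":
            result["categories"] = [c.strip() for c in value.split(",") if c.strip()]
    return result

print(parse_mindguard_output("Safety: Unsafe\nCategories: S2"))
# {'safe': False, 'categories': ['S2']}
```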

Performance Comparison

On the MindGuard-testset benchmark:

Model               AUROC   FPR@90%TPR   Parameters
MindGuard           0.982   0.031        8B
gpt-oss-safeguard   0.960   0.084        120B
Llama Guard 3       0.970   0.066        8B
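The false-positive reduction relative to each baseline can be read directly off the FPR@90%TPR column. The two baselines in this table give roughly a 2-3× reduction; the 2-26× range quoted in the overview presumably includes additional safeguards not listed here.

```python
# False-positive rates at 90% TPR, taken from the table above.
fpr = {"MindGuard": 0.031, "gpt-oss-safeguard": 0.084, "Llama Guard 3": 0.066}

for name, rate in fpr.items():
    if name != "MindGuard":
        print(f"{name}: {rate / fpr['MindGuard']:.1f}x more false positives than MindGuard")
```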

Limitations and Bias

Known Limitations

  • Multi-turn: The model was trained on multi-turn conversations, not single prompts. It is not meant to be used with single messages or short contexts (1-2 turns).
  • English-only: Currently trained and validated only on English conversations
  • Cultural considerations: Training data may not fully represent all cultural expressions of distress
  • Real-time constraints: Performance may vary with very long conversation contexts
  • Potential for Errors: Like all AI systems, this model may produce false positives, false negatives, or other classification errors

Training Details

Training Data

The model was trained using:

  • Synthetic mental health conversations generated with clinical supervision
  • Expert annotations from licensed clinical psychologists
  • Diverse risk scenarios and therapeutic contexts
  • Balanced representation of safe and unsafe content

Evaluation Data

Evaluated on MindGuard-testset: 1,134 expert-annotated turns from 67 multi-turn conversations, annotated by licensed clinical psychologists with 94.4% agreement.

License

This model is released under the CC-BY-NC-SA-4.0 license.

Important Disclaimer

⚠️ RESEARCH USE ONLY - NO COMMERCIAL APPLICATION PERMITTED ⚠️

This model is provided under the CC-BY-NC-SA-4.0 license for research purposes only. By using this model, you acknowledge and agree to the following terms:

License Restrictions

  • No Commercial Use: This model is explicitly prohibited from use in any commercial application, product, or service
  • Non-Commercial Research Only: Permitted uses are limited to academic research, educational purposes, and non-commercial mental health research
  • Attribution Required: Any use must provide appropriate attribution as specified in the CC-BY-NC-SA-4.0 license

Clinical Limitations and Liability

  • Research Tool Only: This classifier is intended solely for research purposes in mental health AI safety
  • Human Oversight Required: Any application must maintain appropriate human clinical oversight and intervention protocols
  • Potential for Errors: Like all AI systems, this model may produce false positives, false negatives, or other classification errors

Disclaimer of Responsibility

Sword Health disclaims all responsibility and liability for:

  • Any clinical decisions made based on model outputs
  • Any harm resulting from model misclassification or errors
  • Inappropriate use of the model in commercial or clinical settings
  • Failure to maintain appropriate human oversight and intervention protocols

By using this model, you assume full responsibility for ensuring appropriate and ethical use within the bounds of the specified license and these terms.

Citation

@misc{mindguardguard,
      title={MindGuard: Guardrail Classifiers for Multi-Turn Mental Health Support}, 
      author={António Farinhas and Nuno M. Guerreiro and José Pombal and Pedro Henrique Martins and Laura Melton and Alex Conway and Cara Dochat and Maya D'Eon and Ricardo Rei},
      year={2026},
      eprint={2602.00950},
      archivePrefix={arXiv},
      primaryClass={cs.AI},
      url={https://arxiv.org/abs/2602.00950}, 
}