Instructions to use khazarai/MentalChat-16K with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

Libraries

How to use khazarai/MentalChat-16K with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="khazarai/MentalChat-16K")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("khazarai/MentalChat-16K")
model = AutoModelForCausalLM.from_pretrained("khazarai/MentalChat-16K", device_map="auto")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
	messages,
	add_generation_prompt=True,
	tokenize=True,
	return_dict=True,
	return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

Notebooks
Google Colab
Kaggle
Local Apps Settings

vLLM

How to use khazarai/MentalChat-16K with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "khazarai/MentalChat-16K"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "khazarai/MentalChat-16K",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker

docker model run hf.co/khazarai/MentalChat-16K

SGLang

How to use khazarai/MentalChat-16K with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "khazarai/MentalChat-16K" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "khazarai/MentalChat-16K",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "khazarai/MentalChat-16K" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "khazarai/MentalChat-16K",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Unsloth Studio

How to use khazarai/MentalChat-16K with Unsloth Studio:

Install Unsloth Studio (macOS, Linux, WSL)

curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for khazarai/MentalChat-16K to start chatting

Install Unsloth Studio (Windows)

irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for khazarai/MentalChat-16K to start chatting

Using HuggingFace Spaces for Unsloth

# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for khazarai/MentalChat-16K to start chatting

Load model with FastModel

pip install unsloth
from unsloth import FastModel
model, tokenizer = FastModel.from_pretrained(
    model_name="khazarai/MentalChat-16K",
    max_seq_length=2048,
)

Docker Model Runner
How to use khazarai/MentalChat-16K with Docker Model Runner:
```
docker model run hf.co/khazarai/MentalChat-16K
```
Browse Quantizations to use this model in llama.cpp, Ollama, LM Studio, or any compatible app.

Model Card for Model MentalChat-16K

Model Details

This model is a fine-tuned version of Llama-3.2-1B-Instruct, optimized for empathetic and supportive conversations in the mental health domain. It was trained on the ShenLab/MentalChat16K dataset, which includes over 16,000 counseling-style Q&A examples, combining real clinical paraphrases and synthetic mental health dialogues. The model is designed to understand and respond to emotionally nuanced prompts related to stress, anxiety, relationships, and personal well-being.

Model Description

Language(s) (NLP): English
License: MIT
Finetuned from model: unsloth/Llama-3.2-1B-Instruct
Dataset: ShenLab/MentalChat16K

Uses

This model is intended for research and experimentation in AI-driven mental health support. Key use cases include:

Mental health chatbot prototypes
Empathy-focused dialogue agents
Benchmarking LLMs on emotional intelligence and counseling-style prompts
Educational or training tools in psychology or mental health communication

This model is NOT intended for clinical diagnosis, therapy, or real-time intervention. It must not replace licensed mental health professionals.

Bias, Risks, and Limitations

Biases:
- The real interview data is biased toward caregivers (mostly White, female, U.S.-based), which may affect the model’s cultural and demographic generalizability.
- The synthetic dialogues are generated by GPT-3.5, which may introduce linguistic and cultural biases from its pretraining.
Limitations:
- The base model, Qwen2.5-0.5B-Instruct, is a small model (0.5B parameters), limiting depth of reasoning and nuanced understanding.
- Not suitable for handling acute mental health crises or emergency counseling.
- Responses may lack therapeutic rigor or miss subtle psychological cues.
- May produce hallucinated or inaccurate mental health advice.

How to Get Started with the Model

Use the code below to get started with the model.

from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("khazarai/MentalChat-16K")
model = AutoModelForCausalLM.from_pretrained(
    "khazarai/MentalChat-16K",
    device_map={"": 0}
)

system = """You are a helpful mental health counselling assistant, please answer the mental health questions based on the patient's description.
The assistant gives helpful, comprehensive, and appropriate answers to the user's questions.
"""

question = """
I've been feeling overwhelmed by my responsibilities at work and caring for my aging parents. I've reached a point where I don't know what else I can do, and I'm struggling to communicate this to my boss and family members. I feel guilty for even considering saying no, but I know I need to take care of myself.
"""

messages = [
    {"role" : "system", "content" : system},
    {"role" : "user", "content" : question}
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize = False,
    add_generation_prompt = True,
)

from transformers import TextStreamer
_ = model.generate(
    **tokenizer(text, return_tensors = "pt").to("cuda"),
    max_new_tokens = 900,
    temperature = 0.7,
    top_p = 0.8,
    top_k = 20,
    streamer = TextStreamer(tokenizer, skip_prompt = True),
)

Downloads last month: 110

Safetensors

Model size

1B params

Tensor type

BF16

Model tree for khazarai/MentalChat-16K

Base model

meta-llama/Llama-3.2-1B-Instruct

Finetuned

unsloth/Llama-3.2-1B-Instruct

Finetuned

(480)

this model

Adapters

1 model

Quantizations