ProlificAI/social-reasoning-rlhf
Viewer • Updated • 3.82k • 209 • 52
How to use khazarai/Social-RLHF with PEFT:
from peft import PeftModel
from transformers import AutoModelForCausalLM
base_model = AutoModelForCausalLM.from_pretrained("unsloth/Qwen2.5-0.5B-Instruct")
model = PeftModel.from_pretrained(base_model, "khazarai/Social-RLHF")How to use khazarai/Social-RLHF with Transformers:
# Use a pipeline as a high-level helper
from transformers import pipeline
pipe = pipeline("text-generation", model="khazarai/Social-RLHF")
messages = [
{"role": "user", "content": "Who are you?"},
]
pipe(messages) # Load model directly
from transformers import AutoModel
model = AutoModel.from_pretrained("khazarai/Social-RLHF", dtype="auto")How to use khazarai/Social-RLHF with vLLM:
# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "khazarai/Social-RLHF"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
-H "Content-Type: application/json" \
--data '{
"model": "khazarai/Social-RLHF",
"messages": [
{
"role": "user",
"content": "What is the capital of France?"
}
]
}'docker model run hf.co/khazarai/Social-RLHF
How to use khazarai/Social-RLHF with SGLang:
# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
--model-path "khazarai/Social-RLHF" \
--host 0.0.0.0 \
--port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
-H "Content-Type: application/json" \
--data '{
"model": "khazarai/Social-RLHF",
"messages": [
{
"role": "user",
"content": "What is the capital of France?"
}
]
}'docker run --gpus all \
--shm-size 32g \
-p 30000:30000 \
-v ~/.cache/huggingface:/root/.cache/huggingface \
--env "HF_TOKEN=<secret>" \
--ipc=host \
lmsysorg/sglang:latest \
python3 -m sglang.launch_server \
--model-path "khazarai/Social-RLHF" \
--host 0.0.0.0 \
--port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
-H "Content-Type: application/json" \
--data '{
"model": "khazarai/Social-RLHF",
"messages": [
{
"role": "user",
"content": "What is the capital of France?"
}
]
}'How to use khazarai/Social-RLHF with Unsloth Studio:
curl -fsSL https://unsloth.ai/install.sh | sh # Run unsloth studio unsloth studio -H 0.0.0.0 -p 8888 # Then open http://localhost:8888 in your browser # Search for khazarai/Social-RLHF to start chatting
irm https://unsloth.ai/install.ps1 | iex # Run unsloth studio unsloth studio -H 0.0.0.0 -p 8888 # Then open http://localhost:8888 in your browser # Search for khazarai/Social-RLHF to start chatting
# No setup required # Open https://huggingface.co/spaces/unsloth/studio in your browser # Search for khazarai/Social-RLHF to start chatting
pip install unsloth
from unsloth import FastModel
model, tokenizer = FastModel.from_pretrained(
model_name="khazarai/Social-RLHF",
max_seq_length=2048,
)How to use khazarai/Social-RLHF with Docker Model Runner:
docker model run hf.co/khazarai/Social-RLHF
This model is a fine-tuned version of Qwen2.5-0.5B-Instruct on the ProlificAI/social-reasoning-rlhf dataset using ORPO. The primary objective was to experiment with Reinforcement Learning from Human Feedback (RLHF) via ORPO, focusing on preference alignment.
Use the code below to get started with the model.
from huggingface_hub import login
from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel
login(token="")
tokenizer = AutoTokenizer.from_pretrained("unsloth/Qwen2.5-0.5B-Instruct",)
base_model = AutoModelForCausalLM.from_pretrained(
"unsloth/Qwen2.5-0.5B-Instruct",
device_map={"": 0}, token=""
)
model = PeftModel.from_pretrained(base_model,"khazarai/Social-RLHF")
prompt = """Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.
### Instruction:
{}
### Input:
{}
### Response:
{}"""
inputs = tokenizer(
[
prompt.format(
"You are an AI assistant that helps people find information",
"A stranger shares private information with you on public transportation. How might you respond sensitively?",
"",
)
],
return_tensors="pt",
).to("cuda")
from transformers import TextStreamer
text_streamer = TextStreamer(tokenizer)
_ = model.generate(**inputs, streamer=text_streamer, max_new_tokens=512)
Base model
Qwen/Qwen2.5-0.5B