How to use RinnRinnmini/qwen3-4b-structeval-sftdpo_v2-adapter with Transformers:
# Use a pipeline as a high-level helper
from transformers import pipeline
pipe = pipeline("text-generation", model="RinnRinnmini/qwen3-4b-structeval-sftdpo_v2-adapter")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)
# Load model directly
from transformers import AutoModel
model = AutoModel.from_pretrained("RinnRinnmini/qwen3-4b-structeval-sftdpo_v2-adapter", dtype="auto")
How to use RinnRinnmini/qwen3-4b-structeval-sftdpo_v2-adapter with vLLM:
# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "RinnRinnmini/qwen3-4b-structeval-sftdpo_v2-adapter"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
-H "Content-Type: application/json" \
--data '{
"model": "RinnRinnmini/qwen3-4b-structeval-sftdpo_v2-adapter",
"messages": [
{
"role": "user",
"content": "What is the capital of France?"
}
]
}'
docker model run hf.co/RinnRinnmini/qwen3-4b-structeval-sftdpo_v2-adapter
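The curl request above can also be issued from Python. The following is a minimal sketch using only the standard library; the helper names `build_chat_payload` and `post_chat` are hypothetical, and the default base URL assumes a server is already listening on port 8000 as in the vLLM example (pass `http://localhost:30000` for the SGLang examples).

```python
import json
import urllib.request

MODEL_ID = "RinnRinnmini/qwen3-4b-structeval-sftdpo_v2-adapter"


def build_chat_payload(user_text, model=MODEL_ID):
    """Build the same JSON body that the curl example sends (hypothetical helper)."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": user_text}],
    }


def post_chat(payload, base_url="http://localhost:8000"):
    """POST the payload to /v1/chat/completions and return the parsed JSON reply.

    Assumes an OpenAI-compatible server is already running at base_url.
    """
    request = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


# Example usage (needs a running server, so it is left commented out):
# reply = post_chat(build_chat_payload("What is the capital of France?"))
# print(reply["choices"][0]["message"]["content"])
```

Keeping payload construction separate from the network call makes it possible to inspect or log the request body without a server running.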
How to use RinnRinnmini/qwen3-4b-structeval-sftdpo_v2-adapter with SGLang:
# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
--model-path "RinnRinnmini/qwen3-4b-structeval-sftdpo_v2-adapter" \
--host 0.0.0.0 \
--port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
-H "Content-Type: application/json" \
--data '{
"model": "RinnRinnmini/qwen3-4b-structeval-sftdpo_v2-adapter",
"messages": [
{
"role": "user",
"content": "What is the capital of France?"
}
]
}'
docker run --gpus all \
--shm-size 32g \
-p 30000:30000 \
-v ~/.cache/huggingface:/root/.cache/huggingface \
--env "HF_TOKEN=<secret>" \
--ipc=host \
lmsysorg/sglang:latest \
python3 -m sglang.launch_server \
--model-path "RinnRinnmini/qwen3-4b-structeval-sftdpo_v2-adapter" \
--host 0.0.0.0 \
--port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
-H "Content-Type: application/json" \
--data '{
"model": "RinnRinnmini/qwen3-4b-structeval-sftdpo_v2-adapter",
"messages": [
{
"role": "user",
"content": "What is the capital of France?"
}
]
}'
How to use RinnRinnmini/qwen3-4b-structeval-sftdpo_v2-adapter with Unsloth Studio:
curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for RinnRinnmini/qwen3-4b-structeval-sftdpo_v2-adapter to start chatting

irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for RinnRinnmini/qwen3-4b-structeval-sftdpo_v2-adapter to start chatting

# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for RinnRinnmini/qwen3-4b-structeval-sftdpo_v2-adapter to start chatting
pip install unsloth
from unsloth import FastModel
model, tokenizer = FastModel.from_pretrained(
    model_name="RinnRinnmini/qwen3-4b-structeval-sftdpo_v2-adapter",
    max_seq_length=2048,
)
How to use RinnRinnmini/qwen3-4b-structeval-sftdpo_v2-adapter with Docker Model Runner:
docker model run hf.co/RinnRinnmini/qwen3-4b-structeval-sftdpo_v2-adapter
This repository contains a LoRA adapter (PEFT) for StructEval-T style structured output tasks.
It is trained as SFT → DPO on top of Qwen/Qwen3-4B-Instruct-2507.
⚠️ This repo is NOT a full merged model.
Load the base model first, then apply this adapter.
Dataset: u-10bei/dpo-dataset-qwen-cot
SFT adapter: RinnRinnmini/lora_structeval_t_qwen3_4b_sft_v1 (This adapter was used as initialization before DPO.)
Base model: Qwen/Qwen3-4B-Instruct-2507
SFT stage: RinnRinnmini/lora_structeval_t_qwen3_4b_sft_v1 as the initialization point
DPO stage: DPOTrainer
from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel
import torch
BASE_ID = "Qwen/Qwen3-4B-Instruct-2507"
# BASE_ID = "unsloth/Qwen3-4B-Instruct-2507"
ADAPTER_ID = "RinnRinnmini/qwen3-4b-structeval-sftdpo_v2-adapter" # this repo (LoRA adapter)
# Load the tokenizer from the base model; fall back to EOS as the pad token if none is set.
tok = AutoTokenizer.from_pretrained(BASE_ID, trust_remote_code=True, use_fast=True)
if tok.pad_token is None:
    tok.pad_token = tok.eos_token
# Load the base model first (fp16 on GPU, fp32 on CPU).
base = AutoModelForCausalLM.from_pretrained(
    BASE_ID,
    device_map="auto",
    torch_dtype=torch.float16 if torch.cuda.is_available() else torch.float32,
    trust_remote_code=True,
)
# Apply the LoRA adapter from this repo on top of the base model.
model = PeftModel.from_pretrained(base, ADAPTER_ID)
model.eval()
messages = [
    {"role": "user", "content": "Return a JSON with keys a,b,c and integer values."}
]
# Render the chat template to a prompt string, then tokenize it.
prompt = tok.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tok(prompt, return_tensors="pt").to(model.device)
# Greedy decoding, no gradient tracking.
with torch.no_grad():
    out = model.generate(
        **inputs,
        max_new_tokens=256,
        do_sample=False,
        pad_token_id=tok.eos_token_id,
    )
# The decoded text includes the prompt followed by the model's reply.
print(tok.decode(out[0], skip_special_tokens=True))
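Because the example prompt asks for a JSON object with keys a, b, c and integer values, it can be useful to check the decoded text programmatically. The following is a minimal sketch using only the standard library. The helper names `extract_json_object` and `has_integer_keys` are hypothetical, and the sketch assumes the decoded output contains a JSON object, possibly surrounded by other text such as the echoed prompt.

```python
import json


def extract_json_object(text):
    """Return the first decodable JSON object found in `text`, or None."""
    decoder = json.JSONDecoder()
    start = text.find("{")
    while start != -1:
        try:
            # raw_decode parses one JSON value starting at index `start`.
            value, _ = decoder.raw_decode(text, start)
        except json.JSONDecodeError:
            value = None
        if isinstance(value, dict):
            return value
        # Not a valid object here; try the next opening brace.
        start = text.find("{", start + 1)
    return None


def has_integer_keys(obj, keys=("a", "b", "c")):
    """True if `obj` is a dict mapping every key in `keys` to an int (bools excluded)."""
    if not isinstance(obj, dict):
        return False
    return all(
        key in obj and isinstance(obj[key], int) and not isinstance(obj[key], bool)
        for key in keys
    )
```

With the script above, a check could look like `has_integer_keys(extract_json_object(tok.decode(out[0], skip_special_tokens=True)))`.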
Base model
Qwen/Qwen3-4B-Instruct-2507