Instructions for using Pinkstackorg/PinkQwen2.5-3B-1M-DPO-preview with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use Pinkstackorg/PinkQwen2.5-3B-1M-DPO-preview with Transformers:
```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="Pinkstackorg/PinkQwen2.5-3B-1M-DPO-preview")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)
```

```python
# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("Pinkstackorg/PinkQwen2.5-3B-1M-DPO-preview")
model = AutoModelForCausalLM.from_pretrained("Pinkstackorg/PinkQwen2.5-3B-1M-DPO-preview")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))
```

- Notebooks
- Google Colab
- Kaggle
- Local Apps
- vLLM
How to use Pinkstackorg/PinkQwen2.5-3B-1M-DPO-preview with vLLM:
Install from pip and serve model
```shell
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "Pinkstackorg/PinkQwen2.5-3B-1M-DPO-preview"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "Pinkstackorg/PinkQwen2.5-3B-1M-DPO-preview",
    "messages": [
      {"role": "user", "content": "What is the capital of France?"}
    ]
  }'
```

Use Docker
```shell
docker model run hf.co/Pinkstackorg/PinkQwen2.5-3B-1M-DPO-preview
```
- SGLang
How to use Pinkstackorg/PinkQwen2.5-3B-1M-DPO-preview with SGLang:
Install from pip and serve model
```shell
# Install SGLang from pip:
pip install sglang

# Start the SGLang server:
python3 -m sglang.launch_server \
  --model-path "Pinkstackorg/PinkQwen2.5-3B-1M-DPO-preview" \
  --host 0.0.0.0 \
  --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "Pinkstackorg/PinkQwen2.5-3B-1M-DPO-preview",
    "messages": [
      {"role": "user", "content": "What is the capital of France?"}
    ]
  }'
```

Use Docker images
```shell
docker run --gpus all \
  --shm-size 32g \
  -p 30000:30000 \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  --env "HF_TOKEN=<secret>" \
  --ipc=host \
  lmsysorg/sglang:latest \
  python3 -m sglang.launch_server \
    --model-path "Pinkstackorg/PinkQwen2.5-3B-1M-DPO-preview" \
    --host 0.0.0.0 \
    --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "Pinkstackorg/PinkQwen2.5-3B-1M-DPO-preview",
    "messages": [
      {"role": "user", "content": "What is the capital of France?"}
    ]
  }'
```

- Unsloth Studio
How to use Pinkstackorg/PinkQwen2.5-3B-1M-DPO-preview with Unsloth Studio:
Install Unsloth Studio (macOS, Linux, WSL)
```shell
curl -fsSL https://unsloth.ai/install.sh | sh

# Run Unsloth Studio
unsloth studio -H 0.0.0.0 -p 8888

# Then open http://localhost:8888 in your browser
# Search for Pinkstackorg/PinkQwen2.5-3B-1M-DPO-preview to start chatting
```
Install Unsloth Studio (Windows)
```shell
irm https://unsloth.ai/install.ps1 | iex

# Run Unsloth Studio
unsloth studio -H 0.0.0.0 -p 8888

# Then open http://localhost:8888 in your browser
# Search for Pinkstackorg/PinkQwen2.5-3B-1M-DPO-preview to start chatting
```
Using HuggingFace Spaces for Unsloth
```shell
# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for Pinkstackorg/PinkQwen2.5-3B-1M-DPO-preview to start chatting
```
Load model with FastModel
```shell
pip install unsloth
```

```python
from unsloth import FastModel

model, tokenizer = FastModel.from_pretrained(
    model_name="Pinkstackorg/PinkQwen2.5-3B-1M-DPO-preview",
    max_seq_length=2048,
)
```

- Docker Model Runner
How to use Pinkstackorg/PinkQwen2.5-3B-1M-DPO-preview with Docker Model Runner:
```shell
docker model run hf.co/Pinkstackorg/PinkQwen2.5-3B-1M-DPO-preview
```
This model is purely for experimental purposes. Fine-tuned on FineTome, pinkchat-sft, and pinkchat-dpo, the model is able to generate text that makes sense.
Additional fine-tuning is needed.
The model does not perform well yet, but it does work. It was fine-tuned on 2 billion tokens of mostly synthetic data, plus some human-written data, during the SFT process.
Phase 0: In mergekit, we remove 16 of the 28 layers (leaving 12: Pinkstackorg/Qwen2.5-3Bprunebase-1M) using the passthrough merge method.
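A passthrough prune like this is expressed in mergekit as a YAML config that keeps selected layer ranges. The sketch below is illustrative only: the card does not state which layers were removed or which base checkpoint was used, so the layer ranges and model name here are assumptions.

```yaml
# Hypothetical mergekit passthrough config: keep 12 of 28 layers by
# stitching together the first 6 and last 6 (actual ranges unknown).
slices:
  - sources:
      - model: Qwen/Qwen2.5-3B-Instruct   # assumed base, not confirmed by the card
        layer_range: [0, 6]
  - sources:
      - model: Qwen/Qwen2.5-3B-Instruct
        layer_range: [22, 28]
merge_method: passthrough
dtype: bfloat16
```

Running `mergekit-yaml config.yml ./out` with such a config produces the pruned base that the later fine-tuning phases then "heal".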
Phase 1a: Fine-tuning the model on a limited amount of data with LoRA rank 16 (21% of parameters trained). This phase is to get the model started on generating something sensible; it is mainly for healing the pruned model and nothing else, and only very low-quality text would be generated.
Phase 1b: Fine-tuning the model on a larger amount of data with LoRA rank 64 (2.75% of parameters trained, due to removing lm_head and embed_tokens from the target_modules) and a high sequence length, on the same dataset (FineTome) as phase 1a. This makes the model much better at all tasks, but it is still not able to generate properly high-quality text; it is better than 1a, though.
Phase 2: Fine-tuning the model on a special dataset with synthetic generations, human text, code generations, math generations, and some QwQ generations for advanced reasoning. Phase 2 enables the model to generate higher-quality text, but it has some issues: we use a low sequence length intended only for knowledge distillation, so the model sometimes falls into loops when trying to generate long text. It is usable.
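Since phase 2 leaves the model prone to repetition loops on long generations, a common inference-time mitigation is a repetition penalty (the `repetition_penalty` argument to `generate()` in Transformers). A minimal stdlib sketch of the CTRL-style rescaling that penalty applies to already-generated tokens:

```python
def apply_repetition_penalty(logits, generated_ids, penalty=1.2):
    """Rescale the logits of tokens that already appear in the output.

    Positive logits are divided by the penalty and negative logits are
    multiplied by it, so previously seen tokens always become less likely.
    """
    out = list(logits)
    for tok in set(generated_ids):
        out[tok] = out[tok] / penalty if out[tok] > 0 else out[tok] * penalty
    return out

# Tokens 0 and 1 were already generated; token 2 is untouched.
print(apply_repetition_penalty([2.0, -1.0, 0.5], [0, 1], penalty=2.0))
# → [1.0, -2.0, 0.5]
```

Passing `repetition_penalty=1.2` (or similar) to `model.generate(...)` applies the same rescaling inside Transformers without any custom code.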
Phase 3: DPO on our Pinkstack/Pinkchat-dpo-19k-en dataset, at a higher sequence length. This phase is highly important: it makes the model safer, better aligned, and better at following prompts. The model performs better and loops less, but is still not great.
Phase 3 was done inside Google Colab; the other phases were run locally.
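At its core, DPO trains on preference pairs by pushing the policy to prefer the chosen response over the rejected one more strongly than a frozen reference model does. A stdlib sketch of the per-pair loss (the same objective TRL's `DPOTrainer` optimizes; the log-probability values below are made-up numbers for illustration):

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Per-pair DPO loss: -log sigmoid(beta * (policy margin - reference margin))."""
    margin = (policy_chosen_logp - policy_rejected_logp) \
             - (ref_chosen_logp - ref_rejected_logp)
    return -math.log(1.0 / (1.0 + math.exp(-beta * margin)))

# Before training, policy == reference, so the loss starts at log(2):
print(dpo_loss(-11.0, -11.5, -11.0, -11.5))  # → log(2) ≈ 0.693

# When the policy prefers the chosen response more than the reference does,
# the margin is positive and the loss falls below log(2):
print(dpo_loss(-10.0, -12.0, -11.0, -11.5))
```

The `beta` parameter controls how hard the policy is pushed away from the reference; in practice the whole phase reduces to handing a `(prompt, chosen, rejected)` dataset to `DPOTrainer`.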
Uploaded model
- Developed by: Pinkstack
- License: apache-2.0
- Finetuned from model: Pinkstack/qwen2.5-3b-1m-sft-phase2-max96-lowloss
This Qwen2 model was trained with Unsloth and Hugging Face's TRL library.