Instructions for using djuna/Qwen2-2B-RHSD with libraries and local apps.

Libraries

How to use djuna/Qwen2-2B-RHSD with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="djuna/Qwen2-2B-RHSD")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("djuna/Qwen2-2B-RHSD")
model = AutoModelForCausalLM.from_pretrained("djuna/Qwen2-2B-RHSD")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

Local Apps

vLLM

How to use djuna/Qwen2-2B-RHSD with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "djuna/Qwen2-2B-RHSD"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "djuna/Qwen2-2B-RHSD",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
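
The same request can also be sent from Python. This is a minimal sketch using only the standard library, assuming the server started above is listening on `localhost:8000` and returns the usual OpenAI-style response shape; the helper names `build_chat_request` and `send_chat` are hypothetical and not part of vLLM.

```python
import json
import urllib.request


def build_chat_request(prompt, model="djuna/Qwen2-2B-RHSD"):
    """Build the JSON body for an OpenAI-compatible chat completion call."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }


def send_chat(prompt, base_url="http://localhost:8000"):
    """POST the request and return the assistant's reply text.

    Assumes a server is already running at base_url (hypothetical default).
    """
    payload = json.dumps(build_chat_request(prompt)).encode("utf-8")
    req = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]


# Usage, once the server is up:
#   print(send_chat("What is the capital of France?"))
```

For the SGLang server, pass `base_url="http://localhost:30000"` instead.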

SGLang

How to use djuna/Qwen2-2B-RHSD with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "djuna/Qwen2-2B-RHSD" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "djuna/Qwen2-2B-RHSD",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "djuna/Qwen2-2B-RHSD" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "djuna/Qwen2-2B-RHSD",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Docker Model Runner
How to use djuna/Qwen2-2B-RHSD with Docker Model Runner:
```
docker model run hf.co/djuna/Qwen2-2B-RHSD
```

Likes to repeat what the user says, but here is the result

by djuna (Owner) - opened Aug 13, 2024

Prompt:
Summarize this comment "Hey, I've got a few tips for merging that might help you with your 4.0 rendition of Stroganoff. Some researched, some I've found on my own.

Set 'normalize=false' so you have more control with the weights. By default, mergekit has 'normalize=true' and it makes it measure the weights of relative to each other to one. So with what you have it set to now, the weights are computed closer to '0.9, 0.3, 0.9' then '0.3, 0.1, 0.3'.

DARE TIES is great, but I've found that TIES anything loses a lot of the characteristics of the added models into the base model. It has its uses and technically produces a better model. However, it's likely why Spellbound Instruct is breaking your merge since it's retaining a lot of the instructs story heavy capabilities. Try setting one of the other models as base with DARE Linear, then place Spellbound at the bottom of the stack. Should produce a more vibrant model.

If you want to retain as much of a model as possible into a merge, passthrough/frankenmerging is the best route, though it takes the longest to find the right recipe. Manually taking a slice out of one model and putting it into another sometimes works better than most merge methods. Just keep in mind layer theory where the first layers primarily control formatting and instruct following, the mids are where most of the detailed information sits, and the ends have the most influence on the style of the output.

I'm by no means an expert, but hopefully some of that helps and I'm open to any questions if you have any."

Output:
Here's a summary of the comment:

Set 'normalize=false' to control more control over the weights.
By default, mergekit has 'normalize=true' and this makes the weights closer to '0.9, 0.3, 0.9' than '0.3, 0.1, 0.3'.
If your Stroganoff merge formula is set to '0.9, 0.3, 0.9', try setting one of the other models as the base and place Spellbound at the bottom.
DARE TIES is a good method, but it might not be as effective as it should be, losing some of the model's characteristics when combined with other models.
Pass-through/Frankenmerging is generally faster and more reliable but it can take longer to find the right recipe.
Layer theory is important in designing models, where the first layers are responsible for formatting and instructions, the middles are where the detailed information is, and the ends have the most influence on the style of the output.
The author does not consider themselves an expert, but they hope these suggestions may help you and is open to any questions you might have.
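
For reference, here is a minimal sketch of what weight normalization does to the numbers in the quoted comment. It assumes `normalize=true` simply divides each weight by the sum of all weights, which is an assumption made for illustration and not checked against mergekit's source here. The helper names are hypothetical.

```python
def normalize_weights(weights):
    """Rescale weights so they sum to 1 (assumed effect of normalize=true)."""
    total = sum(weights)
    if total == 0:
        raise ValueError("weights must not sum to zero")
    return [w / total for w in weights]


def merge_values(values, weights, normalize=True):
    """Weighted sum of one scalar per model, with optional normalization."""
    if normalize:
        weights = normalize_weights(weights)
    return sum(v * w for v, w in zip(values, weights))


raw = [0.3, 0.1, 0.3]

# Normalized weights keep the 3:1:3 proportion but now sum to 1.
print([round(w, 4) for w in normalize_weights(raw)])  # [0.4286, 0.1429, 0.4286]

# With identical inputs of 1.0, normalization keeps the result near 1.0,
# while the raw weights shrink it to roughly their sum (0.7).
print(merge_values([1.0, 1.0, 1.0], raw, normalize=True))
print(merge_values([1.0, 1.0, 1.0], raw, normalize=False))
```

Under this assumption only the ratio between the weights matters when normalization is on, which is why turning it off gives direct control over the absolute values.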
