Do you use Transformers or mergekit to merge models?

by QuangDuy - opened Dec 25, 2024

Discussion

QuangDuy

Dec 25, 2024

If you use mergekit can you tell me how to use it?

qingy2024

Owner Dec 25, 2024

I use mergekit! If you'd like some examples, you can refer to their GitHub, or I could also share some merge templates that I use :)
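
For readers wondering how mergekit is driven in practice, here is a minimal sketch, assuming mergekit is installed and exposes its `mergekit-yaml` command: a YAML config is saved to disk and passed to the command together with an output directory. The helper names (`build_merge_command`, `run_merge`), the file name `merge-config.yml`, and the `merged-model` output folder are hypothetical, and the embedded config is only an illustration of a TIES template, not a recommended recipe.

```python
import shutil
import subprocess
from pathlib import Path

# Illustrative config only: a TIES merge of two Qwen2.5-7B fine-tunes.
CONFIG = """\
models:
  - model: Qwen/Qwen2.5-Math-7B-Instruct
    parameters:
      weight: 1
      density: 1
  - model: Qwen/Qwen2.5-7B-Instruct
    parameters:
      weight: 1
      density: 1
merge_method: ties
base_model: Qwen/Qwen2.5-7B
parameters:
  normalize: true
  int8_mask: true
dtype: bfloat16
"""


def build_merge_command(config_path, output_dir, use_cuda=False):
    """Return the argument list for a mergekit-yaml run (hypothetical helper)."""
    cmd = ["mergekit-yaml", str(config_path), str(output_dir)]
    if use_cuda:
        cmd.append("--cuda")  # optional: run the merge on a GPU
    return cmd


def run_merge(config_text, workdir, use_cuda=False):
    """Write the config into workdir and run mergekit if it is installed."""
    workdir = Path(workdir)
    workdir.mkdir(parents=True, exist_ok=True)
    config_path = workdir / "merge-config.yml"
    config_path.write_text(config_text)
    cmd = build_merge_command(config_path, workdir / "merged-model", use_cuda)
    if shutil.which("mergekit-yaml") is None:
        # mergekit is not on PATH: show the command instead of failing.
        print("mergekit not found; after installing it, run:", " ".join(cmd))
        return cmd
    subprocess.run(cmd, check=True)  # downloads the models and writes the merge
    return cmd
```

Calling `run_merge(CONFIG, "./my-merge", use_cuda=True)` would write the config and start the merge. Nothing runs on import, since a real merge downloads every listed model.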

QuangDuy

Dec 26, 2024

I hope you can share some of the merge templates you use. I'm merging with Qwen 7B.

qingy2024

Owner Dec 26, 2024

Here are a couple that I've used/seen people use:

models:
  - model: qingy2024/NaturalLM3-8B-Instruct-v0.1
    parameters:
      weight: 1
      density: 1
  - model: NousResearch/Hermes-3-Llama-3.1-8B
    parameters:
      weight: 1
      density: 1
merge_method: ties
base_model: meta-llama/Meta-Llama-3.1-8B
parameters:
  weight: 1
  density: 1
  normalize: true
  int8_mask: true
tokenizer_source: qingy2024/NaturalLM3-8B-Instruct-v0.1
dtype: bfloat16

models:
  - model: arcee-ai/Virtuoso-Small
    parameters:
      weight: 1
      density: 1
merge_method: ties
base_model: Qwen/Qwen2.5-14B
parameters:
  weight: 1
  density: 1
  normalize: true
  int8_mask: true
dtype: float16

models:
  - model: Qwen/Qwen2.5-Math-7B-Instruct
    parameters:
      weight: 1
      density: 1
  - model: Qwen/Qwen2.5-7B-Instruct
    parameters:
      weight: 1
      density: 1
merge_method: ties
base_model: Qwen/Qwen2.5-7B
parameters:
  weight: 1
  density: 1
  normalize: true
  int8_mask: true
tokenizer_source: Qwen/Qwen2.5-7B-Instruct
dtype: bfloat16
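
To make the `weight`, `density`, and `normalize` fields in the `ties` templates above more concrete, here is a toy sketch of the TIES idea on plain Python lists. It is a simplified illustration under stated assumptions, not mergekit's actual implementation: each "model" is a short list of numbers, and the function names `trim` and `ties_merge` are hypothetical. The steps are: take each model's difference from the base, keep only the largest-magnitude `density` fraction of it, pick a sign per parameter from the weighted sum, and combine only the contributions that agree with that sign.

```python
def trim(delta, density):
    """Keep the largest-magnitude `density` fraction of entries; zero the rest."""
    keep = int(len(delta) * density)
    if keep >= len(delta):
        return list(delta)
    ranked = sorted(range(len(delta)), key=lambda i: abs(delta[i]), reverse=True)
    top = set(ranked[:keep])
    return [d if i in top else 0.0 for i, d in enumerate(delta)]


def ties_merge(base, models, weights, density=1.0, normalize=True):
    """Toy TIES merge of several fine-tuned 'models' onto a shared base."""
    # Step 1: task vectors (fine-tuned minus base), trimmed by density.
    deltas = [trim([m - b for m, b in zip(model, base)], density) for model in models]
    merged = []
    for i, b in enumerate(base):
        contribs = [w * d[i] for w, d in zip(weights, deltas)]
        # Step 2: elect a sign from the weighted sum of contributions.
        sign = 1 if sum(contribs) >= 0 else -1
        # Step 3: keep only contributions that agree with the elected sign.
        agree = [(w, c) for w, c in zip(weights, contribs) if c * sign > 0]
        value = sum(c for _, c in agree)
        if normalize and agree:
            value /= sum(w for w, _ in agree)  # divide by the agreeing weights
        merged.append(b + value)
    return merged


base = [0.0, 0.0, 0.0]
model_a = [1.0, -2.0, 0.5]
model_b = [3.0, 1.0, 0.5]
result = ties_merge(base, [model_a, model_b], weights=[1, 1])
print(result)  # [2.0, -2.0, 0.5]
```

In the second position the two models disagree in sign, so only the contribution matching the elected (negative) sign survives. With `density: 1`, as in the templates above, the trim step keeps everything.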

models:
  - model: CultriX/SeQwence-14Bv1
    parameters:
      weight: 0.22        # Boosted slightly to improve general task performance
      density: 0.62       # Prioritize generalist adaptability
  - model: allknowingroger/QwenSlerp6-14B
    parameters:
      weight: 0.18
      density: 0.59       # Slight increase to enhance contextual reasoning (tinyHellaswag)
  - model: CultriX/Qwen2.5-14B-Wernickev3
    parameters:
      weight: 0.16
      density: 0.56       # Minor increase to stabilize GPQA and MUSR performance
  - model: CultriX/Qwen2.5-14B-Emergedv3
    parameters:
      weight: 0.15        # Increase weight for domain-specific expertise
      density: 0.55
  - model: VAGOsolutions/SauerkrautLM-v2-14b-DPO
    parameters:
      weight: 0.12
      density: 0.56       # Enhance factual reasoning and IFEval contributions
  - model: CultriX/Qwen2.5-14B-Unity
    parameters:
      weight: 0.10
      density: 0.53
  - model: qingy2019/Qwen2.5-Math-14B-Instruct
    parameters:
      weight: 0.10
      density: 0.51       # Retain focus on MATH and advanced reasoning tasks

merge_method: dare_ties
base_model: CultriX/SeQwence-14Bv1
parameters:
  normalize: true
  int8_mask: true
dtype: bfloat16
tokenizer_source: Qwen/Qwen2.5-14B-Instruct

adaptive_merge_parameters:
  task_weights:
    IFEval: 1.5           # Strengthened for better instruction-following
    BBH: 1.3
    MATH: 1.6             # Emphasize advanced reasoning and problem-solving
    GPQA: 1.4             # Improve factual recall and logical QA tasks
    MUSR: 1.5             # Strengthened multi-step reasoning capabilities
    MMLU-PRO: 1.3         # Slight boost for domain-specific multitask knowledge
  smoothing_factor: 0.19   # Refined for smoother blending of task strengths
gradient_clipping: 0.88    # Tightened slightly for precise parameter contribution
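
A rough sketch of two ideas behind the `dare_ties` template above, again on toy data rather than real weights. First, the DARE step: each entry of a model's difference from the base is kept with probability `density` and the kept entries are rescaled by `1 / density`, so the expected value is unchanged. Second, the per-model weights in that template add up to slightly more than 1, which matters less when `normalize: true` is set, because the combined result is divided by the sum of the contributing weights. The function name `dare_drop` is hypothetical, the share calculation assumes the simplest case where every model contributes to a parameter, and the sketch does not try to model the `adaptive_merge_parameters` or `gradient_clipping` keys.

```python
import random


def dare_drop(delta, density, seed=0):
    """Keep each entry with probability `density`; rescale kept entries by 1/density."""
    rng = random.Random(seed)  # seeded so the toy example is repeatable
    return [d / density if rng.random() < density else 0.0 for d in delta]


# Per-model weights copied from the dare_ties template above.
WEIGHTS = {
    "CultriX/SeQwence-14Bv1": 0.22,
    "allknowingroger/QwenSlerp6-14B": 0.18,
    "CultriX/Qwen2.5-14B-Wernickev3": 0.16,
    "CultriX/Qwen2.5-14B-Emergedv3": 0.15,
    "VAGOsolutions/SauerkrautLM-v2-14b-DPO": 0.12,
    "CultriX/Qwen2.5-14B-Unity": 0.10,
    "qingy2019/Qwen2.5-Math-14B-Instruct": 0.10,
}

total = sum(WEIGHTS.values())  # about 1.03, not exactly 1
# Effective share of each model if all of them contribute and normalize is on.
shares = {name: w / total for name, w in WEIGHTS.items()}

sparse = dare_drop([1.0] * 10, density=0.5)
print(round(total, 2))
print(sparse)  # every entry is either 0.0 (dropped) or 2.0 (kept and rescaled)
```

Lower `density` values drop more of each model's changes, which is why the template pairs them with rescaling rather than simply zeroing entries.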

QuangDuy

Dec 26, 2024

Thank you so much
