tsook committed on Jul 23, 2025

Commit

985ea97

verified ·

1 Parent(s): 4c3b339

Upload folder using huggingface_hub

Files changed (14)

README.md +110 -0
config.json +27 -0
ft-model-00001-of-00004.safetensors +3 -0
ft-model-00002-of-00004.safetensors +3 -0
ft-model-00003-of-00004.safetensors +3 -0
ft-model-00004-of-00004.safetensors +3 -0
generation_config.json +14 -0
merges.txt +0 -0
model.safetensors.index.json +346 -0
original_repo_id.json +3 -0
tokenizer.json +0 -0
tokenizer_config.json +207 -0
torchtune_config.yaml +78 -0
vocab.json +0 -0

README.md ADDED Viewed

	@@ -0,0 +1,110 @@

+---
+license: apache-2.0
+language:
+- en
+pipeline_tag: text-generation
+base_model: Qwen/Qwen2.5-7B
+tags:
+- chat
+library_name: transformers
+---
+## Links for Reference
+- **Homepage:** https://cupid.kixlab.org
+- **Repository:** https://github.com/kixlab/CUPID
+- **Benchmark Dataset:** https://huggingface.co/datasets/kixlab/CUPID
+- **Paper:** https://arxiv.org/abs/XXXX.XXXXX
+- **Point of Contact:** taesoo.kim@kaist.ac.kr
+# TL;DR
+**PrefMatcher-7B** instantiates the *Preference Match* metric proposed in the [CUPID benchmark](https://huggingface.co/datasets/kixlab/CUPID). The model takes a preference description and an evaluation checklist, and assesses whether each checklist item matches or is covered by the preference. The model is trained using [Qwen2.5-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-7B-Instruct) as its base model. PrefMatcher provides a high-fidelity, cost-efficient judge for automatic evaluation on the CUPID benchmark.
+# Model Details
+PrefMatcher-7B was fine-tuned with QLoRA for 1 epoch on 4k data samples (i.e., preference-checklist matches). PrefMatcher achieved a Krippendorff's alpha of 0.748 with human annotations. The data samples were created through the synthesis pipeline for the CUPID benchmark and were then evaluated (matched) by GPT-4o. The model was trained with the [torchtune](https://github.com/pytorch/torchtune) library.
+## Model Description
+- **Model type:** Language model
+- **Language(s) (NLP):** English
+- **License:** Apache 2.0
+# Usage
+Here is example code that uses the model with [vLLM](https://github.com/vllm-project/vllm) to predict the match between a preference and an evaluation checklist.
+```python
+from vllm import LLM, SamplingParams
+model_name = "kixlab/prefmatcher-7b"
+# Load the model
+llm = LLM(
+    model=model_name,
+    load_format="safetensors",
+    kv_cache_dtype="auto",
+    max_model_len=512
+)
+# Prepare example input
+preference = "Analysis should focus exclusively on visible surface defects and their direct correlation to specific printer settings."
+checklist = [
+    "Does the training document provide a detailed framework?",
+    "Does the training document provide a systematic framework?",
+    "Does the framework link external and internal test cube measurements to specific diagnostics?",
+    "Does the framework link external and internal test cube measurements to specific quality improvement actions?",
+]
+checklist_str = "\n".join([f"{i+1}. {item}" for i, item in enumerate(checklist)])
+messages = [{
+    "role": "system",
+    "content": "You are an analytical and insightful assistant that can determine the similarity between **evaluation checklists** and **evaluation criteria**. A criterion describes an aspect of AI outputs that should be evaluated. A checklist contain questions that are used to evaluate more specific or fine-grained aspects of the AI outputs. You will be provided with pairs of checklists and criteria. For each pair, you should determine whether each entry in the checklist is **covered** by the criterion. **Covered** means that the criterion and the checklist entry will evaluate the same or similar aspects of an AI output, even if they use different wording or phrasing."
+},
+{
+    "role": "user",
+    "content": f"#### Criterion\n\n{preference}\n\n#### Checklist\n\n{checklist_str}"
+}]
+sampling_params = SamplingParams(
+    max_tokens=512,
+    temperature=0.7
+)
+# Generate the output
+outputs = llm.chat(messages, sampling_params=sampling_params, use_tqdm=False)
+# Print the output
+print(outputs[0].outputs[0].text)
+```
+# Training Details
+## Training hyperparameters
+The following hyperparameters were used for training:
+- learning_rate: 3e-4
+- train_batch_size: 4
+- gradient_accumulation_steps: 8
+- weight_decay: 1e-2
+- optimizer: AdamW
+- lr_scheduler_type: Cosine with warmup
+- num_warmup_steps: 100
+- lora_rank: 64
+- lora_alpha: 128
+- lora_dropout: 0.0
+- lora_attn_modules: ['q_proj', 'v_proj', 'output_proj']
+- apply_lora_to_mlp: True
+# Citation
+If you find our work useful, please consider citing our paper!
+**BibTeX:**
+```bibtex
+@article{kim2025cupid,
+  title     = {CUPID: Evaluating Personalized and Contextualized Alignment of LLMs from Interactions},
+  author    = {Kim, Tae Soo and Lee, Yoonjoo and Park, Yoonah and Kim, Jiho and Kim, Young-Ho and Kim, Juho},
+  journal   = {arXiv preprint arXiv:XXXX.YYYYY},
+  year      = {2025},
+}
+```
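
The training hyperparameters listed in the README imply a few derived quantities that are easy to work out. The sketch below does so in Python. It assumes a single training device and exactly 4,000 samples (the README says only "4k"); neither is stated in the source, so the step count is an estimate, not a figure reported by the authors.

```python
import math

# Values copied from the README's hyperparameter list.
train_batch_size = 4
gradient_accumulation_steps = 8
num_warmup_steps = 100
lora_rank = 64
lora_alpha = 128

# Assumptions (not stated in the README): one device, exactly 4000 samples.
num_devices = 1
num_samples = 4000

# Samples consumed per optimizer update.
effective_batch = train_batch_size * gradient_accumulation_steps * num_devices
# Optimizer updates in the single training epoch.
optimizer_steps = math.ceil(num_samples / effective_batch)
# Share of those updates spent in learning-rate warmup.
warmup_fraction = num_warmup_steps / optimizer_steps
# Standard LoRA scaling factor applied to the low-rank update.
lora_scale = lora_alpha / lora_rank

print(effective_batch, optimizer_steps, warmup_fraction, lora_scale)
```

Under these assumptions the effective batch is 32 samples and the epoch has about 125 optimizer steps, so most of the run would fall inside the 100-step warmup. A different device count or sample count would change these numbers.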

config.json ADDED Viewed

	@@ -0,0 +1,27 @@

+{
+  "architectures": [
+    "Qwen2ForCausalLM"
+  ],
+  "attention_dropout": 0.0,
+  "bos_token_id": 151643,
+  "eos_token_id": 151645,
+  "hidden_act": "silu",
+  "hidden_size": 3584,
+  "initializer_range": 0.02,
+  "intermediate_size": 18944,
+  "max_position_embeddings": 32768,
+  "max_window_layers": 28,
+  "model_type": "qwen2",
+  "num_attention_heads": 28,
+  "num_hidden_layers": 28,
+  "num_key_value_heads": 4,
+  "rms_norm_eps": 1e-06,
+  "rope_theta": 1000000.0,
+  "sliding_window": 131072,
+  "tie_word_embeddings": false,
+  "torch_dtype": "bfloat16",
+  "transformers_version": "4.43.1",
+  "use_cache": true,
+  "use_sliding_window": false,
+  "vocab_size": 152064
+}
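
As a sanity check, the dimensions in config.json can be turned into a parameter count and compared with the `total_size` recorded in model.safetensors.index.json. The sketch below assumes the usual Qwen2 layer layout (biased q/k/v projections, a bias-free `o_proj`, a three-matrix MLP, two RMSNorm weights per layer) and a final `model.norm.weight`; the last item is an assumption, since it does not appear in the portion of the weight map shown on this page.

```python
# Values copied from config.json.
hidden_size = 3584
intermediate_size = 18944
num_hidden_layers = 28
num_attention_heads = 28
num_key_value_heads = 4
vocab_size = 152064

# Grouped-query attention geometry.
head_dim = hidden_size // num_attention_heads
kv_dim = num_key_value_heads * head_dim
queries_per_kv_head = num_attention_heads // num_key_value_heads

# Per-layer parameter count.
attn = hidden_size * hidden_size + hidden_size   # q_proj weight + bias
attn += 2 * (hidden_size * kv_dim + kv_dim)      # k_proj and v_proj, weight + bias
attn += hidden_size * hidden_size                # o_proj, no bias
mlp = 3 * hidden_size * intermediate_size        # gate_proj, up_proj, down_proj
norms = 2 * hidden_size                          # input and post-attention norms
per_layer = attn + mlp + norms

# Untied input/output embeddings ("tie_word_embeddings": false),
# plus an assumed final norm of size hidden_size.
total_params = (
    num_hidden_layers * per_layer
    + 2 * vocab_size * hidden_size
    + hidden_size
)
total_bytes = total_params * 2                   # bfloat16 stores 2 bytes per value

print(head_dim, queries_per_kv_head, total_params, total_bytes)
```

The byte count comes out to 15231233024, which equals the index file's `total_size`. That is consistent with full bfloat16 weights (about 7.6B parameters) being stored, not a separate LoRA adapter.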

ft-model-00001-of-00004.safetensors ADDED Viewed

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:d9f7b7fb18e902758c911922320744ecb9f3a66bf32d049d33a9f58e376c6cff
+size 3945441440

ft-model-00002-of-00004.safetensors ADDED Viewed

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:4476cd3622132af9012c10584db4c447a63cbb506a4fbda0911a549887221d3d
+size 3864726352

ft-model-00003-of-00004.safetensors ADDED Viewed

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:bed7395e357d123af64650d57f964f8b1ae3c5f3c8e5c532bb3eb54c96131ffc
+size 3864726424

ft-model-00004-of-00004.safetensors ADDED Viewed

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:38855261ac43847dd8ba830848493ebd8c940f07d083bb70bd2f18fc92f264a8
+size 3556377672
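
The four Git LFS pointer files above record each shard's size on disk, which can be summed and compared with the tensor byte total in model.safetensors.index.json. The sketch below does only that arithmetic. Attributing the small gap to per-file safetensors headers is an assumption based on how the format lays out files, not something this commit states.

```python
# Sizes copied from the four Git LFS pointer files.
shard_sizes = [3945441440, 3864726352, 3864726424, 3556377672]

# "total_size" from the metadata block of model.safetensors.index.json.
index_total_size = 15231233024

# Bytes to download for all shards.
on_disk = sum(shard_sizes)
# Bytes not accounted for by tensor data (presumably file headers).
overhead = on_disk - index_total_size

print(on_disk, overhead)
```

The shards add up to 15231271888 bytes, which is 38864 bytes more than the tensor data alone, so the checkpoint needs a little over 15.2 GB of disk space.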

generation_config.json ADDED Viewed

	@@ -0,0 +1,14 @@

+{
+  "bos_token_id": 151643,
+  "pad_token_id": 151643,
+  "do_sample": true,
+  "eos_token_id": [
+    151645,
+    151643
+  ],
+  "repetition_penalty": 1.05,
+  "temperature": 0.7,
+  "top_p": 0.8,
+  "top_k": 20,
+  "transformers_version": "4.37.0"
+}
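
The sampling defaults in generation_config.json (`temperature`, `top_k`, `top_p`) jointly restrict which tokens can be drawn at each step. The sketch below is a simplified, pure-Python illustration of that filtering for a single step. It is not the Transformers or vLLM implementation; it ignores `repetition_penalty`, and `filter_probs` is a hypothetical helper name.

```python
import math


def filter_probs(logits, temperature=0.7, top_k=20, top_p=0.8):
    """Return {token_id: probability} after temperature, top-k, and top-p."""
    # Temperature scaling followed by a numerically stable softmax.
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Top-k: keep the k most probable token ids and renormalize.
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    ranked = ranked[:top_k]
    topk_mass = sum(probs[i] for i in ranked)

    # Top-p: keep the shortest prefix whose cumulative mass reaches top_p.
    kept, mass = [], 0.0
    for i in ranked:
        kept.append(i)
        mass += probs[i] / topk_mass
        if mass >= top_p:
            break

    # Renormalize over the surviving tokens.
    norm = sum(probs[i] for i in kept)
    return {i: probs[i] / norm for i in kept}


# Toy four-token vocabulary: only the two most likely tokens survive.
example = filter_probs([2.0, 1.0, 0.0, -1.0])
print(sorted(example))
```

For a judge model, callers who need repeatable verdicts may prefer to override these defaults with greedy decoding. That is a design choice left to the user, not a setting taken from this commit.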

merges.txt ADDED Viewed

The diff for this file is too large to render. See raw diff

model.safetensors.index.json ADDED Viewed

	@@ -0,0 +1,346 @@

+{
+  "metadata": {
+    "total_size": 15231233024
+  },
+  "weight_map": {
+    "model.embed_tokens.weight": "ft-model-00001-of-00004.safetensors",
+    "model.layers.0.input_layernorm.weight": "ft-model-00001-of-00004.safetensors",
+    "model.layers.0.mlp.down_proj.weight": "ft-model-00001-of-00004.safetensors",
+    "model.layers.0.mlp.gate_proj.weight": "ft-model-00001-of-00004.safetensors",
+    "model.layers.0.mlp.up_proj.weight": "ft-model-00001-of-00004.safetensors",
+    "model.layers.0.post_attention_layernorm.weight": "ft-model-00001-of-00004.safetensors",
+    "model.layers.0.self_attn.k_proj.bias": "ft-model-00001-of-00004.safetensors",
+    "model.layers.0.self_attn.k_proj.weight": "ft-model-00001-of-00004.safetensors",
+    "model.layers.0.self_attn.o_proj.weight": "ft-model-00001-of-00004.safetensors",
+    "model.layers.0.self_attn.q_proj.bias": "ft-model-00001-of-00004.safetensors",
+    "model.layers.0.self_attn.q_proj.weight": "ft-model-00001-of-00004.safetensors",
+    "model.layers.0.self_attn.v_proj.bias": "ft-model-00001-of-00004.safetensors",
+    "model.layers.0.self_attn.v_proj.weight": "ft-model-00001-of-00004.safetensors",
+    "model.layers.1.input_layernorm.weight": "ft-model-00001-of-00004.safetensors",
+    "model.layers.1.mlp.down_proj.weight": "ft-model-00001-of-00004.safetensors",
+    "model.layers.1.mlp.gate_proj.weight": "ft-model-00001-of-00004.safetensors",
+    "model.layers.1.mlp.up_proj.weight": "ft-model-00001-of-00004.safetensors",
+    "model.layers.1.post_attention_layernorm.weight": "ft-model-00001-of-00004.safetensors",
+    "model.layers.1.self_attn.k_proj.bias": "ft-model-00001-of-00004.safetensors",
+    "model.layers.1.self_attn.k_proj.weight": "ft-model-00001-of-00004.safetensors",
+    "model.layers.1.self_attn.o_proj.weight": "ft-model-00001-of-00004.safetensors",
+    "model.layers.1.self_attn.q_proj.bias": "ft-model-00001-of-00004.safetensors",
+    "model.layers.1.self_attn.q_proj.weight": "ft-model-00001-of-00004.safetensors",
+    "model.layers.1.self_attn.v_proj.bias": "ft-model-00001-of-00004.safetensors",
+    "model.layers.1.self_attn.v_proj.weight": "ft-model-00001-of-00004.safetensors",
+    "model.layers.2.input_layernorm.weight": "ft-model-00001-of-00004.safetensors",
+    "model.layers.2.mlp.down_proj.weight": "ft-model-00001-of-00004.safetensors",
+    "model.layers.2.mlp.gate_proj.weight": "ft-model-00001-of-00004.safetensors",
+    "model.layers.2.mlp.up_proj.weight": "ft-model-00001-of-00004.safetensors",
+    "model.layers.2.post_attention_layernorm.weight": "ft-model-00001-of-00004.safetensors",
+    "model.layers.2.self_attn.k_proj.bias": "ft-model-00001-of-00004.safetensors",
+    "model.layers.2.self_attn.k_proj.weight": "ft-model-00001-of-00004.safetensors",
+    "model.layers.2.self_attn.o_proj.weight": "ft-model-00001-of-00004.safetensors",
+    "model.layers.2.self_attn.q_proj.bias": "ft-model-00001-of-00004.safetensors",
+    "model.layers.2.self_attn.q_proj.weight": "ft-model-00001-of-00004.safetensors",
+    "model.layers.2.self_attn.v_proj.bias": "ft-model-00001-of-00004.safetensors",
+    "model.layers.2.self_attn.v_proj.weight": "ft-model-00001-of-00004.safetensors",
+    "model.layers.3.input_layernorm.weight": "ft-model-00001-of-00004.safetensors",
+    "model.layers.3.mlp.down_proj.weight": "ft-model-00001-of-00004.safetensors",
+    "model.layers.3.mlp.gate_proj.weight": "ft-model-00001-of-00004.safetensors",
+    "model.layers.3.mlp.up_proj.weight": "ft-model-00001-of-00004.safetensors",
+    "model.layers.3.post_attention_layernorm.weight": "ft-model-00001-of-00004.safetensors",
+    "model.layers.3.self_attn.k_proj.bias": "ft-model-00001-of-00004.safetensors",
+    "model.layers.3.self_attn.k_proj.weight": "ft-model-00001-of-00004.safetensors",
+    "model.layers.3.self_attn.o_proj.weight": "ft-model-00001-of-00004.safetensors",
+    "model.layers.3.self_attn.q_proj.bias": "ft-model-00001-of-00004.safetensors",
+    "model.layers.3.self_attn.q_proj.weight": "ft-model-00001-of-00004.safetensors",
+    "model.layers.3.self_attn.v_proj.bias": "ft-model-00001-of-00004.safetensors",
+    "model.layers.3.self_attn.v_proj.weight": "ft-model-00001-of-00004.safetensors",
+    "model.layers.4.input_layernorm.weight": "ft-model-00001-of-00004.safetensors",
+    "model.layers.4.mlp.down_proj.weight": "ft-model-00001-of-00004.safetensors",
+    "model.layers.4.mlp.gate_proj.weight": "ft-model-00001-of-00004.safetensors",
+    "model.layers.4.mlp.up_proj.weight": "ft-model-00001-of-00004.safetensors",
+    "model.layers.4.post_attention_layernorm.weight": "ft-model-00001-of-00004.safetensors",
+    "model.layers.4.self_attn.k_proj.bias": "ft-model-00001-of-00004.safetensors",
+    "model.layers.4.self_attn.k_proj.weight": "ft-model-00001-of-00004.safetensors",
+    "model.layers.4.self_attn.o_proj.weight": "ft-model-00001-of-00004.safetensors",
+    "model.layers.4.self_attn.q_proj.bias": "ft-model-00001-of-00004.safetensors",
+    "model.layers.4.self_attn.q_proj.weight": "ft-model-00001-of-00004.safetensors",
+    "model.layers.4.self_attn.v_proj.bias": "ft-model-00001-of-00004.safetensors",
+    "model.layers.4.self_attn.v_proj.weight": "ft-model-00001-of-00004.safetensors",
+    "model.layers.5.input_layernorm.weight": "ft-model-00001-of-00004.safetensors",
+    "model.layers.5.mlp.down_proj.weight": "ft-model-00001-of-00004.safetensors",
+    "model.layers.5.mlp.gate_proj.weight": "ft-model-00001-of-00004.safetensors",
+    "model.layers.5.mlp.up_proj.weight": "ft-model-00001-of-00004.safetensors",
+    "model.layers.5.post_attention_layernorm.weight": "ft-model-00001-of-00004.safetensors",
+    "model.layers.5.self_attn.k_proj.bias": "ft-model-00001-of-00004.safetensors",
+    "model.layers.5.self_attn.k_proj.weight": "ft-model-00001-of-00004.safetensors",
+    "model.layers.5.self_attn.o_proj.weight": "ft-model-00001-of-00004.safetensors",
+    "model.layers.5.self_attn.q_proj.bias": "ft-model-00001-of-00004.safetensors",
+    "model.layers.5.self_attn.q_proj.weight": "ft-model-00001-of-00004.safetensors",
+    "model.layers.5.self_attn.v_proj.bias": "ft-model-00001-of-00004.safetensors",
+    "model.layers.5.self_attn.v_proj.weight": "ft-model-00001-of-00004.safetensors",
+    "model.layers.6.input_layernorm.weight": "ft-model-00001-of-00004.safetensors",
+    "model.layers.6.post_attention_layernorm.weight": "ft-model-00001-of-00004.safetensors",
+    "model.layers.6.self_attn.k_proj.bias": "ft-model-00001-of-00004.safetensors",
+    "model.layers.6.self_attn.k_proj.weight": "ft-model-00001-of-00004.safetensors",
+    "model.layers.6.self_attn.o_proj.weight": "ft-model-00001-of-00004.safetensors",
+    "model.layers.6.self_attn.q_proj.bias": "ft-model-00001-of-00004.safetensors",
+    "model.layers.6.self_attn.q_proj.weight": "ft-model-00001-of-00004.safetensors",
+    "model.layers.6.self_attn.v_proj.bias": "ft-model-00001-of-00004.safetensors",
+    "model.layers.6.self_attn.v_proj.weight": "ft-model-00001-of-00004.safetensors",
+    "model.layers.10.input_layernorm.weight": "ft-model-00002-of-00004.safetensors",
+    "model.layers.10.mlp.down_proj.weight": "ft-model-00002-of-00004.safetensors",
+    "model.layers.10.mlp.gate_proj.weight": "ft-model-00002-of-00004.safetensors",
+    "model.layers.10.mlp.up_proj.weight": "ft-model-00002-of-00004.safetensors",
+    "model.layers.10.post_attention_layernorm.weight": "ft-model-00002-of-00004.safetensors",
+    "model.layers.10.self_attn.k_proj.bias": "ft-model-00002-of-00004.safetensors",
+    "model.layers.10.self_attn.k_proj.weight": "ft-model-00002-of-00004.safetensors",
+    "model.layers.10.self_attn.o_proj.weight": "ft-model-00002-of-00004.safetensors",
+    "model.layers.10.self_attn.q_proj.bias": "ft-model-00002-of-00004.safetensors",
+    "model.layers.10.self_attn.q_proj.weight": "ft-model-00002-of-00004.safetensors",
+    "model.layers.10.self_attn.v_proj.bias": "ft-model-00002-of-00004.safetensors",
+    "model.layers.10.self_attn.v_proj.weight": "ft-model-00002-of-00004.safetensors",
+    "model.layers.11.input_layernorm.weight": "ft-model-00002-of-00004.safetensors",
+    "model.layers.11.mlp.down_proj.weight": "ft-model-00002-of-00004.safetensors",
+    "model.layers.11.mlp.gate_proj.weight": "ft-model-00002-of-00004.safetensors",
+    "model.layers.11.mlp.up_proj.weight": "ft-model-00002-of-00004.safetensors",
+    "model.layers.11.post_attention_layernorm.weight": "ft-model-00002-of-00004.safetensors",
+    "model.layers.11.self_attn.k_proj.bias": "ft-model-00002-of-00004.safetensors",
+    "model.layers.11.self_attn.k_proj.weight": "ft-model-00002-of-00004.safetensors",
+    "model.layers.11.self_attn.o_proj.weight": "ft-model-00002-of-00004.safetensors",
+    "model.layers.11.self_attn.q_proj.bias": "ft-model-00002-of-00004.safetensors",
+    "model.layers.11.self_attn.q_proj.weight": "ft-model-00002-of-00004.safetensors",
+    "model.layers.11.self_attn.v_proj.bias": "ft-model-00002-of-00004.safetensors",
+    "model.layers.11.self_attn.v_proj.weight": "ft-model-00002-of-00004.safetensors",
+    "model.layers.12.input_layernorm.weight": "ft-model-00002-of-00004.safetensors",
+    "model.layers.12.mlp.down_proj.weight": "ft-model-00002-of-00004.safetensors",
+    "model.layers.12.mlp.gate_proj.weight": "ft-model-00002-of-00004.safetensors",
+    "model.layers.12.mlp.up_proj.weight": "ft-model-00002-of-00004.safetensors",
+    "model.layers.12.post_attention_layernorm.weight": "ft-model-00002-of-00004.safetensors",
+    "model.layers.12.self_attn.k_proj.bias": "ft-model-00002-of-00004.safetensors",
+    "model.layers.12.self_attn.k_proj.weight": "ft-model-00002-of-00004.safetensors",
+    "model.layers.12.self_attn.o_proj.weight": "ft-model-00002-of-00004.safetensors",
+    "model.layers.12.self_attn.q_proj.bias": "ft-model-00002-of-00004.safetensors",
+    "model.layers.12.self_attn.q_proj.weight": "ft-model-00002-of-00004.safetensors",
+    "model.layers.12.self_attn.v_proj.bias": "ft-model-00002-of-00004.safetensors",
+    "model.layers.12.self_attn.v_proj.weight": "ft-model-00002-of-00004.safetensors",
+    "model.layers.13.input_layernorm.weight": "ft-model-00002-of-00004.safetensors",
+    "model.layers.13.mlp.down_proj.weight": "ft-model-00002-of-00004.safetensors",
+    "model.layers.13.mlp.gate_proj.weight": "ft-model-00002-of-00004.safetensors",
+    "model.layers.13.mlp.up_proj.weight": "ft-model-00002-of-00004.safetensors",
+    "model.layers.13.post_attention_layernorm.weight": "ft-model-00002-of-00004.safetensors",
+    "model.layers.13.self_attn.k_proj.bias": "ft-model-00002-of-00004.safetensors",
+    "model.layers.13.self_attn.k_proj.weight": "ft-model-00002-of-00004.safetensors",
+    "model.layers.13.self_attn.o_proj.weight": "ft-model-00002-of-00004.safetensors",
+    "model.layers.13.self_attn.q_proj.bias": "ft-model-00002-of-00004.safetensors",
+    "model.layers.13.self_attn.q_proj.weight": "ft-model-00002-of-00004.safetensors",
+    "model.layers.13.self_attn.v_proj.bias": "ft-model-00002-of-00004.safetensors",
+    "model.layers.13.self_attn.v_proj.weight": "ft-model-00002-of-00004.safetensors",
+    "model.layers.14.input_layernorm.weight": "ft-model-00002-of-00004.safetensors",
+    "model.layers.14.mlp.up_proj.weight": "ft-model-00002-of-00004.safetensors",
+    "model.layers.14.post_attention_layernorm.weight": "ft-model-00002-of-00004.safetensors",
+    "model.layers.14.self_attn.k_proj.bias": "ft-model-00002-of-00004.safetensors",
+    "model.layers.14.self_attn.k_proj.weight": "ft-model-00002-of-00004.safetensors",
+    "model.layers.14.self_attn.o_proj.weight": "ft-model-00002-of-00004.safetensors",
+    "model.layers.14.self_attn.q_proj.bias": "ft-model-00002-of-00004.safetensors",
+    "model.layers.14.self_attn.q_proj.weight": "ft-model-00002-of-00004.safetensors",
+    "model.layers.14.self_attn.v_proj.bias": "ft-model-00002-of-00004.safetensors",
+    "model.layers.14.self_attn.v_proj.weight": "ft-model-00002-of-00004.safetensors",
+    "model.layers.6.mlp.down_proj.weight": "ft-model-00002-of-00004.safetensors",
+    "model.layers.6.mlp.gate_proj.weight": "ft-model-00002-of-00004.safetensors",
+    "model.layers.6.mlp.up_proj.weight": "ft-model-00002-of-00004.safetensors",
+    "model.layers.7.input_layernorm.weight": "ft-model-00002-of-00004.safetensors",
+    "model.layers.7.mlp.down_proj.weight": "ft-model-00002-of-00004.safetensors",
+    "model.layers.7.mlp.gate_proj.weight": "ft-model-00002-of-00004.safetensors",
+    "model.layers.7.mlp.up_proj.weight": "ft-model-00002-of-00004.safetensors",
+    "model.layers.7.post_attention_layernorm.weight": "ft-model-00002-of-00004.safetensors",
+    "model.layers.7.self_attn.k_proj.bias": "ft-model-00002-of-00004.safetensors",
+    "model.layers.7.self_attn.k_proj.weight": "ft-model-00002-of-00004.safetensors",
+    "model.layers.7.self_attn.o_proj.weight": "ft-model-00002-of-00004.safetensors",
+    "model.layers.7.self_attn.q_proj.bias": "ft-model-00002-of-00004.safetensors",
+    "model.layers.7.self_attn.q_proj.weight": "ft-model-00002-of-00004.safetensors",
+    "model.layers.7.self_attn.v_proj.bias": "ft-model-00002-of-00004.safetensors",
+    "model.layers.7.self_attn.v_proj.weight": "ft-model-00002-of-00004.safetensors",
+    "model.layers.8.input_layernorm.weight": "ft-model-00002-of-00004.safetensors",
+    "model.layers.8.mlp.down_proj.weight": "ft-model-00002-of-00004.safetensors",
+    "model.layers.8.mlp.gate_proj.weight": "ft-model-00002-of-00004.safetensors",
+    "model.layers.8.mlp.up_proj.weight": "ft-model-00002-of-00004.safetensors",
+    "model.layers.8.post_attention_layernorm.weight": "ft-model-00002-of-00004.safetensors",
+    "model.layers.8.self_attn.k_proj.bias": "ft-model-00002-of-00004.safetensors",
+    "model.layers.8.self_attn.k_proj.weight": "ft-model-00002-of-00004.safetensors",
+    "model.layers.8.self_attn.o_proj.weight": "ft-model-00002-of-00004.safetensors",
+    "model.layers.8.self_attn.q_proj.bias": "ft-model-00002-of-00004.safetensors",
+    "model.layers.8.self_attn.q_proj.weight": "ft-model-00002-of-00004.safetensors",
+    "model.layers.8.self_attn.v_proj.bias": "ft-model-00002-of-00004.safetensors",
+    "model.layers.8.self_attn.v_proj.weight": "ft-model-00002-of-00004.safetensors",
+    "model.layers.9.input_layernorm.weight": "ft-model-00002-of-00004.safetensors",
+    "model.layers.9.mlp.down_proj.weight": "ft-model-00002-of-00004.safetensors",
+    "model.layers.9.mlp.gate_proj.weight": "ft-model-00002-of-00004.safetensors",
+    "model.layers.9.mlp.up_proj.weight": "ft-model-00002-of-00004.safetensors",
+    "model.layers.9.post_attention_layernorm.weight": "ft-model-00002-of-00004.safetensors",
+    "model.layers.9.self_attn.k_proj.bias": "ft-model-00002-of-00004.safetensors",
+    "model.layers.9.self_attn.k_proj.weight": "ft-model-00002-of-00004.safetensors",
+    "model.layers.9.self_attn.o_proj.weight": "ft-model-00002-of-00004.safetensors",
+    "model.layers.9.self_attn.q_proj.bias": "ft-model-00002-of-00004.safetensors",
+    "model.layers.9.self_attn.q_proj.weight": "ft-model-00002-of-00004.safetensors",
+    "model.layers.9.self_attn.v_proj.bias": "ft-model-00002-of-00004.safetensors",
+    "model.layers.9.self_attn.v_proj.weight": "ft-model-00002-of-00004.safetensors",
+    "model.layers.14.mlp.down_proj.weight": "ft-model-00003-of-00004.safetensors",
+    "model.layers.14.mlp.gate_proj.weight": "ft-model-00003-of-00004.safetensors",
+    "model.layers.15.input_layernorm.weight": "ft-model-00003-of-00004.safetensors",
+    "model.layers.15.mlp.down_proj.weight": "ft-model-00003-of-00004.safetensors",
+    "model.layers.15.mlp.gate_proj.weight": "ft-model-00003-of-00004.safetensors",
+    "model.layers.15.mlp.up_proj.weight": "ft-model-00003-of-00004.safetensors",
+    "model.layers.15.post_attention_layernorm.weight": "ft-model-00003-of-00004.safetensors",
+    "model.layers.15.self_attn.k_proj.bias": "ft-model-00003-of-00004.safetensors",
+    "model.layers.15.self_attn.k_proj.weight": "ft-model-00003-of-00004.safetensors",
+    "model.layers.15.self_attn.o_proj.weight": "ft-model-00003-of-00004.safetensors",
+    "model.layers.15.self_attn.q_proj.bias": "ft-model-00003-of-00004.safetensors",
+    "model.layers.15.self_attn.q_proj.weight": "ft-model-00003-of-00004.safetensors",
+    "model.layers.15.self_attn.v_proj.bias": "ft-model-00003-of-00004.safetensors",
+    "model.layers.15.self_attn.v_proj.weight": "ft-model-00003-of-00004.safetensors",
+    "model.layers.16.input_layernorm.weight": "ft-model-00003-of-00004.safetensors",
+    "model.layers.16.mlp.down_proj.weight": "ft-model-00003-of-00004.safetensors",
+    "model.layers.16.mlp.gate_proj.weight": "ft-model-00003-of-00004.safetensors",
+    "model.layers.16.mlp.up_proj.weight": "ft-model-00003-of-00004.safetensors",
+    "model.layers.16.post_attention_layernorm.weight": "ft-model-00003-of-00004.safetensors",
+    "model.layers.16.self_attn.k_proj.bias": "ft-model-00003-of-00004.safetensors",
+    "model.layers.16.self_attn.k_proj.weight": "ft-model-00003-of-00004.safetensors",
+    "model.layers.16.self_attn.o_proj.weight": "ft-model-00003-of-00004.safetensors",
+    "model.layers.16.self_attn.q_proj.bias": "ft-model-00003-of-00004.safetensors",
+    "model.layers.16.self_attn.q_proj.weight": "ft-model-00003-of-00004.safetensors",
+    "model.layers.16.self_attn.v_proj.bias": "ft-model-00003-of-00004.safetensors",
+    "model.layers.16.self_attn.v_proj.weight": "ft-model-00003-of-00004.safetensors",
+    "model.layers.17.input_layernorm.weight": "ft-model-00003-of-00004.safetensors",
+    "model.layers.17.mlp.down_proj.weight": "ft-model-00003-of-00004.safetensors",
+    "model.layers.17.mlp.gate_proj.weight": "ft-model-00003-of-00004.safetensors",
+    "model.layers.17.mlp.up_proj.weight": "ft-model-00003-of-00004.safetensors",
+    "model.layers.17.post_attention_layernorm.weight": "ft-model-00003-of-00004.safetensors",
+    "model.layers.17.self_attn.k_proj.bias": "ft-model-00003-of-00004.safetensors",
+    "model.layers.17.self_attn.k_proj.weight": "ft-model-00003-of-00004.safetensors",
+    "model.layers.17.self_attn.o_proj.weight": "ft-model-00003-of-00004.safetensors",
+    "model.layers.17.self_attn.q_proj.bias": "ft-model-00003-of-00004.safetensors",
+    "model.layers.17.self_attn.q_proj.weight": "ft-model-00003-of-00004.safetensors",
+    "model.layers.17.self_attn.v_proj.bias": "ft-model-00003-of-00004.safetensors",
+    "model.layers.17.self_attn.v_proj.weight": "ft-model-00003-of-00004.safetensors",
+    "model.layers.18.input_layernorm.weight": "ft-model-00003-of-00004.safetensors",
+    "model.layers.18.mlp.down_proj.weight": "ft-model-00003-of-00004.safetensors",
+    "model.layers.18.mlp.gate_proj.weight": "ft-model-00003-of-00004.safetensors",
+    "model.layers.18.mlp.up_proj.weight": "ft-model-00003-of-00004.safetensors",
+    "model.layers.18.post_attention_layernorm.weight": "ft-model-00003-of-00004.safetensors",
+    "model.layers.18.self_attn.k_proj.bias": "ft-model-00003-of-00004.safetensors",
+    "model.layers.18.self_attn.k_proj.weight": "ft-model-00003-of-00004.safetensors",
+    "model.layers.18.self_attn.o_proj.weight": "ft-model-00003-of-00004.safetensors",
+    "model.layers.18.self_attn.q_proj.bias": "ft-model-00003-of-00004.safetensors",
+    "model.layers.18.self_attn.q_proj.weight": "ft-model-00003-of-00004.safetensors",
+    "model.layers.18.self_attn.v_proj.bias": "ft-model-00003-of-00004.safetensors",
+    "model.layers.18.self_attn.v_proj.weight": "ft-model-00003-of-00004.safetensors",
+    "model.layers.19.input_layernorm.weight": "ft-model-00003-of-00004.safetensors",
+    "model.layers.19.mlp.down_proj.weight": "ft-model-00003-of-00004.safetensors",
+    "model.layers.19.mlp.gate_proj.weight": "ft-model-00003-of-00004.safetensors",
+    "model.layers.19.mlp.up_proj.weight": "ft-model-00003-of-00004.safetensors",
+    "model.layers.19.post_attention_layernorm.weight": "ft-model-00003-of-00004.safetensors",
+    "model.layers.19.self_attn.k_proj.bias": "ft-model-00003-of-00004.safetensors",
+    "model.layers.19.self_attn.k_proj.weight": "ft-model-00003-of-00004.safetensors",
+    "model.layers.19.self_attn.o_proj.weight": "ft-model-00003-of-00004.safetensors",
+    "model.layers.19.self_attn.q_proj.bias": "ft-model-00003-of-00004.safetensors",
+    "model.layers.19.self_attn.q_proj.weight": "ft-model-00003-of-00004.safetensors",
+    "model.layers.19.self_attn.v_proj.bias": "ft-model-00003-of-00004.safetensors",
+    "model.layers.19.self_attn.v_proj.weight": "ft-model-00003-of-00004.safetensors",
+    "model.layers.20.input_layernorm.weight": "ft-model-00003-of-00004.safetensors",
+    "model.layers.20.mlp.down_proj.weight": "ft-model-00003-of-00004.safetensors",
+    "model.layers.20.mlp.gate_proj.weight": "ft-model-00003-of-00004.safetensors",
+    "model.layers.20.mlp.up_proj.weight": "ft-model-00003-of-00004.safetensors",
+    "model.layers.20.post_attention_layernorm.weight": "ft-model-00003-of-00004.safetensors",
+    "model.layers.20.self_attn.k_proj.bias": "ft-model-00003-of-00004.safetensors",
+    "model.layers.20.self_attn.k_proj.weight": "ft-model-00003-of-00004.safetensors",
+    "model.layers.20.self_attn.o_proj.weight": "ft-model-00003-of-00004.safetensors",
+    "model.layers.20.self_attn.q_proj.bias": "ft-model-00003-of-00004.safetensors",
+    "model.layers.20.self_attn.q_proj.weight": "ft-model-00003-of-00004.safetensors",
+    "model.layers.20.self_attn.v_proj.bias": "ft-model-00003-of-00004.safetensors",
+    "model.layers.20.self_attn.v_proj.weight": "ft-model-00003-of-00004.safetensors",
+    "model.layers.21.input_layernorm.weight": "ft-model-00003-of-00004.safetensors",
+    "model.layers.21.mlp.down_proj.weight": "ft-model-00003-of-00004.safetensors",
+    "model.layers.21.mlp.gate_proj.weight": "ft-model-00003-of-00004.safetensors",
+    "model.layers.21.mlp.up_proj.weight": "ft-model-00003-of-00004.safetensors",
+    "model.layers.21.post_attention_layernorm.weight": "ft-model-00003-of-00004.safetensors",
+    "model.layers.21.self_attn.k_proj.bias": "ft-model-00003-of-00004.safetensors",
+    "model.layers.21.self_attn.k_proj.weight": "ft-model-00003-of-00004.safetensors",
+    "model.layers.21.self_attn.o_proj.weight": "ft-model-00003-of-00004.safetensors",
+    "model.layers.21.self_attn.q_proj.bias": "ft-model-00003-of-00004.safetensors",
+    "model.layers.21.self_attn.q_proj.weight": "ft-model-00003-of-00004.safetensors",
+    "model.layers.21.self_attn.v_proj.bias": "ft-model-00003-of-00004.safetensors",
+    "model.layers.21.self_attn.v_proj.weight": "ft-model-00003-of-00004.safetensors",
+    "model.layers.22.input_layernorm.weight": "ft-model-00003-of-00004.safetensors",
+    "model.layers.22.mlp.gate_proj.weight": "ft-model-00003-of-00004.safetensors",
+    "model.layers.22.mlp.up_proj.weight": "ft-model-00003-of-00004.safetensors",
+    "model.layers.22.post_attention_layernorm.weight": "ft-model-00003-of-00004.safetensors",
+    "model.layers.22.self_attn.k_proj.bias": "ft-model-00003-of-00004.safetensors",
+    "model.layers.22.self_attn.k_proj.weight": "ft-model-00003-of-00004.safetensors",
+    "model.layers.22.self_attn.o_proj.weight": "ft-model-00003-of-00004.safetensors",
+    "model.layers.22.self_attn.q_proj.bias": "ft-model-00003-of-00004.safetensors",
+    "model.layers.22.self_attn.q_proj.weight": "ft-model-00003-of-00004.safetensors",
+    "model.layers.22.self_attn.v_proj.bias": "ft-model-00003-of-00004.safetensors",
+    "model.layers.22.self_attn.v_proj.weight": "ft-model-00003-of-00004.safetensors",
+    "lm_head.weight": "ft-model-00004-of-00004.safetensors",
+    "model.layers.22.mlp.down_proj.weight": "ft-model-00004-of-00004.safetensors",
+    "model.layers.23.input_layernorm.weight": "ft-model-00004-of-00004.safetensors",
+    "model.layers.23.mlp.down_proj.weight": "ft-model-00004-of-00004.safetensors",
+    "model.layers.23.mlp.gate_proj.weight": "ft-model-00004-of-00004.safetensors",
+    "model.layers.23.mlp.up_proj.weight": "ft-model-00004-of-00004.safetensors",
+    "model.layers.23.post_attention_layernorm.weight": "ft-model-00004-of-00004.safetensors",
+    "model.layers.23.self_attn.k_proj.bias": "ft-model-00004-of-00004.safetensors",
+    "model.layers.23.self_attn.k_proj.weight": "ft-model-00004-of-00004.safetensors",
+    "model.layers.23.self_attn.o_proj.weight": "ft-model-00004-of-00004.safetensors",
+    "model.layers.23.self_attn.q_proj.bias": "ft-model-00004-of-00004.safetensors",
+    "model.layers.23.self_attn.q_proj.weight": "ft-model-00004-of-00004.safetensors",
+    "model.layers.23.self_attn.v_proj.bias": "ft-model-00004-of-00004.safetensors",
+    "model.layers.23.self_attn.v_proj.weight": "ft-model-00004-of-00004.safetensors",
+    "model.layers.24.input_layernorm.weight": "ft-model-00004-of-00004.safetensors",
+    "model.layers.24.mlp.down_proj.weight": "ft-model-00004-of-00004.safetensors",
+    "model.layers.24.mlp.gate_proj.weight": "ft-model-00004-of-00004.safetensors",
+    "model.layers.24.mlp.up_proj.weight": "ft-model-00004-of-00004.safetensors",
+    "model.layers.24.post_attention_layernorm.weight": "ft-model-00004-of-00004.safetensors",
+    "model.layers.24.self_attn.k_proj.bias": "ft-model-00004-of-00004.safetensors",
+    "model.layers.24.self_attn.k_proj.weight": "ft-model-00004-of-00004.safetensors",
+    "model.layers.24.self_attn.o_proj.weight": "ft-model-00004-of-00004.safetensors",
+    "model.layers.24.self_attn.q_proj.bias": "ft-model-00004-of-00004.safetensors",
+    "model.layers.24.self_attn.q_proj.weight": "ft-model-00004-of-00004.safetensors",
+    "model.layers.24.self_attn.v_proj.bias": "ft-model-00004-of-00004.safetensors",
+    "model.layers.24.self_attn.v_proj.weight": "ft-model-00004-of-00004.safetensors",
+    "model.layers.25.input_layernorm.weight": "ft-model-00004-of-00004.safetensors",
+    "model.layers.25.mlp.down_proj.weight": "ft-model-00004-of-00004.safetensors",
+    "model.layers.25.mlp.gate_proj.weight": "ft-model-00004-of-00004.safetensors",
+    "model.layers.25.mlp.up_proj.weight": "ft-model-00004-of-00004.safetensors",
+    "model.layers.25.post_attention_layernorm.weight": "ft-model-00004-of-00004.safetensors",
+    "model.layers.25.self_attn.k_proj.bias": "ft-model-00004-of-00004.safetensors",
+    "model.layers.25.self_attn.k_proj.weight": "ft-model-00004-of-00004.safetensors",
+    "model.layers.25.self_attn.o_proj.weight": "ft-model-00004-of-00004.safetensors",
+    "model.layers.25.self_attn.q_proj.bias": "ft-model-00004-of-00004.safetensors",
+    "model.layers.25.self_attn.q_proj.weight": "ft-model-00004-of-00004.safetensors",
+    "model.layers.25.self_attn.v_proj.bias": "ft-model-00004-of-00004.safetensors",
+    "model.layers.25.self_attn.v_proj.weight": "ft-model-00004-of-00004.safetensors",
+    "model.layers.26.input_layernorm.weight": "ft-model-00004-of-00004.safetensors",
+    "model.layers.26.mlp.down_proj.weight": "ft-model-00004-of-00004.safetensors",
+    "model.layers.26.mlp.gate_proj.weight": "ft-model-00004-of-00004.safetensors",
+    "model.layers.26.mlp.up_proj.weight": "ft-model-00004-of-00004.safetensors",
+    "model.layers.26.post_attention_layernorm.weight": "ft-model-00004-of-00004.safetensors",
+    "model.layers.26.self_attn.k_proj.bias": "ft-model-00004-of-00004.safetensors",
+    "model.layers.26.self_attn.k_proj.weight": "ft-model-00004-of-00004.safetensors",
+    "model.layers.26.self_attn.o_proj.weight": "ft-model-00004-of-00004.safetensors",
+    "model.layers.26.self_attn.q_proj.bias": "ft-model-00004-of-00004.safetensors",
+    "model.layers.26.self_attn.q_proj.weight": "ft-model-00004-of-00004.safetensors",
+    "model.layers.26.self_attn.v_proj.bias": "ft-model-00004-of-00004.safetensors",
+    "model.layers.26.self_attn.v_proj.weight": "ft-model-00004-of-00004.safetensors",
+    "model.layers.27.input_layernorm.weight": "ft-model-00004-of-00004.safetensors",
+    "model.layers.27.mlp.down_proj.weight": "ft-model-00004-of-00004.safetensors",
+    "model.layers.27.mlp.gate_proj.weight": "ft-model-00004-of-00004.safetensors",
+    "model.layers.27.mlp.up_proj.weight": "ft-model-00004-of-00004.safetensors",
+    "model.layers.27.post_attention_layernorm.weight": "ft-model-00004-of-00004.safetensors",
+    "model.layers.27.self_attn.k_proj.bias": "ft-model-00004-of-00004.safetensors",
+    "model.layers.27.self_attn.k_proj.weight": "ft-model-00004-of-00004.safetensors",
+    "model.layers.27.self_attn.o_proj.weight": "ft-model-00004-of-00004.safetensors",
+    "model.layers.27.self_attn.q_proj.bias": "ft-model-00004-of-00004.safetensors",
+    "model.layers.27.self_attn.q_proj.weight": "ft-model-00004-of-00004.safetensors",
+    "model.layers.27.self_attn.v_proj.bias": "ft-model-00004-of-00004.safetensors",
+    "model.layers.27.self_attn.v_proj.weight": "ft-model-00004-of-00004.safetensors",
+    "model.norm.weight": "ft-model-00004-of-00004.safetensors"
+  }
+}
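
In the weight map above, layer 22 straddles a shard boundary: most of its tensors sit in shard 3, while `model.layers.22.mlp.down_proj.weight` is in shard 4 alongside `lm_head.weight`. A minimal sketch of how such an index could be inverted to list tensors per shard follows. The helper name `tensors_by_shard` and the inline excerpt are illustrative assumptions, not part of this repository.

```python
import json
from collections import defaultdict

# A small excerpt of the weight_map shown in the diff (not the full index).
INDEX_EXCERPT = """
{
  "weight_map": {
    "model.layers.22.mlp.up_proj.weight": "ft-model-00003-of-00004.safetensors",
    "lm_head.weight": "ft-model-00004-of-00004.safetensors",
    "model.layers.22.mlp.down_proj.weight": "ft-model-00004-of-00004.safetensors",
    "model.norm.weight": "ft-model-00004-of-00004.safetensors"
  }
}
"""


def tensors_by_shard(index: dict) -> dict:
    """Invert weight_map: shard filename -> sorted list of tensor names."""
    shards = defaultdict(list)
    for tensor_name, shard_file in index["weight_map"].items():
        shards[shard_file].append(tensor_name)
    return {shard: sorted(names) for shard, names in shards.items()}


index = json.loads(INDEX_EXCERPT)
grouped = tensors_by_shard(index)
for shard, names in sorted(grouped.items()):
    # Print each shard file with the number of tensors it holds.
    print(shard, len(names))
```

Running the same function on the full index file would show which shard must be opened to read any given tensor.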

original_repo_id.json ADDED

	@@ -0,0 +1,3 @@

+{
+    "repo_id": "Qwen/Qwen2.5-7B-Instruct"
+}

tokenizer.json ADDED

The diff for this file is too large to render. See raw diff

tokenizer_config.json ADDED

	@@ -0,0 +1,207 @@

+{
+  "add_bos_token": false,
+  "add_prefix_space": false,
+  "added_tokens_decoder": {
+    "151643": {
+      "content": "<|endoftext|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151644": {
+      "content": "<|im_start|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151645": {
+      "content": "<|im_end|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151646": {
+      "content": "<|object_ref_start|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151647": {
+      "content": "<|object_ref_end|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151648": {
+      "content": "<|box_start|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151649": {
+      "content": "<|box_end|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151650": {
+      "content": "<|quad_start|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151651": {
+      "content": "<|quad_end|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151652": {
+      "content": "<|vision_start|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151653": {
+      "content": "<|vision_end|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151654": {
+      "content": "<|vision_pad|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151655": {
+      "content": "<|image_pad|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151656": {
+      "content": "<|video_pad|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151657": {
+      "content": "<tool_call>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "151658": {
+      "content": "</tool_call>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "151659": {
+      "content": "<|fim_prefix|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "151660": {
+      "content": "<|fim_middle|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "151661": {
+      "content": "<|fim_suffix|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "151662": {
+      "content": "<|fim_pad|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "151663": {
+      "content": "<|repo_name|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "151664": {
+      "content": "<|file_sep|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    }
+  },
+  "additional_special_tokens": [
+    "<|im_start|>",
+    "<|im_end|>",
+    "<|object_ref_start|>",
+    "<|object_ref_end|>",
+    "<|box_start|>",
+    "<|box_end|>",
+    "<|quad_start|>",
+    "<|quad_end|>",
+    "<|vision_start|>",
+    "<|vision_end|>",
+    "<|vision_pad|>",
+    "<|image_pad|>",
+    "<|video_pad|>"
+  ],
+  "bos_token": null,
+  "chat_template": "{%- if tools %}\n    {{- '<|im_start|>system\\n' }}\n    {%- if messages[0]['role'] == 'system' %}\n        {{- messages[0]['content'] }}\n    {%- else %}\n        {{- 'You are Qwen, created by Alibaba Cloud. You are a helpful assistant.' }}\n    {%- endif %}\n    {{- \"\\n\\n# Tools\\n\\nYou may call one or more functions to assist with the user query.\\n\\nYou are provided with function signatures within <tools></tools> XML tags:\\n<tools>\" }}\n    {%- for tool in tools %}\n        {{- \"\\n\" }}\n        {{- tool | tojson }}\n    {%- endfor %}\n    {{- \"\\n</tools>\\n\\nFor each function call, return a json object with function name and arguments within <tool_call></tool_call> XML tags:\\n<tool_call>\\n{\\\"name\\\": <function-name>, \\\"arguments\\\": <args-json-object>}\\n</tool_call><|im_end|>\\n\" }}\n{%- else %}\n    {%- if messages[0]['role'] == 'system' %}\n        {{- '<|im_start|>system\\n' + messages[0]['content'] + '<|im_end|>\\n' }}\n    {%- else %}\n        {{- '<|im_start|>system\\nYou are Qwen, created by Alibaba Cloud. You are a helpful assistant.<|im_end|>\\n' }}\n    {%- endif %}\n{%- endif %}\n{%- for message in messages %}\n    {%- if (message.role == \"user\") or (message.role == \"system\" and not loop.first) or (message.role == \"assistant\" and not message.tool_calls) %}\n        {{- '<|im_start|>' + message.role + '\\n' + message.content + '<|im_end|>' + '\\n' }}\n    {%- elif message.role == \"assistant\" %}\n        {{- '<|im_start|>' + message.role }}\n        {%- if message.content %}\n            {{- '\\n' + message.content }}\n        {%- endif %}\n        {%- for tool_call in message.tool_calls %}\n            {%- if tool_call.function is defined %}\n                {%- set tool_call = tool_call.function %}\n            {%- endif %}\n            {{- '\\n<tool_call>\\n{\"name\": \"' }}\n            {{- tool_call.name }}\n            {{- '\", \"arguments\": ' }}\n            {{- tool_call.arguments | tojson }}\n            {{- '}\\n</tool_call>' }}\n        {%- endfor %}\n        {{- '<|im_end|>\\n' }}\n    {%- elif message.role == \"tool\" %}\n        {%- if (loop.index0 == 0) or (messages[loop.index0 - 1].role != \"tool\") %}\n            {{- '<|im_start|>user' }}\n        {%- endif %}\n        {{- '\\n<tool_response>\\n' }}\n        {{- message.content }}\n        {{- '\\n</tool_response>' }}\n        {%- if loop.last or (messages[loop.index0 + 1].role != \"tool\") %}\n            {{- '<|im_end|>\\n' }}\n        {%- endif %}\n    {%- endif %}\n{%- endfor %}\n{%- if add_generation_prompt %}\n    {{- '<|im_start|>assistant\\n' }}\n{%- endif %}\n",
+  "clean_up_tokenization_spaces": false,
+  "eos_token": "<|im_end|>",
+  "errors": "replace",
+  "model_max_length": 131072,
+  "pad_token": "<|endoftext|>",
+  "split_special_tokens": false,
+  "tokenizer_class": "Qwen2Tokenizer",
+  "unk_token": null
+}
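
The `chat_template` above wraps each turn in `<|im_start|>` / `<|im_end|>` markers and injects a default system prompt when the first message is not a system message. The sketch below is a simplified, hand-written approximation of the no-tools branch only, assuming plain user, system, and assistant messages without tool calls. The function name `render_chatml` is hypothetical, and in practice `tokenizer.apply_chat_template` should be used to run the actual Jinja template.

```python
# Default system prompt, copied from the chat_template in the diff.
DEFAULT_SYSTEM = "You are Qwen, created by Alibaba Cloud. You are a helpful assistant."


def render_chatml(messages, add_generation_prompt=True):
    """Approximate the no-tools branch of the template for plain text turns."""
    parts = []
    if messages and messages[0]["role"] == "system":
        # A leading system message replaces the default system prompt.
        parts.append("<|im_start|>system\n" + messages[0]["content"] + "<|im_end|>\n")
        rest = messages[1:]
    else:
        parts.append("<|im_start|>system\n" + DEFAULT_SYSTEM + "<|im_end|>\n")
        rest = messages
    for message in rest:
        # Tool calls and tool responses are intentionally not handled here.
        parts.append(
            "<|im_start|>" + message["role"] + "\n" + message["content"] + "<|im_end|>\n"
        )
    if add_generation_prompt:
        # Open an assistant turn so the model continues from here.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


prompt = render_chatml([{"role": "user", "content": "Who are you?"}])
print(prompt)
```

Because `eos_token` is set to `<|im_end|>`, generation stops naturally at the end of the assistant turn.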

torchtune_config.yaml ADDED

	@@ -0,0 +1,78 @@

+output_dir: /hdd/taesoo/pllm-finetuning/models/Qwen2.5-7B-Mixed
+model:
+  _component_: torchtune.models.qwen2_5.lora_qwen2_5_7b_instruct
+  lora_attn_modules:
+  - q_proj
+  - v_proj
+  - output_proj
+  apply_lora_to_mlp: true
+  apply_lora_to_output: false
+  lora_rank: 64
+  lora_alpha: 128
+  lora_dropout: 0.0
+  quantize_base: true
+tokenizer:
+  _component_: torchtune.models.qwen2_5.qwen2_5_tokenizer
+  path: /hdd/taesoo/pllm-finetuning/models/Qwen2.5-7B-Instruct/vocab.json
+  merges_file: /hdd/taesoo/pllm-finetuning/models/Qwen2.5-7B-Instruct/merges.txt
+  max_seq_len: null
+checkpointer:
+  _component_: torchtune.training.FullModelHFCheckpointer
+  checkpoint_dir: /hdd/taesoo/pllm-finetuning/models/Qwen2.5-7B-Instruct
+  checkpoint_files:
+  - model-00001-of-00004.safetensors
+  - model-00002-of-00004.safetensors
+  - model-00003-of-00004.safetensors
+  - model-00004-of-00004.safetensors
+  recipe_checkpoint: null
+  output_dir: ${output_dir}
+  model_type: QWEN2
+resume_from_checkpoint: false
+dataset:
+  _component_: torchtune.datasets.chat_dataset
+  source: json
+  conversation_column: messages
+  conversation_style: openai
+  data_files: data/train_dataset_mixed.json
+  split: train
+seed: null
+shuffle: true
+batch_size: 4
+optimizer:
+  _component_: torch.optim.AdamW
+  fused: true
+  weight_decay: 0.01
+  lr: 0.0003
+lr_scheduler:
+  _component_: torchtune.training.lr_schedulers.get_cosine_schedule_with_warmup
+  num_warmup_steps: 100
+loss:
+  _component_: torchtune.modules.loss.CEWithChunkedOutputLoss
+epochs: 1
+max_steps_per_epoch: null
+gradient_accumulation_steps: 8
+clip_grad_norm: null
+compile: false
+metric_logger:
+  _component_: torchtune.training.metric_logging.WandBLogger
+  project: torchtune
+log_every_n_steps: 1
+log_peak_memory_stats: false
+device: cuda
+dtype: bf16
+enable_activation_checkpointing: true
+enable_activation_offloading: false
+profiler:
+  _component_: torchtune.training.setup_torch_profiler
+  enabled: false
+  output_dir: ${output_dir}/profiling_outputs
+  cpu: true
+  cuda: true
+  profile_memory: false
+  with_stack: false
+  record_shapes: true
+  with_flops: false
+  wait_steps: 5
+  warmup_steps: 5
+  active_steps: 2
+  num_cycles: 1
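
Two derived quantities are implied by the config above but not stated in it: the number of sequences per optimizer step, and the LoRA scaling factor. The sketch below computes both under two assumptions: a single device (the device count is not recorded in the config, so `num_devices` is a hypothetical parameter) and the conventional LoRA scaling of `lora_alpha / lora_rank`.

```python
# Values copied from torchtune_config.yaml in the diff.
CONFIG = {
    "batch_size": 4,
    "gradient_accumulation_steps": 8,
    "lora_rank": 64,
    "lora_alpha": 128,
}


def effective_batch_size(cfg, num_devices=1):
    """Sequences contributing to one optimizer step (num_devices is assumed)."""
    return cfg["batch_size"] * cfg["gradient_accumulation_steps"] * num_devices


def lora_scale(cfg):
    """Conventional LoRA scaling factor applied to the low-rank update."""
    return cfg["lora_alpha"] / cfg["lora_rank"]


print("effective batch size:", effective_batch_size(CONFIG))
print("LoRA scale:", lora_scale(CONFIG))
```

With `quantize_base: true`, the run corresponds to a QLoRA-style setup in which only the adapter weights are trained on top of a quantized base model.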

vocab.json ADDED

The diff for this file is too large to render. See raw diff