	---
	license: apache-2.0
	language:
	- en
	pipeline_tag: text-generation
	base_model: Qwen/Qwen2.5-7B
	tags:
	- chat
	library_name: transformers
	---

	## Links for Reference

	- Homepage: https://cupid.kixlab.org
	- Repository: https://github.com/kixlab/CUPID
	- Benchmark Dataset: https://huggingface.co/datasets/kixlab/CUPID
	- Paper: https://arxiv.org/abs/2508.01674
	- Point of Contact: taesoo.kim@kaist.ac.kr

	# TL;DR

	PrefMatcher-7B instantiates the Preference Match metric proposed in the [CUPID benchmark](https://huggingface.co/datasets/kixlab/CUPID) (COLM 2025). Given a preference description and an evaluation checklist, the model assesses whether each checklist item matches or is covered by the preference. It was trained with [Qwen2.5-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-7B-Instruct) as its base model. PrefMatcher provides a high-fidelity, cost-efficient judge for automatic evaluation on the CUPID benchmark.

	# Model Details

	PrefMatcher-7B was fine-tuned with QLoRA for 1 epoch on 4k data samples (i.e., preference-checklist matches). The samples were created through the synthesis pipeline for the CUPID benchmark and then evaluated (matched) by GPT-4o. PrefMatcher achieved a Krippendorff's alpha of 0.748 against human annotations. The model was trained with the [torchtune](https://github.com/pytorch/torchtune) library.
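For readers unfamiliar with the agreement statistic quoted above, the following is a minimal sketch of Krippendorff's alpha for the simplest case: two raters, nominal labels, no missing values. This card does not state which variant or tooling produced the 0.748 figure, so the function name `krippendorff_alpha_nominal` and the toy labels are illustrative assumptions, not the authors' evaluation code or data.

```python
from collections import Counter


def krippendorff_alpha_nominal(rater_a, rater_b):
    """Nominal Krippendorff's alpha for two raters with no missing labels."""
    if len(rater_a) != len(rater_b):
        raise ValueError("both raters must label the same units")
    n = 2 * len(rater_a)  # total number of pairable labels
    # Each unit contributes the ordered pairs (a, b) and (b, a).
    observed = 2 * sum(a != b for a, b in zip(rater_a, rater_b))
    counts = Counter(rater_a) + Counter(rater_b)
    # Sum of n_c * n_k over all ordered pairs of distinct labels.
    expected = n * n - sum(c * c for c in counts.values())
    if expected == 0:
        raise ValueError("alpha is undefined when only one label is used")
    return 1.0 - (n - 1) * observed / expected


# Toy labels (1 = "covered", 0 = "not covered"); NOT taken from CUPID.
model_labels = [0, 1, 0, 0, 0, 0, 0, 0, 1, 0]
human_labels = [1, 1, 1, 0, 0, 1, 0, 0, 0, 0]
alpha = krippendorff_alpha_nominal(model_labels, human_labels)
print(f"alpha = {alpha:.3f}")
```

Alpha is 1 for perfect agreement and close to 0 when agreement is no better than chance, which is why it is a stricter measure than raw percent agreement.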

	## Model Description

	- Model type: Language model
	- Language(s) (NLP): English
	- License: Apache 2.0

	# Usage
	The following example uses [vLLM](https://github.com/vllm-project/vllm) to predict the match between a preference and an evaluation checklist.
	```python
	from vllm import LLM, SamplingParams

	model_name = "kixlab/prefmatcher-7b"

	# Load the model
	llm = LLM(
	    model=model_name,
	    load_format="safetensors",
	    kv_cache_dtype="auto",
	    max_model_len=512,
	)

	# Prepare example input
	preference = "Analysis should focus exclusively on visible surface defects and their direct correlation to specific printer settings."
	checklist = [
	    "Does the training document provide a detailed framework?",
	    "Does the training document provide a systematic framework?",
	    "Does the framework link external and internal test cube measurements to specific diagnostics?",
	    "Does the framework link external and internal test cube measurements to specific quality improvement actions?",
	]

	checklist_str = "\n".join([f"{i+1}. {item}" for i, item in enumerate(checklist)])
	messages = [
	    {
	        "role": "system",
	        "content": "You are an analytical and insightful assistant that can determine the similarity between evaluation checklists and evaluation criteria. A criterion describes an aspect of AI outputs that should be evaluated. A checklist contain questions that are used to evaluate more specific or fine-grained aspects of the AI outputs. You will be provided with pairs of checklists and criteria. For each pair, you should determine whether each entry in the checklist is covered by the criterion. Covered means that the criterion and the checklist entry will evaluate the same or similar aspects of an AI output, even if they use different wording or phrasing.",
	    },
	    {
	        "role": "user",
	        "content": f"#### Criterion\n\n{preference}\n\n#### Checklist\n\n{checklist_str}",
	    },
	]

	sampling_params = SamplingParams(
	    max_tokens=512,
	    temperature=0.7,
	)

	# Generate the output
	outputs = llm.chat(messages, sampling_params=sampling_params, use_tqdm=False)

	# Print the output
	print(outputs[0].outputs[0].text)
	```
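When scoring many preference-checklist pairs, it can help to factor the prompt construction from the example above into a small helper. The sketch below does that; `build_messages` and the second example pair are hypothetical names and data introduced here for illustration, while the system prompt and the user-message layout are copied verbatim from the example above. Recent vLLM releases accept a list of conversations in `LLM.chat`, but that depends on the installed version, so the batched call is left as a comment.

```python
# System prompt copied verbatim from the usage example above.
SYSTEM_PROMPT = (
    "You are an analytical and insightful assistant that can determine the "
    "similarity between evaluation checklists and evaluation criteria. "
    "A criterion describes an aspect of AI outputs that should be evaluated. "
    "A checklist contain questions that are used to evaluate more specific or "
    "fine-grained aspects of the AI outputs. "
    "You will be provided with pairs of checklists and criteria. "
    "For each pair, you should determine whether each entry in the checklist "
    "is covered by the criterion. "
    "Covered means that the criterion and the checklist entry will evaluate "
    "the same or similar aspects of an AI output, even if they use different "
    "wording or phrasing."
)


def build_messages(preference, checklist):
    """Return the chat messages for one preference-checklist pair."""
    checklist_str = "\n".join(f"{i + 1}. {item}" for i, item in enumerate(checklist))
    user_content = f"#### Criterion\n\n{preference}\n\n#### Checklist\n\n{checklist_str}"
    return [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": user_content},
    ]


# Illustrative pairs; the second one is made up for this sketch.
pairs = [
    (
        "Analysis should focus exclusively on visible surface defects and "
        "their direct correlation to specific printer settings.",
        [
            "Does the training document provide a detailed framework?",
            "Does the training document provide a systematic framework?",
        ],
    ),
    (
        "Explanations should avoid jargon.",
        ["Does the response use plain language?"],
    ),
]
conversations = [build_messages(preference, checklist) for preference, checklist in pairs]

# With `llm` and `sampling_params` created as in the example above
# (assumes a vLLM version whose LLM.chat accepts a list of conversations):
# outputs = llm.chat(conversations, sampling_params=sampling_params, use_tqdm=False)
# for output in outputs:
#     print(output.outputs[0].text)
```

Keeping the prompt text in one place avoids accidental drift from the wording shown above, which matters for a fine-tuned judge that may be sensitive to its prompt format.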

	# Training Details
	## Training hyperparameters

	The following hyperparameters were used for training:
	- learning_rate: 3e-4
	- train_batch_size: 4
	- gradient_accumulation_steps: 8
	- weight_decay: 1e-2
	- optimizer: AdamW
	- lr_scheduler_type: Cosine with warmup
	- num_warmup_steps: 100
	- lora_rank: 64
	- lora_alpha: 128
	- lora_dropout: 0.0
	- lora_attn_modules: ['q_proj', 'v_proj', 'output_proj']
	- apply_lora_to_mlp: True
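
A few quantities follow from the list above by simple arithmetic. The sketch below works them out under assumptions that this card does not confirm: a single training device, exactly 4000 samples for the "4k" figure, and the common schedule shape of linear warmup followed by a half-cosine decay to zero. The actual run may differ on any of these.

```python
import math

# Values copied from the hyperparameter list above.
learning_rate = 3e-4
train_batch_size = 4
gradient_accumulation_steps = 8
num_warmup_steps = 100
lora_rank = 64
lora_alpha = 128

# Assumptions, not stated in this card.
num_devices = 1
num_samples = 4000

# Samples consumed per optimizer step.
effective_batch_size = train_batch_size * gradient_accumulation_steps * num_devices
# LoRA scales its low-rank update by alpha / rank.
lora_scaling = lora_alpha / lora_rank
# Optimizer steps in one epoch.
total_steps = math.ceil(num_samples / effective_batch_size)


def lr_at(step):
    """Learning rate at a step: linear warmup, then half-cosine decay."""
    if step < num_warmup_steps:
        return learning_rate * step / max(1, num_warmup_steps)
    progress = (step - num_warmup_steps) / max(1, total_steps - num_warmup_steps)
    return learning_rate * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


print(effective_batch_size, lora_scaling, total_steps)
```

Under these assumptions the effective batch size is 32, the LoRA scaling factor is 2.0, and one epoch is 125 optimizer steps, so the 100 warmup steps would cover most of the run.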

	# Citation

	If you find our work useful, please consider citing our paper!

	BibTeX:

	```bibtex
	@article{kim2025cupid,
	  title   = {CUPID: Evaluating Personalized and Contextualized Alignment of LLMs from Interactions},
	  author  = {Kim, Tae Soo and Lee, Yoonjoo and Park, Yoonah and Kim, Jiho and Kim, Young-Ho and Kim, Juho},
	  journal = {arXiv preprint arXiv:2508.01674},
	  year    = {2025},
	}
	```