| | --- |
| | license: apache-2.0 |
| | --- |
| | |
| | # Reasons to Reject? Aligning Language Models with Judgments
| | This repository contains the CUT model from our work, |
| |
|
| | [Reasons to Reject? Aligning Language Models with Judgments](https://arxiv.org/abs/2312.14591). |
| |
|
| | Weiwen Xu, Deng Cai, Zhisong Zhang, Wai Lam, Shuming Shi |
| |
|
| | The source code is available at https://github.com/wwxu21/CUT
| | **** |
| |
|
| | ## 1. Model description |
| |
|
| | This model achieves 91.36 on [AlpacaEval](https://tatsu-lab.github.io/alpaca_eval). |
| | It was obtained after 4 iterations of online alignment. Each iteration consists of the following three steps:
| |
|
| | - Step 1: Collect instructions, and obtain the responses from the target model. |
| |
|
| | - Step 2: Annotate judgments for the responses. |
| |
|
| | - Step 3: Apply CUT to fine-tune the target model with the above instruction-response-judgment triplets. |
| |
|
| | Specifically, we use [LLaMA2-chat-13b](https://huggingface.co/meta-llama/Llama-2-13b-chat-hf) as the base LLM. In each iteration, we sample 1000 instructions from [Stanford Alpaca](https://github.com/tatsu-lab/stanford_alpaca). |
| | To avoid over-fitting, we ensure that the sampled instructions differ across iterations.
| | We then ask GPT-4 to annotate judgments for the responses.
| |
|
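The sampling constraint described above (a fresh set of 1000 instructions in each of the 4 iterations) can be sketched as follows. This is only an illustration, not the authors' code: the function name `split_into_iterations`, the seed, and the placeholder instruction pool are assumptions made for the example.

```python
import random


def split_into_iterations(instructions, num_iterations=4, per_iteration=1000, seed=0):
    """Return disjoint instruction batches, one per alignment iteration.

    Assumes the entries of `instructions` are unique.
    """
    needed = num_iterations * per_iteration
    if len(instructions) < needed:
        raise ValueError(f"need at least {needed} instructions, got {len(instructions)}")
    # Sampling without replacement guarantees that no instruction is reused.
    chosen = random.Random(seed).sample(instructions, needed)
    return [
        chosen[i * per_iteration:(i + 1) * per_iteration]
        for i in range(num_iterations)
    ]


# Hypothetical placeholder pool standing in for the Stanford Alpaca instructions.
pool = [f"instruction {i}" for i in range(52000)]
batches = split_into_iterations(pool)
```

Drawing all batches from a single sample without replacement is one simple way to keep the iterations from overlapping; the actual selection procedure used for this model may differ.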
| |
|
| | ## 2. Template |
| | The CUT model is a chat model that uses the following [Alpaca template](https://github.com/tatsu-lab/stanford_alpaca):
| | ``` |
| | Below is an instruction that describes a task. Write a response that appropriately completes the request. |
| | |
| | ### Instruction: |
| | {instruction} |
| | |
| | ### Response: |
| | ``` |
| |
|
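As a convenience, the template above can be filled in programmatically. The helper below is a minimal sketch; the names `ALPACA_TEMPLATE` and `build_prompt` are illustrative and not part of the released code.

```python
# The Alpaca template from this section, with a placeholder for the instruction.
ALPACA_TEMPLATE = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n"
    "### Response:"
)


def build_prompt(instruction: str) -> str:
    """Insert a user instruction into the Alpaca template."""
    return ALPACA_TEMPLATE.format(instruction=instruction.strip())


prompt = build_prompt("How did US states get their names?")
print(prompt)
```

The resulting string can be passed directly to the tokenizer when running the model with Transformers.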
| | ## 3. How to use
| |
|
| | #### 3.1. Hugging Face
| |
|
| | ```python |
| | import torch
| | from transformers import AutoModelForCausalLM, AutoTokenizer
| | 
| | # Create the model and all tensors on the GPU by default.
| | torch.set_default_device("cuda")
| | 
| | # Load the weights in half precision to reduce memory usage.
| | model = AutoModelForCausalLM.from_pretrained("xww033/cut-13b", torch_dtype=torch.float16)
| | tokenizer = AutoTokenizer.from_pretrained("xww033/cut-13b")
| | 
| | # The prompt must follow the Alpaca template described in Section 2.
| | inputs = tokenizer('''Below is an instruction that describes a task. Write a response that appropriately completes the request.
| | 
| | ### Instruction:
| | How did US states get their names?
| | 
| | ### Response:''', return_tensors="pt", return_attention_mask=False)
| | 
| | # max_length bounds the total number of tokens (prompt plus generated text).
| | outputs = model.generate(**inputs, max_length=2048)
| | text = tokenizer.batch_decode(outputs)[0]
| | print(text)
| | ``` |
| |
|
| | #### 3.2. FastChat |
| |
|
| | [FastChat](https://github.com/lm-sys/FastChat) provides a simple setup for trying our aligned model. After downloading the [CUT model](https://huggingface.co/xww033/cut-13b) from Hugging Face, clone the FastChat repository:
| |
|
| | ```bash |
| | git clone https://github.com/lm-sys/FastChat.git |
| | cd FastChat |
| | ``` |
| |
|
| | Install the required packages:
| |
|
| | ```bash |
| | pip install --upgrade pip # enable PEP 660 support |
| | pip install -e . |
| | ``` |
| |
|
| | Finally, start the command-line chat interface:
| |
|
| | ```bash |
| | python -m fastchat.serve.cli --model-path xww033/cut-13b --conv-template alpaca |
| | ``` |
| |
|
| |
|
| | ## 4. BibTeX entry and citation info
| | ```bibtex
| | @article{xu2023reasons, |
| | title={Reasons to Reject? Aligning Language Models with Judgments}, |
| | author={Xu, Weiwen and Cai, Deng and Zhang, Zhisong and Lam, Wai and Shi, Shuming}, |
| | journal={arXiv preprint arXiv:2312.14591}, |
| | year={2023} |
| | } |
| | ``` |
| |
|