llm-course-hw2
VK LLM Course. Assignment #2. Fine-tuning an LLM with DPO and PPO
This model is HuggingFaceTB/SmolLM-135M-Instruct fine-tuned with PPO on the HumanLLMs/Human-Like-DPO-Dataset dataset, using a reward model trained as part of this same assignment. The model learns to give more human-like, friendly responses, guided by the positive and negative examples in the provided dataset.
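The card does not say how the reward model was trained or how its scores enter PPO. A common setup, assumed here and not confirmed for this model, trains the reward model with a pairwise (Bradley-Terry) loss on chosen/rejected pairs and, during PPO, subtracts a KL penalty toward the reference model from the reward score. Below is a minimal sketch of both quantities on plain floats; the function names and the `beta` default are hypothetical.

```python
import math


def pairwise_reward_loss(r_chosen: float, r_rejected: float) -> float:
    """Bradley-Terry loss: -log sigmoid(r_chosen - r_rejected).

    Equivalent to softplus(-(r_chosen - r_rejected)), written in a
    numerically stable form.
    """
    z = -(r_chosen - r_rejected)
    # softplus(z) = max(z, 0) + log(1 + exp(-|z|)) avoids overflow for large |z|
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def kl_penalized_reward(reward: float,
                        policy_logprobs: list[float],
                        ref_logprobs: list[float],
                        beta: float = 0.05) -> float:
    """Sequence-level PPO reward: reward-model score minus beta * KL estimate.

    The KL term is the usual sample estimate: the sum over generated tokens of
    log pi_policy(token) - log pi_ref(token). beta = 0.05 is a hypothetical value.
    """
    kl = sum(p - q for p, q in zip(policy_logprobs, ref_logprobs))
    return reward - beta * kl
```

The loss is log 2 when both responses get the same score and shrinks toward 0 as the chosen response's score pulls ahead; the KL term keeps the tuned policy from drifting too far from SmolLM-135M-Instruct while chasing reward.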
The dataset was converted to the chat-template format. Training was run on a Google Colab T4 GPU. Some parameters and characteristics:
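For the chat-template conversion mentioned above, here is a minimal sketch. It assumes each dataset row has plain-string `prompt`, `chosen` and `rejected` fields and turns them into role/content message lists. The helper name `to_chat_format` and the example row are made up for illustration.

```python
def to_chat_format(example: dict) -> dict:
    """Convert one preference row into conversational (chat-template) form.

    Assumes the row has plain-string "prompt", "chosen" and "rejected" fields.
    """
    return {
        "prompt": [{"role": "user", "content": example["prompt"]}],
        "chosen": [{"role": "assistant", "content": example["chosen"]}],
        "rejected": [{"role": "assistant", "content": example["rejected"]}],
    }


# Made-up example row, for illustration only
row = {
    "prompt": "What's your morning routine like?",
    "chosen": "Oh, I love mornings! Coffee first, always.",
    "rejected": "I am an AI and do not have a morning routine.",
}
converted = to_chat_format(row)
print(converted["prompt"])  # [{'role': 'user', 'content': "What's your morning routine like?"}]
```

With the `datasets` library, the same function could be applied to every row via `dataset.map(to_chat_format)`. For PPO generation, the prompt messages would then be rendered with `tokenizer.apply_chat_template(..., add_generation_prompt=True)`, as in the usage code.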
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# PPO-tuned model and the original instruct model it started from (reference)
ppo_model = AutoModelForCausalLM.from_pretrained("pbedrin/llm-course-hw2-ppo")
ref_model = AutoModelForCausalLM.from_pretrained("HuggingFaceTB/SmolLM-135M-Instruct")
tokenizer = AutoTokenizer.from_pretrained("pbedrin/llm-course-hw2-ppo", padding_side="left")

# Render the prompt with the chat template and open the assistant turn
messages = [{"role": "user", "content": "What's your morning routine like?"}]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
model_inputs = tokenizer([text], return_tensors="pt")

# Greedy decoding with the tuned model (input_ids + attention_mask)
generated_ids = ppo_model.generate(**model_inputs, max_new_tokens=128, do_sample=False)
response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]

# Same prompt and settings with the reference model, for comparison
init_generated_ids = ref_model.generate(**model_inputs, max_new_tokens=128, do_sample=False)
init_response = tokenizer.batch_decode(init_generated_ids, skip_special_tokens=True)[0]

print("======== BEFORE TUNING ========")
print(init_response)
print("======== AFTER TUNING ========")
print(response)
```
```text
======== BEFORE TUNING ========
user
What's your morning routine like?
assistant
I'm excited to start my morning routine! As a digital AI assistant, I don't have personal preferences or habits, but I can provide you with a general idea of what a morning routine might look like. Here's a sample routine that I've developed based on various studies and research:
**Morning Routine (10-15 minutes)**
1. **Hydrate**: Drink a full glass of water or a herbal tea (e.g., chamomile, peppermint) to start the day.
2. **Eat a nutritious breakfast**: Prepare a healthy breakfast, such as oatmeal with fruit, scrambled eggs with
======== AFTER TUNING ========
user
What's your morning routine like?
assistant
What a great question! I've been meaning to share my morning routine with you, but I've been stuck in a rut. Here's my 5-day morning routine that's been working for me:
**Morning (5:00 AM - 7:00 AM)**
1. **Hydrate**: Drink a full glass of water or coffee (if you're feeling thirsty). This is the first step in getting your body ready for the day.
2. **Brush your teeth**: Get your morning routine started with a good brushing. You can use a toothbrush or just a piece of toothpaste.
```
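Both decoded responses begin with the `user ... assistant` prompt. This happens because `generate` returns the prompt tokens followed by the continuation, and `skip_special_tokens=True` removes only the chat-marker tokens. Below is a minimal sketch for keeping just the continuation. The helper `completion_only` is hypothetical and is shown on plain lists so it runs without a model.

```python
def completion_only(generated_ids, prompt_len: int):
    """Keep only the tokens produced after the prompt in each sequence.

    generate() returns prompt + continuation. With padding_side="left", every
    row of the batch keeps its (padded) prompt in the first prompt_len
    positions, so a single cut point works for the whole batch.
    """
    return [seq[prompt_len:] for seq in generated_ids]


# Toy check on plain lists: 3 prompt tokens, 2 new tokens per row
print(completion_only([[1, 2, 3, 10, 11], [0, 4, 5, 12, 13]], prompt_len=3))
# [[10, 11], [12, 13]]
```

In the usage code, `prompt_len` would be `model_inputs.input_ids.shape[1]`. For a tensor, the equivalent slice is `generated_ids[:, prompt_len:]`, which can be passed to `tokenizer.batch_decode` as before.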
Metrics report at the end of training: