llm-course-hw2-alignment
Collection: Fine-tuning models with DPO and PPO · 3 items
In this homework, the SmolLM-135M-Instruct language model was fine-tuned with PPO on the Human-Like-DPO-Dataset.
Question
Oh, I just saw the best meme - have you seen it?
Model response
Hahahaha, I had no idea! I'm always looking for something interesting, creative, or funny.
Remember, that's the best kind of meme. Keep it creative and poke fun, and you might just stumble upon an amazing idea or a delightful little thing.
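The training script itself is not shown here. As a reminder of what PPO optimizes during fine-tuning, here is a minimal sketch of the clipped surrogate objective in plain PyTorch (the function name and the tensors are illustrative, not taken from the homework code):

```python
import torch

def ppo_clip_loss(logprobs, old_logprobs, advantages, clip_range=0.2):
    # probability ratio between the current policy and the behavior policy
    ratio = torch.exp(logprobs - old_logprobs)
    unclipped = ratio * advantages
    # clipping keeps the update close to the old policy
    clipped = torch.clamp(ratio, 1.0 - clip_range, 1.0 + clip_range) * advantages
    # pessimistic (min) bound, negated because optimizers minimize
    return -torch.min(unclipped, clipped).mean()

# when the policy has not moved, ratio == 1 and the loss is -mean(advantages)
loss = ppo_clip_loss(
    torch.tensor([0.0, 0.0]),
    torch.tensor([0.0, 0.0]),
    torch.tensor([2.0, 0.0]),
)
print(loss.item())  # -1.0
```

In practice a library such as trl wraps this objective together with a KL penalty against the reference model; the sketch above shows only the core clipped term.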
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

device = torch.device("cuda")

# Load the PPO-tuned checkpoint from the Hub
PPO_REPO_NAME = "MurDanya/llm-course-hw2-ppo"
tokenizer = AutoTokenizer.from_pretrained(PPO_REPO_NAME)
check_model = AutoModelForCausalLM.from_pretrained(PPO_REPO_NAME)
check_model = check_model.to(device)
check_model = check_model.eval()

# Build the chat prompt and generate greedily
messages = [{"role": "user", "content": "Oh, I just saw the best meme - have you seen it?"}]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
model_inputs = tokenizer([text], return_tensors="pt").to(device)
generated_ids = check_model.generate(**model_inputs, max_new_tokens=256, do_sample=False)
response = tokenizer.decode(generated_ids[0], skip_special_tokens=True)
print(response)
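Note that `generated_ids` returned by `generate` includes the prompt tokens, so the decoded `response` contains the chat template as well as the reply. To print only the newly generated tokens you can slice off the prompt length. A minimal sketch with dummy token IDs (illustrative values, no model download needed):

```python
import torch

# simulate: a prompt of 5 tokens, generation appends 3 new tokens
prompt_ids = torch.tensor([[101, 7, 8, 9, 102]])
generated_ids = torch.tensor([[101, 7, 8, 9, 102, 11, 12, 13]])

# keep only the tokens produced after the prompt
new_tokens = generated_ids[0][prompt_ids.shape[1]:]
print(new_tokens.tolist())  # [11, 12, 13]
```

In the snippet above the same slicing would be `generated_ids[0][model_inputs.input_ids.shape[1]:]` before calling `tokenizer.decode`.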
Base model: HuggingFaceTB/SmolLM-135M