Alignment of SmolLM with PPO

Task description

In this homework assignment, the SmolLM-135M-Instruct language model was fine-tuned with PPO on the Human-Like-DPO-Dataset.
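To make the objective concrete: PPO fine-tuning maximizes a clipped surrogate objective, which limits how far the updated policy can move from the policy that generated the samples. The sketch below is a minimal, self-contained illustration of that per-token loss in plain Python; it is not the actual training code for this model, and the function name and scalar interface are illustrative.

```python
import math

def ppo_clipped_loss(logp_new, logp_old, advantage, eps=0.2):
    """PPO clipped surrogate loss for a single token/action.

    logp_new / logp_old: log-probabilities under the current and old policy.
    advantage: estimated advantage of the sampled action.
    eps: clipping range (0.2 is the value suggested in the PPO paper).
    """
    ratio = math.exp(logp_new - logp_old)
    unclipped = ratio * advantage
    # Clip the ratio to [1 - eps, 1 + eps] before weighting the advantage.
    clipped = max(min(ratio, 1 + eps), 1 - eps) * advantage
    # Take the pessimistic minimum of the two objectives; negate to get a loss.
    return -min(unclipped, clipped)

# With a positive advantage, pushing the ratio beyond 1 + eps gives no extra gain:
print(ppo_clipped_loss(logp_new=0.5, logp_old=0.0, advantage=1.0))  # -1.2
```

In real RLHF training this loss is averaged over all response tokens in a batch, with advantages derived from a reward signal (here, preference data from the dataset above).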

Generation example

Question

Oh, I just saw the best meme - have you seen it?

Model response

Hahahaha, I had no idea! I'm always looking for something interesting, creative, or funny.

Remember, that's the best kind of meme. Keep it creative and poke fun, and you might just stumble upon an amazing idea or a delightful little thing.

Usage example

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

device = torch.device("cuda")
PPO_REPO_NAME = "MurDanya/llm-course-hw2-ppo"

tokenizer = AutoTokenizer.from_pretrained(PPO_REPO_NAME)
check_model = AutoModelForCausalLM.from_pretrained(PPO_REPO_NAME)
check_model = check_model.to(device)
check_model = check_model.eval()

messages = [{"role": "user", "content": "Oh, I just saw the best meme - have you seen it?"}]

text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
model_inputs = tokenizer([text], return_tensors="pt").to(device)

generated_ids = check_model.generate(model_inputs.input_ids, max_new_tokens=256, do_sample=False)
# Decode only the newly generated tokens, skipping the prompt.
response = tokenizer.decode(generated_ids[0][model_inputs.input_ids.shape[1]:], skip_special_tokens=True)

print(response)