Reward model trained for PPO in the VK NLP course.
The base model is the LLM HuggingFaceTB/SmolLM-135M-Instruct.
Only the last linear layer was trained; all other weights were kept frozen.
The model was trained with TRL to provide rewards for PPO training.
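The card does not state the exact training objective. Assuming the head was fit with the standard pairwise (Bradley-Terry) objective that TRL's `RewardTrainer` optimizes, the loss for one preference pair is -log σ(r_chosen - r_rejected). A minimal pure-Python sketch of that loss; the function names are hypothetical and not part of TRL:

```python
import math
from typing import Iterable, Tuple


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def pairwise_reward_loss(r_chosen: float, r_rejected: float) -> float:
    """Bradley-Terry loss for one pair: -log(sigmoid(r_chosen - r_rejected))."""
    # -log(sigmoid(d)) == softplus(-d); it is small when chosen outscores rejected
    return softplus(-(r_chosen - r_rejected))


def mean_pairwise_loss(pairs: Iterable[Tuple[float, float]]) -> float:
    """Average the pairwise loss over (r_chosen, r_rejected) reward pairs."""
    losses = [pairwise_reward_loss(c, r) for c, r in pairs]
    return sum(losses) / len(losses)


print(pairwise_reward_loss(2.0, -1.0))  # about 0.0486
```

With only the last linear layer trainable, minimizing this loss updates just that layer's weights; the frozen SmolLM backbone only supplies the features it scores.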
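Usage:

import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Use the GPU if available, otherwise fall back to the CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
tokenizer = AutoTokenizer.from_pretrained("dmitry315/llm-course-hw2-reward-model")
reward_model = AutoModelForSequenceClassification.from_pretrained("dmitry315/llm-course-hw2-reward-model").to(device)
reward_model.eval()

# One user/assistant exchange to score
msgs = [
    {"role": "user", "content": "<prompt>"},
    {"role": "assistant", "content": "<LLM answer>"},
]
# Render the conversation with the chat template, then tokenize it
text = tokenizer.apply_chat_template(msgs, tokenize=False)
inputs = tokenizer(text, return_tensors="pt").to(device)
with torch.no_grad():
    score = reward_model(**inputs).logits[0].cpu()
print(score)
# > tensor([<score>])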
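Reward scores are unnormalized scalars, so they are easiest to interpret when comparing several answers to the same prompt. The sketch below ranks candidate answers and turns a score gap into a Bradley-Terry preference probability. `score_fn` is a hypothetical callable standing in for the scoring code above (for example, a wrapper that returns the model's score as a float); `toy_score` is a placeholder for illustration only and is not the reward model.

```python
import math
from typing import Callable, List, Tuple


def rank_answers(prompt: str, answers: List[str],
                 score_fn: Callable[[str, str], float]) -> List[Tuple[str, float]]:
    """Score each answer for the prompt and sort from highest to lowest reward."""
    scored = [(answer, score_fn(prompt, answer)) for answer in answers]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


def preference_probability(score_a: float, score_b: float) -> float:
    """P(answer a preferred over b) under Bradley-Terry: sigmoid(score_a - score_b)."""
    d = score_a - score_b
    # Branch on the sign so math.exp never overflows
    if d >= 0:
        return 1.0 / (1.0 + math.exp(-d))
    e = math.exp(d)
    return e / (1.0 + e)


# Hypothetical placeholder scorer: prefers longer answers (NOT the real reward model)
def toy_score(prompt: str, answer: str) -> float:
    return float(len(answer))


ranked = rank_answers("Hi", ["ok", "Hello! How can I help?"], toy_score)
print(ranked[0][0])  # Hello! How can I help?
```

Only score differences matter in this probability, so adding a constant to every reward leaves the ranking and the preference probabilities unchanged.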