---
base_model: meta-llama/Llama-3.1-8B
library_name: transformers
pipeline_tag: text-classification
tags:
- trl
- reward-trainer
- reward-model
- creative-writing
license: llama3.1
---

# Llama 8B Creative Writing Verifier

This model is a `LlamaForSequenceClassification` reward model for scoring creative-writing stories. It should be used as a scalar verifier/reward model, not as a text-generation model.

## Usage

Load the model with `AutoModelForSequenceClassification` and score each story directly as raw text. Do not apply a chat template or wrap the story in a prompt.

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_id = "SAA-Lab/Llama8B-CreativeWritingVerifier"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# Llama tokenizers may ship without a pad token; fall back to EOS.
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token
if model.config.pad_token_id is None:
    model.config.pad_token_id = tokenizer.pad_token_id


def reward(story: str) -> float:
    """Return the scalar reward for one story, passed as raw text."""
    inputs = tokenizer(
        story.strip(),
        return_tensors="pt",
        truncation=True,
        max_length=4096,
    ).to(model.device)
    with torch.inference_mode():
        # The classification head emits a single logit: the reward.
        return model(**inputs).logits.squeeze(-1).float().item()


# Placeholder inputs; replace with the two stories you want to compare.
chosen_story = "..."
rejected_story = "..."

chosen_score = reward(chosen_story)
rejected_score = reward(rejected_story)
print(chosen_score > rejected_score)
```
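
The usage example compares two scores only by their order. If a graded confidence is more useful than a boolean, the score gap can be mapped to a probability. The sketch below assumes the model was trained with a pairwise logistic (Bradley-Terry) objective, which the `reward-trainer` tag suggests but this card does not state, so treat the output as a rough heuristic rather than a calibrated probability. The helper name `preference_probability` and the example scores are hypothetical.

```python
import math


def preference_probability(chosen_score: float, rejected_score: float) -> float:
    """Bradley-Terry style probability that the first story is preferred.

    Assumption: scores come from a reward model trained with a pairwise
    logistic loss, so the score difference acts as a log-odds margin.
    """
    margin = chosen_score - rejected_score
    # Numerically stable logistic function: never exponentiates a large
    # positive number, so extreme margins do not overflow.
    if margin >= 0:
        return 1.0 / (1.0 + math.exp(-margin))
    exp_margin = math.exp(margin)
    return exp_margin / (1.0 + exp_margin)


# Hypothetical scores, standing in for two calls to reward(...).
print(preference_probability(1.5, 0.5))
```

Only the difference between the two scores matters here. The absolute value of a single score has no documented scale in this card, so comparisons are most meaningful between stories scored by the same model.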