---
base_model: meta-llama/Llama-3.1-8B-Instruct
library_name: peft
model_name: margin_reg_baseline_stem_20260328_121551
tags:
- base_model:adapter:meta-llama/Llama-3.1-8B-Instruct
- lora
- reward-trainer
- transformers
- trl
licence: license
---

# Model Card for margin_reg_baseline_stem_20260328_121551

This model is a LoRA adapter for [meta-llama/Llama-3.1-8B-Instruct](https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct) that turns it into a reward model. Given a prompt and response as text, it returns a single scalar reward score.
It has been trained using [TRL](https://github.com/huggingface/trl).

## Quick start

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification
from peft import PeftModel, PeftConfig

# 1. Define PEFT model ID & checkpoint (epoch)
peft_model_id = "xxccho/margin_reg_baseline_stem"

# [ Checkpoint-to-epoch mapping ]
# Epoch 1  : "checkpoint-245"
# Epoch 2  : "checkpoint-490"
# Epoch 3  : "checkpoint-735"
# Epoch 4  : "checkpoint-980"
# Epoch 5  : "checkpoint-1225"
# Epoch 6  : "checkpoint-1470"
# Epoch 7  : "checkpoint-1715"
# Epoch 8  : "checkpoint-1960"
# Epoch 9  : "checkpoint-2205"
# Epoch 10 : "checkpoint-2450"

# Example: to use the epoch-5 checkpoint, set checkpoint = "checkpoint-1225".
# If None, the final (epoch 10) model is loaded.
checkpoint = None

# Pass `subfolder` only when a specific checkpoint is requested
load_kwargs = {"subfolder": checkpoint} if checkpoint else {}

# 2. Load the PEFT config
config = PeftConfig.from_pretrained(peft_model_id, **load_kwargs)

# 3. Load tokenizer
tokenizer = AutoTokenizer.from_pretrained(config.base_model_name_or_path)
if tokenizer.pad_token is None:
    # Fall back to EOS for padding if the tokenizer defines no pad token
    tokenizer.pad_token = tokenizer.eos_token

# 4. Load base model with a single-output head (num_labels=1 -> one scalar reward)
base_model = AutoModelForSequenceClassification.from_pretrained(
    config.base_model_name_or_path,
    num_labels=1,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# 5. Apply LoRA adapter (final model, or the selected checkpoint subfolder)
model = PeftModel.from_pretrained(base_model, peft_model_id, **load_kwargs)
# Needed so the head can locate the last non-padding token in each sequence
model.config.pad_token_id = tokenizer.pad_token_id
model.eval()

# Example usage
text = "User: Explain the theory of relativity briefly.\nAssistant: The theory of relativity, proposed by Albert Einstein..."
inputs = tokenizer(text, return_tensors="pt").to(model.device)

with torch.no_grad():
    outputs = model(**inputs)
    # logits has shape (1, 1) for a single input; squeeze to a Python float
    reward_score = outputs.logits.squeeze().item()

print(f"[STEM] Reward Score: {reward_score:.4f}")
```
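The checkpoint names in the mapping above advance by 245 training steps per epoch. Because of that, the subfolder for a given epoch can be computed instead of looked up. The following is a minimal sketch that assumes this fixed stride holds for every listed epoch (1 to 10). `checkpoint_for_epoch` is a hypothetical helper and is not part of this repository.

```python
from typing import Optional

# Assumption: each epoch adds exactly 245 steps, matching the mapping
# ("checkpoint-245" for epoch 1 ... "checkpoint-2450" for epoch 10).
STEPS_PER_EPOCH = 245
NUM_EPOCHS = 10


def checkpoint_for_epoch(epoch: Optional[int]) -> Optional[str]:
    """Return the checkpoint subfolder for `epoch`, or None for the final adapter.

    Hypothetical helper: the result is meant for the `checkpoint` variable
    in the quick-start snippet.
    """
    if epoch is None:
        return None  # repo root = final (epoch 10) adapter
    if not 1 <= epoch <= NUM_EPOCHS:
        raise ValueError(f"epoch must be in 1..{NUM_EPOCHS}, got {epoch}")
    return f"checkpoint-{epoch * STEPS_PER_EPOCH}"


if __name__ == "__main__":
    for epoch in (1, 5, 10):
        print(epoch, checkpoint_for_epoch(epoch))
```

For example, `checkpoint = checkpoint_for_epoch(5)` selects the same adapter as writing `"checkpoint-1225"` by hand.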

## Training procedure

[<img src="https://raw.githubusercontent.com/wandb/assets/main/wandb-github-badge-28.svg" alt="Visualize in Weights & Biases" width="150" height="24"/>](https://wandb.ai/changhee1016-seoul-national-university/reward-model-ood-robustness/runs/kd6wofeh)

This model was trained with TRL's `RewardTrainer` on pairwise preference data (a preferred and a rejected response for each example), using a LoRA adapter via PEFT.
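TRL's `RewardTrainer` fits the scalar head with a pairwise, Bradley-Terry style objective. For a preferred response scored `r_chosen` and a rejected one scored `r_rejected`, the per-pair loss is `-log(sigmoid(r_chosen - r_rejected))`. The plain-Python sketch below shows that loss. It also shows how two scores from the quick-start code could be turned into a preference probability. It assumes the plain objective, with an optional per-example margin subtracted inside the sigmoid. This card does not say which settings the `margin_reg` run used, so treat the sketch as illustrative. `preference_probability` and `pairwise_loss` are hypothetical helper names.

```python
import math


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def preference_probability(r_chosen: float, r_rejected: float) -> float:
    """Bradley-Terry probability that the first response is preferred."""
    return sigmoid(r_chosen - r_rejected)


def pairwise_loss(r_chosen: float, r_rejected: float, margin: float = 0.0) -> float:
    """-log(sigmoid(r_chosen - r_rejected - margin)) for a single pair.

    margin=0.0 gives the plain Bradley-Terry loss. With a positive margin,
    the loss only drops below log(2) once the chosen reward exceeds the
    rejected reward by more than `margin`.
    """
    x = r_chosen - r_rejected - margin
    # -log(sigmoid(x)) == softplus(-x), computed without overflow
    return max(-x, 0.0) + math.log1p(math.exp(-abs(x)))


if __name__ == "__main__":
    # e.g. two reward scores produced by the quick-start code for two answers
    print(f"P(first preferred) = {preference_probability(1.2, -0.3):.3f}")
    print(f"pairwise loss      = {pairwise_loss(1.2, -0.3):.3f}")
```

Only score differences matter in this objective, so a single reward score is most meaningful when compared with the score of another response to the same prompt.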

### Framework versions

- PEFT: 0.18.0
- TRL: 0.26.1
- Transformers: 4.57.3
- PyTorch: 2.9.0
- Datasets: 4.4.1
- Tokenizers: 0.22.1
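You can compare a local environment against the versions above with the standard-library `importlib.metadata`. The following is a minimal sketch that assumes the PyPI distribution names `peft`, `trl`, `transformers`, `torch`, `datasets` and `tokenizers`. Other versions may work too, so a mismatch is a debugging hint rather than an error. `compare_versions` is a hypothetical helper.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, Optional, Tuple

# Versions listed under "Framework versions" (PyPI distribution names assumed)
EXPECTED = {
    "peft": "0.18.0",
    "trl": "0.26.1",
    "transformers": "4.57.3",
    "torch": "2.9.0",
    "datasets": "4.4.1",
    "tokenizers": "0.22.1",
}


def compare_versions(expected: Dict[str, str]) -> Dict[str, Tuple[Optional[str], str, bool]]:
    """Map each package to (installed version or None, expected version, match)."""
    report: Dict[str, Tuple[Optional[str], str, bool]] = {}
    for name, wanted in expected.items():
        try:
            installed: Optional[str] = version(name)
        except PackageNotFoundError:
            installed = None  # not installed in this environment
        # torch wheels often carry a local suffix such as "+cu128"; ignore it
        base = installed.split("+", 1)[0] if installed is not None else None
        report[name] = (installed, wanted, base == wanted)
    return report


if __name__ == "__main__":
    for name, (installed, wanted, ok) in compare_versions(EXPECTED).items():
        status = "ok" if ok else "differs"
        print(f"{name:<13} installed={installed} expected={wanted} [{status}]")
```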

## Citations

Cite TRL as:

```bibtex
@misc{vonwerra2022trl,
    title        = {{TRL: Transformer Reinforcement Learning}},
    author       = {Leandro von Werra and Younes Belkada and Lewis Tunstall and Edward Beeching and Tristan Thrush and Nathan Lambert and Shengyi Huang and Kashif Rasul and Quentin Gallou{\'e}dec},
    year         = 2020,
    journal      = {GitHub repository},
    publisher    = {GitHub},
    howpublished = {\url{https://github.com/huggingface/trl}}
}
```