Checkpoint epoch 20
Browse files

Files changed:
- README.md +3 -3
- rlhf_config.yaml +2 -2
README.md
CHANGED
|
@@ -26,7 +26,7 @@ You can then generate text as follows:
|
|
| 26 |
```python
|
| 27 |
from transformers import pipeline
|
| 28 |
|
| 29 |
-
generator = pipeline("text-generation", model="MattBou00//content/IRL-Alignment-Auditor/outputs/2025-11-21_13-
|
| 30 |
outputs = generator("Hello, my llama is cute")
|
| 31 |
```
|
| 32 |
|
|
@@ -36,8 +36,8 @@ If you want to use the model for training or to obtain the outputs from the valu
|
|
| 36 |
from transformers import AutoTokenizer
|
| 37 |
from trl import AutoModelForCausalLMWithValueHead
|
| 38 |
|
| 39 |
-
tokenizer = AutoTokenizer.from_pretrained("MattBou00//content/IRL-Alignment-Auditor/outputs/2025-11-21_13-
|
| 40 |
-
model = AutoModelForCausalLMWithValueHead.from_pretrained("MattBou00//content/IRL-Alignment-Auditor/outputs/2025-11-21_13-
|
| 41 |
|
| 42 |
inputs = tokenizer("Hello, my llama is cute", return_tensors="pt")
|
| 43 |
outputs = model(**inputs, labels=inputs["input_ids"])
|
|
|
|
| 26 |
```python
|
| 27 |
from transformers import pipeline
|
| 28 |
|
| 29 |
+
generator = pipeline("text-generation", model="MattBou00//content/IRL-Alignment-Auditor/outputs/2025-11-21_13-56-58/checkpoints/checkpoint-epoch-20")
|
| 30 |
outputs = generator("Hello, my llama is cute")
|
| 31 |
```
|
| 32 |
|
|
|
|
| 36 |
from transformers import AutoTokenizer
|
| 37 |
from trl import AutoModelForCausalLMWithValueHead
|
| 38 |
|
| 39 |
+
tokenizer = AutoTokenizer.from_pretrained("MattBou00//content/IRL-Alignment-Auditor/outputs/2025-11-21_13-56-58/checkpoints/checkpoint-epoch-20")
|
| 40 |
+
model = AutoModelForCausalLMWithValueHead.from_pretrained("MattBou00//content/IRL-Alignment-Auditor/outputs/2025-11-21_13-56-58/checkpoints/checkpoint-epoch-20")
|
| 41 |
|
| 42 |
inputs = tokenizer("Hello, my llama is cute", return_tensors="pt")
|
| 43 |
outputs = model(**inputs, labels=inputs["input_ids"])
|
rlhf_config.yaml
CHANGED
|
@@ -48,7 +48,7 @@ output:
|
|
| 48 |
wandb:
|
| 49 |
project: irl_rlhf
|
| 50 |
entity: null
|
| 51 |
-
name: Llama-3.2-1B-2025-11-21_13-
|
| 52 |
irl:
|
| 53 |
irl_root: re_irl_min_stratified_plots
|
| 54 |
posterior_dir: re_irl_min_stratified_plots/meta_llama_Llama_3.2_1B/round_1
|
|
@@ -65,4 +65,4 @@ irl:
|
|
| 65 |
features_on_cpu: false
|
| 66 |
reward_scale: 8
|
| 67 |
reward_clip: 4
|
| 68 |
-
now: 2025-11-21_13-
|
|
|
|
| 48 |
wandb:
|
| 49 |
project: irl_rlhf
|
| 50 |
entity: null
|
| 51 |
+
name: Llama-3.2-1B-2025-11-21_13-56-58
|
| 52 |
irl:
|
| 53 |
irl_root: re_irl_min_stratified_plots
|
| 54 |
posterior_dir: re_irl_min_stratified_plots/meta_llama_Llama_3.2_1B/round_1
|
|
|
|
| 65 |
features_on_cpu: false
|
| 66 |
reward_scale: 8
|
| 67 |
reward_clip: 4
|
| 68 |
+
now: 2025-11-21_13-56-58
|