---
base_model: JokeGPT-Model/sft_final
library_name: peft
license: apache-2.0
tags:
- ppo
- rlhf
- humor
- qwen
- lora
---
# JokeGPT - PPO Model
This is the final PPO-aligned version of JokeGPT. It has been optimized using Reinforcement Learning from Human Feedback (RLHF) to maximize humor scores provided by the Reward Model.
## Model Details
- Base Model: JokeGPT SFT model (`JokeGPT-Model/sft_final`)
- Training Method: PPO (Proximal Policy Optimization) with LoRA adapters
- Objective: Maximize the humor reward while penalizing KL divergence from the SFT policy (see the sketch after this list)
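
Concretely, the quantity PPO maximizes is the Reward Model's humor score minus a KL penalty against the frozen SFT policy. The snippet below is a minimal sketch of that combined reward, not the training code of this repository; the function name `kl_penalized_reward`, the tensor shapes, and the coefficient `beta` are illustrative assumptions.

```python
import torch

def kl_penalized_reward(rm_score: torch.Tensor,
                        logprobs_policy: torch.Tensor,
                        logprobs_sft: torch.Tensor,
                        beta: float = 0.2) -> torch.Tensor:
    # Per-token KL estimate between the PPO policy and the frozen SFT
    # policy, evaluated on the sampled tokens:
    # log pi_theta(y|x) - log pi_sft(y|x).
    kl = logprobs_policy - logprobs_sft
    # PPO maximizes the humor score minus beta * KL, which keeps the
    # tuned policy from drifting too far from the SFT distribution.
    return rm_score - beta * kl.sum(dim=-1)

# Toy example: KL sums to 0.5, so the penalized reward is 1.3 - 0.2 * 0.5.
rm_score = torch.tensor([1.3])            # humor score from the Reward Model
lp_policy = torch.tensor([[-1.2, -0.8]])  # token log-probs under the PPO policy
lp_sft = torch.tensor([[-1.5, -1.0]])     # token log-probs under the SFT policy
print(kl_penalized_reward(rm_score, lp_policy, lp_sft))  # tensor([1.2000])
```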
## Performance
This model is intended to generate jokes that the Reward Model consistently rates as more humorous than those produced by the SFT baseline.
## Usage
```python
from peft import PeftModel
from transformers import AutoModelForCausalLM

# Load the base model, then attach the PPO-trained LoRA adapter.
model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen3-8B", device_map="auto")
model = PeftModel.from_pretrained(model, "JokeGPT-Model/ppo_model")
```
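
Once the adapter is attached, generation works through the standard `transformers` API. This sketch assumes the adapter uses the base model's tokenizer; the prompt and sampling parameters are illustrative only:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-8B")
inputs = tokenizer("Tell me a joke about programmers.", return_tensors="pt").to(model.device)
# Sampling (rather than greedy decoding) tends to suit open-ended joke generation.
outputs = model.generate(**inputs, max_new_tokens=128, do_sample=True, temperature=0.9)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```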