---
library_name: transformers
tags:
  - DPO
  - Humor
  - Humor Generation
license: llama2
base_model:
  - meta-llama/Llama-2-7b-hf
---

A humor-focused language model trained on prompts and completions scraped from a subreddit known for its comedic content. The model first undergoes Supervised Fine-Tuning (SFT) with Parameter-Efficient Fine-Tuning (PEFT) via LoRA, which updates only a small set of adapter weights rather than the full base model. It is then refined with Direct Preference Optimization (DPO), which aligns it with human preferences by contrasting chosen and rejected responses from the dataset. This multi-stage pipeline produces contextually appropriate, humorous outputs while keeping training computationally efficient.
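The pipeline above can be sketched with `peft` and `trl`. This is a minimal illustration, not the card's actual training script: the field names in `to_preference_pair`, the LoRA hyperparameters, and the `DPOConfig` values are assumptions, and the exact `DPOTrainer` signature varies across `trl` versions. The training function is define-only, since running it requires a GPU and access to the gated Llama-2 weights.

```python
# Sketch of the SFT -> LoRA -> DPO pipeline described above.
# Field names and hyperparameters are illustrative assumptions.

def to_preference_pair(post_title, top_comment, downvoted_comment):
    """Turn one scraped thread into a DPO preference record:
    a prompt, a 'chosen' (funnier) reply, and a 'rejected' one."""
    return {
        "prompt": post_title,
        "chosen": top_comment,
        "rejected": downvoted_comment,
    }


def train_dpo(train_dataset):
    """Define-only sketch: attach LoRA adapters to the base model,
    then run DPO on the preference pairs (needs GPU + gated weights)."""
    from transformers import AutoModelForCausalLM, AutoTokenizer
    from peft import LoraConfig
    from trl import DPOConfig, DPOTrainer

    model_id = "meta-llama/Llama-2-7b-hf"  # base model from this card
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id)

    # Parameter-efficient fine-tuning: only small adapter matrices train.
    peft_config = LoraConfig(
        r=16, lora_alpha=32, lora_dropout=0.05, task_type="CAUSAL_LM"
    )
    # beta controls how strongly DPO penalizes drift from the reference.
    args = DPOConfig(output_dir="dpo-humor", beta=0.1)
    trainer = DPOTrainer(
        model=model,
        args=args,
        train_dataset=train_dataset,
        processing_class=tokenizer,
        peft_config=peft_config,
    )
    trainer.train()


# Example of the record format DPO expects (prompt/chosen/rejected):
pair = to_preference_pair(
    "Why did the chicken cross the road?",
    "To boldly go where no bird has gone before.",
    "I don't know.",
)
```

A dataset of such records (one per scraped thread) is what `DPOTrainer` consumes; the chosen/rejected contrast is what steers the model toward the funnier completion.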

The SFT-trained version can be found here: Humorous_SFT_LLama2_7b.