A humor-focused language model trained on prompt and completion pairs scraped from a subreddit known for its comedic content. The model is first trained with Supervised Fine-Tuning (SFT), using LoRA as a Parameter-Efficient Fine-Tuning (PEFT) method so that only a small set of low-rank adapter weights is updated rather than every parameter of the base model. It is then refined with Direct Preference Optimization (DPO), which aligns it with human preferences using the chosen and rejected responses in the dataset. This multi-stage pipeline is intended to produce contextually appropriate, humorous outputs while keeping training computationally affordable.
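
As a rough illustration of the DPO stage, the per-pair DPO objective can be written in a few lines of Python. This is a minimal sketch, not the training code used for this model: the function name `dpo_loss` and the `beta=0.1` default are hypothetical (the card does not state any hyperparameters), and the inputs are assumed to be summed log-probabilities of each full response under the trainable policy and under a frozen reference model (typically the SFT checkpoint).

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO loss for a single (chosen, rejected) response pair.

    Each *_logp argument is the summed log-probability of a whole response.
    beta=0.1 is an illustrative value, not this model's setting.
    """
    # How much more (or less) the policy favors each response than the reference does.
    chosen_margin = policy_chosen_logp - ref_chosen_logp
    rejected_margin = policy_rejected_logp - ref_rejected_logp
    x = beta * (chosen_margin - rejected_margin)
    # Loss is -log(sigmoid(x)) = log(1 + exp(-x)), split by sign to avoid overflow.
    if x >= 0:
        return math.log1p(math.exp(-x))
    return -x + math.log1p(math.exp(x))


# Policy identical to the reference: loss is log(2).
print(dpo_loss(-10.0, -12.0, -10.0, -12.0))
# Policy shifts probability toward the chosen response: loss drops below log(2).
print(dpo_loss(-9.0, -13.0, -10.0, -12.0))
```

Minimizing this loss pushes the policy to raise the likelihood of the chosen (funnier) response relative to the rejected one, measured against the reference model, without training a separate reward model.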

The SFT-only version (before DPO) is available as Humorous_SFT_LLama2_7b.

Model: ALEXIOSTER/Humorous_DPO_LLama2_7b

Base model: meta-llama/Llama-2-7b-hf