Denis Matveev
sodeniZz
Parameter-Efficient Fine-Tuning (LoRA & DoRA & QLoRA)
A collection of parameter-efficient fine-tuning experiments for sentiment classification using chat-based instruction tuning.
LLM Course Homework 2: RLHF (DPO & PPO)
The collection includes the DPO-trained model, PPO-trained model, and the Reward Model used for PPO.