Denis Matveev
sodeniZz
Parameter-Efficient Fine-Tuning (LoRA, DoRA & QLoRA)
A collection of parameter-efficient fine-tuning experiments for sentiment classification using chat-based instruction tuning
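The common idea behind the LoRA-family adapters in this collection can be sketched in a few lines. This is a minimal illustration of the low-rank update, not the actual training code of these checkpoints; all names and dimensions here are made up. A frozen weight W is adapted as W' = W + (alpha / r) · B A, with B zero-initialized so that training starts exactly from the base model.

```python
import numpy as np

rng = np.random.default_rng(0)
d_out, d_in, r, alpha = 8, 8, 2, 4   # toy sizes; r is the LoRA rank

W = rng.normal(size=(d_out, d_in))   # frozen pretrained weight
A = rng.normal(size=(r, d_in)) * 0.01  # trainable down-projection
B = np.zeros((d_out, r))             # trainable up-projection, zero-init

def lora_forward(x):
    # Adapted layer: base output plus scaled low-rank correction.
    return W @ x + (alpha / r) * (B @ (A @ x))

x = rng.normal(size=d_in)
# With B = 0 the adapter is a no-op, so the model starts from the base weights.
assert np.allclose(lora_forward(x), W @ x)
```

QLoRA follows the same scheme but keeps W in a quantized (e.g. 4-bit) form, and DoRA additionally decomposes the update into magnitude and direction; only the small A and B matrices are trained in all three cases.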
LLM Course Homework 2: RLHF (DPO & PPO)
The collection includes the DPO-trained model, PPO-trained model, and the Reward Model used for PPO.
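The DPO objective used for the DPO-trained model can be sketched as follows. The log-probabilities below are illustrative numbers, not outputs of the actual models: for one preference pair, the loss is -log sigmoid(beta · ((log p_policy(chosen) - log p_ref(chosen)) - (log p_policy(rejected) - log p_ref(rejected)))).

```python
import math

def dpo_loss(pi_chosen, pi_rejected, ref_chosen, ref_rejected, beta=0.1):
    # DPO loss for a single (chosen, rejected) pair of sequence log-probs.
    margin = beta * ((pi_chosen - ref_chosen) - (pi_rejected - ref_rejected))
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# If the policy prefers the chosen answer more than the reference does,
# the loss drops below the neutral value -log(0.5) = log 2.
good = dpo_loss(pi_chosen=-4.0, pi_rejected=-9.0,
                ref_chosen=-5.0, ref_rejected=-8.0)
neutral = dpo_loss(pi_chosen=-5.0, pi_rejected=-8.0,
                   ref_chosen=-5.0, ref_rejected=-8.0)
assert good < neutral
```

PPO instead optimizes the policy against a learned reward model (the third checkpoint in this collection) with a KL penalty toward the reference model, so it needs the extra reward-model training step that DPO avoids.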
models (9)
- sodeniZz/llm-course-hw3-tinyllama-qlora · Updated
- sodeniZz/llm-course-hw3-dora · Text Generation · 0.3B · Updated · 2
- sodeniZz/llm-course-hw3-lora · Text Generation · 0.3B · Updated · 2
- sodeniZz/llm-course-hw3-tinyllamma-qlora · Updated
- sodeniZz/llm-course-hw2-dpo · Text Generation · 0.1B · Updated · 1
- sodeniZz/llm-course-hw2-ppo · Text Generation · 0.1B · Updated · 1
- sodeniZz/llm-course-hw2-reward-model · Text Classification · 0.1B · Updated
- sodeniZz/llm-course-hw1 · Updated
- sodeniZz/bert-ner-finetuned · 33.2M · Updated
datasets (0) · None public yet