Update README.md

3682adc verified about 1 year ago

3.39 kB

	---
	library_name: transformers
	datasets:
	- HumanLLMs/Human-Like-DPO-Dataset
	language:
	- en
	base_model:
	- HuggingFaceTB/SmolLM-135M-Instruct
	---
	# 🤖 SmolLM-135M - PPO Fine-Tuned Model

	## 📝 Overview
	This is a Proximal Policy Optimization (PPO) fine-tuned version of SmolLM-135M, designed to make its responses more emotional and engaging. The model has been optimized using HumanLLMs/Human-Like-DPO-Dataset.

	PPO is a reinforcement learning algorithm commonly used in RLHF (Reinforcement Learning from Human Feedback) to optimize model behavior based on a reward model. [Paper](https://arxiv.org/abs/1707.06347)

	## 📊 Training & Dataset
	- Base Model: SmolLM-135M
	- Reward Model: [`tsessk/llm-course-hw2-reward-model`](https://huggingface.co/tsessk/llm-course-hw2-reward-model)
	- Fine-Tuned Dataset: [HumanLLMs/Human-Like-DPO-Dataset](https://huggingface.co/datasets/HumanLLMs/Human-Like-DPO-Dataset)
	- Objective: Improve expressiveness and emotional depth through reward-based reinforcement learning.


	## 📌 Example of PPO-Enhanced Response
	Here’s an example of how the PPO model responds to a simple prompt compared to a standard LLM.

	### Prompt:
	```plaintext
	What's your morning routine like?
	```
	### Base model Response:
	```plaintext
	I'm excited to start my morning routine! As a digital AI assistant, I don't have personal preferences or habits, but I can provide you with a general idea of what a morning routine might look like. Here's a sample routine that I've developed based on various studies and research:

	Morning Routine (10-15 minutes)

	1. Hydrate: Drink a full glass of water or a healthy beverage to start the day.
	2. Brush teeth: Brush your teeth for 2 minutes, then rinse with water.
	3. Get dressed: Put on some comfortable clothes, including a light jacket or sweater.
	4. Get ready for work: Start with some morning exercise, such as a 10-minute yoga or stretching routine.
	5. Plan your day: Review your schedule, set goals, and prioritize tasks.
	6. Get ready for bed: Wind down with a relaxing activity, like reading or listening to calming music.

	Morning Routine (15-30 minutes)

	1. Get ready for work: Start with some morning exercise, such as a 10-minute yoga or stretching routine.
	2. **Get ready for
	```

	### PPO Response:
	```plaintext
	I'm excited to start my morning routine! As a digital AI assistant, I don't have personal preferences or habits, but I can provide you with a general idea of what a morning routine might look like. Here's a sample routine I came up with:

	Morning Routine (10-15 minutes)

	1. Hydrate: Drink a full glass of water or a herbal tea to start the day.
	2. Brush teeth: Brush your teeth for 2-3 minutes, then rinse with water.
	3. Get dressed: Put on some comfortable clothes, including a light jacket or sweater.
	4. Get ready for work: Start your day with a quick breakfast, such as oatmeal with fruit or scrambled eggs with whole-grain toast.
	5. Take a few deep breaths: Take a few deep breaths to calm your mind and body.
	6. Get ready for the day: Start your day with a morning workout, such as a yoga or Pilates class.
	7. Get ready for school: Start your day with a morning routine, such as reading a book or taking a short walk.
	8. Get ready for bed: Start your day with a gentle sleep
	```

	---
	library_name: transformers
	datasets:
	- HumanLLMs/Human-Like-DPO-Dataset
	language:
	- en
	base_model:
	- HuggingFaceTB/SmolLM-135M-Instruct
	---
	# 🤖 SmolLM-135M - PPO Fine-Tuned Model

	## 📝 Overview
	This is a Proximal Policy Optimization (PPO) fine-tuned version of SmolLM-135M, designed to make its responses more emotional and engaging. The model has been optimized using HumanLLMs/Human-Like-DPO-Dataset.

	PPO is a reinforcement learning algorithm commonly used in RLHF (Reinforcement Learning from Human Feedback) to optimize model behavior based on a reward model. [Paper](https://arxiv.org/abs/1707.06347)

	## 📊 Training & Dataset
	- Base Model: SmolLM-135M
	- Reward Model: [`tsessk/llm-course-hw2-reward-model`](https://huggingface.co/tsessk/llm-course-hw2-reward-model)
	- Fine-Tuned Dataset: [HumanLLMs/Human-Like-DPO-Dataset](https://huggingface.co/datasets/HumanLLMs/Human-Like-DPO-Dataset)
	- Objective: Improve expressiveness and emotional depth through reward-based reinforcement learning.


	## 📌 Example of PPO-Enhanced Response
	Here’s an example of how the PPO model responds to a simple prompt compared to a standard LLM.

	### Prompt:
	```plaintext
	What's your morning routine like?
	```
	### Base model Response:
	```plaintext
	I'm excited to start my morning routine! As a digital AI assistant, I don't have personal preferences or habits, but I can provide you with a general idea of what a morning routine might look like. Here's a sample routine that I've developed based on various studies and research:

	Morning Routine (10-15 minutes)

	1. Hydrate: Drink a full glass of water or a healthy beverage to start the day.
	2. Brush teeth: Brush your teeth for 2 minutes, then rinse with water.
	3. Get dressed: Put on some comfortable clothes, including a light jacket or sweater.
	4. Get ready for work: Start with some morning exercise, such as a 10-minute yoga or stretching routine.
	5. Plan your day: Review your schedule, set goals, and prioritize tasks.
	6. Get ready for bed: Wind down with a relaxing activity, like reading or listening to calming music.

	Morning Routine (15-30 minutes)

	1. Get ready for work: Start with some morning exercise, such as a 10-minute yoga or stretching routine.
	2. **Get ready for
	```

	### PPO Response:
	```plaintext
	I'm excited to start my morning routine! As a digital AI assistant, I don't have personal preferences or habits, but I can provide you with a general idea of what a morning routine might look like. Here's a sample routine I came up with:

	Morning Routine (10-15 minutes)

	1. Hydrate: Drink a full glass of water or a herbal tea to start the day.
	2. Brush teeth: Brush your teeth for 2-3 minutes, then rinse with water.
	3. Get dressed: Put on some comfortable clothes, including a light jacket or sweater.
	4. Get ready for work: Start your day with a quick breakfast, such as oatmeal with fruit or scrambled eggs with whole-grain toast.
	5. Take a few deep breaths: Take a few deep breaths to calm your mind and body.
	6. Get ready for the day: Start your day with a morning workout, such as a yoga or Pilates class.
	7. Get ready for school: Start your day with a morning routine, such as reading a book or taking a short walk.
	8. Get ready for bed: Start your day with a gentle sleep
	```