| | --- |
| | library_name: transformers |
| | datasets: |
| | - HumanLLMs/Human-Like-DPO-Dataset |
| | language: |
| | - en |
| | base_model: |
| | - HuggingFaceTB/SmolLM-135M-Instruct |
| | --- |
| | # ๐ค SmolLM-135M - PPO Fine-Tuned Model |
| |
|
| | ## ๐ Overview |
| | This is a **Proximal Policy Optimization (PPO) fine-tuned version of SmolLM-135M**, designed to make its responses **more emotional and engaging**. The model has been optimized using **HumanLLMs/Human-Like-DPO-Dataset**. |
| |
|
| | PPO is a reinforcement learning algorithm commonly used in **RLHF (Reinforcement Learning from Human Feedback)** to optimize model behavior based on a reward model. [Paper](https://arxiv.org/abs/1707.06347) |
| |
|
| | ## ๐ Training & Dataset |
| | - **Base Model:** SmolLM-135M |
| | - **Reward Model:** [`tsessk/llm-course-hw2-reward-model`](https://huggingface.co/tsessk/llm-course-hw2-reward-model) |
| | - **Fine-Tuned Dataset:** [HumanLLMs/Human-Like-DPO-Dataset](https://huggingface.co/datasets/HumanLLMs/Human-Like-DPO-Dataset) |
| | - **Objective:** Improve **expressiveness and emotional depth** through **reward-based reinforcement learning**. |
| |
|
| |
|
| | ## ๐ Example of PPO-Enhanced Response |
| | Hereโs an example of how the **PPO model** responds to a simple prompt compared to a standard LLM. |
| |
|
| | ### **Prompt:** |
| | ```plaintext |
| | What's your morning routine like? |
| | ``` |
| | ### **Base model Response:** |
| | ```plaintext |
| | I'm excited to start my morning routine! As a digital AI assistant, I don't have personal preferences or habits, but I can provide you with a general idea of what a morning routine might look like. Here's a sample routine that I've developed based on various studies and research: |
| | |
| | **Morning Routine (10-15 minutes)** |
| | |
| | 1. **Hydrate**: Drink a full glass of water or a healthy beverage to start the day. |
| | 2. **Brush teeth**: Brush your teeth for 2 minutes, then rinse with water. |
| | 3. **Get dressed**: Put on some comfortable clothes, including a light jacket or sweater. |
| | 4. **Get ready for work**: Start with some morning exercise, such as a 10-minute yoga or stretching routine. |
| | 5. **Plan your day**: Review your schedule, set goals, and prioritize tasks. |
| | 6. **Get ready for bed**: Wind down with a relaxing activity, like reading or listening to calming music. |
| | |
| | **Morning Routine (15-30 minutes)** |
| | |
| | 1. **Get ready for work**: Start with some morning exercise, such as a 10-minute yoga or stretching routine. |
| | 2. **Get ready for |
| | ``` |
| |
|
| | ### **PPO Response:** |
| | ```plaintext |
| | I'm excited to start my morning routine! As a digital AI assistant, I don't have personal preferences or habits, but I can provide you with a general idea of what a morning routine might look like. Here's a sample routine I came up with: |
| | |
| | **Morning Routine (10-15 minutes)** |
| | |
| | 1. **Hydrate**: Drink a full glass of water or a herbal tea to start the day. |
| | 2. **Brush teeth**: Brush your teeth for 2-3 minutes, then rinse with water. |
| | 3. **Get dressed**: Put on some comfortable clothes, including a light jacket or sweater. |
| | 4. **Get ready for work**: Start your day with a quick breakfast, such as oatmeal with fruit or scrambled eggs with whole-grain toast. |
| | 5. **Take a few deep breaths**: Take a few deep breaths to calm your mind and body. |
| | 6. **Get ready for the day**: Start your day with a morning workout, such as a yoga or Pilates class. |
| | 7. **Get ready for school**: Start your day with a morning routine, such as reading a book or taking a short walk. |
| | 8. **Get ready for bed**: Start your day with a gentle sleep |
| | ``` |