---
library_name: transformers
datasets:
- HumanLLMs/Human-Like-DPO-Dataset
language:
- en
base_model:
- HuggingFaceTB/SmolLM-135M-Instruct
---

# 🤖 SmolLM-135M - PPO Fine-Tuned Model

## 📝 Overview

This is a **Proximal Policy Optimization (PPO) fine-tuned version of SmolLM-135M**, designed to make its responses **more emotional and engaging**. The model was fine-tuned on **HumanLLMs/Human-Like-DPO-Dataset**.

PPO is a reinforcement learning algorithm commonly used in **RLHF (Reinforcement Learning from Human Feedback)** to optimize a model's behavior against the scores of a reward model. See the [PPO paper](https://arxiv.org/abs/1707.06347).

## 📊 Training & Dataset

- **Base Model:** [`HuggingFaceTB/SmolLM-135M-Instruct`](https://huggingface.co/HuggingFaceTB/SmolLM-135M-Instruct)
- **Reward Model:** [`tsessk/llm-course-hw2-reward-model`](https://huggingface.co/tsessk/llm-course-hw2-reward-model)
- **Fine-Tuning Dataset:** [HumanLLMs/Human-Like-DPO-Dataset](https://huggingface.co/datasets/HumanLLMs/Human-Like-DPO-Dataset)
- **Objective:** Improve **expressiveness and emotional depth** through **reward-based reinforcement learning**.

## 📌 Example of PPO-Enhanced Response

Here's an example of how the **PPO model** responds to a simple prompt compared to the base model.

### **Prompt:**

```plaintext
What's your morning routine like?
```

### **Base Model Response:**

```plaintext
I'm excited to start my morning routine! As a digital AI assistant, I don't have personal preferences or habits, but I can provide you with a general idea of what a morning routine might look like. Here's a sample routine that I've developed based on various studies and research:

**Morning Routine (10-15 minutes)**

1. **Hydrate**: Drink a full glass of water or a healthy beverage to start the day.
2. **Brush teeth**: Brush your teeth for 2 minutes, then rinse with water.
3. **Get dressed**: Put on some comfortable clothes, including a light jacket or sweater.
4. **Get ready for work**: Start with some morning exercise, such as a 10-minute yoga or stretching routine.
5. **Plan your day**: Review your schedule, set goals, and prioritize tasks.
6. **Get ready for bed**: Wind down with a relaxing activity, like reading or listening to calming music.

**Morning Routine (15-30 minutes)**

1. **Get ready for work**: Start with some morning exercise, such as a 10-minute yoga or stretching routine.
2. **Get ready for
```

### **PPO Response:**

```plaintext
I'm excited to start my morning routine! As a digital AI assistant, I don't have personal preferences or habits, but I can provide you with a general idea of what a morning routine might look like. Here's a sample routine I came up with:

**Morning Routine (10-15 minutes)**

1. **Hydrate**: Drink a full glass of water or a herbal tea to start the day.
2. **Brush teeth**: Brush your teeth for 2-3 minutes, then rinse with water.
3. **Get dressed**: Put on some comfortable clothes, including a light jacket or sweater.
4. **Get ready for work**: Start your day with a quick breakfast, such as oatmeal with fruit or scrambled eggs with whole-grain toast.
5. **Take a few deep breaths**: Take a few deep breaths to calm your mind and body.
6. **Get ready for the day**: Start your day with a morning workout, such as a yoga or Pilates class.
7. **Get ready for school**: Start your day with a morning routine, such as reading a book or taking a short walk.
8. **Get ready for bed**: Start your day with a gentle sleep
```
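For readers unfamiliar with PPO, the clipped surrogate objective from the PPO paper linked in the overview can be sketched in a few lines of plain Python. This is a minimal illustration, not the training code used for this model: the function names are hypothetical, and the clip range of 0.2 and KL coefficient of 0.05 are illustrative assumptions, since this card does not state the hyperparameters. The second helper shows one common RLHF reward-shaping choice (the reward model's score minus a KL penalty against the frozen reference model); whether this model's run used exactly that shaping is also an assumption.

```python
import math
from typing import Sequence


def ppo_clipped_objective(
    logp_new: Sequence[float],
    logp_old: Sequence[float],
    advantages: Sequence[float],
    clip_eps: float = 0.2,  # illustrative clip range (assumption, not from this card)
) -> float:
    """Mean PPO clipped surrogate objective L^CLIP (to be maximized)."""
    if not advantages or not (len(logp_new) == len(logp_old) == len(advantages)):
        raise ValueError("inputs must be non-empty and the same length")
    total = 0.0
    for lp_new, lp_old, adv in zip(logp_new, logp_old, advantages):
        # Probability ratio r_t = pi_new(a|s) / pi_old(a|s), computed from log-probs.
        ratio = math.exp(lp_new - lp_old)
        # Clip the ratio to the interval [1 - eps, 1 + eps].
        clipped = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
        # Pessimistic bound: keep the smaller of the unclipped and clipped terms.
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)


def kl_penalized_reward(
    reward: float,
    logp_policy: Sequence[float],
    logp_ref: Sequence[float],
    beta: float = 0.05,  # illustrative KL coefficient (assumption, not from this card)
) -> float:
    """Reward-model score minus beta times a sampled KL estimate vs. the reference model."""
    # Sum of per-token log-ratios log pi(y_t) - log pi_ref(y_t) over the generated tokens.
    kl_estimate = sum(p - r for p, r in zip(logp_policy, logp_ref))
    return reward - beta * kl_estimate


# Same probability ratio (2.0), opposite advantage signs.
print(ppo_clipped_objective([math.log(2.0)], [0.0], [1.0]))
print(ppo_clipped_objective([math.log(2.0)], [0.0], [-1.0]))
print(kl_penalized_reward(1.0, [-1.0, -2.0], [-1.5, -2.5], beta=0.1))
```

With a ratio of 2 and `clip_eps=0.2`, a positive advantage contributes only 1.2 times the advantage, while a negative advantage contributes the full -2.0. Taking this pessimistic lower bound is what keeps each PPO update close to the previous policy.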