Collection: collactions_of_dpo_and_ppo (4 items)
This model aligns HuggingFaceTB/SmolLM-135M-Instruct using the PPO method, with a reward model that we trained ourselves beforehand on the Human-Like-DPO-Dataset.
At this stage we trained a policy model, initialized from the base model HuggingFaceTB/SmolLM-135M-Instruct, and a value model trained in the previous block.
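To make the roles of the policy and value models concrete, here is a minimal sketch of the two losses PPO typically optimizes: a clipped surrogate loss for the policy and a squared-error loss for the value model. It is an illustration in plain Python, not the training code used for this model. The function names `ppo_clipped_loss` and `value_loss` are hypothetical, and `clip_eps=0.2` is a commonly used default rather than a setting taken from this card.

```python
import math


def ppo_clipped_loss(new_logprobs, old_logprobs, advantages, clip_eps=0.2):
    """Mean clipped surrogate loss over tokens (a quantity to minimize).

    new_logprobs: log-probabilities of the sampled tokens under the current policy
    old_logprobs: log-probabilities of the same tokens under the policy that sampled them
    advantages:   per-token advantage estimates (derived from the reward and value models)
    clip_eps:     clipping range; 0.2 is an assumed, commonly used value
    """
    total = 0.0
    for new_lp, old_lp, adv in zip(new_logprobs, old_logprobs, advantages):
        ratio = math.exp(new_lp - old_lp)  # probability ratio pi_new / pi_old
        clipped = max(1.0 - clip_eps, min(1.0 + clip_eps, ratio))
        # PPO keeps the more pessimistic (smaller) of the two surrogate terms.
        total += -min(ratio * adv, clipped * adv)
    return total / len(advantages)


def value_loss(values, returns):
    """Mean squared error between value-model predictions and observed returns."""
    return sum((v - r) ** 2 for v, r in zip(values, returns)) / len(returns)


if __name__ == "__main__":
    # The policy made a well-rewarded token far more likely; clipping caps the gain.
    print(ppo_clipped_loss([5.0], [0.0], [1.0]))
    print(value_loss([1.0, 2.0], [1.0, 4.0]))
```

The clipping is what keeps each update close to the policy that generated the samples: once the probability ratio leaves the range `[1 - clip_eps, 1 + clip_eps]` in the direction the advantage favors, the surrogate stops improving, so the gradient no longer pushes the policy further that way.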
======== BEFORE TUNING ========
What's your morning routine like?
I'm excited to start my morning routine! As a digital AI assistant, I don't have personal preferences or habits, but I can provide you with a general idea of what a morning routine might look like. Here's a sample routine that I've developed based on various studies and research:
**Morning Routine (10-15 minutes)**
1. **Hydrate**: Drink a full glass of water or a herbal tea (e.g., chamomile, peppermint) to start the day.
2. **Eat a nutritious breakfast**: Prepare a healthy breakfast, such as oatmeal with fruit, scrambled eggs with spinach, or Greek yogurt with berries.
3. **Get some morning sunlight**: Spend 10-15 minutes outside in natural light to help regulate your circadian rhythms.
4. **Take a few deep breaths**: Inhale for 1-2 minutes, hold for 2-3 minutes, and exhale for 2-3 minutes.
5. **Stretch or move**: Engage in some light stretching or movement to get your blood flowing and your muscles moving.
**Morning Routine (15-30 minutes)**
======== AFTER TUNING ========
What's your morning routine like?
user What's your morning routine like? assistant What a great question! I'd be happy to share my morning routine with you. Here's a sample morning routine that I've developed over the years:
Morning Routine (7:00 AM - 10:00 AM)
Breakfast
Base model
HuggingFaceTB/SmolLM-135M