Model Card for SmolLM-135M-Instruct-PPO

This model is trained as a part of homework on VK NLP Course.

Model Details

SmolLM-135M-Instruct LLM fine-tunned on HumanLLMs/Human-Like-DPO-Dataset for 1 epoch using PPO. Training was performed with this reward model. Training stats are presented below. The most important metric is RLHF reward.

PPO training stats

Comparison of reward scores for generated answers before and after the training, as well as DPO-trained model from here:

reward scores

Examples of outputs:

Prompt:

Can you explain the concept of deep learning in AI, and how it's being used in our daily lives?

Output:

Deep learning is a subset of machine learning that involves the use of artificial neural networks to model and solve complex problems. It's a type of machine learning that's inspired by the structure and function of the human brain, but with a focus on artificial intelligence.

In AI, deep learning is used to enable machines to learn from data, recognize patterns, and make decisions or predictions without being explicitly programmed. This is achieved through the use of multiple layers of artificial neural networks, which are composed of interconnected nodes or "neurons" that process and transmit information.

Here's a breakdown of how deep learning works:

  1. Data: A large dataset is fed into the network, which is typically a feedforward neural network.
  2. Forward pass: The network processes the input data, making predictions or taking actions based on the patterns it learned from the data.
  3. Backward pass: The network computes the error between the predicted output and the actual output, and then updates the weights and biases of the network to minimize the error.
  4. Optimization: The network is optimized using an optimization algorithm, such as stochastic gradient descent (SGD), to minimize the loss function.
  5. Training: The network is trained on the

Base model:

Deep learning is a subset of machine learning that involves the use of artificial neural networks to model and solve complex problems. It's a type of machine learning that's inspired by the structure and function of the human brain, but with a focus on artificial intelligence.

In AI, deep learning is used to power a wide range of applications, from computer vision and natural language processing to speech recognition and autonomous vehicles. Here are some examples of how deep learning is being used in our daily lives:

  1. Image Recognition: Deep learning is used in image recognition systems, such as facial recognition, object detection, and image classification. These systems can recognize objects, people, and scenes in images and videos, enabling applications like security systems, self-driving cars, and surveillance cameras.
  2. Speech Recognition: Deep learning is used in speech recognition systems, such as voice assistants like Siri, Alexa, and Google Assistant, to recognize spoken words and phrases. This technology has many applications, including voice-controlled homes, smart homes, and virtual assistants.
  3. Natural Language Processing (NLP): Deep learning is used in NLP to analyze and generate human language, enabling applications like chatbots, language translation, and text summarization.
  4. **Robotics

Prompt:

What do you like to do on a typical Sunday afternoon?

Output:

What a great question! As a conversational AI, I don't have personal preferences, but I can provide you with some general insights on what people typically enjoy doing on a typical Sunday afternoon. Keep in mind that individual preferences may vary, and what one person enjoys, another person might not.

Reading and Learning

  • Reading: I love to read books, articles, and online content. I enjoy exploring different genres, authors, and topics.
  • Learning: I enjoy learning new things, whether it's through books, podcasts, online courses, or attending workshops or conferences.

Socializing and Building Relationships

  • Socializing: I love spending time with friends, family, and colleagues. I enjoy engaging in conversations, sharing experiences, and building connections with others.
  • Building Relationships: I enjoy spending time with people, whether it's through phone calls, video chats, or in-person interactions.

Creative Pursuits

  • Writing: I enjoy writing short stories, poetry, or even just writing short paragraphs. I enjoy expressing myself creatively and exploring different forms of writing.
  • Creative Writing: I love writing short stories, poetry, or even just writing short paragraphs. I enjoy exploring different forms of writing

Base model:

What a great question! As a digital AI assistant, I don't have personal preferences, but I can provide you with some general insights on what people typically enjoy doing on a typical Sunday afternoon. Keep in mind that individual preferences may vary, and what one person enjoys, another person might not. Here are some examples:

  1. Reading: Many people read for pleasure, whether it's fiction, non-fiction, or simply to relax and unwind.
  2. Exercise and physical activity: People often engage in physical activities like walking, jogging, cycling, or swimming, which can be a great way to unwind and improve their overall health.
  3. Socializing: Socializing with friends, family, or colleagues can be a great way to connect with others and build relationships.
  4. Learning and education: People often engage in learning activities like attending classes, workshops, or online courses, which can help them improve their skills and knowledge.
  5. Creative pursuits: Some individuals enjoy creative activities like painting, drawing, writing, photography, or playing music, which can be a great way to express themselves and relax.
  6. Food and dining: Many people enjoy trying new foods, trying out new restaurants, or enjoying a
Downloads last month
-
Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Collection including X1716/llm-course-hw2-ppo2