This model is a fine-tuned version of HuggingFaceTB/SmolLM-135M-Instruct, trained on the HumanLLMs/Human-Like-DPO-Dataset using TRL.
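Since the dataset is a DPO (Direct Preference Optimization) preference dataset, the fine-tuning objective is presumably the DPO loss. As a minimal, pure-Python sketch (the function name and the example log-probabilities below are illustrative, not taken from this training run):

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Per-example DPO loss: -log sigmoid(beta * (chosen margin - rejected margin)).

    Each argument is the summed log-probability of a response under the
    policy being trained or the frozen reference model.
    """
    chosen_margin = policy_chosen_logp - ref_chosen_logp
    rejected_margin = policy_rejected_logp - ref_rejected_logp
    logits = beta * (chosen_margin - rejected_margin)
    return -math.log(1.0 / (1.0 + math.exp(-logits)))

# No preference yet: policy matches the reference, loss = -log(0.5).
loss_neutral = dpo_loss(-10.0, -10.0, -10.0, -10.0)

# Policy now favors the chosen response relative to the reference: loss drops.
loss_aligned = dpo_loss(-8.0, -12.0, -10.0, -10.0)
```

In practice TRL's trainer computes these log-probabilities from the model and a frozen reference copy and averages this loss over the batch; the sketch only shows the scalar objective being minimized.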
Training results (final `TrainOutput`):

- global steps: 2448
- epochs: 4.0
- training loss: 0.0269
- train runtime: 819.23 s
- train samples/second: 47.801
- train steps/second: 2.988
- total FLOPs: 0.0 (not recorded)
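The reported throughput numbers are internally consistent, and an effective batch size can be inferred from them (this is a derived estimate, not a value stated in the training config):

```python
# Values copied from the reported TrainOutput above.
global_step = 2448
train_runtime_s = 819.2334
samples_per_second = 47.801
steps_per_second = 2.988
epochs = 4.0

# steps/second should match global_step / runtime.
implied_steps_per_second = global_step / train_runtime_s  # ~2.988

# samples per step implies an effective batch size
# (per-device batch size x gradient accumulation x device count).
effective_batch_size = samples_per_second / steps_per_second  # ~16

# Steps per epoch times batch size approximates the examples seen per epoch.
examples_per_epoch = (global_step / epochs) * effective_batch_size  # ~9.8k
```

This suggests roughly 16 samples per optimizer step and on the order of 10k preference pairs per epoch, in line with the size of the Human-Like-DPO-Dataset.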
[More Information Needed]