Model Card for xinyuema/llm-course-hw2-reward-model-module

Model Details

Model Description

This model is a fine-tuned version of HuggingFaceTB/SmolLM-135M-Instruct, trained on the HumanLLMs/Human-Like-DPO-Dataset using TRL.

Training Procedure

Training output reported by the trainer (TrainOutput):

global_step: 2448
training_loss: 0.026948539260166143
train_runtime: 819.2334 (seconds)
train_samples_per_second: 47.801
train_steps_per_second: 2.988
total_flos: 0.0
train_loss: 0.026948539260166143
epoch: 4.0
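The reported training figures can be cross-checked with a little arithmetic. The sketch below uses only the numbers from the training output. The effective batch size and the number of samples per epoch are not stated on this card; they are inferred here from the throughput figures and should be read as estimates (the effective batch size would include any gradient accumulation and multi-device scaling).

```python
# Values copied from the reported TrainOutput.
global_step = 2448
epochs = 4.0
train_runtime_s = 819.2334
samples_per_second = 47.801
steps_per_second = 2.988

# Optimizer steps taken in each epoch.
steps_per_epoch = global_step / epochs

# ASSUMPTION: effective batch size is inferred from the two throughput
# figures; it is not reported directly on the card.
effective_batch_size = round(samples_per_second / steps_per_second)

# ASSUMPTION: estimated training samples seen per epoch, derived from the
# inferred batch size (the last batch of an epoch may be smaller).
samples_per_epoch = steps_per_epoch * effective_batch_size

# Sanity check: steps / runtime should match the reported steps per second.
recomputed_steps_per_second = global_step / train_runtime_s

print(f"steps per epoch:          {steps_per_epoch:.0f}")
print(f"effective batch size:     {effective_batch_size}")
print(f"approx. samples per epoch: {samples_per_epoch:.0f}")
print(f"recomputed steps/s:       {recomputed_steps_per_second:.3f}")
```

Under these assumptions the run works out to 612 steps per epoch at an effective batch size of about 16, or roughly 9,792 samples per epoch, and the recomputed step rate agrees with the reported 2.988 steps per second.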

Safetensors
Model size: 0.1B params
Tensor type: BF16

Collection including xinyuema/llm-course-hw2-reward-model-module