mansurealism/working


mansurealism committed on Apr 5, 2025

Commit ddecfd6 · verified · 1 Parent(s): ec168a0

mansurealism/llm-course-hw2-reward-model

Files changed (2)

README.md +2 -2
training_args.bin +2 -2

README.md CHANGED

@@ -2,7 +2,7 @@
 base_model: mansurealism/llm-course-hw2-dpo
 datasets: HumanLLMs/Human-Like-DPO-Dataset
 library_name: transformers
-model_name: ''
+model_name: llm-course-hw2-reward-model
 tags:
 - generated_from_trainer
 - trl
@@ -10,7 +10,7 @@ tags:
 licence: license
 ---
-# Model Card for
+# Model Card for llm-course-hw2-reward-model
 This model is a fine-tuned version of [mansurealism/llm-course-hw2-dpo](https://huggingface.co/mansurealism/llm-course-hw2-dpo) on the [HumanLLMs/Human-Like-DPO-Dataset](https://huggingface.co/datasets/HumanLLMs/Human-Like-DPO-Dataset) dataset.
 It has been trained using [TRL](https://github.com/huggingface/trl).

training_args.bin CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:9bbd61fd09f228ad9dbdd389b9eab517cf7f6afc5ed15a3d981141c61cea0eee
-size 5368
+oid sha256:07dd56bff0945bed6ba90e670599b1fe2947ec1e417b6dd26d5f1cab5d5bfd10
+size 5432
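
The training_args.bin change above touches only a Git LFS pointer (a `version` line, an `oid sha256:` digest, and a `size` in bytes), not the binary itself. A minimal sketch of how such a pointer could be parsed and checked against a downloaded file, assuming the three-field layout shown in the diff; the helper names `parse_lfs_pointer` and `matches_pointer` are hypothetical and are not part of Git LFS or the Hugging Face Hub tooling:

```python
import hashlib

# Pointer text for the new training_args.bin, copied from the diff above.
NEW_POINTER = """version https://git-lfs.github.com/spec/v1
oid sha256:07dd56bff0945bed6ba90e670599b1fe2947ec1e417b6dd26d5f1cab5d5bfd10
size 5432
"""


def parse_lfs_pointer(text: str) -> dict:
    """Parse 'key value' lines of an LFS pointer (hypothetical helper)."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    # The oid field has the form '<algorithm>:<hex digest>'.
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(data: bytes, pointer: dict) -> bool:
    """Return True if data has the size and SHA-256 digest the pointer records."""
    if pointer["algo"] != "sha256":
        raise ValueError(f"unsupported hash algorithm: {pointer['algo']}")
    same_size = len(data) == pointer["size"]
    same_hash = hashlib.sha256(data).hexdigest() == pointer["digest"]
    return same_size and same_hash


pointer = parse_lfs_pointer(NEW_POINTER)
```

With a local copy of the file, `matches_pointer(open("training_args.bin", "rb").read(), pointer)` would indicate whether that copy corresponds to this commit's version; the file path here is an assumption about where the download was saved.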