mansurealism committed on
Commit ddecfd6 · verified · 1 Parent(s): ec168a0

mansurealism/llm-course-hw2-reward-model

Files changed (2)
  1. README.md +2 -2
  2. training_args.bin +2 -2
README.md CHANGED
@@ -2,7 +2,7 @@
  base_model: mansurealism/llm-course-hw2-dpo
  datasets: HumanLLMs/Human-Like-DPO-Dataset
  library_name: transformers
- model_name: ''
+ model_name: llm-course-hw2-reward-model
  tags:
  - generated_from_trainer
  - trl
@@ -10,7 +10,7 @@ tags:
  licence: license
  ---
 
- # Model Card for
+ # Model Card for llm-course-hw2-reward-model
 
  This model is a fine-tuned version of [mansurealism/llm-course-hw2-dpo](https://huggingface.co/mansurealism/llm-course-hw2-dpo) on the [HumanLLMs/Human-Like-DPO-Dataset](https://huggingface.co/datasets/HumanLLMs/Human-Like-DPO-Dataset) dataset.
  It has been trained using [TRL](https://github.com/huggingface/trl).
training_args.bin CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:9bbd61fd09f228ad9dbdd389b9eab517cf7f6afc5ed15a3d981141c61cea0eee
- size 5368
+ oid sha256:07dd56bff0945bed6ba90e670599b1fe2947ec1e417b6dd26d5f1cab5d5bfd10
+ size 5432
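
The training_args.bin change above touches only a Git LFS pointer (version, oid, size), not the binary itself. As a rough illustration of how such a pointer can be read and checked against a downloaded file, here is a minimal Python sketch. The helper names `parse_lfs_pointer` and `matches_pointer` are hypothetical and not part of this repository or of any Git LFS tooling; the sketch assumes the three-line pointer layout shown in the diff.

```python
import hashlib


def parse_lfs_pointer(text: str) -> dict:
    """Parse the 'key value' lines of a Git LFS pointer file (hypothetical helper)."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    # The oid line has the form '<algorithm>:<hex digest>'.
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(data: bytes, pointer: dict) -> bool:
    """Return True if the bytes have the size and SHA-256 digest recorded in the pointer."""
    return (
        len(data) == pointer["size"]
        and hashlib.sha256(data).hexdigest() == pointer["digest"]
    )


# Pointer contents copied from the new side of the diff above.
NEW_POINTER = """version https://git-lfs.github.com/spec/v1
oid sha256:07dd56bff0945bed6ba90e670599b1fe2947ec1e417b6dd26d5f1cab5d5bfd10
size 5432
"""

pointer = parse_lfs_pointer(NEW_POINTER)
print(pointer["algo"], pointer["size"])
```

To check a local copy, one could read the file in binary mode and pass the bytes to `matches_pointer`; a mismatch in size or digest would suggest the file is not the one this commit refers to.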