---
license: mit
datasets:
- LifelongAlignment/aifgen-piecewise-preference-shift
language:
- en
metrics:
- accuracy
base_model:
- Qwen/Qwen2.5-0.5B-Instruct
pipeline_tag: reinforcement-learning
tags:
- reward-modeling
---

# Model Card: Qwen2.5-0.5B Reward Model for AIF-Gen Piecewise Preference Shift

<!-- Provide a quick summary of what the model is/does. -->

This is a reward model fine-tuned from [Qwen/Qwen2.5-0.5B-Instruct](https://huggingface.co/Qwen/Qwen2.5-0.5B-Instruct) on the [AIF-Gen piecewise preference shift](https://huggingface.co/datasets/LifelongAlignment/aifgen-piecewise-preference-shift) dataset using TRL.
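
A minimal usage sketch for scoring a prompt/response pair. This is not an official snippet from the authors: the model id passed below is a placeholder assumption (substitute this repository's actual Hub id), and the helper names `load_reward_model` and `score` are illustrative.

```python
# A minimal usage sketch (not an official snippet). The model id used in
# __main__ is a placeholder; substitute this repository's actual Hub id.
def load_reward_model(model_id: str):
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    # Reward models trained with TRL expose a single-logit classification head.
    model = AutoModelForSequenceClassification.from_pretrained(model_id, num_labels=1)
    model.eval()
    return tokenizer, model


def score(tokenizer, model, prompt: str, response: str) -> float:
    import torch

    # The scalar reward is the logit of the sequence-classification head.
    inputs = tokenizer(prompt + response, return_tensors="pt", truncation=True)
    with torch.no_grad():
        return model(**inputs).logits[0, 0].item()


if __name__ == "__main__":
    tok, rm = load_reward_model("LifelongAlignment/placeholder-model-id")  # placeholder
    print(score(tok, rm, "What is RLHF? ", "Reinforcement learning from human feedback."))
```

Higher scores indicate responses the model judges as better aligned with the dataset's preferences.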

## Model Details

### Model Description

<!-- Provide a longer summary of what this model is. -->
Training was performed on 8 A100 GPUs for one epoch using full fine-tuning.


- **Developed by:** LifelongAlignment team and the Complex Data Lab
- **Model type:** Large Language Model - Transformer
- **Language(s) (NLP):** English
- **License:** MIT
- **Finetuned from model:** [Qwen/Qwen2.5-0.5B-Instruct](https://huggingface.co/Qwen/Qwen2.5-0.5B-Instruct)

## Uses

<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->

This model is intended for benchmarking RLHF methods in both static and lifelong learning scenarios. TODO: link the paper.

### Direct Use

<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->

Refer to [Uses](#uses).


### Out-of-Scope Use

<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->

As noted in the AIF-Gen dataset cards, the synthetic training data may contain hallucinations; be aware of this if you use this reward model to train agents for deployment.

## Bias, Risks, and Limitations

<!-- This section is meant to convey both technical and sociotechnical limitations. -->

The main known risk is the hallucination issue described under Out-of-Scope Use.

## Training Details

### Training Procedure

<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
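
The training described above (one epoch of full fine-tuning on the AIF-Gen dataset with TRL) can be sketched roughly as follows. This is a hedged reconstruction, not the documented recipe: every hyperparameter other than the epoch count, and the `output_dir` name, is an illustrative assumption.

```python
# Hedged reproduction sketch, not the documented recipe. Only the base model,
# dataset, and "one epoch" come from this card; everything else is assumed.
def make_config(output_dir: str = "qwen2.5-0.5b-reward"):
    from trl import RewardConfig

    # One epoch of full fine-tuning, as stated in the Model Description.
    return RewardConfig(output_dir=output_dir, num_train_epochs=1)


def main() -> None:
    from datasets import load_dataset
    from transformers import AutoModelForSequenceClassification, AutoTokenizer
    from trl import RewardTrainer

    base = "Qwen/Qwen2.5-0.5B-Instruct"
    tokenizer = AutoTokenizer.from_pretrained(base)
    # A single-logit head turns the causal LM backbone into a reward model.
    model = AutoModelForSequenceClassification.from_pretrained(base, num_labels=1)
    dataset = load_dataset("LifelongAlignment/aifgen-piecewise-preference-shift")

    trainer = RewardTrainer(
        model=model,
        args=make_config(),
        processing_class=tokenizer,
        train_dataset=dataset["train"],
    )
    trainer.train()


if __name__ == "__main__":
    main()
```

In practice the run above would be launched with a multi-GPU launcher such as `accelerate launch` to match the 8-GPU setup.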

#### Preprocessing [optional]

[More Information Needed]


#### Training Hyperparameters

- **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->

#### Speeds, Sizes, Times [optional]

<!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->

[More Information Needed]

## Evaluation

<!-- This section describes the evaluation protocols and provides the results. -->

### Testing Data, Factors & Metrics

#### Testing Data

<!-- This should link to a Dataset Card if possible. -->

[More Information Needed]

#### Factors

<!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->

[More Information Needed]

#### Metrics

<!-- These are the evaluation metrics being used, ideally with a description of why. -->

[More Information Needed]

### Results

[More Information Needed]

#### Summary


## Model Examination [optional]

<!-- Relevant interpretability work for the model goes here -->

[More Information Needed]

## Environmental Impact

<!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->

Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).

- **Hardware Type:** [More Information Needed]
- **Hours used:** [More Information Needed]
- **Cloud Provider:** [More Information Needed]
- **Compute Region:** [More Information Needed]
- **Carbon Emitted:** [More Information Needed]

## Technical Specifications [optional]

### Model Architecture and Objective

[More Information Needed]

### Compute Infrastructure

[More Information Needed]

#### Hardware

[More Information Needed]

#### Software

[More Information Needed]

## Citation [optional]

<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->

**BibTeX:**

[More Information Needed]

**APA:**

[More Information Needed]

## Glossary [optional]

<!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->

[More Information Needed]

## More Information [optional]

[More Information Needed]

## Model Card Authors [optional]

[More Information Needed]

## Model Card Contact

[More Information Needed]