Update README.md
README.md
@@ -134,6 +134,11 @@ The reward function used throughout this phase was very simple:

The initial training phase was run for 3 epochs with:

* batch size = 16
* number of generations = 4
* learning rate = 1e-5
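For concreteness, the first-phase hyperparameters above can be gathered into a small config sketch. The class and helper names here are illustrative, not from the repo, and the step-count helper assumes one optimizer step per batch of prompts:

```python
from dataclasses import dataclass


@dataclass
class PhaseOneConfig:
    """Hyperparameters of the initial training phase, as listed in the README."""
    num_epochs: int = 3
    batch_size: int = 16
    num_generations: int = 4   # completions sampled per prompt
    learning_rate: float = 1e-5


def steps_per_epoch(num_prompts: int, cfg: PhaseOneConfig) -> int:
    # Hypothetical helper: optimizer steps per epoch for a dataset of
    # `num_prompts`, assuming one step per full batch (remainder dropped).
    return num_prompts // cfg.batch_size
```

With these values, each prompt in a batch contributes 4 sampled generations, so one optimizer step scores 16 × 4 = 64 completions.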
@@ -143,6 +148,10 @@ The initial training phase was run for 3 epochs with:

A second phase followed, resetting the learning rate to `1e-6` with a linear decay schedule.
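A linear decay schedule of this kind can be sketched as a simple function of the step index. The function name and the `total_steps` convention are assumptions for illustration, not the repo's actual trainer code:

```python
def linear_decay_lr(step: int, total_steps: int, base_lr: float = 1e-6) -> float:
    # Linearly anneal from base_lr at step 0 down to 0 at total_steps,
    # clamping so the rate never goes negative past the end.
    return base_lr * max(0.0, 1.0 - step / total_steps)
```

At step 0 this returns the reset rate of `1e-6`; halfway through training it has fallen to `5e-7`, reaching 0 at the final step.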