pere committed
Commit de243ef · 1 Parent(s): 56d2e7f

Saving weights and logs of step 10000
README.md CHANGED
@@ -1 +1,5 @@
  Just for performing some experiments. Do not use.
+
+ Since the loss seemed to start going up, I had to restore this from 9e945cb0636bde60bec30bd7df5db30f80401cc7 (2 step 600k/200). I am then restarting with warmup, decaying from 1e-4.
+
+
events.out.tfevents.1641666808.t1v-n-ccbf3e94-w-0.792149.3.v2 ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:d234277a4579128e139aa241541106f29e30dcc392c9c20d1bd8d82562b59f51
+ size 1470136
flax_model.msgpack CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:df383e5d2aa55f3e13fcc3dbd076bce1372f2e42325790161c80e6f2d5ff7ac3
+ oid sha256:ea8be818d6a6d1fed62fd1200f35b6f2dc8f7d297717334b26e5a3e2a0c25aca
  size 498796983
run_step3.sh CHANGED
@@ -13,7 +13,7 @@
  --per_device_eval_batch_size="40" \
  --learning_rate="1e-4" \
  --end_learning_rate="5e-3" \
- --warmup_steps="0" \
+ --warmup_steps="10000" \
  --overwrite_output_dir \
  --num_train_epochs="2" \
  --adam_beta1="0.9" \
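The flag change above turns on a 10,000-step warmup before the learning rate moves from `--learning_rate` toward `--end_learning_rate`. A minimal sketch of such a warmup-then-linear-decay schedule is below; the rates and warmup length mirror the flags in run_step3.sh, but `total_steps` is a placeholder assumption (the actual training length is not stated in this commit), and the function name is hypothetical, not part of the training script.

```python
def lr_schedule(step: int,
                peak_lr: float = 1e-4,       # --learning_rate
                end_lr: float = 5e-3,        # --end_learning_rate
                warmup_steps: int = 10_000,  # --warmup_steps
                total_steps: int = 600_000): # assumed, not from the script
    """Linear warmup from 0 to peak_lr, then linear interpolation to end_lr."""
    if step < warmup_steps:
        # Ramp up proportionally during warmup.
        return peak_lr * step / warmup_steps
    # After warmup, interpolate linearly toward the end learning rate.
    frac = min(1.0, (step - warmup_steps) / (total_steps - warmup_steps))
    return peak_lr + frac * (end_lr - peak_lr)
```

In a Flax training loop this kind of schedule is usually built with optax helpers (e.g. `optax.linear_schedule` joined via `optax.join_schedules`); the plain function above just makes the shape of the curve explicit.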