versae committed
Commit 821784a · 1 Parent(s): 40cc04e

Step... (14000/50000 | Loss: 1.7139594554901123, Acc: 0.6574689745903015): 29%|███████▋ | 14350/50000 [5:33:22<15:08:11, 1.53s/it]

Files changed (34)
  1. .gitattributes +3 -0
  2. flax_model.msgpack +1 -1
  3. outputs/checkpoints/{checkpoint-7000 → checkpoint-12000}/config.json +0 -0
  4. outputs/checkpoints/{checkpoint-7000 → checkpoint-12000}/data_collator.joblib +0 -0
  5. outputs/checkpoints/{checkpoint-7000 → checkpoint-12000}/flax_model.msgpack +1 -1
  6. outputs/checkpoints/{checkpoint-9000 → checkpoint-12000}/optimizer_state.msgpack +1 -1
  7. outputs/checkpoints/{checkpoint-7000 → checkpoint-12000}/training_args.joblib +0 -0
  8. outputs/checkpoints/checkpoint-12000/training_state.json +1 -0
  9. outputs/checkpoints/{checkpoint-8000 → checkpoint-13000}/config.json +0 -0
  10. outputs/checkpoints/{checkpoint-8000 → checkpoint-13000}/data_collator.joblib +0 -0
  11. outputs/checkpoints/{checkpoint-9000 → checkpoint-13000}/flax_model.msgpack +1 -1
  12. outputs/checkpoints/{checkpoint-7000 → checkpoint-13000}/optimizer_state.msgpack +1 -1
  13. outputs/checkpoints/{checkpoint-8000 → checkpoint-13000}/training_args.joblib +0 -0
  14. outputs/checkpoints/checkpoint-13000/training_state.json +1 -0
  15. outputs/checkpoints/{checkpoint-9000 → checkpoint-14000}/config.json +0 -0
  16. outputs/checkpoints/{checkpoint-9000 → checkpoint-14000}/data_collator.joblib +0 -0
  17. outputs/checkpoints/{checkpoint-8000 → checkpoint-14000}/flax_model.msgpack +1 -1
  18. outputs/checkpoints/{checkpoint-8000 → checkpoint-14000}/optimizer_state.msgpack +1 -1
  19. outputs/checkpoints/{checkpoint-9000 → checkpoint-14000}/training_args.joblib +0 -0
  20. outputs/checkpoints/checkpoint-14000/training_state.json +1 -0
  21. outputs/checkpoints/checkpoint-7000/training_state.json +0 -1
  22. outputs/checkpoints/checkpoint-8000/training_state.json +0 -1
  23. outputs/checkpoints/checkpoint-9000/training_state.json +0 -1
  24. outputs/events.out.tfevents.1627258355.tablespoon.3000110.3.v2 +2 -2
  25. outputs/flax_model.msgpack +1 -1
  26. outputs/optimizer_state.msgpack +1 -1
  27. outputs/training_state.json +1 -1
  28. pytorch_model.bin +1 -1
  29. run_stream.512.log +0 -0
  30. wandb/run-20210726_001233-17u6inbn/files/output.log +1732 -0
  31. wandb/run-20210726_001233-17u6inbn/files/wandb-summary.json +1 -1
  32. wandb/run-20210726_001233-17u6inbn/logs/debug-internal.log +0 -0
  33. wandb/run-20210726_001233-17u6inbn/logs/debug.log +3 -27
  34. wandb/run-20210726_001233-17u6inbn/run-17u6inbn.wandb +0 -0
.gitattributes CHANGED
@@ -15,3 +15,6 @@
 *.pt filter=lfs diff=lfs merge=lfs -text
 *.pth filter=lfs diff=lfs merge=lfs -text
 *tfevents* filter=lfs diff=lfs merge=lfs -text
+*.wandb filter=lfs diff=lfs merge=lfs -text
+debug.log filter=lfs diff=lfs merge=lfs -text
+debug-internal.log filter=lfs diff=lfs merge=lfs -text
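These three new rules route the wandb run file and both debug logs through Git LFS; the usual way to add them is `git lfs track <pattern>`, which appends exactly such lines. A rough sketch of how the new patterns match paths (simplified relative to full gitattributes semantics; the helper name is made up):

```python
# Simplified illustration of the newly tracked patterns; real gitattributes
# matching has more rules than fnmatch covers. Helper name is hypothetical.
from fnmatch import fnmatch

lfs_patterns = ["*.wandb", "debug.log", "debug-internal.log"]

def routed_through_lfs(path):
    # Patterns without a slash match the basename in any directory.
    name = path.rsplit("/", 1)[-1]
    return any(fnmatch(name, pat) for pat in lfs_patterns)

print(routed_through_lfs("wandb/run-20210726_001233-17u6inbn/run-17u6inbn.wandb"))  # True
print(routed_through_lfs("outputs/training_state.json"))                            # False
```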
flax_model.msgpack CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:165a80d10b493e4117c19ffeb7cbc1d340e88d14e329eb9be3ab1d32f050f973
+oid sha256:40b18e55e7e0e173646f5693cf8c145dd0ec756f12776cb671210c598dafdb45
 size 249750019
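Each LFS-tracked file lives in the repository as a three-line pointer (version, sha256 oid, byte size); the msgpack diffs in this commit only swap the oid, since the tensors changed but the serialized size did not. A minimal sketch of reading such a pointer from a local checkout (hypothetical helper, not code from this repository):

```python
# Parse a Git LFS pointer file of the form shown above:
#   version https://git-lfs.github.com/spec/v1
#   oid sha256:<hex digest>
#   size <bytes>
# Hypothetical helper, not part of this repository.
def parse_lfs_pointer(path):
    fields = {}
    with open(path) as fh:
        for line in fh:
            key, _, value = line.strip().partition(" ")
            if key:
                fields[key] = value
    return fields

pointer = parse_lfs_pointer("flax_model.msgpack")
print(pointer["oid"], int(pointer["size"]))  # e.g. sha256:40b18e... 249750019
```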
outputs/checkpoints/{checkpoint-7000 → checkpoint-12000}/config.json RENAMED
File without changes
outputs/checkpoints/{checkpoint-7000 → checkpoint-12000}/data_collator.joblib RENAMED
File without changes
outputs/checkpoints/{checkpoint-7000 → checkpoint-12000}/flax_model.msgpack RENAMED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:353e62a7bbf3b5817b869c37e749c8e30fe14477d32a3cf95345a030057ed760
+oid sha256:a4df1917f93cb5be75e1a67299b85e14508ce6d594537be9e03fa1ea0d5c451b
 size 249750019
outputs/checkpoints/{checkpoint-9000 → checkpoint-12000}/optimizer_state.msgpack RENAMED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:2085e2cdeca180d85963536b92e396dad244a1a40804023af28d868e886658c8
+oid sha256:d5ef9d9909e0225cdfdb08ba23fd64c8a8a881103ca5b932bc2206768a7e920b
 size 499500278
outputs/checkpoints/{checkpoint-7000 → checkpoint-12000}/training_args.joblib RENAMED
File without changes
outputs/checkpoints/checkpoint-12000/training_state.json ADDED
@@ -0,0 +1 @@
+{"step": 12001}
outputs/checkpoints/{checkpoint-8000 → checkpoint-13000}/config.json RENAMED
File without changes
outputs/checkpoints/{checkpoint-8000 → checkpoint-13000}/data_collator.joblib RENAMED
File without changes
outputs/checkpoints/{checkpoint-9000 → checkpoint-13000}/flax_model.msgpack RENAMED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:55484b434d505ef7284a42471c8326f9bebe13561d6cbe478c61990f9fd7a04d
+oid sha256:7781249560c15a41eb883214ab5f6613f40b42c1ae0886c52a020bbfa19f76fb
 size 249750019
outputs/checkpoints/{checkpoint-7000 → checkpoint-13000}/optimizer_state.msgpack RENAMED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:0cd67c6ccf30e42fa238a68d1aa1ae063e8e11fc6c50bf034163444ab3f91118
+oid sha256:05c37a1e738b919e689e3c653244d8a680235541f5d91c99fb41edd65340a91d
 size 499500278
outputs/checkpoints/{checkpoint-8000 → checkpoint-13000}/training_args.joblib RENAMED
File without changes
outputs/checkpoints/checkpoint-13000/training_state.json ADDED
@@ -0,0 +1 @@
+{"step": 13001}
outputs/checkpoints/{checkpoint-9000 → checkpoint-14000}/config.json RENAMED
File without changes
outputs/checkpoints/{checkpoint-9000 → checkpoint-14000}/data_collator.joblib RENAMED
File without changes
outputs/checkpoints/{checkpoint-8000 → checkpoint-14000}/flax_model.msgpack RENAMED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:fcd1e001a114c411bab4cde0ffdf4e4bc13e918b2c1c3cac7a75100e5a3f0349
+oid sha256:40b18e55e7e0e173646f5693cf8c145dd0ec756f12776cb671210c598dafdb45
 size 249750019
outputs/checkpoints/{checkpoint-8000 → checkpoint-14000}/optimizer_state.msgpack RENAMED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:7070a9b0eb3c596cc8b7f538faa458611e2d751b69600e272ec31b7c5c1bbc82
+oid sha256:c3b657f7303349384c5ab4bd1d5226d2f8dbc1b641fc9355b1d5d4d2825ce382
 size 499500278
outputs/checkpoints/{checkpoint-9000 → checkpoint-14000}/training_args.joblib RENAMED
File without changes
outputs/checkpoints/checkpoint-14000/training_state.json ADDED
@@ -0,0 +1 @@
+{"step": 14001}
outputs/checkpoints/checkpoint-7000/training_state.json DELETED
@@ -1 +0,0 @@
-{"step": 7001}
 
 
outputs/checkpoints/checkpoint-8000/training_state.json DELETED
@@ -1 +0,0 @@
-{"step": 8001}
 
 
outputs/checkpoints/checkpoint-9000/training_state.json DELETED
@@ -1 +0,0 @@
-{"step": 9001}
 
 
outputs/events.out.tfevents.1627258355.tablespoon.3000110.3.v2 CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:0526c59728161c425d35ed5858dbea479f8f85f54dacc020da0bd7b01b4c8862
-size 1693554
+oid sha256:4576e5515e6cf1926a9625e2db3778c06552a27d5a56f4b306bfdc6dec02245d
+size 2061819
outputs/flax_model.msgpack CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:165a80d10b493e4117c19ffeb7cbc1d340e88d14e329eb9be3ab1d32f050f973
+oid sha256:40b18e55e7e0e173646f5693cf8c145dd0ec756f12776cb671210c598dafdb45
 size 249750019
outputs/optimizer_state.msgpack CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:17907ad9f925f7ff5210c836be64cf4f0b87dea575a17582ec3bce13447deb03
+oid sha256:c3b657f7303349384c5ab4bd1d5226d2f8dbc1b641fc9355b1d5d4d2825ce382
 size 499500278
outputs/training_state.json CHANGED
@@ -1 +1 @@
-{"step": 11001}
+{"step": 14001}
pytorch_model.bin CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:e31df557709db351ba2444f6f63f0229f00d2662c5659a587f36135034e99a59
+oid sha256:f0290d7d4fc3d31d587881870f70299d2262836ee8bad199236e57b27fd504a0
 size 498858859
run_stream.512.log CHANGED
The diff for this file is too large to render. See raw diff
 
wandb/run-20210726_001233-17u6inbn/files/output.log CHANGED
@@ -7731,6 +7731,1738 @@ You should probably TRAIN this model on a down-stream task to be able to use it
[The context lines and nearly all of the 1,738 added lines are blank tqdm progress-bar redraws; only the informative added lines are kept below.]
+Step... (11000/50000 | Loss: 1.7415039539337158, Acc: 0.6532756686210632): 24%|██████▍ | 12000/50000 [4:36:38<14:37:46, 1.39s/it]
+Step... (11500 | Loss: 1.8508110046386719, Learning Rate: 0.0004666667082346976)
+Step... (11000/50000 | Loss: 1.7415039539337158, Acc: 0.6532756686210632): 24%|██████▍ | 12000/50000 [4:36:40<14:37:46, 1.39s/it]
+[06:45:03] - INFO - __main__ - Saving checkpoint at 12000 steps█████████████████████████████████████████████████████| 130/130 [00:21<00:00, 4.60it/s]
+All Flax model weights were used when initializing RobertaForMaskedLM.
+Some weights of RobertaForMaskedLM were not initialized from the Flax model and are newly initialized: ['lm_head.decoder.weight', 'roberta.embeddings.position_ids', 'lm_head.decoder.bias']
+You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
+Step... (12000/50000 | Loss: 1.7264103889465332, Acc: 0.6554967761039734): 26%|███████ | 13000/50000 [5:00:28<12:23:03, 1.20s/it]
+Step... (12500 | Loss: 1.8441736698150635, Learning Rate: 0.00045454545761458576)
+Step... (12000/50000 | Loss: 1.7264103889465332, Acc: 0.6554967761039734): 26%|███████ | 13000/50000 [5:00:30<12:23:03, 1.20s/it]
+[07:08:53] - INFO - __main__ - Saving checkpoint at 13000 steps█████████████████████████████████████████████████████| 130/130 [00:21<00:00, 4.60it/s]
+All Flax model weights were used when initializing RobertaForMaskedLM.
+Some weights of RobertaForMaskedLM were not initialized from the Flax model and are newly initialized: ['lm_head.decoder.weight', 'roberta.embeddings.position_ids', 'lm_head.decoder.bias']
+You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
+Step... (13000/50000 | Loss: 1.725870966911316, Acc: 0.6557744741439819): 28%|███████▊ | 14000/50000 [5:24:30<15:07:56, 1.51s/it]
+Step... (13500 | Loss: 1.8221518993377686, Learning Rate: 0.0004424242360983044)
+Step... (14000 | Loss: 1.7394559383392334, Learning Rate: 0.0004363636835478246)
+[07:32:53] - INFO - __main__ - Saving checkpoint at 14000 steps█████████████████████████████████████████████████████| 130/130 [00:21<00:00, 4.59it/s]
+All Flax model weights were used when initializing RobertaForMaskedLM.
+Some weights of RobertaForMaskedLM were not initialized from the Flax model and are newly initialized: ['lm_head.decoder.weight', 'roberta.embeddings.position_ids', 'lm_head.decoder.bias']
+You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
wandb/run-20210726_001233-17u6inbn/files/wandb-summary.json CHANGED
@@ -1 +1 @@
-{"global_step": 11500, "_timestamp": 1627281196.535555, "train_time": 349233.125, "train_learning_rate": 0.0004666667082346976, "_step": 22931, "train_loss": 2.128620147705078, "eval_accuracy": 0.6532756686210632, "eval_loss": 1.7415039539337158}
+{"global_step": 14000, "_timestamp": 1627284744.890835, "train_time": 473944.03125, "train_learning_rate": 0.0004363636835478246, "_step": 27916, "train_loss": 1.866248369216919, "eval_accuracy": 0.6557744741439819, "eval_loss": 1.725870966911316}
wandb/run-20210726_001233-17u6inbn/logs/debug-internal.log CHANGED
The diff for this file is too large to render. See raw diff
 
wandb/run-20210726_001233-17u6inbn/logs/debug.log CHANGED
@@ -1,27 +1,3 @@
-2021-07-26 00:12:33,307 INFO MainThread:3000110 [wandb_setup.py:_flush():69] setting env: {}
-2021-07-26 00:12:33,307 INFO MainThread:3000110 [wandb_setup.py:_flush():69] setting login settings: {}
-2021-07-26 00:12:33,307 INFO MainThread:3000110 [wandb_init.py:_log_setup():337] Logging user logs to /var/hf/experiment-base-exp-512seq-stepwise/wandb/run-20210726_001233-17u6inbn/logs/debug.log
-2021-07-26 00:12:33,307 INFO MainThread:3000110 [wandb_init.py:_log_setup():338] Logging internal logs to /var/hf/experiment-base-exp-512seq-stepwise/wandb/run-20210726_001233-17u6inbn/logs/debug-internal.log
-2021-07-26 00:12:33,307 INFO MainThread:3000110 [wandb_init.py:init():370] calling init triggers
-2021-07-26 00:12:33,307 INFO MainThread:3000110 [wandb_init.py:init():375] wandb.init called with sweep_config: {}
-config: {}
-2021-07-26 00:12:33,307 INFO MainThread:3000110 [wandb_init.py:init():419] starting backend
-2021-07-26 00:12:33,307 INFO MainThread:3000110 [backend.py:_multiprocessing_setup():70] multiprocessing start_methods=fork,spawn,forkserver, using: spawn
-2021-07-26 00:12:33,351 INFO MainThread:3000110 [backend.py:ensure_launched():135] starting backend process...
-2021-07-26 00:12:33,394 INFO MainThread:3000110 [backend.py:ensure_launched():139] started backend process with pid: 3001431
-2021-07-26 00:12:33,396 INFO MainThread:3000110 [wandb_init.py:init():424] backend started and connected
-2021-07-26 00:12:33,399 INFO MainThread:3000110 [wandb_init.py:init():472] updated telemetry
-2021-07-26 00:12:33,400 INFO MainThread:3000110 [wandb_init.py:init():491] communicating current version
-2021-07-26 00:12:34,050 INFO MainThread:3000110 [wandb_init.py:init():496] got version response upgrade_message: "wandb version 0.11.0 is available! To upgrade, please run:\n $ pip install wandb --upgrade"
-
-2021-07-26 00:12:34,050 INFO MainThread:3000110 [wandb_init.py:init():504] communicating run to backend with 30 second timeout
-2021-07-26 00:12:34,261 INFO MainThread:3000110 [wandb_init.py:init():529] starting run threads in backend
-2021-07-26 00:12:35,502 INFO MainThread:3000110 [wandb_run.py:_console_start():1623] atexit reg
-2021-07-26 00:12:35,502 INFO MainThread:3000110 [wandb_run.py:_redirect():1497] redirect: SettingsConsole.REDIRECT
-2021-07-26 00:12:35,503 INFO MainThread:3000110 [wandb_run.py:_redirect():1502] Redirecting console.
-2021-07-26 00:12:35,505 INFO MainThread:3000110 [wandb_run.py:_redirect():1558] Redirects installed.
-2021-07-26 00:12:35,505 INFO MainThread:3000110 [wandb_init.py:init():554] run started, returning control to user process
-2021-07-26 00:12:35,506 INFO MainThread:3000110 [wandb_run.py:_config_callback():872] config_cb None None {'output_dir': './outputs', 'overwrite_output_dir': True, 'do_train': False, 'do_eval': False, 'do_predict': False, 'evaluation_strategy': 'IntervalStrategy.NO', 'prediction_loss_only': False, 'per_device_train_batch_size': 48, 'per_device_eval_batch_size': 48, 'per_gpu_train_batch_size': None, 'per_gpu_eval_batch_size': None, 'gradient_accumulation_steps': 1, 'eval_accumulation_steps': None, 'learning_rate': 0.0006, 'weight_decay': 0.01, 'adam_beta1': 0.9, 'adam_beta2': 0.98, 'adam_epsilon': 1e-06, 'max_grad_norm': 1.0, 'num_train_epochs': 3.0, 'max_steps': -1, 'lr_scheduler_type': 'SchedulerType.LINEAR', 'warmup_ratio': 0.0, 'warmup_steps': 500, 'log_level': -1, 'log_level_replica': -1, 'log_on_each_node': True, 'logging_dir': './outputs/runs/Jul26_00-12-25_tablespoon', 'logging_strategy': 'IntervalStrategy.STEPS', 'logging_first_step': False, 'logging_steps': 500, 'save_strategy': 'IntervalStrategy.STEPS', 'save_steps': 1000, 'save_total_limit': 5, 'save_on_each_node': False, 'no_cuda': False, 'seed': 42, 'fp16': False, 'fp16_opt_level': 'O1', 'fp16_backend': 'auto', 'fp16_full_eval': False, 'local_rank': -1, 'tpu_num_cores': None, 'tpu_metrics_debug': False, 'debug': [], 'dataloader_drop_last': False, 'eval_steps': 1000, 'dataloader_num_workers': 0, 'past_index': -1, 'run_name': './outputs', 'disable_tqdm': False, 'remove_unused_columns': True, 'label_names': None, 'load_best_model_at_end': False, 'metric_for_best_model': None, 'greater_is_better': None, 'ignore_data_skip': False, 'sharded_ddp': [], 'deepspeed': None, 'label_smoothing_factor': 0.0, 'adafactor': False, 'group_by_length': False, 'length_column_name': 'length', 'report_to': ['tensorboard', 'wandb'], 'ddp_find_unused_parameters': None, 'dataloader_pin_memory': True, 'skip_memory_metrics': True, 'use_legacy_prediction_loop': False, 'push_to_hub': False, 'resume_from_checkpoint': None, 'push_to_hub_model_id': 'outputs', 'push_to_hub_organization': None, 'push_to_hub_token': None, 'mp_parameters': '', '_n_gpu': 0, '__cached__setup_devices': 'cpu'}
-2021-07-26 00:12:35,507 INFO MainThread:3000110 [wandb_run.py:_config_callback():872] config_cb None None {'model_name_or_path': 'bertin-project/bertin-base-stepwise', 'model_type': 'roberta', 'config_name': './configs/base', 'tokenizer_name': './configs/base', 'cache_dir': None, 'use_fast_tokenizer': True, 'dtype': 'bfloat16'}
-2021-07-26 00:12:35,508 INFO MainThread:3000110 [wandb_run.py:_config_callback():872] config_cb None None {'dataset_name': 'bertin-project/mc4-es-sampled', 'dataset_config_name': 'stepwise', 'train_file': None, 'validation_file': None, 'train_ref_file': None, 'validation_ref_file': None, 'overwrite_cache': False, 'validation_split_percentage': 5, 'max_seq_length': 512, 'preprocessing_num_workers': None, 'mlm_probability': 0.15, 'pad_to_max_length': True, 'line_by_line': False, 'text_column_name': 'text', 'shuffle_buffer_size': 10000, 'num_train_steps': 50000, 'num_eval_samples': 50000}
-2021-07-26 00:12:35,587 INFO MainThread:3000110 [wandb_run.py:_tensorboard_callback():943] tensorboard callback: outputs, None
+version https://git-lfs.github.com/spec/v1
+oid sha256:a2dd7c5185e79b60bb93ec1d9266f770360846fbb237a1f78837006b5f8269bd
+size 5866
wandb/run-20210726_001233-17u6inbn/run-17u6inbn.wandb CHANGED
Binary files a/wandb/run-20210726_001233-17u6inbn/run-17u6inbn.wandb and b/wandb/run-20210726_001233-17u6inbn/run-17u6inbn.wandb differ