[2024-03-28 17:41:18,867] [INFO] [real_accelerator.py:191:get_accelerator] Setting ds_accelerator to cuda (auto detect) [2024-03-28 17:41:22,314] [WARNING] [runner.py:202:fetch_hostfile] Unable to find hostfile, will proceed with training with local resources only. [2024-03-28 17:41:22,320] [INFO] [runner.py:568:main] cmd = /usr/bin/python3 -u -m deepspeed.launcher.launch --world_info=eyJsb2NhbGhvc3QiOiBbMF19 --master_addr=127.0.0.1 --master_port=29500 --enable_each_rank_log=None main.py --data_path Dahoas/rm-static Dahoas/full-hh-rlhf Dahoas/synthetic-instruct-gptj-pairwise yitingxie/rlhf-reward-datasets --data_split 2,4,4 --model_name_or_path facebook/opt-125m --per_device_train_batch_size 8 --per_device_eval_batch_size 8 --max_seq_len 512 --learning_rate 1e-3 --weight_decay 0. --num_train_epochs 1 --gradient_accumulation_steps 16 --lr_scheduler_type cosine --num_warmup_steps 0 --seed 1234 --gradient_checkpointing --zero_stage 0 --lora_dim 128 --lora_module_name decoder.layers. --deepspeed --output_dir ./output [2024-03-28 17:41:25,564] [INFO] [real_accelerator.py:191:get_accelerator] Setting ds_accelerator to cuda (auto detect) [2024-03-28 17:41:27,024] [INFO] [launch.py:138:main] 0 NV_LIBNCCL_DEV_PACKAGE=libnccl-dev=2.19.3-1+cuda12.2 [2024-03-28 17:41:27,024] [INFO] [launch.py:138:main] 0 NV_LIBNCCL_DEV_PACKAGE_VERSION=2.19.3-1 [2024-03-28 17:41:27,024] [INFO] [launch.py:138:main] 0 NCCL_VERSION=2.19.3-1 [2024-03-28 17:41:27,024] [INFO] [launch.py:138:main] 0 NV_LIBNCCL_DEV_PACKAGE_NAME=libnccl-dev [2024-03-28 17:41:27,024] [INFO] [launch.py:138:main] 0 NV_LIBNCCL_PACKAGE=libnccl2=2.19.3-1+cuda12.2 [2024-03-28 17:41:27,024] [INFO] [launch.py:138:main] 0 NV_LIBNCCL_PACKAGE_NAME=libnccl2 [2024-03-28 17:41:27,024] [INFO] [launch.py:138:main] 0 NV_LIBNCCL_PACKAGE_VERSION=2.19.3-1 [2024-03-28 17:41:27,024] [INFO] [launch.py:145:main] WORLD INFO DICT: {'localhost': [0]} [2024-03-28 17:41:27,024] [INFO] [launch.py:151:main] nnodes=1, num_local_procs=1, node_rank=0 
[2024-03-28 17:41:27,024] [INFO] [launch.py:162:main] global_rank_mapping=defaultdict(<class 'list'>, {'localhost': [0]}) [2024-03-28 17:41:27,024] [INFO] [launch.py:163:main] dist_world_size=1 [2024-03-28 17:41:27,024] [INFO] [launch.py:165:main] Setting CUDA_VISIBLE_DEVICES=0 [2024-03-28 17:41:27,025] [INFO] [launch.py:253:main] process 3222 spawned with command: ['/usr/bin/python3', '-u', 'main.py', '--local_rank=0', '--data_path', 'Dahoas/rm-static', 'Dahoas/full-hh-rlhf', 'Dahoas/synthetic-instruct-gptj-pairwise', 'yitingxie/rlhf-reward-datasets', '--data_split', '2,4,4', '--model_name_or_path', 'facebook/opt-125m', '--per_device_train_batch_size', '8', '--per_device_eval_batch_size', '8', '--max_seq_len', '512', '--learning_rate', '1e-3', '--weight_decay', '0.', '--num_train_epochs', '1', '--gradient_accumulation_steps', '16', '--lr_scheduler_type', 'cosine', '--num_warmup_steps', '0', '--seed', '1234', '--gradient_checkpointing', '--zero_stage', '0', '--lora_dim', '128', '--lora_module_name', 'decoder.layers.', '--deepspeed', '--output_dir', './output'] 2024-03-28 17:41:31.850930: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:9261] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered 2024-03-28 17:41:31.851031: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:607] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered 2024-03-28 17:41:31.983743: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1515] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered 2024-03-28 17:41:34.523187: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT [2024-03-28 17:41:37,255] [INFO] [real_accelerator.py:191:get_accelerator] Setting ds_accelerator to cuda (auto detect) 
/usr/local/lib/python3.10/dist-packages/transformers/deepspeed.py:23: FutureWarning: transformers.deepspeed module is deprecated and will be removed in a future version. Please import deepspeed modules directly from transformers.integrations warnings.warn( [2024-03-28 17:41:38,314] [INFO] [comm.py:637:init_distributed] cdb=None [2024-03-28 17:41:38,314] [INFO] [comm.py:668:init_distributed] Initializing TorchBackend in DeepSpeed with backend nccl Using /root/.cache/torch_extensions/py310_cu121 as PyTorch extensions root... Creating extension directory /root/.cache/torch_extensions/py310_cu121/fused_adam... Detected CUDA files, patching ldflags Emitting ninja build file /root/.cache/torch_extensions/py310_cu121/fused_adam/build.ninja... Building extension module fused_adam... Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N) [1/3] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output multi_tensor_adam.cuda.o.d -DTORCH_EXTENSION_NAME=fused_adam -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -I/usr/local/lib/python3.10/dist-packages/deepspeed/ops/csrc/includes -I/usr/local/lib/python3.10/dist-packages/deepspeed/ops/csrc/adam -isystem /usr/local/lib/python3.10/dist-packages/torch/include -isystem /usr/local/lib/python3.10/dist-packages/torch/include/torch/csrc/api/include -isystem /usr/local/lib/python3.10/dist-packages/torch/include/TH -isystem /usr/local/lib/python3.10/dist-packages/torch/include/THC -isystem /usr/local/cuda/include -isystem /usr/include/python3.10 -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_75,code=compute_75 -gencode=arch=compute_75,code=sm_75 --compiler-options '-fPIC' -O3 -DVERSION_GE_1_1 -DVERSION_GE_1_3 
-DVERSION_GE_1_5 -lineinfo --use_fast_math -gencode=arch=compute_75,code=sm_75 -gencode=arch=compute_75,code=compute_75 -std=c++17 -c /usr/local/lib/python3.10/dist-packages/deepspeed/ops/csrc/adam/multi_tensor_adam.cu -o multi_tensor_adam.cuda.o [2/3] c++ -MMD -MF fused_adam_frontend.o.d -DTORCH_EXTENSION_NAME=fused_adam -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -I/usr/local/lib/python3.10/dist-packages/deepspeed/ops/csrc/includes -I/usr/local/lib/python3.10/dist-packages/deepspeed/ops/csrc/adam -isystem /usr/local/lib/python3.10/dist-packages/torch/include -isystem /usr/local/lib/python3.10/dist-packages/torch/include/torch/csrc/api/include -isystem /usr/local/lib/python3.10/dist-packages/torch/include/TH -isystem /usr/local/lib/python3.10/dist-packages/torch/include/THC -isystem /usr/local/cuda/include -isystem /usr/include/python3.10 -D_GLIBCXX_USE_CXX11_ABI=0 -fPIC -std=c++17 -O3 -std=c++17 -g -Wno-reorder -DVERSION_GE_1_1 -DVERSION_GE_1_3 -DVERSION_GE_1_5 -c /usr/local/lib/python3.10/dist-packages/deepspeed/ops/csrc/adam/fused_adam_frontend.cpp -o fused_adam_frontend.o [3/3] c++ fused_adam_frontend.o multi_tensor_adam.cuda.o -shared -L/usr/local/lib/python3.10/dist-packages/torch/lib -lc10 -lc10_cuda -ltorch_cpu -ltorch_cuda -ltorch -ltorch_python -L/usr/local/cuda/lib64 -lcudart -o fused_adam.so Loading extension module fused_adam... 
Time to load fused_adam op: 52.98678803443909 seconds [2024-03-28 17:43:08,993] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed info: version=0.14.0, git-hash=unknown, git-branch=unknown [2024-03-28 17:43:08,993] [INFO] [comm.py:662:init_distributed] Distributed backend already initialized [2024-03-28 17:43:09,328] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed Flops Profiler Enabled: False [2024-03-28 17:43:09,329] [INFO] [logging.py:96:log_dist] [Rank 0] Using client Optimizer as basic optimizer [2024-03-28 17:43:09,330] [INFO] [logging.py:96:log_dist] [Rank 0] Removing param_group that has no 'params' in the basic Optimizer [2024-03-28 17:43:09,342] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed Basic Optimizer = FusedAdam [2024-03-28 17:43:09,342] [INFO] [logging.py:96:log_dist] [Rank 0] Creating fp16 optimizer with dynamic loss scale [2024-03-28 17:43:09,484] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed Final Optimizer = FusedAdam [2024-03-28 17:43:09,484] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed using client LR scheduler [2024-03-28 17:43:09,484] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed LR Scheduler = [2024-03-28 17:43:09,484] [INFO] [logging.py:96:log_dist] [Rank 0] step=0, skipped=0, lr=[0.001, 0.0005, 0.001], mom=[(0.9, 0.95), (0.9, 0.95), (0.9, 0.95)] [2024-03-28 17:43:09,485] [INFO] [config.py:996:print] DeepSpeedEngine configuration: [2024-03-28 17:43:09,485] [INFO] [config.py:1000:print] activation_checkpointing_config { "partition_activations": false, "contiguous_memory_optimization": false, "cpu_checkpointing": false, "number_checkpoints": null, "synchronize_checkpoint_boundary": false, "profile": false } [2024-03-28 17:43:09,485] [INFO] [config.py:1000:print] aio_config ................... {'block_size': 1048576, 'queue_depth': 8, 'thread_count': 1, 'single_submit': False, 'overlap_events': True} [2024-03-28 17:43:09,486] [INFO] [config.py:1000:print] amp_enabled .................. 
False [2024-03-28 17:43:09,486] [INFO] [config.py:1000:print] amp_params ................... False [2024-03-28 17:43:09,486] [INFO] [config.py:1000:print] autotuning_config ............ { "enabled": false, "start_step": null, "end_step": null, "metric_path": null, "arg_mappings": null, "metric": "throughput", "model_info": null, "results_dir": "autotuning_results", "exps_dir": "autotuning_exps", "overwrite": true, "fast": true, "start_profile_step": 3, "end_profile_step": 5, "tuner_type": "gridsearch", "tuner_early_stopping": 5, "tuner_num_trials": 50, "model_info_path": null, "mp_size": 1, "max_train_batch_size": null, "min_train_batch_size": 1, "max_train_micro_batch_size_per_gpu": 1.024000e+03, "min_train_micro_batch_size_per_gpu": 1, "num_tuning_micro_batch_sizes": 3 } [2024-03-28 17:43:09,486] [INFO] [config.py:1000:print] bfloat16_enabled ............. False [2024-03-28 17:43:09,486] [INFO] [config.py:1000:print] bfloat16_immediate_grad_update False [2024-03-28 17:43:09,486] [INFO] [config.py:1000:print] checkpoint_parallel_write_pipeline False [2024-03-28 17:43:09,486] [INFO] [config.py:1000:print] checkpoint_tag_validation_enabled True [2024-03-28 17:43:09,486] [INFO] [config.py:1000:print] checkpoint_tag_validation_fail False [2024-03-28 17:43:09,486] [INFO] [config.py:1000:print] comms_config ................. [2024-03-28 17:43:09,486] [INFO] [config.py:1000:print] communication_data_type ...... None [2024-03-28 17:43:09,486] [INFO] [config.py:1000:print] compile_config ............... enabled=False backend='inductor' kwargs={} [2024-03-28 17:43:09,486] [INFO] [config.py:1000:print] compression_config ........... 
{'weight_quantization': {'shared_parameters': {'enabled': False, 'quantizer_kernel': False, 'schedule_offset': 0, 'quantize_groups': 1, 'quantize_verbose': False, 'quantization_type': 'symmetric', 'quantize_weight_in_forward': False, 'rounding': 'nearest', 'fp16_mixed_quantize': False, 'quantize_change_ratio': 0.001}, 'different_groups': {}}, 'activation_quantization': {'shared_parameters': {'enabled': False, 'quantization_type': 'symmetric', 'range_calibration': 'dynamic', 'schedule_offset': 1000}, 'different_groups': {}}, 'sparse_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'row_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'head_pruning': {'shared_parameters': {'enabled': False, 'method': 'topk', 'schedule_offset': 1000}, 'different_groups': {}}, 'channel_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'layer_reduction': {'enabled': False}} [2024-03-28 17:43:09,486] [INFO] [config.py:1000:print] curriculum_enabled_legacy .... False [2024-03-28 17:43:09,486] [INFO] [config.py:1000:print] curriculum_params_legacy ..... False [2024-03-28 17:43:09,486] [INFO] [config.py:1000:print] data_efficiency_config ....... {'enabled': False, 'seed': 1234, 'data_sampling': {'enabled': False, 'num_epochs': 1000, 'num_workers': 0, 'curriculum_learning': {'enabled': False}}, 'data_routing': {'enabled': False, 'random_ltd': {'enabled': False, 'layer_token_lr_schedule': {'enabled': False}}}} [2024-03-28 17:43:09,486] [INFO] [config.py:1000:print] data_efficiency_enabled ...... False [2024-03-28 17:43:09,486] [INFO] [config.py:1000:print] dataloader_drop_last ......... False [2024-03-28 17:43:09,486] [INFO] [config.py:1000:print] disable_allgather ............ False [2024-03-28 17:43:09,486] [INFO] [config.py:1000:print] dump_state ................... 
False [2024-03-28 17:43:09,486] [INFO] [config.py:1000:print] dynamic_loss_scale_args ...... {'init_scale': 65536, 'scale_window': 100, 'delayed_shift': 2, 'consecutive_hysteresis': False, 'min_scale': 1} [2024-03-28 17:43:09,486] [INFO] [config.py:1000:print] eigenvalue_enabled ........... False [2024-03-28 17:43:09,486] [INFO] [config.py:1000:print] eigenvalue_gas_boundary_resolution 1 [2024-03-28 17:43:09,487] [INFO] [config.py:1000:print] eigenvalue_layer_name ........ bert.encoder.layer [2024-03-28 17:43:09,487] [INFO] [config.py:1000:print] eigenvalue_layer_num ......... 0 [2024-03-28 17:43:09,487] [INFO] [config.py:1000:print] eigenvalue_max_iter .......... 100 [2024-03-28 17:43:09,487] [INFO] [config.py:1000:print] eigenvalue_stability ......... 1e-06 [2024-03-28 17:43:09,487] [INFO] [config.py:1000:print] eigenvalue_tol ............... 0.01 [2024-03-28 17:43:09,487] [INFO] [config.py:1000:print] eigenvalue_verbose ........... False [2024-03-28 17:43:09,487] [INFO] [config.py:1000:print] elasticity_enabled ........... False [2024-03-28 17:43:09,487] [INFO] [config.py:1000:print] flops_profiler_config ........ { "enabled": false, "recompute_fwd_factor": 0.0, "profile_step": 1, "module_depth": -1, "top_modules": 1, "detailed": true, "output_file": null } [2024-03-28 17:43:09,487] [INFO] [config.py:1000:print] fp16_auto_cast ............... False [2024-03-28 17:43:09,487] [INFO] [config.py:1000:print] fp16_enabled ................. True [2024-03-28 17:43:09,487] [INFO] [config.py:1000:print] fp16_master_weights_and_gradients False [2024-03-28 17:43:09,487] [INFO] [config.py:1000:print] global_rank .................. 0 [2024-03-28 17:43:09,487] [INFO] [config.py:1000:print] grad_accum_dtype ............. None [2024-03-28 17:43:09,487] [INFO] [config.py:1000:print] gradient_accumulation_steps .. 16 [2024-03-28 17:43:09,487] [INFO] [config.py:1000:print] gradient_clipping ............ 
1.0 [2024-03-28 17:43:09,487] [INFO] [config.py:1000:print] gradient_predivide_factor .... 1.0 [2024-03-28 17:43:09,487] [INFO] [config.py:1000:print] graph_harvesting ............. False [2024-03-28 17:43:09,487] [INFO] [config.py:1000:print] hybrid_engine ................ enabled=False max_out_tokens=512 inference_tp_size=1 release_inference_cache=False pin_parameters=True tp_gather_partition_size=8 [2024-03-28 17:43:09,487] [INFO] [config.py:1000:print] initial_dynamic_scale ........ 65536 [2024-03-28 17:43:09,487] [INFO] [config.py:1000:print] load_universal_checkpoint .... False [2024-03-28 17:43:09,487] [INFO] [config.py:1000:print] loss_scale ................... 0 [2024-03-28 17:43:09,487] [INFO] [config.py:1000:print] memory_breakdown ............. False [2024-03-28 17:43:09,487] [INFO] [config.py:1000:print] mics_hierarchial_params_gather False [2024-03-28 17:43:09,487] [INFO] [config.py:1000:print] mics_shard_size .............. -1 [2024-03-28 17:43:09,487] [INFO] [config.py:1000:print] monitor_config ............... tensorboard=TensorBoardConfig(enabled=False, output_path='step1_tensorboard/ds_tensorboard_logs/', job_name='step1_model_tensorboard') wandb=WandbConfig(enabled=False, group=None, team=None, project='deepspeed') csv_monitor=CSVConfig(enabled=False, output_path='', job_name='DeepSpeedJobName') enabled=False [2024-03-28 17:43:09,487] [INFO] [config.py:1000:print] nebula_config ................ { "enabled": false, "persistent_storage_path": null, "persistent_time_interval": 100, "num_of_version_in_retention": 2, "enable_nebula_load": true, "load_path": null } [2024-03-28 17:43:09,487] [INFO] [config.py:1000:print] optimizer_legacy_fusion ...... False [2024-03-28 17:43:09,487] [INFO] [config.py:1000:print] optimizer_name ............... None [2024-03-28 17:43:09,487] [INFO] [config.py:1000:print] optimizer_params ............. None [2024-03-28 17:43:09,487] [INFO] [config.py:1000:print] pipeline ..................... 
{'stages': 'auto', 'partition': 'best', 'seed_layers': False, 'activation_checkpoint_interval': 0, 'pipe_partitioned': True, 'grad_partitioned': True} [2024-03-28 17:43:09,487] [INFO] [config.py:1000:print] pld_enabled .................. False [2024-03-28 17:43:09,487] [INFO] [config.py:1000:print] pld_params ................... False [2024-03-28 17:43:09,487] [INFO] [config.py:1000:print] prescale_gradients ........... False [2024-03-28 17:43:09,487] [INFO] [config.py:1000:print] scheduler_name ............... None [2024-03-28 17:43:09,487] [INFO] [config.py:1000:print] scheduler_params ............. None [2024-03-28 17:43:09,487] [INFO] [config.py:1000:print] seq_parallel_communication_data_type torch.float32 [2024-03-28 17:43:09,487] [INFO] [config.py:1000:print] sparse_attention ............. None [2024-03-28 17:43:09,487] [INFO] [config.py:1000:print] sparse_gradients_enabled ..... False [2024-03-28 17:43:09,487] [INFO] [config.py:1000:print] steps_per_print .............. 10 [2024-03-28 17:43:09,488] [INFO] [config.py:1000:print] train_batch_size ............. 128 [2024-03-28 17:43:09,488] [INFO] [config.py:1000:print] train_micro_batch_size_per_gpu 8 [2024-03-28 17:43:09,488] [INFO] [config.py:1000:print] use_data_before_expert_parallel_ False [2024-03-28 17:43:09,488] [INFO] [config.py:1000:print] use_node_local_storage ....... False [2024-03-28 17:43:09,488] [INFO] [config.py:1000:print] wall_clock_breakdown ......... False [2024-03-28 17:43:09,488] [INFO] [config.py:1000:print] weight_quantization_config ... None [2024-03-28 17:43:09,488] [INFO] [config.py:1000:print] world_size ................... 1 [2024-03-28 17:43:09,488] [INFO] [config.py:1000:print] zero_allow_untested_optimizer False [2024-03-28 17:43:09,488] [INFO] [config.py:1000:print] zero_config .................. 
stage=0 contiguous_gradients=True reduce_scatter=True reduce_bucket_size=500,000,000 use_multi_rank_bucket_allreduce=True allgather_partitions=True allgather_bucket_size=500,000,000 overlap_comm=False load_from_fp32_weights=True elastic_checkpoint=False offload_param=DeepSpeedZeroOffloadParamConfig(device='none', nvme_path=None, buffer_count=5, buffer_size=100,000,000, max_in_cpu=1,000,000,000, pin_memory=False) offload_optimizer=DeepSpeedZeroOffloadOptimizerConfig(device='none', nvme_path=None, buffer_count=4, pin_memory=False, pipeline=False, pipeline_read=False, pipeline_write=False, fast_init=False, ratio=1.0) sub_group_size=1,000,000,000 cpu_offload_param=None cpu_offload_use_pin_memory=None cpu_offload=None prefetch_bucket_size=30000000 param_persistence_threshold=10000 model_persistence_threshold=sys.maxsize max_live_parameters=30000000 max_reuse_distance=1,000,000,000 gather_16bit_weights_on_model_save=False stage3_gather_fp16_weights_on_model_save=False ignore_unused_parameters=True legacy_stage1=False round_robin_gradients=False zero_hpz_partition_size=1 zero_quantized_weights=False zero_quantized_nontrainable_weights=False zero_quantized_gradients=False mics_shard_size=-1 mics_hierarchical_params_gather=False memory_efficient_linear=False pipeline_loading_checkpoint=False override_module_apply=True [2024-03-28 17:43:09,488] [INFO] [config.py:1000:print] zero_enabled ................. False [2024-03-28 17:43:09,488] [INFO] [config.py:1000:print] zero_force_ds_cpu_optimizer .. True [2024-03-28 17:43:09,488] [INFO] [config.py:1000:print] zero_optimization_stage ...... 
0 [2024-03-28 17:43:09,488] [INFO] [config.py:986:print_user_config] json = { "train_batch_size": 128, "train_micro_batch_size_per_gpu": 8, "steps_per_print": 10, "zero_optimization": { "stage": 0, "offload_param": { "device": "none" }, "offload_optimizer": { "device": "none" }, "stage3_param_persistence_threshold": 1.000000e+04, "stage3_max_live_parameters": 3.000000e+07, "stage3_prefetch_bucket_size": 3.000000e+07, "memory_efficient_linear": false }, "fp16": { "enabled": true, "loss_scale_window": 100 }, "gradient_clipping": 1.0, "prescale_gradients": false, "wall_clock_breakdown": false, "hybrid_engine": { "enabled": false, "max_out_tokens": 512, "inference_tp_size": 1, "release_inference_cache": false, "pin_parameters": true, "tp_gather_partition_size": 8 }, "tensorboard": { "enabled": false, "output_path": "step1_tensorboard/ds_tensorboard_logs/", "job_name": "step1_model_tensorboard" } } ***** Running training ***** ***** Evaluating perplexity, Epoch 0/1 ***** ppl: 261.23980712890625, loss: 5.565438747406006 Beginning of Epoch 1/1, Total Micro Batches 7360 Model Parameters: 0.146 B, Latency: 0.69s, TFLOPs: 4.90, Samples/sec: 11.63, Time/seq 0.09s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.41s, TFLOPs: 8.30, Samples/sec: 19.71, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.42s, TFLOPs: 7.95, Samples/sec: 18.89, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.42s, TFLOPs: 7.96, Samples/sec: 18.90, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.42s, TFLOPs: 7.95, Samples/sec: 18.89, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.42s, TFLOPs: 7.94, Samples/sec: 18.87, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.42s, TFLOPs: 7.95, Samples/sec: 18.88, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 
0.146 B, Latency: 0.42s, TFLOPs: 7.95, Samples/sec: 18.88, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.42s, TFLOPs: 7.98, Samples/sec: 18.95, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.42s, TFLOPs: 7.94, Samples/sec: 18.85, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.42s, TFLOPs: 7.96, Samples/sec: 18.90, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.42s, TFLOPs: 7.93, Samples/sec: 18.84, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.90, Samples/sec: 18.78, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.42s, TFLOPs: 8.02, Samples/sec: 19.04, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.42s, TFLOPs: 7.98, Samples/sec: 18.95, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 [2024-03-28 17:44:34,980] [INFO] [fused_optimizer.py:344:_update_scale] Grad overflow on iteration 0 [2024-03-28 17:44:34,981] [INFO] [fused_optimizer.py:345:_update_scale] Reducing dynamic loss scale from 65536 to 32768.0 [2024-03-28 17:44:34,981] [INFO] [logging.py:96:log_dist] [Rank 0] Overflow detected. Skipping step. 
Attempted loss scale: 65536, reducing to 32768.0 Model Parameters: 0.146 B, Latency: 0.45s, TFLOPs: 7.51, Samples/sec: 17.85, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.40s, TFLOPs: 8.33, Samples/sec: 19.79, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.42s, TFLOPs: 8.04, Samples/sec: 19.10, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.68, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.42s, TFLOPs: 8.01, Samples/sec: 19.04, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.42s, TFLOPs: 7.99, Samples/sec: 18.99, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.88, Samples/sec: 18.71, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.42s, TFLOPs: 8.04, Samples/sec: 19.09, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.42s, TFLOPs: 7.95, Samples/sec: 18.89, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.42s, TFLOPs: 7.95, Samples/sec: 18.89, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.42s, TFLOPs: 7.98, Samples/sec: 18.96, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.42s, TFLOPs: 7.95, Samples/sec: 18.88, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.42s, TFLOPs: 7.99, Samples/sec: 18.97, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.42s, TFLOPs: 7.96, Samples/sec: 18.90, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.42s, TFLOPs: 7.99, Samples/sec: 18.97, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model 
Parameters: 0.146 B, Latency: 0.42s, TFLOPs: 7.97, Samples/sec: 18.93, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 [2024-03-28 17:44:41,783] [INFO] [fused_optimizer.py:344:_update_scale] Grad overflow on iteration 1 [2024-03-28 17:44:41,783] [INFO] [fused_optimizer.py:345:_update_scale] Reducing dynamic loss scale from 32768.0 to 16384.0 [2024-03-28 17:44:41,783] [INFO] [logging.py:96:log_dist] [Rank 0] Overflow detected. Skipping step. Attempted loss scale: 32768.0, reducing to 16384.0 Model Parameters: 0.146 B, Latency: 0.45s, TFLOPs: 7.51, Samples/sec: 17.84, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.42s, TFLOPs: 7.98, Samples/sec: 18.96, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.42s, TFLOPs: 7.97, Samples/sec: 18.93, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.42s, TFLOPs: 7.96, Samples/sec: 18.90, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.42s, TFLOPs: 7.97, Samples/sec: 18.93, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.42s, TFLOPs: 7.93, Samples/sec: 18.84, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.42s, TFLOPs: 7.97, Samples/sec: 18.92, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.42s, TFLOPs: 7.99, Samples/sec: 18.99, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.42s, TFLOPs: 7.98, Samples/sec: 18.97, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.42s, TFLOPs: 7.97, Samples/sec: 18.94, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.42s, TFLOPs: 7.96, Samples/sec: 18.90, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.42s, TFLOPs: 7.94, Samples/sec: 18.87, 
Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.42s, TFLOPs: 7.98, Samples/sec: 18.96, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.42s, TFLOPs: 7.95, Samples/sec: 18.87, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.42s, TFLOPs: 7.99, Samples/sec: 18.98, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.42s, TFLOPs: 7.93, Samples/sec: 18.84, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 [2024-03-28 17:44:48,584] [INFO] [fused_optimizer.py:344:_update_scale] Grad overflow on iteration 2 [2024-03-28 17:44:48,585] [INFO] [fused_optimizer.py:345:_update_scale] Reducing dynamic loss scale from 16384.0 to 8192.0 [2024-03-28 17:44:48,585] [INFO] [logging.py:96:log_dist] [Rank 0] Overflow detected. Skipping step. Attempted loss scale: 16384.0, reducing to 8192.0 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.63, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.91, Samples/sec: 18.79, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.42s, TFLOPs: 7.94, Samples/sec: 18.86, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.42s, TFLOPs: 7.94, Samples/sec: 18.86, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.90, Samples/sec: 18.77, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.42s, TFLOPs: 7.98, Samples/sec: 18.96, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.42s, TFLOPs: 7.94, Samples/sec: 18.87, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.42s, TFLOPs: 7.98, Samples/sec: 18.96, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 
B, Latency: 0.43s, TFLOPs: 7.92, Samples/sec: 18.81, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.58, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.42s, TFLOPs: 7.95, Samples/sec: 18.88, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.42s, TFLOPs: 7.93, Samples/sec: 18.85, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.42s, TFLOPs: 7.95, Samples/sec: 18.87, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.42s, TFLOPs: 7.97, Samples/sec: 18.94, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.92, Samples/sec: 18.81, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.91, Samples/sec: 18.78, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 [2024-03-28 17:44:55,426] [INFO] [fused_optimizer.py:344:_update_scale] Grad overflow on iteration 3 [2024-03-28 17:44:55,426] [INFO] [fused_optimizer.py:345:_update_scale] Reducing dynamic loss scale from 8192.0 to 4096.0 [2024-03-28 17:44:55,426] [INFO] [logging.py:96:log_dist] [Rank 0] Overflow detected. Skipping step. 
Attempted loss scale: 8192.0, reducing to 4096.0 Model Parameters: 0.146 B, Latency: 0.44s, TFLOPs: 7.70, Samples/sec: 18.30, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.42s, TFLOPs: 7.99, Samples/sec: 18.99, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.42s, TFLOPs: 7.95, Samples/sec: 18.89, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.92, Samples/sec: 18.81, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.42s, TFLOPs: 7.94, Samples/sec: 18.87, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.42s, TFLOPs: 7.95, Samples/sec: 18.88, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.42s, TFLOPs: 7.95, Samples/sec: 18.88, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.90, Samples/sec: 18.77, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.42s, TFLOPs: 7.96, Samples/sec: 18.92, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.42s, TFLOPs: 7.93, Samples/sec: 18.83, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.42s, TFLOPs: 7.95, Samples/sec: 18.89, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.42s, TFLOPs: 7.94, Samples/sec: 18.85, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.92, Samples/sec: 18.81, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.88, Samples/sec: 18.72, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.42s, TFLOPs: 7.95, Samples/sec: 18.88, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model 
Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.90, Samples/sec: 18.77, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.51s, TFLOPs: 6.56, Samples/sec: 15.57, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.42s, TFLOPs: 7.97, Samples/sec: 18.92, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.88, Samples/sec: 18.71, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.65, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.69, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.69, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.88, Samples/sec: 18.71, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.89, Samples/sec: 18.74, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.66, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.88, Samples/sec: 18.71, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.92, Samples/sec: 18.81, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.66, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.59, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.65, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.88, Samples/sec: 
18.73, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.67, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.47s, TFLOPs: 7.19, Samples/sec: 17.08, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.91, Samples/sec: 18.80, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.88, Samples/sec: 18.71, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.60, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.70, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.58, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.68, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.57, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.59, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.61, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.80, Samples/sec: 18.52, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.61, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.77, Samples/sec: 18.46, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.63, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model 
Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.62, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.80, Samples/sec: 18.52, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.47s, TFLOPs: 7.18, Samples/sec: 17.05, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.42s, TFLOPs: 7.93, Samples/sec: 18.83, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.55, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.77, Samples/sec: 18.46, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.57, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.79, Samples/sec: 18.50, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.79, Samples/sec: 18.50, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.59, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.76, Samples/sec: 18.44, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.58, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.79, Samples/sec: 18.50, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.58, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.77, Samples/sec: 18.46, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.78, Samples/sec: 
18.48, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.78, Samples/sec: 18.48, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.80, Samples/sec: 18.54, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.47s, TFLOPs: 7.17, Samples/sec: 17.03, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.54, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.78, Samples/sec: 18.47, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.77, Samples/sec: 18.46, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.44s, TFLOPs: 7.72, Samples/sec: 18.34, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.77, Samples/sec: 18.46, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.75, Samples/sec: 18.41, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.80, Samples/sec: 18.53, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.44s, TFLOPs: 7.72, Samples/sec: 18.33, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.80, Samples/sec: 18.53, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.75, Samples/sec: 18.42, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.77, Samples/sec: 18.45, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.74, Samples/sec: 18.40, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model 
Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.76, Samples/sec: 18.44, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.44s, TFLOPs: 7.73, Samples/sec: 18.36, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.75, Samples/sec: 18.42, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.47s, TFLOPs: 7.14, Samples/sec: 16.97, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.60, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.44s, TFLOPs: 7.70, Samples/sec: 18.29, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.76, Samples/sec: 18.44, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.44s, TFLOPs: 7.72, Samples/sec: 18.34, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.44s, TFLOPs: 7.72, Samples/sec: 18.35, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.76, Samples/sec: 18.43, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.44s, TFLOPs: 7.74, Samples/sec: 18.38, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.76, Samples/sec: 18.42, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.44s, TFLOPs: 7.72, Samples/sec: 18.34, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.44s, TFLOPs: 7.71, Samples/sec: 18.31, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.74, Samples/sec: 18.39, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.44s, TFLOPs: 7.73, Samples/sec: 
18.36, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.44s, TFLOPs: 7.74, Samples/sec: 18.38, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.44s, TFLOPs: 7.65, Samples/sec: 18.17, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.44s, TFLOPs: 7.69, Samples/sec: 18.26, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 [2024-03-28 17:45:37,224] [INFO] [logging.py:96:log_dist] [Rank 0] step=10, skipped=4, lr=[0.0009995802740501933, 0.0004997901370250966, 0.0009995802740501933], mom=[(0.9, 0.95), (0.9, 0.95), (0.9, 0.95)] [2024-03-28 17:45:37,232] [INFO] [timer.py:260:stop] epoch=0/micro_step=160/global_step=10, RunningAvgSamplesPerSec=18.56605752848472, CurrSamplesPerSec=18.265046753772538, MemAllocated=1.37GB, MaxMemAllocated=2.8GB Model Parameters: 0.146 B, Latency: 0.48s, TFLOPs: 7.04, Samples/sec: 16.71, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.58, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.77, Samples/sec: 18.47, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.44s, TFLOPs: 7.74, Samples/sec: 18.38, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.44s, TFLOPs: 7.73, Samples/sec: 18.36, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.44s, TFLOPs: 7.73, Samples/sec: 18.36, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.44s, TFLOPs: 7.74, Samples/sec: 18.37, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.76, Samples/sec: 18.45, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.44s, TFLOPs: 7.73, Samples/sec: 18.37, Time/seq 0.05s, Batch 
Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.78, Samples/sec: 18.47, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.44s, TFLOPs: 7.72, Samples/sec: 18.34, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.76, Samples/sec: 18.44, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.44s, TFLOPs: 7.72, Samples/sec: 18.34, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.78, Samples/sec: 18.48, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.74, Samples/sec: 18.39, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.78, Samples/sec: 18.48, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.47s, TFLOPs: 7.15, Samples/sec: 16.98, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.79, Samples/sec: 18.52, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.44s, TFLOPs: 7.72, Samples/sec: 18.35, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.77, Samples/sec: 18.46, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.76, Samples/sec: 18.43, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.76, Samples/sec: 18.42, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.79, Samples/sec: 18.51, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.76, Samples/sec: 18.43, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, 
Latency: 0.43s, TFLOPs: 7.74, Samples/sec: 18.40, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.78, Samples/sec: 18.47, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.44s, TFLOPs: 7.74, Samples/sec: 18.38, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.80, Samples/sec: 18.53, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.76, Samples/sec: 18.44, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.58, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.78, Samples/sec: 18.48, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.76, Samples/sec: 18.44, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.47s, TFLOPs: 7.16, Samples/sec: 17.00, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.60, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.80, Samples/sec: 18.54, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.76, Samples/sec: 18.44, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.76, Samples/sec: 18.44, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.77, Samples/sec: 18.47, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.55, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.77, Samples/sec: 18.46, Time/seq 
0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.79, Samples/sec: 18.51, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.78, Samples/sec: 18.49, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.78, Samples/sec: 18.48, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.58, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.78, Samples/sec: 18.49, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.54, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.56, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.78, Samples/sec: 18.49, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.47s, TFLOPs: 7.18, Samples/sec: 17.07, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.68, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.63, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.79, Samples/sec: 18.51, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.44s, TFLOPs: 7.72, Samples/sec: 18.33, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.79, Samples/sec: 18.51, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.78, Samples/sec: 18.49, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 
0.146 B, Latency: 0.43s, TFLOPs: 7.79, Samples/sec: 18.50, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.77, Samples/sec: 18.46, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.55, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.55, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.66, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.56, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.57, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.65, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.80, Samples/sec: 18.53, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.47s, TFLOPs: 7.17, Samples/sec: 17.03, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.42s, TFLOPs: 7.93, Samples/sec: 18.84, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.61, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.78, Samples/sec: 18.48, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.66, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.64, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.79, Samples/sec: 18.50, 
Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.58, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.63, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.80, Samples/sec: 18.53, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.89, Samples/sec: 18.75, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.58, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.77, Samples/sec: 18.46, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.69, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.57, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.80, Samples/sec: 18.53, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.47s, TFLOPs: 7.15, Samples/sec: 17.00, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.65, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.59, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.64, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.76, Samples/sec: 18.42, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.67, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model 
Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.65, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.80, Samples/sec: 18.54, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.68, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.64, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.80, Samples/sec: 18.52, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.67, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.60, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.58, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.88, Samples/sec: 18.72, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.63, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.47s, TFLOPs: 7.23, Samples/sec: 17.19, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.91, Samples/sec: 18.78, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.54, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.61, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.61, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 
18.61, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.89, Samples/sec: 18.74, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.67, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.55, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.61, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.91, Samples/sec: 18.78, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.60, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.64, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.67, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.70, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.55, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.46s, TFLOPs: 7.26, Samples/sec: 17.24, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.42s, TFLOPs: 7.96, Samples/sec: 18.91, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.89, Samples/sec: 18.74, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.61, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.59, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model 
Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.64, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.67, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.70, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.61, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.68, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.60, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.56, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.71, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.65, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.56, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.59, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.47s, TFLOPs: 7.21, Samples/sec: 17.13, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.90, Samples/sec: 18.76, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.67, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.67, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 
18.69, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.67, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.64, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.58, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.62, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.57, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.89, Samples/sec: 18.75, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.59, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.89, Samples/sec: 18.73, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.58, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.70, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.60, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.47s, TFLOPs: 7.24, Samples/sec: 17.20, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.90, Samples/sec: 18.77, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.80, Samples/sec: 18.53, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.64, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model 
Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.63, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.71, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.80, Samples/sec: 18.53, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.68, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.65, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.60, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.64, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.65, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.59, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.64, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.59, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.70, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 [2024-03-28 17:46:46,808] [INFO] [logging.py:96:log_dist] [Rank 0] step=20, skipped=4, lr=[0.0009970178336161017, 0.0004985089168080509, 0.0009970178336161017], mom=[(0.9, 0.95), (0.9, 0.95), (0.9, 0.95)] [2024-03-28 17:46:46,819] [INFO] [timer.py:260:stop] epoch=0/micro_step=320/global_step=20, RunningAvgSamplesPerSec=18.521262837086557, CurrSamplesPerSec=18.553537475925577, MemAllocated=1.37GB, MaxMemAllocated=2.8GB Model Parameters: 0.146 B, Latency: 
0.47s, TFLOPs: 7.24, Samples/sec: 17.19, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.90, Samples/sec: 18.76, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.64, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.62, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.65, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.60, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.66, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.64, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.60, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.65, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.64, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.61, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.54, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.62, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.60, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.58, Time/seq 0.05s, Batch 
Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.47s, TFLOPs: 7.22, Samples/sec: 17.14, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.42s, TFLOPs: 7.93, Samples/sec: 18.85, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.61, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.79, Samples/sec: 18.51, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.60, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.55, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.62, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.67, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.62, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.55, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.88, Samples/sec: 18.72, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.68, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.80, Samples/sec: 18.54, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.78, Samples/sec: 18.48, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.55, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, 
Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.57, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.47s, TFLOPs: 7.11, Samples/sec: 16.89, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.89, Samples/sec: 18.75, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.59, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.63, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.58, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.54, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.58, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.77, Samples/sec: 18.47, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.62, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.64, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.56, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.58, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.58, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.80, Samples/sec: 18.53, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.55, Time/seq 
0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.79, Samples/sec: 18.51, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.47s, TFLOPs: 7.18, Samples/sec: 17.05, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.89, Samples/sec: 18.75, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.78, Samples/sec: 18.49, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.57, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.63, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.57, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.79, Samples/sec: 18.50, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.79, Samples/sec: 18.50, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.69, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.59, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.56, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.80, Samples/sec: 18.53, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.77, Samples/sec: 18.45, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.80, Samples/sec: 18.54, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 
0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.63, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.61, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.47s, TFLOPs: 7.21, Samples/sec: 17.12, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.91, Samples/sec: 18.79, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.80, Samples/sec: 18.52, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.64, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.59, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.78, Samples/sec: 18.47, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.59, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.60, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.79, Samples/sec: 18.52, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.67, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.62, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.56, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.60, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.67, 
Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.80, Samples/sec: 18.52, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.61, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.47s, TFLOPs: 7.24, Samples/sec: 17.19, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.68, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.64, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.63, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.77, Samples/sec: 18.46, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.92, Samples/sec: 18.81, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.59, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.79, Samples/sec: 18.51, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.79, Samples/sec: 18.50, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.68, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.80, Samples/sec: 18.53, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.59, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.88, Samples/sec: 18.71, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model 
Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.62, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.64, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.76, Samples/sec: 18.44, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.47s, TFLOPs: 7.22, Samples/sec: 17.16, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.90, Samples/sec: 18.76, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.62, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.56, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.64, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.64, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.58, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.91, Samples/sec: 18.78, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.68, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.76, Samples/sec: 18.43, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.55, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.64, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 
18.56, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.65, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.62, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.78, Samples/sec: 18.49, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.47s, TFLOPs: 7.16, Samples/sec: 17.00, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.42s, TFLOPs: 7.95, Samples/sec: 18.88, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.61, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.57, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.58, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.66, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.62, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.63, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.55, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.80, Samples/sec: 18.54, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.59, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.63, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model 
Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.71, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.67, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.56, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.89, Samples/sec: 18.75, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.47s, TFLOPs: 7.24, Samples/sec: 17.19, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.89, Samples/sec: 18.73, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.89, Samples/sec: 18.74, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.60, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.89, Samples/sec: 18.74, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.68, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.62, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.57, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.58, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.65, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.64, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 
18.55, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.62, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.57, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.62, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.67, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.47s, TFLOPs: 7.20, Samples/sec: 17.11, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.89, Samples/sec: 18.75, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.88, Samples/sec: 18.73, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.65, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.67, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.63, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.89, Samples/sec: 18.75, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.67, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.60, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.64, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.65, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model 
Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.66, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.66, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.63, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.68, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.63, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 [2024-03-28 17:47:56,268] [INFO] [logging.py:96:log_dist] [Rank 0] step=30, skipped=4, lr=[0.0009921380666088558, 0.0004960690333044279, 0.0009921380666088558], mom=[(0.9, 0.95), (0.9, 0.95), (0.9, 0.95)] [2024-03-28 17:47:56,278] [INFO] [timer.py:260:stop] epoch=0/micro_step=480/global_step=30, RunningAvgSamplesPerSec=18.5225680687781, CurrSamplesPerSec=18.574112822692964, MemAllocated=1.37GB, MaxMemAllocated=2.8GB Model Parameters: 0.146 B, Latency: 0.47s, TFLOPs: 7.19, Samples/sec: 17.09, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.42s, TFLOPs: 7.93, Samples/sec: 18.84, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.70, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.88, Samples/sec: 18.72, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.67, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.58, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.77, Samples/sec: 18.46, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 
0.43s, TFLOPs: 7.85, Samples/sec: 18.65, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.67, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.61, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.59, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.89, Samples/sec: 18.75, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.68, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.65, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.61, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.60, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.47s, TFLOPs: 7.23, Samples/sec: 17.17, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.90, Samples/sec: 18.76, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.89, Samples/sec: 18.73, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.62, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.62, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.61, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.65, Time/seq 0.05s, Batch 
Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.63, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.88, Samples/sec: 18.72, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.42s, TFLOPs: 7.93, Samples/sec: 18.85, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.68, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.88, Samples/sec: 18.71, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.67, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.68, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.65, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.91, Samples/sec: 18.79, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.47s, TFLOPs: 7.22, Samples/sec: 17.14, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.70, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.67, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.63, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.58, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.67, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, 
Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.63, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.67, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.91, Samples/sec: 18.78, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.71, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.60, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.88, Samples/sec: 18.72, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.60, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.62, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.61, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.68, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.47s, TFLOPs: 7.21, Samples/sec: 17.12, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.90, Samples/sec: 18.77, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.89, Samples/sec: 18.74, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.68, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.90, Samples/sec: 18.77, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.68, Time/seq 
0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.88, Samples/sec: 18.71, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.64, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.67, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.70, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.63, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.56, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.66, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.44s, TFLOPs: 7.71, Samples/sec: 18.31, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.44s, TFLOPs: 7.68, Samples/sec: 18.24, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.44s, TFLOPs: 7.62, Samples/sec: 18.10, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.47s, TFLOPs: 7.13, Samples/sec: 16.93, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.42s, TFLOPs: 7.92, Samples/sec: 18.82, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.76, Samples/sec: 18.44, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.61, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.89, Samples/sec: 18.74, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 
0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.56, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.88, Samples/sec: 18.72, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.65, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.55, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.42s, TFLOPs: 7.93, Samples/sec: 18.84, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.90, Samples/sec: 18.78, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.64, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.89, Samples/sec: 18.75, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.90, Samples/sec: 18.76, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.59, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.68, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.47s, TFLOPs: 7.23, Samples/sec: 17.16, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.42s, TFLOPs: 7.96, Samples/sec: 18.90, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.68, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.63, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.66, 
Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.89, Samples/sec: 18.75, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.62, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.65, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.68, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.89, Samples/sec: 18.74, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.57, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.89, Samples/sec: 18.73, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.88, Samples/sec: 18.71, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.62, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.63, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.58, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.47s, TFLOPs: 7.23, Samples/sec: 17.18, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.42s, TFLOPs: 7.96, Samples/sec: 18.92, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.61, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.89, Samples/sec: 18.75, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model 
Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.91, Samples/sec: 18.80, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.65, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.80, Samples/sec: 18.53, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.92, Samples/sec: 18.80, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.89, Samples/sec: 18.74, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.58, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.62, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.88, Samples/sec: 18.73, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.67, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.68, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.56, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.89, Samples/sec: 18.73, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.46s, TFLOPs: 7.27, Samples/sec: 17.26, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.42s, TFLOPs: 7.94, Samples/sec: 18.87, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.88, Samples/sec: 18.71, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 
18.64, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.70, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.91, Samples/sec: 18.79, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.89, Samples/sec: 18.75, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.61, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.78, Samples/sec: 18.47, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.79, Samples/sec: 18.50, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.65, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.66, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.65, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.88, Samples/sec: 18.71, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.57, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.66, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.47s, TFLOPs: 7.16, Samples/sec: 17.00, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.42s, TFLOPs: 7.94, Samples/sec: 18.86, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.60, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model 
Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.88, Samples/sec: 18.73, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.68, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.60, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.88, Samples/sec: 18.71, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.59, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.65, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.65, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.62, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.91, Samples/sec: 18.79, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.62, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.55, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.65, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.90, Samples/sec: 18.76, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.46s, TFLOPs: 7.29, Samples/sec: 17.32, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.70, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 
18.57, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512
Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.58, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512
Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.91, Samples/sec: 18.79, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512
Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.67, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512
Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.88, Samples/sec: 18.71, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512
Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.65, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512
Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.62, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512
Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.65, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512
Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.63, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512
Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.63, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512
Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.70, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512
Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.60, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512
Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.55, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512
Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.88, Samples/sec: 18.73, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512
[2024-03-28 17:49:05,550] [INFO] [logging.py:96:log_dist] [Rank 0] step=40, skipped=4, lr=[0.0009849637247548357, 0.0004924818623774179, 0.0009849637247548357], mom=[(0.9, 0.95), (0.9, 0.95), (0.9, 0.95)]
[2024-03-28 17:49:05,560] [INFO] [timer.py:260:stop] epoch=0/micro_step=640/global_step=40, RunningAvgSamplesPerSec=18.53541176531461, CurrSamplesPerSec=18.561715550861805, MemAllocated=1.37GB, MaxMemAllocated=2.8GB
Model Parameters: 0.146 B, Latency: 0.47s, TFLOPs: 7.20, Samples/sec: 17.10, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512
Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.88, Samples/sec: 18.72, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512
Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.66, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512
Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.80, Samples/sec: 18.52, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512
Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.57, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512
Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.62, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512
Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.59, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512
Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.88, Samples/sec: 18.71, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512
Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.63, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512
Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.57, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512
Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.67, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512
Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.66, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512
Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.64, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512
Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.68, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512
Model Parameters: 0.146 B,
Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.66, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.80, Samples/sec: 18.52, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.47s, TFLOPs: 7.15, Samples/sec: 16.99, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.91, Samples/sec: 18.80, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.80, Samples/sec: 18.54, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.56, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.63, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.57, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.61, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.66, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.66, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.88, Samples/sec: 18.71, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.79, Samples/sec: 18.52, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.56, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.69, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.63, Time/seq 
0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.60, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.58, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.47s, TFLOPs: 7.23, Samples/sec: 17.17, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.92, Samples/sec: 18.80, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.65, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.61, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.55, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.61, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.60, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.65, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.80, Samples/sec: 18.53, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.75, Samples/sec: 18.41, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.55, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.56, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.65, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 
0.146 B, Latency: 0.43s, TFLOPs: 7.79, Samples/sec: 18.50, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.65, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.63, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.47s, TFLOPs: 7.24, Samples/sec: 17.20, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.88, Samples/sec: 18.72, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.54, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.61, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.62, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.56, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.63, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.56, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.57, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.62, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.79, Samples/sec: 18.50, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.67, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.63, 
Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.79, Samples/sec: 18.50, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.57, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.58, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.47s, TFLOPs: 7.18, Samples/sec: 17.06, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.91, Samples/sec: 18.79, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.68, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.54, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.79, Samples/sec: 18.51, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.79, Samples/sec: 18.52, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.80, Samples/sec: 18.52, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.58, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.56, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.60, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.56, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.78, Samples/sec: 18.48, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model 
Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.56, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.78, Samples/sec: 18.49, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.67, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.60, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.47s, TFLOPs: 7.22, Samples/sec: 17.14, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.89, Samples/sec: 18.74, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.59, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.68, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.56, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.56, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.64, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.61, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.65, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.60, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.62, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 
18.63, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.66, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.69, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.79, Samples/sec: 18.50, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.66, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.47s, TFLOPs: 7.09, Samples/sec: 16.85, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.89, Samples/sec: 18.75, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.63, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.79, Samples/sec: 18.51, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.63, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.62, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.80, Samples/sec: 18.54, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.59, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.65, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.54, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.62, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model 
Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.77, Samples/sec: 18.47, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.60, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.54, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.58, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.79, Samples/sec: 18.49, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.47s, TFLOPs: 7.20, Samples/sec: 17.11, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.88, Samples/sec: 18.71, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.92, Samples/sec: 18.81, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.65, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.55, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.54, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.62, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.68, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.56, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.64, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 
18.56, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.63, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.80, Samples/sec: 18.53, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.57, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.88, Samples/sec: 18.71, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.63, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.47s, TFLOPs: 7.18, Samples/sec: 17.06, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.89, Samples/sec: 18.74, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.80, Samples/sec: 18.54, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.80, Samples/sec: 18.53, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.79, Samples/sec: 18.50, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.68, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.60, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.64, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.62, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.63, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model 
Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.64, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.58, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.89, Samples/sec: 18.74, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.58, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.60, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.70, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.47s, TFLOPs: 7.19, Samples/sec: 17.09, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.89, Samples/sec: 18.75, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.65, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.59, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.66, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.61, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.57, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.80, Samples/sec: 18.53, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.80, Samples/sec: 18.53, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 
18.64, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512
Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.68, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512
Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.57, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512
Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.57, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512
Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.66, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512
Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.62, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512
Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.65, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512
[2024-03-28 17:50:15,022] [INFO] [logging.py:96:log_dist] [Rank 0] step=50, skipped=4, lr=[0.0009755282581475768, 0.0004877641290737884, 0.0009755282581475768], mom=[(0.9, 0.95), (0.9, 0.95), (0.9, 0.95)]
[2024-03-28 17:50:15,032] [INFO] [timer.py:260:stop] epoch=0/micro_step=800/global_step=50, RunningAvgSamplesPerSec=18.53164373995434, CurrSamplesPerSec=18.5311845061101, MemAllocated=1.37GB, MaxMemAllocated=2.8GB
Model Parameters: 0.146 B, Latency: 0.47s, TFLOPs: 7.20, Samples/sec: 17.11, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512
Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.92, Samples/sec: 18.81, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512
Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.61, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512
Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.63, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512
Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.64, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512
Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.56, Time/seq 0.05s, Batch
Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.59, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.64, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.60, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.60, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.58, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.57, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.80, Samples/sec: 18.53, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.66, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.88, Samples/sec: 18.72, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.63, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.47s, TFLOPs: 7.23, Samples/sec: 17.17, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.92, Samples/sec: 18.82, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.58, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.54, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.67, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, 
Latency: 0.44s, TFLOPs: 7.69, Samples/sec: 18.28, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.65, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.77, Samples/sec: 18.45, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.66, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.67, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.55, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.59, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.79, Samples/sec: 18.52, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.88, Samples/sec: 18.71, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.65, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.58, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.47s, TFLOPs: 7.23, Samples/sec: 17.17, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.42s, TFLOPs: 7.94, Samples/sec: 18.85, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.88, Samples/sec: 18.71, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.80, Samples/sec: 18.53, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.79, Samples/sec: 18.50, Time/seq 
0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.61, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.70, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.79, Samples/sec: 18.51, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.58, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.64, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.80, Samples/sec: 18.53, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.58, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.68, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.69, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.65, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.70, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.47s, TFLOPs: 7.11, Samples/sec: 16.89, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.92, Samples/sec: 18.80, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.66, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.62, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 
0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.64, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.66, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.80, Samples/sec: 18.54, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.67, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.68, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.58, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.90, Samples/sec: 18.76, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.58, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.89, Samples/sec: 18.75, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.62, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.88, Samples/sec: 18.72, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.60, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.47s, TFLOPs: 7.19, Samples/sec: 17.09, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.90, Samples/sec: 18.76, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.92, Samples/sec: 18.81, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.61, 
Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.57, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.63, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.88, Samples/sec: 18.73, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.68, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.59, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.56, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.67, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.77, Samples/sec: 18.45, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.62, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.56, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.67, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.66, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.47s, TFLOPs: 7.22, Samples/sec: 17.15, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.91, Samples/sec: 18.80, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.60, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model 
Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.61, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.64, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.58, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.68, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.64, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.58, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.65, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.69, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.79, Samples/sec: 18.51, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.68, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.65, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.63, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.65, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.47s, TFLOPs: 7.15, Samples/sec: 16.98, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.90, Samples/sec: 18.78, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 
18.62, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.79, Samples/sec: 18.51, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.63, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.56, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.61, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.79, Samples/sec: 18.52, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.61, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.90, Samples/sec: 18.77, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.57, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.88, Samples/sec: 18.71, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.60, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.64, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.58, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.66, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.47s, TFLOPs: 7.22, Samples/sec: 17.15, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.91, Samples/sec: 18.79, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model 
Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.68, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.60, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.67, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.55, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.57, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.88, Samples/sec: 18.71, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.69, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.67, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.56, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.65, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.80, Samples/sec: 18.53, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.60, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.68, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.80, Samples/sec: 18.53, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.48s, TFLOPs: 7.06, Samples/sec: 16.78, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.91, Samples/sec: 
18.78, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.74, Samples/sec: 18.40, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.56, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.58, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.61, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.88, Samples/sec: 18.73, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.61, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.55, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.55, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.56, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.69, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.89, Samples/sec: 18.74, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.65, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.64, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.62, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.46s, TFLOPs: 7.29, Samples/sec: 17.31, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model 
Parameters: 0.146 B, Latency: 0.42s, TFLOPs: 7.92, Samples/sec: 18.82, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.59, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.80, Samples/sec: 18.53, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.67, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.88, Samples/sec: 18.71, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.67, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.56, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.61, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.56, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.89, Samples/sec: 18.74, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.79, Samples/sec: 18.50, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.69, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.59, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.44s, TFLOPs: 7.74, Samples/sec: 18.38, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.67, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 [2024-03-28 17:51:24,435] [INFO] [logging.py:96:log_dist] [Rank 0] 
step=60, skipped=4, lr=[0.0009638756592879923, 0.0004819378296439961, 0.0009638756592879923], mom=[(0.9, 0.95), (0.9, 0.95), (0.9, 0.95)] [2024-03-28 17:51:24,445] [INFO] [timer.py:260:stop] epoch=0/micro_step=960/global_step=60, RunningAvgSamplesPerSec=18.531986650999862, CurrSamplesPerSec=18.5168937957388, MemAllocated=1.37GB, MaxMemAllocated=2.8GB Model Parameters: 0.146 B, Latency: 0.47s, TFLOPs: 7.11, Samples/sec: 16.89, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.90, Samples/sec: 18.77, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.60, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.65, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.63, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.66, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.58, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.63, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.64, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.79, Samples/sec: 18.50, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.70, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.68, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.58, Time/seq 0.05s, Batch 
Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.64, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.63, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.66, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.47s, TFLOPs: 7.16, Samples/sec: 17.01, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.91, Samples/sec: 18.79, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.66, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.60, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.89, Samples/sec: 18.74, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.78, Samples/sec: 18.49, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.62, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.64, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.80, Samples/sec: 18.54, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.80, Samples/sec: 18.53, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.80, Samples/sec: 18.53, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.64, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, 
Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.54, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.88, Samples/sec: 18.72, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.88, Samples/sec: 18.71, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.57, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.47s, TFLOPs: 7.24, Samples/sec: 17.19, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.88, Samples/sec: 18.73, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.58, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.63, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.65, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.58, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.70, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.60, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.80, Samples/sec: 18.52, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.92, Samples/sec: 18.82, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.69, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.59, Time/seq 
0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.67, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.66, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.56, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.65, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.47s, TFLOPs: 7.20, Samples/sec: 17.10, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.91, Samples/sec: 18.80, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.54, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.79, Samples/sec: 18.51, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.62, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.80, Samples/sec: 18.53, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.66, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.80, Samples/sec: 18.54, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.56, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.62, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.64, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 
0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.55, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.69, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.64, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.60, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.58, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.48s, TFLOPs: 7.09, Samples/sec: 16.84, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.42s, TFLOPs: 7.94, Samples/sec: 18.86, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.62, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.80, Samples/sec: 18.54, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.69, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.70, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.58, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.91, Samples/sec: 18.78, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.57, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.66, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.66, 
Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.90, Samples/sec: 18.76, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.78, Samples/sec: 18.48, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.59, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.66, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.58, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.47s, TFLOPs: 7.16, Samples/sec: 17.02, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.42s, TFLOPs: 7.93, Samples/sec: 18.83, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.79, Samples/sec: 18.50, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.60, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.58, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.54, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.66, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.68, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.59, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.65, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model 
Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.66, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.79, Samples/sec: 18.50, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.66, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.58, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.71, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.59, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.47s, TFLOPs: 7.24, Samples/sec: 17.19, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.42s, TFLOPs: 7.94, Samples/sec: 18.86, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.64, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.56, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.68, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.78, Samples/sec: 18.48, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.58, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.57, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.69, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 
18.62, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.69, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.78, Samples/sec: 18.49, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.65, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.69, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.59, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.66, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.46s, TFLOPs: 7.25, Samples/sec: 17.22, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.88, Samples/sec: 18.73, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.67, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.64, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.64, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.66, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.79, Samples/sec: 18.50, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.88, Samples/sec: 18.71, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.68, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model 
Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.61, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.66, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.70, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.63, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.65, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.80, Samples/sec: 18.54, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.60, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.47s, TFLOPs: 7.18, Samples/sec: 17.06, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.89, Samples/sec: 18.74, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.67, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.54, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.58, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.77, Samples/sec: 18.46, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.64, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.56, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 
18.65, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.58, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.88, Samples/sec: 18.71, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.61, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.67, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.61, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.66, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.63, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.47s, TFLOPs: 7.20, Samples/sec: 17.10, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.92, Samples/sec: 18.81, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.56, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.79, Samples/sec: 18.51, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.65, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.62, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.70, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.60, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model 
Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.67, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.79, Samples/sec: 18.51, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.63, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.88, Samples/sec: 18.72, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.65, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.58, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.69, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.62, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 [2024-03-28 17:52:33,847] [INFO] [logging.py:96:log_dist] [Rank 0] step=70, skipped=4, lr=[0.0009500602579710256, 0.0004750301289855128, 0.0009500602579710256], mom=[(0.9, 0.95), (0.9, 0.95), (0.9, 0.95)] [2024-03-28 17:52:33,857] [INFO] [timer.py:260:stop] epoch=0/micro_step=1120/global_step=70, RunningAvgSamplesPerSec=18.532704211474588, CurrSamplesPerSec=18.53214273829954, MemAllocated=1.37GB, MaxMemAllocated=2.8GB Model Parameters: 0.146 B, Latency: 0.47s, TFLOPs: 7.13, Samples/sec: 16.94, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.42s, TFLOPs: 7.93, Samples/sec: 18.84, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.80, Samples/sec: 18.53, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.65, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 
0.43s, TFLOPs: 7.85, Samples/sec: 18.66, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.69, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.55, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.67, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.62, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.59, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.55, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.69, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.58, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.89, Samples/sec: 18.73, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.60, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.68, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.47s, TFLOPs: 7.18, Samples/sec: 17.06, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.42s, TFLOPs: 7.93, Samples/sec: 18.84, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.56, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.79, Samples/sec: 18.50, Time/seq 0.05s, Batch 
Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.62, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.66, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.88, Samples/sec: 18.71, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.59, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.63, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.63, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.62, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.66, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.59, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.69, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.59, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.70, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.48s, TFLOPs: 7.04, Samples/sec: 16.72, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.90, Samples/sec: 18.76, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.56, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, 
Latency: 0.43s, TFLOPs: 7.88, Samples/sec: 18.71, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.63, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.54, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.67, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.88, Samples/sec: 18.72, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.65, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.59, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.69, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.56, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.58, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.64, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.61, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.88, Samples/sec: 18.72, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.47s, TFLOPs: 7.20, Samples/sec: 17.09, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.42s, TFLOPs: 7.93, Samples/sec: 18.85, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.65, Time/seq 
0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.59, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.68, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.69, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.57, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.57, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.68, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.63, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.58, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.68, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.59, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.58, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.61, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.55, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.47s, TFLOPs: 7.22, Samples/sec: 17.15, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.42s, TFLOPs: 7.95, Samples/sec: 18.89, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 
0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.63, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.60, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.60, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.89, Samples/sec: 18.74, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.62, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.55, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.63, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.63, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.80, Samples/sec: 18.52, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.90, Samples/sec: 18.76, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.68, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.70, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.60, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.64, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.47s, TFLOPs: 7.20, Samples/sec: 17.10, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.90, Samples/sec: 18.77, 
Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.60, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.69, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.63, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.56, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.61, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.59, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.68, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.59, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.63, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.79, Samples/sec: 18.51, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.69, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.59, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.69, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.66, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.47s, TFLOPs: 7.23, Samples/sec: 17.17, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model 
Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.91, Samples/sec: 18.79, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.56, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.60, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.66, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.62, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.59, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.61, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.60, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.64, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.63, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.70, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.58, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.89, Samples/sec: 18.75, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.59, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.56, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.47s, TFLOPs: 7.18, Samples/sec: 
17.05, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.88, Samples/sec: 18.72, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.58, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.65, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.63, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.55, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.68, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.69, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.67, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.56, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.59, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.70, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.65, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.58, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.89, Samples/sec: 18.74, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.66, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model 
Parameters: 0.146 B, Latency: 0.46s, TFLOPs: 7.26, Samples/sec: 17.25, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.42s, TFLOPs: 7.93, Samples/sec: 18.84, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.79, Samples/sec: 18.51, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.65, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.88, Samples/sec: 18.71, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.80, Samples/sec: 18.52, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.68, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.63, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.65, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.58, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.64, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.65, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.79, Samples/sec: 18.51, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.63, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.56, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 
18.68, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.47s, TFLOPs: 7.22, Samples/sec: 17.15, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.89, Samples/sec: 18.74, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.64, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.65, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.91, Samples/sec: 18.79, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.64, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.68, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.62, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.80, Samples/sec: 18.54, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.66, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.66, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.68, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.69, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.79, Samples/sec: 18.50, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.78, Samples/sec: 18.47, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model 
Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.61, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 [2024-03-28 17:53:43,219] [INFO] [logging.py:96:log_dist] [Rank 0] step=80, skipped=4, lr=[0.0009341464679750669, 0.00046707323398753343, 0.0009341464679750669], mom=[(0.9, 0.95), (0.9, 0.95), (0.9, 0.95)] [2024-03-28 17:53:43,229] [INFO] [timer.py:260:stop] epoch=0/micro_step=1280/global_step=80, RunningAvgSamplesPerSec=18.534281748554193, CurrSamplesPerSec=18.544834263618963, MemAllocated=1.37GB, MaxMemAllocated=2.8GB Model Parameters: 0.146 B, Latency: 0.47s, TFLOPs: 7.18, Samples/sec: 17.07, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.70, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.89, Samples/sec: 18.74, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.70, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.79, Samples/sec: 18.50, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.61, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.64, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.80, Samples/sec: 18.54, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.92, Samples/sec: 18.81, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.65, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.62, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, 
Latency: 0.43s, TFLOPs: 7.88, Samples/sec: 18.72, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.60, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.88, Samples/sec: 18.72, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.62, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.65, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.47s, TFLOPs: 7.23, Samples/sec: 17.17, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.92, Samples/sec: 18.82, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.63, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.65, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.64, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.59, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.59, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.61, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.54, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.66, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.70, Time/seq 
0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.59, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.63, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.64, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.62, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.66, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.47s, TFLOPs: 7.17, Samples/sec: 17.03, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.90, Samples/sec: 18.77, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.67, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.64, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.67, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.60, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.59, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.60, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.88, Samples/sec: 18.71, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.69, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 
0.146 B, Latency: 0.43s, TFLOPs: 7.79, Samples/sec: 18.51, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.66, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.63, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.62, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.88, Samples/sec: 18.71, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.88, Samples/sec: 18.73, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.47s, TFLOPs: 7.19, Samples/sec: 17.08, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.90, Samples/sec: 18.76, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.64, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.65, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.64, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.63, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.90, Samples/sec: 18.77, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.67, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.79, Samples/sec: 18.50, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.55, 
Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.66, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.60, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.60, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.60, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.63, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.71, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.47s, TFLOPs: 7.12, Samples/sec: 16.92, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.88, Samples/sec: 18.73, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.65, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.64, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.54, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.69, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.66, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.76, Samples/sec: 18.43, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.88, Samples/sec: 18.71, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model 
Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.66, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.56, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.69, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.61, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.63, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.66, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.64, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.47s, TFLOPs: 7.20, Samples/sec: 17.09, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.91, Samples/sec: 18.80, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.58, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.69, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.64, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.63, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.69, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.42s, TFLOPs: 7.94, Samples/sec: 18.85, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 
18.68, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.55, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.80, Samples/sec: 18.53, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.61, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.80, Samples/sec: 18.52, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.67, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.69, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.63, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.46s, TFLOPs: 7.24, Samples/sec: 17.21, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.91, Samples/sec: 18.79, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.60, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.66, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.88, Samples/sec: 18.72, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.88, Samples/sec: 18.73, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.66, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.63, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model 
Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.62, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.56, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.60, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.70, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.56, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.61, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.68, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.57, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.47s, TFLOPs: 7.24, Samples/sec: 17.19, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.91, Samples/sec: 18.79, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.79, Samples/sec: 18.51, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.89, Samples/sec: 18.74, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.88, Samples/sec: 18.72, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.66, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.70, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 
18.71, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.64, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.80, Samples/sec: 18.54, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.88, Samples/sec: 18.71, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.88, Samples/sec: 18.72, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.57, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.69, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.61, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.88, Samples/sec: 18.71, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.47s, TFLOPs: 7.24, Samples/sec: 17.19, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.88, Samples/sec: 18.71, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.70, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.79, Samples/sec: 18.51, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.69, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.60, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.59, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model 
Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.67, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.66, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.69, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.60, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.89, Samples/sec: 18.75, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.64, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.88, Samples/sec: 18.72, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.88, Samples/sec: 18.72, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.68, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.47s, TFLOPs: 7.16, Samples/sec: 17.01, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.88, Samples/sec: 18.71, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.71, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.57, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.60, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.57, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 
18.65, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.68, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.88, Samples/sec: 18.71, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.61, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.58, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.63, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.88, Samples/sec: 18.72, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.80, Samples/sec: 18.54, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.90, Samples/sec: 18.76, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.69, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 [2024-03-28 17:54:52,545] [INFO] [logging.py:96:log_dist] [Rank 0] step=90, skipped=4, lr=[0.0009162084867351841, 0.00045810424336759206, 0.0009162084867351841], mom=[(0.9, 0.95), (0.9, 0.95), (0.9, 0.95)] [2024-03-28 17:54:52,555] [INFO] [timer.py:260:stop] epoch=0/micro_step=1440/global_step=90, RunningAvgSamplesPerSec=18.53672133939495, CurrSamplesPerSec=18.550609644621343, MemAllocated=1.37GB, MaxMemAllocated=2.8GB Model Parameters: 0.146 B, Latency: 0.47s, TFLOPs: 7.16, Samples/sec: 17.02, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.91, Samples/sec: 18.80, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.66, Time/seq 0.05s, Batch 
Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.60, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.64, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.65, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.88, Samples/sec: 18.71, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.69, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.60, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.68, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.66, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.61, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.65, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.65, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.80, Samples/sec: 18.54, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.88, Samples/sec: 18.72, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.46s, TFLOPs: 7.25, Samples/sec: 17.21, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.92, Samples/sec: 18.81, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, 
Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.63, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.78, Samples/sec: 18.48, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.56, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.68, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.65, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.56, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.61, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.59, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.62, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.64, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.67, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.59, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.79, Samples/sec: 18.50, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.67, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.47s, TFLOPs: 7.18, Samples/sec: 17.06, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.42s, TFLOPs: 7.93, Samples/sec: 18.83, Time/seq 
0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.60, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.79, Samples/sec: 18.51, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.66, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.70, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.67, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.76, Samples/sec: 18.44, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.78, Samples/sec: 18.49, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.56, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.60, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.62, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.57, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.65, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.69, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.71, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.47s, TFLOPs: 7.18, Samples/sec: 17.06, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 
0.146 B, Latency: 0.43s, TFLOPs: 7.89, Samples/sec: 18.74, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.65, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.65, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.58, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.88, Samples/sec: 18.72, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.60, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.56, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.60, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.62, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.59, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.63, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.62, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.63, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.68, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.68, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.47s, TFLOPs: 7.20, Samples/sec: 17.11, 
Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.42s, TFLOPs: 7.93, Samples/sec: 18.83, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.55, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.80, Samples/sec: 18.52, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.60, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.70, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.75, Samples/sec: 18.40, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.56, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.78, Samples/sec: 18.48, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.62, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.58, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.64, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.90, Samples/sec: 18.76, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.64, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.69, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.65, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model 
Parameters: 0.146 B, Latency: 0.46s, TFLOPs: 7.25, Samples/sec: 17.22, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.42s, TFLOPs: 7.94, Samples/sec: 18.87, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.88, Samples/sec: 18.72, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.78, Samples/sec: 18.48, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.56, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.68, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.69, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.62, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.66, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.88, Samples/sec: 18.71, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.60, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.70, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.89, Samples/sec: 18.75, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.88, Samples/sec: 18.72, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.89, Samples/sec: 18.73, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 
18.64, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.47s, TFLOPs: 7.19, Samples/sec: 17.08, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.89, Samples/sec: 18.75, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.79, Samples/sec: 18.50, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.62, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.89, Samples/sec: 18.74, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.58, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.69, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.88, Samples/sec: 18.72, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.67, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.65, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.66, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.56, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.65, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.59, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.64, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model 
Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.68, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.47s, TFLOPs: 7.21, Samples/sec: 17.13, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.42s, TFLOPs: 7.93, Samples/sec: 18.83, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.58, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.67, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.62, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.62, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.89, Samples/sec: 18.75, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.64, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.65, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.57, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.63, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.80, Samples/sec: 18.53, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.80, Samples/sec: 18.53, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.57, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 
18.56, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.67, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.47s, TFLOPs: 7.23, Samples/sec: 17.18, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.89, Samples/sec: 18.73, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.66, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.62, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.62, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.59, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.67, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.62, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.68, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.59, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.65, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.88, Samples/sec: 18.73, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.63, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.63, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model 
Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.66, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.67, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.47s, TFLOPs: 7.22, Samples/sec: 17.16, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.42s, TFLOPs: 7.93, Samples/sec: 18.83, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.70, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.57, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.66, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.56, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.64, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.58, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.59, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.89, Samples/sec: 18.75, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.60, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.68, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.68, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 
18.63, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.62, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.55, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 [2024-03-28 17:56:01,922] [INFO] [logging.py:96:log_dist] [Rank 0] step=100, skipped=4, lr=[0.0008963299494004291, 0.00044816497470021456, 0.0008963299494004291], mom=[(0.9, 0.95), (0.9, 0.95), (0.9, 0.95)] [2024-03-28 17:56:01,932] [INFO] [timer.py:260:stop] epoch=0/micro_step=1600/global_step=100, RunningAvgSamplesPerSec=18.538191729389002, CurrSamplesPerSec=18.55355991743927, MemAllocated=1.37GB, MaxMemAllocated=2.8GB Model Parameters: 0.146 B, Latency: 0.47s, TFLOPs: 7.21, Samples/sec: 17.12, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.92, Samples/sec: 18.82, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.65, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.58, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.92, Samples/sec: 18.81, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.69, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.89, Samples/sec: 18.75, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.56, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.58, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.88, Samples/sec: 18.73, Time/seq 0.05s, 
Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.65, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.57, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.88, Samples/sec: 18.71, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.66, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.54, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.88, Samples/sec: 18.71, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.47s, TFLOPs: 7.24, Samples/sec: 17.20, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.42s, TFLOPs: 7.95, Samples/sec: 18.88, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.56, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.58, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.80, Samples/sec: 18.53, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.56, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.57, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.62, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.64, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, 
Latency: 0.43s, TFLOPs: 7.92, Samples/sec: 18.81, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.64, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.69, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.80, Samples/sec: 18.54, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.71, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.65, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.58, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.47s, TFLOPs: 7.23, Samples/sec: 17.17, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.71, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.67, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.88, Samples/sec: 18.72, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.66, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.90, Samples/sec: 18.76, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.88, Samples/sec: 18.72, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.61, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.59, Time/seq 
0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.69, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.65, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.66, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.88, Samples/sec: 18.72, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.55, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.60, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.62, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.47s, TFLOPs: 7.14, Samples/sec: 16.96, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.89, Samples/sec: 18.75, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.65, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.88, Samples/sec: 18.72, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.66, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.80, Samples/sec: 18.53, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.89, Samples/sec: 18.74, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.65, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 
0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.66, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.59, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.90, Samples/sec: 18.77, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.64, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.58, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.62, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.59, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.68, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.47s, TFLOPs: 7.20, Samples/sec: 17.09, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.91, Samples/sec: 18.80, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.66, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.66, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.64, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.68, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.64, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.56, 
Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.66, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.64, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.66, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.79, Samples/sec: 18.51, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.56, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.58, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.63, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.90, Samples/sec: 18.76, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 [2024-03-28 17:56:36,567] [INFO] [fused_optimizer.py:352:_update_scale] No Grad overflow for 100 iterations [2024-03-28 17:56:36,567] [INFO] [fused_optimizer.py:353:_update_scale] Increasing dynamic loss scale from 4096.0 to 8192.0 Model Parameters: 0.146 B, Latency: 0.47s, TFLOPs: 7.22, Samples/sec: 17.15, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.90, Samples/sec: 18.76, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.66, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.64, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.69, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 
0.43s, TFLOPs: 7.82, Samples/sec: 18.59, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.89, Samples/sec: 18.75, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.63, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.80, Samples/sec: 18.52, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.70, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.67, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.79, Samples/sec: 18.51, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.90, Samples/sec: 18.77, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.63, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.88, Samples/sec: 18.72, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.80, Samples/sec: 18.53, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.47s, TFLOPs: 7.24, Samples/sec: 17.20, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.42s, TFLOPs: 7.93, Samples/sec: 18.84, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.66, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.67, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.80, Samples/sec: 18.54, Time/seq 0.05s, Batch 
Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.61, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.64, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.68, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.58, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.60, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.65, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.92, Samples/sec: 18.81, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.61, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.58, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.69, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.89, Samples/sec: 18.75, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.46s, TFLOPs: 7.25, Samples/sec: 17.23, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.42s, TFLOPs: 7.93, Samples/sec: 18.84, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.63, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.57, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, 
Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.60, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.88, Samples/sec: 18.71, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.60, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.61, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.70, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.65, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.88, Samples/sec: 18.71, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.57, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.59, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.56, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.56, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.64, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.47s, TFLOPs: 7.15, Samples/sec: 16.98, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.75, Samples/sec: 18.42, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.64, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.60, Time/seq 
0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.61, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.69, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.62, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.67, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.80, Samples/sec: 18.53, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.68, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.60, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.64, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.66, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.68, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.61, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.58, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.47s, TFLOPs: 7.23, Samples/sec: 17.18, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.92, Samples/sec: 18.81, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.64, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 
0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.69, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.78, Samples/sec: 18.49, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.59, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.59, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.68, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.70, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.66, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.55, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.58, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.64, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.88, Samples/sec: 18.71, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.62, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.55, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 [2024-03-28 17:57:11,251] [INFO] [logging.py:96:log_dist] [Rank 0] step=110, skipped=4, lr=[0.0008746035388881655, 0.00043730176944408274, 0.0008746035388881655], mom=[(0.9, 0.95), (0.9, 0.95), (0.9, 0.95)] [2024-03-28 17:57:11,261] [INFO] [timer.py:260:stop] epoch=0/micro_step=1760/global_step=110, 
RunningAvgSamplesPerSec=18.539821661701772, CurrSamplesPerSec=18.54142378151915, MemAllocated=1.37GB, MaxMemAllocated=2.8GB Model Parameters: 0.146 B, Latency: 0.47s, TFLOPs: 7.19, Samples/sec: 17.08, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.88, Samples/sec: 18.72, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.65, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.66, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.79, Samples/sec: 18.51, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.55, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.66, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.88, Samples/sec: 18.71, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.67, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.62, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.67, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.91, Samples/sec: 18.78, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.59, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.55, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 
18.64, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.68, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.47s, TFLOPs: 7.20, Samples/sec: 17.10, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.42s, TFLOPs: 7.96, Samples/sec: 18.91, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.69, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.60, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.63, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.70, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.69, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.90, Samples/sec: 18.77, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.67, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.78, Samples/sec: 18.48, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.80, Samples/sec: 18.53, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.61, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.80, Samples/sec: 18.54, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.63, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model 
Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.64, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.79, Samples/sec: 18.50, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.47s, TFLOPs: 7.18, Samples/sec: 17.07, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.91, Samples/sec: 18.78, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.89, Samples/sec: 18.74, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.61, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.91, Samples/sec: 18.80, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.80, Samples/sec: 18.54, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.63, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.65, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.60, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.64, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.66, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.58, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.88, Samples/sec: 18.71, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.79, Samples/sec: 
18.52, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.68, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.65, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.47s, TFLOPs: 7.22, Samples/sec: 17.16, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.90, Samples/sec: 18.76, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.79, Samples/sec: 18.51, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.58, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.56, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.68, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.80, Samples/sec: 18.53, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.64, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.62, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.64, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.65, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.63, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.57, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model 
Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.89, Samples/sec: 18.74, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.65, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.68, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.47s, TFLOPs: 7.22, Samples/sec: 17.14, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.91, Samples/sec: 18.79, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.58, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.60, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.68, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.55, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.88, Samples/sec: 18.72, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.68, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.77, Samples/sec: 18.45, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.60, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.88, Samples/sec: 18.72, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.77, Samples/sec: 18.47, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 
18.63, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.78, Samples/sec: 18.48, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.88, Samples/sec: 18.72, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.56, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.48s, TFLOPs: 7.03, Samples/sec: 16.70, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.91, Samples/sec: 18.80, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.66, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.57, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.69, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.64, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.63, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.57, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.58, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.61, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.88, Samples/sec: 18.73, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.66, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model 
Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.88, Samples/sec: 18.73, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.60, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.88, Samples/sec: 18.72, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.66, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.46s, TFLOPs: 7.26, Samples/sec: 17.24, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.89, Samples/sec: 18.75, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.79, Samples/sec: 18.51, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.66, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.60, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.55, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.64, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.88, Samples/sec: 18.73, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.71, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.60, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.66, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 
18.60, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.80, Samples/sec: 18.52, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.60, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.64, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.78, Samples/sec: 18.49, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.46s, TFLOPs: 7.27, Samples/sec: 17.27, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.91, Samples/sec: 18.80, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.59, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.54, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.80, Samples/sec: 18.53, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.66, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.69, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.78, Samples/sec: 18.48, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.57, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.66, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.69, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model 
Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.70, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.55, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.61, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.59, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.68, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.47s, TFLOPs: 7.22, Samples/sec: 17.15, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.89, Samples/sec: 18.74, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.79, Samples/sec: 18.51, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.68, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.78, Samples/sec: 18.49, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.65, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.54, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.77, Samples/sec: 18.46, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.78, Samples/sec: 18.49, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.88, Samples/sec: 18.71, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.80, Samples/sec: 
18.53, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.64, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.66, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.64, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.54, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.79, Samples/sec: 18.52, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.47s, TFLOPs: 7.17, Samples/sec: 17.04, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.88, Samples/sec: 18.72, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.66, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.65, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.60, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.56, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.60, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.69, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.69, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.57, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model 
Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.64, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.55, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.68, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.66, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.70, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.75, Samples/sec: 18.40, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 [2024-03-28 17:58:20,661] [INFO] [logging.py:96:log_dist] [Rank 0] step=120, skipped=4, lr=[0.0008511305537535237, 0.00042556527687676184, 0.0008511305537535237], mom=[(0.9, 0.95), (0.9, 0.95), (0.9, 0.95)] [2024-03-28 17:58:20,669] [INFO] [timer.py:260:stop] epoch=0/micro_step=1920/global_step=120, RunningAvgSamplesPerSec=18.53942202997731, CurrSamplesPerSec=18.514011993757105, MemAllocated=1.37GB, MaxMemAllocated=2.8GB Model Parameters: 0.146 B, Latency: 0.48s, TFLOPs: 7.06, Samples/sec: 16.77, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.68, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.61, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.44s, TFLOPs: 7.74, Samples/sec: 18.38, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.79, Samples/sec: 18.49, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.57, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, 
Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.65, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.56, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.79, Samples/sec: 18.51, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.66, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.59, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.62, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.62, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.59, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.55, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.66, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.47s, TFLOPs: 7.24, Samples/sec: 17.19, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.42s, TFLOPs: 7.92, Samples/sec: 18.83, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.80, Samples/sec: 18.52, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.80, Samples/sec: 18.54, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.69, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.78, Samples/sec: 18.48, Time/seq 
0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.56, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.69, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.79, Samples/sec: 18.50, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.69, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.56, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.64, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.56, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.58, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.59, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.67, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.47s, TFLOPs: 7.17, Samples/sec: 17.03, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.91, Samples/sec: 18.80, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.63, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.80, Samples/sec: 18.53, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.65, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 
0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.59, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.55, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.68, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.64, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.58, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.57, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.65, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.61, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.57, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.64, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.78, Samples/sec: 18.49, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.46s, TFLOPs: 7.26, Samples/sec: 17.24, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.90, Samples/sec: 18.77, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.61, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.56, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.59, 
Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.61, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.64, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.56, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.61, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.80, Samples/sec: 18.54, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.57, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.77, Samples/sec: 18.46, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.63, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.60, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.80, Samples/sec: 18.53, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.60, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.47s, TFLOPs: 7.22, Samples/sec: 17.15, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.63, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.64, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.57, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model 
Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.80, Samples/sec: 18.52, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.61, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.79, Samples/sec: 18.50, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.63, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.60, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.62, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.57, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.60, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.62, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.78, Samples/sec: 18.48, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.62, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.80, Samples/sec: 18.52, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.47s, TFLOPs: 7.22, Samples/sec: 17.16, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.66, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.70, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.79, Samples/sec: 
18.51, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.80, Samples/sec: 18.52, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.59, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.44s, TFLOPs: 7.72, Samples/sec: 18.33, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.60, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.77, Samples/sec: 18.47, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.59, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.70, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.78, Samples/sec: 18.48, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.60, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.64, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.59, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.91, Samples/sec: 18.79, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.46s, TFLOPs: 7.24, Samples/sec: 17.21, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.42s, TFLOPs: 7.93, Samples/sec: 18.85, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.66, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model 
Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.61, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.64, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.64, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.80, Samples/sec: 18.53, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.64, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.59, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.60, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.80, Samples/sec: 18.53, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.66, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.64, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.78, Samples/sec: 18.48, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.55, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.63, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.48s, TFLOPs: 7.04, Samples/sec: 16.73, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.90, Samples/sec: 18.77, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 
18.70, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.80, Samples/sec: 18.53, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.69, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.65, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.64, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.63, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.60, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.63, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.80, Samples/sec: 18.53, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.68, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.58, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.66, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.55, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.88, Samples/sec: 18.71, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.47s, TFLOPs: 7.23, Samples/sec: 17.19, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.42s, TFLOPs: 7.93, Samples/sec: 18.83, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model 
Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.79, Samples/sec: 18.52, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.61, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.89, Samples/sec: 18.74, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.67, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.71, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.80, Samples/sec: 18.54, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.58, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.80, Samples/sec: 18.53, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.55, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.58, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.56, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.76, Samples/sec: 18.44, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.56, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.56, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.47s, TFLOPs: 7.23, Samples/sec: 17.18, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.42s, TFLOPs: 7.93, Samples/sec: 
18.83, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.79, Samples/sec: 18.51, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.59, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.69, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.60, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.58, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.60, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.80, Samples/sec: 18.54, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.61, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.80, Samples/sec: 18.53, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.88, Samples/sec: 18.73, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.57, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.68, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.67, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.88, Samples/sec: 18.71, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 [2024-03-28 17:59:30,153] [INFO] [logging.py:96:log_dist] [Rank 0] step=130, skipped=4, lr=[0.0008260204358887753, 
0.00041301021794438764, 0.0008260204358887753], mom=[(0.9, 0.95), (0.9, 0.95), (0.9, 0.95)] [2024-03-28 17:59:30,163] [INFO] [timer.py:260:stop] epoch=0/micro_step=2080/global_step=130, RunningAvgSamplesPerSec=18.537646420989645, CurrSamplesPerSec=18.534955318981456, MemAllocated=1.37GB, MaxMemAllocated=2.8GB Model Parameters: 0.146 B, Latency: 0.47s, TFLOPs: 7.19, Samples/sec: 17.08, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.89, Samples/sec: 18.74, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.57, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.88, Samples/sec: 18.72, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.56, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.60, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.80, Samples/sec: 18.53, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.58, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.79, Samples/sec: 18.51, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.57, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.66, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.65, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.61, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model 
Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.56, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.67, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.62, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.46s, TFLOPs: 7.27, Samples/sec: 17.26, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.92, Samples/sec: 18.82, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.68, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.89, Samples/sec: 18.74, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.58, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.89, Samples/sec: 18.73, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.61, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.62, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.63, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.59, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.62, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.71, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 
18.59, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.91, Samples/sec: 18.80, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.58, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.80, Samples/sec: 18.52, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.47s, TFLOPs: 7.21, Samples/sec: 17.12, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.88, Samples/sec: 18.73, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.57, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.58, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.70, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.80, Samples/sec: 18.53, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.62, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.55, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.88, Samples/sec: 18.72, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.63, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.66, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.58, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model 
Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.61, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.88, Samples/sec: 18.71, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.59, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.63, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.46s, TFLOPs: 7.24, Samples/sec: 17.21, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.42s, TFLOPs: 7.93, Samples/sec: 18.84, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.89, Samples/sec: 18.75, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.68, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.89, Samples/sec: 18.74, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.63, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.61, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.60, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.89, Samples/sec: 18.73, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.58, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.88, Samples/sec: 18.73, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.88, Samples/sec: 
18.71, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.58, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.56, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.54, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.59, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.46s, TFLOPs: 7.25, Samples/sec: 17.22, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.91, Samples/sec: 18.78, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.63, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.79, Samples/sec: 18.50, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.92, Samples/sec: 18.81, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.88, Samples/sec: 18.73, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.57, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.70, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.63, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.67, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.58, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model 
Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.68, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.64, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.65, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.78, Samples/sec: 18.48, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.70, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.47s, TFLOPs: 7.24, Samples/sec: 17.19, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.92, Samples/sec: 18.82, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.60, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.63, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.63, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.58, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.64, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.76, Samples/sec: 18.44, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.63, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.61, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.88, Samples/sec: 
18.72, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.61, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.62, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.70, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.90, Samples/sec: 18.76, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.59, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.47s, TFLOPs: 7.22, Samples/sec: 17.14, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.42s, TFLOPs: 7.94, Samples/sec: 18.86, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.55, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.56, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.88, Samples/sec: 18.73, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.57, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.62, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.64, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.59, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.69, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model 
Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.60, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.67, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.55, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.69, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.56, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.67, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.47s, TFLOPs: 7.23, Samples/sec: 17.17, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.42s, TFLOPs: 7.94, Samples/sec: 18.86, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.60, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.59, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.57, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.69, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.64, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.66, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.68, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 
18.68, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.90, Samples/sec: 18.76, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.64, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.62, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.88, Samples/sec: 18.73, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.76, Samples/sec: 18.44, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.90, Samples/sec: 18.77, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.46s, TFLOPs: 7.26, Samples/sec: 17.25, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.88, Samples/sec: 18.71, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.64, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.80, Samples/sec: 18.53, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.88, Samples/sec: 18.71, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.68, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.89, Samples/sec: 18.75, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.67, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.67, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model 
Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.65, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.56, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.64, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.68, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.62, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.63, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.62, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.48s, TFLOPs: 7.02, Samples/sec: 16.67, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.42s, TFLOPs: 7.95, Samples/sec: 18.89, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.89, Samples/sec: 18.75, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.63, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.70, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.62, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.69, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.63, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 
18.56, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.67, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.79, Samples/sec: 18.51, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.88, Samples/sec: 18.73, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.89, Samples/sec: 18.74, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.79, Samples/sec: 18.51, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.69, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.68, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512
[2024-03-28 18:00:39,477] [INFO] [logging.py:96:log_dist] [Rank 0] step=140, skipped=4, lr=[0.0007993902602547113, 0.00039969513012735566, 0.0007993902602547113], mom=[(0.9, 0.95), (0.9, 0.95), (0.9, 0.95)]
[2024-03-28 18:00:39,487] [INFO] [timer.py:260:stop] epoch=0/micro_step=2240/global_step=140, RunningAvgSamplesPerSec=18.538940857126306, CurrSamplesPerSec=18.58041444840641, MemAllocated=1.37GB, MaxMemAllocated=2.8GB
Model Parameters: 0.146 B, Latency: 0.46s, TFLOPs: 7.24, Samples/sec: 17.20, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.42s, TFLOPs: 7.93, Samples/sec: 18.84, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.55, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.70, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.63, Time/seq 0.05s,
Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.62, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.57, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.66, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.61, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.54, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.65, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.60, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.60, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.64, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.59, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.88, Samples/sec: 18.73, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.46s, TFLOPs: 7.26, Samples/sec: 17.25, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.42s, TFLOPs: 7.96, Samples/sec: 18.90, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.70, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.59, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, 
Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.59, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.61, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.54, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.63, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.66, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.67, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.63, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.65, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.66, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.64, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.62, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.90, Samples/sec: 18.77, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.46s, TFLOPs: 7.28, Samples/sec: 17.30, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.91, Samples/sec: 18.78, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.64, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.56, Time/seq 
0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.64, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.59, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.68, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.57, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.63, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.65, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.65, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.80, Samples/sec: 18.52, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.65, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.64, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.69, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.65, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.46s, TFLOPs: 7.27, Samples/sec: 17.28, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.90, Samples/sec: 18.78, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.61, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 
0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.68, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.80, Samples/sec: 18.53, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.68, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.68, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.58, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.62, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.57, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.66, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.91, Samples/sec: 18.80, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.70, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.62, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.59, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.66, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.47s, TFLOPs: 7.16, Samples/sec: 17.00, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.42s, TFLOPs: 7.93, Samples/sec: 18.83, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.66, 
Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.62, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.88, Samples/sec: 18.71, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.68, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.67, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.88, Samples/sec: 18.72, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.61, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.62, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.64, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.61, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.65, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.91, Samples/sec: 18.80, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.90, Samples/sec: 18.78, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.56, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.47s, TFLOPs: 7.22, Samples/sec: 17.15, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.61, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model 
Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.70, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.42s, TFLOPs: 7.93, Samples/sec: 18.84, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.56, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.60, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.88, Samples/sec: 18.71, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.68, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.63, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.58, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.66, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.65, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.62, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.77, Samples/sec: 18.45, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.56, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.61, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.47s, TFLOPs: 7.23, Samples/sec: 17.19, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.42s, TFLOPs: 7.97, Samples/sec: 
18.92, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.68, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.56, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.56, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.69, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.65, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.60, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.61, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.80, Samples/sec: 18.54, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.66, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.55, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.71, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.69, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.63, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.42s, TFLOPs: 7.93, Samples/sec: 18.83, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.47s, TFLOPs: 7.22, Samples/sec: 17.14, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model 
Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.91, Samples/sec: 18.80, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.88, Samples/sec: 18.72, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.66, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.61, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.66, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.80, Samples/sec: 18.54, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.62, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.61, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.80, Samples/sec: 18.54, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.58, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.70, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.66, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.89, Samples/sec: 18.74, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.57, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.88, Samples/sec: 18.72, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.47s, TFLOPs: 7.22, Samples/sec: 
17.16, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.90, Samples/sec: 18.77, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.89, Samples/sec: 18.75, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.66, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.60, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.64, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.69, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.65, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.64, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.66, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.88, Samples/sec: 18.73, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.68, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.65, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.90, Samples/sec: 18.77, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.66, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.89, Samples/sec: 18.74, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model 
Parameters: 0.146 B, Latency: 0.47s, TFLOPs: 7.23, Samples/sec: 17.18, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.92, Samples/sec: 18.82, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.66, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.62, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.65, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.69, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.69, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.58, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.65, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.90, Samples/sec: 18.77, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.56, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.88, Samples/sec: 18.72, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.60, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.66, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.70, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 
18.60, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 [2024-03-28 18:01:48,758] [INFO] [logging.py:96:log_dist] [Rank 0] step=150, skipped=4, lr=[0.0007713641890231309, 0.00038568209451156544, 0.0007713641890231309], mom=[(0.9, 0.95), (0.9, 0.95), (0.9, 0.95)] [2024-03-28 18:01:48,767] [INFO] [timer.py:260:stop] epoch=0/micro_step=2400/global_step=150, RunningAvgSamplesPerSec=18.540864782927592, CurrSamplesPerSec=18.58180867314579, MemAllocated=1.37GB, MaxMemAllocated=2.8GB Model Parameters: 0.146 B, Latency: 0.46s, TFLOPs: 7.26, Samples/sec: 17.26, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.88, Samples/sec: 18.73, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.69, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.79, Samples/sec: 18.49, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.91, Samples/sec: 18.78, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.88, Samples/sec: 18.71, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.90, Samples/sec: 18.76, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.63, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.89, Samples/sec: 18.74, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.65, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.67, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.67, Time/seq 0.05s, 
Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.57, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.55, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.67, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.89, Samples/sec: 18.75, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.48s, TFLOPs: 7.07, Samples/sec: 16.79, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.42s, TFLOPs: 7.93, Samples/sec: 18.85, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.64, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.60, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.63, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.89, Samples/sec: 18.75, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.69, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.90, Samples/sec: 18.77, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.68, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.79, Samples/sec: 18.52, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.91, Samples/sec: 18.79, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, 
Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.60, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.55, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.59, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.60, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.88, Samples/sec: 18.71, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.47s, TFLOPs: 7.19, Samples/sec: 17.08, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.92, Samples/sec: 18.81, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.89, Samples/sec: 18.74, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.80, Samples/sec: 18.54, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.89, Samples/sec: 18.75, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.66, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.63, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.63, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.65, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.61, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.62, Time/seq 
0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.63, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.56, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.80, Samples/sec: 18.54, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.89, Samples/sec: 18.73, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.66, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.47s, TFLOPs: 7.22, Samples/sec: 17.15, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.89, Samples/sec: 18.75, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.61, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.61, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.56, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.67, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.64, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.80, Samples/sec: 18.52, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.58, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.61, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 
0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.66, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.80, Samples/sec: 18.52, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.59, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.68, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.56, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.60, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.47s, TFLOPs: 7.18, Samples/sec: 17.05, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.90, Samples/sec: 18.77, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.64, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.44s, TFLOPs: 7.70, Samples/sec: 18.28, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.44s, TFLOPs: 7.64, Samples/sec: 18.16, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.44s, TFLOPs: 7.72, Samples/sec: 18.35, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.78, Samples/sec: 18.48, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.66, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.80, Samples/sec: 18.52, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.54, 
Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.56, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.64, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.79, Samples/sec: 18.52, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.64, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.65, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.88, Samples/sec: 18.72, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.47s, TFLOPs: 7.20, Samples/sec: 17.09, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.71, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.60, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.65, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.58, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.56, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.60, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.80, Samples/sec: 18.52, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.65, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model 
Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.61, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.65, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.80, Samples/sec: 18.53, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.88, Samples/sec: 18.71, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.79, Samples/sec: 18.52, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.79, Samples/sec: 18.51, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.67, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.47s, TFLOPs: 7.19, Samples/sec: 17.09, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.64, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.78, Samples/sec: 18.47, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.89, Samples/sec: 18.75, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.55, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.80, Samples/sec: 18.52, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.67, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.59, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 
18.63, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.68, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.67, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.61, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.59, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.61, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.67, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.54, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.47s, TFLOPs: 7.21, Samples/sec: 17.13, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.89, Samples/sec: 18.74, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.55, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.70, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.80, Samples/sec: 18.54, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.79, Samples/sec: 18.51, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.67, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.58, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model 
Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.55, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.61, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.55, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.68, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.60, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.79, Samples/sec: 18.51, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.56, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.44s, TFLOPs: 7.72, Samples/sec: 18.33, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.47s, TFLOPs: 7.11, Samples/sec: 16.90, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.92, Samples/sec: 18.82, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.62, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.58, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.60, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.65, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.60, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 
18.63, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.58, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.61, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.89, Samples/sec: 18.75, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.79, Samples/sec: 18.50, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.66, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.69, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.63, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.88, Samples/sec: 18.72, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.47s, TFLOPs: 7.20, Samples/sec: 17.11, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.64, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.80, Samples/sec: 18.54, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.69, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.57, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.54, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.66, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model 
Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.88, Samples/sec: 18.73, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.62, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.60, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.59, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.66, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.88, Samples/sec: 18.72, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.58, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.78, Samples/sec: 18.49, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.69, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 [2024-03-28 18:02:58,214] [INFO] [logging.py:96:log_dist] [Rank 0] step=160, skipped=4, lr=[0.0007420728926754803, 0.00037103644633774014, 0.0007420728926754803], mom=[(0.9, 0.95), (0.9, 0.95), (0.9, 0.95)] [2024-03-28 18:02:58,224] [INFO] [timer.py:260:stop] epoch=0/micro_step=2560/global_step=160, RunningAvgSamplesPerSec=18.540070897154596, CurrSamplesPerSec=18.543650540081565, MemAllocated=1.37GB, MaxMemAllocated=2.8GB Model Parameters: 0.146 B, Latency: 0.46s, TFLOPs: 7.25, Samples/sec: 17.22, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.88, Samples/sec: 18.71, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.65, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, 
Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.64, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.67, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.68, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.64, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.65, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.65, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.56, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.59, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.59, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.56, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.64, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.65, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.58, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.47s, TFLOPs: 7.19, Samples/sec: 17.08, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.92, Samples/sec: 18.82, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.65, Time/seq 
0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.58, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.56, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.55, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.59, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.59, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.65, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.80, Samples/sec: 18.53, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.62, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.68, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.89, Samples/sec: 18.74, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.67, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.64, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.60, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.47s, TFLOPs: 7.21, Samples/sec: 17.13, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.92, Samples/sec: 18.81, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 
0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.66, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.66, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.63, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.89, Samples/sec: 18.74, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.90, Samples/sec: 18.76, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.67, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.59, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.89, Samples/sec: 18.73, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.62, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.60, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.62, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.89, Samples/sec: 18.74, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.89, Samples/sec: 18.75, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.62, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.47s, TFLOPs: 7.17, Samples/sec: 17.04, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.42s, TFLOPs: 7.96, Samples/sec: 18.91, 
Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.71, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.57, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.57, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.89, Samples/sec: 18.73, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.65, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.62, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.66, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.68, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.70, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.79, Samples/sec: 18.51, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.88, Samples/sec: 18.72, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.62, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.56, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.90, Samples/sec: 18.76, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.47s, TFLOPs: 7.19, Samples/sec: 17.09, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model 
Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.91, Samples/sec: 18.78, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.56, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.60, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.66, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.75, Samples/sec: 18.41, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.90, Samples/sec: 18.77, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.70, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.69, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.57, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.61, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.57, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.60, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.64, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.59, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.63, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.46s, TFLOPs: 7.26, Samples/sec: 
17.24, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.42s, TFLOPs: 7.94, Samples/sec: 18.86, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.66, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.55, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.63, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.80, Samples/sec: 18.54, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.64, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.60, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.88, Samples/sec: 18.72, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.88, Samples/sec: 18.71, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.64, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.65, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.69, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.89, Samples/sec: 18.75, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.90, Samples/sec: 18.77, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.80, Samples/sec: 18.54, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model 
Parameters: 0.146 B, Latency: 0.47s, TFLOPs: 7.22, Samples/sec: 17.16, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.88, Samples/sec: 18.72, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.90, Samples/sec: 18.76, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.60, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.62, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.80, Samples/sec: 18.53, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.59, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.63, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.64, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.79, Samples/sec: 18.50, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.57, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.55, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.66, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.59, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.58, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 
18.61, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.47s, TFLOPs: 7.21, Samples/sec: 17.14, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.42s, TFLOPs: 7.92, Samples/sec: 18.83, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.88, Samples/sec: 18.72, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.60, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.89, Samples/sec: 18.74, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.63, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.65, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.60, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.65, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.69, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.58, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.70, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.71, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.61, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.58, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model 
Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.67, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.46s, TFLOPs: 7.26, Samples/sec: 17.25, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.90, Samples/sec: 18.76, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.80, Samples/sec: 18.53, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.69, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.64, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.63, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.67, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.58, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.56, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.89, Samples/sec: 18.74, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.64, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.61, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.60, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.60, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 
18.60, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.66, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.46s, TFLOPs: 7.26, Samples/sec: 17.26, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.42s, TFLOPs: 7.95, Samples/sec: 18.88, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.67, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.59, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.66, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.69, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.55, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.70, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.62, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.62, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.79, Samples/sec: 18.51, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.62, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.67, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.80, Samples/sec: 18.54, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model 
Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.91, Samples/sec: 18.80, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.57, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 [2024-03-28 18:04:07,548] [INFO] [logging.py:96:log_dist] [Rank 0] step=170, skipped=4, lr=[0.0007116529407567489, 0.00035582647037837444, 0.0007116529407567489], mom=[(0.9, 0.95), (0.9, 0.95), (0.9, 0.95)] [2024-03-28 18:04:07,556] [INFO] [timer.py:260:stop] epoch=0/micro_step=2720/global_step=170, RunningAvgSamplesPerSec=18.540883194631782, CurrSamplesPerSec=18.537260531298593, MemAllocated=1.37GB, MaxMemAllocated=2.8GB Model Parameters: 0.146 B, Latency: 0.47s, TFLOPs: 7.09, Samples/sec: 16.85, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.91, Samples/sec: 18.79, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.63, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.88, Samples/sec: 18.71, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.61, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.59, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.68, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.58, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.63, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.88, Samples/sec: 18.73, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, 
Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.60, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.88, Samples/sec: 18.73, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.62, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.90, Samples/sec: 18.77, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.80, Samples/sec: 18.53, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.59, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.47s, TFLOPs: 7.22, Samples/sec: 17.15, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.88, Samples/sec: 18.73, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.62, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.77, Samples/sec: 18.45, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.60, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.60, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.80, Samples/sec: 18.54, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.63, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.62, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.78, Samples/sec: 18.49, Time/seq 
0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.58, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.62, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.79, Samples/sec: 18.50, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.91, Samples/sec: 18.78, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.63, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.59, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.47s, TFLOPs: 7.21, Samples/sec: 17.13, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.42s, TFLOPs: 7.93, Samples/sec: 18.84, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.65, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.54, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.66, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.60, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.89, Samples/sec: 18.74, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.60, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.61, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 
0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.54, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.92, Samples/sec: 18.81, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.78, Samples/sec: 18.48, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.64, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.68, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.79, Samples/sec: 18.52, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.69, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.47s, TFLOPs: 7.22, Samples/sec: 17.16, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.88, Samples/sec: 18.72, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.62, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.68, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.62, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.67, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.64, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.66, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.59, 
Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.63, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.63, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.64, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.62, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.63, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.66, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.90, Samples/sec: 18.78, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.46s, TFLOPs: 7.26, Samples/sec: 17.24, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.91, Samples/sec: 18.79, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.88, Samples/sec: 18.71, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.65, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.64, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.68, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.70, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.62, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model 
Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.65, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.64, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.69, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.79, Samples/sec: 18.51, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.70, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.89, Samples/sec: 18.74, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.66, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.77, Samples/sec: 18.46, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.47s, TFLOPs: 7.19, Samples/sec: 17.09, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.42s, TFLOPs: 7.93, Samples/sec: 18.83, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.64, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.60, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.70, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.63, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.89, Samples/sec: 18.74, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.88, Samples/sec: 
18.71, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.63, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.88, Samples/sec: 18.72, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.70, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.60, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.68, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.88, Samples/sec: 18.72, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.60, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.80, Samples/sec: 18.53, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.47s, TFLOPs: 7.23, Samples/sec: 17.18, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.42s, TFLOPs: 7.95, Samples/sec: 18.88, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.63, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.59, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.68, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.58, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.90, Samples/sec: 18.77, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model 
Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.89, Samples/sec: 18.75, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.62, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.63, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.68, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.63, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.62, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.63, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.60, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.63, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.47s, TFLOPs: 7.21, Samples/sec: 17.12, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.92, Samples/sec: 18.81, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.90, Samples/sec: 18.78, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.61, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.69, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.61, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.90, Samples/sec: 
18.76, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.64, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.80, Samples/sec: 18.53, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.62, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.57, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.91, Samples/sec: 18.78, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.62, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.68, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.69, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.80, Samples/sec: 18.53, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.47s, TFLOPs: 7.24, Samples/sec: 17.20, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.42s, TFLOPs: 7.93, Samples/sec: 18.83, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.65, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.69, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.61, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.78, Samples/sec: 18.49, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model 
Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.57, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.70, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.56, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.77, Samples/sec: 18.45, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.63, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.55, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.63, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.68, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.55, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.54, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.47s, TFLOPs: 7.24, Samples/sec: 17.20, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.89, Samples/sec: 18.74, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.64, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.65, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.64, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.80, Samples/sec: 
18.53, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.69, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.55, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.68, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.64, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.88, Samples/sec: 18.72, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.59, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.79, Samples/sec: 18.52, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.63, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.65, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.80, Samples/sec: 18.52, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 [2024-03-28 18:05:16,884] [INFO] [logging.py:96:log_dist] [Rank 0] step=180, skipped=4, lr=[0.0006802461651252074, 0.0003401230825626037, 0.0006802461651252074], mom=[(0.9, 0.95), (0.9, 0.95), (0.9, 0.95)] [2024-03-28 18:05:16,895] [INFO] [timer.py:260:stop] epoch=0/micro_step=2880/global_step=180, RunningAvgSamplesPerSec=18.541628889518975, CurrSamplesPerSec=18.541742680475863, MemAllocated=1.37GB, MaxMemAllocated=2.8GB Model Parameters: 0.146 B, Latency: 0.47s, TFLOPs: 7.22, Samples/sec: 17.15, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.90, Samples/sec: 18.78, Time/seq 0.05s, 
Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.67, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.55, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.80, Samples/sec: 18.54, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.55, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.79, Samples/sec: 18.50, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.68, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.55, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.65, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.66, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.64, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.66, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.60, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.56, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.62, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.47s, TFLOPs: 7.20, Samples/sec: 17.11, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, 
Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.63, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.66, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.63, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.59, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.70, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.57, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.80, Samples/sec: 18.53, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.60, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.61, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.66, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.61, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.66, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.57, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.67, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.57, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.48s, TFLOPs: 7.07, Samples/sec: 16.79, Time/seq 
0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.77, Samples/sec: 18.47, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.61, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.56, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.58, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.70, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.63, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.60, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.63, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.60, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.77, Samples/sec: 18.46, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.57, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.60, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.58, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.60, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.79, Samples/sec: 18.50, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 
0.146 B, Latency: 0.47s, TFLOPs: 7.19, Samples/sec: 17.09, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.91, Samples/sec: 18.78, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.55, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.65, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.64, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.59, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.66, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.80, Samples/sec: 18.54, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.75, Samples/sec: 18.42, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.63, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.79, Samples/sec: 18.49, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.69, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.76, Samples/sec: 18.44, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.78, Samples/sec: 18.48, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.62, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.56, 
Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.47s, TFLOPs: 7.23, Samples/sec: 17.17, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.88, Samples/sec: 18.72, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.64, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.57, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.70, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.78, Samples/sec: 18.47, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.58, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.55, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.78, Samples/sec: 18.49, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.88, Samples/sec: 18.71, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.88, Samples/sec: 18.71, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.57, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.56, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.63, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.76, Samples/sec: 18.44, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model 
Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.89, Samples/sec: 18.74, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.47s, TFLOPs: 7.20, Samples/sec: 17.11, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.88, Samples/sec: 18.73, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.63, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.78, Samples/sec: 18.49, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.61, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.77, Samples/sec: 18.46, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.90, Samples/sec: 18.76, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.80, Samples/sec: 18.53, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.78, Samples/sec: 18.47, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.79, Samples/sec: 18.51, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.64, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.79, Samples/sec: 18.50, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.80, Samples/sec: 18.53, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.58, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 
18.64, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.54, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.47s, TFLOPs: 7.18, Samples/sec: 17.07, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.91, Samples/sec: 18.78, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.90, Samples/sec: 18.78, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.59, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.70, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.77, Samples/sec: 18.46, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.66, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.63, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.88, Samples/sec: 18.72, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.66, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.59, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.63, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.66, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.71, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model 
Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.65, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.66, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.48s, TFLOPs: 7.07, Samples/sec: 16.79, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.89, Samples/sec: 18.75, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.77, Samples/sec: 18.47, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.60, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.60, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.60, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.64, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.64, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.59, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.62, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.69, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.57, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.66, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.80, Samples/sec: 
18.53, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.60, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.59, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.46s, TFLOPs: 7.26, Samples/sec: 17.24, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.92, Samples/sec: 18.82, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.55, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.60, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.60, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.58, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.63, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.62, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.64, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.65, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.89, Samples/sec: 18.74, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.68, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.66, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model 
Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.68, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.60, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.70, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.47s, TFLOPs: 7.22, Samples/sec: 17.15, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.88, Samples/sec: 18.71, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.57, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.54, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.64, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.64, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.66, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.88, Samples/sec: 18.71, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.64, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.61, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.67, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.88, Samples/sec: 18.72, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.80, Samples/sec: 
18.54, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.92, Samples/sec: 18.81, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.67, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.70, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512
[2024-03-28 18:06:26,363] [INFO] [logging.py:96:log_dist] [Rank 0] step=190, skipped=4, lr=[0.0006479989986668118, 0.0003239994993334059, 0.0006479989986668118], mom=[(0.9, 0.95), (0.9, 0.95), (0.9, 0.95)]
[2024-03-28 18:06:26,373] [INFO] [timer.py:260:stop] epoch=0/micro_step=3040/global_step=190, RunningAvgSamplesPerSec=18.54060864711078, CurrSamplesPerSec=18.559926522355102, MemAllocated=1.37GB, MaxMemAllocated=2.8GB
Model Parameters: 0.146 B, Latency: 0.47s, TFLOPs: 7.19, Samples/sec: 17.07, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.92, Samples/sec: 18.81, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.67, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.54, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.64, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.67, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.56, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.55, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.77, Samples/sec: 18.46, Time/seq 0.05s, 
Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.80, Samples/sec: 18.53, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.62, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.70, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.88, Samples/sec: 18.73, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.70, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.68, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.65, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.46s, TFLOPs: 7.25, Samples/sec: 17.23, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.42s, TFLOPs: 7.98, Samples/sec: 18.95, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.62, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.63, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.88, Samples/sec: 18.71, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.61, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.61, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.70, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, 
Latency: 0.43s, TFLOPs: 7.88, Samples/sec: 18.72, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.59, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.59, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.70, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.56, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.58, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.60, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.69, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.47s, TFLOPs: 7.17, Samples/sec: 17.04, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.42s, TFLOPs: 7.94, Samples/sec: 18.85, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.90, Samples/sec: 18.76, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.59, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.62, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.55, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.59, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.58, Time/seq 
0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.56, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.67, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.59, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.70, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.64, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.66, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.91, Samples/sec: 18.78, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.89, Samples/sec: 18.74, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.47s, TFLOPs: 7.21, Samples/sec: 17.12, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.42s, TFLOPs: 7.96, Samples/sec: 18.90, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.88, Samples/sec: 18.72, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.69, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.63, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.64, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.64, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 
0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.64, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.61, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.58, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.67, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.91, Samples/sec: 18.79, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.59, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.57, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.62, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.55, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.47s, TFLOPs: 7.19, Samples/sec: 17.08, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.67, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.80, Samples/sec: 18.53, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.55, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.69, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.91, Samples/sec: 18.79, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.71, 
Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.57, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.62, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.66, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.56, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.88, Samples/sec: 18.71, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.65, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.66, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.62, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.62, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.47s, TFLOPs: 7.24, Samples/sec: 17.20, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.42s, TFLOPs: 7.94, Samples/sec: 18.86, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.69, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.56, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.90, Samples/sec: 18.76, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.80, Samples/sec: 18.54, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model 
Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.69, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.68, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.80, Samples/sec: 18.53, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.70, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.80, Samples/sec: 18.52, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.66, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.67, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.63, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.62, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.59, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.47s, TFLOPs: 7.22, Samples/sec: 17.15, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.42s, TFLOPs: 7.93, Samples/sec: 18.83, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.79, Samples/sec: 18.51, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.59, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.57, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 
18.62, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.66, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.80, Samples/sec: 18.53, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.80, Samples/sec: 18.54, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.65, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.63, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.61, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.62, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.63, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.59, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.68, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.46s, TFLOPs: 7.26, Samples/sec: 17.24, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.89, Samples/sec: 18.74, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.65, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.57, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.88, Samples/sec: 18.72, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model 
Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.62, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.56, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.88, Samples/sec: 18.72, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.63, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.64, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.89, Samples/sec: 18.74, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.89, Samples/sec: 18.74, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.88, Samples/sec: 18.72, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.67, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.69, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.66, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.47s, TFLOPs: 7.24, Samples/sec: 17.20, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.91, Samples/sec: 18.78, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.65, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.66, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 
18.62, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.55, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.80, Samples/sec: 18.53, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.60, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.65, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.66, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.66, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.58, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.63, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.66, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.68, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.63, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.47s, TFLOPs: 7.22, Samples/sec: 17.16, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.58, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.69, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.70, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model 
Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.76, Samples/sec: 18.43, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.61, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.61, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.55, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.69, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.65, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.58, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.89, Samples/sec: 18.75, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.77, Samples/sec: 18.45, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.68, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.63, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.59, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 [2024-03-28 18:07:35,699] [INFO] [logging.py:96:log_dist] [Rank 0] step=200, skipped=4, lr=[0.0006150617925574932, 0.0003075308962787466, 0.0006150617925574932], mom=[(0.9, 0.95), (0.9, 0.95), (0.9, 0.95)] [2024-03-28 18:07:35,709] [INFO] [timer.py:260:stop] epoch=0/micro_step=3200/global_step=200, RunningAvgSamplesPerSec=18.541272676713408, CurrSamplesPerSec=18.519528622966536, MemAllocated=1.37GB, MaxMemAllocated=2.8GB Model Parameters: 0.146 B, 
Latency: 0.47s, TFLOPs: 7.18, Samples/sec: 17.05, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.91, Samples/sec: 18.79, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.65, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.61, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.64, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.88, Samples/sec: 18.73, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.69, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.61, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.58, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.68, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.88, Samples/sec: 18.72, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.62, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.70, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.80, Samples/sec: 18.53, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.59, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.88, Samples/sec: 18.71, Time/seq 
0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.47s, TFLOPs: 7.14, Samples/sec: 16.95, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.92, Samples/sec: 18.81, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.64, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.60, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.55, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.59, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.58, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.74, Samples/sec: 18.40, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.57, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.63, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.66, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.64, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.67, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.66, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.67, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 
0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.65, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.47s, TFLOPs: 7.19, Samples/sec: 17.08, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.91, Samples/sec: 18.79, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.63, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.58, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.69, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.65, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.64, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.70, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.44s, TFLOPs: 7.74, Samples/sec: 18.39, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.62, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.80, Samples/sec: 18.53, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.58, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.89, Samples/sec: 18.73, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.77, Samples/sec: 18.47, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.64, 
Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.89, Samples/sec: 18.74, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.47s, TFLOPs: 7.23, Samples/sec: 17.18, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.71, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.71, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.79, Samples/sec: 18.50, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.65, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.70, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.66, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.62, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.61, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.67, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.63, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.59, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.58, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.91, Samples/sec: 18.78, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model 
Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.88, Samples/sec: 18.72, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.59, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.47s, TFLOPs: 7.24, Samples/sec: 17.19, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.42s, TFLOPs: 7.92, Samples/sec: 18.82, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.64, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.56, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.64, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.65, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.80, Samples/sec: 18.53, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.88, Samples/sec: 18.72, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.68, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.88, Samples/sec: 18.73, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.65, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.90, Samples/sec: 18.77, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.88, Samples/sec: 18.73, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 
18.57, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.55, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.68, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 [2024-03-28 18:08:10,364] [INFO] [fused_optimizer.py:352:_update_scale] No Grad overflow for 100 iterations [2024-03-28 18:08:10,365] [INFO] [fused_optimizer.py:353:_update_scale] Increasing dynamic loss scale from 8192.0 to 16384.0 Model Parameters: 0.146 B, Latency: 0.47s, TFLOPs: 7.19, Samples/sec: 17.08, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.92, Samples/sec: 18.81, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.65, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.57, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.65, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.60, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.78, Samples/sec: 18.48, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.58, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.68, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.54, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.69, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, 
Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.67, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.80, Samples/sec: 18.54, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.62, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.55, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.64, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.47s, TFLOPs: 7.16, Samples/sec: 17.00, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.68, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.55, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.68, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.62, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.61, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.80, Samples/sec: 18.54, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.59, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.58, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.70, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.65, Time/seq 
0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.59, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.70, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.55, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.68, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.56, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.47s, TFLOPs: 7.23, Samples/sec: 17.17, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.91, Samples/sec: 18.78, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.63, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.78, Samples/sec: 18.48, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.68, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.67, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.55, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.62, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.67, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.78, Samples/sec: 18.49, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 
0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.64, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.62, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.69, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.64, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.61, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.89, Samples/sec: 18.75, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.46s, TFLOPs: 7.25, Samples/sec: 17.22, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.88, Samples/sec: 18.72, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.67, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.57, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.70, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.61, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.56, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.64, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.54, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.65, 
Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.67, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.80, Samples/sec: 18.52, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.56, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.80, Samples/sec: 18.53, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.66, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.70, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.47s, TFLOPs: 7.19, Samples/sec: 17.09, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.91, Samples/sec: 18.80, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.60, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.61, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.68, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.68, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.62, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.62, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.56, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model 
Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.69, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.80, Samples/sec: 18.54, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.61, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.63, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.59, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.61, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.64, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 [2024-03-28 18:08:45,080] [INFO] [logging.py:96:log_dist] [Rank 0] step=210, skipped=4, lr=[0.0005815881152565712, 0.0002907940576282856, 0.0005815881152565712], mom=[(0.9, 0.95), (0.9, 0.95), (0.9, 0.95)] [2024-03-28 18:08:45,089] [INFO] [timer.py:260:stop] epoch=0/micro_step=3360/global_step=210, RunningAvgSamplesPerSec=18.541347512651587, CurrSamplesPerSec=18.548024279639346, MemAllocated=1.37GB, MaxMemAllocated=2.8GB Model Parameters: 0.146 B, Latency: 0.47s, TFLOPs: 7.23, Samples/sec: 17.18, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.42s, TFLOPs: 7.96, Samples/sec: 18.90, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.62, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.64, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.78, Samples/sec: 18.49, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, 
Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.67, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.70, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.62, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.63, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.80, Samples/sec: 18.53, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.68, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.89, Samples/sec: 18.75, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.62, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.60, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.60, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.61, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.47s, TFLOPs: 7.22, Samples/sec: 17.16, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.90, Samples/sec: 18.78, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.63, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.59, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.60, Time/seq 
0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.58, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.65, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.56, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.59, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.88, Samples/sec: 18.71, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.68, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.66, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.66, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.69, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.65, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.56, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.47s, TFLOPs: 7.23, Samples/sec: 17.17, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.89, Samples/sec: 18.75, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.65, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.68, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 
0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.69, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.63, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.80, Samples/sec: 18.54, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.63, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.66, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.80, Samples/sec: 18.54, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.90, Samples/sec: 18.77, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.66, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.71, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.66, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.64, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.66, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.47s, TFLOPs: 7.10, Samples/sec: 16.88, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.42s, TFLOPs: 7.93, Samples/sec: 18.84, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.80, Samples/sec: 18.53, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.57, 
Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.88, Samples/sec: 18.71, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.63, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.55, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.88, Samples/sec: 18.72, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.68, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.80, Samples/sec: 18.52, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.62, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.60, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.54, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.62, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.61, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.80, Samples/sec: 18.53, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.46s, TFLOPs: 7.25, Samples/sec: 17.23, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.42s, TFLOPs: 7.93, Samples/sec: 18.83, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.61, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model 
Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.67, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.79, Samples/sec: 18.50, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.57, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.55, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.61, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.69, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.57, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.78, Samples/sec: 18.47, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.88, Samples/sec: 18.71, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.61, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.80, Samples/sec: 18.54, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.65, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.67, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.47s, TFLOPs: 7.23, Samples/sec: 17.18, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.89, Samples/sec: 18.74, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 
18.56, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.59, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.68, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.56, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.88, Samples/sec: 18.71, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.64, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.57, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.56, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.59, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.60, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.70, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.62, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.67, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.79, Samples/sec: 18.52, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.47s, TFLOPs: 7.21, Samples/sec: 17.13, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.90, Samples/sec: 18.77, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model 
Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.79, Samples/sec: 18.49, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.66, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.67, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.68, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.60, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.67, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.77, Samples/sec: 18.45, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.68, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.88, Samples/sec: 18.72, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.59, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.67, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.67, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.59, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.65, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.46s, TFLOPs: 7.30, Samples/sec: 17.35, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.90, Samples/sec: 
18.76, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.62, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.61, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.64, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.64, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.65, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.57, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.68, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.62, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.64, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.80, Samples/sec: 18.52, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.65, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.64, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.69, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.66, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.47s, TFLOPs: 7.14, Samples/sec: 16.96, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model 
Parameters: 0.146 B, Latency: 0.42s, TFLOPs: 7.93, Samples/sec: 18.85, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.64, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.61, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.80, Samples/sec: 18.52, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.89, Samples/sec: 18.75, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.64, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.61, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.67, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.68, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.79, Samples/sec: 18.51, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.56, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.88, Samples/sec: 18.73, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.80, Samples/sec: 18.53, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.70, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.70, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.46s, TFLOPs: 7.24, Samples/sec: 
17.20, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.42s, TFLOPs: 7.94, Samples/sec: 18.87, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.69, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.67, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.58, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.61, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.58, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.88, Samples/sec: 18.71, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.66, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.65, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.60, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.88, Samples/sec: 18.72, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.65, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.44s, TFLOPs: 7.74, Samples/sec: 18.38, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.61, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.69, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 
[2024-03-28 18:09:54,442] [INFO] [logging.py:96:log_dist] [Rank 0] step=220, skipped=4, lr=[0.0005477340364997051, 0.00027386701824985254, 0.0005477340364997051], mom=[(0.9, 0.95), (0.9, 0.95), (0.9, 0.95)] [2024-03-28 18:09:54,452] [INFO] [timer.py:260:stop] epoch=0/micro_step=3520/global_step=220, RunningAvgSamplesPerSec=18.541620302981933, CurrSamplesPerSec=18.557348181675394, MemAllocated=1.37GB, MaxMemAllocated=2.8GB Model Parameters: 0.146 B, Latency: 0.47s, TFLOPs: 7.22, Samples/sec: 17.15, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.42s, TFLOPs: 7.93, Samples/sec: 18.83, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.69, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.59, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.59, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.64, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.65, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.70, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.59, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.89, Samples/sec: 18.74, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.62, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.55, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 
B, Latency: 0.43s, TFLOPs: 7.92, Samples/sec: 18.82, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.89, Samples/sec: 18.75, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.62, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.66, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.47s, TFLOPs: 7.24, Samples/sec: 17.19, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.91, Samples/sec: 18.80, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.66, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.63, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.56, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.67, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.60, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.79, Samples/sec: 18.51, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.80, Samples/sec: 18.54, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.79, Samples/sec: 18.49, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.62, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.65, Time/seq 
0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.61, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.88, Samples/sec: 18.73, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.62, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.80, Samples/sec: 18.52, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.46s, TFLOPs: 7.27, Samples/sec: 17.27, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.92, Samples/sec: 18.81, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.60, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.62, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.61, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.63, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.56, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.60, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.67, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.55, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.64, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 
0.146 B, Latency: 0.43s, TFLOPs: 7.91, Samples/sec: 18.80, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.89, Samples/sec: 18.75, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.57, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.68, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.90, Samples/sec: 18.77, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.46s, TFLOPs: 7.27, Samples/sec: 17.28, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.91, Samples/sec: 18.80, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.90, Samples/sec: 18.77, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.76, Samples/sec: 18.44, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.44s, TFLOPs: 7.73, Samples/sec: 18.37, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.80, Samples/sec: 18.53, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.66, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.89, Samples/sec: 18.74, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.66, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.70, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.67, 
Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.91, Samples/sec: 18.78, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.68, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.60, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.56, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.69, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.47s, TFLOPs: 7.23, Samples/sec: 17.17, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.92, Samples/sec: 18.81, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.63, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.55, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.89, Samples/sec: 18.73, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.68, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.60, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.60, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.66, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.69, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model 
Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.62, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.69, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.62, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.88, Samples/sec: 18.71, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.80, Samples/sec: 18.53, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.60, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.47s, TFLOPs: 7.17, Samples/sec: 17.03, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.92, Samples/sec: 18.81, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.79, Samples/sec: 18.51, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.62, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.57, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.44s, TFLOPs: 7.72, Samples/sec: 18.35, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.65, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.60, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.80, Samples/sec: 18.53, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 
18.60, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.64, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.61, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.64, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.63, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.89, Samples/sec: 18.74, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.63, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.47s, TFLOPs: 7.24, Samples/sec: 17.20, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.42s, TFLOPs: 7.93, Samples/sec: 18.85, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.68, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.64, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.68, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.91, Samples/sec: 18.78, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.68, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.88, Samples/sec: 18.71, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.59, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model 
Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.69, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.62, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.69, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.64, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.61, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.62, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.68, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.47s, TFLOPs: 7.20, Samples/sec: 17.11, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.42s, TFLOPs: 7.94, Samples/sec: 18.86, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.64, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.64, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.70, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.56, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.55, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.89, Samples/sec: 18.73, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 
18.60, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.59, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.89, Samples/sec: 18.74, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.58, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.57, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.59, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.64, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.89, Samples/sec: 18.74, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.47s, TFLOPs: 7.16, Samples/sec: 17.01, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.92, Samples/sec: 18.82, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.69, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.80, Samples/sec: 18.53, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.89, Samples/sec: 18.75, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.70, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.63, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.62, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model 
Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.65, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.60, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.92, Samples/sec: 18.80, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.54, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.65, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.64, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.60, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.62, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.46s, TFLOPs: 7.25, Samples/sec: 17.23, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.89, Samples/sec: 18.75, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.70, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.66, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.60, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.63, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.80, Samples/sec: 18.54, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 
18.63, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.70, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.62, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.63, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.62, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.70, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.69, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.70, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.64, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 [2024-03-28 18:11:03,755] [INFO] [logging.py:96:log_dist] [Rank 0] step=230, skipped=4, lr=[0.000513657399629743, 0.0002568286998148715, 0.000513657399629743], mom=[(0.9, 0.95), (0.9, 0.95), (0.9, 0.95)] [2024-03-28 18:11:03,763] [INFO] [timer.py:260:stop] epoch=0/micro_step=3680/global_step=230, RunningAvgSamplesPerSec=18.54232879236137, CurrSamplesPerSec=18.552225062397287, MemAllocated=1.37GB, MaxMemAllocated=2.8GB Model Parameters: 0.146 B, Latency: 0.47s, TFLOPs: 7.14, Samples/sec: 16.97, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.42s, TFLOPs: 7.93, Samples/sec: 18.83, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.65, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.64, Time/seq 0.05s, Batch 
Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.61, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.90, Samples/sec: 18.76, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.88, Samples/sec: 18.71, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.89, Samples/sec: 18.73, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.61, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.54, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.88, Samples/sec: 18.73, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.67, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.55, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.66, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.69, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.79, Samples/sec: 18.50, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.46s, TFLOPs: 7.24, Samples/sec: 17.21, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.91, Samples/sec: 18.79, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.70, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, 
Latency: 0.43s, TFLOPs: 7.80, Samples/sec: 18.54, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.64, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.69, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.65, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.79, Samples/sec: 18.51, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.66, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.66, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.64, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.89, Samples/sec: 18.73, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.80, Samples/sec: 18.54, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.76, Samples/sec: 18.43, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.57, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.55, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.46s, TFLOPs: 7.25, Samples/sec: 17.21, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.91, Samples/sec: 18.78, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.62, Time/seq 
0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.66, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.69, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.66, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.65, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.59, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.89, Samples/sec: 18.75, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.67, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.55, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.64, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.64, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.64, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.55, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.61, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.46s, TFLOPs: 7.27, Samples/sec: 17.26, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.89, Samples/sec: 18.74, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 
0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.61, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.88, Samples/sec: 18.73, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.65, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.55, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.69, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.68, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.62, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.64, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.56, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.66, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.65, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.54, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.69, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.80, Samples/sec: 18.53, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.47s, TFLOPs: 7.24, Samples/sec: 17.19, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.42s, TFLOPs: 7.98, Samples/sec: 18.95, 
Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.76, Samples/sec: 18.43, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.59, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.70, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.63, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.66, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.61, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.58, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.63, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.67, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.56, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.56, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.65, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.63, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.65, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.46s, TFLOPs: 7.25, Samples/sec: 17.23, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model 
Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.91, Samples/sec: 18.80, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.68, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.80, Samples/sec: 18.52, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.69, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.54, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.79, Samples/sec: 18.51, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.89, Samples/sec: 18.75, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.88, Samples/sec: 18.72, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.67, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.59, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.69, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.69, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.58, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.88, Samples/sec: 18.73, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.89, Samples/sec: 18.74, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.47s, TFLOPs: 7.23, Samples/sec: 
17.18, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.90, Samples/sec: 18.77, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.69, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.66, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.42s, TFLOPs: 7.93, Samples/sec: 18.85, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.56, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.65, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.61, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.64, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.88, Samples/sec: 18.73, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.56, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.67, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.70, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.60, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.69, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.56, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model 
Parameters: 0.146 B, Latency: 0.47s, TFLOPs: 7.19, Samples/sec: 17.09, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.62, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.59, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.88, Samples/sec: 18.71, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.63, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.89, Samples/sec: 18.74, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.58, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.61, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.70, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.89, Samples/sec: 18.74, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.68, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.63, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.62, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.60, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.68, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 
18.58, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.47s, TFLOPs: 7.20, Samples/sec: 17.11, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.42s, TFLOPs: 7.94, Samples/sec: 18.85, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.65, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.61, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.65, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.61, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.80, Samples/sec: 18.53, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.60, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.67, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.63, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.71, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.60, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.89, Samples/sec: 18.73, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.56, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.68, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model 
Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.80, Samples/sec: 18.53, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.47s, TFLOPs: 7.20, Samples/sec: 17.10, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.42s, TFLOPs: 7.93, Samples/sec: 18.83, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.66, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.55, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.57, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.64, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.64, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.63, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.68, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.65, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.64, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.60, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.60, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.55, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.88, Samples/sec: 
18.73, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.64, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 [2024-03-28 18:12:13,075] [INFO] [logging.py:96:log_dist] [Rank 0] step=240, skipped=4, lr=[0.00047951708565819283, 0.00023975854282909641, 0.00047951708565819283], mom=[(0.9, 0.95), (0.9, 0.95), (0.9, 0.95)] [2024-03-28 18:12:13,085] [INFO] [timer.py:260:stop] epoch=0/micro_step=3840/global_step=240, RunningAvgSamplesPerSec=18.542964081744767, CurrSamplesPerSec=18.555372729859226, MemAllocated=1.37GB, MaxMemAllocated=2.8GB Model Parameters: 0.146 B, Latency: 0.47s, TFLOPs: 7.23, Samples/sec: 17.19, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.42s, TFLOPs: 7.96, Samples/sec: 18.91, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.89, Samples/sec: 18.74, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.80, Samples/sec: 18.54, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.67, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.64, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.63, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.67, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.56, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.57, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.64, Time/seq 0.05s, 
Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.71, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.78, Samples/sec: 18.49, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.65, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.78, Samples/sec: 18.48, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.61, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.47s, TFLOPs: 7.23, Samples/sec: 17.18, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.91, Samples/sec: 18.80, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.67, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.68, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.66, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.89, Samples/sec: 18.75, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.59, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.64, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.63, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.67, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, 
Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.64, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.69, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.69, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.69, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.80, Samples/sec: 18.54, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.67, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.48s, TFLOPs: 7.08, Samples/sec: 16.82, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.56, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.60, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.63, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.64, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.55, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.65, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.62, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.68, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.59, Time/seq 
0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.65, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.65, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.69, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.62, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.66, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.67, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.46s, TFLOPs: 7.26, Samples/sec: 17.24, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.92, Samples/sec: 18.80, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.66, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.60, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.61, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.63, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.69, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.78, Samples/sec: 18.49, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.56, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 
0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.62, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.70, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.59, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.80, Samples/sec: 18.52, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.89, Samples/sec: 18.75, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.60, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.61, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.47s, TFLOPs: 7.22, Samples/sec: 17.15, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.91, Samples/sec: 18.80, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.69, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.59, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.58, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.61, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.67, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.59, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.67, 
Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.88, Samples/sec: 18.71, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.80, Samples/sec: 18.53, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.63, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.64, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.79, Samples/sec: 18.50, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.57, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.77, Samples/sec: 18.46, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.47s, TFLOPs: 7.21, Samples/sec: 17.14, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.42s, TFLOPs: 7.93, Samples/sec: 18.85, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.57, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.78, Samples/sec: 18.48, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.78, Samples/sec: 18.49, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.59, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.57, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.56, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model 
Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.60, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.54, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.44s, TFLOPs: 7.74, Samples/sec: 18.38, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.77, Samples/sec: 18.46, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.79, Samples/sec: 18.50, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.63, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.68, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.78, Samples/sec: 18.48, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.46s, TFLOPs: 7.27, Samples/sec: 17.28, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.90, Samples/sec: 18.77, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.80, Samples/sec: 18.53, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.79, Samples/sec: 18.50, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.59, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.63, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.75, Samples/sec: 18.41, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 
18.64, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.67, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.61, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.67, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.66, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.60, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.69, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.56, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.79, Samples/sec: 18.51, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.47s, TFLOPs: 7.18, Samples/sec: 17.07, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.91, Samples/sec: 18.80, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.55, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.64, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.66, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.56, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.64, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model 
Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.65, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.79, Samples/sec: 18.50, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.62, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.66, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.69, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.55, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.56, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.54, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.62, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.47s, TFLOPs: 7.23, Samples/sec: 17.19, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.64, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.61, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.61, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.57, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.69, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 
18.56, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.62, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.61, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.60, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.56, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.63, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.65, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.70, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.57, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.80, Samples/sec: 18.52, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.47s, TFLOPs: 7.15, Samples/sec: 17.00, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.70, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.79, Samples/sec: 18.51, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.79, Samples/sec: 18.50, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.78, Samples/sec: 18.49, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.57, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model 
Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.64, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.59, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.55, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.59, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.63, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.80, Samples/sec: 18.54, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.63, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.80, Samples/sec: 18.53, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.63, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.55, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 [2024-03-28 18:13:22,535] [INFO] [logging.py:96:log_dist] [Rank 0] step=250, skipped=4, lr=[0.0004454722724886051, 0.00022273613624430256, 0.0004454722724886051], mom=[(0.9, 0.95), (0.9, 0.95), (0.9, 0.95)] [2024-03-28 18:13:22,545] [INFO] [timer.py:260:stop] epoch=0/micro_step=4000/global_step=250, RunningAvgSamplesPerSec=18.542268065021528, CurrSamplesPerSec=18.490867551442975, MemAllocated=1.37GB, MaxMemAllocated=2.8GB Model Parameters: 0.146 B, Latency: 0.47s, TFLOPs: 7.19, Samples/sec: 17.07, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.68, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, 
Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.65, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.59, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.80, Samples/sec: 18.52, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.80, Samples/sec: 18.53, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.64, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.56, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.65, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.56, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.57, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.77, Samples/sec: 18.46, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.78, Samples/sec: 18.49, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.57, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.59, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.68, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.47s, TFLOPs: 7.21, Samples/sec: 17.12, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.90, Samples/sec: 18.77, Time/seq 
0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.88, Samples/sec: 18.72, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.80, Samples/sec: 18.54, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.63, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.88, Samples/sec: 18.71, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.59, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.58, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.58, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.67, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.70, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.58, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.58, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.61, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.65, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.62, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.46s, TFLOPs: 7.26, Samples/sec: 17.25, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 
0.146 B, Latency: 0.43s, TFLOPs: 7.89, Samples/sec: 18.75, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.55, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.57, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.63, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.65, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.61, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.62, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.64, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.66, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.78, Samples/sec: 18.48, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.61, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.91, Samples/sec: 18.78, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.59, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.80, Samples/sec: 18.53, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.60, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.47s, TFLOPs: 7.21, Samples/sec: 17.12, 
Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.42s, TFLOPs: 7.98, Samples/sec: 18.97, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.60, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.65, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.64, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.61, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.66, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.88, Samples/sec: 18.71, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.56, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.89, Samples/sec: 18.74, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.65, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.65, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.80, Samples/sec: 18.54, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.62, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.78, Samples/sec: 18.47, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.61, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model 
Parameters: 0.146 B, Latency: 0.47s, TFLOPs: 7.24, Samples/sec: 17.19, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.88, Samples/sec: 18.71, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.64, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.89, Samples/sec: 18.74, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.60, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.60, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.80, Samples/sec: 18.54, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.88, Samples/sec: 18.73, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.60, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.71, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.61, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.55, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.58, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.63, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.66, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 
18.66, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.46s, TFLOPs: 7.25, Samples/sec: 17.22, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.61, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.67, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.59, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.68, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.55, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.65, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.68, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.79, Samples/sec: 18.50, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.66, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.58, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.66, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.67, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.59, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.89, Samples/sec: 18.73, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model 
Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.80, Samples/sec: 18.53, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.47s, TFLOPs: 7.14, Samples/sec: 16.97, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.88, Samples/sec: 18.72, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.70, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.69, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.80, Samples/sec: 18.52, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.55, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.68, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.63, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.70, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.88, Samples/sec: 18.72, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.68, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.88, Samples/sec: 18.71, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.89, Samples/sec: 18.73, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.54, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 
18.60, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.70, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.47s, TFLOPs: 7.18, Samples/sec: 17.05, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.89, Samples/sec: 18.75, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.61, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.78, Samples/sec: 18.48, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.63, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.61, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.80, Samples/sec: 18.54, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.69, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.67, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.56, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.62, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.64, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.62, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.57, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model 
Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.67, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.64, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.47s, TFLOPs: 7.21, Samples/sec: 17.12, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.91, Samples/sec: 18.78, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.55, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.79, Samples/sec: 18.50, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.79, Samples/sec: 18.51, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.68, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.67, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.55, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.80, Samples/sec: 18.52, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.62, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.80, Samples/sec: 18.53, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.90, Samples/sec: 18.77, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.71, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.80, Samples/sec: 
18.54, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.60, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.69, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.47s, TFLOPs: 7.22, Samples/sec: 17.16, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.90, Samples/sec: 18.77, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.61, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.59, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.78, Samples/sec: 18.48, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.57, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.66, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.78, Samples/sec: 18.48, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.61, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.65, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.60, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.67, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.66, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model 
Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.57, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.67, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.64, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 [2024-03-28 18:14:31,952] [INFO] [logging.py:96:log_dist] [Rank 0] step=260, skipped=4, lr=[0.0004116816927557063, 0.00020584084637785316, 0.0004116816927557063], mom=[(0.9, 0.95), (0.9, 0.95), (0.9, 0.95)] [2024-03-28 18:14:31,962] [INFO] [timer.py:260:stop] epoch=0/micro_step=4160/global_step=260, RunningAvgSamplesPerSec=18.54204101844016, CurrSamplesPerSec=18.524135143596048, MemAllocated=1.37GB, MaxMemAllocated=2.8GB Model Parameters: 0.146 B, Latency: 0.47s, TFLOPs: 7.15, Samples/sec: 16.99, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.42s, TFLOPs: 7.95, Samples/sec: 18.90, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.66, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.56, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.64, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.66, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.80, Samples/sec: 18.52, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.62, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.61, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, 
Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.63, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.62, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.88, Samples/sec: 18.71, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.78, Samples/sec: 18.49, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.66, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.58, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.59, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.48s, TFLOPs: 7.01, Samples/sec: 16.66, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.67, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.78, Samples/sec: 18.48, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.67, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.60, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.63, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.80, Samples/sec: 18.53, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.67, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.59, Time/seq 
0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.77, Samples/sec: 18.45, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.57, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.58, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.58, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.58, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.55, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.65, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.47s, TFLOPs: 7.23, Samples/sec: 17.17, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.42s, TFLOPs: 7.93, Samples/sec: 18.84, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.63, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.79, Samples/sec: 18.49, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.68, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.57, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.54, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.61, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 
0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.60, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.60, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.90, Samples/sec: 18.76, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.59, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.79, Samples/sec: 18.52, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.88, Samples/sec: 18.71, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.78, Samples/sec: 18.47, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.57, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.47s, TFLOPs: 7.21, Samples/sec: 17.13, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.42s, TFLOPs: 7.93, Samples/sec: 18.83, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.88, Samples/sec: 18.72, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.56, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.57, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.79, Samples/sec: 18.50, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.57, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.61, 
Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.59, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.80, Samples/sec: 18.52, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.58, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.59, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.77, Samples/sec: 18.45, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.59, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.58, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.88, Samples/sec: 18.72, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.47s, TFLOPs: 7.22, Samples/sec: 17.16, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.90, Samples/sec: 18.77, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.79, Samples/sec: 18.51, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.88, Samples/sec: 18.71, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.64, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.61, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.44s, TFLOPs: 7.74, Samples/sec: 18.38, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model 
Parameters: 0.146 B, Latency: 0.44s, TFLOPs: 7.68, Samples/sec: 18.25, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.79, Samples/sec: 18.51, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.78, Samples/sec: 18.48, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.54, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.80, Samples/sec: 18.54, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.62, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.88, Samples/sec: 18.71, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.60, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.65, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.47s, TFLOPs: 7.23, Samples/sec: 17.18, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.42s, TFLOPs: 7.94, Samples/sec: 18.87, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.70, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.58, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.68, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.59, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.88, Samples/sec: 
18.73, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.57, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.60, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.69, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.65, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.78, Samples/sec: 18.49, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.71, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.63, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.55, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.89, Samples/sec: 18.73, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.47s, TFLOPs: 7.23, Samples/sec: 17.17, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.91, Samples/sec: 18.78, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.88, Samples/sec: 18.71, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.60, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.63, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.63, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model 
Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.69, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.65, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.65, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.58, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.60, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.63, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.65, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.56, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.61, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.90, Samples/sec: 18.77, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.46s, TFLOPs: 7.26, Samples/sec: 17.24, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.91, Samples/sec: 18.78, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.57, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.59, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.65, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 
18.58, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.65, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.70, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.89, Samples/sec: 18.74, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.57, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.69, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.57, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.88, Samples/sec: 18.72, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.58, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.64, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.68, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.47s, TFLOPs: 7.19, Samples/sec: 17.08, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.90, Samples/sec: 18.78, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.59, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.88, Samples/sec: 18.72, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.67, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model 
Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.57, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.69, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.65, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.62, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.68, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.88, Samples/sec: 18.73, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.78, Samples/sec: 18.49, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.88, Samples/sec: 18.73, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.64, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.64, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.66, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.47s, TFLOPs: 7.22, Samples/sec: 17.15, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.68, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.62, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.80, Samples/sec: 18.54, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 
18.65, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.89, Samples/sec: 18.75, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.89, Samples/sec: 18.75, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.68, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.90, Samples/sec: 18.77, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.67, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.58, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.88, Samples/sec: 18.72, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.56, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.66, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.59, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.56, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512
[2024-03-28 18:15:41,368] [INFO] [logging.py:96:log_dist] [Rank 0] step=270, skipped=4, lr=[0.00037830289374058215, 0.00018915144687029107, 0.00037830289374058215], mom=[(0.9, 0.95), (0.9, 0.95), (0.9, 0.95)]
[2024-03-28 18:15:41,378] [INFO] [timer.py:260:stop] epoch=0/micro_step=4320/global_step=270, RunningAvgSamplesPerSec=18.54180541547887, CurrSamplesPerSec=18.561755339490563, MemAllocated=1.37GB, MaxMemAllocated=2.8GB
Model Parameters: 0.146 B, Latency: 0.47s, TFLOPs: 7.20, Samples/sec: 17.10, Time/seq 0.06s, 
Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.92, Samples/sec: 18.81, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.60, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.55, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.90, Samples/sec: 18.76, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.65, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.69, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.62, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.69, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.65, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.57, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.67, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.56, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.58, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.64, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.62, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, 
Latency: 0.47s, TFLOPs: 7.18, Samples/sec: 17.06, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.42s, TFLOPs: 7.98, Samples/sec: 18.95, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.68, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.67, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.62, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.61, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.59, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.75, Samples/sec: 18.41, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.90, Samples/sec: 18.78, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.61, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.66, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.60, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.88, Samples/sec: 18.73, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.91, Samples/sec: 18.78, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.55, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.58, Time/seq 
0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.47s, TFLOPs: 7.22, Samples/sec: 17.14, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.42s, TFLOPs: 7.94, Samples/sec: 18.85, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.69, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.80, Samples/sec: 18.53, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.69, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.58, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.57, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.56, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.62, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.63, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.89, Samples/sec: 18.75, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.62, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.88, Samples/sec: 18.71, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.70, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.70, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 
0.146 B, Latency: 0.43s, TFLOPs: 7.88, Samples/sec: 18.72, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.48s, TFLOPs: 7.02, Samples/sec: 16.67, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.66, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.79, Samples/sec: 18.51, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.78, Samples/sec: 18.49, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.80, Samples/sec: 18.52, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.44s, TFLOPs: 7.74, Samples/sec: 18.38, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.44s, TFLOPs: 7.66, Samples/sec: 18.19, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.64, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.55, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.56, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.80, Samples/sec: 18.54, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.63, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.62, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.57, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.70, 
Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.66, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.47s, TFLOPs: 7.15, Samples/sec: 16.98, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.91, Samples/sec: 18.78, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.55, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.62, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.65, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.65, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.69, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.64, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.70, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.64, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.60, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.66, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.91, Samples/sec: 18.80, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.68, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model 
Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.62, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.68, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.47s, TFLOPs: 7.17, Samples/sec: 17.04, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.90, Samples/sec: 18.77, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.57, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.56, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.67, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.60, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.65, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.67, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.88, Samples/sec: 18.71, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.60, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.80, Samples/sec: 18.53, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.59, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.59, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 
18.64, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.62, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.58, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.47s, TFLOPs: 7.23, Samples/sec: 17.17, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.89, Samples/sec: 18.75, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.88, Samples/sec: 18.71, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.62, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.60, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.67, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.56, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.61, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.68, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.61, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.74, Samples/sec: 18.39, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.61, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.89, Samples/sec: 18.75, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model 
Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.78, Samples/sec: 18.48, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.79, Samples/sec: 18.50, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.58, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.46s, TFLOPs: 7.25, Samples/sec: 17.23, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.92, Samples/sec: 18.81, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.70, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.69, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.62, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.70, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.63, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.80, Samples/sec: 18.52, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.68, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.68, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.78, Samples/sec: 18.49, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.69, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 
18.64, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.69, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.80, Samples/sec: 18.53, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.60, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.47s, TFLOPs: 7.20, Samples/sec: 17.12, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.70, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.88, Samples/sec: 18.73, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.62, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.64, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.55, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.80, Samples/sec: 18.53, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.70, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.61, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.80, Samples/sec: 18.54, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.60, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.69, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model 
Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.88, Samples/sec: 18.73, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.89, Samples/sec: 18.74, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.79, Samples/sec: 18.51, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.63, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.47s, TFLOPs: 7.24, Samples/sec: 17.19, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.42s, TFLOPs: 7.95, Samples/sec: 18.87, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.61, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.61, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.66, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.57, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.69, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.88, Samples/sec: 18.71, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.90, Samples/sec: 18.76, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.78, Samples/sec: 18.49, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.89, Samples/sec: 18.75, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 
18.62, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.58, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.89, Samples/sec: 18.74, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.69, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.62, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 [2024-03-28 18:16:50,749] [INFO] [logging.py:96:log_dist] [Rank 0] step=280, skipped=4, lr=[0.00034549150281252633, 0.00017274575140626317, 0.00034549150281252633], mom=[(0.9, 0.95), (0.9, 0.95), (0.9, 0.95)] [2024-03-28 18:16:50,759] [INFO] [timer.py:260:stop] epoch=0/micro_step=4480/global_step=280, RunningAvgSamplesPerSec=18.541759581201212, CurrSamplesPerSec=18.575196323371934, MemAllocated=1.37GB, MaxMemAllocated=2.8GB Model Parameters: 0.146 B, Latency: 0.47s, TFLOPs: 7.22, Samples/sec: 17.16, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.42s, TFLOPs: 7.97, Samples/sec: 18.94, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.80, Samples/sec: 18.52, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.63, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.63, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.59, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.54, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.67, Time/seq 0.05s, 
Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.55, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.58, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.79, Samples/sec: 18.50, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.67, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.60, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.60, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.65, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.67, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.47s, TFLOPs: 7.24, Samples/sec: 17.20, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.42s, TFLOPs: 7.95, Samples/sec: 18.88, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.61, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.55, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.60, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.58, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.55, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, 
Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.67, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.65, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.42s, TFLOPs: 7.93, Samples/sec: 18.83, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.69, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.59, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.66, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.62, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.61, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.55, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.48s, TFLOPs: 7.04, Samples/sec: 16.73, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.42s, TFLOPs: 7.93, Samples/sec: 18.83, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.66, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.57, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.80, Samples/sec: 18.54, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.57, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.66, Time/seq 
0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.61, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.80, Samples/sec: 18.52, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.89, Samples/sec: 18.74, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.69, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.65, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.66, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.66, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.69, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.69, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.47s, TFLOPs: 7.20, Samples/sec: 17.10, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.89, Samples/sec: 18.75, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.58, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.57, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.67, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.54, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 
0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.62, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.60, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.69, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.80, Samples/sec: 18.53, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.88, Samples/sec: 18.73, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.79, Samples/sec: 18.51, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.61, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.63, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.64, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.67, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.46s, TFLOPs: 7.25, Samples/sec: 17.23, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.92, Samples/sec: 18.80, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.57, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.79, Samples/sec: 18.50, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.61, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.60, 
Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.62, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.62, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.58, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.67, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.55, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.69, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.59, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.58, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.89, Samples/sec: 18.75, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.60, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.46s, TFLOPs: 7.24, Samples/sec: 17.21, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.42s, TFLOPs: 7.94, Samples/sec: 18.87, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.57, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.56, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.89, Samples/sec: 18.73, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model 
Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.68, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.80, Samples/sec: 18.52, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.58, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.79, Samples/sec: 18.51, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.68, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.59, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.60, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.56, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.56, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.91, Samples/sec: 18.78, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.59, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.47s, TFLOPs: 7.23, Samples/sec: 17.17, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.63, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.62, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.65, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.89, Samples/sec: 
18.75, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.62, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.58, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.88, Samples/sec: 18.72, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.70, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.80, Samples/sec: 18.52, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.66, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.63, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.57, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.68, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.60, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.79, Samples/sec: 18.52, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.47s, TFLOPs: 7.18, Samples/sec: 17.07, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.70, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.79, Samples/sec: 18.51, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.64, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model 
Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.88, Samples/sec: 18.72, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.66, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.60, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.67, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.80, Samples/sec: 18.54, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.64, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.62, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.57, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.55, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.88, Samples/sec: 18.71, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.66, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.88, Samples/sec: 18.72, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.47s, TFLOPs: 7.21, Samples/sec: 17.12, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.90, Samples/sec: 18.77, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.69, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.76, Samples/sec: 
18.44, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.67, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.64, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.58, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.91, Samples/sec: 18.80, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.63, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.56, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.66, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.61, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.78, Samples/sec: 18.48, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.60, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.66, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.64, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.47s, TFLOPs: 7.24, Samples/sec: 17.19, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.70, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.79, Samples/sec: 18.51, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model 
Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.67, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.90, Samples/sec: 18.78, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.60, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.60, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.59, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.66, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.56, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.59, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.66, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.58, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.64, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.62, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.78, Samples/sec: 18.48, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 [2024-03-28 18:18:00,135] [INFO] [logging.py:96:log_dist] [Rank 0] step=290, skipped=4, lr=[0.00031340050182240435, 0.00015670025091120218, 0.00031340050182240435], mom=[(0.9, 0.95), (0.9, 0.95), (0.9, 0.95)] [2024-03-28 18:18:00,145] [INFO] [timer.py:260:stop] epoch=0/micro_step=4640/global_step=290, 
RunningAvgSamplesPerSec=18.541661340455345, CurrSamplesPerSec=18.532257246638082, MemAllocated=1.37GB, MaxMemAllocated=2.8GB Model Parameters: 0.146 B, Latency: 0.47s, TFLOPs: 7.24, Samples/sec: 17.20, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.42s, TFLOPs: 7.98, Samples/sec: 18.96, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.70, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.59, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.67, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.58, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.70, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.62, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.61, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.67, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.55, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.63, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.88, Samples/sec: 18.71, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.69, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.78, 
Samples/sec: 18.47, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.65, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.46s, TFLOPs: 7.26, Samples/sec: 17.24, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.92, Samples/sec: 18.82, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.89, Samples/sec: 18.74, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.79, Samples/sec: 18.50, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.68, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.55, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.58, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.88, Samples/sec: 18.71, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.79, Samples/sec: 18.50, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.88, Samples/sec: 18.72, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.78, Samples/sec: 18.47, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.56, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.68, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.60, Time/seq 0.05s, Batch Size: 8, Sequence 
Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.69, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.60, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.47s, TFLOPs: 7.20, Samples/sec: 17.11, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.71, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.66, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.56, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.90, Samples/sec: 18.77, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.79, Samples/sec: 18.51, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.54, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.88, Samples/sec: 18.72, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.65, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.54, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.68, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.67, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.63, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 
7.82, Samples/sec: 18.58, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.60, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.67, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.47s, TFLOPs: 7.21, Samples/sec: 17.13, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.92, Samples/sec: 18.82, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.62, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.66, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.66, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.62, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.68, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.70, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.89, Samples/sec: 18.74, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.59, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.66, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.62, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.89, Samples/sec: 18.73, Time/seq 0.05s, Batch Size: 8, 
Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.63, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.55, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.56, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.47s, TFLOPs: 7.10, Samples/sec: 16.87, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.91, Samples/sec: 18.80, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.70, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.78, Samples/sec: 18.49, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.88, Samples/sec: 18.71, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.70, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.62, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.63, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.62, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.62, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.80, Samples/sec: 18.54, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.65, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, 
TFLOPs: 7.83, Samples/sec: 18.60, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.77, Samples/sec: 18.47, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.55, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.63, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.47s, TFLOPs: 7.23, Samples/sec: 17.18, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.68, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.80, Samples/sec: 18.54, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.91, Samples/sec: 18.79, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.70, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.57, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.56, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.63, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.64, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.76, Samples/sec: 18.44, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.60, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.60, Time/seq 0.05s, Batch Size: 8, 
Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.80, Samples/sec: 18.52, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.64, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.80, Samples/sec: 18.52, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.80, Samples/sec: 18.52, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.47s, TFLOPs: 7.21, Samples/sec: 17.13, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.92, Samples/sec: 18.82, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.62, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.59, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.66, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.61, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.59, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.80, Samples/sec: 18.54, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.59, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.66, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.60, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, 
TFLOPs: 7.86, Samples/sec: 18.66, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.88, Samples/sec: 18.72, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.78, Samples/sec: 18.48, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.57, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.67, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.47s, TFLOPs: 7.23, Samples/sec: 17.17, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.42s, TFLOPs: 7.98, Samples/sec: 18.95, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.65, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.79, Samples/sec: 18.50, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.62, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.78, Samples/sec: 18.48, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.64, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.68, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.58, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.71, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.59, Time/seq 0.05s, Batch Size: 8, 
Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.57, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.69, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.65, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.65, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.58, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.47s, TFLOPs: 7.23, Samples/sec: 17.18, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.88, Samples/sec: 18.72, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.54, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.55, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.80, Samples/sec: 18.52, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.67, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.89, Samples/sec: 18.74, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.70, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.89, Samples/sec: 18.73, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.70, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, 
TFLOPs: 7.81, Samples/sec: 18.55, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.80, Samples/sec: 18.53, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.77, Samples/sec: 18.46, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.61, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.60, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.79, Samples/sec: 18.51, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.47s, TFLOPs: 7.16, Samples/sec: 17.02, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.89, Samples/sec: 18.74, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.63, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.80, Samples/sec: 18.53, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.60, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.89, Samples/sec: 18.74, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.88, Samples/sec: 18.71, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.79, Samples/sec: 18.51, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.70, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.61, Time/seq 0.05s, Batch Size: 8, 
Sequence Length: 512
Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.61, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512
Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.66, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512
Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.69, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512
Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.66, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512
Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.58, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512
Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.62, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512
[2024-03-28 18:19:09,534] [INFO] [logging.py:96:log_dist] [Rank 0] step=300, skipped=4, lr=[0.00028217951383064543, 0.00014108975691532271, 0.00028217951383064543], mom=[(0.9, 0.95), (0.9, 0.95), (0.9, 0.95)]
[2024-03-28 18:19:09,544] [INFO] [timer.py:260:stop] epoch=0/micro_step=4800/global_step=300, RunningAvgSamplesPerSec=18.54155666119789, CurrSamplesPerSec=18.53944083481109, MemAllocated=1.37GB, MaxMemAllocated=2.8GB
Model Parameters: 0.146 B, Latency: 0.47s, TFLOPs: 7.15, Samples/sec: 16.99, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512
Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.66, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512
Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.69, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512
Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.59, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512
Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.66, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512
Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.56, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512
Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.78, Samples/sec: 18.48, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.67, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.66, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.61, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.60, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.79, Samples/sec: 18.51, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.63, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.56, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.61, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.60, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.47s, TFLOPs: 7.21, Samples/sec: 17.13, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.91, Samples/sec: 18.80, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.65, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.58, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.89, Samples/sec: 18.74, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, 
Samples/sec: 18.55, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.62, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.62, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.64, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.63, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.90, Samples/sec: 18.77, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.69, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.71, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.80, Samples/sec: 18.53, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.63, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.60, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.47s, TFLOPs: 7.22, Samples/sec: 17.14, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.91, Samples/sec: 18.80, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.78, Samples/sec: 18.48, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.61, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.79, Samples/sec: 18.51, Time/seq 0.05s, Batch Size: 8, Sequence 
Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.56, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.64, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.75, Samples/sec: 18.41, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.59, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.44s, TFLOPs: 7.70, Samples/sec: 18.30, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.55, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.44s, TFLOPs: 7.73, Samples/sec: 18.37, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.44s, TFLOPs: 7.68, Samples/sec: 18.24, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.63, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.57, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.77, Samples/sec: 18.47, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.47s, TFLOPs: 7.23, Samples/sec: 17.17, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.91, Samples/sec: 18.79, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.88, Samples/sec: 18.73, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.88, Samples/sec: 18.71, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 
7.91, Samples/sec: 18.79, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.64, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.90, Samples/sec: 18.76, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.61, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.80, Samples/sec: 18.54, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.63, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.61, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.79, Samples/sec: 18.50, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.66, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.59, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.80, Samples/sec: 18.53, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.65, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.47s, TFLOPs: 7.21, Samples/sec: 17.12, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.88, Samples/sec: 18.72, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.60, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.71, Time/seq 0.05s, Batch Size: 8, 
Sequence Length: 512
Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.60, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512
Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.76, Samples/sec: 18.44, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512
Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.67, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512
Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.88, Samples/sec: 18.71, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512
Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.58, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512
Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.60, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512
Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.88, Samples/sec: 18.71, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512
Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.62, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512
Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.71, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512
Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.67, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512
Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.78, Samples/sec: 18.47, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512
Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.89, Samples/sec: 18.73, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512
[2024-03-28 18:19:44,250] [INFO] [fused_optimizer.py:352:_update_scale] No Grad overflow for 100 iterations
[2024-03-28 18:19:44,250] [INFO] [fused_optimizer.py:353:_update_scale] Increasing dynamic loss scale from 16384.0 to 32768.0
Model Parameters: 0.146 B, Latency: 0.47s, TFLOPs: 7.20, Samples/sec: 17.11, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512
Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.88, 
Samples/sec: 18.72, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.62, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.65, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.66, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.59, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.79, Samples/sec: 18.52, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.63, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.65, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.57, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.69, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.59, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.56, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.77, Samples/sec: 18.46, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.65, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.66, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.47s, TFLOPs: 7.14, Samples/sec: 16.95, Time/seq 0.06s, Batch Size: 8, Sequence 
Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.90, Samples/sec: 18.77, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.55, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.55, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.79, Samples/sec: 18.51, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.59, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.58, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.69, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.65, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.69, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.63, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.70, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.62, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.57, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.65, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.57, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.47s, TFLOPs: 
7.17, Samples/sec: 17.02, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.42s, TFLOPs: 7.94, Samples/sec: 18.85, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.58, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.61, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.78, Samples/sec: 18.47, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.57, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.80, Samples/sec: 18.54, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.64, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.60, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.58, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.66, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.60, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.59, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.61, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.56, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.59, Time/seq 0.05s, Batch Size: 8, 
Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.47s, TFLOPs: 7.19, Samples/sec: 17.07, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.42s, TFLOPs: 7.93, Samples/sec: 18.83, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.54, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.68, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.67, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.69, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.57, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.64, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.66, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.79, Samples/sec: 18.51, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.89, Samples/sec: 18.74, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.68, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.62, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.61, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.69, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, 
TFLOPs: 7.77, Samples/sec: 18.47, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.47s, TFLOPs: 7.19, Samples/sec: 17.09, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.42s, TFLOPs: 7.95, Samples/sec: 18.90, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.57, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.65, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.57, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.89, Samples/sec: 18.74, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.67, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.60, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.69, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.58, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.67, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.65, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.60, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.67, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.55, Time/seq 0.05s, Batch Size: 8, 
Sequence Length: 512
Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.58, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512
[2024-03-28 18:20:18,972] [INFO] [logging.py:96:log_dist] [Rank 0] step=310, skipped=4, lr=[0.00025197410549546596, 0.00012598705274773298, 0.00025197410549546596], mom=[(0.9, 0.95), (0.9, 0.95), (0.9, 0.95)]
[2024-03-28 18:20:18,982] [INFO] [timer.py:260:stop] epoch=0/micro_step=4960/global_step=310, RunningAvgSamplesPerSec=18.541095193008974, CurrSamplesPerSec=18.55957876703722, MemAllocated=1.37GB, MaxMemAllocated=2.8GB
Model Parameters: 0.146 B, Latency: 0.47s, TFLOPs: 7.23, Samples/sec: 17.17, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512
Model Parameters: 0.146 B, Latency: 0.42s, TFLOPs: 7.97, Samples/sec: 18.94, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512
Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.61, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512
Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.60, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512
Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.68, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512
Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.80, Samples/sec: 18.53, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512
Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.66, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512
Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.62, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512
Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.58, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512
Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.66, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512
Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.70, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512
Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.79, Samples/sec: 18.52, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.61, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.64, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.90, Samples/sec: 18.76, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.77, Samples/sec: 18.46, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.47s, TFLOPs: 7.16, Samples/sec: 17.01, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.92, Samples/sec: 18.81, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.57, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.78, Samples/sec: 18.48, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.60, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.80, Samples/sec: 18.53, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.66, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.59, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.60, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.89, Samples/sec: 18.75, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, 
Samples/sec: 18.59, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.65, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.78, Samples/sec: 18.48, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.62, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.60, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.58, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.47s, TFLOPs: 7.23, Samples/sec: 17.17, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.92, Samples/sec: 18.81, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.69, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.58, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.65, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.88, Samples/sec: 18.73, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.62, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.55, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.60, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.60, Time/seq 0.05s, Batch Size: 8, Sequence 
Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.62, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.54, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.62, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.64, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.59, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.61, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.47s, TFLOPs: 7.12, Samples/sec: 16.93, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.91, Samples/sec: 18.79, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.64, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.55, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.78, Samples/sec: 18.48, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.56, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.63, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.61, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.56, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 
7.80, Samples/sec: 18.52, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.88, Samples/sec: 18.72, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.57, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.55, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.90, Samples/sec: 18.77, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.68, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.78, Samples/sec: 18.48, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.47s, TFLOPs: 7.23, Samples/sec: 17.18, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.91, Samples/sec: 18.78, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.68, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.79, Samples/sec: 18.51, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.64, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.64, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.80, Samples/sec: 18.52, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.66, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.67, Time/seq 0.05s, Batch Size: 8, 
Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.62, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.59, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.64, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.62, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.80, Samples/sec: 18.53, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.78, Samples/sec: 18.48, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.56, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.46s, TFLOPs: 7.27, Samples/sec: 17.27, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.91, Samples/sec: 18.80, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.68, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.55, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.80, Samples/sec: 18.53, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.65, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.57, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.67, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, 
TFLOPs: 7.87, Samples/sec: 18.71, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.64, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.57, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.65, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.89, Samples/sec: 18.73, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.63, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.56, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.61, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.47s, TFLOPs: 7.23, Samples/sec: 17.19, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.91, Samples/sec: 18.80, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.68, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.57, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.58, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.61, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.59, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.77, Samples/sec: 18.45, Time/seq 0.05s, Batch Size: 8, 
Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.71, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.88, Samples/sec: 18.72, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.58, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.58, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.88, Samples/sec: 18.72, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.68, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.77, Samples/sec: 18.46, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.65, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.47s, TFLOPs: 7.20, Samples/sec: 17.11, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.91, Samples/sec: 18.79, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.79, Samples/sec: 18.50, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.63, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.62, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.80, Samples/sec: 18.52, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.62, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, 
TFLOPs: 7.85, Samples/sec: 18.65, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.63, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.62, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.67, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.62, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.91, Samples/sec: 18.79, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.64, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.63, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.57, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.47s, TFLOPs: 7.24, Samples/sec: 17.19, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.92, Samples/sec: 18.81, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.78, Samples/sec: 18.48, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.63, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.80, Samples/sec: 18.53, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.60, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.66, Time/seq 0.05s, Batch Size: 8, 
Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.79, Samples/sec: 18.51, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.69, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.69, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.68, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.77, Samples/sec: 18.46, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.67, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.63, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.56, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.62, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.47s, TFLOPs: 7.22, Samples/sec: 17.14, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.42s, TFLOPs: 7.93, Samples/sec: 18.83, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.57, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.77, Samples/sec: 18.46, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.69, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.56, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, 
TFLOPs: 7.85, Samples/sec: 18.65, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.59, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.55, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.90, Samples/sec: 18.77, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.68, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.60, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.79, Samples/sec: 18.50, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.58, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.90, Samples/sec: 18.76, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.62, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 [2024-03-28 18:21:28,386] [INFO] [logging.py:96:log_dist] [Rank 0] step=320, skipped=4, lr=[0.00022292510837391267, 0.00011146255418695633, 0.00022292510837391267], mom=[(0.9, 0.95), (0.9, 0.95), (0.9, 0.95)] [2024-03-28 18:21:28,394] [INFO] [timer.py:260:stop] epoch=0/micro_step=5120/global_step=320, RunningAvgSamplesPerSec=18.540819421126987, CurrSamplesPerSec=18.522821134686904, MemAllocated=1.37GB, MaxMemAllocated=2.8GB Model Parameters: 0.146 B, Latency: 0.47s, TFLOPs: 7.11, Samples/sec: 16.90, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.89, Samples/sec: 18.75, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.79, 
Samples/sec: 18.51, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.64, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.70, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.80, Samples/sec: 18.53, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.44s, TFLOPs: 7.71, Samples/sec: 18.32, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.62, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.62, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.57, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.80, Samples/sec: 18.52, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.88, Samples/sec: 18.72, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.64, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.77, Samples/sec: 18.45, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.58, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.63, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.47s, TFLOPs: 7.21, Samples/sec: 17.12, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.92, Samples/sec: 18.80, Time/seq 0.05s, Batch Size: 8, Sequence 
Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.58, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.58, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.88, Samples/sec: 18.72, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.59, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.69, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.79, Samples/sec: 18.52, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.60, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.66, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.78, Samples/sec: 18.48, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.44s, TFLOPs: 7.68, Samples/sec: 18.24, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.80, Samples/sec: 18.52, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.59, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.62, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.65, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.47s, TFLOPs: 7.19, Samples/sec: 17.09, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.42s, TFLOPs: 
7.94, Samples/sec: 18.86, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.60, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.79, Samples/sec: 18.51, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.68, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.62, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.89, Samples/sec: 18.75, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.70, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.80, Samples/sec: 18.52, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.69, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.60, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.60, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.80, Samples/sec: 18.54, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.63, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.67, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.76, Samples/sec: 18.44, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.46s, TFLOPs: 7.30, Samples/sec: 17.33, Time/seq 0.06s, Batch Size: 8, 
Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.91, Samples/sec: 18.80, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.68, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.56, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.88, Samples/sec: 18.72, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.58, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.58, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.65, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.63, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.55, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.59, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.59, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.88, Samples/sec: 18.71, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.67, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.92, Samples/sec: 18.80, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.55, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.47s, 
TFLOPs: 7.19, Samples/sec: 17.09, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.89, Samples/sec: 18.74, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.70, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.62, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.67, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.58, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.63, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.56, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.69, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.64, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.60, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.60, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.89, Samples/sec: 18.74, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.61, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.64, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.69, Time/seq 0.05s, Batch Size: 8, 
Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.47s, TFLOPs: 7.15, Samples/sec: 17.00, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.91, Samples/sec: 18.80, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.57, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.61, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.54, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.66, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.61, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.80, Samples/sec: 18.53, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.59, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.57, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.64, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.66, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.57, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.56, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.63, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, 
TFLOPs: 7.85, Samples/sec: 18.65, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.47s, TFLOPs: 7.18, Samples/sec: 17.05, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.89, Samples/sec: 18.73, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.62, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.55, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.60, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.70, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.67, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.60, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.88, Samples/sec: 18.71, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.88, Samples/sec: 18.71, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.59, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.55, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.59, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.80, Samples/sec: 18.53, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.78, Samples/sec: 18.48, Time/seq 0.05s, Batch Size: 8, 
Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.44s, TFLOPs: 7.73, Samples/sec: 18.37, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.47s, TFLOPs: 7.23, Samples/sec: 17.17, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.91, Samples/sec: 18.79, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.57, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.69, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.90, Samples/sec: 18.77, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.62, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.69, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.79, Samples/sec: 18.51, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.88, Samples/sec: 18.72, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.65, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.55, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.71, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.66, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.68, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, 
TFLOPs: 7.83, Samples/sec: 18.59, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.70, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.47s, TFLOPs: 7.17, Samples/sec: 17.04, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.92, Samples/sec: 18.81, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.67, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.62, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.67, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.58, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.64, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.61, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.61, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.65, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.59, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.60, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.88, Samples/sec: 18.72, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.63, Time/seq 0.05s, Batch Size: 8, 
Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.60, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.62, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.46s, TFLOPs: 7.25, Samples/sec: 17.22, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.42s, TFLOPs: 7.93, Samples/sec: 18.83, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.89, Samples/sec: 18.74, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.58, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.80, Samples/sec: 18.53, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.70, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.59, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.68, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.54, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.60, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.59, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.64, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.56, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, 
TFLOPs: 7.79, Samples/sec: 18.50, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.70, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.59, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 [2024-03-28 18:22:37,790] [INFO] [logging.py:96:log_dist] [Rank 0] step=330, skipped=4, lr=[0.00019516796230013272, 9.758398115006636e-05, 0.00019516796230013272], mom=[(0.9, 0.95), (0.9, 0.95), (0.9, 0.95)] [2024-03-28 18:22:37,800] [INFO] [timer.py:260:stop] epoch=0/micro_step=5280/global_step=330, RunningAvgSamplesPerSec=18.540594245410674, CurrSamplesPerSec=18.53455346947388, MemAllocated=1.37GB, MaxMemAllocated=2.8GB Model Parameters: 0.146 B, Latency: 0.47s, TFLOPs: 7.21, Samples/sec: 17.12, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.90, Samples/sec: 18.77, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.64, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.59, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.69, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.68, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.64, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.62, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.68, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, 
Samples/sec: 18.64, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.68, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.80, Samples/sec: 18.53, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.67, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.55, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.60, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.55, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.47s, TFLOPs: 7.23, Samples/sec: 17.17, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.91, Samples/sec: 18.79, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.61, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.60, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.68, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.56, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.61, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.68, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.64, Time/seq 0.05s, Batch Size: 8, Sequence 
Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.69, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.64, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.70, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.60, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.75, Samples/sec: 18.42, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.44s, TFLOPs: 7.58, Samples/sec: 18.00, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.77, Samples/sec: 18.45, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.47s, TFLOPs: 7.13, Samples/sec: 16.93, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.55, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.44s, TFLOPs: 7.72, Samples/sec: 18.34, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.68, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.77, Samples/sec: 18.46, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.62, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.64, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.80, Samples/sec: 18.53, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 
7.86, Samples/sec: 18.67, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.61, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.58, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.64, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.62, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.65, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.79, Samples/sec: 18.50, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.65, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.46s, TFLOPs: 7.26, Samples/sec: 17.25, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.42s, TFLOPs: 7.93, Samples/sec: 18.85, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.61, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.59, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.61, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.80, Samples/sec: 18.54, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.62, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.64, Time/seq 0.05s, Batch Size: 8, 
Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.79, Samples/sec: 18.50, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.66, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.59, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.60, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.65, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.62, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.69, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.63, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.47s, TFLOPs: 7.18, Samples/sec: 17.05, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.89, Samples/sec: 18.73, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.71, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.59, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.57, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.60, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.64, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, 
TFLOPs: 7.79, Samples/sec: 18.52, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.65, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.79, Samples/sec: 18.50, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.54, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.88, Samples/sec: 18.71, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.69, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.55, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.79, Samples/sec: 18.51, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.62, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.47s, TFLOPs: 7.19, Samples/sec: 17.07, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.90, Samples/sec: 18.77, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.63, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.61, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.77, Samples/sec: 18.46, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.70, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.79, Samples/sec: 18.50, Time/seq 0.05s, Batch Size: 8, 
Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.60, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.68, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.79, Samples/sec: 18.50, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.67, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.67, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.63, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.63, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.66, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.57, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.47s, TFLOPs: 7.17, Samples/sec: 17.04, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.89, Samples/sec: 18.75, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.68, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.55, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.79, Samples/sec: 18.51, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.58, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, 
TFLOPs: 7.80, Samples/sec: 18.53, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.67, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.68, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.59, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.56, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.65, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.61, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.67, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.68, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.65, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.47s, TFLOPs: 7.19, Samples/sec: 17.08, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.91, Samples/sec: 18.80, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.66, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.57, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.79, Samples/sec: 18.52, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.62, Time/seq 0.05s, Batch Size: 8, 
Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.91, Samples/sec: 18.78, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.63, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.88, Samples/sec: 18.71, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.69, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.77, Samples/sec: 18.46, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.71, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.62, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.62, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.59, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.61, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.46s, TFLOPs: 7.24, Samples/sec: 17.21, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.42s, TFLOPs: 7.94, Samples/sec: 18.86, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.61, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.89, Samples/sec: 18.74, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.61, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, 
TFLOPs: 7.82, Samples/sec: 18.59, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.65, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.91, Samples/sec: 18.78, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.59, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.59, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.61, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.60, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.57, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.70, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.78, Samples/sec: 18.49, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.61, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.47s, TFLOPs: 7.12, Samples/sec: 16.92, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.60, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.65, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.59, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.56, Time/seq 0.05s, Batch Size: 8, 
Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.68, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.88, Samples/sec: 18.71, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.63, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.64, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.71, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.64, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.78, Samples/sec: 18.49, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.61, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.68, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.68, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.61, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 [2024-03-28 18:23:47,228] [INFO] [logging.py:96:log_dist] [Rank 0] step=340, skipped=4, lr=[0.00016883208390234628, 8.441604195117314e-05, 0.00016883208390234628], mom=[(0.9, 0.95), (0.9, 0.95), (0.9, 0.95)] [2024-03-28 18:23:47,238] [INFO] [timer.py:260:stop] epoch=0/micro_step=5440/global_step=340, RunningAvgSamplesPerSec=18.540163025165192, CurrSamplesPerSec=18.543503225756535, MemAllocated=1.37GB, MaxMemAllocated=2.8GB Model Parameters: 0.146 B, Latency: 0.47s, TFLOPs: 7.22, Samples/sec: 17.14, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 
Model Parameters: 0.146 B, Latency: 0.42s, TFLOPs: 7.93, Samples/sec: 18.85, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.89, Samples/sec: 18.73, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.61, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.62, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.70, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.65, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.59, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.60, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.63, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.80, Samples/sec: 18.54, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.78, Samples/sec: 18.48, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.62, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.55, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.90, Samples/sec: 18.77, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.68, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.47s, TFLOPs: 7.21, 
Samples/sec: 17.12, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.42s, TFLOPs: 7.93, Samples/sec: 18.83, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.80, Samples/sec: 18.53, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.79, Samples/sec: 18.52, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.59, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.59, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.65, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.66, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.57, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.62, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.69, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.60, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.79, Samples/sec: 18.51, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.61, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.66, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.66, Time/seq 0.05s, Batch Size: 8, Sequence 
Length: 512 Model Parameters: 0.146 B, Latency: 0.47s, TFLOPs: 7.21, Samples/sec: 17.13, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.89, Samples/sec: 18.75, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.89, Samples/sec: 18.73, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.54, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.59, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.79, Samples/sec: 18.50, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.80, Samples/sec: 18.53, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.65, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.68, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.64, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.59, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.66, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.59, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.79, Samples/sec: 18.52, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.56, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 
7.83, Samples/sec: 18.61, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.47s, TFLOPs: 7.22, Samples/sec: 17.14, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.92, Samples/sec: 18.80, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.61, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.58, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.61, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.60, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.55, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.64, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.64, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.80, Samples/sec: 18.52, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.58, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.61, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.61, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.68, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.65, Time/seq 0.05s, Batch Size: 8, 
Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.62, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.47s, TFLOPs: 7.11, Samples/sec: 16.90, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.67, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.60, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.70, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.60, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.57, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.66, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.71, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.74, Samples/sec: 18.40, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.54, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.61, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.59, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.67, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.70, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, 
TFLOPs: 7.85, Samples/sec: 18.64, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.57, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.46s, TFLOPs: 7.24, Samples/sec: 17.21, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.65, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.57, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.58, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.54, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.61, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.61, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.55, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.70, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.64, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.54, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.62, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.61, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.76, Samples/sec: 18.43, Time/seq 0.05s, Batch Size: 8, 
Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.65, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.78, Samples/sec: 18.49, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.47s, TFLOPs: 7.21, Samples/sec: 17.14, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.42s, TFLOPs: 7.92, Samples/sec: 18.82, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.56, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.77, Samples/sec: 18.45, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.69, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.66, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.80, Samples/sec: 18.52, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.60, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.88, Samples/sec: 18.71, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.56, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.79, Samples/sec: 18.51, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.63, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.54, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, 
TFLOPs: 7.81, Samples/sec: 18.55, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.80, Samples/sec: 18.53, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.88, Samples/sec: 18.71, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.47s, TFLOPs: 7.17, Samples/sec: 17.03, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.89, Samples/sec: 18.74, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.78, Samples/sec: 18.49, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.62, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.63, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.75, Samples/sec: 18.41, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.55, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.61, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.57, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.63, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.57, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.79, Samples/sec: 18.52, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.62, Time/seq 0.05s, Batch Size: 8, 
Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.63, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.65, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.64, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.47s, TFLOPs: 7.17, Samples/sec: 17.04, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.91, Samples/sec: 18.80, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.61, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.61, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.55, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.65, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.59, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.58, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.56, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.62, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.54, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.63, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, 
TFLOPs: 7.80, Samples/sec: 18.53, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.61, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.65, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.54, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.47s, TFLOPs: 7.17, Samples/sec: 17.03, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.89, Samples/sec: 18.75, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.76, Samples/sec: 18.44, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.60, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.78, Samples/sec: 18.49, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.80, Samples/sec: 18.53, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.65, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.55, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.61, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.63, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.61, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.60, Time/seq 0.05s, Batch Size: 8, 
Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.61, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.61, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.80, Samples/sec: 18.54, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.65, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 [2024-03-28 18:24:56,705] [INFO] [logging.py:96:log_dist] [Rank 0] step=350, skipped=4, lr=[0.00014404026320278317, 7.202013160139159e-05, 0.00014404026320278317], mom=[(0.9, 0.95), (0.9, 0.95), (0.9, 0.95)] [2024-03-28 18:24:56,714] [INFO] [timer.py:260:stop] epoch=0/micro_step=5600/global_step=350, RunningAvgSamplesPerSec=18.539456757323265, CurrSamplesPerSec=18.49183053552749, MemAllocated=1.37GB, MaxMemAllocated=2.8GB Model Parameters: 0.146 B, Latency: 0.47s, TFLOPs: 7.14, Samples/sec: 16.96, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.88, Samples/sec: 18.73, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.69, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.60, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.89, Samples/sec: 18.75, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.61, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.66, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.60, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 
Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.60, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.79, Samples/sec: 18.50, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.56, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.58, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.59, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.70, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.54, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.63, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.47s, TFLOPs: 7.13, Samples/sec: 16.93, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.88, Samples/sec: 18.71, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.57, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.64, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.79, Samples/sec: 18.51, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.80, Samples/sec: 18.53, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.63, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.80, 
Samples/sec: 18.54, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.60, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.90, Samples/sec: 18.77, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.65, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.67, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.67, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.64, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.60, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.75, Samples/sec: 18.41, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.46s, TFLOPs: 7.25, Samples/sec: 17.23, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.89, Samples/sec: 18.74, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.55, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.63, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.88, Samples/sec: 18.71, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.57, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.58, Time/seq 0.05s, Batch Size: 8, Sequence 
Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.68, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.59, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.88, Samples/sec: 18.71, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.76, Samples/sec: 18.44, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.57, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.78, Samples/sec: 18.47, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.79, Samples/sec: 18.51, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.63, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.70, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.47s, TFLOPs: 7.23, Samples/sec: 17.17, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.71, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.58, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.55, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.68, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.76, Samples/sec: 18.44, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 
7.89, Samples/sec: 18.75, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.67, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.79, Samples/sec: 18.50, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.68, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.58, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.65, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.55, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.66, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.62, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.62, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.47s, TFLOPs: 7.24, Samples/sec: 17.20, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.42s, TFLOPs: 7.93, Samples/sec: 18.84, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.80, Samples/sec: 18.54, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.58, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.63, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.59, Time/seq 0.05s, Batch Size: 8, 
Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.58, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.67, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.80, Samples/sec: 18.52, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.56, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.65, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.80, Samples/sec: 18.52, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.64, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.65, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.80, Samples/sec: 18.52, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.63, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.46s, TFLOPs: 7.24, Samples/sec: 17.21, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.88, Samples/sec: 18.71, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.60, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.59, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.62, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, 
TFLOPs: 7.79, Samples/sec: 18.50, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.64, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.68, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.89, Samples/sec: 18.74, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.76, Samples/sec: 18.44, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.59, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.61, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.59, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.60, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.80, Samples/sec: 18.52, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.66, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.47s, TFLOPs: 7.16, Samples/sec: 17.01, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.69, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.67, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.64, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.56, Time/seq 0.05s, Batch Size: 8, 
Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.69, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.56, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.56, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.68, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.65, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.62, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.62, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.63, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.69, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.62, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.89, Samples/sec: 18.75, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.47s, TFLOPs: 7.22, Samples/sec: 17.16, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.89, Samples/sec: 18.73, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.71, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.80, Samples/sec: 18.52, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, 
TFLOPs: 7.85, Samples/sec: 18.64, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.56, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.92, Samples/sec: 18.82, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.89, Samples/sec: 18.73, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.68, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.69, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.67, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.88, Samples/sec: 18.72, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.62, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.80, Samples/sec: 18.53, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.63, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.59, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.47s, TFLOPs: 7.17, Samples/sec: 17.04, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.42s, TFLOPs: 7.93, Samples/sec: 18.83, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.69, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.76, Samples/sec: 18.44, Time/seq 0.05s, Batch Size: 8, 
Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.64, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.66, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.68, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.88, Samples/sec: 18.72, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.88, Samples/sec: 18.71, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.66, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.56, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.59, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.55, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.62, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.64, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.68, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.47s, TFLOPs: 7.24, Samples/sec: 17.20, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.42s, TFLOPs: 7.93, Samples/sec: 18.83, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.63, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, 
TFLOPs: 7.84, Samples/sec: 18.61, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.88, Samples/sec: 18.72, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.64, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.64, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.70, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.78, Samples/sec: 18.49, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.56, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.66, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.68, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.59, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.57, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.80, Samples/sec: 18.53, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.56, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 [2024-03-28 18:26:06,107] [INFO] [logging.py:96:log_dist] [Rank 0] step=360, skipped=4, lr=[0.00012090809111391871, 6.0454045556959356e-05, 0.00012090809111391871], mom=[(0.9, 0.95), (0.9, 0.95), (0.9, 0.95)] [2024-03-28 18:26:06,117] [INFO] [timer.py:260:stop] epoch=0/micro_step=5760/global_step=360, RunningAvgSamplesPerSec=18.539331396865737, 
CurrSamplesPerSec=18.53951830071153, MemAllocated=1.37GB, MaxMemAllocated=2.8GB Model Parameters: 0.146 B, Latency: 0.47s, TFLOPs: 7.20, Samples/sec: 17.11, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.92, Samples/sec: 18.82, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.88, Samples/sec: 18.72, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.63, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.69, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.69, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.59, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.65, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.66, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.57, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.88, Samples/sec: 18.73, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.59, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.67, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.80, Samples/sec: 18.52, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.88, Samples/sec: 18.71, Time/seq 0.05s, Batch Size: 8, 
Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.58, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.47s, TFLOPs: 7.13, Samples/sec: 16.94, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.69, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.64, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.44s, TFLOPs: 7.72, Samples/sec: 18.33, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.44s, TFLOPs: 7.71, Samples/sec: 18.32, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.45s, TFLOPs: 7.55, Samples/sec: 17.94, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.68, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.76, Samples/sec: 18.43, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.57, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.67, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.64, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.66, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.80, Samples/sec: 18.53, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.55, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, 
TFLOPs: 7.86, Samples/sec: 18.66, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.88, Samples/sec: 18.72, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.47s, TFLOPs: 7.21, Samples/sec: 17.14, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.88, Samples/sec: 18.71, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.60, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.62, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.67, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.80, Samples/sec: 18.52, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.79, Samples/sec: 18.51, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.66, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.54, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.57, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.77, Samples/sec: 18.47, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.89, Samples/sec: 18.74, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.60, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.64, Time/seq 0.05s, Batch Size: 8, 
Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.64, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.59, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.47s, TFLOPs: 7.16, Samples/sec: 17.01, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.90, Samples/sec: 18.78, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.64, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.63, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.64, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.67, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.62, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.79, Samples/sec: 18.50, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.78, Samples/sec: 18.49, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.70, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.61, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.58, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.67, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, 
TFLOPs: 7.83, Samples/sec: 18.60, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.55, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.55, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.47s, TFLOPs: 7.23, Samples/sec: 17.18, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.42s, TFLOPs: 7.93, Samples/sec: 18.83, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.69, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.79, Samples/sec: 18.51, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.67, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.64, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.59, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.64, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.89, Samples/sec: 18.73, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.66, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.63, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.59, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.65, Time/seq 0.05s, Batch Size: 8, 
Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.65, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.59, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.58, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.47s, TFLOPs: 7.16, Samples/sec: 17.01, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.92, Samples/sec: 18.82, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.68, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.79, Samples/sec: 18.50, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.62, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.58, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.69, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.59, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.67, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.68, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.56, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.90, Samples/sec: 18.76, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, 
TFLOPs: 7.87, Samples/sec: 18.68, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.61, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.67, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.61, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.47s, TFLOPs: 7.23, Samples/sec: 17.18, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.90, Samples/sec: 18.76, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.64, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.77, Samples/sec: 18.45, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.59, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.79, Samples/sec: 18.51, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.91, Samples/sec: 18.79, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.56, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.58, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.58, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.56, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.54, Time/seq 0.05s, Batch Size: 8, 
Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.66, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.64, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.79, Samples/sec: 18.50, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.65, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.46s, TFLOPs: 7.25, Samples/sec: 17.22, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.91, Samples/sec: 18.79, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.63, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.68, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.62, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.70, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.65, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.66, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.66, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.64, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.58, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, 
TFLOPs: 7.87, Samples/sec: 18.69, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.65, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.79, Samples/sec: 18.50, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.89, Samples/sec: 18.74, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.62, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.47s, TFLOPs: 7.21, Samples/sec: 17.12, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.90, Samples/sec: 18.77, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.57, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.77, Samples/sec: 18.45, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.69, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.79, Samples/sec: 18.50, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.61, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.90, Samples/sec: 18.76, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.63, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.55, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.64, Time/seq 0.05s, Batch Size: 8, 
Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.61, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.79, Samples/sec: 18.51, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.90, Samples/sec: 18.76, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.70, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.54, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.47s, TFLOPs: 7.24, Samples/sec: 17.20, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.91, Samples/sec: 18.78, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.60, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.59, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.62, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.80, Samples/sec: 18.52, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.62, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.56, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.80, Samples/sec: 18.52, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.63, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, 
TFLOPs: 7.84, Samples/sec: 18.63, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.57, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.62, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.69, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.63, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.65, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 [2024-03-28 18:27:15,544] [INFO] [logging.py:96:log_dist] [Rank 0] step=370, skipped=4, lr=[9.95434205002792e-05, 4.97717102501396e-05, 9.95434205002792e-05], mom=[(0.9, 0.95), (0.9, 0.95), (0.9, 0.95)] [2024-03-28 18:27:15,552] [INFO] [timer.py:260:stop] epoch=0/micro_step=5920/global_step=370, RunningAvgSamplesPerSec=18.53903805492127, CurrSamplesPerSec=18.511187255475914, MemAllocated=1.37GB, MaxMemAllocated=2.8GB Model Parameters: 0.146 B, Latency: 0.47s, TFLOPs: 7.11, Samples/sec: 16.89, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.88, Samples/sec: 18.73, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.56, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.59, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.78, Samples/sec: 18.47, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.70, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 
18.57, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.79, Samples/sec: 18.50, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.65, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.69, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.77, Samples/sec: 18.46, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.69, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.61, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.61, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.58, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.65, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.47s, TFLOPs: 7.22, Samples/sec: 17.16, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.91, Samples/sec: 18.78, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.58, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.59, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.63, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.58, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model 
Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.60, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.66, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.62, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.78, Samples/sec: 18.49, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.62, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.80, Samples/sec: 18.54, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.58, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.63, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.78, Samples/sec: 18.49, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.61, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.47s, TFLOPs: 7.13, Samples/sec: 16.94, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.90, Samples/sec: 18.77, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.67, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.57, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.58, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.89, Samples/sec: 
18.74, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.64, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.91, Samples/sec: 18.79, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.80, Samples/sec: 18.53, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.63, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.68, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.80, Samples/sec: 18.53, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.60, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.67, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.67, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.63, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.47s, TFLOPs: 7.23, Samples/sec: 17.17, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.42s, TFLOPs: 7.92, Samples/sec: 18.82, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.68, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.76, Samples/sec: 18.44, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.62, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model 
Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.64, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.61, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.55, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.54, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.55, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.68, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.63, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.58, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.64, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.77, Samples/sec: 18.46, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.89, Samples/sec: 18.75, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.47s, TFLOPs: 7.23, Samples/sec: 17.18, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.42s, TFLOPs: 7.94, Samples/sec: 18.85, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.63, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.56, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 
18.57, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.69, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.44s, TFLOPs: 7.74, Samples/sec: 18.38, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.61, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.58, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.66, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.55, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.65, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.79, Samples/sec: 18.51, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.80, Samples/sec: 18.52, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.79, Samples/sec: 18.51, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.61, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.47s, TFLOPs: 7.15, Samples/sec: 16.99, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.64, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.58, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.79, Samples/sec: 18.52, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model 
Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.59, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.58, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.57, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.61, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.80, Samples/sec: 18.52, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.60, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.57, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.65, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.61, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.59, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.61, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.58, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.47s, TFLOPs: 7.19, Samples/sec: 17.08, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.92, Samples/sec: 18.82, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.64, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.90, Samples/sec: 
18.76, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.62, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.59, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.59, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.69, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.62, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.79, Samples/sec: 18.51, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.64, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.62, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.61, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.62, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.44s, TFLOPs: 7.73, Samples/sec: 18.36, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.56, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.47s, TFLOPs: 7.20, Samples/sec: 17.11, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.89, Samples/sec: 18.75, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.69, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model 
Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.64, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.61, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.60, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.60, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.63, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.64, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.54, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.61, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.63, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.80, Samples/sec: 18.53, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.56, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.66, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.55, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.47s, TFLOPs: 7.18, Samples/sec: 17.06, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.89, Samples/sec: 18.74, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.79, Samples/sec: 
18.50, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.59, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.67, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.56, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.62, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.56, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.58, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.55, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.56, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.78, Samples/sec: 18.48, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.60, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.64, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.65, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.59, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.47s, TFLOPs: 7.20, Samples/sec: 17.11, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.91, Samples/sec: 18.78, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model 
Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.68, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.65, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.60, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.65, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.62, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.78, Samples/sec: 18.47, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.61, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.80, Samples/sec: 18.52, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.70, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.69, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.80, Samples/sec: 18.52, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.79, Samples/sec: 18.51, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.90, Samples/sec: 18.77, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.64, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 [2024-03-28 18:28:25,002] [INFO] [logging.py:96:log_dist] [Rank 0] step=380, skipped=4, lr=[8.004586331860176e-05, 4.002293165930088e-05, 8.004586331860176e-05], mom=[(0.9, 0.95), (0.9, 0.95), (0.9, 
0.95)] [2024-03-28 18:28:25,013] [INFO] [timer.py:260:stop] epoch=0/micro_step=6080/global_step=380, RunningAvgSamplesPerSec=18.53848940928051, CurrSamplesPerSec=18.53713059986594, MemAllocated=1.37GB, MaxMemAllocated=2.8GB Model Parameters: 0.146 B, Latency: 0.47s, TFLOPs: 7.20, Samples/sec: 17.10, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.42s, TFLOPs: 7.94, Samples/sec: 18.86, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.56, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.80, Samples/sec: 18.52, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.78, Samples/sec: 18.48, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.63, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.57, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.80, Samples/sec: 18.53, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.68, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.63, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.54, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.58, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.69, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.60, Time/seq 0.05s, Batch 
Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.63, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.65, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.47s, TFLOPs: 7.22, Samples/sec: 17.15, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.89, Samples/sec: 18.74, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.59, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.59, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.67, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.70, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.60, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.65, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.65, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.60, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.57, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.66, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.59, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, 
Latency: 0.43s, TFLOPs: 7.79, Samples/sec: 18.50, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.61, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.78, Samples/sec: 18.49, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.47s, TFLOPs: 7.12, Samples/sec: 16.90, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.70, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.65, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.75, Samples/sec: 18.41, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.44s, TFLOPs: 7.73, Samples/sec: 18.36, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.61, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.62, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.66, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.61, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.62, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.64, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.56, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.62, Time/seq 
0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.60, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.61, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.55, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.46s, TFLOPs: 7.26, Samples/sec: 17.24, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.90, Samples/sec: 18.77, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.56, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.76, Samples/sec: 18.44, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.68, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.65, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.70, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.79, Samples/sec: 18.51, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.66, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.79, Samples/sec: 18.49, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.56, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.56, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 
0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.67, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.59, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.80, Samples/sec: 18.54, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.68, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.47s, TFLOPs: 7.18, Samples/sec: 17.07, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.42s, TFLOPs: 7.93, Samples/sec: 18.83, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.60, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.59, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.62, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.65, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.65, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.64, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.62, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.69, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.60, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.78, Samples/sec: 18.49, 
Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.61, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.70, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.63, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.58, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.47s, TFLOPs: 7.12, Samples/sec: 16.92, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.89, Samples/sec: 18.74, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.64, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.88, Samples/sec: 18.72, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.88, Samples/sec: 18.71, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.61, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.61, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.63, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.57, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.76, Samples/sec: 18.42, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.68, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model 
Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.59, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.67, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.62, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.66, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.88, Samples/sec: 18.71, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.47s, TFLOPs: 7.23, Samples/sec: 17.16, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.90, Samples/sec: 18.77, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.88, Samples/sec: 18.73, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.58, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.68, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.65, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.64, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.55, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.61, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.64, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 
18.61, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.64, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.88, Samples/sec: 18.72, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.80, Samples/sec: 18.54, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.62, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.80, Samples/sec: 18.54, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.47s, TFLOPs: 7.19, Samples/sec: 17.09, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.70, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.79, Samples/sec: 18.50, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.60, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.67, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.68, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.65, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.58, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.58, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.57, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model 
Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.70, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.54, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.66, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.63, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.56, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.55, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.46s, TFLOPs: 7.24, Samples/sec: 17.21, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.42s, TFLOPs: 7.97, Samples/sec: 18.94, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.61, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.79, Samples/sec: 18.51, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.80, Samples/sec: 18.52, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.64, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.64, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.70, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.58, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 
18.59, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.58, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.67, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.63, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.76, Samples/sec: 18.44, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.57, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.56, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.48s, TFLOPs: 7.08, Samples/sec: 16.82, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.42s, TFLOPs: 7.93, Samples/sec: 18.85, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.54, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.61, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.69, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.80, Samples/sec: 18.52, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.89, Samples/sec: 18.74, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.88, Samples/sec: 18.72, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.62, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model 
Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.65, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.62, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.61, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.88, Samples/sec: 18.72, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.55, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.67, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.65, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 [2024-03-28 18:29:34,439] [INFO] [logging.py:96:log_dist] [Rank 0] step=390, skipped=4, lr=[6.250632618090867e-05, 3.1253163090454336e-05, 6.250632618090867e-05], mom=[(0.9, 0.95), (0.9, 0.95), (0.9, 0.95)] [2024-03-28 18:29:34,449] [INFO] [timer.py:260:stop] epoch=0/micro_step=6240/global_step=390, RunningAvgSamplesPerSec=18.538186334757402, CurrSamplesPerSec=18.561331150152675, MemAllocated=1.37GB, MaxMemAllocated=2.8GB Model Parameters: 0.146 B, Latency: 0.47s, TFLOPs: 7.22, Samples/sec: 17.15, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.92, Samples/sec: 18.80, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.62, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.57, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.56, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, 
Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.55, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.61, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.80, Samples/sec: 18.54, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.57, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.56, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.60, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.61, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.80, Samples/sec: 18.52, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.56, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.65, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.59, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.47s, TFLOPs: 7.17, Samples/sec: 17.03, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.92, Samples/sec: 18.81, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.65, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.80, Samples/sec: 18.53, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.69, Time/seq 
0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.57, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.62, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.57, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.64, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.58, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.67, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.68, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.79, Samples/sec: 18.51, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.64, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.66, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.58, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.47s, TFLOPs: 7.22, Samples/sec: 17.16, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.42s, TFLOPs: 7.94, Samples/sec: 18.87, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.89, Samples/sec: 18.74, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.61, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 
0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.54, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.78, Samples/sec: 18.48, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.59, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.91, Samples/sec: 18.78, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.61, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.56, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.62, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.65, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.77, Samples/sec: 18.46, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.64, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.60, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.79, Samples/sec: 18.50, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.46s, TFLOPs: 7.27, Samples/sec: 17.26, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.92, Samples/sec: 18.80, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.88, Samples/sec: 18.72, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.59, 
Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.68, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.80, Samples/sec: 18.52, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.63, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.64, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.63, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.90, Samples/sec: 18.77, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.65, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.63, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.58, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.69, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.62, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.62, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.47s, TFLOPs: 7.16, Samples/sec: 17.02, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.89, Samples/sec: 18.75, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.65, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model 
Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.67, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.55, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.78, Samples/sec: 18.48, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.88, Samples/sec: 18.72, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.66, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.66, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.68, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.67, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.60, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.80, Samples/sec: 18.52, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.69, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.68, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.79, Samples/sec: 18.51, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.47s, TFLOPs: 7.23, Samples/sec: 17.18, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.42s, TFLOPs: 7.93, Samples/sec: 18.83, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 
18.64, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.60, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.63, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.55, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.59, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.63, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.67, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.80, Samples/sec: 18.52, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.89, Samples/sec: 18.74, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.60, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.88, Samples/sec: 18.71, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.67, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.60, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.64, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.46s, TFLOPs: 7.25, Samples/sec: 17.23, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.69, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model 
Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.70, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.55, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.91, Samples/sec: 18.78, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.65, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.55, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.57, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.68, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.91, Samples/sec: 18.78, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.77, Samples/sec: 18.47, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.63, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.59, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.79, Samples/sec: 18.51, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.64, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.64, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.47s, TFLOPs: 7.20, Samples/sec: 17.10, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.89, Samples/sec: 
18.75, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.68, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.80, Samples/sec: 18.54, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.63, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.71, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.58, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.64, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.64, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.77, Samples/sec: 18.46, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.68, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.63, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.77, Samples/sec: 18.45, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.88, Samples/sec: 18.73, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.65, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.88, Samples/sec: 18.71, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.47s, TFLOPs: 7.17, Samples/sec: 17.04, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model 
Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.89, Samples/sec: 18.74, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.64, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.58, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.79, Samples/sec: 18.51, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.63, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.69, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.62, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.62, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.63, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.89, Samples/sec: 18.74, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.56, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.88, Samples/sec: 18.71, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.55, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.59, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.61, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.47s, TFLOPs: 7.21, Samples/sec: 
17.13, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.42s, TFLOPs: 7.93, Samples/sec: 18.85, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.80, Samples/sec: 18.53, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.61, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.65, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.64, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.80, Samples/sec: 18.54, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.88, Samples/sec: 18.71, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.58, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.56, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.58, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.65, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.75, Samples/sec: 18.41, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.55, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.64, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.58, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 
[2024-03-28 18:30:43,846] [INFO] [logging.py:96:log_dist] [Rank 0] step=400, skipped=4, lr=[4.700658650591827e-05, 2.3503293252959136e-05, 4.700658650591827e-05], mom=[(0.9, 0.95), (0.9, 0.95), (0.9, 0.95)]
[2024-03-28 18:30:43,855] [INFO] [timer.py:260:stop] epoch=0/micro_step=6400/global_step=400, RunningAvgSamplesPerSec=18.53816415143405, CurrSamplesPerSec=18.52424955299934, MemAllocated=1.37GB, MaxMemAllocated=2.8GB
Model Parameters: 0.146 B, Latency: 0.47s, TFLOPs: 7.24, Samples/sec: 17.20, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512
Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.90, Samples/sec: 18.77, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512
Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.63, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512
Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.61, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512
Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.65, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512
Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.66, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512
Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.89, Samples/sec: 18.73, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512
Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.89, Samples/sec: 18.74, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512
Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.55, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512
Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.69, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512
Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.68, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512
Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.55, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512
Model Parameters: 0.146 B,
Latency: 0.43s, TFLOPs: 7.88, Samples/sec: 18.72, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.68, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.64, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.66, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.47s, TFLOPs: 7.15, Samples/sec: 16.99, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.70, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.80, Samples/sec: 18.52, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.61, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.57, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.63, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.67, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.66, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.59, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.89, Samples/sec: 18.75, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.63, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.65, Time/seq 
0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.57, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.66, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.80, Samples/sec: 18.53, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.68, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.47s, TFLOPs: 7.24, Samples/sec: 17.19, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.92, Samples/sec: 18.81, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.69, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.60, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.63, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.80, Samples/sec: 18.54, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.61, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.91, Samples/sec: 18.80, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.64, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.79, Samples/sec: 18.50, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.69, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 
0.146 B, Latency: 0.43s, TFLOPs: 7.79, Samples/sec: 18.51, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.75, Samples/sec: 18.42, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.64, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.61, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.62, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.47s, TFLOPs: 7.20, Samples/sec: 17.11, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.91, Samples/sec: 18.80, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.55, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.56, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.88, Samples/sec: 18.71, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.60, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.58, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.66, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.62, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.66, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.70, 
Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.64, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.78, Samples/sec: 18.49, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.88, Samples/sec: 18.72, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.69, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.68, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.47s, TFLOPs: 7.19, Samples/sec: 17.08, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.88, Samples/sec: 18.73, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.79, Samples/sec: 18.50, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.54, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.88, Samples/sec: 18.73, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.64, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.57, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.88, Samples/sec: 18.71, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.76, Samples/sec: 18.44, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.68, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model 
Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.60, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512
Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.60, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512
Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.71, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512
Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.63, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512
Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.65, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512
Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.59, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512
[2024-03-28 18:31:18,524] [INFO] [fused_optimizer.py:352:_update_scale] No Grad overflow for 100 iterations
[2024-03-28 18:31:18,525] [INFO] [fused_optimizer.py:353:_update_scale] Increasing dynamic loss scale from 32768.0 to 65536.0
Model Parameters: 0.146 B, Latency: 0.47s, TFLOPs: 7.22, Samples/sec: 17.15, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512
Model Parameters: 0.146 B, Latency: 0.42s, TFLOPs: 7.95, Samples/sec: 18.89, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512
Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.62, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512
Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.76, Samples/sec: 18.44, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512
Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.90, Samples/sec: 18.76, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512
Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.62, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512
Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.80, Samples/sec: 18.52, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512
Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.62, Time/seq 0.05s,
Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.66, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.59, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.88, Samples/sec: 18.71, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.57, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.61, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.64, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.64, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.62, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.47s, TFLOPs: 7.14, Samples/sec: 16.97, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.55, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.78, Samples/sec: 18.47, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.80, Samples/sec: 18.54, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.63, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.69, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.80, Samples/sec: 18.53, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, 
Latency: 0.43s, TFLOPs: 7.88, Samples/sec: 18.71, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.60, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.66, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.67, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.65, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.60, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.66, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.79, Samples/sec: 18.52, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.67, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.47s, TFLOPs: 7.21, Samples/sec: 17.14, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.90, Samples/sec: 18.77, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.57, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.60, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.64, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.68, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.79, Samples/sec: 18.51, Time/seq 
0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.61, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.69, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.55, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.62, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.68, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.80, Samples/sec: 18.53, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.80, Samples/sec: 18.54, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.91, Samples/sec: 18.78, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.61, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.47s, TFLOPs: 7.23, Samples/sec: 17.18, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.91, Samples/sec: 18.80, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.80, Samples/sec: 18.53, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.78, Samples/sec: 18.47, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.69, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.67, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 
0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.57, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.58, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.55, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.61, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.57, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.55, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.59, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.64, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.66, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.54, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.47s, TFLOPs: 7.21, Samples/sec: 17.13, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.90, Samples/sec: 18.76, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.69, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.79, Samples/sec: 18.51, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.66, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.89, Samples/sec: 18.74, 
Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.76, Samples/sec: 18.43, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.80, Samples/sec: 18.53, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.54, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.65, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.64, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.56, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.60, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.62, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.79, Samples/sec: 18.51, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.57, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 [2024-03-28 18:31:53,257] [INFO] [logging.py:96:log_dist] [Rank 0] step=410, skipped=4, lr=[3.361891123496824e-05, 1.680945561748412e-05, 3.361891123496824e-05], mom=[(0.9, 0.95), (0.9, 0.95), (0.9, 0.95)] [2024-03-28 18:31:53,267] [INFO] [timer.py:260:stop] epoch=0/micro_step=6560/global_step=410, RunningAvgSamplesPerSec=18.538064702182602, CurrSamplesPerSec=18.515605717972974, MemAllocated=1.37GB, MaxMemAllocated=2.8GB Model Parameters: 0.146 B, Latency: 0.47s, TFLOPs: 7.22, Samples/sec: 17.15, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.42s, TFLOPs: 7.93, Samples/sec: 18.83, Time/seq 0.05s, Batch 
Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.80, Samples/sec: 18.52, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.55, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.65, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.66, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.64, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.61, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.64, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.63, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.56, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.66, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.70, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.58, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.65, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.77, Samples/sec: 18.45, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.47s, TFLOPs: 7.20, Samples/sec: 17.11, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, 
Latency: 0.43s, TFLOPs: 7.91, Samples/sec: 18.79, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.88, Samples/sec: 18.71, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.76, Samples/sec: 18.44, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.58, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.60, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.67, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.58, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.63, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.62, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.56, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.61, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.55, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.63, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.69, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.70, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.47s, TFLOPs: 7.15, Samples/sec: 16.99, Time/seq 
0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.90, Samples/sec: 18.77, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.89, Samples/sec: 18.75, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.80, Samples/sec: 18.52, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.59, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.64, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.66, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.59, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.54, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.65, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.80, Samples/sec: 18.54, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.80, Samples/sec: 18.54, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.58, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.79, Samples/sec: 18.51, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.59, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.63, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 
0.146 B, Latency: 0.47s, TFLOPs: 7.13, Samples/sec: 16.95, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.88, Samples/sec: 18.73, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.78, Samples/sec: 18.48, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.65, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.89, Samples/sec: 18.75, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.62, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.79, Samples/sec: 18.50, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.64, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.64, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.57, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.62, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.60, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.58, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.65, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.56, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.80, Samples/sec: 18.53, 
Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.47s, TFLOPs: 7.21, Samples/sec: 17.12, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.88, Samples/sec: 18.72, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.66, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.57, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.54, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.77, Samples/sec: 18.45, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.79, Samples/sec: 18.49, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.69, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.55, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.68, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.65, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.78, Samples/sec: 18.49, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.61, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.57, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.88, Samples/sec: 18.73, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model 
Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.68, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.46s, TFLOPs: 7.24, Samples/sec: 17.21, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.42s, TFLOPs: 7.94, Samples/sec: 18.85, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.57, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.65, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.66, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.59, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.58, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.70, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.66, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.64, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.57, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.65, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.79, Samples/sec: 18.51, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.62, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 
18.54, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.61, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.47s, TFLOPs: 7.22, Samples/sec: 17.16, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.91, Samples/sec: 18.79, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.55, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.64, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.65, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.59, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.70, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.44s, TFLOPs: 7.70, Samples/sec: 18.28, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.64, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.80, Samples/sec: 18.53, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.63, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.66, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.57, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.66, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model 
Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.64, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.79, Samples/sec: 18.51, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.47s, TFLOPs: 7.23, Samples/sec: 17.17, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.42s, TFLOPs: 7.93, Samples/sec: 18.84, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.58, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.79, Samples/sec: 18.50, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.63, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.65, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.54, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.64, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.89, Samples/sec: 18.74, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.63, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.61, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.61, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.64, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.76, Samples/sec: 
18.44, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.56, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.80, Samples/sec: 18.53, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.47s, TFLOPs: 7.13, Samples/sec: 16.93, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.42s, TFLOPs: 7.93, Samples/sec: 18.83, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.55, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.80, Samples/sec: 18.54, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.69, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.58, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.57, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.55, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.60, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.78, Samples/sec: 18.47, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.67, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.78, Samples/sec: 18.49, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.57, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model 
Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.88, Samples/sec: 18.72, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.57, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.62, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.47s, TFLOPs: 7.19, Samples/sec: 17.07, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.88, Samples/sec: 18.73, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.56, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.77, Samples/sec: 18.45, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.60, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.55, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.57, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.66, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.64, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.63, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.69, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.57, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 
18.62, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.67, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.65, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.61, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 [2024-03-28 18:33:02,725] [INFO] [logging.py:96:log_dist] [Rank 0] step=420, skipped=4, lr=[2.240571989017598e-05, 1.120285994508799e-05, 2.240571989017598e-05], mom=[(0.9, 0.95), (0.9, 0.95), (0.9, 0.95)] [2024-03-28 18:33:02,735] [INFO] [timer.py:260:stop] epoch=0/micro_step=6720/global_step=420, RunningAvgSamplesPerSec=18.537642294087007, CurrSamplesPerSec=18.521468334887395, MemAllocated=1.37GB, MaxMemAllocated=2.8GB Model Parameters: 0.146 B, Latency: 0.47s, TFLOPs: 7.18, Samples/sec: 17.06, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.92, Samples/sec: 18.82, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.64, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.60, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.79, Samples/sec: 18.51, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.60, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.79, Samples/sec: 18.49, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.62, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.76, Samples/sec: 18.44, Time/seq 0.05s, 
Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.68, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.64, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.65, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.77, Samples/sec: 18.46, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.77, Samples/sec: 18.47, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.80, Samples/sec: 18.54, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.66, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.47s, TFLOPs: 7.18, Samples/sec: 17.05, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.90, Samples/sec: 18.76, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.79, Samples/sec: 18.50, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.76, Samples/sec: 18.42, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.58, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.63, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.79, Samples/sec: 18.50, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.67, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, 
Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.65, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.56, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.59, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.79, Samples/sec: 18.50, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.71, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.64, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.57, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.78, Samples/sec: 18.48, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.47s, TFLOPs: 7.18, Samples/sec: 17.07, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.89, Samples/sec: 18.75, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.63, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.62, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.68, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.56, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.62, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.54, Time/seq 
0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.89, Samples/sec: 18.73, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.79, Samples/sec: 18.50, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.76, Samples/sec: 18.44, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.57, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.54, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.70, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.59, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.78, Samples/sec: 18.48, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.46s, TFLOPs: 7.26, Samples/sec: 17.24, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.92, Samples/sec: 18.82, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.69, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.77, Samples/sec: 18.47, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.67, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.59, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.80, Samples/sec: 18.53, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 
0.146 B, Latency: 0.43s, TFLOPs: 7.80, Samples/sec: 18.52, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.65, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.67, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.80, Samples/sec: 18.52, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.61, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.62, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.69, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.55, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.69, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.47s, TFLOPs: 7.20, Samples/sec: 17.11, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.90, Samples/sec: 18.76, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.62, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.54, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.67, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.69, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.65, 
Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.62, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.80, Samples/sec: 18.52, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.71, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.54, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.66, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.80, Samples/sec: 18.54, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.78, Samples/sec: 18.49, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.67, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.62, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.47s, TFLOPs: 7.18, Samples/sec: 17.04, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.89, Samples/sec: 18.74, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.56, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.61, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.60, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.78, Samples/sec: 18.48, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model 
Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.65, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.58, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.79, Samples/sec: 18.50, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.58, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.80, Samples/sec: 18.54, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.69, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.67, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.61, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.61, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.65, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.47s, TFLOPs: 7.20, Samples/sec: 17.09, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.42s, TFLOPs: 7.93, Samples/sec: 18.83, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.59, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.63, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.63, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.79, Samples/sec: 
18.51, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.67, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.60, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.54, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.78, Samples/sec: 18.49, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.44s, TFLOPs: 7.74, Samples/sec: 18.39, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.61, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.79, Samples/sec: 18.50, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.65, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.60, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.58, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.47s, TFLOPs: 7.21, Samples/sec: 17.13, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.89, Samples/sec: 18.73, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.79, Samples/sec: 18.50, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.61, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.61, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model 
Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.76, Samples/sec: 18.44, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.60, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.80, Samples/sec: 18.54, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.62, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.67, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.63, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.61, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.79, Samples/sec: 18.52, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.56, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.56, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.65, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.47s, TFLOPs: 7.19, Samples/sec: 17.08, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.91, Samples/sec: 18.80, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.56, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.60, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 
18.64, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.80, Samples/sec: 18.54, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.77, Samples/sec: 18.46, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.76, Samples/sec: 18.42, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.65, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.56, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.78, Samples/sec: 18.47, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.89, Samples/sec: 18.75, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.60, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.57, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.57, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.60, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.47s, TFLOPs: 7.23, Samples/sec: 17.18, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.91, Samples/sec: 18.80, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.60, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.66, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model 
Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.58, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.62, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.61, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.89, Samples/sec: 18.75, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.58, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.56, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.64, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.57, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.64, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.56, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.80, Samples/sec: 18.54, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.63, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 [2024-03-28 18:34:12,227] [INFO] [logging.py:96:log_dist] [Rank 0] step=430, skipped=4, lr=[1.3419293545812339e-05, 6.709646772906169e-06, 1.3419293545812339e-05], mom=[(0.9, 0.95), (0.9, 0.95), (0.9, 0.95)] [2024-03-28 18:34:12,235] [INFO] [timer.py:260:stop] epoch=0/micro_step=6880/global_step=430, RunningAvgSamplesPerSec=18.537000532675403, CurrSamplesPerSec=18.5183238549837, MemAllocated=1.37GB, MaxMemAllocated=2.8GB Model Parameters: 0.146 B, 
Latency: 0.47s, TFLOPs: 7.12, Samples/sec: 16.91, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.59, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.63, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.76, Samples/sec: 18.44, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.80, Samples/sec: 18.54, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.80, Samples/sec: 18.52, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.79, Samples/sec: 18.51, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.56, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.80, Samples/sec: 18.53, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.62, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.61, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.78, Samples/sec: 18.48, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.60, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.88, Samples/sec: 18.72, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.79, Samples/sec: 18.51, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.60, Time/seq 
0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.47s, TFLOPs: 7.24, Samples/sec: 17.19, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.88, Samples/sec: 18.72, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.62, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.63, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.55, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.56, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.65, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.56, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.57, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.66, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.54, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.56, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.57, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.61, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.68, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 
0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.56, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.47s, TFLOPs: 7.22, Samples/sec: 17.15, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.42s, TFLOPs: 7.96, Samples/sec: 18.90, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.60, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.77, Samples/sec: 18.45, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.79, Samples/sec: 18.51, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.60, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.59, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.55, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.62, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.79, Samples/sec: 18.51, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.62, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.57, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.60, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.66, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.66, 
Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.78, Samples/sec: 18.47, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.47s, TFLOPs: 7.22, Samples/sec: 17.15, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.91, Samples/sec: 18.78, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.63, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.77, Samples/sec: 18.46, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.58, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.58, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.67, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.58, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.44s, TFLOPs: 7.73, Samples/sec: 18.37, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.60, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.62, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.60, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.55, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.58, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model 
Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.59, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.57, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.47s, TFLOPs: 7.23, Samples/sec: 17.17, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.70, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.61, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.57, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.57, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.70, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.77, Samples/sec: 18.45, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.60, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.62, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.61, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.65, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.56, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.62, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.77, Samples/sec: 
18.46, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.60, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.88, Samples/sec: 18.72, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.48s, TFLOPs: 7.06, Samples/sec: 16.78, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.42s, TFLOPs: 7.93, Samples/sec: 18.83, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.54, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.66, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.80, Samples/sec: 18.54, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.79, Samples/sec: 18.51, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.80, Samples/sec: 18.54, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.55, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.79, Samples/sec: 18.52, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.69, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.80, Samples/sec: 18.54, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.79, Samples/sec: 18.50, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.64, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model 
Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.67, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.80, Samples/sec: 18.52, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.60, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.47s, TFLOPs: 7.20, Samples/sec: 17.11, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.67, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.60, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.80, Samples/sec: 18.53, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.56, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.64, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.70, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.78, Samples/sec: 18.49, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.78, Samples/sec: 18.47, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.56, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.59, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.58, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 
18.57, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.64, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.75, Samples/sec: 18.41, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.57, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.47s, TFLOPs: 7.20, Samples/sec: 17.11, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.68, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.77, Samples/sec: 18.45, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.77, Samples/sec: 18.47, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.63, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.79, Samples/sec: 18.50, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.59, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.60, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.80, Samples/sec: 18.54, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.58, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.56, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.80, Samples/sec: 18.52, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model 
Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.56, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.62, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.79, Samples/sec: 18.51, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.65, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.47s, TFLOPs: 7.21, Samples/sec: 17.14, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.89, Samples/sec: 18.75, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.58, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.66, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.80, Samples/sec: 18.53, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.61, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.60, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.79, Samples/sec: 18.50, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.80, Samples/sec: 18.52, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.62, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.70, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.79, Samples/sec: 
18.49, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.76, Samples/sec: 18.42, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.63, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.78, Samples/sec: 18.47, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.80, Samples/sec: 18.53, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.46s, TFLOPs: 7.25, Samples/sec: 17.22, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.70, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.65, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.44s, TFLOPs: 7.74, Samples/sec: 18.38, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.79, Samples/sec: 18.50, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.58, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.57, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.80, Samples/sec: 18.53, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.61, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.79, Samples/sec: 18.50, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.60, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model 
Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.67, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.59, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.80, Samples/sec: 18.52, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.66, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.80, Samples/sec: 18.54, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 [2024-03-28 18:35:21,793] [INFO] [logging.py:96:log_dist] [Rank 0] step=440, skipped=4, lr=[6.7015310697990384e-06, 3.3507655348995192e-06, 6.7015310697990384e-06], mom=[(0.9, 0.95), (0.9, 0.95), (0.9, 0.95)] [2024-03-28 18:35:21,803] [INFO] [timer.py:260:stop] epoch=0/micro_step=7040/global_step=440, RunningAvgSamplesPerSec=18.536085201376515, CurrSamplesPerSec=18.49774884773867, MemAllocated=1.37GB, MaxMemAllocated=2.8GB Model Parameters: 0.146 B, Latency: 0.47s, TFLOPs: 7.22, Samples/sec: 17.15, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.88, Samples/sec: 18.73, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.58, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.58, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.69, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.59, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.74, Samples/sec: 18.40, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, 
Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.57, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.79, Samples/sec: 18.49, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.61, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.55, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.57, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.79, Samples/sec: 18.50, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.59, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.59, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.79, Samples/sec: 18.51, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.47s, TFLOPs: 7.22, Samples/sec: 17.14, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.90, Samples/sec: 18.78, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.57, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.60, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.58, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.55, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.63, Time/seq 
0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.59, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.60, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.59, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.77, Samples/sec: 18.46, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.78, Samples/sec: 18.49, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.65, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.59, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.56, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.63, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.47s, TFLOPs: 7.21, Samples/sec: 17.14, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.55, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.79, Samples/sec: 18.51, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.59, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.88, Samples/sec: 18.72, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.76, Samples/sec: 18.43, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 
0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.62, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.68, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.79, Samples/sec: 18.51, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.59, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.80, Samples/sec: 18.53, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.62, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.59, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.55, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.71, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.60, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.47s, TFLOPs: 7.20, Samples/sec: 17.10, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.67, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.80, Samples/sec: 18.53, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.76, Samples/sec: 18.44, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.59, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.66, 
Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.57, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.63, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.67, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.57, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.63, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.62, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.55, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.65, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.64, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.78, Samples/sec: 18.47, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.47s, TFLOPs: 7.13, Samples/sec: 16.95, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.68, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.59, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.80, Samples/sec: 18.53, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.60, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model 
Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.64, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.57, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.79, Samples/sec: 18.49, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.58, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.65, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.62, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.59, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.55, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.64, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.62, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.78, Samples/sec: 18.47, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.46s, TFLOPs: 7.26, Samples/sec: 17.25, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.90, Samples/sec: 18.77, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.63, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.64, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 
18.55, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.55, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.55, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.62, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.59, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.77, Samples/sec: 18.45, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.62, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.60, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.60, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.65, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.59, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.88, Samples/sec: 18.71, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.47s, TFLOPs: 7.21, Samples/sec: 17.13, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.42s, TFLOPs: 7.93, Samples/sec: 18.84, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.59, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.78, Samples/sec: 18.49, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model 
Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.62, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.77, Samples/sec: 18.45, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.65, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.62, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.79, Samples/sec: 18.50, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.61, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.63, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.65, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.79, Samples/sec: 18.50, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.63, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.54, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.67, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.47s, TFLOPs: 7.21, Samples/sec: 17.13, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.62, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.79, Samples/sec: 18.49, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 
18.68, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.55, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.55, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.59, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.60, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.80, Samples/sec: 18.52, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.58, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.80, Samples/sec: 18.53, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.55, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.62, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.63, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.56, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.80, Samples/sec: 18.54, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.47s, TFLOPs: 7.10, Samples/sec: 16.88, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.56, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.79, Samples/sec: 18.50, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model 
Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.67, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.61, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.60, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.64, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.63, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.78, Samples/sec: 18.48, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.57, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.65, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.56, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.57, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.79, Samples/sec: 18.49, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.65, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.58, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.47s, TFLOPs: 7.14, Samples/sec: 16.95, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.90, Samples/sec: 18.77, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.80, Samples/sec: 
18.53, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.75, Samples/sec: 18.40, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.63, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.78, Samples/sec: 18.49, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.62, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.60, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.79, Samples/sec: 18.50, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.58, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.67, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.79, Samples/sec: 18.51, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.56, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.62, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.60, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.65, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 [2024-03-28 18:36:31,355] [INFO] [logging.py:96:log_dist] [Rank 0] step=450, skipped=4, lr=[2.283753771845587e-06, 1.1418768859227934e-06, 2.283753771845587e-06], mom=[(0.9, 0.95), (0.9, 0.95), (0.9, 0.95)] [2024-03-28 18:36:31,364] [INFO] [timer.py:260:stop] 
epoch=0/micro_step=7200/global_step=450, RunningAvgSamplesPerSec=18.535276849414952, CurrSamplesPerSec=18.503277470213675, MemAllocated=1.37GB, MaxMemAllocated=2.8GB Model Parameters: 0.146 B, Latency: 0.47s, TFLOPs: 7.24, Samples/sec: 17.19, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.88, Samples/sec: 18.72, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.60, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.64, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.58, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.78, Samples/sec: 18.49, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.80, Samples/sec: 18.52, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.61, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.88, Samples/sec: 18.71, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.62, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.60, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.55, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.77, Samples/sec: 18.46, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.65, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, 
Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.65, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.70, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.47s, TFLOPs: 7.19, Samples/sec: 17.08, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.65, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.65, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.58, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.62, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.58, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.59, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.55, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.56, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.60, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.65, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.63, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.80, Samples/sec: 18.52, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.69, Time/seq 
0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.57, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.80, Samples/sec: 18.54, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.47s, TFLOPs: 7.13, Samples/sec: 16.94, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.92, Samples/sec: 18.82, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.57, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.60, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.60, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.63, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.62, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.78, Samples/sec: 18.47, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.76, Samples/sec: 18.44, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.79, Samples/sec: 18.52, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.44s, TFLOPs: 7.73, Samples/sec: 18.37, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.60, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.55, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 
0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.65, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.80, Samples/sec: 18.54, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.65, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.47s, TFLOPs: 7.20, Samples/sec: 17.11, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.42s, TFLOPs: 7.92, Samples/sec: 18.82, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.60, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.61, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.64, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.54, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.66, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.57, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.57, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.71, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.61, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.78, Samples/sec: 18.49, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.67, 
Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.66, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.61, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.79, Samples/sec: 18.51, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.47s, TFLOPs: 7.20, Samples/sec: 17.11, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.88, Samples/sec: 18.72, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.80, Samples/sec: 18.53, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.78, Samples/sec: 18.49, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.55, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.61, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.79, Samples/sec: 18.51, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.66, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.61, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.65, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.58, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.59, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model 
Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.78, Samples/sec: 18.49, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.80, Samples/sec: 18.54, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.56, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.66, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.47s, TFLOPs: 7.24, Samples/sec: 17.20, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.42s, TFLOPs: 7.93, Samples/sec: 18.84, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.88, Samples/sec: 18.73, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.65, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.70, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.63, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.80, Samples/sec: 18.54, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.64, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.63, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.60, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.69, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.76, Samples/sec: 
18.43, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.56, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.60, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.80, Samples/sec: 18.52, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.88, Samples/sec: 18.73, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.47s, TFLOPs: 7.16, Samples/sec: 17.01, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.56, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.67, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.58, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.60, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.70, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.79, Samples/sec: 18.50, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.67, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.70, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.55, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.80, Samples/sec: 18.52, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model 
Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.62, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.65, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.79, Samples/sec: 18.51, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.64, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.65, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.47s, TFLOPs: 7.22, Samples/sec: 17.14, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.42s, TFLOPs: 7.93, Samples/sec: 18.83, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.56, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.80, Samples/sec: 18.54, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.57, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.64, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.80, Samples/sec: 18.54, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.54, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.61, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.79, Samples/sec: 18.50, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.79, Samples/sec: 
18.50, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.55, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.56, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.69, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.80, Samples/sec: 18.54, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.79, Samples/sec: 18.51, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.47s, TFLOPs: 7.21, Samples/sec: 17.12, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.42s, TFLOPs: 7.94, Samples/sec: 18.87, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.79, Samples/sec: 18.51, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.80, Samples/sec: 18.52, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.78, Samples/sec: 18.48, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.87, Samples/sec: 18.69, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.55, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.77, Samples/sec: 18.47, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.85, Samples/sec: 18.66, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.66, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model 
Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.59, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.63, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.58, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.80, Samples/sec: 18.54, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.89, Samples/sec: 18.73, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.55, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.47s, TFLOPs: 7.20, Samples/sec: 17.10, Time/seq 0.06s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.90, Samples/sec: 18.76, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.67, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.55, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.86, Samples/sec: 18.68, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.81, Samples/sec: 18.55, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.57, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.58, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.58, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512 Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.76, Samples/sec: 
18.44, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512
Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.82, Samples/sec: 18.59, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512
Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.60, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512
Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.80, Samples/sec: 18.53, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512
Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.62, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512
Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.83, Samples/sec: 18.60, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512
Model Parameters: 0.146 B, Latency: 0.43s, TFLOPs: 7.84, Samples/sec: 18.61, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512
[2024-03-28 18:37:40,819] [INFO] [logging.py:96:log_dist] [Rank 0] step=460, skipped=4, lr=[1.8655936904465876e-07, 9.327968452232938e-08, 1.8655936904465876e-07], mom=[(0.9, 0.95), (0.9, 0.95), (0.9, 0.95)]
[2024-03-28 18:37:40,829] [INFO] [timer.py:260:stop] epoch=0/micro_step=7360/global_step=460, RunningAvgSamplesPerSec=18.535013636837004, CurrSamplesPerSec=18.634074116958562, MemAllocated=1.32GB, MaxMemAllocated=2.8GB
Model Parameters: 0.146 B, Latency: 0.42s, TFLOPs: 7.98, Samples/sec: 18.97, Time/seq 0.05s, Batch Size: 8, Sequence Length: 512
***** Evaluating perplexity, Epoch 1/1 *****
ppl: 2.4226245880126953, loss: 0.8848514556884766
saving the final model ...
[2024-03-28 18:39:12,621] [INFO] [launch.py:348:main] Process 3222 exits successfully.