| 13:4: not a valid test operator: ( |
| 13:4: not a valid test operator: 535.86.10 |
| 2026-04-28 05:14:39.684201: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0 |
| WARNING:tensorflow:Deprecation warnings have been disabled. Set TF_ENABLE_DEPRECATION_WARNINGS=1 to re-enable them. |
| Traceback (most recent call last): |
| File "/workspace/finetune/main_chars_lstm.py", line 26, in <module> |
| import tensorflow as tf |
| File "/usr/local/lib/python3.8/dist-packages/tensorflow/__init__.py", line 101, in <module> |
| from tensorflow_core import * |
| File "/usr/local/lib/python3.8/dist-packages/tensorflow_core/__init__.py", line 28, in <module> |
| from tensorflow.python import pywrap_tensorflow # pylint: disable=unused-import |
| File "/usr/local/lib/python3.8/dist-packages/tensorflow/__init__.py", line 50, in __getattr__ |
| module = self._load() |
| File "/usr/local/lib/python3.8/dist-packages/tensorflow/__init__.py", line 44, in _load |
| module = _importlib.import_module(self.__name__) |
| File "/usr/lib/python3.8/importlib/__init__.py", line 127, in import_module |
| return _bootstrap._gcd_import(name[level:], package, level) |
| File "/usr/local/lib/python3.8/dist-packages/tensorflow_core/python/__init__.py", line 63, in <module> |
| from tensorflow.python.framework.framework_lib import * # pylint: disable=redefined-builtin |
| File "/usr/local/lib/python3.8/dist-packages/tensorflow_core/python/framework/framework_lib.py", line 52, in <module> |
| from tensorflow.python.framework.importer import import_graph_def |
| File "/usr/local/lib/python3.8/dist-packages/tensorflow_core/python/framework/importer.py", line 28, in <module> |
| from tensorflow.python.framework import function |
| File "/usr/local/lib/python3.8/dist-packages/tensorflow_core/python/framework/function.py", line 38, in <module> |
| from tensorflow.python.ops import variable_scope as vs |
| File "/usr/local/lib/python3.8/dist-packages/tensorflow_core/python/ops/variable_scope.py", line 40, in <module> |
| from tensorflow.python.ops import init_ops |
| File "/usr/local/lib/python3.8/dist-packages/tensorflow_core/python/ops/init_ops.py", line 45, in <module> |
| from tensorflow.python.ops import linalg_ops_impl |
| File "<frozen importlib._bootstrap>", line 991, in _find_and_load |
| File "<frozen importlib._bootstrap>", line 975, in _find_and_load_unlocked |
| File "<frozen importlib._bootstrap>", line 671, in _load_unlocked |
| File "<frozen importlib._bootstrap_external>", line 844, in exec_module |
| File "<frozen importlib._bootstrap_external>", line 939, in get_code |
| File "<frozen importlib._bootstrap_external>", line 1038, in get_data |
| KeyboardInterrupt |
| 13:4: not a valid test operator: ( |
| 13:4: not a valid test operator: 535.86.10 |
| 2026-04-28 05:14:49.627917: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0 |
| WARNING:tensorflow:Deprecation warnings have been disabled. Set TF_ENABLE_DEPRECATION_WARNINGS=1 to re-enable them. |
| WARNING:tensorflow:From /workspace/finetune/main_chars_lstm.py:36: The name tf.logging.set_verbosity is deprecated. Please use tf.compat.v1.logging.set_verbosity instead. |
|
|
| [train] params: {"batch_size": 128, "buffer": 2000, "char_lstm_size": 25, "chars": "/workspace/finetune/data_kvkk_set3_v3/vocab.chars.txt", "dim": 50, "dim_chars": 100, "dropout": 0.5, "early_stop_max_steps": 600, "epochs": 20, "learning_rate": 0.001, "log_step_count_steps": 200, "lstm_size": 100, "min_steps": 600000, "num_oov_buckets": 1, "save_checkpoints_secs": 500, "save_summary_steps": 1000, "tags": "/workspace/finetune/data_kvkk_set3_v3/vocab.tags.txt", "trainable_embeddings": true, "vectors": "/workspace/finetune/data_kvkk_set3_v3/vectors.npz", "words": "/workspace/finetune/data_kvkk_set3_v3/vocab.words.txt"} |
| Using config: {'_model_dir': '/workspace/finetune/results/model', '_tf_random_seed': None, '_save_summary_steps': 1000, '_save_checkpoints_steps': None, '_save_checkpoints_secs': 500, '_session_config': allow_soft_placement: true |
| graph_options { |
| rewrite_options { |
| meta_optimizer_iterations: ONE |
| } |
| } |
| , '_keep_checkpoint_max': 5, '_keep_checkpoint_every_n_hours': 10000, '_log_step_count_steps': 200, '_train_distribute': None, '_device_fn': None, '_protocol': None, '_eval_distribute': None, '_experimental_distribute': None, '_experimental_max_worker_delay_secs': None, '_session_creation_timeout_secs': 7200, '_service': None, '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x7f22cadd3190>, '_task_type': 'worker', '_task_id': 0, '_global_id_in_cluster': 0, '_master': '', '_evaluation_master': '', '_is_chief': True, '_num_ps_replicas': 0, '_num_worker_replicas': 1} |
| Not using Distribute Coordinator. |
| Running training and evaluation locally (non-distributed). |
| Start train and evaluate loop. The evaluate will happen after every checkpoint. Checkpoint frequency is determined based on RunConfig arguments: save_checkpoints_steps None or save_checkpoints_secs 500. |
| Calling model_fn. |
|
|
| The TensorFlow contrib module will not be included in TensorFlow 2.0. |
| For more information, please see: |
| * https://github.com/tensorflow/community/blob/master/rfcs/20180907-contrib-sunset.md |
| * https://github.com/tensorflow/addons |
| * https://github.com/tensorflow/io (for I/O related ops) |
| If you depend on functionality not listed there, please file an issue. |
|
|
| TensorFlow will not use sklearn by default. This improves performance in some cases. To enable sklearn export the environment variable TF_ALLOW_IOLIBS=1. |
| TensorFlow will not use Dask by default. This improves performance in some cases. To enable Dask export the environment variable TF_ALLOW_IOLIBS=1. |
| TensorFlow will not use Pandas by default. This improves performance in some cases. To enable Pandas export the environment variable TF_ALLOW_IOLIBS=1. |
| From /workspace/finetune/main_chars_lstm.py:104: The name tf.get_variable is deprecated. Please use tf.compat.v1.get_variable instead. |
|
|
| From /workspace/finetune/main_chars_lstm.py:169: The name tf.metrics.accuracy is deprecated. Please use tf.compat.v1.metrics.accuracy instead. |
|
|
| From /py_packages/tf_metrics/__init__.py:152: The name tf.diag_part is deprecated. Please use tf.linalg.tensor_diag_part instead. |
|
|
| From /workspace/finetune/main_chars_lstm.py:175: The name tf.summary.scalar is deprecated. Please use tf.compat.v1.summary.scalar instead. |
|
|
| From /workspace/finetune/main_chars_lstm.py:180: The name tf.train.AdamOptimizer is deprecated. Please use tf.compat.v1.train.AdamOptimizer instead. |
|
|
| From /workspace/finetune/main_chars_lstm.py:181: The name tf.train.get_or_create_global_step is deprecated. Please use tf.compat.v1.train.get_or_create_global_step instead. |
|
|
| Done calling model_fn. |
| Create CheckpointSaverHook. |
| Graph was finalized. |
| 2026-04-28 05:14:53.664462: I tensorflow/core/platform/profile_utils/cpu_utils.cc:109] CPU Frequency: 2000000000 Hz |
| 2026-04-28 05:14:53.695769: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x695d390 initialized for platform Host (this does not guarantee that XLA will be used). Devices: |
| 2026-04-28 05:14:53.695810: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): Host, Default Version |
| 2026-04-28 05:14:53.700326: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcuda.so.1 |
| 2026-04-28 05:14:53.861749: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x6640640 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices: |
| 2026-04-28 05:14:53.861787: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): NVIDIA H100, Compute Capability 9.0 |
| 2026-04-28 05:14:53.862462: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1669] Found device 0 with properties: |
| name: NVIDIA H100 major: 9 minor: 0 memoryClockRate(GHz): 1.98 |
| pciBusID: 0000:ad:00.0 |
| 2026-04-28 05:14:53.862490: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0 |
| 2026-04-28 05:14:54.169364: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublas.so.11 |
| 2026-04-28 05:14:54.198790: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcufft.so.10 |
| 2026-04-28 05:14:54.206171: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcurand.so.10 |
| 2026-04-28 05:14:54.213877: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcusolver.so.11 |
| 2026-04-28 05:14:54.225162: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcusparse.so.11 |
| 2026-04-28 05:14:54.226485: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudnn.so.8 |
| 2026-04-28 05:14:54.226851: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1797] Adding visible gpu devices: 0 |
| 2026-04-28 05:14:54.228166: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0 |
| 2026-04-28 05:14:54.233928: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1209] Device interconnect StreamExecutor with strength 1 edge matrix: |
| 2026-04-28 05:14:54.233952: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1215] 0 |
| 2026-04-28 05:14:54.233961: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1228] 0: N |
| 2026-04-28 05:14:54.234404: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1354] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 60325 MB memory) -> physical GPU (device: 0, name: NVIDIA H100, pci bus id: 0000:ad:00.0, compute capability: 9.0) |
| Running local_init_op. |
| Done running local_init_op. |
| Saving checkpoints for 0 into /workspace/finetune/results/model/model.ckpt. |
| 2026-04-28 05:14:57.835187: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublas.so.11 |
| loss = 111.06471, step = 0 |
| global_step/sec: 15.9321 |
| loss = 5.9110317, step = 200 (12.553 sec) |
| global_step/sec: 17.0678 |
| loss = 2.240281, step = 400 (11.721 sec) |
| 13:4: not a valid test operator: ( |
| 13:4: not a valid test operator: 535.86.10 |
| 2026-04-28 05:15:33.664526: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0 |
| WARNING:tensorflow:Deprecation warnings have been disabled. Set TF_ENABLE_DEPRECATION_WARNINGS=1 to re-enable them. |
| WARNING:tensorflow:From /workspace/finetune/main_chars_lstm.py:36: The name tf.logging.set_verbosity is deprecated. Please use tf.compat.v1.logging.set_verbosity instead. |
|
|
| [train] params: {"batch_size": 128, "buffer": 2000, "char_lstm_size": 25, "chars": "/workspace/finetune/data_kvkk_set3_v3/vocab.chars.txt", "dim": 50, "dim_chars": 100, "dropout": 0.5, "early_stop_max_steps": 600, "epochs": 20, "learning_rate": 0.001, "log_step_count_steps": 200, "lstm_size": 100, "min_steps": 600000, "num_oov_buckets": 1, "save_checkpoints_secs": 500, "save_summary_steps": 1000, "tags": "/workspace/finetune/data_kvkk_set3_v3/vocab.tags.txt", "trainable_embeddings": true, "vectors": "/workspace/finetune/data_kvkk_set3_v3/vectors.npz", "words": "/workspace/finetune/data_kvkk_set3_v3/vocab.words.txt"} |
| Using config: {'_model_dir': '/workspace/finetune/results/model', '_tf_random_seed': None, '_save_summary_steps': 1000, '_save_checkpoints_steps': None, '_save_checkpoints_secs': 500, '_session_config': allow_soft_placement: true |
| graph_options { |
| rewrite_options { |
| meta_optimizer_iterations: ONE |
| } |
| } |
| , '_keep_checkpoint_max': 5, '_keep_checkpoint_every_n_hours': 10000, '_log_step_count_steps': 200, '_train_distribute': None, '_device_fn': None, '_protocol': None, '_eval_distribute': None, '_experimental_distribute': None, '_experimental_max_worker_delay_secs': None, '_session_creation_timeout_secs': 7200, '_service': None, '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x7fe5412e7190>, '_task_type': 'worker', '_task_id': 0, '_global_id_in_cluster': 0, '_master': '', '_evaluation_master': '', '_is_chief': True, '_num_ps_replicas': 0, '_num_worker_replicas': 1} |
| Not using Distribute Coordinator. |
| Running training and evaluation locally (non-distributed). |
| Start train and evaluate loop. The evaluate will happen after every checkpoint. Checkpoint frequency is determined based on RunConfig arguments: save_checkpoints_steps None or save_checkpoints_secs 500. |
| Calling model_fn. |
|
|
| The TensorFlow contrib module will not be included in TensorFlow 2.0. |
| For more information, please see: |
| * https://github.com/tensorflow/community/blob/master/rfcs/20180907-contrib-sunset.md |
| * https://github.com/tensorflow/addons |
| * https://github.com/tensorflow/io (for I/O related ops) |
| If you depend on functionality not listed there, please file an issue. |
|
|
| TensorFlow will not use sklearn by default. This improves performance in some cases. To enable sklearn export the environment variable TF_ALLOW_IOLIBS=1. |
| TensorFlow will not use Dask by default. This improves performance in some cases. To enable Dask export the environment variable TF_ALLOW_IOLIBS=1. |
| TensorFlow will not use Pandas by default. This improves performance in some cases. To enable Pandas export the environment variable TF_ALLOW_IOLIBS=1. |
| From /workspace/finetune/main_chars_lstm.py:104: The name tf.get_variable is deprecated. Please use tf.compat.v1.get_variable instead. |
|
|
| From /workspace/finetune/main_chars_lstm.py:169: The name tf.metrics.accuracy is deprecated. Please use tf.compat.v1.metrics.accuracy instead. |
|
|
| From /py_packages/tf_metrics/__init__.py:152: The name tf.diag_part is deprecated. Please use tf.linalg.tensor_diag_part instead. |
|
|
| From /workspace/finetune/main_chars_lstm.py:175: The name tf.summary.scalar is deprecated. Please use tf.compat.v1.summary.scalar instead. |
|
|
| From /workspace/finetune/main_chars_lstm.py:180: The name tf.train.AdamOptimizer is deprecated. Please use tf.compat.v1.train.AdamOptimizer instead. |
|
|
| From /workspace/finetune/main_chars_lstm.py:181: The name tf.train.get_or_create_global_step is deprecated. Please use tf.compat.v1.train.get_or_create_global_step instead. |
|
|
| Done calling model_fn. |
| Create CheckpointSaverHook. |
| Graph was finalized. |
| 2026-04-28 05:15:37.160482: I tensorflow/core/platform/profile_utils/cpu_utils.cc:109] CPU Frequency: 2000000000 Hz |
| 2026-04-28 05:15:37.198169: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x6a89f10 initialized for platform Host (this does not guarantee that XLA will be used). Devices: |
| 2026-04-28 05:15:37.198203: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): Host, Default Version |
| 2026-04-28 05:15:37.203160: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcuda.so.1 |
| 2026-04-28 05:15:37.389778: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x695b9c0 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices: |
| 2026-04-28 05:15:37.389819: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): NVIDIA H100, Compute Capability 9.0 |
| 2026-04-28 05:15:37.390462: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1669] Found device 0 with properties: |
| name: NVIDIA H100 major: 9 minor: 0 memoryClockRate(GHz): 1.98 |
| pciBusID: 0000:ad:00.0 |
| 2026-04-28 05:15:37.390490: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0 |
| 2026-04-28 05:15:37.703030: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublas.so.11 |
| 2026-04-28 05:15:37.735401: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcufft.so.10 |
| 2026-04-28 05:15:37.743084: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcurand.so.10 |
| 2026-04-28 05:15:37.750452: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcusolver.so.11 |
| 2026-04-28 05:15:37.762361: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcusparse.so.11 |
| 2026-04-28 05:15:37.764124: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudnn.so.8 |
| 2026-04-28 05:15:37.766984: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1797] Adding visible gpu devices: 0 |
| 2026-04-28 05:15:37.768618: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0 |
| 2026-04-28 05:15:37.774465: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1209] Device interconnect StreamExecutor with strength 1 edge matrix: |
| 2026-04-28 05:15:37.774485: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1215] 0 |
| 2026-04-28 05:15:37.774492: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1228] 0: N |
| 2026-04-28 05:15:37.774878: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1354] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 60325 MB memory) -> physical GPU (device: 0, name: NVIDIA H100, pci bus id: 0000:ad:00.0, compute capability: 9.0) |
| Restoring parameters from /workspace/finetune/results/model/model.ckpt-0 |
| Running local_init_op. |
| Done running local_init_op. |
| Saving checkpoints for 0 into /workspace/finetune/results/model/model.ckpt. |
| 2026-04-28 05:15:40.981003: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublas.so.11 |
| loss = 102.289, step = 0 |
| global_step/sec: 16.2585 |
| loss = 5.986624, step = 200 (12.302 sec) |
| global_step/sec: 16.5611 |
| loss = 2.1498528, step = 400 (12.076 sec) |
| global_step/sec: 16.1212 |
| loss = 1.312039, step = 600 (12.406 sec) |
| global_step/sec: 16.2189 |
| loss = 1.3136858, step = 800 (12.331 sec) |
| global_step/sec: 16.2345 |
| loss = 1.3853596, step = 1000 (12.320 sec) |
| global_step/sec: 15.9785 |
| loss = 1.2304411, step = 1200 (12.517 sec) |
| global_step/sec: 16.0121 |
| loss = 0.7067219, step = 1400 (12.491 sec) |
| global_step/sec: 16.1801 |
| loss = 0.93841916, step = 1600 (12.361 sec) |
| global_step/sec: 16.1054 |
| loss = 0.65951395, step = 1800 (12.418 sec) |
| global_step/sec: 16.1093 |
| loss = 0.8376456, step = 2000 (12.415 sec) |
| global_step/sec: 16.4475 |
| loss = 0.45452726, step = 2200 (12.160 sec) |
| global_step/sec: 16.3659 |
| loss = 0.6535889, step = 2400 (12.220 sec) |
| global_step/sec: 16.8136 |
| loss = 0.54435384, step = 2600 (11.895 sec) |
| global_step/sec: 16.8206 |
| loss = 0.5056425, step = 2800 (11.891 sec) |
| global_step/sec: 16.4608 |
| loss = 0.54167736, step = 3000 (12.150 sec) |
| global_step/sec: 16.5302 |
| loss = 0.9743586, step = 3200 (12.099 sec) |
| global_step/sec: 16.6723 |
| loss = 0.29906678, step = 3400 (11.996 sec) |
| global_step/sec: 16.9803 |
| loss = 0.5273884, step = 3600 (11.780 sec) |
| global_step/sec: 16.9109 |
| loss = 0.51336044, step = 3800 (11.825 sec) |
| global_step/sec: 17.2923 |
| loss = 0.6441746, step = 4000 (11.566 sec) |
| global_step/sec: 17.4957 |
| loss = 0.34444237, step = 4200 (11.432 sec) |
| global_step/sec: 17.3879 |
| loss = 0.3049839, step = 4400 (11.501 sec) |
| global_step/sec: 17.517 |
| loss = 0.33523333, step = 4600 (11.418 sec) |
| global_step/sec: 17.1766 |
| loss = 0.5199193, step = 4800 (11.643 sec) |
| global_step/sec: 17.223 |
| loss = 0.40655118, step = 5000 (11.613 sec) |
| global_step/sec: 17.67 |
| loss = 0.42372644, step = 5200 (11.319 sec) |
| global_step/sec: 17.67 |
| loss = 0.08840948, step = 5400 (11.318 sec) |
| global_step/sec: 18.0636 |
| loss = 0.21405059, step = 5600 (11.072 sec) |
| global_step/sec: 17.8529 |
| loss = 0.5103779, step = 5800 (11.203 sec) |
| global_step/sec: 17.9147 |
| loss = 0.16661471, step = 6000 (11.164 sec) |
| global_step/sec: 16.9307 |
| loss = 0.24831116, step = 6200 (11.813 sec) |
| global_step/sec: 17.287 |
| loss = 0.07917696, step = 6400 (11.569 sec) |
| global_step/sec: 17.287 |
| loss = 0.19675535, step = 6600 (11.569 sec) |
| global_step/sec: 17.2728 |
| loss = 0.45481026, step = 6800 (11.579 sec) |
| global_step/sec: 17.3187 |
| loss = 0.1687935, step = 7000 (11.548 sec) |
| global_step/sec: 17.3065 |
| loss = 0.10605991, step = 7200 (11.556 sec) |
| global_step/sec: 17.2856 |
| loss = 0.21068978, step = 7400 (11.570 sec) |
| global_step/sec: 17.6764 |
| loss = 0.10583621, step = 7600 (11.314 sec) |
| global_step/sec: 17.1817 |
| loss = 0.25232738, step = 7800 (11.640 sec) |
| global_step/sec: 17.3914 |
| loss = 0.5265127, step = 8000 (11.500 sec) |
| global_step/sec: 17.2989 |
| loss = 0.15738869, step = 8200 (11.561 sec) |
| global_step/sec: 17.4263 |
| loss = 0.13792074, step = 8400 (11.477 sec) |
| Saving checkpoints for 8428 into /workspace/finetune/results/model/model.ckpt. |
| Calling model_fn. |
| Done calling model_fn. |
| Starting evaluation at 2026-04-28T05:24:01Z |
| Graph was finalized. |
| 2026-04-28 05:24:01.172161: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1669] Found device 0 with properties: |
| name: NVIDIA H100 major: 9 minor: 0 memoryClockRate(GHz): 1.98 |
| pciBusID: 0000:ad:00.0 |
| 2026-04-28 05:24:01.172213: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0 |
| 2026-04-28 05:24:01.172243: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublas.so.11 |
| 2026-04-28 05:24:01.172249: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcufft.so.10 |
| 2026-04-28 05:24:01.172255: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcurand.so.10 |
| 2026-04-28 05:24:01.172260: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcusolver.so.11 |
| 2026-04-28 05:24:01.172265: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcusparse.so.11 |
| 2026-04-28 05:24:01.172272: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudnn.so.8 |
| 2026-04-28 05:24:01.172522: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1797] Adding visible gpu devices: 0 |
| 2026-04-28 05:24:01.172549: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1209] Device interconnect StreamExecutor with strength 1 edge matrix: |
| 2026-04-28 05:24:01.172553: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1215] 0 |
| 2026-04-28 05:24:01.172557: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1228] 0: N |
| 2026-04-28 05:24:01.172822: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1354] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 60325 MB memory) -> physical GPU (device: 0, name: NVIDIA H100, pci bus id: 0000:ad:00.0, compute capability: 9.0) |
| Restoring parameters from /workspace/finetune/results/model/model.ckpt-8428 |
| Running local_init_op. |
| Done running local_init_op. |
| Evaluation [10/100] |
| Evaluation [20/100] |
| Evaluation [30/100] |
| Evaluation [40/100] |
| Evaluation [50/100] |
| Evaluation [60/100] |
| Evaluation [70/100] |
| Evaluation [80/100] |
| Evaluation [90/100] |
| Evaluation [100/100] |
| Finished evaluation at 2026-04-28-05:24:06 |
| Saving dict for global step 8428: acc = 0.997914, f1 = 0.9942703, global_step = 8428, loss = 0.14689502, precision = 0.9936579, recall = 0.99488366 |
| Saving 'checkpoint_path' summary for global step 8428: /workspace/finetune/results/model/model.ckpt-8428 |
| global_step/sec: 11.5153 |
| loss = 0.13722146, step = 8600 (17.368 sec) |
| global_step/sec: 17.6419 |
| loss = 0.31847942, step = 8800 (11.336 sec) |
| global_step/sec: 17.8273 |
| loss = 0.28958094, step = 9000 (11.219 sec) |
| global_step/sec: 17.5481 |
| loss = 0.30293107, step = 9200 (11.397 sec) |
| global_step/sec: 17.5695 |
| loss = 0.11271095, step = 9400 (11.383 sec) |
| global_step/sec: 17.8699 |
| loss = 0.2657137, step = 9600 (11.192 sec) |
| global_step/sec: 17.8482 |
| loss = 0.06779647, step = 9800 (11.205 sec) |
| global_step/sec: 17.8093 |
| loss = 0.116514206, step = 10000 (11.230 sec) |
| global_step/sec: 17.6214 |
| loss = 0.17775589, step = 10200 (11.350 sec) |
| global_step/sec: 18.1588 |
| loss = 0.11069703, step = 10400 (11.014 sec) |
| global_step/sec: 18.0034 |
| loss = 0.024612904, step = 10600 (11.109 sec) |
| global_step/sec: 18.1334 |
| loss = 0.17613989, step = 10800 (11.029 sec) |
| global_step/sec: 18.0531 |
| loss = 0.13324213, step = 11000 (11.079 sec) |
| global_step/sec: 18.0712 |
| loss = 0.12831438, step = 11200 (11.067 sec) |
| global_step/sec: 17.947 |
| loss = 0.2804634, step = 11400 (11.144 sec) |
| global_step/sec: 18.1069 |
| loss = 0.2336666, step = 11600 (11.046 sec) |
| global_step/sec: 17.9596 |
| loss = 0.16818011, step = 11800 (11.136 sec) |
| global_step/sec: 17.9847 |
| loss = 0.062304914, step = 12000 (11.121 sec) |
| global_step/sec: 17.8009 |
| loss = 0.15490824, step = 12200 (11.235 sec) |
| global_step/sec: 18.0888 |
| loss = 0.102054656, step = 12400 (11.056 sec) |
| global_step/sec: 17.9338 |
| loss = 0.04917127, step = 12600 (11.152 sec) |
| global_step/sec: 17.9571 |
| loss = 0.07043046, step = 12800 (11.138 sec) |
| global_step/sec: 16.5145 |
| loss = 0.22558558, step = 13000 (12.111 sec) |
| global_step/sec: 17.9296 |
| loss = 0.0617373, step = 13200 (11.154 sec) |
| global_step/sec: 18.0051 |
| loss = 0.080938876, step = 13400 (11.108 sec) |
| global_step/sec: 17.8975 |
| loss = 0.049816668, step = 13600 (11.175 sec) |
| global_step/sec: 17.6146 |
| loss = 0.091686785, step = 13800 (11.354 sec) |
| global_step/sec: 17.6869 |
| loss = 0.09760392, step = 14000 (11.308 sec) |
| global_step/sec: 17.7814 |
| loss = 0.08662301, step = 14200 (11.247 sec) |
| global_step/sec: 17.8851 |
| loss = 0.12617636, step = 14400 (11.183 sec) |
| global_step/sec: 17.1209 |
| loss = 0.13539296, step = 14600 (11.682 sec) |
| global_step/sec: 18.0841 |
| loss = 0.21070999, step = 14800 (11.059 sec) |
| global_step/sec: 17.9193 |
| loss = 0.09002066, step = 15000 (11.161 sec) |
| global_step/sec: 18.0027 |
| loss = 0.07087046, step = 15200 (11.109 sec) |
| global_step/sec: 18.0166 |
| loss = 0.1169194, step = 15400 (11.101 sec) |
| global_step/sec: 17.7961 |
| loss = 0.094311416, step = 15600 (11.238 sec) |
| global_step/sec: 18.069 |
| loss = 0.08750421, step = 15800 (11.069 sec) |
| global_step/sec: 17.8826 |
| loss = 0.013639748, step = 16000 (11.184 sec) |
| global_step/sec: 17.9857 |
| loss = 0.051932037, step = 16200 (11.120 sec) |
| global_step/sec: 17.8354 |
| loss = 0.11339444, step = 16400 (11.214 sec) |
| global_step/sec: 17.8948 |
| loss = 0.07497531, step = 16600 (11.176 sec) |
| global_step/sec: 17.9611 |
| loss = 0.09614849, step = 16800 (11.135 sec) |
| global_step/sec: 17.984 |
| loss = 0.044727564, step = 17000 (11.121 sec) |
| global_step/sec: 17.9836 |
| loss = 0.06893462, step = 17200 (11.121 sec) |
| Saving checkpoints for 17244 into /workspace/finetune/results/model/model.ckpt. |
| Calling model_fn. |
| Done calling model_fn. |
| Starting evaluation at 2026-04-28T05:32:21Z |
| Graph was finalized. |
| 2026-04-28 05:32:21.131189: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1669] Found device 0 with properties: |
| name: NVIDIA H100 major: 9 minor: 0 memoryClockRate(GHz): 1.98 |
| pciBusID: 0000:ad:00.0 |
| 2026-04-28 05:32:21.131230: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0 |
| 2026-04-28 05:32:21.131259: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublas.so.11 |
| 2026-04-28 05:32:21.131265: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcufft.so.10 |
| 2026-04-28 05:32:21.131271: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcurand.so.10 |
| 2026-04-28 05:32:21.131276: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcusolver.so.11 |
| 2026-04-28 05:32:21.131281: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcusparse.so.11 |
| 2026-04-28 05:32:21.131288: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudnn.so.8 |
| 2026-04-28 05:32:21.131566: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1797] Adding visible gpu devices: 0 |
| 2026-04-28 05:32:21.131590: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1209] Device interconnect StreamExecutor with strength 1 edge matrix: |
| 2026-04-28 05:32:21.131594: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1215] 0 |
| 2026-04-28 05:32:21.131598: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1228] 0: N |
| 2026-04-28 05:32:21.131906: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1354] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 60325 MB memory) -> physical GPU (device: 0, name: NVIDIA H100, pci bus id: 0000:ad:00.0, compute capability: 9.0) |
| Restoring parameters from /workspace/finetune/results/model/model.ckpt-17244 |
| Running local_init_op. |
| Done running local_init_op. |
| Evaluation [10/100] |
| Evaluation [20/100] |
| Evaluation [30/100] |
| Evaluation [40/100] |
| Evaluation [50/100] |
| Evaluation [60/100] |
| Evaluation [70/100] |
| Evaluation [80/100] |
| Evaluation [90/100] |
| Evaluation [100/100] |
| Finished evaluation at 2026-04-28-05:32:25 |
| Saving dict for global step 17244: acc = 0.9983455, f1 = 0.9955927, global_step = 17244, loss = 0.10923356, precision = 0.99532104, recall = 0.99586445 |
| Saving 'checkpoint_path' summary for global step 17244: /workspace/finetune/results/model/model.ckpt-17244 |
| global_step/sec: 12.2121 |
| loss = 0.07070118, step = 17400 (16.377 sec) |
| global_step/sec: 17.9306 |
| loss = 0.07339883, step = 17600 (11.154 sec) |
| global_step/sec: 17.8084 |
| loss = 0.12021345, step = 17800 (11.231 sec) |
| global_step/sec: 17.8753 |
| loss = 0.0967865, step = 18000 (11.189 sec) |
| global_step/sec: 17.6591 |
| loss = 0.0594576, step = 18200 (11.325 sec) |
| global_step/sec: 17.9587 |
| loss = 0.06613392, step = 18400 (11.137 sec) |
| global_step/sec: 17.6629 |
| loss = 0.11538398, step = 18600 (11.323 sec) |
| global_step/sec: 17.6883 |
| loss = 0.008395612, step = 18800 (11.307 sec) |
| global_step/sec: 18.0527 |
| loss = 0.08756232, step = 19000 (11.079 sec) |
| global_step/sec: 17.9936 |
| loss = 0.023801446, step = 19200 (11.115 sec) |
| global_step/sec: 17.7398 |
| loss = 0.11921042, step = 19400 (11.274 sec) |
| global_step/sec: 17.6782 |
| loss = 0.048651278, step = 19600 (11.314 sec) |
| global_step/sec: 17.7342 |
| loss = 0.17171317, step = 19800 (11.278 sec) |
| global_step/sec: 17.756 |
| loss = 0.072808385, step = 20000 (11.264 sec) |
| global_step/sec: 17.7842 |
| loss = 0.08458197, step = 20200 (11.246 sec) |
| global_step/sec: 18.1083 |
| loss = 0.09870511, step = 20400 (11.045 sec) |
| global_step/sec: 17.8018 |
| loss = 0.014859796, step = 20600 (11.235 sec) |
| global_step/sec: 17.9517 |
| loss = 0.02439493, step = 20800 (11.141 sec) |
| global_step/sec: 17.8295 |
| loss = 0.063162625, step = 21000 (11.218 sec) |
| global_step/sec: 17.8725 |
| loss = 0.026917815, step = 21200 (11.190 sec) |
| global_step/sec: 17.8992 |
| loss = 0.13421822, step = 21400 (11.174 sec) |
| global_step/sec: 18.0597 |
| loss = 0.11149919, step = 21600 (11.074 sec) |
| global_step/sec: 17.3746 |
| loss = 0.019822836, step = 21800 (11.511 sec) |
| global_step/sec: 17.9356 |
| loss = 0.064364076, step = 22000 (11.151 sec) |
| global_step/sec: 17.8118 |
| loss = 0.02763486, step = 22200 (11.228 sec) |
| global_step/sec: 17.6316 |
| loss = 0.09873259, step = 22400 (11.343 sec) |
| global_step/sec: 17.9541 |
| loss = 0.07205546, step = 22600 (11.140 sec) |
| global_step/sec: 17.9894 |
| loss = 0.052541614, step = 22800 (11.118 sec) |
| global_step/sec: 17.7515 |
| loss = 0.07275927, step = 23000 (11.267 sec) |
| global_step/sec: 17.8268 |
| loss = 0.13450235, step = 23200 (11.219 sec) |
| global_step/sec: 18.1835 |
| loss = 0.016338944, step = 23400 (10.999 sec) |
| global_step/sec: 18.0212 |
| loss = 0.07616836, step = 23600 (11.098 sec) |
| global_step/sec: 17.9681 |
| loss = 0.0625878, step = 23800 (11.131 sec) |
| global_step/sec: 17.786 |
| loss = 0.04009646, step = 24000 (11.245 sec) |
| global_step/sec: 17.8586 |
| loss = 0.047451556, step = 24200 (11.199 sec) |
| global_step/sec: 17.7901 |
| loss = 0.04998851, step = 24400 (11.242 sec) |
| global_step/sec: 17.985 |
| loss = 0.043762565, step = 24600 (11.120 sec) |
| global_step/sec: 17.9799 |
| loss = 0.06248969, step = 24800 (11.124 sec) |
| Saving checkpoints for 25000 into /workspace/finetune/results/model/model.ckpt. |
| Calling model_fn. |
| Done calling model_fn. |
| Starting evaluation at 2026-04-28T05:39:40Z |
| Graph was finalized. |
| 2026-04-28 05:39:40.866675: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1669] Found device 0 with properties: |
| name: NVIDIA H100 major: 9 minor: 0 memoryClockRate(GHz): 1.98 |
| pciBusID: 0000:ad:00.0 |
| 2026-04-28 05:39:40.866717: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0 |
| 2026-04-28 05:39:40.866747: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublas.so.11 |
| 2026-04-28 05:39:40.866754: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcufft.so.10 |
| 2026-04-28 05:39:40.866760: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcurand.so.10 |
| 2026-04-28 05:39:40.866765: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcusolver.so.11 |
| 2026-04-28 05:39:40.866770: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcusparse.so.11 |
| 2026-04-28 05:39:40.866777: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudnn.so.8 |
| 2026-04-28 05:39:40.867014: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1797] Adding visible gpu devices: 0 |
| 2026-04-28 05:39:40.867040: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1209] Device interconnect StreamExecutor with strength 1 edge matrix: |
| 2026-04-28 05:39:40.867044: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1215] 0 |
| 2026-04-28 05:39:40.867048: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1228] 0: N |
| 2026-04-28 05:39:40.867311: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1354] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 60325 MB memory) -> physical GPU (device: 0, name: NVIDIA H100, pci bus id: 0000:ad:00.0, compute capability: 9.0) |
| Restoring parameters from /workspace/finetune/results/model/model.ckpt-25000 |
| Running local_init_op. |
| Done running local_init_op. |
| Evaluation [10/100] |
| Evaluation [20/100] |
| Evaluation [30/100] |
| Evaluation [40/100] |
| Evaluation [50/100] |
| Evaluation [60/100] |
| Evaluation [70/100] |
| Evaluation [80/100] |
| Evaluation [90/100] |
| Evaluation [100/100] |
| Finished evaluation at 2026-04-28-05:39:45 |
| Saving dict for global step 25000: acc = 0.9985107, f1 = 0.99607116, global_step = 25000, loss = 0.11429862, precision = 0.99621725, recall = 0.9959251 |
| Saving 'checkpoint_path' summary for global step 25000: /workspace/finetune/results/model/model.ckpt-25000 |
| Loss for final step: 0.057276487. |
| Calling model_fn. |
| Done calling model_fn. |
| Graph was finalized. |
| 2026-04-28 05:39:45.501816: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1669] Found device 0 with properties: |
| name: NVIDIA H100 major: 9 minor: 0 memoryClockRate(GHz): 1.98 |
| pciBusID: 0000:ad:00.0 |
| 2026-04-28 05:39:45.501861: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0 |
| 2026-04-28 05:39:45.501878: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublas.so.11 |
| 2026-04-28 05:39:45.501884: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcufft.so.10 |
| 2026-04-28 05:39:45.501890: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcurand.so.10 |
| 2026-04-28 05:39:45.501896: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcusolver.so.11 |
| 2026-04-28 05:39:45.501901: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcusparse.so.11 |
| 2026-04-28 05:39:45.501908: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudnn.so.8 |
| 2026-04-28 05:39:45.502144: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1797] Adding visible gpu devices: 0 |
| 2026-04-28 05:39:45.502167: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1209] Device interconnect StreamExecutor with strength 1 edge matrix: |
| 2026-04-28 05:39:45.502171: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1215] 0 |
| 2026-04-28 05:39:45.502176: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1228] 0: N |
| 2026-04-28 05:39:45.502823: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1354] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 60325 MB memory) -> physical GPU (device: 0, name: NVIDIA H100, pci bus id: 0000:ad:00.0, compute capability: 9.0) |
| Restoring parameters from /workspace/finetune/results/model/model.ckpt-25000 |
| Running local_init_op. |
| Done running local_init_op. |
| [predict] wrote /workspace/finetune/results/score/train.preds.txt |
| Calling model_fn. |
| Done calling model_fn. |
| Graph was finalized. |
| 2026-04-28 05:40:35.402702: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1669] Found device 0 with properties: |
| name: NVIDIA H100 major: 9 minor: 0 memoryClockRate(GHz): 1.98 |
| pciBusID: 0000:ad:00.0 |
| 2026-04-28 05:40:35.402752: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0 |
| 2026-04-28 05:40:35.402780: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublas.so.11 |
| 2026-04-28 05:40:35.402786: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcufft.so.10 |
| 2026-04-28 05:40:35.402792: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcurand.so.10 |
| 2026-04-28 05:40:35.402797: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcusolver.so.11 |
| 2026-04-28 05:40:35.402802: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcusparse.so.11 |
| 2026-04-28 05:40:35.402809: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudnn.so.8 |
| 2026-04-28 05:40:35.403043: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1797] Adding visible gpu devices: 0 |
| 2026-04-28 05:40:35.403069: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1209] Device interconnect StreamExecutor with strength 1 edge matrix: |
| 2026-04-28 05:40:35.403073: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1215] 0 |
| 2026-04-28 05:40:35.403077: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1228] 0: N |
| 2026-04-28 05:40:35.403350: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1354] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 60325 MB memory) -> physical GPU (device: 0, name: NVIDIA H100, pci bus id: 0000:ad:00.0, compute capability: 9.0) |
| Restoring parameters from /workspace/finetune/results/model/model.ckpt-25000 |
| Running local_init_op. |
| Done running local_init_op. |
| [predict] wrote /workspace/finetune/results/score/testa.preds.txt |
| Calling model_fn. |
| Done calling model_fn. |
| Graph was finalized. |
| 2026-04-28 05:40:42.254590: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1669] Found device 0 with properties: |
| name: NVIDIA H100 major: 9 minor: 0 memoryClockRate(GHz): 1.98 |
| pciBusID: 0000:ad:00.0 |
| 2026-04-28 05:40:42.254635: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0 |
| 2026-04-28 05:40:42.254654: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublas.so.11 |
| 2026-04-28 05:40:42.254662: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcufft.so.10 |
| 2026-04-28 05:40:42.254668: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcurand.so.10 |
| 2026-04-28 05:40:42.254674: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcusolver.so.11 |
| 2026-04-28 05:40:42.254679: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcusparse.so.11 |
| 2026-04-28 05:40:42.254685: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudnn.so.8 |
| 2026-04-28 05:40:42.254917: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1797] Adding visible gpu devices: 0 |
| 2026-04-28 05:40:42.254938: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1209] Device interconnect StreamExecutor with strength 1 edge matrix: |
| 2026-04-28 05:40:42.254942: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1215] 0 |
| 2026-04-28 05:40:42.254947: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1228] 0: N |
| 2026-04-28 05:40:42.255201: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1354] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 60325 MB memory) -> physical GPU (device: 0, name: NVIDIA H100, pci bus id: 0000:ad:00.0, compute capability: 9.0) |
| Restoring parameters from /workspace/finetune/results/model/model.ckpt-25000 |
| Running local_init_op. |
| Done running local_init_op. |
| [predict] wrote /workspace/finetune/results/score/testb.preds.txt |