2026-02-02 07:13:23,912 INFO MainThread:601 [wandb_setup.py:_flush():81] Current SDK version is 0.24.1
2026-02-02 07:13:23,912 INFO MainThread:601 [wandb_setup.py:_flush():81] Configure stats pid to 601
2026-02-02 07:13:23,912 INFO MainThread:601 [wandb_setup.py:_flush():81] Loading settings from environment variables
2026-02-02 07:13:23,912 INFO MainThread:601 [wandb_init.py:setup_run_log_directory():717] Logging user logs to /workspace/hanrui/SpecForge-ext/wandb/run-20260202_071323-2yze80jn/logs/debug.log
2026-02-02 07:13:23,912 INFO MainThread:601 [wandb_init.py:setup_run_log_directory():718] Logging internal logs to /workspace/hanrui/SpecForge-ext/wandb/run-20260202_071323-2yze80jn/logs/debug-internal.log
2026-02-02 07:13:23,912 INFO MainThread:601 [wandb_init.py:init():844] calling init triggers
2026-02-02 07:13:23,912 INFO MainThread:601 [wandb_init.py:init():849] wandb.init called with sweep_config: {}
config: {'target_model_path': '/workspace/Qwen3-8B', 'trust_remote_code': False, 'draft_model_config': 'configs/qwen3-8b-qwen3eagle-5layer.json', 'embedding_key': 'model.embed_tokens.weight', 'lm_head_key': 'lm_head.weight', 'is_vlm': False, 'target_model_backend': 'sglang', 'train_data_path': '/workspace/hanrui/qwen3-8b_dflash_regen/sharegpt_train_regenerated.jsonl', 'train_hidden_states_path': None, 'eval_hidden_states_path': None, 'eval_data_path': None, 'chat_template': 'qwen', 'is_preformatted': False, 'train_only_last_turn': False, 'build_dataset_num_proc': 8, 'dataloader_num_workers': 4, 'num_epochs': 10, 'max_num_steps': None, 'batch_size': 2, 'learning_rate': 0.0001, 'max_length': 2048, 'warmup_ratio': 0.015, 'total_steps': 49260, 'max_grad_norm': 0.5, 'ttt_length': 7, 'resume': False, 'ckpt_dir': None, 'eval_interval': 5000, 'save_interval': 5000, 'log_interval': 100, 'seed': 0, 'draft_accumulation_steps': 1, 'tp_size': 1, 'sp_ulysses_size': 1, 'sp_ring_size': 1, 'attention_backend': 'flex_attention', 'cache_key': None, 'cache_dir': 'cache', 'output_dir': 'outputs/qwen3-8b-qwen3eagle-5layer', 'verbose': False, 'dist_timeout': 20, 'model_download_dir': None, 'min_pixels': 50176, 'max_pixels': 802816, 'profile': False, 'profile_start_step': 30, 'profile_num_steps': 4, 'profile_record_shapes': False, 'sglang_attention_backend': 'flashinfer', 'sglang_mem_fraction_static': 0.4, 'sglang_context_length': None, 'sglang_enable_nccl_nvls': False, 'sglang_enable_symm_mem': False, 'sglang_enable_torch_compile': False, 'sglang_enable_dp_attention': False, 'sglang_enable_dp_lm_head': False, 'sglang_enable_piecewise_cuda_graph': False, 'sglang_piecewise_cuda_graph_max_tokens': 4096, 'sglang_piecewise_cuda_graph_tokens': None, 'sglang_ep_size': 1, 'report_to': 'wandb', 'wandb_project': 'qwen3-8b-qwen3eagle', 'wandb_name': '5layer-ttt7', 'wandb_key': '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', 'swanlab_project': None, 
'swanlab_name': None, 'swanlab_key': None, 'mlflow_tracking_uri': None, 'mlflow_experiment_name': None, 'mlflow_run_name': None, 'dp_size': 8, 'target_batch_size': 2, '_wandb': {}}
2026-02-02 07:13:23,912 INFO MainThread:601 [wandb_init.py:init():892] starting backend
2026-02-02 07:13:24,247 INFO MainThread:601 [wandb_init.py:init():895] sending inform_init request
2026-02-02 07:13:24,263 INFO MainThread:601 [wandb_init.py:init():903] backend started and connected
2026-02-02 07:13:24,270 INFO MainThread:601 [wandb_init.py:init():973] updated telemetry
2026-02-02 07:13:24,285 INFO MainThread:601 [wandb_init.py:init():997] communicating run to backend with 90.0 second timeout
2026-02-02 07:13:55,052 INFO Thread-7 (wrapped_target):601 [retry.py:__call__():164] [no run ID] Retry attempt failed:
Traceback (most recent call last):
  File "/workspace/specforge/lib/python3.11/site-packages/urllib3/connection.py", line 204, in _new_conn
    sock = connection.create_connection(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/workspace/specforge/lib/python3.11/site-packages/urllib3/util/connection.py", line 85, in create_connection
    raise err
  File "/workspace/specforge/lib/python3.11/site-packages/urllib3/util/connection.py", line 73, in create_connection
    sock.connect(sa)
TimeoutError: timed out

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/workspace/specforge/lib/python3.11/site-packages/urllib3/connectionpool.py", line 787, in urlopen
    response = self._make_request(
               ^^^^^^^^^^^^^^^^^^^
  File "/workspace/specforge/lib/python3.11/site-packages/urllib3/connectionpool.py", line 488, in _make_request
    raise new_e
  File "/workspace/specforge/lib/python3.11/site-packages/urllib3/connectionpool.py", line 464, in _make_request
    self._validate_conn(conn)
  File "/workspace/specforge/lib/python3.11/site-packages/urllib3/connectionpool.py", line 1093, in _validate_conn
    conn.connect()
  File "/workspace/specforge/lib/python3.11/site-packages/urllib3/connection.py", line 759, in connect
    self.sock = sock = self._new_conn()
                       ^^^^^^^^^^^^^^^^
  File "/workspace/specforge/lib/python3.11/site-packages/urllib3/connection.py", line 213, in _new_conn
    raise ConnectTimeoutError(
urllib3.exceptions.ConnectTimeoutError: (<HTTPSConnection(host='api.wandb.ai', port=443) at 0x7fcc1c1ea6d0>, 'Connection to api.wandb.ai timed out. (connect timeout=20)')

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/workspace/specforge/lib/python3.11/site-packages/requests/adapters.py", line 644, in send
    resp = conn.urlopen(
           ^^^^^^^^^^^^^
  File "/workspace/specforge/lib/python3.11/site-packages/urllib3/connectionpool.py", line 841, in urlopen
    retries = retries.increment(
              ^^^^^^^^^^^^^^^^^^
  File "/workspace/specforge/lib/python3.11/site-packages/urllib3/util/retry.py", line 535, in increment
    raise MaxRetryError(_pool, url, reason) from reason # type: ignore[arg-type]
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='api.wandb.ai', port=443): Max retries exceeded with url: /graphql (Caused by ConnectTimeoutError(<HTTPSConnection(host='api.wandb.ai', port=443) at 0x7fcc1c1ea6d0>, 'Connection to api.wandb.ai timed out. (connect timeout=20)'))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/workspace/specforge/lib/python3.11/site-packages/wandb/sdk/lib/retry.py", line 157, in __call__
    result = self._call_fn(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/workspace/specforge/lib/python3.11/site-packages/wandb/sdk/internal/internal_api.py", line 397, in execute
    return self.client.execute(*args, **kwargs) # type: ignore
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/workspace/specforge/lib/python3.11/site-packages/wandb/vendor/gql-0.2.0/wandb_gql/client.py", line 52, in execute
    result = self._get_result(document, *args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/workspace/specforge/lib/python3.11/site-packages/wandb/vendor/gql-0.2.0/wandb_gql/client.py", line 60, in _get_result
    return self.transport.execute(document, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/workspace/specforge/lib/python3.11/site-packages/wandb/sdk/lib/gql_request.py", line 70, in execute
    request = self.session.post(self.url, **post_args)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/workspace/specforge/lib/python3.11/site-packages/requests/sessions.py", line 637, in post
    return self.request("POST", url, data=data, json=json, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/workspace/specforge/lib/python3.11/site-packages/requests/sessions.py", line 589, in request
    resp = self.send(prep, **send_kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/workspace/specforge/lib/python3.11/site-packages/requests/sessions.py", line 703, in send
    r = adapter.send(request, **kwargs)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/workspace/specforge/lib/python3.11/site-packages/requests/adapters.py", line 665, in send
    raise ConnectTimeout(e, request=request)
requests.exceptions.ConnectTimeout: HTTPSConnectionPool(host='api.wandb.ai', port=443): Max retries exceeded with url: /graphql (Caused by ConnectTimeoutError(<HTTPSConnection(host='api.wandb.ai', port=443) at 0x7fcc1c1ea6d0>, 'Connection to api.wandb.ai timed out. (connect timeout=20)'))
2026-02-02 07:14:12,432 INFO Thread-6 (wrapped_target):601 [retry.py:__call__():164] [no run ID] Retry attempt failed:
Traceback (most recent call last):
  File "/workspace/specforge/lib/python3.11/site-packages/urllib3/connection.py", line 204, in _new_conn
    sock = connection.create_connection(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/workspace/specforge/lib/python3.11/site-packages/urllib3/util/connection.py", line 85, in create_connection
    raise err
  File "/workspace/specforge/lib/python3.11/site-packages/urllib3/util/connection.py", line 73, in create_connection
    sock.connect(sa)
TimeoutError: timed out

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/workspace/specforge/lib/python3.11/site-packages/urllib3/connectionpool.py", line 787, in urlopen
    response = self._make_request(
               ^^^^^^^^^^^^^^^^^^^
  File "/workspace/specforge/lib/python3.11/site-packages/urllib3/connectionpool.py", line 488, in _make_request
    raise new_e
  File "/workspace/specforge/lib/python3.11/site-packages/urllib3/connectionpool.py", line 464, in _make_request
    self._validate_conn(conn)
  File "/workspace/specforge/lib/python3.11/site-packages/urllib3/connectionpool.py", line 1093, in _validate_conn
    conn.connect()
  File "/workspace/specforge/lib/python3.11/site-packages/urllib3/connection.py", line 759, in connect
    self.sock = sock = self._new_conn()
                       ^^^^^^^^^^^^^^^^
  File "/workspace/specforge/lib/python3.11/site-packages/urllib3/connection.py", line 213, in _new_conn
    raise ConnectTimeoutError(
urllib3.exceptions.ConnectTimeoutError: (<HTTPSConnection(host='api.wandb.ai', port=443) at 0x7fcc1c1e8810>, 'Connection to api.wandb.ai timed out. (connect timeout=20)')

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/workspace/specforge/lib/python3.11/site-packages/requests/adapters.py", line 644, in send
    resp = conn.urlopen(
           ^^^^^^^^^^^^^
  File "/workspace/specforge/lib/python3.11/site-packages/urllib3/connectionpool.py", line 841, in urlopen
    retries = retries.increment(
              ^^^^^^^^^^^^^^^^^^
  File "/workspace/specforge/lib/python3.11/site-packages/urllib3/util/retry.py", line 535, in increment
    raise MaxRetryError(_pool, url, reason) from reason # type: ignore[arg-type]
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='api.wandb.ai', port=443): Max retries exceeded with url: /graphql (Caused by ConnectTimeoutError(<HTTPSConnection(host='api.wandb.ai', port=443) at 0x7fcc1c1e8810>, 'Connection to api.wandb.ai timed out. (connect timeout=20)'))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/workspace/specforge/lib/python3.11/site-packages/wandb/sdk/lib/retry.py", line 157, in __call__
    result = self._call_fn(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/workspace/specforge/lib/python3.11/site-packages/wandb/sdk/internal/internal_api.py", line 397, in execute
    return self.client.execute(*args, **kwargs) # type: ignore
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/workspace/specforge/lib/python3.11/site-packages/wandb/vendor/gql-0.2.0/wandb_gql/client.py", line 52, in execute
    result = self._get_result(document, *args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/workspace/specforge/lib/python3.11/site-packages/wandb/vendor/gql-0.2.0/wandb_gql/client.py", line 60, in _get_result
    return self.transport.execute(document, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/workspace/specforge/lib/python3.11/site-packages/wandb/sdk/lib/gql_request.py", line 70, in execute
    request = self.session.post(self.url, **post_args)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/workspace/specforge/lib/python3.11/site-packages/requests/sessions.py", line 637, in post
    return self.request("POST", url, data=data, json=json, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/workspace/specforge/lib/python3.11/site-packages/requests/sessions.py", line 589, in request
    resp = self.send(prep, **send_kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/workspace/specforge/lib/python3.11/site-packages/requests/sessions.py", line 703, in send
    r = adapter.send(request, **kwargs)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/workspace/specforge/lib/python3.11/site-packages/requests/adapters.py", line 665, in send
    raise ConnectTimeout(e, request=request)
requests.exceptions.ConnectTimeout: HTTPSConnectionPool(host='api.wandb.ai', port=443): Max retries exceeded with url: /graphql (Caused by ConnectTimeoutError(<HTTPSConnection(host='api.wandb.ai', port=443) at 0x7fcc1c1e8810>, 'Connection to api.wandb.ai timed out. (connect timeout=20)'))