2024-07-17 16:22:04,040 INFO    StreamThr :2116314 [internal.py:wandb_internal():85] W&B internal server running at pid: 2116314, started at: 2024-07-17 16:22:04.039368
2024-07-17 16:22:04,043 DEBUG   HandlerThread:2116314 [handler.py:handle_request():158] handle_request: status
2024-07-17 16:22:04,046 INFO    WriterThread:2116314 [datastore.py:open_for_write():87] open: /home/cc/polymorph/fine-tuning/results/train-lora/5/loras/19/wandb/run-20240717_162204-wglu07sk/run-wglu07sk.wandb
2024-07-17 16:22:04,050 DEBUG   SenderThread:2116314 [sender.py:send():379] send: header
2024-07-17 16:22:04,051 DEBUG   SenderThread:2116314 [sender.py:send():379] send: run
2024-07-17 16:22:04,205 INFO    SenderThread:2116314 [dir_watcher.py:__init__():211] watching files in: /home/cc/polymorph/fine-tuning/results/train-lora/5/loras/19/wandb/run-20240717_162204-wglu07sk/files
2024-07-17 16:22:04,205 INFO    SenderThread:2116314 [sender.py:_start_run_threads():1188] run started: wglu07sk with start time 1721233324.03891
2024-07-17 16:22:04,218 DEBUG   HandlerThread:2116314 [handler.py:handle_request():158] handle_request: check_version
2024-07-17 16:22:04,219 DEBUG   SenderThread:2116314 [sender.py:send_request():406] send_request: check_version
2024-07-17 16:22:04,281 DEBUG   HandlerThread:2116314 [handler.py:handle_request():158] handle_request: run_start
2024-07-17 16:22:04,332 DEBUG   HandlerThread:2116314 [system_info.py:__init__():26] System info init
2024-07-17 16:22:04,332 DEBUG   HandlerThread:2116314 [system_info.py:__init__():41] System info init done
2024-07-17 16:22:04,332 INFO    HandlerThread:2116314 [system_monitor.py:start():194] Starting system monitor
2024-07-17 16:22:04,332 INFO    SystemMonitor:2116314 [system_monitor.py:_start():158] Starting system asset monitoring threads
2024-07-17 16:22:04,332 INFO    HandlerThread:2116314 [system_monitor.py:probe():214] Collecting system info
2024-07-17 16:22:04,333 INFO    SystemMonitor:2116314 [interfaces.py:start():188] Started cpu monitoring
2024-07-17 16:22:04,335 INFO    SystemMonitor:2116314 [interfaces.py:start():188] Started disk monitoring
2024-07-17 16:22:04,336 INFO    SystemMonitor:2116314 [interfaces.py:start():188] Started gpu monitoring
2024-07-17 16:22:04,337 INFO    SystemMonitor:2116314 [interfaces.py:start():188] Started memory monitoring
2024-07-17 16:22:04,338 INFO    SystemMonitor:2116314 [interfaces.py:start():188] Started network monitoring
2024-07-17 16:22:04,400 DEBUG   HandlerThread:2116314 [system_info.py:probe():152] Probing system
2024-07-17 16:22:04,405 DEBUG   HandlerThread:2116314 [system_info.py:_probe_git():137] Probing git
2024-07-17 16:22:04,416 DEBUG   HandlerThread:2116314 [system_info.py:_probe_git():145] Probing git done
2024-07-17 16:22:04,416 DEBUG   HandlerThread:2116314 [system_info.py:probe():200] Probing system done
2024-07-17 16:22:04,416 DEBUG   HandlerThread:2116314 [system_monitor.py:probe():223] {'os': 'Linux-5.15.0-101-generic-x86_64-with-glibc2.35', 'python': '3.11.9', 'heartbeatAt': '2024-07-17T16:22:04.400954', 'startedAt': '2024-07-17T16:22:04.032523', 'docker': None, 'cuda': None, 'args': (), 'state': 'running', 'program': '/home/cc/polymorph/fine-tuning/main-lora-train.py', 'codePathLocal': 'main-lora-train.py', 'codePath': 'fine-tuning/main-lora-train.py', 'git': {'remote': 'https://github.com/inference-serving/polymorph.git', 'commit': 'e84189a37f0838a7e4ac1496b2345fe84c6a7683'}, 'email': 's.ghafouri@qub.ac.uk', 'root': '/home/cc/polymorph', 'host': 'gpu', 'username': 'cc', 'executable': '/home/cc/miniconda3/envs/vision/bin/python', 'cpu_count': 24, 'cpu_count_logical': 48, 'cpu_freq': {'current': 2576.3446041666666, 'min': 1000.0, 'max': 3700.0}, 'cpu_freq_per_core': [{'current': 2600.0, 'min': 1000.0, 'max': 3700.0}, {'current': 2600.0, 'min': 1000.0, 'max': 3700.0}, {'current': 2600.0, 'min': 1000.0, 'max': 3700.0}, {'current': 2600.0, 'min': 1000.0, 'max': 3700.0}, {'current': 2600.0, 'min': 1000.0, 'max': 3700.0}, {'current': 2600.0, 'min': 1000.0, 'max': 3700.0}, {'current': 2600.0, 'min': 1000.0, 'max': 3700.0}, {'current': 2600.0, 'min': 1000.0, 'max': 3700.0}, {'current': 2600.0, 'min': 1000.0, 'max': 3700.0}, {'current': 2600.0, 'min': 1000.0, 'max': 3700.0}, {'current': 2600.0, 'min': 1000.0, 'max': 3700.0}, {'current': 2600.0, 'min': 1000.0, 'max': 3700.0}, {'current': 2600.0, 'min': 1000.0, 'max': 3700.0}, {'current': 2600.0, 'min': 1000.0, 'max': 3700.0}, {'current': 2600.0, 'min': 1000.0, 'max': 3700.0}, {'current': 2600.0, 'min': 1000.0, 'max': 3700.0}, {'current': 2600.0, 'min': 1000.0, 'max': 3700.0}, {'current': 2600.0, 'min': 1000.0, 'max': 3700.0}, {'current': 2600.0, 'min': 1000.0, 'max': 3700.0}, {'current': 2600.0, 'min': 1000.0, 'max': 3700.0}, {'current': 2600.0, 'min': 1000.0, 'max': 3700.0}, {'current': 2600.0, 'min': 1000.0, 'max': 
3700.0}, {'current': 2600.0, 'min': 1000.0, 'max': 3700.0}, {'current': 2600.0, 'min': 1000.0, 'max': 3700.0}, {'current': 2600.0, 'min': 1000.0, 'max': 3700.0}, {'current': 2600.0, 'min': 1000.0, 'max': 3700.0}, {'current': 2600.0, 'min': 1000.0, 'max': 3700.0}, {'current': 2600.0, 'min': 1000.0, 'max': 3700.0}, {'current': 2600.0, 'min': 1000.0, 'max': 3700.0}, {'current': 2600.0, 'min': 1000.0, 'max': 3700.0}, {'current': 2600.0, 'min': 1000.0, 'max': 3700.0}, {'current': 2600.0, 'min': 1000.0, 'max': 3700.0}, {'current': 2600.0, 'min': 1000.0, 'max': 3700.0}, {'current': 2600.0, 'min': 1000.0, 'max': 3700.0}, {'current': 2600.0, 'min': 1000.0, 'max': 3700.0}, {'current': 2600.0, 'min': 1000.0, 'max': 3700.0}, {'current': 2600.0, 'min': 1000.0, 'max': 3700.0}, {'current': 2600.0, 'min': 1000.0, 'max': 3700.0}, {'current': 2600.0, 'min': 1000.0, 'max': 3700.0}, {'current': 2600.0, 'min': 1000.0, 'max': 3700.0}, {'current': 2600.0, 'min': 1000.0, 'max': 3700.0}, {'current': 2600.0, 'min': 1000.0, 'max': 3700.0}, {'current': 2600.0, 'min': 1000.0, 'max': 3700.0}, {'current': 2600.0, 'min': 1000.0, 'max': 3700.0}, {'current': 2600.0, 'min': 1000.0, 'max': 3700.0}, {'current': 1401.746, 'min': 1000.0, 'max': 3700.0}, {'current': 2600.0, 'min': 1000.0, 'max': 3700.0}, {'current': 2600.0, 'min': 1000.0, 'max': 3700.0}], 'disk': {'/': {'total': 208.95753479003906, 'used': 157.59302139282227}}, 'gpu': 'Quadro RTX 6000', 'gpu_count': 1, 'gpu_devices': [{'name': 'Quadro RTX 6000', 'memory_total': 25769803776}], 'memory': {'total': 187.4629783630371}}
2024-07-17 16:22:04,417 INFO    HandlerThread:2116314 [system_monitor.py:probe():224] Finished collecting system info
2024-07-17 16:22:04,417 INFO    HandlerThread:2116314 [system_monitor.py:probe():227] Publishing system info
2024-07-17 16:22:04,417 DEBUG   HandlerThread:2116314 [system_info.py:_save_conda():209] Saving list of conda packages installed into the current environment
2024-07-17 16:22:05,208 INFO    Thread-12 :2116314 [dir_watcher.py:_on_file_created():271] file/dir created: /home/cc/polymorph/fine-tuning/results/train-lora/5/loras/19/wandb/run-20240717_162204-wglu07sk/files/conda-environment.yaml
2024-07-17 16:22:07,942 DEBUG   HandlerThread:2116314 [system_info.py:_save_conda():224] Saving conda packages done
2024-07-17 16:22:07,943 INFO    HandlerThread:2116314 [system_monitor.py:probe():229] Finished publishing system info
2024-07-17 16:22:07,953 DEBUG   SenderThread:2116314 [sender.py:send():379] send: files
2024-07-17 16:22:07,953 INFO    SenderThread:2116314 [sender.py:_save_file():1454] saving file wandb-metadata.json with policy now
2024-07-17 16:22:08,106 DEBUG   HandlerThread:2116314 [handler.py:handle_request():158] handle_request: python_packages
2024-07-17 16:22:08,106 DEBUG   SenderThread:2116314 [sender.py:send_request():406] send_request: python_packages
2024-07-17 16:22:08,107 DEBUG   HandlerThread:2116314 [handler.py:handle_request():158] handle_request: stop_status
2024-07-17 16:22:08,108 DEBUG   SenderThread:2116314 [sender.py:send_request():406] send_request: stop_status
2024-07-17 16:22:08,112 DEBUG   HandlerThread:2116314 [handler.py:handle_request():158] handle_request: internal_messages
2024-07-17 16:22:08,159 DEBUG   SenderThread:2116314 [sender.py:send():379] send: telemetry
2024-07-17 16:22:08,159 DEBUG   SenderThread:2116314 [sender.py:send():379] send: config
2024-07-17 16:22:08,161 DEBUG   SenderThread:2116314 [sender.py:send():379] send: telemetry
2024-07-17 16:22:08,162 DEBUG   SenderThread:2116314 [sender.py:send():379] send: metric
2024-07-17 16:22:08,162 DEBUG   SenderThread:2116314 [sender.py:send():379] send: telemetry
2024-07-17 16:22:08,162 DEBUG   SenderThread:2116314 [sender.py:send():379] send: metric
2024-07-17 16:22:08,162 WARNING SenderThread:2116314 [sender.py:send_metric():1405] Seen metric with glob (shouldn't happen)
2024-07-17 16:22:08,162 DEBUG   SenderThread:2116314 [sender.py:send():379] send: telemetry
2024-07-17 16:22:08,163 DEBUG   SenderThread:2116314 [sender.py:send():379] send: telemetry
2024-07-17 16:22:08,163 DEBUG   SenderThread:2116314 [sender.py:send():379] send: config
2024-07-17 16:22:08,206 INFO    Thread-12 :2116314 [dir_watcher.py:_on_file_modified():288] file/dir modified: /home/cc/polymorph/fine-tuning/results/train-lora/5/loras/19/wandb/run-20240717_162204-wglu07sk/files/conda-environment.yaml
2024-07-17 16:22:08,207 INFO    Thread-12 :2116314 [dir_watcher.py:_on_file_created():271] file/dir created: /home/cc/polymorph/fine-tuning/results/train-lora/5/loras/19/wandb/run-20240717_162204-wglu07sk/files/requirements.txt
2024-07-17 16:22:08,207 INFO    Thread-12 :2116314 [dir_watcher.py:_on_file_created():271] file/dir created: /home/cc/polymorph/fine-tuning/results/train-lora/5/loras/19/wandb/run-20240717_162204-wglu07sk/files/wandb-metadata.json
2024-07-17 16:22:08,216 DEBUG   HandlerThread:2116314 [handler.py:handle_request():158] handle_request: log_artifact
2024-07-17 16:22:08,216 DEBUG   SenderThread:2116314 [sender.py:send_request():406] send_request: log_artifact
2024-07-17 16:22:08,259 INFO    wandb-upload_0:2116314 [upload_job.py:push():130] Uploaded file /tmp/tmp6xd4bkanwandb/g8dj0aj4-wandb-metadata.json
2024-07-17 16:22:08,735 INFO    wandb-upload_0:2116314 [upload_job.py:push():88] Uploaded file /tmp/tmpyvr0ja9x/model_architecture.txt
2024-07-17 16:22:09,108 DEBUG   HandlerThread:2116314 [handler.py:handle_request():158] handle_request: internal_messages
2024-07-17 16:22:09,159 INFO    SenderThread:2116314 [sender.py:send_request_log_artifact():1518] logged artifact model-wglu07sk - {'id': 'QXJ0aWZhY3Q6OTkxMzgwOTg0', 'state': 'PENDING', 'artifactSequence': {'id': 'QXJ0aWZhY3RDb2xsZWN0aW9uOjI4MjA3NTAwNA==', 'latestArtifact': None}}
2024-07-17 16:22:09,160 DEBUG   HandlerThread:2116314 [handler.py:handle_request():158] handle_request: status_report
2024-07-17 16:22:09,207 INFO    Thread-12 :2116314 [dir_watcher.py:_on_file_created():271] file/dir created: /home/cc/polymorph/fine-tuning/results/train-lora/5/loras/19/wandb/run-20240717_162204-wglu07sk/files/output.log
2024-07-17 16:22:10,107 DEBUG   HandlerThread:2116314 [handler.py:handle_request():158] handle_request: internal_messages
2024-07-17 16:22:11,108 DEBUG   HandlerThread:2116314 [handler.py:handle_request():158] handle_request: internal_messages
2024-07-17 16:22:11,208 INFO    Thread-12 :2116314 [dir_watcher.py:_on_file_modified():288] file/dir modified: /home/cc/polymorph/fine-tuning/results/train-lora/5/loras/19/wandb/run-20240717_162204-wglu07sk/files/output.log
2024-07-17 16:22:12,108 DEBUG   HandlerThread:2116314 [handler.py:handle_request():158] handle_request: internal_messages
2024-07-17 16:22:13,108 DEBUG   HandlerThread:2116314 [handler.py:handle_request():158] handle_request: internal_messages
2024-07-17 16:22:13,209 INFO    Thread-12 :2116314 [dir_watcher.py:_on_file_modified():288] file/dir modified: /home/cc/polymorph/fine-tuning/results/train-lora/5/loras/19/wandb/run-20240717_162204-wglu07sk/files/output.log
2024-07-17 16:22:13,550 DEBUG   HandlerThread:2116314 [handler.py:handle_request():158] handle_request: partial_history
2024-07-17 16:22:13,554 DEBUG   SenderThread:2116314 [sender.py:send():379] send: metric
2024-07-17 16:22:13,556 DEBUG   SenderThread:2116314 [sender.py:send():379] send: metric
2024-07-17 16:22:13,556 DEBUG   SenderThread:2116314 [sender.py:send():379] send: metric
2024-07-17 16:22:13,557 DEBUG   SenderThread:2116314 [sender.py:send():379] send: metric
2024-07-17 16:22:13,557 DEBUG   SenderThread:2116314 [sender.py:send():379] send: metric
2024-07-17 16:22:13,557 DEBUG   SenderThread:2116314 [sender.py:send():379] send: metric
2024-07-17 16:22:13,557 DEBUG   SenderThread:2116314 [sender.py:send():379] send: history
2024-07-17 16:22:13,557 DEBUG   SenderThread:2116314 [sender.py:send_request():406] send_request: summary_record
2024-07-17 16:22:13,558 INFO    SenderThread:2116314 [sender.py:_save_file():1454] saving file wandb-summary.json with policy end
2024-07-17 16:22:14,108 DEBUG   HandlerThread:2116314 [handler.py:handle_request():158] handle_request: internal_messages
2024-07-17 16:22:14,209 INFO    Thread-12 :2116314 [dir_watcher.py:_on_file_created():271] file/dir created: /home/cc/polymorph/fine-tuning/results/train-lora/5/loras/19/wandb/run-20240717_162204-wglu07sk/files/wandb-summary.json
2024-07-17 16:22:14,560 DEBUG   HandlerThread:2116314 [handler.py:handle_request():158] handle_request: status_report
2024-07-17 16:22:15,108 DEBUG   HandlerThread:2116314 [handler.py:handle_request():158] handle_request: internal_messages
2024-07-17 16:22:15,210 INFO    Thread-12 :2116314 [dir_watcher.py:_on_file_modified():288] file/dir modified: /home/cc/polymorph/fine-tuning/results/train-lora/5/loras/19/wandb/run-20240717_162204-wglu07sk/files/output.log
2024-07-17 16:22:16,109 DEBUG   HandlerThread:2116314 [handler.py:handle_request():158] handle_request: internal_messages
2024-07-17 16:22:16,210 INFO    Thread-12 :2116314 [dir_watcher.py:_on_file_modified():288] file/dir modified: /home/cc/polymorph/fine-tuning/results/train-lora/5/loras/19/wandb/run-20240717_162204-wglu07sk/files/output.log
2024-07-17 16:22:17,109 DEBUG   HandlerThread:2116314 [handler.py:handle_request():158] handle_request: internal_messages
2024-07-17 16:22:17,212 INFO    Thread-12 :2116314 [dir_watcher.py:_on_file_modified():288] file/dir modified: /home/cc/polymorph/fine-tuning/results/train-lora/5/loras/19/wandb/run-20240717_162204-wglu07sk/files/output.log
2024-07-17 16:22:18,109 DEBUG   HandlerThread:2116314 [handler.py:handle_request():158] handle_request: internal_messages
2024-07-17 16:22:18,774 DEBUG   HandlerThread:2116314 [handler.py:handle_request():158] handle_request: partial_history
2024-07-17 16:22:18,776 DEBUG   SenderThread:2116314 [sender.py:send():379] send: history
2024-07-17 16:22:18,777 DEBUG   SenderThread:2116314 [sender.py:send_request():406] send_request: summary_record
2024-07-17 16:22:18,780 INFO    SenderThread:2116314 [sender.py:_save_file():1454] saving file wandb-summary.json with policy end
2024-07-17 16:22:19,109 DEBUG   HandlerThread:2116314 [handler.py:handle_request():158] handle_request: internal_messages
2024-07-17 16:22:19,212 INFO    Thread-12 :2116314 [dir_watcher.py:_on_file_modified():288] file/dir modified: /home/cc/polymorph/fine-tuning/results/train-lora/5/loras/19/wandb/run-20240717_162204-wglu07sk/files/wandb-summary.json
2024-07-17 16:22:19,212 INFO    Thread-12 :2116314 [dir_watcher.py:_on_file_modified():288] file/dir modified: /home/cc/polymorph/fine-tuning/results/train-lora/5/loras/19/wandb/run-20240717_162204-wglu07sk/files/output.log
2024-07-17 16:22:19,781 DEBUG   HandlerThread:2116314 [handler.py:handle_request():158] handle_request: status_report
2024-07-17 16:22:20,109 DEBUG   HandlerThread:2116314 [handler.py:handle_request():158] handle_request: internal_messages
2024-07-17 16:22:20,213 INFO    Thread-12 :2116314 [dir_watcher.py:_on_file_modified():288] file/dir modified: /home/cc/polymorph/fine-tuning/results/train-lora/5/loras/19/wandb/run-20240717_162204-wglu07sk/files/output.log
2024-07-17 16:22:21,109 DEBUG   HandlerThread:2116314 [handler.py:handle_request():158] handle_request: internal_messages
2024-07-17 16:22:21,213 INFO    Thread-12 :2116314 [dir_watcher.py:_on_file_modified():288] file/dir modified: /home/cc/polymorph/fine-tuning/results/train-lora/5/loras/19/wandb/run-20240717_162204-wglu07sk/files/output.log
2024-07-17 16:22:22,109 DEBUG   HandlerThread:2116314 [handler.py:handle_request():158] handle_request: internal_messages
2024-07-17 16:22:23,106 DEBUG   HandlerThread:2116314 [handler.py:handle_request():158] handle_request: stop_status
2024-07-17 16:22:23,107 DEBUG   SenderThread:2116314 [sender.py:send_request():406] send_request: stop_status
2024-07-17 16:22:23,110 DEBUG   HandlerThread:2116314 [handler.py:handle_request():158] handle_request: internal_messages