[2026-03-24 17:43:34,672][mllm.models.large_language_model_local][INFO] - Initializing adapter 'agent_adapter': no initial weights provided or found; starting from scratch.
[2026-03-24 17:43:35,468][mllm.models.adapter_training_wrapper][INFO] - Adapter 'agent_adapter': initialized with fresh weights (no initial weights found).
[2026-03-24 17:43:35,474][mllm.models.large_language_model_local][INFO] - Initializing adapter 'critic_adapter': no initial weights provided or found; starting from scratch.
[2026-03-24 17:43:36,290][mllm.models.adapter_training_wrapper][INFO] - Adapter 'critic_adapter': initialized with fresh weights (no initial weights found).
[2026-03-24 17:46:27,033][__main__][INFO] - Starting iteration 0.
[2026-03-24 17:46:27,040][__main__][INFO] - Inference policies count is regular policies 2 and buffer policies 0 and human policies 1.
[2026-03-24 17:46:27,040][__main__][INFO] - Hard coded buffer agents are set to False with prob 0
[2026-03-24 17:46:36,051][__main__][INFO] - Number of regex retries in iteration 0: 0
[2026-03-24 17:46:36,052][__main__][INFO] - agents played in iteration 0 are Bob, Alice_buffer, Alice, Bob_buffer
[2026-03-24 17:48:02,370][mllm.training.trainer_ad_align][INFO] - For task: Get advantages with critic gradient accumulation, ΔVRAM % (total): 0.00%, Current % of VRAM taken: 37.33%, Block Peak % of device VRAM: 18.62%, ΔTime: 00:01:25