Layers that will be compiled: group_conv_0 single_conv_0 group_conv_1 single_conv_1 group_pre_2 group_post_2 single_pre_2 single_post_2 group_conv_3 single_conv_3 group_conv_4 single_conv_4 group_pre_5 group_post_5 single_pre_5 single_post_5 group_conv_6 single_conv_6 group_conv_7 single_conv_7 group_pre_8 group_post_8 single_pre_8 single_post_8 group_conv_9 single_conv_9 group_pre_10 group_post_10 single_pre_10 single_post_10 group_conv_11 single_conv_11 group_pre_12 group_post_12 single_pre_12 single_post_12 group_conv_13 single_conv_13 group_pre_14 group_post_14 single_pre_14 single_post_14 group_conv_15 single_conv_15 group_cache_0 group_cache_128 group_cache_256 group_cache_384 group_cache_512 group_cache_640 group_cache_768 group_cache_896 group_cache_1024 group_cache_1152 group_cache_1280 group_cache_1408 group_cache_1536 group_cache_1664 group_cache_1792 group_cache_1920 single_cache_127 single_cache_255 single_cache_383 single_cache_511 single_cache_639 single_cache_767 single_cache_895 single_cache_1023 single_cache_1151 single_cache_1279 single_cache_1407 single_cache_1535 single_cache_1663 single_cache_1791 single_cache_1919 single_cache_2047 conv_post_final_15
2026-03-07 10:07:36,108 - sima_lmm.model.vision_language_model - INFO - Generating FileGenMode.DEVKIT files...
Generated all mode=DEVKIT files
2026-03-07 10:07:39,307 - sima_lmm.model.vision_language_model - INFO - Generating FileGenMode.SOURCE_TO_ONNX files...
2026-03-07 10:09:00,717 - sima_lmm.model.vision_language_model - INFO - FileGenMode.SOURCE_TO_ONNX files generation completed.
Generated all mode=SOURCE_TO_ONNX files
2026-03-07 10:09:00,717 - sima_lmm.model.vision_language_model - INFO - Generating FileGenMode.ONNX_TO_QUANT files...
2026-03-07 10:09:26,301 - afe.apis.loaded_net - INFO - Loading ['CompiledModels/LFM2.5-1.2B-Instruct/onnx_files/LFM2.5-1.2B-Instruct_language_n1_layer0_conv.onnx'] in onnx format
2026-03-07 10:09:26,301 - afe.apis.loaded_net - INFO - Loading ['CompiledModels/LFM2.5-1.2B-Instruct/onnx_files/LFM2.5-1.2B-Instruct_language_n128_layer0_conv.onnx'] in onnx format
2026-03-07 10:09:26,301 - afe.apis.loaded_net - INFO - Loading ['CompiledModels/LFM2.5-1.2B-Instruct/onnx_files/LFM2.5-1.2B-Instruct_language_n128_layer1_conv.onnx'] in onnx format
2026-03-07 10:09:26,302 - afe.apis.loaded_net - INFO - Loading ['CompiledModels/LFM2.5-1.2B-Instruct/onnx_files/LFM2.5-1.2B-Instruct_language_n1_layer1_conv.onnx'] in onnx format
2026-03-07 10:09:26,325 - afe.apis.loaded_net - INFO - Loading ['CompiledModels/LFM2.5-1.2B-Instruct/onnx_files/LFM2.5-1.2B-Instruct_language_n128_pre_layer2.onnx'] in onnx format
2026-03-07 10:09:27,204 - afe.apis.loaded_net - INFO - Loading ['CompiledModels/LFM2.5-1.2B-Instruct/onnx_files/LFM2.5-1.2B-Instruct_language_n128_post_layer2.onnx'] in onnx format
2026-03-07 10:09:27,525 - afe.apis.loaded_net - INFO - Quantize loaded net, layout = NCHW, arm_only = False
2026-03-07 10:09:27,525 - afe.apis.loaded_net - INFO - Calibration method = mse
2026-03-07 10:09:29,015 - afe.apis.loaded_net - INFO - Quantize loaded net, layout = NCHW, arm_only = False
2026-03-07 10:09:29,015 - afe.apis.loaded_net - INFO - Calibration method = mse
2026-03-07 10:09:29,061 - afe.apis.loaded_net - INFO - Quantize loaded net, layout = NCHW, arm_only = False
2026-03-07 10:09:29,061 - afe.apis.loaded_net - INFO - Calibration method = mse
2026-03-07 10:09:29,240 - afe.apis.loaded_net - INFO - Quantize loaded net, layout = NCHW, arm_only = False
2026-03-07 10:09:29,240 - afe.apis.loaded_net - INFO - Calibration method = mse
2026-03-07 10:09:29,248 - afe.apis.loaded_net - INFO - Quantize loaded net, layout = NCHW, arm_only = False
2026-03-07 10:09:29,248 - afe.apis.loaded_net - INFO - Calibration method = mse
2026-03-07 10:09:29,260 - afe.apis.loaded_net - INFO - Quantize loaded net, layout = NCHW, arm_only = False
2026-03-07 10:09:29,261 - afe.apis.loaded_net - INFO - Calibration method = mse
2026-03-07 10:09:33,464 - afe.ir.transform.calibration_transforms - INFO - Running Calibration ...
2026-03-07 10:09:35,112 - afe.ir.transform.calibration_transforms - INFO - Running Calibration ...
2026-03-07 10:09:35,200 - afe.ir.transform.calibration_transforms - INFO - Running Calibration ...
2026-03-07 10:09:35,519 - afe.ir.serializer.api - INFO - Saved model: CompiledModels/LFM2.5-1.2B-Instruct/sima_files/sdk/LFM2.5-1.2B-Instruct_language_n128_pre_layer2.sima
2026-03-07 10:09:35,525 - afe.apis.loaded_net - INFO - Loading ['CompiledModels/LFM2.5-1.2B-Instruct/onnx_files/LFM2.5-1.2B-Instruct_language_n1_pre_layer2.onnx'] in onnx format
2026-03-07 10:09:36,701 - afe.ir.transform.calibration_transforms - INFO - Running Calibration ...
2026-03-07 10:09:36,739 - afe.apis.loaded_net - INFO - Quantize loaded net, layout = NCHW, arm_only = False
2026-03-07 10:09:36,740 - afe.apis.loaded_net - INFO - Calibration method = mse
2026-03-07 10:09:38,508 - afe.ir.transform.calibration_transforms - INFO - Running Calibration ...
2026-03-07 10:09:38,518 - afe.ir.transform.calibration_transforms - INFO - Running Calibration ...
2026-03-07 10:09:42,435 - afe.ir.serializer.api - INFO - Saved model: CompiledModels/LFM2.5-1.2B-Instruct/sima_files/sdk/LFM2.5-1.2B-Instruct_language_n128_post_layer2.sima
2026-03-07 10:09:42,442 - afe.apis.loaded_net - INFO - Loading ['CompiledModels/LFM2.5-1.2B-Instruct/onnx_files/LFM2.5-1.2B-Instruct_language_n1_post_layer2.onnx'] in onnx format
2026-03-07 10:09:43,984 - afe.apis.loaded_net - INFO - Quantize loaded net, layout = NCHW, arm_only = False
2026-03-07 10:09:43,984 - afe.apis.loaded_net - INFO - Calibration method = mse
2026-03-07 10:09:44,558 - afe.ir.transform.calibration_transforms - INFO - Running Calibration ...
2026-03-07 10:09:44,858 - afe.ir.serializer.api - INFO - Saved model: CompiledModels/LFM2.5-1.2B-Instruct/sima_files/sdk/LFM2.5-1.2B-Instruct_language_n1_layer0_conv.sima
2026-03-07 10:09:44,908 - afe.apis.loaded_net - INFO - Loading ['CompiledModels/LFM2.5-1.2B-Instruct/onnx_files/LFM2.5-1.2B-Instruct_language_n128_layer3_conv.onnx'] in onnx format
2026-03-07 10:09:44,948 - afe.ir.serializer.api - INFO - Saved model: CompiledModels/LFM2.5-1.2B-Instruct/sima_files/sdk/LFM2.5-1.2B-Instruct_language_n1_layer1_conv.sima
2026-03-07 10:09:45,001 - afe.apis.loaded_net - INFO - Loading ['CompiledModels/LFM2.5-1.2B-Instruct/onnx_files/LFM2.5-1.2B-Instruct_language_n1_layer3_conv.onnx'] in onnx format
2026-03-07 10:09:45,786 - afe.ir.serializer.api - INFO - Saved model: CompiledModels/LFM2.5-1.2B-Instruct/sima_files/sdk/LFM2.5-1.2B-Instruct_language_n128_layer0_conv.sima
2026-03-07 10:09:45,793 - afe.apis.loaded_net - INFO - Loading ['CompiledModels/LFM2.5-1.2B-Instruct/onnx_files/LFM2.5-1.2B-Instruct_language_n128_layer4_conv.onnx'] in onnx format
2026-03-07 10:09:45,864 - afe.ir.serializer.api - INFO - Saved model: CompiledModels/LFM2.5-1.2B-Instruct/sima_files/sdk/LFM2.5-1.2B-Instruct_language_n1_pre_layer2.sima
2026-03-07 10:09:45,878 - afe.apis.loaded_net - INFO - Loading ['CompiledModels/LFM2.5-1.2B-Instruct/onnx_files/LFM2.5-1.2B-Instruct_language_n1_layer4_conv.onnx'] in onnx format
2026-03-07 10:09:45,952 - afe.ir.serializer.api - INFO - Saved model: CompiledModels/LFM2.5-1.2B-Instruct/sima_files/sdk/LFM2.5-1.2B-Instruct_language_n128_layer1_conv.sima
2026-03-07 10:09:45,961 - afe.apis.loaded_net - INFO - Loading ['CompiledModels/LFM2.5-1.2B-Instruct/onnx_files/LFM2.5-1.2B-Instruct_language_n128_pre_layer5.onnx'] in onnx format
2026-03-07 10:09:47,021 - afe.apis.loaded_net - INFO - Quantize loaded net, layout = NCHW, arm_only = False
2026-03-07 10:09:47,022 - afe.apis.loaded_net - INFO - Calibration method = mse
2026-03-07 10:09:47,400 - afe.apis.loaded_net - INFO - Quantize loaded net, layout = NCHW, arm_only = False
2026-03-07 10:09:47,400 - afe.apis.loaded_net - INFO - Calibration method = mse
2026-03-07 10:09:47,420 - afe.apis.loaded_net - INFO - Quantize loaded net, layout = NCHW, arm_only = False
2026-03-07 10:09:47,420 - afe.apis.loaded_net - INFO - Calibration method = mse
2026-03-07 10:09:48,146 - afe.ir.transform.calibration_transforms - INFO - Running Calibration ...
2026-03-07 10:09:48,355 - afe.apis.loaded_net - INFO - Quantize loaded net, layout = NCHW, arm_only = False
2026-03-07 10:09:48,356 - afe.apis.loaded_net - INFO - Calibration method = mse
2026-03-07 10:09:48,377 - afe.apis.loaded_net - INFO - Quantize loaded net, layout = NCHW, arm_only = False
2026-03-07 10:09:48,377 - afe.apis.loaded_net - INFO - Calibration method = mse
2026-03-07 10:09:52,770 - afe.ir.transform.calibration_transforms - INFO - Running Calibration ...
2026-03-07 10:09:53,666 - afe.ir.transform.calibration_transforms - INFO - Running Calibration ...
2026-03-07 10:09:54,113 - afe.ir.serializer.api - INFO - Saved model: CompiledModels/LFM2.5-1.2B-Instruct/sima_files/sdk/LFM2.5-1.2B-Instruct_language_n1_post_layer2.sima
2026-03-07 10:09:54,128 - afe.apis.loaded_net - INFO - Loading ['CompiledModels/LFM2.5-1.2B-Instruct/onnx_files/LFM2.5-1.2B-Instruct_language_n128_post_layer5.onnx'] in onnx format
2026-03-07 10:09:54,522 - afe.ir.transform.calibration_transforms - INFO - Running Calibration ...
2026-03-07 10:09:54,922 - afe.ir.serializer.api - INFO - Saved model: CompiledModels/LFM2.5-1.2B-Instruct/sima_files/sdk/LFM2.5-1.2B-Instruct_language_n128_pre_layer5.sima
2026-03-07 10:09:54,933 - afe.apis.loaded_net - INFO - Loading ['CompiledModels/LFM2.5-1.2B-Instruct/onnx_files/LFM2.5-1.2B-Instruct_language_n1_pre_layer5.onnx'] in onnx format
2026-03-07 10:09:55,594 - afe.apis.loaded_net - INFO - Quantize loaded net, layout = NCHW, arm_only = False
2026-03-07 10:09:55,594 - afe.apis.loaded_net - INFO - Calibration method = mse
2026-03-07 10:09:56,029 - afe.apis.loaded_net - INFO - Quantize loaded net, layout = NCHW, arm_only = False
2026-03-07 10:09:56,029 - afe.apis.loaded_net - INFO - Calibration method = mse
2026-03-07 10:09:57,263 - afe.ir.transform.calibration_transforms - INFO - Running Calibration ...
2026-03-07 10:09:57,265 - afe.ir.transform.calibration_transforms - INFO - Running Calibration ...
2026-03-07 10:10:03,230 - afe.ir.serializer.api - INFO - Saved model: CompiledModels/LFM2.5-1.2B-Instruct/sima_files/sdk/LFM2.5-1.2B-Instruct_language_n1_layer3_conv.sima
2026-03-07 10:10:03,260 - afe.apis.loaded_net - INFO - Loading ['CompiledModels/LFM2.5-1.2B-Instruct/onnx_files/LFM2.5-1.2B-Instruct_language_n1_post_layer5.onnx'] in onnx format
2026-03-07 10:10:03,697 - afe.ir.transform.calibration_transforms - INFO - Running Calibration ...
2026-03-07 10:10:03,967 - afe.ir.serializer.api - INFO - Saved model: CompiledModels/LFM2.5-1.2B-Instruct/sima_files/sdk/LFM2.5-1.2B-Instruct_language_n1_layer4_conv.sima
2026-03-07 10:10:04,031 - afe.apis.loaded_net - INFO - Loading ['CompiledModels/LFM2.5-1.2B-Instruct/onnx_files/LFM2.5-1.2B-Instruct_language_n128_layer6_conv.onnx'] in onnx format
2026-03-07 10:10:04,493 - afe.ir.serializer.api - INFO - Saved model: CompiledModels/LFM2.5-1.2B-Instruct/sima_files/sdk/LFM2.5-1.2B-Instruct_language_n128_layer4_conv.sima
2026-03-07 10:10:04,503 - afe.apis.loaded_net - INFO - Loading ['CompiledModels/LFM2.5-1.2B-Instruct/onnx_files/LFM2.5-1.2B-Instruct_language_n1_layer6_conv.onnx'] in onnx format
2026-03-07 10:10:04,745 - afe.ir.serializer.api - INFO - Saved model: CompiledModels/LFM2.5-1.2B-Instruct/sima_files/sdk/LFM2.5-1.2B-Instruct_language_n128_layer3_conv.sima
2026-03-07 10:10:04,759 - afe.apis.loaded_net - INFO - Loading ['CompiledModels/LFM2.5-1.2B-Instruct/onnx_files/LFM2.5-1.2B-Instruct_language_n128_layer7_conv.onnx'] in onnx format
2026-03-07 10:10:04,903 - afe.ir.transform.calibration_transforms - INFO - Running Calibration ...
2026-03-07 10:10:05,013 - afe.ir.serializer.api - INFO - Saved model: CompiledModels/LFM2.5-1.2B-Instruct/sima_files/sdk/LFM2.5-1.2B-Instruct_language_n1_pre_layer5.sima
2026-03-07 10:10:05,018 - afe.apis.loaded_net - INFO - Loading ['CompiledModels/LFM2.5-1.2B-Instruct/onnx_files/LFM2.5-1.2B-Instruct_language_n1_layer7_conv.onnx'] in onnx format
2026-03-07 10:10:05,082 - afe.apis.loaded_net - INFO - Quantize loaded net, layout = NCHW, arm_only = False
2026-03-07 10:10:05,082 - afe.apis.loaded_net - INFO - Calibration method = mse
2026-03-07 10:10:06,863 - afe.apis.loaded_net - INFO - Quantize loaded net, layout = NCHW, arm_only = False
2026-03-07 10:10:06,863 - afe.apis.loaded_net - INFO - Calibration method = mse
2026-03-07 10:10:06,948 - afe.apis.loaded_net - INFO - Quantize loaded net, layout = NCHW, arm_only = False
2026-03-07 10:10:06,948 - afe.apis.loaded_net - INFO - Calibration method = mse
2026-03-07 10:10:07,847 - afe.apis.loaded_net - INFO - Quantize loaded net, layout = NCHW, arm_only = False
2026-03-07 10:10:07,847 - afe.apis.loaded_net - INFO - Calibration method = mse
2026-03-07 10:10:08,003 - afe.apis.loaded_net - INFO - Quantize loaded net, layout = NCHW, arm_only = False
2026-03-07 10:10:08,003 - afe.apis.loaded_net - INFO - Calibration method = mse
2026-03-07 10:10:09,116 - afe.ir.serializer.api - INFO - Saved model: CompiledModels/LFM2.5-1.2B-Instruct/sima_files/sdk/LFM2.5-1.2B-Instruct_language_n128_post_layer5.sima
2026-03-07 10:10:09,119 - afe.apis.loaded_net - INFO - Loading ['CompiledModels/LFM2.5-1.2B-Instruct/onnx_files/LFM2.5-1.2B-Instruct_language_n128_pre_layer8.onnx'] in onnx format
2026-03-07 10:10:09,461 - afe.ir.transform.calibration_transforms - INFO - Running Calibration ...
2026-03-07 10:10:10,246 - afe.apis.loaded_net - INFO - Quantize loaded net, layout = NCHW, arm_only = False
2026-03-07 10:10:10,246 - afe.apis.loaded_net - INFO - Calibration method = mse
2026-03-07 10:10:12,654 - afe.ir.transform.calibration_transforms - INFO - Running Calibration ...
2026-03-07 10:10:14,020 - afe.ir.transform.calibration_transforms - INFO - Running Calibration ...
2026-03-07 10:10:14,788 - afe.ir.serializer.api - INFO - Saved model: CompiledModels/LFM2.5-1.2B-Instruct/sima_files/sdk/LFM2.5-1.2B-Instruct_language_n1_post_layer5.sima
2026-03-07 10:10:14,814 - afe.apis.loaded_net - INFO - Loading ['CompiledModels/LFM2.5-1.2B-Instruct/onnx_files/LFM2.5-1.2B-Instruct_language_n128_post_layer8.onnx'] in onnx format
2026-03-07 10:10:15,901 - afe.ir.transform.calibration_transforms - INFO - Running Calibration ...
2026-03-07 10:10:16,199 - afe.ir.transform.calibration_transforms - INFO - Running Calibration ...
2026-03-07 10:10:16,675 - afe.apis.loaded_net - INFO - Quantize loaded net, layout = NCHW, arm_only = False
2026-03-07 10:10:16,675 - afe.apis.loaded_net - INFO - Calibration method = mse
2026-03-07 10:10:16,838 - afe.ir.transform.calibration_transforms - INFO - Running Calibration ...
2026-03-07 10:10:19,211 - afe.ir.serializer.api - INFO - Saved model: CompiledModels/LFM2.5-1.2B-Instruct/sima_files/sdk/LFM2.5-1.2B-Instruct_language_n128_pre_layer8.sima
2026-03-07 10:10:19,216 - afe.apis.loaded_net - INFO - Loading ['CompiledModels/LFM2.5-1.2B-Instruct/onnx_files/LFM2.5-1.2B-Instruct_language_n1_pre_layer8.onnx'] in onnx format
2026-03-07 10:10:21,213 - afe.apis.loaded_net - INFO - Quantize loaded net, layout = NCHW, arm_only = False
2026-03-07 10:10:21,213 - afe.apis.loaded_net - INFO - Calibration method = mse
2026-03-07 10:10:22,233 - afe.ir.serializer.api - INFO - Saved model: CompiledModels/LFM2.5-1.2B-Instruct/sima_files/sdk/LFM2.5-1.2B-Instruct_language_n1_layer6_conv.sima
2026-03-07 10:10:22,254 - afe.apis.loaded_net - INFO - Loading ['CompiledModels/LFM2.5-1.2B-Instruct/onnx_files/LFM2.5-1.2B-Instruct_language_n1_post_layer8.onnx'] in onnx format
2026-03-07 10:10:23,321 - afe.ir.serializer.api - INFO - Saved model: CompiledModels/LFM2.5-1.2B-Instruct/sima_files/sdk/LFM2.5-1.2B-Instruct_language_n128_layer6_conv.sima
2026-03-07 10:10:23,337 - afe.apis.loaded_net - INFO - Loading ['CompiledModels/LFM2.5-1.2B-Instruct/onnx_files/LFM2.5-1.2B-Instruct_language_n128_layer9_conv.onnx'] in onnx format
2026-03-07 10:10:23,532 - afe.apis.loaded_net - INFO - Quantize loaded net, layout = NCHW, arm_only = False
2026-03-07 10:10:23,532 - afe.apis.loaded_net - INFO - Calibration method = mse
2026-03-07 10:10:23,948 - afe.ir.serializer.api - INFO - Saved model: CompiledModels/LFM2.5-1.2B-Instruct/sima_files/sdk/LFM2.5-1.2B-Instruct_language_n1_layer7_conv.sima
2026-03-07 10:10:23,974 - afe.apis.loaded_net - INFO - Loading ['CompiledModels/LFM2.5-1.2B-Instruct/onnx_files/LFM2.5-1.2B-Instruct_language_n1_layer9_conv.onnx'] in onnx format
2026-03-07 10:10:23,991 - afe.ir.serializer.api - INFO - Saved model: CompiledModels/LFM2.5-1.2B-Instruct/sima_files/sdk/LFM2.5-1.2B-Instruct_language_n128_layer7_conv.sima
2026-03-07 10:10:24,005 - afe.apis.loaded_net - INFO - Loading ['CompiledModels/LFM2.5-1.2B-Instruct/onnx_files/LFM2.5-1.2B-Instruct_language_n128_pre_layer10.onnx'] in onnx format
2026-03-07 10:10:25,028 - afe.apis.loaded_net - INFO - Quantize loaded net, layout = NCHW, arm_only = False
2026-03-07 10:10:25,028 - afe.apis.loaded_net - INFO - Calibration method = mse
2026-03-07 10:10:25,559 - afe.ir.transform.calibration_transforms - INFO - Running Calibration ...
2026-03-07 10:10:25,870 - afe.apis.loaded_net - INFO - Quantize loaded net, layout = NCHW, arm_only = False
2026-03-07 10:10:25,870 - afe.apis.loaded_net - INFO - Calibration method = mse
2026-03-07 10:10:26,429 - afe.apis.loaded_net - INFO - Quantize loaded net, layout = NCHW, arm_only = False
2026-03-07 10:10:26,429 - afe.apis.loaded_net - INFO - Calibration method = mse
2026-03-07 10:10:27,101 - afe.ir.transform.calibration_transforms - INFO - Running Calibration ...
2026-03-07 10:10:28,007 - afe.ir.transform.calibration_transforms - INFO - Running Calibration ...
2026-03-07 10:10:28,718 - afe.ir.serializer.api - INFO - Saved model: CompiledModels/LFM2.5-1.2B-Instruct/sima_files/sdk/LFM2.5-1.2B-Instruct_language_n1_pre_layer8.sima
2026-03-07 10:10:28,722 - afe.apis.loaded_net - INFO - Loading ['CompiledModels/LFM2.5-1.2B-Instruct/onnx_files/LFM2.5-1.2B-Instruct_language_n128_post_layer10.onnx'] in onnx format
2026-03-07 10:10:30,381 - afe.ir.serializer.api - INFO - Saved model: CompiledModels/LFM2.5-1.2B-Instruct/sima_files/sdk/LFM2.5-1.2B-Instruct_language_n128_post_layer8.sima
2026-03-07 10:10:30,390 - afe.apis.loaded_net - INFO - Loading ['CompiledModels/LFM2.5-1.2B-Instruct/onnx_files/LFM2.5-1.2B-Instruct_language_n1_pre_layer10.onnx'] in onnx format
2026-03-07 10:10:30,426 - afe.apis.loaded_net - INFO - Quantize loaded net, layout = NCHW, arm_only = False
2026-03-07 10:10:30,426 - afe.apis.loaded_net - INFO - Calibration method = mse
2026-03-07 10:10:31,399 - afe.ir.transform.calibration_transforms - INFO - Running Calibration ...
2026-03-07 10:10:31,489 - afe.apis.loaded_net - INFO - Quantize loaded net, layout = NCHW, arm_only = False
2026-03-07 10:10:31,489 - afe.apis.loaded_net - INFO - Calibration method = mse
2026-03-07 10:10:32,488 - afe.ir.transform.calibration_transforms - INFO - Running Calibration ...
2026-03-07 10:10:33,437 - afe.ir.serializer.api - INFO - Saved model: CompiledModels/LFM2.5-1.2B-Instruct/sima_files/sdk/LFM2.5-1.2B-Instruct_language_n128_pre_layer10.sima
2026-03-07 10:10:33,440 - afe.apis.loaded_net - INFO - Loading ['CompiledModels/LFM2.5-1.2B-Instruct/onnx_files/LFM2.5-1.2B-Instruct_language_n1_post_layer10.onnx'] in onnx format
2026-03-07 10:10:33,472 - afe.ir.serializer.api - INFO - Saved model: CompiledModels/LFM2.5-1.2B-Instruct/sima_files/sdk/LFM2.5-1.2B-Instruct_language_n1_post_layer8.sima
2026-03-07 10:10:33,502 - afe.apis.loaded_net - INFO - Loading ['CompiledModels/LFM2.5-1.2B-Instruct/onnx_files/LFM2.5-1.2B-Instruct_language_n128_layer11_conv.onnx'] in onnx format
2026-03-07 10:10:34,742 - afe.apis.loaded_net - INFO - Quantize loaded net, layout = NCHW, arm_only = False
2026-03-07 10:10:34,742 - afe.apis.loaded_net - INFO - Calibration method = mse
2026-03-07 10:10:35,710 - afe.apis.loaded_net - INFO - Quantize loaded net, layout = NCHW, arm_only = False
2026-03-07 10:10:35,710 - afe.apis.loaded_net - INFO - Calibration method = mse
2026-03-07 10:10:35,748 - afe.ir.transform.calibration_transforms - INFO - Running Calibration ...
2026-03-07 10:10:37,684 - afe.ir.transform.calibration_transforms - INFO - Running Calibration ...
2026-03-07 10:10:38,071 - afe.ir.transform.calibration_transforms - INFO - Running Calibration ...
2026-03-07 10:10:39,941 - afe.ir.transform.calibration_transforms - INFO - Running Calibration ...
2026-03-07 10:10:39,999 - afe.ir.serializer.api - INFO - Saved model: CompiledModels/LFM2.5-1.2B-Instruct/sima_files/sdk/LFM2.5-1.2B-Instruct_language_n1_pre_layer10.sima
2026-03-07 10:10:40,009 - afe.apis.loaded_net - INFO - Loading ['CompiledModels/LFM2.5-1.2B-Instruct/onnx_files/LFM2.5-1.2B-Instruct_language_n1_layer11_conv.onnx'] in onnx format
2026-03-07 10:10:41,755 - afe.ir.serializer.api - INFO - Saved model: CompiledModels/LFM2.5-1.2B-Instruct/sima_files/sdk/LFM2.5-1.2B-Instruct_language_n1_layer9_conv.sima
2026-03-07 10:10:41,787 - afe.apis.loaded_net - INFO - Loading ['CompiledModels/LFM2.5-1.2B-Instruct/onnx_files/LFM2.5-1.2B-Instruct_language_n128_pre_layer12.onnx'] in onnx format
2026-03-07 10:10:42,194 - afe.ir.serializer.api - INFO - Saved model: CompiledModels/LFM2.5-1.2B-Instruct/sima_files/sdk/LFM2.5-1.2B-Instruct_language_n128_layer9_conv.sima
2026-03-07 10:10:42,204 - afe.apis.loaded_net - INFO - Loading ['CompiledModels/LFM2.5-1.2B-Instruct/onnx_files/LFM2.5-1.2B-Instruct_language_n128_post_layer12.onnx'] in onnx format
2026-03-07 10:10:42,464 - afe.apis.loaded_net - INFO - Quantize loaded net, layout = NCHW, arm_only = False
2026-03-07 10:10:42,464 - afe.apis.loaded_net - INFO - Calibration method = mse
2026-03-07 10:10:42,870 - afe.apis.loaded_net - INFO - Quantize loaded net, layout = NCHW, arm_only = False
2026-03-07 10:10:42,871 - afe.apis.loaded_net - INFO - Calibration method = mse
2026-03-07 10:10:43,011 - afe.ir.serializer.api - INFO - Saved model: CompiledModels/LFM2.5-1.2B-Instruct/sima_files/sdk/LFM2.5-1.2B-Instruct_language_n128_post_layer10.sima
2026-03-07 10:10:43,038 - afe.apis.loaded_net - INFO - Loading ['CompiledModels/LFM2.5-1.2B-Instruct/onnx_files/LFM2.5-1.2B-Instruct_language_n1_pre_layer12.onnx'] in onnx format
2026-03-07 10:10:43,538 - afe.apis.loaded_net - INFO - Quantize loaded net, layout = NCHW, arm_only = False
2026-03-07 10:10:43,538 - afe.apis.loaded_net - INFO - Calibration method = mse
2026-03-07 10:10:44,175 - afe.apis.loaded_net - INFO - Quantize loaded net, layout = NCHW, arm_only = False
2026-03-07 10:10:44,175 - afe.apis.loaded_net - INFO - Calibration method = mse
2026-03-07 10:10:45,543 - afe.ir.serializer.api - INFO - Saved model: CompiledModels/LFM2.5-1.2B-Instruct/sima_files/sdk/LFM2.5-1.2B-Instruct_language_n1_post_layer10.sima
2026-03-07 10:10:45,553 - afe.apis.loaded_net - INFO - Loading ['CompiledModels/LFM2.5-1.2B-Instruct/onnx_files/LFM2.5-1.2B-Instruct_language_n1_post_layer12.onnx'] in onnx format
2026-03-07 10:10:46,342 - afe.ir.transform.calibration_transforms - INFO - Running Calibration ...
2026-03-07 10:10:47,138 - afe.apis.loaded_net - INFO - Quantize loaded net, layout = NCHW, arm_only = False
2026-03-07 10:10:47,138 - afe.apis.loaded_net - INFO - Calibration method = mse
2026-03-07 10:10:48,724 - afe.ir.transform.calibration_transforms - INFO - Running Calibration ...
2026-03-07 10:10:48,784 - afe.ir.transform.calibration_transforms - INFO - Running Calibration ...
2026-03-07 10:10:50,483 - afe.ir.transform.calibration_transforms - INFO - Running Calibration ...
2026-03-07 10:10:51,649 - afe.ir.serializer.api - INFO - Saved model: CompiledModels/LFM2.5-1.2B-Instruct/sima_files/sdk/LFM2.5-1.2B-Instruct_language_n128_pre_layer12.sima
2026-03-07 10:10:51,654 - afe.apis.loaded_net - INFO - Loading ['CompiledModels/LFM2.5-1.2B-Instruct/onnx_files/LFM2.5-1.2B-Instruct_language_n128_layer13_conv.onnx'] in onnx format
2026-03-07 10:10:51,902 - afe.ir.transform.calibration_transforms - INFO - Running Calibration ...
2026-03-07 10:10:52,064 - afe.ir.serializer.api - INFO - Saved model: CompiledModels/LFM2.5-1.2B-Instruct/sima_files/sdk/LFM2.5-1.2B-Instruct_language_n1_pre_layer12.sima
2026-03-07 10:10:52,074 - afe.apis.loaded_net - INFO - Loading ['CompiledModels/LFM2.5-1.2B-Instruct/onnx_files/LFM2.5-1.2B-Instruct_language_n1_layer13_conv.onnx'] in onnx format
2026-03-07 10:10:52,828 - afe.ir.transform.calibration_transforms - INFO - Running Calibration ...
2026-03-07 10:10:53,683 - afe.ir.serializer.api - INFO - Saved model: CompiledModels/LFM2.5-1.2B-Instruct/sima_files/sdk/LFM2.5-1.2B-Instruct_language_n128_layer11_conv.sima
2026-03-07 10:10:53,706 - afe.apis.loaded_net - INFO - Loading ['CompiledModels/LFM2.5-1.2B-Instruct/onnx_files/LFM2.5-1.2B-Instruct_language_n128_pre_layer14.onnx'] in onnx format
2026-03-07 10:10:54,729 - afe.apis.loaded_net - INFO - Quantize loaded net, layout = NCHW, arm_only = False
2026-03-07 10:10:54,730 - afe.apis.loaded_net - INFO - Calibration method = mse
2026-03-07 10:10:54,926 - afe.apis.loaded_net - INFO - Quantize loaded net, layout = NCHW, arm_only = False
2026-03-07 10:10:54,926 - afe.apis.loaded_net - INFO - Calibration method = mse
2026-03-07 10:10:55,094 - afe.apis.loaded_net - INFO - Quantize loaded net, layout = NCHW, arm_only = False
2026-03-07 10:10:55,094 - afe.apis.loaded_net - INFO - Calibration method = mse
2026-03-07 10:10:56,310 - afe.ir.serializer.api - INFO - Saved model: CompiledModels/LFM2.5-1.2B-Instruct/sima_files/sdk/LFM2.5-1.2B-Instruct_language_n128_post_layer12.sima
2026-03-07 10:10:56,315 - afe.apis.loaded_net - INFO - Loading ['CompiledModels/LFM2.5-1.2B-Instruct/onnx_files/LFM2.5-1.2B-Instruct_language_n128_post_layer14.onnx'] in onnx format
2026-03-07 10:10:57,484 - afe.ir.serializer.api - INFO - Saved model: CompiledModels/LFM2.5-1.2B-Instruct/sima_files/sdk/LFM2.5-1.2B-Instruct_language_n1_layer11_conv.sima
2026-03-07 10:10:57,525 - afe.apis.loaded_net - INFO - Loading ['CompiledModels/LFM2.5-1.2B-Instruct/onnx_files/LFM2.5-1.2B-Instruct_language_n1_pre_layer14.onnx'] in onnx format
2026-03-07 10:10:57,740 - afe.apis.loaded_net - INFO - Quantize loaded net, layout = NCHW, arm_only = False
2026-03-07 10:10:57,740 - afe.apis.loaded_net - INFO - Calibration method = mse
2026-03-07 10:10:58,548 - afe.apis.loaded_net - INFO - Quantize loaded net, layout = NCHW, arm_only = False
2026-03-07 10:10:58,548 - afe.apis.loaded_net - INFO - Calibration method = mse
2026-03-07 10:10:58,752 - afe.ir.serializer.api - INFO - Saved model: CompiledModels/LFM2.5-1.2B-Instruct/sima_files/sdk/LFM2.5-1.2B-Instruct_language_n1_post_layer12.sima
2026-03-07 10:10:58,777 - afe.apis.loaded_net - INFO - Loading ['CompiledModels/LFM2.5-1.2B-Instruct/onnx_files/LFM2.5-1.2B-Instruct_language_n1_post_layer14.onnx'] in onnx format
2026-03-07 10:11:00,380 - afe.apis.loaded_net - INFO - Quantize loaded net, layout = NCHW, arm_only = False
2026-03-07 10:11:00,381 - afe.apis.loaded_net - INFO - Calibration method = mse
2026-03-07 10:11:00,705 - afe.ir.transform.calibration_transforms - INFO - Running Calibration ...
2026-03-07 10:11:01,088 - afe.ir.transform.calibration_transforms - INFO - Running Calibration ...
2026-03-07 10:11:02,859 - afe.ir.serializer.api - INFO - Saved model: CompiledModels/LFM2.5-1.2B-Instruct/sima_files/sdk/LFM2.5-1.2B-Instruct_language_n128_pre_layer14.sima
2026-03-07 10:11:02,864 - afe.apis.loaded_net - INFO - Loading ['CompiledModels/LFM2.5-1.2B-Instruct/onnx_files/LFM2.5-1.2B-Instruct_language_n128_layer15_conv.onnx'] in onnx format
2026-03-07 10:11:03,373 - afe.apis.loaded_net - INFO - Quantize loaded net, layout = NCHW, arm_only = False
2026-03-07 10:11:03,374 - afe.apis.loaded_net - INFO - Calibration method = mse
2026-03-07 10:11:03,827 - afe.ir.transform.calibration_transforms - INFO - Running Calibration ...
2026-03-07 10:11:04,698 - afe.ir.transform.calibration_transforms - INFO - Running Calibration ...
2026-03-07 10:11:05,167 - afe.ir.transform.calibration_transforms - INFO - Running Calibration ...
2026-03-07 10:11:05,323 - afe.ir.transform.calibration_transforms - INFO - Running Calibration ...
2026-03-07 10:11:06,997 - afe.ir.serializer.api - INFO - Saved model: CompiledModels/LFM2.5-1.2B-Instruct/sima_files/sdk/LFM2.5-1.2B-Instruct_language_n1_pre_layer14.sima
2026-03-07 10:11:07,000 - afe.apis.loaded_net - INFO - Loading ['CompiledModels/LFM2.5-1.2B-Instruct/onnx_files/LFM2.5-1.2B-Instruct_language_n1_layer15_conv.onnx'] in onnx format
2026-03-07 10:11:07,813 - afe.apis.loaded_net - INFO - Quantize loaded net, layout = NCHW, arm_only = False
2026-03-07 10:11:07,816 - afe.apis.loaded_net - INFO - Calibration method = mse
2026-03-07 10:11:08,778 - afe.ir.transform.calibration_transforms - INFO - Running Calibration ...
2026-03-07 10:11:10,663 - afe.ir.serializer.api - INFO - Saved model: CompiledModels/LFM2.5-1.2B-Instruct/sima_files/sdk/LFM2.5-1.2B-Instruct_language_n1_layer13_conv.sima
2026-03-07 10:11:10,695 - afe.apis.loaded_net - INFO - Loading ['CompiledModels/LFM2.5-1.2B-Instruct/onnx_files/LFM2.5-1.2B-Instruct_language_n128_cache_token0.onnx'] in onnx format
2026-03-07 10:11:10,782 - afe.apis.loaded_net - INFO - Quantize loaded net, layout = NCHW, arm_only = False
2026-03-07 10:11:10,782 - afe.apis.loaded_net - INFO - Calibration method = mse
2026-03-07 10:11:10,785 - afe.ir.serializer.api - INFO - Saved model: CompiledModels/LFM2.5-1.2B-Instruct/sima_files/sdk/LFM2.5-1.2B-Instruct_language_n128_layer15_conv.sima
2026-03-07 10:11:10,796 - afe.apis.loaded_net - INFO - Loading ['CompiledModels/LFM2.5-1.2B-Instruct/onnx_files/LFM2.5-1.2B-Instruct_language_n128_cache_token128.onnx'] in onnx format
2026-03-07 10:11:10,881 - afe.apis.loaded_net - INFO - Quantize loaded net, layout = NCHW, arm_only = False
2026-03-07 10:11:10,881 - afe.apis.loaded_net - INFO - Calibration method = mse
2026-03-07 10:11:10,980 - afe.ir.serializer.api - INFO - Saved model: CompiledModels/LFM2.5-1.2B-Instruct/sima_files/sdk/LFM2.5-1.2B-Instruct_language_n128_post_layer14.sima
2026-03-07 10:11:10,986 - afe.apis.loaded_net - INFO - Loading ['CompiledModels/LFM2.5-1.2B-Instruct/onnx_files/LFM2.5-1.2B-Instruct_language_n128_cache_token256.onnx'] in onnx format
2026-03-07 10:11:11,044 - afe.ir.serializer.api - INFO - Saved model: CompiledModels/LFM2.5-1.2B-Instruct/sima_files/sdk/LFM2.5-1.2B-Instruct_language_n128_layer13_conv.sima
2026-03-07 10:11:11,058 - afe.apis.loaded_net - INFO - Loading ['CompiledModels/LFM2.5-1.2B-Instruct/onnx_files/LFM2.5-1.2B-Instruct_language_n128_cache_token384.onnx'] in onnx format
2026-03-07 10:11:11,071 - afe.apis.loaded_net - INFO - Quantize loaded net, layout = NCHW, arm_only = False
2026-03-07 10:11:11,071 - afe.apis.loaded_net - INFO - Calibration method = mse
2026-03-07 10:11:11,143 - afe.apis.loaded_net - INFO - Quantize loaded net, layout = NCHW, arm_only = False
2026-03-07 10:11:11,143 - afe.apis.loaded_net - INFO - Calibration method = mse
2026-03-07 10:11:11,479 - afe.ir.transform.calibration_transforms - INFO - Running Calibration ...
2026-03-07 10:11:12,807 - afe.ir.serializer.api - INFO - Saved model: CompiledModels/LFM2.5-1.2B-Instruct/sima_files/sdk/LFM2.5-1.2B-Instruct_language_n1_post_layer14.sima
2026-03-07 10:11:12,827 - afe.apis.loaded_net - INFO - Loading ['CompiledModels/LFM2.5-1.2B-Instruct/onnx_files/LFM2.5-1.2B-Instruct_language_n128_cache_token512.onnx'] in onnx format
2026-03-07 10:11:12,917 - afe.apis.loaded_net - INFO - Quantize loaded net, layout = NCHW, arm_only = False
2026-03-07 10:11:12,917 - afe.apis.loaded_net - INFO - Calibration method = mse
2026-03-07 10:11:13,760 - afe.ir.transform.calibration_transforms - INFO - Running Calibration ...
2026-03-07 10:11:13,782 - afe.ir.transform.calibration_transforms - INFO - Running Calibration ...
2026-03-07 10:11:13,898 - afe.ir.serializer.api - INFO - Saved model: CompiledModels/LFM2.5-1.2B-Instruct/sima_files/sdk/LFM2.5-1.2B-Instruct_language_n1_layer15_conv.sima
2026-03-07 10:11:13,908 - afe.apis.loaded_net - INFO - Loading ['CompiledModels/LFM2.5-1.2B-Instruct/onnx_files/LFM2.5-1.2B-Instruct_language_n128_cache_token640.onnx'] in onnx format
2026-03-07 10:11:14,082 - afe.apis.loaded_net - INFO - Quantize loaded net, layout = NCHW, arm_only = False
2026-03-07 10:11:14,082 - afe.apis.loaded_net - INFO - Calibration method = mse
2026-03-07 10:11:14,751 - afe.ir.transform.calibration_transforms - INFO - Running Calibration ...
2026-03-07 10:11:14,793 - afe.ir.transform.calibration_transforms - INFO - Running Calibration ...
2026-03-07 10:11:16,188 - afe.ir.serializer.api - INFO - Saved model: CompiledModels/LFM2.5-1.2B-Instruct/sima_files/sdk/LFM2.5-1.2B-Instruct_language_n128_cache_token0.sima
2026-03-07 10:11:16,191 - afe.apis.loaded_net - INFO - Loading ['CompiledModels/LFM2.5-1.2B-Instruct/onnx_files/LFM2.5-1.2B-Instruct_language_n128_cache_token768.onnx'] in onnx format
2026-03-07 10:11:16,280 - afe.apis.loaded_net - INFO - Quantize loaded net, layout = NCHW, arm_only = False
2026-03-07 10:11:16,280 - afe.apis.loaded_net - INFO - Calibration method = mse
2026-03-07 10:11:16,740 - afe.ir.transform.calibration_transforms - INFO - Running Calibration ...
2026-03-07 10:11:16,939 - afe.ir.serializer.api - INFO - Saved model: CompiledModels/LFM2.5-1.2B-Instruct/sima_files/sdk/LFM2.5-1.2B-Instruct_language_n128_cache_token128.sima 2026-03-07 10:11:16,950 - afe.apis.loaded_net - INFO - Loading ['CompiledModels/LFM2.5-1.2B-Instruct/onnx_files/LFM2.5-1.2B-Instruct_language_n128_cache_token896.onnx'] in onnx format 2026-03-07 10:11:17,213 - afe.apis.loaded_net - INFO - Quantize loaded net, layout = NCHW, arm_only = False 2026-03-07 10:11:17,214 - afe.apis.loaded_net - INFO - Calibration method = mse 2026-03-07 10:11:18,462 - afe.ir.transform.calibration_transforms - INFO - Running Calibration ... 2026-03-07 10:11:18,722 - afe.ir.serializer.api - INFO - Saved model: CompiledModels/LFM2.5-1.2B-Instruct/sima_files/sdk/LFM2.5-1.2B-Instruct_language_n128_cache_token256.sima 2026-03-07 10:11:18,725 - afe.apis.loaded_net - INFO - Loading ['CompiledModels/LFM2.5-1.2B-Instruct/onnx_files/LFM2.5-1.2B-Instruct_language_n128_cache_token1024.onnx'] in onnx format 2026-03-07 10:11:18,814 - afe.apis.loaded_net - INFO - Quantize loaded net, layout = NCHW, arm_only = False 2026-03-07 10:11:18,815 - afe.apis.loaded_net - INFO - Calibration method = mse 2026-03-07 10:11:19,290 - afe.ir.serializer.api - INFO - Saved model: CompiledModels/LFM2.5-1.2B-Instruct/sima_files/sdk/LFM2.5-1.2B-Instruct_language_n128_cache_token384.sima 2026-03-07 10:11:19,294 - afe.apis.loaded_net - INFO - Loading ['CompiledModels/LFM2.5-1.2B-Instruct/onnx_files/LFM2.5-1.2B-Instruct_language_n128_cache_token1152.onnx'] in onnx format 2026-03-07 10:11:19,384 - afe.apis.loaded_net - INFO - Quantize loaded net, layout = NCHW, arm_only = False 2026-03-07 10:11:19,384 - afe.apis.loaded_net - INFO - Calibration method = mse 2026-03-07 10:11:20,233 - afe.ir.transform.calibration_transforms - INFO - Running Calibration ... 2026-03-07 10:11:20,528 - afe.ir.transform.calibration_transforms - INFO - Running Calibration ... 
2026-03-07 10:11:21,468 - afe.ir.serializer.api - INFO - Saved model: CompiledModels/LFM2.5-1.2B-Instruct/sima_files/sdk/LFM2.5-1.2B-Instruct_language_n128_cache_token512.sima 2026-03-07 10:11:21,471 - afe.apis.loaded_net - INFO - Loading ['CompiledModels/LFM2.5-1.2B-Instruct/onnx_files/LFM2.5-1.2B-Instruct_language_n128_cache_token1280.onnx'] in onnx format 2026-03-07 10:11:21,562 - afe.apis.loaded_net - INFO - Quantize loaded net, layout = NCHW, arm_only = False 2026-03-07 10:11:21,562 - afe.apis.loaded_net - INFO - Calibration method = mse 2026-03-07 10:11:22,199 - afe.ir.transform.calibration_transforms - INFO - Running Calibration ... 2026-03-07 10:11:22,793 - afe.ir.transform.calibration_transforms - INFO - Running Calibration ... 2026-03-07 10:11:23,511 - afe.ir.transform.calibration_transforms - INFO - Calibration progress: completed 1 samples 2026-03-07 10:11:23,674 - afe.ir.serializer.api - INFO - Saved model: CompiledModels/LFM2.5-1.2B-Instruct/sima_files/sdk/LFM2.5-1.2B-Instruct_language_n128_cache_token640.sima 2026-03-07 10:11:23,677 - afe.apis.loaded_net - INFO - Loading ['CompiledModels/LFM2.5-1.2B-Instruct/onnx_files/LFM2.5-1.2B-Instruct_language_n128_cache_token1408.onnx'] in onnx format 2026-03-07 10:11:23,769 - afe.apis.loaded_net - INFO - Quantize loaded net, layout = NCHW, arm_only = False 2026-03-07 10:11:23,769 - afe.apis.loaded_net - INFO - Calibration method = mse 2026-03-07 10:11:24,806 - afe.ir.transform.calibration_transforms - INFO - Running Calibration ... 
2026-03-07 10:11:26,340 - afe.ir.transform.calibration_transforms - INFO - Calibration progress: completed 1 samples 2026-03-07 10:11:26,877 - afe.ir.serializer.api - INFO - Saved model: CompiledModels/LFM2.5-1.2B-Instruct/sima_files/sdk/LFM2.5-1.2B-Instruct_language_n128_cache_token768.sima 2026-03-07 10:11:26,880 - afe.apis.loaded_net - INFO - Loading ['CompiledModels/LFM2.5-1.2B-Instruct/onnx_files/LFM2.5-1.2B-Instruct_language_n128_cache_token1536.onnx'] in onnx format 2026-03-07 10:11:26,971 - afe.apis.loaded_net - INFO - Quantize loaded net, layout = NCHW, arm_only = False 2026-03-07 10:11:26,971 - afe.apis.loaded_net - INFO - Calibration method = mse 2026-03-07 10:11:27,100 - afe.ir.transform.calibration_transforms - INFO - Running Calibration ... 2026-03-07 10:11:27,178 - afe.ir.transform.calibration_transforms - INFO - Calibration progress: completed 1 samples 2026-03-07 10:11:27,444 - afe.ir.serializer.api - INFO - Saved model: CompiledModels/LFM2.5-1.2B-Instruct/sima_files/sdk/LFM2.5-1.2B-Instruct_language_n128_cache_token896.sima 2026-03-07 10:11:27,450 - afe.apis.loaded_net - INFO - Loading ['CompiledModels/LFM2.5-1.2B-Instruct/onnx_files/LFM2.5-1.2B-Instruct_language_n128_cache_token1664.onnx'] in onnx format 2026-03-07 10:11:27,549 - afe.apis.loaded_net - INFO - Quantize loaded net, layout = NCHW, arm_only = False 2026-03-07 10:11:27,549 - afe.apis.loaded_net - INFO - Calibration method = mse 2026-03-07 10:11:29,432 - afe.ir.transform.calibration_transforms - INFO - Calibration progress: completed 1 samples 2026-03-07 10:11:29,589 - afe.ir.serializer.api - INFO - Saved model: CompiledModels/LFM2.5-1.2B-Instruct/sima_files/sdk/LFM2.5-1.2B-Instruct_language_n128_cache_token1024.sima 2026-03-07 10:11:29,592 - afe.apis.loaded_net - INFO - Loading ['CompiledModels/LFM2.5-1.2B-Instruct/onnx_files/LFM2.5-1.2B-Instruct_language_n128_cache_token1792.onnx'] in onnx format 2026-03-07 10:11:29,682 - afe.apis.loaded_net - INFO - Quantize loaded net, layout = 
NCHW, arm_only = False 2026-03-07 10:11:29,682 - afe.apis.loaded_net - INFO - Calibration method = mse 2026-03-07 10:11:30,228 - afe.ir.transform.calibration_transforms - INFO - Running Calibration ... 2026-03-07 10:11:30,541 - afe.ir.transform.calibration_transforms - INFO - Calibration progress: completed 1 samples 2026-03-07 10:11:30,702 - afe.ir.serializer.api - INFO - Saved model: CompiledModels/LFM2.5-1.2B-Instruct/sima_files/sdk/LFM2.5-1.2B-Instruct_language_n128_cache_token1152.sima 2026-03-07 10:11:30,706 - afe.apis.loaded_net - INFO - Loading ['CompiledModels/LFM2.5-1.2B-Instruct/onnx_files/LFM2.5-1.2B-Instruct_language_n128_cache_token1920.onnx'] in onnx format 2026-03-07 10:11:30,737 - afe.ir.transform.calibration_transforms - INFO - Running Calibration ... 2026-03-07 10:11:30,797 - afe.apis.loaded_net - INFO - Quantize loaded net, layout = NCHW, arm_only = False 2026-03-07 10:11:30,797 - afe.apis.loaded_net - INFO - Calibration method = mse 2026-03-07 10:11:33,043 - afe.ir.transform.calibration_transforms - INFO - Calibration progress: completed 1 samples 2026-03-07 10:11:33,065 - afe.ir.transform.calibration_transforms - INFO - Running Calibration ... 2026-03-07 10:11:33,209 - afe.ir.serializer.api - INFO - Saved model: CompiledModels/LFM2.5-1.2B-Instruct/sima_files/sdk/LFM2.5-1.2B-Instruct_language_n128_cache_token1280.sima 2026-03-07 10:11:33,212 - afe.apis.loaded_net - INFO - Loading ['CompiledModels/LFM2.5-1.2B-Instruct/onnx_files/LFM2.5-1.2B-Instruct_language_n1_cache_token127.onnx'] in onnx format 2026-03-07 10:11:33,300 - afe.apis.loaded_net - INFO - Quantize loaded net, layout = NCHW, arm_only = False 2026-03-07 10:11:33,300 - afe.apis.loaded_net - INFO - Calibration method = mse 2026-03-07 10:11:33,961 - afe.ir.transform.calibration_transforms - INFO - Running Calibration ... 
2026-03-07 10:11:35,826 - afe.ir.transform.calibration_transforms - INFO - Calibration progress: completed 1 samples 2026-03-07 10:11:35,989 - afe.ir.serializer.api - INFO - Saved model: CompiledModels/LFM2.5-1.2B-Instruct/sima_files/sdk/LFM2.5-1.2B-Instruct_language_n128_cache_token1408.sima 2026-03-07 10:11:35,992 - afe.apis.loaded_net - INFO - Loading ['CompiledModels/LFM2.5-1.2B-Instruct/onnx_files/LFM2.5-1.2B-Instruct_language_n1_cache_token255.onnx'] in onnx format 2026-03-07 10:11:36,083 - afe.apis.loaded_net - INFO - Quantize loaded net, layout = NCHW, arm_only = False 2026-03-07 10:11:36,084 - afe.apis.loaded_net - INFO - Calibration method = mse 2026-03-07 10:11:36,547 - afe.ir.transform.calibration_transforms - INFO - Running Calibration ... 2026-03-07 10:11:37,363 - afe.ir.serializer.api - INFO - Saved model: CompiledModels/LFM2.5-1.2B-Instruct/sima_files/sdk/LFM2.5-1.2B-Instruct_language_n1_cache_token127.sima 2026-03-07 10:11:37,367 - afe.apis.loaded_net - INFO - Loading ['CompiledModels/LFM2.5-1.2B-Instruct/onnx_files/LFM2.5-1.2B-Instruct_language_n1_cache_token383.onnx'] in onnx format 2026-03-07 10:11:37,456 - afe.apis.loaded_net - INFO - Quantize loaded net, layout = NCHW, arm_only = False 2026-03-07 10:11:37,456 - afe.apis.loaded_net - INFO - Calibration method = mse 2026-03-07 10:11:39,272 - afe.ir.transform.calibration_transforms - INFO - Running Calibration ... 
2026-03-07 10:11:40,445 - afe.ir.serializer.api - INFO - Saved model: CompiledModels/LFM2.5-1.2B-Instruct/sima_files/sdk/LFM2.5-1.2B-Instruct_language_n1_cache_token255.sima 2026-03-07 10:11:40,449 - afe.apis.loaded_net - INFO - Loading ['CompiledModels/LFM2.5-1.2B-Instruct/onnx_files/LFM2.5-1.2B-Instruct_language_n1_cache_token511.onnx'] in onnx format 2026-03-07 10:11:40,533 - afe.apis.loaded_net - INFO - Quantize loaded net, layout = NCHW, arm_only = False 2026-03-07 10:11:40,533 - afe.apis.loaded_net - INFO - Calibration method = mse 2026-03-07 10:11:40,712 - afe.ir.transform.calibration_transforms - INFO - Running Calibration ... 2026-03-07 10:11:40,728 - afe.ir.transform.calibration_transforms - INFO - Calibration progress: completed 1 samples 2026-03-07 10:11:41,024 - afe.ir.serializer.api - INFO - Saved model: CompiledModels/LFM2.5-1.2B-Instruct/sima_files/sdk/LFM2.5-1.2B-Instruct_language_n128_cache_token1536.sima 2026-03-07 10:11:41,027 - afe.apis.loaded_net - INFO - Loading ['CompiledModels/LFM2.5-1.2B-Instruct/onnx_files/LFM2.5-1.2B-Instruct_language_n1_cache_token639.onnx'] in onnx format 2026-03-07 10:11:41,113 - afe.apis.loaded_net - INFO - Quantize loaded net, layout = NCHW, arm_only = False 2026-03-07 10:11:41,113 - afe.apis.loaded_net - INFO - Calibration method = mse 2026-03-07 10:11:41,333 - afe.ir.transform.calibration_transforms - INFO - Calibration progress: completed 1 samples 2026-03-07 10:11:41,830 - afe.ir.serializer.api - INFO - Saved model: CompiledModels/LFM2.5-1.2B-Instruct/sima_files/sdk/LFM2.5-1.2B-Instruct_language_n128_cache_token1664.sima 2026-03-07 10:11:41,833 - afe.apis.loaded_net - INFO - Loading ['CompiledModels/LFM2.5-1.2B-Instruct/onnx_files/LFM2.5-1.2B-Instruct_language_n1_cache_token767.onnx'] in onnx format 2026-03-07 10:11:41,919 - afe.apis.loaded_net - INFO - Quantize loaded net, layout = NCHW, arm_only = False 2026-03-07 10:11:41,919 - afe.apis.loaded_net - INFO - Calibration method = mse 2026-03-07 10:11:42,166 - 
afe.ir.serializer.api - INFO - Saved model: CompiledModels/LFM2.5-1.2B-Instruct/sima_files/sdk/LFM2.5-1.2B-Instruct_language_n1_cache_token383.sima 2026-03-07 10:11:42,169 - afe.apis.loaded_net - INFO - Loading ['CompiledModels/LFM2.5-1.2B-Instruct/onnx_files/LFM2.5-1.2B-Instruct_language_n1_cache_token895.onnx'] in onnx format 2026-03-07 10:11:42,254 - afe.apis.loaded_net - INFO - Quantize loaded net, layout = NCHW, arm_only = False 2026-03-07 10:11:42,254 - afe.apis.loaded_net - INFO - Calibration method = mse 2026-03-07 10:11:44,075 - afe.ir.transform.calibration_transforms - INFO - Running Calibration ... 2026-03-07 10:11:44,254 - afe.ir.transform.calibration_transforms - INFO - Running Calibration ... 2026-03-07 10:11:44,505 - afe.ir.transform.calibration_transforms - INFO - Calibration progress: completed 1 samples 2026-03-07 10:11:44,705 - afe.ir.serializer.api - INFO - Saved model: CompiledModels/LFM2.5-1.2B-Instruct/sima_files/sdk/LFM2.5-1.2B-Instruct_language_n128_cache_token1792.sima 2026-03-07 10:11:44,708 - afe.apis.loaded_net - INFO - Loading ['CompiledModels/LFM2.5-1.2B-Instruct/onnx_files/LFM2.5-1.2B-Instruct_language_n1_cache_token1023.onnx'] in onnx format 2026-03-07 10:11:44,885 - afe.apis.loaded_net - INFO - Quantize loaded net, layout = NCHW, arm_only = False 2026-03-07 10:11:44,885 - afe.apis.loaded_net - INFO - Calibration method = mse 2026-03-07 10:11:45,274 - afe.ir.transform.calibration_transforms - INFO - Running Calibration ... 2026-03-07 10:11:45,472 - afe.ir.transform.calibration_transforms - INFO - Running Calibration ... 
2026-03-07 10:11:45,826 - afe.ir.serializer.api - INFO - Saved model: CompiledModels/LFM2.5-1.2B-Instruct/sima_files/sdk/LFM2.5-1.2B-Instruct_language_n1_cache_token511.sima 2026-03-07 10:11:45,829 - afe.apis.loaded_net - INFO - Loading ['CompiledModels/LFM2.5-1.2B-Instruct/onnx_files/LFM2.5-1.2B-Instruct_language_n1_cache_token1151.onnx'] in onnx format 2026-03-07 10:11:45,914 - afe.apis.loaded_net - INFO - Quantize loaded net, layout = NCHW, arm_only = False 2026-03-07 10:11:45,914 - afe.apis.loaded_net - INFO - Calibration method = mse 2026-03-07 10:11:46,072 - afe.ir.serializer.api - INFO - Saved model: CompiledModels/LFM2.5-1.2B-Instruct/sima_files/sdk/LFM2.5-1.2B-Instruct_language_n1_cache_token639.sima 2026-03-07 10:11:46,075 - afe.apis.loaded_net - INFO - Loading ['CompiledModels/LFM2.5-1.2B-Instruct/onnx_files/LFM2.5-1.2B-Instruct_language_n1_cache_token1279.onnx'] in onnx format 2026-03-07 10:11:46,209 - afe.apis.loaded_net - INFO - Quantize loaded net, layout = NCHW, arm_only = False 2026-03-07 10:11:46,209 - afe.apis.loaded_net - INFO - Calibration method = mse 2026-03-07 10:11:46,474 - afe.ir.transform.calibration_transforms - INFO - Calibration progress: completed 1 samples 2026-03-07 10:11:46,633 - afe.ir.serializer.api - INFO - Saved model: CompiledModels/LFM2.5-1.2B-Instruct/sima_files/sdk/LFM2.5-1.2B-Instruct_language_n128_cache_token1920.sima 2026-03-07 10:11:46,636 - afe.apis.loaded_net - INFO - Loading ['CompiledModels/LFM2.5-1.2B-Instruct/onnx_files/LFM2.5-1.2B-Instruct_language_n1_cache_token1407.onnx'] in onnx format 2026-03-07 10:11:46,722 - afe.apis.loaded_net - INFO - Quantize loaded net, layout = NCHW, arm_only = False 2026-03-07 10:11:46,722 - afe.apis.loaded_net - INFO - Calibration method = mse 2026-03-07 10:11:46,955 - afe.ir.serializer.api - INFO - Saved model: CompiledModels/LFM2.5-1.2B-Instruct/sima_files/sdk/LFM2.5-1.2B-Instruct_language_n1_cache_token767.sima 2026-03-07 10:11:46,957 - afe.apis.loaded_net - INFO - Loading 
['CompiledModels/LFM2.5-1.2B-Instruct/onnx_files/LFM2.5-1.2B-Instruct_language_n1_cache_token1535.onnx'] in onnx format 2026-03-07 10:11:47,048 - afe.apis.loaded_net - INFO - Quantize loaded net, layout = NCHW, arm_only = False 2026-03-07 10:11:47,048 - afe.apis.loaded_net - INFO - Calibration method = mse 2026-03-07 10:11:47,174 - afe.ir.serializer.api - INFO - Saved model: CompiledModels/LFM2.5-1.2B-Instruct/sima_files/sdk/LFM2.5-1.2B-Instruct_language_n1_cache_token895.sima 2026-03-07 10:11:47,177 - afe.apis.loaded_net - INFO - Loading ['CompiledModels/LFM2.5-1.2B-Instruct/onnx_files/LFM2.5-1.2B-Instruct_language_n1_cache_token1663.onnx'] in onnx format 2026-03-07 10:11:47,263 - afe.apis.loaded_net - INFO - Quantize loaded net, layout = NCHW, arm_only = False 2026-03-07 10:11:47,263 - afe.apis.loaded_net - INFO - Calibration method = mse 2026-03-07 10:11:48,290 - afe.ir.transform.calibration_transforms - INFO - Running Calibration ... 2026-03-07 10:11:48,924 - afe.ir.transform.calibration_transforms - INFO - Running Calibration ... 2026-03-07 10:11:49,403 - afe.ir.transform.calibration_transforms - INFO - Running Calibration ... 2026-03-07 10:11:49,988 - afe.ir.transform.calibration_transforms - INFO - Running Calibration ... 2026-03-07 10:11:50,339 - afe.ir.serializer.api - INFO - Saved model: CompiledModels/LFM2.5-1.2B-Instruct/sima_files/sdk/LFM2.5-1.2B-Instruct_language_n1_cache_token1023.sima 2026-03-07 10:11:50,348 - afe.apis.loaded_net - INFO - Loading ['CompiledModels/LFM2.5-1.2B-Instruct/onnx_files/LFM2.5-1.2B-Instruct_language_n1_cache_token1791.onnx'] in onnx format 2026-03-07 10:11:50,416 - afe.ir.transform.calibration_transforms - INFO - Running Calibration ... 2026-03-07 10:11:50,515 - afe.apis.loaded_net - INFO - Quantize loaded net, layout = NCHW, arm_only = False 2026-03-07 10:11:50,515 - afe.apis.loaded_net - INFO - Calibration method = mse 2026-03-07 10:11:50,622 - afe.ir.transform.calibration_transforms - INFO - Running Calibration ... 
2026-03-07 10:11:51,965 - afe.ir.serializer.api - INFO - Saved model: CompiledModels/LFM2.5-1.2B-Instruct/sima_files/sdk/LFM2.5-1.2B-Instruct_language_n1_cache_token1151.sima 2026-03-07 10:11:51,968 - afe.apis.loaded_net - INFO - Loading ['CompiledModels/LFM2.5-1.2B-Instruct/onnx_files/LFM2.5-1.2B-Instruct_language_n1_cache_token1919.onnx'] in onnx format 2026-03-07 10:11:52,054 - afe.apis.loaded_net - INFO - Quantize loaded net, layout = NCHW, arm_only = False 2026-03-07 10:11:52,054 - afe.apis.loaded_net - INFO - Calibration method = mse 2026-03-07 10:11:52,419 - afe.ir.serializer.api - INFO - Saved model: CompiledModels/LFM2.5-1.2B-Instruct/sima_files/sdk/LFM2.5-1.2B-Instruct_language_n1_cache_token1279.sima 2026-03-07 10:11:52,423 - afe.apis.loaded_net - INFO - Loading ['CompiledModels/LFM2.5-1.2B-Instruct/onnx_files/LFM2.5-1.2B-Instruct_language_n1_cache_token2047.onnx'] in onnx format 2026-03-07 10:11:52,508 - afe.apis.loaded_net - INFO - Quantize loaded net, layout = NCHW, arm_only = False 2026-03-07 10:11:52,508 - afe.apis.loaded_net - INFO - Calibration method = mse 2026-03-07 10:11:52,970 - afe.ir.serializer.api - INFO - Saved model: CompiledModels/LFM2.5-1.2B-Instruct/sima_files/sdk/LFM2.5-1.2B-Instruct_language_n1_cache_token1407.sima 2026-03-07 10:11:52,974 - afe.apis.loaded_net - INFO - Loading ['CompiledModels/LFM2.5-1.2B-Instruct/onnx_files/LFM2.5-1.2B-Instruct_language_n1_post_layer15_conv_final.onnx'] in onnx format 2026-03-07 10:11:53,436 - afe.ir.serializer.api - INFO - Saved model: CompiledModels/LFM2.5-1.2B-Instruct/sima_files/sdk/LFM2.5-1.2B-Instruct_language_n1_cache_token1535.sima 2026-03-07 10:11:53,452 - afe.ir.serializer.api - INFO - Saved model: CompiledModels/LFM2.5-1.2B-Instruct/sima_files/sdk/LFM2.5-1.2B-Instruct_language_n1_cache_token1663.sima 2026-03-07 10:11:54,004 - afe.ir.transform.calibration_transforms - INFO - Running Calibration ... 
2026-03-07 10:11:55,106 - afe.ir.transform.calibration_transforms - INFO - Running Calibration ... 2026-03-07 10:11:55,560 - afe.ir.transform.calibration_transforms - INFO - Running Calibration ... 2026-03-07 10:11:56,229 - afe.ir.serializer.api - INFO - Saved model: CompiledModels/LFM2.5-1.2B-Instruct/sima_files/sdk/LFM2.5-1.2B-Instruct_language_n1_cache_token1791.sima 2026-03-07 10:11:57,661 - afe.ir.serializer.api - INFO - Saved model: CompiledModels/LFM2.5-1.2B-Instruct/sima_files/sdk/LFM2.5-1.2B-Instruct_language_n1_cache_token1919.sima 2026-03-07 10:11:57,702 - afe.apis.loaded_net - INFO - Quantize loaded net, layout = NCHW, arm_only = False 2026-03-07 10:11:57,702 - afe.apis.loaded_net - INFO - Calibration method = mse 2026-03-07 10:11:58,129 - afe.ir.serializer.api - INFO - Saved model: CompiledModels/LFM2.5-1.2B-Instruct/sima_files/sdk/LFM2.5-1.2B-Instruct_language_n1_cache_token2047.sima 2026-03-07 10:12:04,050 - afe.backends.mla.mla_checkers - INFO - Cannot assign node cast_10, source_name(['argmax']) to MLA. ['Unsupported'] 2026-03-07 10:12:04,819 - afe.ir.transform.calibration_transforms - INFO - Running Calibration ... 2026-03-07 10:12:23,267 - afe.ir.serializer.api - INFO - Saved model: CompiledModels/LFM2.5-1.2B-Instruct/sima_files/sdk/LFM2.5-1.2B-Instruct_language_n1_post_layer15_conv_final.sima Running Calibration ... Calibration Progress: |██████████████████████████████| 100.0% 1|1 Complete. 1/1 Running Calibration ...DONE Running quantization ... Running quantization ...DONE Running Calibration ... Calibration Progress: |██████████████████████████████| 100.0% 1|1 Complete. 1/1 Running Calibration ...DONE Running quantization ... Running quantization ...DONE Match check_no_dynamic_weights pattern Match check_no_dynamic_weights pattern Running Calibration ... Calibration Progress: |██████████████████████████████| 100.0% 1|1 Complete. 1/1 Running Calibration ...DONE Running quantization ... 
Running quantization ...DONE Running Calibration ... Calibration Progress: |██████████████████████████████| 100.0% 1|1 Complete. 1/1 Running Calibration ...DONE Running quantization ... Running quantization ...DONE Running Calibration ... Calibration Progress: |██████████████████████████████| 100.0% 1|1 Complete. 1/1 Running Calibration ...DONE Running quantization ... Running quantization ...DONE Running Calibration ... Calibration Progress: |██████████████████████████████| 100.0% 1|1 Complete. 1/1 Running Calibration ...DONE Running quantization ... Running quantization ...DONE Running Calibration ... Calibration Progress: |██████████████████████████████| 100.0% 1|1 Complete. 1/1 Running Calibration ...DONE Running quantization ... Running quantization ...DONE Match check_no_dynamic_weights pattern Running Calibration ... Calibration Progress: |██████████████████████████████| 100.0% 1|1 Complete. 1/1 Running Calibration ...DONE Running quantization ... Running quantization ...DONE 2026-03-07 10:12:25,605 - sima_lmm.model.vision_language_model - INFO - FileGenMode.ONNX_TO_QUANT files generation completed. Generated all mode=ONNX_TO_QUANT files 2026-03-07 10:12:25,606 - sima_lmm.model.vision_language_model - INFO - Generating FileGenMode.MODEL_SDK_COMPILE files... 2026-03-07 10:12:51,318 - afe.ir.serializer.api - INFO - Loading model from file: CompiledModels/LFM2.5-1.2B-Instruct/sima_files/sdk/LFM2.5-1.2B-Instruct_language_n128_pre_layer2.sima 2026-03-07 10:12:51,332 - afe.ir.serializer.api - INFO - Loading model from file: CompiledModels/LFM2.5-1.2B-Instruct/sima_files/sdk/LFM2.5-1.2B-Instruct_language_n1_layer0_conv.sima 2026-03-07 10:12:51,348 - afe.apis.model - WARNING - A deepcopy is disabled hence model graph will be mutated and no other APIs can be called after the compile step. 
2026-03-07 10:12:51,348 - afe.apis.model - INFO - Compiling quantized net "LFM2.5-1.2B-Instruct_language_n128_pre_layer2"
2026-03-07 10:12:51,351 - afe.ir.serializer.api - INFO - Loading model from file: CompiledModels/LFM2.5-1.2B-Instruct/sima_files/sdk/LFM2.5-1.2B-Instruct_language_n1_layer1_conv.sima
2026-03-07 10:12:51,353 - afe.core.compile_networks - INFO - The model is split into 1 segments for MLA and APU
2026-03-07 10:12:51,353 - afe.backends.mla.afe_to_n2a_compiler.n2a_compiler_operations - INFO - Stage 1 of 1: compiling for MLA
2026-03-07 10:12:51,360 - mlc.compiler.model_graph.l1_based - INFO - Get model compilation properties
2026-03-07 10:12:51,361 - mlc.compiler.model_graph.l1_based - INFO - Checking if layers fit into memory without splitting
2026-03-07 10:12:51,361 - mlc.compiler.model_graph.l1_based - INFO - Generating the list of feasible layouts
2026-03-07 10:12:51,361 - mlc.compiler.model_graph.l1_based - INFO - Setting layer parameters
2026-03-07 10:12:51,413 - afe.apis.model - WARNING - A deepcopy is disabled hence model graph will be mutated and no other APIs can be called after the compile step.
2026-03-07 10:12:51,413 - afe.apis.model - INFO - Compiling quantized net "LFM2.5-1.2B-Instruct_language_n1_layer0_conv"
2026-03-07 10:12:51,417 - afe.core.compile_networks - INFO - The model is split into 1 segments for MLA and APU
2026-03-07 10:12:51,417 - afe.backends.mla.afe_to_n2a_compiler.n2a_compiler_operations - INFO - Stage 1 of 1: compiling for MLA
2026-03-07 10:12:51,432 - afe.apis.model - WARNING - A deepcopy is disabled hence model graph will be mutated and no other APIs can be called after the compile step.
2026-03-07 10:12:51,433 - afe.apis.model - INFO - Compiling quantized net "LFM2.5-1.2B-Instruct_language_n1_layer1_conv" 2026-03-07 10:12:51,436 - afe.core.compile_networks - INFO - The model is split into 1 segments for MLA and APU 2026-03-07 10:12:51,436 - afe.backends.mla.afe_to_n2a_compiler.n2a_compiler_operations - INFO - Stage 1 of 1: compiling for MLA 2026-03-07 10:12:51,447 - mlc.compiler.model_graph.l1_based - INFO - Get model compilation properties 2026-03-07 10:12:51,447 - mlc.compiler.model_graph.l1_based - INFO - Checking if layers fit into memory without splitting 2026-03-07 10:12:51,447 - mlc.compiler.model_graph.l1_based - INFO - Generating the list of feasible layouts 2026-03-07 10:12:51,447 - mlc.compiler.model_graph.l1_based - INFO - Setting layer parameters 2026-03-07 10:12:51,466 - mlc.compiler.model_graph.l1_based - INFO - Get model compilation properties 2026-03-07 10:12:51,466 - mlc.compiler.model_graph.l1_based - INFO - Checking if layers fit into memory without splitting 2026-03-07 10:12:51,466 - mlc.compiler.model_graph.l1_based - INFO - Generating the list of feasible layouts 2026-03-07 10:12:51,466 - mlc.compiler.model_graph.l1_based - INFO - Setting layer parameters 2026-03-07 10:12:51,472 - afe.ir.serializer.api - INFO - Loading model from file: CompiledModels/LFM2.5-1.2B-Instruct/sima_files/sdk/LFM2.5-1.2B-Instruct_language_n128_layer0_conv.sima 2026-03-07 10:12:51,481 - afe.ir.serializer.api - INFO - Loading model from file: CompiledModels/LFM2.5-1.2B-Instruct/sima_files/sdk/LFM2.5-1.2B-Instruct_language_n128_layer1_conv.sima 2026-03-07 10:12:51,523 - mlc.compiler.model_graph.l1_based - INFO - Setting tile layouts 2026-03-07 10:12:51,543 - mlc.compiler.model_graph.l1_based - INFO - Setting tile layouts 2026-03-07 10:12:51,550 - mlc.compiler.model_graph.l1_based - INFO - Allocating memory for IFM/OFM tensors 2026-03-07 10:12:51,560 - afe.apis.model - WARNING - A deepcopy is disabled hence model graph will be mutated and no other APIs 
can be called after the compile step. 2026-03-07 10:12:51,560 - afe.apis.model - INFO - Compiling quantized net "LFM2.5-1.2B-Instruct_language_n128_layer0_conv" 2026-03-07 10:12:51,565 - afe.core.compile_networks - INFO - The model is split into 1 segments for MLA and APU 2026-03-07 10:12:51,565 - afe.backends.mla.afe_to_n2a_compiler.n2a_compiler_operations - INFO - Stage 1 of 1: compiling for MLA 2026-03-07 10:12:51,569 - afe.apis.model - WARNING - A deepcopy is disabled hence model graph will be mutated and no other APIs can be called after the compile step. 2026-03-07 10:12:51,569 - afe.apis.model - INFO - Compiling quantized net "LFM2.5-1.2B-Instruct_language_n128_layer1_conv" 2026-03-07 10:12:51,571 - mlc.compiler.model_graph.l1_based - INFO - Allocating memory for IFM/OFM tensors 2026-03-07 10:12:51,572 - mlc.compiler.model_graph.l1_based - INFO - Get model compilation properties done 2026-03-07 10:12:51,574 - afe.core.compile_networks - INFO - The model is split into 1 segments for MLA and APU 2026-03-07 10:12:51,574 - afe.backends.mla.afe_to_n2a_compiler.n2a_compiler_operations - INFO - Stage 1 of 1: compiling for MLA 2026-03-07 10:12:51,593 - mlc.compiler.model_graph.l1_based - INFO - Get model compilation properties done 2026-03-07 10:12:51,605 - mlc.compiler.model_graph.l1_based - INFO - Get model compilation properties 2026-03-07 10:12:51,605 - mlc.compiler.model_graph.l1_based - INFO - Checking if layers fit into memory without splitting 2026-03-07 10:12:51,605 - mlc.compiler.model_graph.l1_based - INFO - Generating the list of feasible layouts 2026-03-07 10:12:51,605 - mlc.compiler.model_graph.l1_based - INFO - Setting layer parameters 2026-03-07 10:12:51,613 - mlc.compiler.model_graph.l1_based - INFO - Get model compilation properties 2026-03-07 10:12:51,614 - mlc.compiler.model_graph.l1_based - INFO - Checking if layers fit into memory without splitting 2026-03-07 10:12:51,614 - mlc.compiler.model_graph.l1_based - INFO - Generating the list of 
feasible layouts 2026-03-07 10:12:51,614 - mlc.compiler.model_graph.l1_based - INFO - Setting layer parameters 2026-03-07 10:12:51,755 - afe.backends.mla.afe_to_n2a_compiler.n2a_backend_runner - INFO - Start evaluate process to generate check file 2026-03-07 10:12:51,756 - mlc.kernel.layout - INFO - L2 caching mode: L2CachingMode.NONE 2026-03-07 10:12:51,773 - mlc.compiler.model_graph.l1_based - INFO - Generating model code 2026-03-07 10:12:51,776 - afe.backends.mla.afe_to_n2a_compiler.n2a_backend_runner - INFO - Start evaluate process to generate check file 2026-03-07 10:12:51,777 - mlc.kernel.layout - INFO - L2 caching mode: L2CachingMode.NONE 2026-03-07 10:12:51,791 - mlc.compiler.model_graph.l1_based - INFO - Generating model code 2026-03-07 10:12:52,806 - afe.ir.serializer.api - INFO - Loading model from file: CompiledModels/LFM2.5-1.2B-Instruct/sima_files/sdk/LFM2.5-1.2B-Instruct_language_n128_post_layer2.sima 2026-03-07 10:12:52,882 - afe.apis.model - WARNING - A deepcopy is disabled hence model graph will be mutated and no other APIs can be called after the compile step. 
2026-03-07 10:12:52,882 - afe.apis.model - INFO - Compiling quantized net "LFM2.5-1.2B-Instruct_language_n128_post_layer2" 2026-03-07 10:12:52,886 - afe.core.compile_networks - INFO - The model is split into 1 segments for MLA and APU 2026-03-07 10:12:52,886 - afe.backends.mla.afe_to_n2a_compiler.n2a_compiler_operations - INFO - Stage 1 of 1: compiling for MLA 2026-03-07 10:12:52,918 - mlc.compiler.model_graph.l1_based - INFO - Get model compilation properties 2026-03-07 10:12:52,919 - mlc.compiler.model_graph.l1_based - INFO - Checking if layers fit into memory without splitting 2026-03-07 10:12:52,919 - mlc.compiler.model_graph.l1_based - INFO - Generating the list of feasible layouts 2026-03-07 10:12:52,919 - mlc.compiler.model_graph.l1_based - INFO - Setting layer parameters 2026-03-07 10:12:56,292 - mlc.compiler.model_graph.l1_based - INFO - Setting tile layouts 2026-03-07 10:12:56,344 - mlc.compiler.model_graph.l1_based - INFO - Setting tile layouts 2026-03-07 10:12:56,699 - mlc.compiler.model_graph.l1_based - INFO - Allocating memory for IFM/OFM tensors 2026-03-07 10:12:56,747 - mlc.compiler.model_graph.l1_based - INFO - Get model compilation properties done 2026-03-07 10:12:56,781 - mlc.compiler.model_graph.l1_based - INFO - Allocating memory for IFM/OFM tensors 2026-03-07 10:12:56,830 - mlc.compiler.model_graph.l1_based - INFO - Get model compilation properties done 2026-03-07 10:12:57,070 - afe.backends.mla.afe_to_n2a_compiler.n2a_backend_runner - INFO - Start evaluate process to generate check file 2026-03-07 10:12:57,071 - mlc.kernel.layout - INFO - L2 caching mode: L2CachingMode.NONE 2026-03-07 10:12:57,084 - mlc.compiler.model_graph.l1_based - INFO - Generating model code 2026-03-07 10:12:57,166 - afe.backends.mla.afe_to_n2a_compiler.n2a_backend_runner - INFO - Start evaluate process to generate check file 2026-03-07 10:12:57,167 - mlc.kernel.layout - INFO - L2 caching mode: L2CachingMode.NONE 2026-03-07 10:12:57,180 - 
mlc.compiler.model_graph.l1_based - INFO - Generating model code 2026-03-07 10:12:57,186 - mlc.compiler.model_graph.l1_based - INFO - Setting tile layouts 2026-03-07 10:12:57,243 - mlc.compiler.model_graph.l1_based - INFO - Allocating memory for IFM/OFM tensors 2026-03-07 10:12:57,280 - mlc.compiler.model_graph.l1_based - INFO - Get model compilation properties done 2026-03-07 10:12:57,702 - afe.backends.mla.afe_to_n2a_compiler.n2a_backend_runner - INFO - Start evaluate process to generate check file 2026-03-07 10:12:57,707 - mlc.kernel.layout - INFO - L2 caching mode: L2CachingMode.NONE 2026-03-07 10:12:57,721 - mlc.compiler.model_graph.l1_based - INFO - Generating model code 2026-03-07 10:12:57,781 - mlc.compiler.model_graph.l1_based - INFO - Setting tile layouts 2026-03-07 10:12:58,294 - mlc.compiler.model_graph.l1_based - INFO - Allocating memory for IFM/OFM tensors 2026-03-07 10:12:58,330 - mlc.compiler.model_graph.l1_based - INFO - Get model compilation properties done 2026-03-07 10:12:58,379 - mlc.compiler.model_graph.l1_based - INFO - Generating model code done 2026-03-07 10:12:58,414 - mlc.test_util.test_context - INFO - Scheduling instructions 2026-03-07 10:12:58,419 - mlc.compiler.model_graph.l1_based - INFO - Generating model code done 2026-03-07 10:12:58,458 - mlc.test_util.test_context - INFO - Scheduling instructions 2026-03-07 10:12:58,504 - afe.backends.mla.afe_to_n2a_compiler.n2a_backend_runner - INFO - Start evaluate process to generate check file 2026-03-07 10:12:58,506 - mlc.kernel.layout - INFO - L2 caching mode: L2CachingMode.NONE 2026-03-07 10:12:58,526 - mlc.compiler.model_graph.l1_based - INFO - Generating model code 2026-03-07 10:13:01,485 - mlc.test_util.test_context - INFO - Performing DRAM/L2 synchronization 2026-03-07 10:13:01,539 - mlc.test_util.test_context - INFO - Performing DRAM/L2 synchronization 2026-03-07 10:13:02,484 - mlc.test_util.test_context - INFO - Inserting and merging NOPs 2026-03-07 10:13:02,552 - 
mlc.test_util.test_context - INFO - Inserting and merging NOPs 2026-03-07 10:13:02,669 - mlc.test_util.test_context - INFO - Setting IQ sync bits 2026-03-07 10:13:02,703 - mlc.test_util.test_context - INFO - Run compression 2026-03-07 10:13:02,737 - mlc.test_util.test_context - INFO - Setting IQ sync bits 2026-03-07 10:13:02,771 - mlc.test_util.test_context - INFO - Run compression 2026-03-07 10:13:07,410 - mlc.compiler.model_graph.l1_based - INFO - Generating model code done 2026-03-07 10:13:07,506 - mlc.test_util.test_context - INFO - Scheduling instructions 2026-03-07 10:13:08,728 - mlc.compiler.model_graph.l1_based - INFO - Generating model code done 2026-03-07 10:13:08,857 - mlc.test_util.test_context - INFO - Scheduling instructions 2026-03-07 10:13:09,031 - mlc.compiler.model_graph.l1_based - INFO - Generating model code done 2026-03-07 10:13:09,148 - mlc.test_util.test_context - INFO - Scheduling instructions 2026-03-07 10:13:09,163 - mlc.compiler.model_graph.l1_based - INFO - Generating model code done 2026-03-07 10:13:09,280 - mlc.test_util.test_context - INFO - Scheduling instructions 2026-03-07 10:13:09,955 - mlc.test_util.test_context - INFO - Compression done in 7s. Compression ratio: 0.981 2026-03-07 10:13:09,955 - mlc.test_util.test_context - INFO - Re-allocate dram memory 2026-03-07 10:13:10,066 - mlc.test_util.test_context - INFO - Compression done in 7s. 
Compression ratio: 0.979 2026-03-07 10:13:10,066 - mlc.test_util.test_context - INFO - Re-allocate dram memory 2026-03-07 10:13:10,133 - mlc.test_util.test_context - INFO - Generating metrics 2026-03-07 10:13:10,242 - mlc.test_util.test_context - INFO - Writing report to MLC file 2026-03-07 10:13:10,243 - mlc.test_util.test_context - INFO - Writing instructions to MLC file 2026-03-07 10:13:10,243 - mlc.test_util.test_context - INFO - Generating metrics 2026-03-07 10:13:10,353 - mlc.test_util.test_context - INFO - Writing report to MLC file 2026-03-07 10:13:10,353 - mlc.test_util.test_context - INFO - Writing instructions to MLC file 2026-03-07 10:13:15,053 - mlc.test_util.test_context - INFO - Performing DRAM/L2 synchronization 2026-03-07 10:13:15,997 - mlc.test_util.test_context - INFO - Inserting and merging NOPs 2026-03-07 10:13:16,426 - mlc.test_util.test_context - INFO - Setting IQ sync bits 2026-03-07 10:13:16,494 - mlc.test_util.test_context - INFO - Run compression 2026-03-07 10:13:19,304 - mlc.test_util.test_context - INFO - Performing DRAM/L2 synchronization 2026-03-07 10:13:19,469 - mlc.test_util.test_context - INFO - Performing DRAM/L2 synchronization 2026-03-07 10:13:19,751 - mlc.test_util.test_context - INFO - Performing DRAM/L2 synchronization 2026-03-07 10:13:20,542 - mlc.test_util.test_context - INFO - Inserting and merging NOPs 2026-03-07 10:13:20,691 - mlc.test_util.test_context - INFO - Inserting and merging NOPs 2026-03-07 10:13:20,800 - mlc.test_util.test_context - INFO - Inserting and merging NOPs 2026-03-07 10:13:21,094 - mlc.test_util.test_context - INFO - Setting IQ sync bits 2026-03-07 10:13:21,178 - mlc.test_util.test_context - INFO - Run compression 2026-03-07 10:13:21,242 - mlc.test_util.test_context - INFO - Setting IQ sync bits 2026-03-07 10:13:21,302 - mlc.test_util.test_context - INFO - Setting IQ sync bits 2026-03-07 10:13:21,324 - mlc.test_util.test_context - INFO - Run compression 2026-03-07 10:13:21,348 - 
mlc.test_util.test_context - INFO - Run compression 2026-03-07 10:13:22,672 - mlc.test_util.test_context - INFO - Compression done in 1s. Compression ratio: 0.945 2026-03-07 10:13:22,672 - mlc.test_util.test_context - INFO - Re-allocate dram memory 2026-03-07 10:13:22,696 - mlc.test_util.test_context - INFO - Generating metrics 2026-03-07 10:13:22,848 - mlc.test_util.test_context - INFO - Writing report to MLC file 2026-03-07 10:13:22,849 - mlc.test_util.test_context - INFO - Writing instructions to MLC file 2026-03-07 10:13:28,788 - mlc.test_util.test_context - INFO - Compression done in 12s. Compression ratio: 0.957 2026-03-07 10:13:28,788 - mlc.test_util.test_context - INFO - Re-allocate dram memory 2026-03-07 10:13:28,788 - mlc.test_util.test_context - INFO - Code generation done 2026-03-07 10:13:28,796 - afe.backends.mla.afe_to_n2a_compiler.n2a_compiler_operations - INFO - Evaluate model and writing chk file done in 2s. 2026-03-07 10:13:28,909 - mlc.test_util.test_context - INFO - Generating metrics 2026-03-07 10:13:29,139 - mlc.test_util.test_context - INFO - Writing report to MLC file 2026-03-07 10:13:29,139 - mlc.test_util.test_context - INFO - Writing instructions to MLC file 2026-03-07 10:13:29,310 - mlc.test_util.test_context - INFO - Code generation done 2026-03-07 10:13:29,310 - afe.backends.mla.afe_to_n2a_compiler.n2a_compiler_operations - INFO - Evaluate model and writing chk file done in 3s. 2026-03-07 10:13:29,724 - mlc.test_util.test_context - INFO - Code generation done 2026-03-07 10:13:29,725 - afe.backends.mla.afe_to_n2a_compiler.n2a_compiler_operations - INFO - Evaluate model and writing chk file done in 3s. 
2026-03-07 10:13:33,636 - afe.backends.mpk.interface - INFO - ==============================
2026-03-07 10:13:33,636 - afe.backends.mpk.interface - INFO - Compilation summary:
2026-03-07 10:13:33,636 - afe.backends.mpk.interface - INFO - ------------------------------
2026-03-07 10:13:33,636 - afe.backends.mpk.interface - INFO - Desired batch size: 1
2026-03-07 10:13:33,636 - afe.backends.mpk.interface - INFO - Achieved batch size: 1
2026-03-07 10:13:33,636 - afe.backends.mpk.interface - INFO - ------------------------------
2026-03-07 10:13:33,636 - afe.backends.mpk.interface - INFO - Plugin distribution per backend:
2026-03-07 10:13:33,637 - afe.backends.mpk.interface - INFO - MLA : 1
2026-03-07 10:13:33,637 - afe.backends.mpk.interface - INFO - EV74: 7
2026-03-07 10:13:33,637 - afe.backends.mpk.interface - INFO - A65 : 0
2026-03-07 10:13:33,637 - afe.backends.mpk.interface - INFO - ------------------------------
2026-03-07 10:13:33,637 - afe.backends.mpk.interface - INFO - Generated files: LFM2.5-1.2B-Instruct_language_n128_pre_layer2_mpk.json, LFM2.5-1.2B-Instruct_language_n128_pre_layer2_stage1_mla_stats.yaml, LFM2.5-1.2B-Instruct_language_n128_pre_layer2_stage1_mla.elf, llima-compile
2026-03-07 10:13:33,816 - afe.backends.mpk.interface - INFO - ==============================
2026-03-07 10:13:33,816 - afe.backends.mpk.interface - INFO - Compilation summary:
2026-03-07 10:13:33,816 - afe.backends.mpk.interface - INFO - ------------------------------
2026-03-07 10:13:33,816 - afe.backends.mpk.interface - INFO - Desired batch size: 1
2026-03-07 10:13:33,816 - afe.backends.mpk.interface - INFO - Achieved batch size: 1
2026-03-07 10:13:33,816 - afe.backends.mpk.interface - INFO - ------------------------------
2026-03-07 10:13:33,816 - afe.backends.mpk.interface - INFO - Plugin distribution per backend:
2026-03-07 10:13:33,816 - afe.backends.mpk.interface - INFO - MLA : 1
2026-03-07 10:13:33,816 - afe.backends.mpk.interface - INFO - EV74: 5
2026-03-07 10:13:33,816 - afe.backends.mpk.interface - INFO - A65 : 0
2026-03-07 10:13:33,816 - afe.backends.mpk.interface - INFO - ------------------------------
2026-03-07 10:13:33,816 - afe.backends.mpk.interface - INFO - Generated files: LFM2.5-1.2B-Instruct_language_n1_layer0_conv_stage1_mla.elf, LFM2.5-1.2B-Instruct_language_n1_layer0_conv_stage1_mla_stats.yaml, LFM2.5-1.2B-Instruct_language_n1_layer0_conv_mpk.json, llima-compile
2026-03-07 10:13:34,044 - afe.ir.serializer.api - INFO - Loading model from file: CompiledModels/LFM2.5-1.2B-Instruct/sima_files/sdk/LFM2.5-1.2B-Instruct_language_n1_pre_layer2.sima
2026-03-07 10:13:34,066 - afe.apis.model - WARNING - A deepcopy is disabled hence model graph will be mutated and no other APIs can be called after the compile step.
2026-03-07 10:13:34,066 - afe.apis.model - INFO - Compiling quantized net "LFM2.5-1.2B-Instruct_language_n1_pre_layer2"
2026-03-07 10:13:34,070 - afe.core.compile_networks - INFO - The model is split into 1 segments for MLA and APU
2026-03-07 10:13:34,070 - afe.backends.mla.afe_to_n2a_compiler.n2a_compiler_operations - INFO - Stage 1 of 1: compiling for MLA
2026-03-07 10:13:34,075 - mlc.compiler.model_graph.l1_based - INFO - Get model compilation properties
2026-03-07 10:13:34,076 - mlc.compiler.model_graph.l1_based - INFO - Checking if layers fit into memory without splitting
2026-03-07 10:13:34,076 - mlc.compiler.model_graph.l1_based - INFO - Generating the list of feasible layouts
2026-03-07 10:13:34,076 - mlc.compiler.model_graph.l1_based - INFO - Setting layer parameters
2026-03-07 10:13:34,187 - afe.backends.mpk.interface - INFO - ==============================
2026-03-07 10:13:34,187 - afe.backends.mpk.interface - INFO - Compilation summary:
2026-03-07 10:13:34,187 - afe.backends.mpk.interface - INFO - ------------------------------
2026-03-07 10:13:34,187 - afe.backends.mpk.interface - INFO - Desired batch size: 1
2026-03-07 10:13:34,187 - afe.backends.mpk.interface - INFO - Achieved batch size: 1
2026-03-07 10:13:34,187 - afe.backends.mpk.interface - INFO - ------------------------------
2026-03-07 10:13:34,187 - afe.backends.mpk.interface - INFO - Plugin distribution per backend:
2026-03-07 10:13:34,187 - afe.backends.mpk.interface - INFO - MLA : 1
2026-03-07 10:13:34,187 - afe.backends.mpk.interface - INFO - EV74: 5
2026-03-07 10:13:34,187 - afe.backends.mpk.interface - INFO - A65 : 0
2026-03-07 10:13:34,187 - afe.backends.mpk.interface - INFO - ------------------------------
2026-03-07 10:13:34,187 - afe.backends.mpk.interface - INFO - Generated files: LFM2.5-1.2B-Instruct_language_n1_layer1_conv_mpk.json, LFM2.5-1.2B-Instruct_language_n1_layer1_conv_stage1_mla.elf, LFM2.5-1.2B-Instruct_language_n1_layer1_conv_stage1_mla_stats.yaml, llima-compile
2026-03-07 10:13:34,248 - mlc.compiler.model_graph.l1_based - INFO - Setting tile layouts
2026-03-07 10:13:34,269 - mlc.compiler.model_graph.l1_based - INFO - Allocating memory for IFM/OFM tensors
2026-03-07 10:13:34,301 - mlc.compiler.model_graph.l1_based - INFO - Get model compilation properties done
2026-03-07 10:13:34,330 - afe.backends.mla.afe_to_n2a_compiler.n2a_backend_runner - INFO - Start evaluate process to generate check file
2026-03-07 10:13:34,331 - mlc.kernel.layout - INFO - L2 caching mode: L2CachingMode.NONE
2026-03-07 10:13:34,341 - mlc.compiler.model_graph.l1_based - INFO - Generating model code
2026-03-07 10:13:34,488 - afe.ir.serializer.api - INFO - Loading model from file: CompiledModels/LFM2.5-1.2B-Instruct/sima_files/sdk/LFM2.5-1.2B-Instruct_language_n1_post_layer2.sima
2026-03-07 10:13:34,558 - afe.apis.model - WARNING - A deepcopy is disabled hence model graph will be mutated and no other APIs can be called after the compile step.
2026-03-07 10:13:34,559 - afe.apis.model - INFO - Compiling quantized net "LFM2.5-1.2B-Instruct_language_n1_post_layer2"
2026-03-07 10:13:34,560 - afe.core.compile_networks - INFO - The model is split into 1 segments for MLA and APU
2026-03-07 10:13:34,560 - afe.backends.mla.afe_to_n2a_compiler.n2a_compiler_operations - INFO - Stage 1 of 1: compiling for MLA
2026-03-07 10:13:34,583 - mlc.compiler.model_graph.l1_based - INFO - Get model compilation properties
2026-03-07 10:13:34,584 - mlc.compiler.model_graph.l1_based - INFO - Checking if layers fit into memory without splitting
2026-03-07 10:13:34,584 - mlc.compiler.model_graph.l1_based - INFO - Generating the list of feasible layouts
2026-03-07 10:13:34,584 - mlc.compiler.model_graph.l1_based - INFO - Setting layer parameters
2026-03-07 10:13:34,625 - mlc.compiler.model_graph.l1_based - INFO - Setting tile layouts
2026-03-07 10:13:34,630 - mlc.compiler.model_graph.l1_based - INFO - Allocating memory for IFM/OFM tensors
2026-03-07 10:13:34,641 - mlc.compiler.model_graph.l1_based - INFO - Get model compilation properties done
2026-03-07 10:13:34,785 - afe.backends.mla.afe_to_n2a_compiler.n2a_backend_runner - INFO - Start evaluate process to generate check file
2026-03-07 10:13:34,786 - mlc.kernel.layout - INFO - L2 caching mode: L2CachingMode.NONE
2026-03-07 10:13:34,794 - mlc.compiler.model_graph.l1_based - INFO - Generating model code
2026-03-07 10:13:35,137 - afe.ir.serializer.api - INFO - Loading model from file: CompiledModels/LFM2.5-1.2B-Instruct/sima_files/sdk/LFM2.5-1.2B-Instruct_language_n128_layer3_conv.sima
2026-03-07 10:13:35,142 - mlc.test_util.test_context - INFO - Compression done in 13s. Compression ratio: 0.964
2026-03-07 10:13:35,142 - mlc.test_util.test_context - INFO - Re-allocate dram memory
2026-03-07 10:13:35,231 - afe.apis.model - WARNING - A deepcopy is disabled hence model graph will be mutated and no other APIs can be called after the compile step.
2026-03-07 10:13:35,231 - afe.apis.model - INFO - Compiling quantized net "LFM2.5-1.2B-Instruct_language_n128_layer3_conv" 2026-03-07 10:13:35,235 - afe.core.compile_networks - INFO - The model is split into 1 segments for MLA and APU 2026-03-07 10:13:35,235 - afe.backends.mla.afe_to_n2a_compiler.n2a_compiler_operations - INFO - Stage 1 of 1: compiling for MLA 2026-03-07 10:13:35,274 - mlc.compiler.model_graph.l1_based - INFO - Get model compilation properties 2026-03-07 10:13:35,275 - mlc.compiler.model_graph.l1_based - INFO - Checking if layers fit into memory without splitting 2026-03-07 10:13:35,275 - mlc.compiler.model_graph.l1_based - INFO - Generating the list of feasible layouts 2026-03-07 10:13:35,275 - mlc.compiler.model_graph.l1_based - INFO - Setting layer parameters 2026-03-07 10:13:35,306 - mlc.test_util.test_context - INFO - Generating metrics 2026-03-07 10:13:35,340 - mlc.test_util.test_context - INFO - Compression done in 14s. Compression ratio: 0.961 2026-03-07 10:13:35,340 - mlc.test_util.test_context - INFO - Re-allocate dram memory 2026-03-07 10:13:35,504 - mlc.test_util.test_context - INFO - Generating metrics 2026-03-07 10:13:35,585 - mlc.test_util.test_context - INFO - Writing report to MLC file 2026-03-07 10:13:35,585 - mlc.test_util.test_context - INFO - Writing instructions to MLC file 2026-03-07 10:13:35,782 - mlc.test_util.test_context - INFO - Writing report to MLC file 2026-03-07 10:13:35,782 - mlc.test_util.test_context - INFO - Writing instructions to MLC file 2026-03-07 10:13:36,860 - mlc.compiler.model_graph.l1_based - INFO - Generating model code done 2026-03-07 10:13:36,892 - mlc.test_util.test_context - INFO - Scheduling instructions 2026-03-07 10:13:38,990 - mlc.test_util.test_context - INFO - Performing DRAM/L2 synchronization 2026-03-07 10:13:39,358 - mlc.test_util.test_context - INFO - Inserting and merging NOPs 2026-03-07 10:13:39,471 - mlc.test_util.test_context - INFO - Setting IQ sync bits 2026-03-07 10:13:39,482 - 
mlc.test_util.test_context - INFO - Run compression 2026-03-07 10:13:39,593 - mlc.compiler.model_graph.l1_based - INFO - Generating model code done 2026-03-07 10:13:39,619 - mlc.test_util.test_context - INFO - Scheduling instructions 2026-03-07 10:13:40,052 - mlc.compiler.model_graph.l1_based - INFO - Setting tile layouts 2026-03-07 10:13:40,223 - mlc.test_util.test_context - INFO - Compression done in 0s. Compression ratio: 0.969 2026-03-07 10:13:40,223 - mlc.test_util.test_context - INFO - Re-allocate dram memory 2026-03-07 10:13:40,255 - mlc.test_util.test_context - INFO - Generating metrics 2026-03-07 10:13:40,295 - mlc.test_util.test_context - INFO - Writing report to MLC file 2026-03-07 10:13:40,296 - mlc.test_util.test_context - INFO - Writing instructions to MLC file 2026-03-07 10:13:40,478 - mlc.compiler.model_graph.l1_based - INFO - Allocating memory for IFM/OFM tensors 2026-03-07 10:13:40,530 - mlc.compiler.model_graph.l1_based - INFO - Get model compilation properties done 2026-03-07 10:13:40,859 - afe.backends.mla.afe_to_n2a_compiler.n2a_backend_runner - INFO - Start evaluate process to generate check file 2026-03-07 10:13:40,860 - mlc.kernel.layout - INFO - L2 caching mode: L2CachingMode.NONE 2026-03-07 10:13:40,872 - mlc.compiler.model_graph.l1_based - INFO - Generating model code 2026-03-07 10:13:41,915 - mlc.test_util.test_context - INFO - Performing DRAM/L2 synchronization 2026-03-07 10:13:42,614 - mlc.test_util.test_context - INFO - Inserting and merging NOPs 2026-03-07 10:13:42,705 - mlc.test_util.test_context - INFO - Code generation done 2026-03-07 10:13:42,761 - mlc.test_util.test_context - INFO - Setting IQ sync bits 2026-03-07 10:13:42,788 - mlc.test_util.test_context - INFO - Run compression 2026-03-07 10:13:48,697 - mlc.test_util.test_context - INFO - Compression done in 5s. 
Compression ratio: 0.977 2026-03-07 10:13:48,697 - mlc.test_util.test_context - INFO - Re-allocate dram memory 2026-03-07 10:13:49,413 - mlc.test_util.test_context - INFO - Generating metrics 2026-03-07 10:13:49,496 - mlc.test_util.test_context - INFO - Writing report to MLC file 2026-03-07 10:13:49,496 - mlc.test_util.test_context - INFO - Writing instructions to MLC file 2026-03-07 10:13:53,077 - mlc.compiler.model_graph.l1_based - INFO - Generating model code done 2026-03-07 10:13:53,193 - mlc.test_util.test_context - INFO - Scheduling instructions 2026-03-07 10:13:59,304 - afe.backends.mla.afe_to_n2a_compiler.n2a_compiler_operations - INFO - Evaluate model and writing chk file done in 0s. 2026-03-07 10:14:00,300 - afe.backends.mpk.interface - INFO - ============================== 2026-03-07 10:14:00,300 - afe.backends.mpk.interface - INFO - Compilation summary: 2026-03-07 10:14:00,301 - afe.backends.mpk.interface - INFO - ------------------------------ 2026-03-07 10:14:00,301 - afe.backends.mpk.interface - INFO - Desired batch size: 1 2026-03-07 10:14:00,301 - afe.backends.mpk.interface - INFO - Achieved batch size: 1 2026-03-07 10:14:00,301 - afe.backends.mpk.interface - INFO - ------------------------------ 2026-03-07 10:14:00,301 - afe.backends.mpk.interface - INFO - Plugin distribution per backend: 2026-03-07 10:14:00,301 - afe.backends.mpk.interface - INFO - MLA : 1 2026-03-07 10:14:00,301 - afe.backends.mpk.interface - INFO - EV74: 7 2026-03-07 10:14:00,301 - afe.backends.mpk.interface - INFO - A65 : 0 2026-03-07 10:14:00,301 - afe.backends.mpk.interface - INFO - ------------------------------ 2026-03-07 10:14:00,301 - afe.backends.mpk.interface - INFO - Generated files: LFM2.5-1.2B-Instruct_language_n1_pre_layer2_stage1_mla.elf, LFM2.5-1.2B-Instruct_language_n1_pre_layer2_mpk.json, LFM2.5-1.2B-Instruct_language_n1_pre_layer2_stage1_mla_stats.yaml, llima-compile 2026-03-07 10:14:00,361 - mlc.test_util.test_context - INFO - Code generation done 2026-03-07 
10:14:00,370 - afe.backends.mla.afe_to_n2a_compiler.n2a_compiler_operations - INFO - Evaluate model and writing chk file done in 8s. 2026-03-07 10:14:00,731 - afe.ir.serializer.api - INFO - Loading model from file: CompiledModels/LFM2.5-1.2B-Instruct/sima_files/sdk/LFM2.5-1.2B-Instruct_language_n1_layer3_conv.sima 2026-03-07 10:14:00,804 - afe.apis.model - WARNING - A deepcopy is disabled hence model graph will be mutated and no other APIs can be called after the compile step. 2026-03-07 10:14:00,805 - afe.apis.model - INFO - Compiling quantized net "LFM2.5-1.2B-Instruct_language_n1_layer3_conv" 2026-03-07 10:14:00,808 - afe.core.compile_networks - INFO - The model is split into 1 segments for MLA and APU 2026-03-07 10:14:00,808 - afe.backends.mla.afe_to_n2a_compiler.n2a_compiler_operations - INFO - Stage 1 of 1: compiling for MLA 2026-03-07 10:14:00,833 - mlc.compiler.model_graph.l1_based - INFO - Get model compilation properties 2026-03-07 10:14:00,834 - mlc.compiler.model_graph.l1_based - INFO - Checking if layers fit into memory without splitting 2026-03-07 10:14:00,834 - mlc.compiler.model_graph.l1_based - INFO - Generating the list of feasible layouts 2026-03-07 10:14:00,834 - mlc.compiler.model_graph.l1_based - INFO - Setting layer parameters 2026-03-07 10:14:00,910 - mlc.compiler.model_graph.l1_based - INFO - Setting tile layouts 2026-03-07 10:14:00,938 - mlc.compiler.model_graph.l1_based - INFO - Allocating memory for IFM/OFM tensors 2026-03-07 10:14:00,958 - mlc.compiler.model_graph.l1_based - INFO - Get model compilation properties done 2026-03-07 10:14:01,137 - afe.backends.mla.afe_to_n2a_compiler.n2a_backend_runner - INFO - Start evaluate process to generate check file 2026-03-07 10:14:01,138 - mlc.kernel.layout - INFO - L2 caching mode: L2CachingMode.NONE 2026-03-07 10:14:01,151 - mlc.compiler.model_graph.l1_based - INFO - Generating model code 2026-03-07 10:14:02,800 - mlc.test_util.test_context - INFO - Performing DRAM/L2 synchronization 2026-03-07 
10:14:04,039 - mlc.test_util.test_context - INFO - Inserting and merging NOPs 2026-03-07 10:14:04,595 - mlc.test_util.test_context - INFO - Setting IQ sync bits 2026-03-07 10:14:04,679 - mlc.test_util.test_context - INFO - Run compression 2026-03-07 10:14:04,992 - mlc.test_util.test_context - INFO - Code generation done 2026-03-07 10:14:05,001 - afe.backends.mla.afe_to_n2a_compiler.n2a_compiler_operations - INFO - Evaluate model and writing chk file done in 1s. 2026-03-07 10:14:07,665 - mlc.compiler.model_graph.l1_based - INFO - Generating model code done 2026-03-07 10:14:07,700 - mlc.test_util.test_context - INFO - Scheduling instructions 2026-03-07 10:14:08,367 - afe.backends.mpk.interface - INFO - ============================== 2026-03-07 10:14:08,367 - afe.backends.mpk.interface - INFO - Compilation summary: 2026-03-07 10:14:08,367 - afe.backends.mpk.interface - INFO - ------------------------------ 2026-03-07 10:14:08,367 - afe.backends.mpk.interface - INFO - Desired batch size: 1 2026-03-07 10:14:08,367 - afe.backends.mpk.interface - INFO - Achieved batch size: 1 2026-03-07 10:14:08,367 - afe.backends.mpk.interface - INFO - ------------------------------ 2026-03-07 10:14:08,367 - afe.backends.mpk.interface - INFO - Plugin distribution per backend: 2026-03-07 10:14:08,367 - afe.backends.mpk.interface - INFO - MLA : 1 2026-03-07 10:14:08,367 - afe.backends.mpk.interface - INFO - EV74: 3 2026-03-07 10:14:08,367 - afe.backends.mpk.interface - INFO - A65 : 0 2026-03-07 10:14:08,367 - afe.backends.mpk.interface - INFO - ------------------------------ 2026-03-07 10:14:08,367 - afe.backends.mpk.interface - INFO - Generated files: LFM2.5-1.2B-Instruct_language_n128_post_layer2_stage1_mla_stats.yaml, LFM2.5-1.2B-Instruct_language_n128_post_layer2_stage1_mla.elf, LFM2.5-1.2B-Instruct_language_n128_post_layer2_mpk.json, llima-compile 2026-03-07 10:14:08,417 - afe.backends.mpk.interface - INFO - ============================== 2026-03-07 10:14:08,417 - 
afe.backends.mpk.interface - INFO - Compilation summary: 2026-03-07 10:14:08,417 - afe.backends.mpk.interface - INFO - ------------------------------ 2026-03-07 10:14:08,417 - afe.backends.mpk.interface - INFO - Desired batch size: 1 2026-03-07 10:14:08,417 - afe.backends.mpk.interface - INFO - Achieved batch size: 1 2026-03-07 10:14:08,417 - afe.backends.mpk.interface - INFO - ------------------------------ 2026-03-07 10:14:08,417 - afe.backends.mpk.interface - INFO - Plugin distribution per backend: 2026-03-07 10:14:08,417 - afe.backends.mpk.interface - INFO - MLA : 1 2026-03-07 10:14:08,417 - afe.backends.mpk.interface - INFO - EV74: 3 2026-03-07 10:14:08,417 - afe.backends.mpk.interface - INFO - A65 : 0 2026-03-07 10:14:08,417 - afe.backends.mpk.interface - INFO - ------------------------------ 2026-03-07 10:14:08,417 - afe.backends.mpk.interface - INFO - Generated files: LFM2.5-1.2B-Instruct_language_n1_post_layer2_stage1_mla_stats.yaml, LFM2.5-1.2B-Instruct_language_n1_post_layer2_mpk.json, LFM2.5-1.2B-Instruct_language_n1_post_layer2_stage1_mla.elf, llima-compile 2026-03-07 10:14:08,850 - afe.ir.serializer.api - INFO - Loading model from file: CompiledModels/LFM2.5-1.2B-Instruct/sima_files/sdk/LFM2.5-1.2B-Instruct_language_n1_layer4_conv.sima 2026-03-07 10:14:08,911 - afe.apis.model - WARNING - A deepcopy is disabled hence model graph will be mutated and no other APIs can be called after the compile step. 
2026-03-07 10:14:08,912 - afe.apis.model - INFO - Compiling quantized net "LFM2.5-1.2B-Instruct_language_n1_layer4_conv" 2026-03-07 10:14:08,914 - afe.core.compile_networks - INFO - The model is split into 1 segments for MLA and APU 2026-03-07 10:14:08,915 - afe.backends.mla.afe_to_n2a_compiler.n2a_compiler_operations - INFO - Stage 1 of 1: compiling for MLA 2026-03-07 10:14:08,941 - afe.ir.serializer.api - INFO - Loading model from file: CompiledModels/LFM2.5-1.2B-Instruct/sima_files/sdk/LFM2.5-1.2B-Instruct_language_n128_layer4_conv.sima 2026-03-07 10:14:08,943 - mlc.compiler.model_graph.l1_based - INFO - Get model compilation properties 2026-03-07 10:14:08,944 - mlc.compiler.model_graph.l1_based - INFO - Checking if layers fit into memory without splitting 2026-03-07 10:14:08,944 - mlc.compiler.model_graph.l1_based - INFO - Generating the list of feasible layouts 2026-03-07 10:14:08,944 - mlc.compiler.model_graph.l1_based - INFO - Setting layer parameters 2026-03-07 10:14:09,018 - mlc.compiler.model_graph.l1_based - INFO - Setting tile layouts 2026-03-07 10:14:09,028 - afe.apis.model - WARNING - A deepcopy is disabled hence model graph will be mutated and no other APIs can be called after the compile step. 
2026-03-07 10:14:09,028 - afe.apis.model - INFO - Compiling quantized net "LFM2.5-1.2B-Instruct_language_n128_layer4_conv" 2026-03-07 10:14:09,032 - afe.core.compile_networks - INFO - The model is split into 1 segments for MLA and APU 2026-03-07 10:14:09,032 - afe.backends.mla.afe_to_n2a_compiler.n2a_compiler_operations - INFO - Stage 1 of 1: compiling for MLA 2026-03-07 10:14:09,046 - mlc.compiler.model_graph.l1_based - INFO - Allocating memory for IFM/OFM tensors 2026-03-07 10:14:09,067 - mlc.compiler.model_graph.l1_based - INFO - Get model compilation properties done 2026-03-07 10:14:09,072 - mlc.compiler.model_graph.l1_based - INFO - Get model compilation properties 2026-03-07 10:14:09,072 - mlc.compiler.model_graph.l1_based - INFO - Checking if layers fit into memory without splitting 2026-03-07 10:14:09,072 - mlc.compiler.model_graph.l1_based - INFO - Generating the list of feasible layouts 2026-03-07 10:14:09,072 - mlc.compiler.model_graph.l1_based - INFO - Setting layer parameters 2026-03-07 10:14:09,248 - afe.backends.mla.afe_to_n2a_compiler.n2a_backend_runner - INFO - Start evaluate process to generate check file 2026-03-07 10:14:09,249 - mlc.kernel.layout - INFO - L2 caching mode: L2CachingMode.NONE 2026-03-07 10:14:09,262 - mlc.compiler.model_graph.l1_based - INFO - Generating model code 2026-03-07 10:14:10,731 - mlc.test_util.test_context - INFO - Performing DRAM/L2 synchronization 2026-03-07 10:14:11,683 - mlc.test_util.test_context - INFO - Inserting and merging NOPs 2026-03-07 10:14:11,870 - mlc.test_util.test_context - INFO - Setting IQ sync bits 2026-03-07 10:14:11,904 - mlc.test_util.test_context - INFO - Run compression 2026-03-07 10:14:13,457 - mlc.compiler.model_graph.l1_based - INFO - Setting tile layouts 2026-03-07 10:14:13,692 - mlc.test_util.test_context - INFO - Code generation done 2026-03-07 10:14:13,704 - afe.backends.mla.afe_to_n2a_compiler.n2a_compiler_operations - INFO - Evaluate model and writing chk file done in 12s. 
2026-03-07 10:14:13,871 - mlc.compiler.model_graph.l1_based - INFO - Allocating memory for IFM/OFM tensors 2026-03-07 10:14:13,902 - mlc.test_util.test_context - INFO - Code generation done 2026-03-07 10:14:13,913 - afe.backends.mla.afe_to_n2a_compiler.n2a_compiler_operations - INFO - Evaluate model and writing chk file done in 12s. 2026-03-07 10:14:13,920 - mlc.compiler.model_graph.l1_based - INFO - Get model compilation properties done 2026-03-07 10:14:14,243 - afe.backends.mla.afe_to_n2a_compiler.n2a_backend_runner - INFO - Start evaluate process to generate check file 2026-03-07 10:14:14,244 - mlc.kernel.layout - INFO - L2 caching mode: L2CachingMode.NONE 2026-03-07 10:14:14,256 - mlc.compiler.model_graph.l1_based - INFO - Generating model code 2026-03-07 10:14:15,343 - mlc.compiler.model_graph.l1_based - INFO - Generating model code done 2026-03-07 10:14:15,378 - mlc.test_util.test_context - INFO - Scheduling instructions 2026-03-07 10:14:18,520 - mlc.test_util.test_context - INFO - Performing DRAM/L2 synchronization 2026-03-07 10:14:18,846 - mlc.test_util.test_context - INFO - Compression done in 14s. Compression ratio: 0.96 2026-03-07 10:14:18,846 - mlc.test_util.test_context - INFO - Re-allocate dram memory 2026-03-07 10:14:19,017 - mlc.test_util.test_context - INFO - Generating metrics 2026-03-07 10:14:19,297 - mlc.test_util.test_context - INFO - Writing report to MLC file 2026-03-07 10:14:19,297 - mlc.test_util.test_context - INFO - Writing instructions to MLC file 2026-03-07 10:14:19,300 - mlc.test_util.test_context - INFO - Compression done in 7s. 
Compression ratio: 0.979 2026-03-07 10:14:19,300 - mlc.test_util.test_context - INFO - Re-allocate dram memory 2026-03-07 10:14:20,156 - mlc.test_util.test_context - INFO - Inserting and merging NOPs 2026-03-07 10:14:20,157 - mlc.test_util.test_context - INFO - Generating metrics 2026-03-07 10:14:20,267 - mlc.test_util.test_context - INFO - Writing report to MLC file 2026-03-07 10:14:20,267 - mlc.test_util.test_context - INFO - Writing instructions to MLC file 2026-03-07 10:14:20,340 - mlc.test_util.test_context - INFO - Setting IQ sync bits 2026-03-07 10:14:20,372 - mlc.test_util.test_context - INFO - Run compression 2026-03-07 10:14:24,549 - afe.backends.mpk.interface - INFO - ============================== 2026-03-07 10:14:24,549 - afe.backends.mpk.interface - INFO - Compilation summary: 2026-03-07 10:14:24,549 - afe.backends.mpk.interface - INFO - ------------------------------ 2026-03-07 10:14:24,549 - afe.backends.mpk.interface - INFO - Desired batch size: 1 2026-03-07 10:14:24,549 - afe.backends.mpk.interface - INFO - Achieved batch size: 1 2026-03-07 10:14:24,549 - afe.backends.mpk.interface - INFO - ------------------------------ 2026-03-07 10:14:24,549 - afe.backends.mpk.interface - INFO - Plugin distribution per backend: 2026-03-07 10:14:24,549 - afe.backends.mpk.interface - INFO - MLA : 1 2026-03-07 10:14:24,549 - afe.backends.mpk.interface - INFO - EV74: 5 2026-03-07 10:14:24,549 - afe.backends.mpk.interface - INFO - A65 : 0 2026-03-07 10:14:24,549 - afe.backends.mpk.interface - INFO - ------------------------------ 2026-03-07 10:14:24,549 - afe.backends.mpk.interface - INFO - Generated files: LFM2.5-1.2B-Instruct_language_n128_layer1_conv_stage1_mla.elf, LFM2.5-1.2B-Instruct_language_n128_layer1_conv_stage1_mla_stats.yaml, LFM2.5-1.2B-Instruct_language_n128_layer1_conv_mpk.json, llima-compile 2026-03-07 10:14:24,969 - afe.ir.serializer.api - INFO - Loading model from file: 
CompiledModels/LFM2.5-1.2B-Instruct/sima_files/sdk/LFM2.5-1.2B-Instruct_language_n128_pre_layer5.sima 2026-03-07 10:14:24,975 - afe.backends.mpk.interface - INFO - ============================== 2026-03-07 10:14:24,975 - afe.backends.mpk.interface - INFO - Compilation summary: 2026-03-07 10:14:24,975 - afe.backends.mpk.interface - INFO - ------------------------------ 2026-03-07 10:14:24,975 - afe.backends.mpk.interface - INFO - Desired batch size: 1 2026-03-07 10:14:24,975 - afe.backends.mpk.interface - INFO - Achieved batch size: 1 2026-03-07 10:14:24,975 - afe.backends.mpk.interface - INFO - ------------------------------ 2026-03-07 10:14:24,975 - afe.backends.mpk.interface - INFO - Plugin distribution per backend: 2026-03-07 10:14:24,975 - afe.backends.mpk.interface - INFO - MLA : 1 2026-03-07 10:14:24,975 - afe.backends.mpk.interface - INFO - EV74: 5 2026-03-07 10:14:24,975 - afe.backends.mpk.interface - INFO - A65 : 0 2026-03-07 10:14:24,975 - afe.backends.mpk.interface - INFO - ------------------------------ 2026-03-07 10:14:24,975 - afe.backends.mpk.interface - INFO - Generated files: LFM2.5-1.2B-Instruct_language_n128_layer0_conv_stage1_mla_stats.yaml, LFM2.5-1.2B-Instruct_language_n128_layer0_conv_mpk.json, LFM2.5-1.2B-Instruct_language_n128_layer0_conv_stage1_mla.elf, llima-compile 2026-03-07 10:14:24,990 - afe.apis.model - WARNING - A deepcopy is disabled hence model graph will be mutated and no other APIs can be called after the compile step. 
2026-03-07 10:14:24,990 - afe.apis.model - INFO - Compiling quantized net "LFM2.5-1.2B-Instruct_language_n128_pre_layer5" 2026-03-07 10:14:24,993 - afe.core.compile_networks - INFO - The model is split into 1 segments for MLA and APU 2026-03-07 10:14:24,994 - afe.backends.mla.afe_to_n2a_compiler.n2a_compiler_operations - INFO - Stage 1 of 1: compiling for MLA 2026-03-07 10:14:25,000 - mlc.compiler.model_graph.l1_based - INFO - Get model compilation properties 2026-03-07 10:14:25,001 - mlc.compiler.model_graph.l1_based - INFO - Checking if layers fit into memory without splitting 2026-03-07 10:14:25,001 - mlc.compiler.model_graph.l1_based - INFO - Generating the list of feasible layouts 2026-03-07 10:14:25,001 - mlc.compiler.model_graph.l1_based - INFO - Setting layer parameters 2026-03-07 10:14:25,415 - afe.ir.serializer.api - INFO - Loading model from file: CompiledModels/LFM2.5-1.2B-Instruct/sima_files/sdk/LFM2.5-1.2B-Instruct_language_n128_post_layer5.sima 2026-03-07 10:14:25,483 - afe.apis.model - WARNING - A deepcopy is disabled hence model graph will be mutated and no other APIs can be called after the compile step. 
2026-03-07 10:14:25,483 - afe.apis.model - INFO - Compiling quantized net "LFM2.5-1.2B-Instruct_language_n128_post_layer5" 2026-03-07 10:14:25,486 - afe.core.compile_networks - INFO - The model is split into 1 segments for MLA and APU 2026-03-07 10:14:25,487 - afe.backends.mla.afe_to_n2a_compiler.n2a_compiler_operations - INFO - Stage 1 of 1: compiling for MLA 2026-03-07 10:14:25,518 - mlc.compiler.model_graph.l1_based - INFO - Get model compilation properties 2026-03-07 10:14:25,519 - mlc.compiler.model_graph.l1_based - INFO - Checking if layers fit into memory without splitting 2026-03-07 10:14:25,519 - mlc.compiler.model_graph.l1_based - INFO - Generating the list of feasible layouts 2026-03-07 10:14:25,519 - mlc.compiler.model_graph.l1_based - INFO - Setting layer parameters 2026-03-07 10:14:26,258 - mlc.compiler.model_graph.l1_based - INFO - Generating model code done 2026-03-07 10:14:26,375 - mlc.test_util.test_context - INFO - Scheduling instructions 2026-03-07 10:14:27,597 - mlc.test_util.test_context - INFO - Compression done in 7s. 
Compression ratio: 0.974 2026-03-07 10:14:27,597 - mlc.test_util.test_context - INFO - Re-allocate dram memory 2026-03-07 10:14:27,778 - mlc.test_util.test_context - INFO - Generating metrics 2026-03-07 10:14:27,888 - mlc.test_util.test_context - INFO - Writing report to MLC file 2026-03-07 10:14:27,888 - mlc.test_util.test_context - INFO - Writing instructions to MLC file 2026-03-07 10:14:29,839 - mlc.compiler.model_graph.l1_based - INFO - Setting tile layouts 2026-03-07 10:14:29,898 - mlc.compiler.model_graph.l1_based - INFO - Allocating memory for IFM/OFM tensors 2026-03-07 10:14:29,935 - mlc.compiler.model_graph.l1_based - INFO - Get model compilation properties done 2026-03-07 10:14:30,355 - afe.backends.mla.afe_to_n2a_compiler.n2a_backend_runner - INFO - Start evaluate process to generate check file 2026-03-07 10:14:30,356 - mlc.kernel.layout - INFO - L2 caching mode: L2CachingMode.NONE 2026-03-07 10:14:30,365 - mlc.compiler.model_graph.l1_based - INFO - Generating model code 2026-03-07 10:14:31,703 - mlc.compiler.model_graph.l1_based - INFO - Setting tile layouts 2026-03-07 10:14:32,136 - mlc.compiler.model_graph.l1_based - INFO - Allocating memory for IFM/OFM tensors 2026-03-07 10:14:32,172 - mlc.compiler.model_graph.l1_based - INFO - Get model compilation properties done 2026-03-07 10:14:32,348 - afe.backends.mla.afe_to_n2a_compiler.n2a_backend_runner - INFO - Start evaluate process to generate check file 2026-03-07 10:14:32,349 - mlc.kernel.layout - INFO - L2 caching mode: L2CachingMode.NONE 2026-03-07 10:14:32,363 - mlc.compiler.model_graph.l1_based - INFO - Generating model code 2026-03-07 10:14:36,811 - mlc.test_util.test_context - INFO - Performing DRAM/L2 synchronization 2026-03-07 10:14:38,051 - mlc.test_util.test_context - INFO - Inserting and merging NOPs 2026-03-07 10:14:38,623 - mlc.test_util.test_context - INFO - Setting IQ sync bits 2026-03-07 10:14:38,714 - mlc.test_util.test_context - INFO - Run compression 2026-03-07 10:14:39,595 - 
mlc.test_util.test_context - INFO - Code generation done 2026-03-07 10:14:39,606 - afe.backends.mla.afe_to_n2a_compiler.n2a_compiler_operations - INFO - Evaluate model and writing chk file done in 3s. 2026-03-07 10:14:39,929 - mlc.compiler.model_graph.l1_based - INFO - Generating model code done 2026-03-07 10:14:40,022 - mlc.test_util.test_context - INFO - Scheduling instructions 2026-03-07 10:14:42,286 - mlc.compiler.model_graph.l1_based - INFO - Generating model code done 2026-03-07 10:14:42,419 - mlc.test_util.test_context - INFO - Scheduling instructions 2026-03-07 10:14:43,986 - afe.backends.mpk.interface - INFO - ============================== 2026-03-07 10:14:43,987 - afe.backends.mpk.interface - INFO - Compilation summary: 2026-03-07 10:14:43,987 - afe.backends.mpk.interface - INFO - ------------------------------ 2026-03-07 10:14:43,987 - afe.backends.mpk.interface - INFO - Desired batch size: 1 2026-03-07 10:14:43,987 - afe.backends.mpk.interface - INFO - Achieved batch size: 1 2026-03-07 10:14:43,987 - afe.backends.mpk.interface - INFO - ------------------------------ 2026-03-07 10:14:43,987 - afe.backends.mpk.interface - INFO - Plugin distribution per backend: 2026-03-07 10:14:43,987 - afe.backends.mpk.interface - INFO - MLA : 1 2026-03-07 10:14:43,987 - afe.backends.mpk.interface - INFO - EV74: 5 2026-03-07 10:14:43,987 - afe.backends.mpk.interface - INFO - A65 : 0 2026-03-07 10:14:43,987 - afe.backends.mpk.interface - INFO - ------------------------------ 2026-03-07 10:14:43,987 - afe.backends.mpk.interface - INFO - Generated files: LFM2.5-1.2B-Instruct_language_n1_layer3_conv_stage1_mla.elf, LFM2.5-1.2B-Instruct_language_n1_layer3_conv_mpk.json, LFM2.5-1.2B-Instruct_language_n1_layer3_conv_stage1_mla_stats.yaml, llima-compile 2026-03-07 10:14:44,399 - afe.ir.serializer.api - INFO - Loading model from file: CompiledModels/LFM2.5-1.2B-Instruct/sima_files/sdk/LFM2.5-1.2B-Instruct_language_n1_pre_layer5.sima 2026-03-07 10:14:44,421 - afe.apis.model - 
WARNING - A deepcopy is disabled hence model graph will be mutated and no other APIs can be called after the compile step. 2026-03-07 10:14:44,421 - afe.apis.model - INFO - Compiling quantized net "LFM2.5-1.2B-Instruct_language_n1_pre_layer5" 2026-03-07 10:14:44,425 - afe.core.compile_networks - INFO - The model is split into 1 segments for MLA and APU 2026-03-07 10:14:44,425 - afe.backends.mla.afe_to_n2a_compiler.n2a_compiler_operations - INFO - Stage 1 of 1: compiling for MLA 2026-03-07 10:14:44,431 - mlc.compiler.model_graph.l1_based - INFO - Get model compilation properties 2026-03-07 10:14:44,431 - mlc.compiler.model_graph.l1_based - INFO - Checking if layers fit into memory without splitting 2026-03-07 10:14:44,431 - mlc.compiler.model_graph.l1_based - INFO - Generating the list of feasible layouts 2026-03-07 10:14:44,431 - mlc.compiler.model_graph.l1_based - INFO - Setting layer parameters 2026-03-07 10:14:44,605 - mlc.compiler.model_graph.l1_based - INFO - Setting tile layouts 2026-03-07 10:14:44,625 - mlc.compiler.model_graph.l1_based - INFO - Allocating memory for IFM/OFM tensors 2026-03-07 10:14:44,659 - mlc.compiler.model_graph.l1_based - INFO - Get model compilation properties done 2026-03-07 10:14:44,688 - afe.backends.mla.afe_to_n2a_compiler.n2a_backend_runner - INFO - Start evaluate process to generate check file 2026-03-07 10:14:44,689 - mlc.kernel.layout - INFO - L2 caching mode: L2CachingMode.NONE 2026-03-07 10:14:44,699 - mlc.compiler.model_graph.l1_based - INFO - Generating model code 2026-03-07 10:14:47,231 - mlc.compiler.model_graph.l1_based - INFO - Generating model code done 2026-03-07 10:14:47,261 - mlc.test_util.test_context - INFO - Scheduling instructions 2026-03-07 10:14:47,359 - mlc.test_util.test_context - INFO - Code generation done 2026-03-07 10:14:47,369 - afe.backends.mla.afe_to_n2a_compiler.n2a_compiler_operations - INFO - Evaluate model and writing chk file done in 3s. 
2026-03-07 10:14:47,524 - mlc.test_util.test_context - INFO - Performing DRAM/L2 synchronization 2026-03-07 10:14:48,483 - mlc.test_util.test_context - INFO - Inserting and merging NOPs 2026-03-07 10:14:48,928 - mlc.test_util.test_context - INFO - Setting IQ sync bits 2026-03-07 10:14:49,000 - mlc.test_util.test_context - INFO - Run compression 2026-03-07 10:14:49,330 - mlc.test_util.test_context - INFO - Performing DRAM/L2 synchronization 2026-03-07 10:14:49,681 - mlc.test_util.test_context - INFO - Inserting and merging NOPs 2026-03-07 10:14:49,794 - mlc.test_util.test_context - INFO - Setting IQ sync bits 2026-03-07 10:14:49,806 - mlc.test_util.test_context - INFO - Run compression 2026-03-07 10:14:50,517 - mlc.test_util.test_context - INFO - Compression done in 0s. Compression ratio: 0.967 2026-03-07 10:14:50,518 - mlc.test_util.test_context - INFO - Re-allocate dram memory 2026-03-07 10:14:50,549 - mlc.test_util.test_context - INFO - Generating metrics 2026-03-07 10:14:50,588 - mlc.test_util.test_context - INFO - Writing report to MLC file 2026-03-07 10:14:50,589 - mlc.test_util.test_context - INFO - Writing instructions to MLC file 2026-03-07 10:14:51,730 - afe.backends.mpk.interface - INFO - ============================== 2026-03-07 10:14:51,730 - afe.backends.mpk.interface - INFO - Compilation summary: 2026-03-07 10:14:51,730 - afe.backends.mpk.interface - INFO - ------------------------------ 2026-03-07 10:14:51,730 - afe.backends.mpk.interface - INFO - Desired batch size: 1 2026-03-07 10:14:51,730 - afe.backends.mpk.interface - INFO - Achieved batch size: 1 2026-03-07 10:14:51,730 - afe.backends.mpk.interface - INFO - ------------------------------ 2026-03-07 10:14:51,730 - afe.backends.mpk.interface - INFO - Plugin distribution per backend: 2026-03-07 10:14:51,730 - afe.backends.mpk.interface - INFO - MLA : 1 2026-03-07 10:14:51,730 - afe.backends.mpk.interface - INFO - EV74: 5 2026-03-07 10:14:51,730 - afe.backends.mpk.interface - INFO - A65 : 0 
2026-03-07 10:14:51,730 - afe.backends.mpk.interface - INFO - ------------------------------ 2026-03-07 10:14:51,730 - afe.backends.mpk.interface - INFO - Generated files: LFM2.5-1.2B-Instruct_language_n1_layer4_conv_stage1_mla.elf, LFM2.5-1.2B-Instruct_language_n1_layer4_conv_mpk.json, LFM2.5-1.2B-Instruct_language_n1_layer4_conv_stage1_mla_stats.yaml, llima-compile 2026-03-07 10:14:52,019 - afe.ir.serializer.api - INFO - Loading model from file: CompiledModels/LFM2.5-1.2B-Instruct/sima_files/sdk/LFM2.5-1.2B-Instruct_language_n1_post_layer5.sima 2026-03-07 10:14:52,065 - afe.apis.model - WARNING - A deepcopy is disabled hence model graph will be mutated and no other APIs can be called after the compile step. 2026-03-07 10:14:52,065 - afe.apis.model - INFO - Compiling quantized net "LFM2.5-1.2B-Instruct_language_n1_post_layer5" 2026-03-07 10:14:52,067 - afe.core.compile_networks - INFO - The model is split into 1 segments for MLA and APU 2026-03-07 10:14:52,067 - afe.backends.mla.afe_to_n2a_compiler.n2a_compiler_operations - INFO - Stage 1 of 1: compiling for MLA 2026-03-07 10:14:52,081 - mlc.compiler.model_graph.l1_based - INFO - Get model compilation properties 2026-03-07 10:14:52,081 - mlc.compiler.model_graph.l1_based - INFO - Checking if layers fit into memory without splitting 2026-03-07 10:14:52,081 - mlc.compiler.model_graph.l1_based - INFO - Generating the list of feasible layouts 2026-03-07 10:14:52,081 - mlc.compiler.model_graph.l1_based - INFO - Setting layer parameters 2026-03-07 10:14:52,123 - mlc.compiler.model_graph.l1_based - INFO - Setting tile layouts 2026-03-07 10:14:52,128 - mlc.compiler.model_graph.l1_based - INFO - Allocating memory for IFM/OFM tensors 2026-03-07 10:14:52,139 - mlc.compiler.model_graph.l1_based - INFO - Get model compilation properties done 2026-03-07 10:14:52,283 - afe.backends.mla.afe_to_n2a_compiler.n2a_backend_runner - INFO - Start evaluate process to generate check file 2026-03-07 10:14:52,283 - mlc.kernel.layout - INFO 
- L2 caching mode: L2CachingMode.NONE 2026-03-07 10:14:52,292 - mlc.compiler.model_graph.l1_based - INFO - Generating model code 2026-03-07 10:14:52,923 - mlc.test_util.test_context - INFO - Code generation done 2026-03-07 10:14:52,929 - mlc.test_util.test_context - INFO - Compression done in 14s. Compression ratio: 0.953 2026-03-07 10:14:52,929 - mlc.test_util.test_context - INFO - Re-allocate dram memory 2026-03-07 10:14:53,096 - mlc.test_util.test_context - INFO - Generating metrics 2026-03-07 10:14:53,377 - mlc.test_util.test_context - INFO - Writing report to MLC file 2026-03-07 10:14:53,377 - mlc.test_util.test_context - INFO - Writing instructions to MLC file 2026-03-07 10:14:53,539 - mlc.test_util.test_context - INFO - Performing DRAM/L2 synchronization 2026-03-07 10:14:54,600 - mlc.test_util.test_context - INFO - Inserting and merging NOPs 2026-03-07 10:14:55,118 - mlc.test_util.test_context - INFO - Setting IQ sync bits 2026-03-07 10:14:55,164 - mlc.test_util.test_context - INFO - Run compression 2026-03-07 10:14:56,500 - mlc.test_util.test_context - INFO - Compression done in 1s. Compression ratio: 0.94 2026-03-07 10:14:56,500 - mlc.test_util.test_context - INFO - Re-allocate dram memory 2026-03-07 10:14:56,514 - mlc.test_util.test_context - INFO - Generating metrics 2026-03-07 10:14:56,664 - mlc.test_util.test_context - INFO - Writing report to MLC file 2026-03-07 10:14:56,664 - mlc.test_util.test_context - INFO - Writing instructions to MLC file 2026-03-07 10:14:57,118 - mlc.compiler.model_graph.l1_based - INFO - Generating model code done 2026-03-07 10:14:57,144 - mlc.test_util.test_context - INFO - Scheduling instructions 2026-03-07 10:14:57,988 - mlc.test_util.test_context - INFO - Code generation done 2026-03-07 10:14:57,988 - afe.backends.mla.afe_to_n2a_compiler.n2a_compiler_operations - INFO - Evaluate model and writing chk file done in 12s. 
2026-03-07 10:14:59,429 - mlc.test_util.test_context - INFO - Performing DRAM/L2 synchronization 2026-03-07 10:15:00,112 - mlc.test_util.test_context - INFO - Inserting and merging NOPs 2026-03-07 10:15:00,251 - mlc.test_util.test_context - INFO - Setting IQ sync bits 2026-03-07 10:15:00,275 - mlc.test_util.test_context - INFO - Run compression 2026-03-07 10:15:00,587 - mlc.test_util.test_context - INFO - Compression done in 11s. Compression ratio: 0.956 2026-03-07 10:15:00,587 - mlc.test_util.test_context - INFO - Re-allocate dram memory 2026-03-07 10:15:00,712 - mlc.test_util.test_context - INFO - Generating metrics 2026-03-07 10:15:00,941 - mlc.test_util.test_context - INFO - Writing report to MLC file 2026-03-07 10:15:00,941 - mlc.test_util.test_context - INFO - Writing instructions to MLC file 2026-03-07 10:15:02,730 - mlc.test_util.test_context - INFO - Code generation done 2026-03-07 10:15:02,733 - afe.backends.mla.afe_to_n2a_compiler.n2a_compiler_operations - INFO - Evaluate model and writing chk file done in 2s. 2026-03-07 10:15:06,143 - mlc.test_util.test_context - INFO - Compression done in 5s. Compression ratio: 0.977 2026-03-07 10:15:06,143 - mlc.test_util.test_context - INFO - Re-allocate dram memory 2026-03-07 10:15:06,286 - mlc.test_util.test_context - INFO - Generating metrics 2026-03-07 10:15:06,371 - mlc.test_util.test_context - INFO - Writing report to MLC file 2026-03-07 10:15:06,371 - mlc.test_util.test_context - INFO - Writing instructions to MLC file 2026-03-07 10:15:06,609 - afe.backends.mla.afe_to_n2a_compiler.n2a_compiler_operations - INFO - Evaluate model and writing chk file done in 1s. 
2026-03-07 10:15:07,242 - afe.backends.mpk.interface - INFO - ============================== 2026-03-07 10:15:07,242 - afe.backends.mpk.interface - INFO - Compilation summary: 2026-03-07 10:15:07,242 - afe.backends.mpk.interface - INFO - ------------------------------ 2026-03-07 10:15:07,242 - afe.backends.mpk.interface - INFO - Desired batch size: 1 2026-03-07 10:15:07,242 - afe.backends.mpk.interface - INFO - Achieved batch size: 1 2026-03-07 10:15:07,242 - afe.backends.mpk.interface - INFO - ------------------------------ 2026-03-07 10:15:07,242 - afe.backends.mpk.interface - INFO - Plugin distribution per backend: 2026-03-07 10:15:07,242 - afe.backends.mpk.interface - INFO - MLA : 1 2026-03-07 10:15:07,242 - afe.backends.mpk.interface - INFO - EV74: 7 2026-03-07 10:15:07,242 - afe.backends.mpk.interface - INFO - A65 : 0 2026-03-07 10:15:07,242 - afe.backends.mpk.interface - INFO - ------------------------------ 2026-03-07 10:15:07,242 - afe.backends.mpk.interface - INFO - Generated files: LFM2.5-1.2B-Instruct_language_n128_pre_layer5_stage1_mla_stats.yaml, LFM2.5-1.2B-Instruct_language_n128_pre_layer5_mpk.json, LFM2.5-1.2B-Instruct_language_n128_pre_layer5_stage1_mla.elf, llima-compile 2026-03-07 10:15:07,554 - afe.backends.mpk.interface - INFO - ============================== 2026-03-07 10:15:07,555 - afe.backends.mpk.interface - INFO - Compilation summary: 2026-03-07 10:15:07,555 - afe.backends.mpk.interface - INFO - ------------------------------ 2026-03-07 10:15:07,555 - afe.backends.mpk.interface - INFO - Desired batch size: 1 2026-03-07 10:15:07,555 - afe.backends.mpk.interface - INFO - Achieved batch size: 1 2026-03-07 10:15:07,555 - afe.backends.mpk.interface - INFO - ------------------------------ 2026-03-07 10:15:07,555 - afe.backends.mpk.interface - INFO - Plugin distribution per backend: 2026-03-07 10:15:07,555 - afe.backends.mpk.interface - INFO - MLA : 1 2026-03-07 10:15:07,555 - afe.backends.mpk.interface - INFO - EV74: 7 2026-03-07 10:15:07,555 
- afe.backends.mpk.interface - INFO - A65 : 0 2026-03-07 10:15:07,555 - afe.backends.mpk.interface - INFO - ------------------------------ 2026-03-07 10:15:07,555 - afe.backends.mpk.interface - INFO - Generated files: LFM2.5-1.2B-Instruct_language_n1_pre_layer5_stage1_mla.elf, LFM2.5-1.2B-Instruct_language_n1_pre_layer5_mpk.json, LFM2.5-1.2B-Instruct_language_n1_pre_layer5_stage1_mla_stats.yaml, llima-compile 2026-03-07 10:15:07,761 - afe.backends.mpk.interface - INFO - ============================== 2026-03-07 10:15:07,761 - afe.backends.mpk.interface - INFO - Compilation summary: 2026-03-07 10:15:07,761 - afe.backends.mpk.interface - INFO - ------------------------------ 2026-03-07 10:15:07,761 - afe.backends.mpk.interface - INFO - Desired batch size: 1 2026-03-07 10:15:07,761 - afe.backends.mpk.interface - INFO - Achieved batch size: 1 2026-03-07 10:15:07,761 - afe.backends.mpk.interface - INFO - ------------------------------ 2026-03-07 10:15:07,761 - afe.backends.mpk.interface - INFO - Plugin distribution per backend: 2026-03-07 10:15:07,761 - afe.backends.mpk.interface - INFO - MLA : 1 2026-03-07 10:15:07,761 - afe.backends.mpk.interface - INFO - EV74: 5 2026-03-07 10:15:07,761 - afe.backends.mpk.interface - INFO - A65 : 0 2026-03-07 10:15:07,761 - afe.backends.mpk.interface - INFO - ------------------------------ 2026-03-07 10:15:07,761 - afe.backends.mpk.interface - INFO - Generated files: LFM2.5-1.2B-Instruct_language_n128_layer3_conv_stage1_mla_stats.yaml, LFM2.5-1.2B-Instruct_language_n128_layer3_conv_mpk.json, LFM2.5-1.2B-Instruct_language_n128_layer3_conv_stage1_mla.elf, llima-compile 2026-03-07 10:15:07,819 - afe.ir.serializer.api - INFO - Loading model from file: CompiledModels/LFM2.5-1.2B-Instruct/sima_files/sdk/LFM2.5-1.2B-Instruct_language_n128_layer6_conv.sima 2026-03-07 10:15:07,903 - afe.apis.model - WARNING - A deepcopy is disabled hence model graph will be mutated and no other APIs can be called after the compile step. 
2026-03-07 10:15:07,903 - afe.apis.model - INFO - Compiling quantized net "LFM2.5-1.2B-Instruct_language_n128_layer6_conv" 2026-03-07 10:15:07,907 - afe.core.compile_networks - INFO - The model is split into 1 segments for MLA and APU 2026-03-07 10:15:07,907 - afe.backends.mla.afe_to_n2a_compiler.n2a_compiler_operations - INFO - Stage 1 of 1: compiling for MLA 2026-03-07 10:15:07,947 - mlc.compiler.model_graph.l1_based - INFO - Get model compilation properties 2026-03-07 10:15:07,948 - mlc.compiler.model_graph.l1_based - INFO - Checking if layers fit into memory without splitting 2026-03-07 10:15:07,948 - mlc.compiler.model_graph.l1_based - INFO - Generating the list of feasible layouts 2026-03-07 10:15:07,948 - mlc.compiler.model_graph.l1_based - INFO - Setting layer parameters 2026-03-07 10:15:07,980 - afe.ir.serializer.api - INFO - Loading model from file: CompiledModels/LFM2.5-1.2B-Instruct/sima_files/sdk/LFM2.5-1.2B-Instruct_language_n1_layer6_conv.sima 2026-03-07 10:15:08,051 - afe.apis.model - WARNING - A deepcopy is disabled hence model graph will be mutated and no other APIs can be called after the compile step. 
2026-03-07 10:15:08,051 - afe.apis.model - INFO - Compiling quantized net "LFM2.5-1.2B-Instruct_language_n1_layer6_conv" 2026-03-07 10:15:08,054 - afe.core.compile_networks - INFO - The model is split into 1 segments for MLA and APU 2026-03-07 10:15:08,054 - afe.backends.mla.afe_to_n2a_compiler.n2a_compiler_operations - INFO - Stage 1 of 1: compiling for MLA 2026-03-07 10:15:08,083 - mlc.compiler.model_graph.l1_based - INFO - Get model compilation properties 2026-03-07 10:15:08,083 - mlc.compiler.model_graph.l1_based - INFO - Checking if layers fit into memory without splitting 2026-03-07 10:15:08,083 - mlc.compiler.model_graph.l1_based - INFO - Generating the list of feasible layouts 2026-03-07 10:15:08,083 - mlc.compiler.model_graph.l1_based - INFO - Setting layer parameters 2026-03-07 10:15:08,159 - mlc.compiler.model_graph.l1_based - INFO - Setting tile layouts 2026-03-07 10:15:08,187 - mlc.compiler.model_graph.l1_based - INFO - Allocating memory for IFM/OFM tensors 2026-03-07 10:15:08,209 - mlc.compiler.model_graph.l1_based - INFO - Get model compilation properties done 2026-03-07 10:15:08,389 - afe.backends.mla.afe_to_n2a_compiler.n2a_backend_runner - INFO - Start evaluate process to generate check file 2026-03-07 10:15:08,390 - mlc.kernel.layout - INFO - L2 caching mode: L2CachingMode.NONE 2026-03-07 10:15:08,403 - mlc.compiler.model_graph.l1_based - INFO - Generating model code 2026-03-07 10:15:08,752 - afe.ir.serializer.api - INFO - Loading model from file: CompiledModels/LFM2.5-1.2B-Instruct/sima_files/sdk/LFM2.5-1.2B-Instruct_language_n128_layer7_conv.sima 2026-03-07 10:15:08,818 - afe.apis.model - WARNING - A deepcopy is disabled hence model graph will be mutated and no other APIs can be called after the compile step. 
2026-03-07 10:15:08,818 - afe.apis.model - INFO - Compiling quantized net "LFM2.5-1.2B-Instruct_language_n128_layer7_conv" 2026-03-07 10:15:08,822 - afe.core.compile_networks - INFO - The model is split into 1 segments for MLA and APU 2026-03-07 10:15:08,822 - afe.backends.mla.afe_to_n2a_compiler.n2a_compiler_operations - INFO - Stage 1 of 1: compiling for MLA 2026-03-07 10:15:08,842 - mlc.compiler.model_graph.l1_based - INFO - Get model compilation properties 2026-03-07 10:15:08,842 - mlc.compiler.model_graph.l1_based - INFO - Checking if layers fit into memory without splitting 2026-03-07 10:15:08,843 - mlc.compiler.model_graph.l1_based - INFO - Generating the list of feasible layouts 2026-03-07 10:15:08,843 - mlc.compiler.model_graph.l1_based - INFO - Setting layer parameters 2026-03-07 10:15:12,365 - mlc.compiler.model_graph.l1_based - INFO - Setting tile layouts 2026-03-07 10:15:12,784 - mlc.compiler.model_graph.l1_based - INFO - Allocating memory for IFM/OFM tensors 2026-03-07 10:15:12,834 - mlc.compiler.model_graph.l1_based - INFO - Get model compilation properties done 2026-03-07 10:15:13,151 - afe.backends.mla.afe_to_n2a_compiler.n2a_backend_runner - INFO - Start evaluate process to generate check file 2026-03-07 10:15:13,152 - mlc.kernel.layout - INFO - L2 caching mode: L2CachingMode.NONE 2026-03-07 10:15:13,166 - mlc.compiler.model_graph.l1_based - INFO - Generating model code 2026-03-07 10:15:13,671 - mlc.compiler.model_graph.l1_based - INFO - Setting tile layouts 2026-03-07 10:15:14,089 - mlc.compiler.model_graph.l1_based - INFO - Allocating memory for IFM/OFM tensors 2026-03-07 10:15:14,138 - mlc.compiler.model_graph.l1_based - INFO - Get model compilation properties done 2026-03-07 10:15:14,460 - afe.backends.mla.afe_to_n2a_compiler.n2a_backend_runner - INFO - Start evaluate process to generate check file 2026-03-07 10:15:14,461 - mlc.kernel.layout - INFO - L2 caching mode: L2CachingMode.NONE 2026-03-07 10:15:14,473 - 
mlc.compiler.model_graph.l1_based - INFO - Generating model code 2026-03-07 10:15:15,018 - mlc.compiler.model_graph.l1_based - INFO - Generating model code done 2026-03-07 10:15:15,054 - mlc.test_util.test_context - INFO - Scheduling instructions 2026-03-07 10:15:18,096 - mlc.test_util.test_context - INFO - Performing DRAM/L2 synchronization 2026-03-07 10:15:19,058 - mlc.test_util.test_context - INFO - Inserting and merging NOPs 2026-03-07 10:15:19,247 - mlc.test_util.test_context - INFO - Setting IQ sync bits 2026-03-07 10:15:19,280 - mlc.test_util.test_context - INFO - Run compression 2026-03-07 10:15:22,230 - mlc.test_util.test_context - INFO - Code generation done 2026-03-07 10:15:22,238 - afe.backends.mla.afe_to_n2a_compiler.n2a_compiler_operations - INFO - Evaluate model and writing chk file done in 1s. 2026-03-07 10:15:25,119 - mlc.compiler.model_graph.l1_based - INFO - Generating model code done 2026-03-07 10:15:25,237 - mlc.test_util.test_context - INFO - Scheduling instructions 2026-03-07 10:15:25,753 - afe.backends.mpk.interface - INFO - ============================== 2026-03-07 10:15:25,754 - afe.backends.mpk.interface - INFO - Compilation summary: 2026-03-07 10:15:25,754 - afe.backends.mpk.interface - INFO - ------------------------------ 2026-03-07 10:15:25,754 - afe.backends.mpk.interface - INFO - Desired batch size: 1 2026-03-07 10:15:25,754 - afe.backends.mpk.interface - INFO - Achieved batch size: 1 2026-03-07 10:15:25,754 - afe.backends.mpk.interface - INFO - ------------------------------ 2026-03-07 10:15:25,754 - afe.backends.mpk.interface - INFO - Plugin distribution per backend: 2026-03-07 10:15:25,754 - afe.backends.mpk.interface - INFO - MLA : 1 2026-03-07 10:15:25,754 - afe.backends.mpk.interface - INFO - EV74: 3 2026-03-07 10:15:25,754 - afe.backends.mpk.interface - INFO - A65 : 0 2026-03-07 10:15:25,754 - afe.backends.mpk.interface - INFO - ------------------------------ 2026-03-07 10:15:25,754 - afe.backends.mpk.interface - INFO - 
Generated files: LFM2.5-1.2B-Instruct_language_n1_post_layer5_stage1_mla_stats.yaml, LFM2.5-1.2B-Instruct_language_n1_post_layer5_stage1_mla.elf, LFM2.5-1.2B-Instruct_language_n1_post_layer5_mpk.json, llima-compile 2026-03-07 10:15:26,191 - afe.ir.serializer.api - INFO - Loading model from file: CompiledModels/LFM2.5-1.2B-Instruct/sima_files/sdk/LFM2.5-1.2B-Instruct_language_n1_layer7_conv.sima 2026-03-07 10:15:26,246 - afe.apis.model - WARNING - A deepcopy is disabled hence model graph will be mutated and no other APIs can be called after the compile step. 2026-03-07 10:15:26,246 - afe.apis.model - INFO - Compiling quantized net "LFM2.5-1.2B-Instruct_language_n1_layer7_conv" 2026-03-07 10:15:26,249 - afe.core.compile_networks - INFO - The model is split into 1 segments for MLA and APU 2026-03-07 10:15:26,249 - afe.backends.mla.afe_to_n2a_compiler.n2a_compiler_operations - INFO - Stage 1 of 1: compiling for MLA 2026-03-07 10:15:26,269 - mlc.compiler.model_graph.l1_based - INFO - Get model compilation properties 2026-03-07 10:15:26,270 - mlc.compiler.model_graph.l1_based - INFO - Checking if layers fit into memory without splitting 2026-03-07 10:15:26,270 - mlc.compiler.model_graph.l1_based - INFO - Generating the list of feasible layouts 2026-03-07 10:15:26,270 - mlc.compiler.model_graph.l1_based - INFO - Setting layer parameters 2026-03-07 10:15:26,345 - mlc.compiler.model_graph.l1_based - INFO - Setting tile layouts 2026-03-07 10:15:26,373 - mlc.compiler.model_graph.l1_based - INFO - Allocating memory for IFM/OFM tensors 2026-03-07 10:15:26,393 - mlc.compiler.model_graph.l1_based - INFO - Get model compilation properties done 2026-03-07 10:15:26,570 - afe.backends.mla.afe_to_n2a_compiler.n2a_backend_runner - INFO - Start evaluate process to generate check file 2026-03-07 10:15:26,571 - mlc.test_util.test_context - INFO - Compression done in 7s. 
Compression ratio: 0.977 2026-03-07 10:15:26,571 - mlc.test_util.test_context - INFO - Re-allocate dram memory 2026-03-07 10:15:26,571 - mlc.kernel.layout - INFO - L2 caching mode: L2CachingMode.NONE 2026-03-07 10:15:26,584 - mlc.compiler.model_graph.l1_based - INFO - Generating model code 2026-03-07 10:15:26,736 - mlc.compiler.model_graph.l1_based - INFO - Generating model code done 2026-03-07 10:15:26,858 - mlc.test_util.test_context - INFO - Scheduling instructions 2026-03-07 10:15:27,423 - mlc.test_util.test_context - INFO - Generating metrics 2026-03-07 10:15:27,532 - mlc.test_util.test_context - INFO - Writing report to MLC file 2026-03-07 10:15:27,532 - mlc.test_util.test_context - INFO - Writing instructions to MLC file 2026-03-07 10:15:31,932 - mlc.test_util.test_context - INFO - Code generation done 2026-03-07 10:15:31,942 - afe.backends.mla.afe_to_n2a_compiler.n2a_compiler_operations - INFO - Evaluate model and writing chk file done in 8s. 2026-03-07 10:15:32,150 - mlc.test_util.test_context - INFO - Code generation done 2026-03-07 10:15:32,162 - afe.backends.mla.afe_to_n2a_compiler.n2a_compiler_operations - INFO - Evaluate model and writing chk file done in 13s. 
2026-03-07 10:15:33,215 - mlc.compiler.model_graph.l1_based - INFO - Generating model code done 2026-03-07 10:15:33,251 - mlc.test_util.test_context - INFO - Scheduling instructions 2026-03-07 10:15:35,842 - mlc.test_util.test_context - INFO - Performing DRAM/L2 synchronization 2026-03-07 10:15:36,409 - mlc.test_util.test_context - INFO - Performing DRAM/L2 synchronization 2026-03-07 10:15:36,737 - mlc.test_util.test_context - INFO - Performing DRAM/L2 synchronization 2026-03-07 10:15:37,070 - mlc.test_util.test_context - INFO - Inserting and merging NOPs 2026-03-07 10:15:37,369 - mlc.test_util.test_context - INFO - Inserting and merging NOPs 2026-03-07 10:15:37,554 - mlc.test_util.test_context - INFO - Setting IQ sync bits 2026-03-07 10:15:37,588 - mlc.test_util.test_context - INFO - Run compression 2026-03-07 10:15:37,638 - mlc.test_util.test_context - INFO - Setting IQ sync bits 2026-03-07 10:15:37,721 - mlc.test_util.test_context - INFO - Run compression 2026-03-07 10:15:37,975 - mlc.test_util.test_context - INFO - Inserting and merging NOPs 2026-03-07 10:15:38,573 - mlc.test_util.test_context - INFO - Setting IQ sync bits 2026-03-07 10:15:38,661 - mlc.test_util.test_context - INFO - Run compression 2026-03-07 10:15:40,767 - afe.backends.mpk.interface - INFO - ============================== 2026-03-07 10:15:40,768 - afe.backends.mpk.interface - INFO - Compilation summary: 2026-03-07 10:15:40,768 - afe.backends.mpk.interface - INFO - ------------------------------ 2026-03-07 10:15:40,768 - afe.backends.mpk.interface - INFO - Desired batch size: 1 2026-03-07 10:15:40,768 - afe.backends.mpk.interface - INFO - Achieved batch size: 1 2026-03-07 10:15:40,768 - afe.backends.mpk.interface - INFO - ------------------------------ 2026-03-07 10:15:40,768 - afe.backends.mpk.interface - INFO - Plugin distribution per backend: 2026-03-07 10:15:40,768 - afe.backends.mpk.interface - INFO - MLA : 1 2026-03-07 10:15:40,768 - afe.backends.mpk.interface - INFO - EV74: 3 2026-03-07 
10:15:40,768 - afe.backends.mpk.interface - INFO - A65 : 0 2026-03-07 10:15:40,768 - afe.backends.mpk.interface - INFO - ------------------------------ 2026-03-07 10:15:40,768 - afe.backends.mpk.interface - INFO - Generated files: LFM2.5-1.2B-Instruct_language_n128_post_layer5_stage1_mla.elf, LFM2.5-1.2B-Instruct_language_n128_post_layer5_mpk.json, LFM2.5-1.2B-Instruct_language_n128_post_layer5_stage1_mla_stats.yaml, llima-compile 2026-03-07 10:15:41,174 - afe.ir.serializer.api - INFO - Loading model from file: CompiledModels/LFM2.5-1.2B-Instruct/sima_files/sdk/LFM2.5-1.2B-Instruct_language_n128_pre_layer8.sima 2026-03-07 10:15:41,196 - afe.apis.model - WARNING - A deepcopy is disabled hence model graph will be mutated and no other APIs can be called after the compile step. 2026-03-07 10:15:41,196 - afe.apis.model - INFO - Compiling quantized net "LFM2.5-1.2B-Instruct_language_n128_pre_layer8" 2026-03-07 10:15:41,200 - afe.core.compile_networks - INFO - The model is split into 1 segments for MLA and APU 2026-03-07 10:15:41,200 - afe.backends.mla.afe_to_n2a_compiler.n2a_compiler_operations - INFO - Stage 1 of 1: compiling for MLA 2026-03-07 10:15:41,205 - mlc.compiler.model_graph.l1_based - INFO - Get model compilation properties 2026-03-07 10:15:41,206 - mlc.compiler.model_graph.l1_based - INFO - Checking if layers fit into memory without splitting 2026-03-07 10:15:41,206 - mlc.compiler.model_graph.l1_based - INFO - Generating the list of feasible layouts 2026-03-07 10:15:41,206 - mlc.compiler.model_graph.l1_based - INFO - Setting layer parameters 2026-03-07 10:15:42,917 - afe.backends.mpk.interface - INFO - ============================== 2026-03-07 10:15:42,917 - afe.backends.mpk.interface - INFO - Compilation summary: 2026-03-07 10:15:42,917 - afe.backends.mpk.interface - INFO - ------------------------------ 2026-03-07 10:15:42,917 - afe.backends.mpk.interface - INFO - Desired batch size: 1 2026-03-07 10:15:42,917 - afe.backends.mpk.interface - INFO - Achieved 
batch size: 1 2026-03-07 10:15:42,917 - afe.backends.mpk.interface - INFO - ------------------------------ 2026-03-07 10:15:42,917 - afe.backends.mpk.interface - INFO - Plugin distribution per backend: 2026-03-07 10:15:42,917 - afe.backends.mpk.interface - INFO - MLA : 1 2026-03-07 10:15:42,917 - afe.backends.mpk.interface - INFO - EV74: 5 2026-03-07 10:15:42,917 - afe.backends.mpk.interface - INFO - A65 : 0 2026-03-07 10:15:42,917 - afe.backends.mpk.interface - INFO - ------------------------------ 2026-03-07 10:15:42,917 - afe.backends.mpk.interface - INFO - Generated files: LFM2.5-1.2B-Instruct_language_n128_layer4_conv_stage1_mla.elf, LFM2.5-1.2B-Instruct_language_n128_layer4_conv_mpk.json, LFM2.5-1.2B-Instruct_language_n128_layer4_conv_stage1_mla_stats.yaml, llima-compile 2026-03-07 10:15:43,363 - afe.ir.serializer.api - INFO - Loading model from file: CompiledModels/LFM2.5-1.2B-Instruct/sima_files/sdk/LFM2.5-1.2B-Instruct_language_n128_post_layer8.sima 2026-03-07 10:15:43,419 - afe.apis.model - WARNING - A deepcopy is disabled hence model graph will be mutated and no other APIs can be called after the compile step. 
2026-03-07 10:15:43,420 - afe.apis.model - INFO - Compiling quantized net "LFM2.5-1.2B-Instruct_language_n128_post_layer8" 2026-03-07 10:15:43,423 - afe.core.compile_networks - INFO - The model is split into 1 segments for MLA and APU 2026-03-07 10:15:43,423 - afe.backends.mla.afe_to_n2a_compiler.n2a_compiler_operations - INFO - Stage 1 of 1: compiling for MLA 2026-03-07 10:15:43,455 - mlc.compiler.model_graph.l1_based - INFO - Get model compilation properties 2026-03-07 10:15:43,456 - mlc.compiler.model_graph.l1_based - INFO - Checking if layers fit into memory without splitting 2026-03-07 10:15:43,456 - mlc.compiler.model_graph.l1_based - INFO - Generating the list of feasible layouts 2026-03-07 10:15:43,456 - mlc.compiler.model_graph.l1_based - INFO - Setting layer parameters 2026-03-07 10:15:44,875 - mlc.test_util.test_context - INFO - Compression done in 7s. Compression ratio: 0.973 2026-03-07 10:15:44,876 - mlc.test_util.test_context - INFO - Re-allocate dram memory 2026-03-07 10:15:45,056 - mlc.test_util.test_context - INFO - Generating metrics 2026-03-07 10:15:45,167 - mlc.test_util.test_context - INFO - Writing report to MLC file 2026-03-07 10:15:45,167 - mlc.test_util.test_context - INFO - Writing instructions to MLC file 2026-03-07 10:15:46,651 - mlc.test_util.test_context - INFO - Code generation done 2026-03-07 10:15:46,663 - afe.backends.mla.afe_to_n2a_compiler.n2a_compiler_operations - INFO - Evaluate model and writing chk file done in 3s. 
2026-03-07 10:15:47,753 - mlc.compiler.model_graph.l1_based - INFO - Setting tile layouts 2026-03-07 10:15:47,787 - mlc.compiler.model_graph.l1_based - INFO - Setting tile layouts 2026-03-07 10:15:47,812 - mlc.compiler.model_graph.l1_based - INFO - Allocating memory for IFM/OFM tensors 2026-03-07 10:15:47,848 - mlc.compiler.model_graph.l1_based - INFO - Get model compilation properties done 2026-03-07 10:15:48,220 - mlc.compiler.model_graph.l1_based - INFO - Allocating memory for IFM/OFM tensors 2026-03-07 10:15:48,238 - afe.backends.mla.afe_to_n2a_compiler.n2a_backend_runner - INFO - Start evaluate process to generate check file 2026-03-07 10:15:48,239 - mlc.kernel.layout - INFO - L2 caching mode: L2CachingMode.NONE 2026-03-07 10:15:48,248 - mlc.compiler.model_graph.l1_based - INFO - Generating model code 2026-03-07 10:15:48,257 - mlc.compiler.model_graph.l1_based - INFO - Get model compilation properties done 2026-03-07 10:15:48,430 - afe.backends.mla.afe_to_n2a_compiler.n2a_backend_runner - INFO - Start evaluate process to generate check file 2026-03-07 10:15:48,431 - mlc.kernel.layout - INFO - L2 caching mode: L2CachingMode.NONE 2026-03-07 10:15:48,446 - mlc.compiler.model_graph.l1_based - INFO - Generating model code 2026-03-07 10:15:50,961 - afe.backends.mpk.interface - INFO - ============================== 2026-03-07 10:15:50,961 - afe.backends.mpk.interface - INFO - Compilation summary: 2026-03-07 10:15:50,961 - afe.backends.mpk.interface - INFO - ------------------------------ 2026-03-07 10:15:50,961 - afe.backends.mpk.interface - INFO - Desired batch size: 1 2026-03-07 10:15:50,961 - afe.backends.mpk.interface - INFO - Achieved batch size: 1 2026-03-07 10:15:50,961 - afe.backends.mpk.interface - INFO - ------------------------------ 2026-03-07 10:15:50,961 - afe.backends.mpk.interface - INFO - Plugin distribution per backend: 2026-03-07 10:15:50,961 - afe.backends.mpk.interface - INFO - MLA : 1 2026-03-07 10:15:50,961 - afe.backends.mpk.interface - INFO - 
EV74: 5 2026-03-07 10:15:50,961 - afe.backends.mpk.interface - INFO - A65 : 0 2026-03-07 10:15:50,961 - afe.backends.mpk.interface - INFO - ------------------------------ 2026-03-07 10:15:50,961 - afe.backends.mpk.interface - INFO - Generated files: LFM2.5-1.2B-Instruct_language_n1_layer6_conv_stage1_mla_stats.yaml, LFM2.5-1.2B-Instruct_language_n1_layer6_conv_stage1_mla.elf, LFM2.5-1.2B-Instruct_language_n1_layer6_conv_mpk.json, llima-compile 2026-03-07 10:15:51,370 - afe.ir.serializer.api - INFO - Loading model from file: CompiledModels/LFM2.5-1.2B-Instruct/sima_files/sdk/LFM2.5-1.2B-Instruct_language_n1_pre_layer8.sima 2026-03-07 10:15:51,392 - afe.apis.model - WARNING - A deepcopy is disabled hence model graph will be mutated and no other APIs can be called after the compile step. 2026-03-07 10:15:51,392 - afe.apis.model - INFO - Compiling quantized net "LFM2.5-1.2B-Instruct_language_n1_pre_layer8" 2026-03-07 10:15:51,395 - afe.core.compile_networks - INFO - The model is split into 1 segments for MLA and APU 2026-03-07 10:15:51,396 - afe.backends.mla.afe_to_n2a_compiler.n2a_compiler_operations - INFO - Stage 1 of 1: compiling for MLA 2026-03-07 10:15:51,401 - mlc.compiler.model_graph.l1_based - INFO - Get model compilation properties 2026-03-07 10:15:51,401 - mlc.compiler.model_graph.l1_based - INFO - Checking if layers fit into memory without splitting 2026-03-07 10:15:51,401 - mlc.compiler.model_graph.l1_based - INFO - Generating the list of feasible layouts 2026-03-07 10:15:51,401 - mlc.compiler.model_graph.l1_based - INFO - Setting layer parameters 2026-03-07 10:15:51,575 - mlc.compiler.model_graph.l1_based - INFO - Setting tile layouts 2026-03-07 10:15:51,596 - mlc.compiler.model_graph.l1_based - INFO - Allocating memory for IFM/OFM tensors 2026-03-07 10:15:51,629 - mlc.compiler.model_graph.l1_based - INFO - Get model compilation properties done 2026-03-07 10:15:51,654 - afe.backends.mla.afe_to_n2a_compiler.n2a_backend_runner - INFO - Start evaluate 
process to generate check file 2026-03-07 10:15:51,655 - mlc.kernel.layout - INFO - L2 caching mode: L2CachingMode.NONE 2026-03-07 10:15:51,665 - mlc.compiler.model_graph.l1_based - INFO - Generating model code 2026-03-07 10:15:51,696 - mlc.test_util.test_context - INFO - Compression done in 13s. Compression ratio: 0.958 2026-03-07 10:15:51,696 - mlc.test_util.test_context - INFO - Re-allocate dram memory 2026-03-07 10:15:51,863 - mlc.test_util.test_context - INFO - Generating metrics 2026-03-07 10:15:52,136 - mlc.test_util.test_context - INFO - Writing report to MLC file 2026-03-07 10:15:52,136 - mlc.test_util.test_context - INFO - Writing instructions to MLC file 2026-03-07 10:15:52,792 - mlc.test_util.test_context - INFO - Compression done in 14s. Compression ratio: 0.952 2026-03-07 10:15:52,792 - mlc.test_util.test_context - INFO - Re-allocate dram memory 2026-03-07 10:15:52,962 - mlc.test_util.test_context - INFO - Generating metrics 2026-03-07 10:15:53,236 - mlc.test_util.test_context - INFO - Writing report to MLC file 2026-03-07 10:15:53,236 - mlc.test_util.test_context - INFO - Writing instructions to MLC file 2026-03-07 10:15:54,205 - mlc.compiler.model_graph.l1_based - INFO - Generating model code done 2026-03-07 10:15:54,236 - mlc.test_util.test_context - INFO - Scheduling instructions 2026-03-07 10:15:56,291 - mlc.test_util.test_context - INFO - Performing DRAM/L2 synchronization 2026-03-07 10:15:56,645 - mlc.test_util.test_context - INFO - Inserting and merging NOPs 2026-03-07 10:15:56,760 - mlc.test_util.test_context - INFO - Setting IQ sync bits 2026-03-07 10:15:56,771 - mlc.test_util.test_context - INFO - Run compression 2026-03-07 10:15:57,547 - mlc.test_util.test_context - INFO - Compression done in 0s. 
Compression ratio: 0.967 2026-03-07 10:15:57,548 - mlc.test_util.test_context - INFO - Re-allocate dram memory 2026-03-07 10:15:57,579 - mlc.test_util.test_context - INFO - Generating metrics 2026-03-07 10:15:57,595 - mlc.compiler.model_graph.l1_based - INFO - Generating model code done 2026-03-07 10:15:57,617 - mlc.test_util.test_context - INFO - Writing report to MLC file 2026-03-07 10:15:57,617 - mlc.test_util.test_context - INFO - Writing instructions to MLC file 2026-03-07 10:15:57,684 - mlc.test_util.test_context - INFO - Scheduling instructions 2026-03-07 10:15:58,218 - mlc.compiler.model_graph.l1_based - INFO - Generating model code done 2026-03-07 10:15:58,347 - mlc.test_util.test_context - INFO - Scheduling instructions 2026-03-07 10:15:59,940 - mlc.test_util.test_context - INFO - Code generation done 2026-03-07 10:16:04,454 - mlc.test_util.test_context - INFO - Code generation done 2026-03-07 10:16:04,455 - afe.backends.mla.afe_to_n2a_compiler.n2a_compiler_operations - INFO - Evaluate model and writing chk file done in 3s. 
2026-03-07 10:16:05,230 - mlc.test_util.test_context - INFO - Performing DRAM/L2 synchronization 2026-03-07 10:16:06,164 - mlc.test_util.test_context - INFO - Inserting and merging NOPs 2026-03-07 10:16:06,611 - mlc.test_util.test_context - INFO - Setting IQ sync bits 2026-03-07 10:16:06,684 - mlc.test_util.test_context - INFO - Run compression 2026-03-07 10:16:08,768 - afe.backends.mpk.interface - INFO - ============================== 2026-03-07 10:16:08,768 - afe.backends.mpk.interface - INFO - Compilation summary: 2026-03-07 10:16:08,768 - afe.backends.mpk.interface - INFO - ------------------------------ 2026-03-07 10:16:08,768 - afe.backends.mpk.interface - INFO - Desired batch size: 1 2026-03-07 10:16:08,768 - afe.backends.mpk.interface - INFO - Achieved batch size: 1 2026-03-07 10:16:08,768 - afe.backends.mpk.interface - INFO - ------------------------------ 2026-03-07 10:16:08,768 - afe.backends.mpk.interface - INFO - Plugin distribution per backend: 2026-03-07 10:16:08,768 - afe.backends.mpk.interface - INFO - MLA : 1 2026-03-07 10:16:08,768 - afe.backends.mpk.interface - INFO - EV74: 5 2026-03-07 10:16:08,768 - afe.backends.mpk.interface - INFO - A65 : 0 2026-03-07 10:16:08,768 - afe.backends.mpk.interface - INFO - ------------------------------ 2026-03-07 10:16:08,768 - afe.backends.mpk.interface - INFO - Generated files: LFM2.5-1.2B-Instruct_language_n1_layer7_conv_mpk.json, LFM2.5-1.2B-Instruct_language_n1_layer7_conv_stage1_mla.elf, LFM2.5-1.2B-Instruct_language_n1_layer7_conv_stage1_mla_stats.yaml, llima-compile 2026-03-07 10:16:09,045 - afe.ir.serializer.api - INFO - Loading model from file: CompiledModels/LFM2.5-1.2B-Instruct/sima_files/sdk/LFM2.5-1.2B-Instruct_language_n1_post_layer8.sima 2026-03-07 10:16:09,090 - afe.apis.model - WARNING - A deepcopy is disabled hence model graph will be mutated and no other APIs can be called after the compile step. 
2026-03-07 10:16:09,090 - afe.apis.model - INFO - Compiling quantized net "LFM2.5-1.2B-Instruct_language_n1_post_layer8" 2026-03-07 10:16:09,092 - afe.core.compile_networks - INFO - The model is split into 1 segments for MLA and APU 2026-03-07 10:16:09,092 - afe.backends.mla.afe_to_n2a_compiler.n2a_compiler_operations - INFO - Stage 1 of 1: compiling for MLA 2026-03-07 10:16:09,106 - mlc.compiler.model_graph.l1_based - INFO - Get model compilation properties 2026-03-07 10:16:09,107 - mlc.compiler.model_graph.l1_based - INFO - Checking if layers fit into memory without splitting 2026-03-07 10:16:09,107 - mlc.compiler.model_graph.l1_based - INFO - Generating the list of feasible layouts 2026-03-07 10:16:09,107 - mlc.compiler.model_graph.l1_based - INFO - Setting layer parameters 2026-03-07 10:16:09,126 - mlc.test_util.test_context - INFO - Performing DRAM/L2 synchronization 2026-03-07 10:16:09,148 - mlc.compiler.model_graph.l1_based - INFO - Setting tile layouts 2026-03-07 10:16:09,153 - mlc.compiler.model_graph.l1_based - INFO - Allocating memory for IFM/OFM tensors 2026-03-07 10:16:09,164 - mlc.compiler.model_graph.l1_based - INFO - Get model compilation properties done 2026-03-07 10:16:09,305 - afe.backends.mla.afe_to_n2a_compiler.n2a_backend_runner - INFO - Start evaluate process to generate check file 2026-03-07 10:16:09,306 - mlc.kernel.layout - INFO - L2 caching mode: L2CachingMode.NONE 2026-03-07 10:16:09,313 - mlc.compiler.model_graph.l1_based - INFO - Generating model code 2026-03-07 10:16:10,186 - mlc.test_util.test_context - INFO - Inserting and merging NOPs 2026-03-07 10:16:10,701 - mlc.test_util.test_context - INFO - Setting IQ sync bits 2026-03-07 10:16:10,749 - mlc.test_util.test_context - INFO - Run compression 2026-03-07 10:16:11,024 - afe.backends.mla.afe_to_n2a_compiler.n2a_compiler_operations - INFO - Evaluate model and writing chk file done in 0s. 
2026-03-07 10:16:11,978 - afe.backends.mpk.interface - INFO - ============================== 2026-03-07 10:16:11,978 - afe.backends.mpk.interface - INFO - Compilation summary: 2026-03-07 10:16:11,978 - afe.backends.mpk.interface - INFO - ------------------------------ 2026-03-07 10:16:11,978 - afe.backends.mpk.interface - INFO - Desired batch size: 1 2026-03-07 10:16:11,978 - afe.backends.mpk.interface - INFO - Achieved batch size: 1 2026-03-07 10:16:11,978 - afe.backends.mpk.interface - INFO - ------------------------------ 2026-03-07 10:16:11,978 - afe.backends.mpk.interface - INFO - Plugin distribution per backend: 2026-03-07 10:16:11,979 - afe.backends.mpk.interface - INFO - MLA : 1 2026-03-07 10:16:11,979 - afe.backends.mpk.interface - INFO - EV74: 7 2026-03-07 10:16:11,979 - afe.backends.mpk.interface - INFO - A65 : 0 2026-03-07 10:16:11,979 - afe.backends.mpk.interface - INFO - ------------------------------ 2026-03-07 10:16:11,979 - afe.backends.mpk.interface - INFO - Generated files: LFM2.5-1.2B-Instruct_language_n1_pre_layer8_stage1_mla.elf, LFM2.5-1.2B-Instruct_language_n1_pre_layer8_mpk.json, LFM2.5-1.2B-Instruct_language_n1_pre_layer8_stage1_mla_stats.yaml, llima-compile 2026-03-07 10:16:12,110 - mlc.test_util.test_context - INFO - Compression done in 1s. 
Compression ratio: 0.942 2026-03-07 10:16:12,110 - mlc.test_util.test_context - INFO - Re-allocate dram memory 2026-03-07 10:16:12,124 - mlc.test_util.test_context - INFO - Generating metrics 2026-03-07 10:16:12,277 - mlc.test_util.test_context - INFO - Writing report to MLC file 2026-03-07 10:16:12,277 - mlc.test_util.test_context - INFO - Writing instructions to MLC file 2026-03-07 10:16:12,527 - afe.ir.serializer.api - INFO - Loading model from file: CompiledModels/LFM2.5-1.2B-Instruct/sima_files/sdk/LFM2.5-1.2B-Instruct_language_n128_layer9_conv.sima 2026-03-07 10:16:12,589 - afe.apis.model - WARNING - A deepcopy is disabled hence model graph will be mutated and no other APIs can be called after the compile step. 2026-03-07 10:16:12,590 - afe.apis.model - INFO - Compiling quantized net "LFM2.5-1.2B-Instruct_language_n128_layer9_conv" 2026-03-07 10:16:12,594 - afe.core.compile_networks - INFO - The model is split into 1 segments for MLA and APU 2026-03-07 10:16:12,594 - afe.backends.mla.afe_to_n2a_compiler.n2a_compiler_operations - INFO - Stage 1 of 1: compiling for MLA 2026-03-07 10:16:12,613 - mlc.compiler.model_graph.l1_based - INFO - Get model compilation properties 2026-03-07 10:16:12,614 - mlc.compiler.model_graph.l1_based - INFO - Checking if layers fit into memory without splitting 2026-03-07 10:16:12,614 - mlc.compiler.model_graph.l1_based - INFO - Generating the list of feasible layouts 2026-03-07 10:16:12,614 - mlc.compiler.model_graph.l1_based - INFO - Setting layer parameters 2026-03-07 10:16:14,225 - mlc.compiler.model_graph.l1_based - INFO - Generating model code done 2026-03-07 10:16:14,251 - mlc.test_util.test_context - INFO - Scheduling instructions 2026-03-07 10:16:16,541 - mlc.test_util.test_context - INFO - Performing DRAM/L2 synchronization 2026-03-07 10:16:17,383 - mlc.compiler.model_graph.l1_based - INFO - Setting tile layouts 2026-03-07 10:16:17,798 - mlc.compiler.model_graph.l1_based - INFO - Allocating memory for IFM/OFM tensors 
2026-03-07 10:16:17,810 - mlc.test_util.test_context - INFO - Inserting and merging NOPs 2026-03-07 10:16:17,847 - mlc.compiler.model_graph.l1_based - INFO - Get model compilation properties done 2026-03-07 10:16:17,948 - mlc.test_util.test_context - INFO - Setting IQ sync bits 2026-03-07 10:16:17,973 - mlc.test_util.test_context - INFO - Run compression 2026-03-07 10:16:18,042 - mlc.test_util.test_context - INFO - Compression done in 11s. Compression ratio: 0.956 2026-03-07 10:16:18,042 - mlc.test_util.test_context - INFO - Re-allocate dram memory 2026-03-07 10:16:18,165 - mlc.test_util.test_context - INFO - Generating metrics 2026-03-07 10:16:18,168 - afe.backends.mla.afe_to_n2a_compiler.n2a_backend_runner - INFO - Start evaluate process to generate check file 2026-03-07 10:16:18,169 - mlc.kernel.layout - INFO - L2 caching mode: L2CachingMode.NONE 2026-03-07 10:16:18,181 - mlc.compiler.model_graph.l1_based - INFO - Generating model code 2026-03-07 10:16:18,288 - mlc.test_util.test_context - INFO - Code generation done 2026-03-07 10:16:18,291 - afe.backends.mla.afe_to_n2a_compiler.n2a_compiler_operations - INFO - Evaluate model and writing chk file done in 2s. 
2026-03-07 10:16:18,395 - mlc.test_util.test_context - INFO - Writing report to MLC file 2026-03-07 10:16:18,395 - mlc.test_util.test_context - INFO - Writing instructions to MLC file 2026-03-07 10:16:22,849 - afe.backends.mpk.interface - INFO - ============================== 2026-03-07 10:16:22,849 - afe.backends.mpk.interface - INFO - Compilation summary: 2026-03-07 10:16:22,849 - afe.backends.mpk.interface - INFO - ------------------------------ 2026-03-07 10:16:22,849 - afe.backends.mpk.interface - INFO - Desired batch size: 1 2026-03-07 10:16:22,849 - afe.backends.mpk.interface - INFO - Achieved batch size: 1 2026-03-07 10:16:22,849 - afe.backends.mpk.interface - INFO - ------------------------------ 2026-03-07 10:16:22,849 - afe.backends.mpk.interface - INFO - Plugin distribution per backend: 2026-03-07 10:16:22,849 - afe.backends.mpk.interface - INFO - MLA : 1 2026-03-07 10:16:22,849 - afe.backends.mpk.interface - INFO - EV74: 7 2026-03-07 10:16:22,849 - afe.backends.mpk.interface - INFO - A65 : 0 2026-03-07 10:16:22,849 - afe.backends.mpk.interface - INFO - ------------------------------ 2026-03-07 10:16:22,849 - afe.backends.mpk.interface - INFO - Generated files: LFM2.5-1.2B-Instruct_language_n128_pre_layer8_stage1_mla.elf, LFM2.5-1.2B-Instruct_language_n128_pre_layer8_mpk.json, LFM2.5-1.2B-Instruct_language_n128_pre_layer8_stage1_mla_stats.yaml, llima-compile 2026-03-07 10:16:23,284 - afe.ir.serializer.api - INFO - Loading model from file: CompiledModels/LFM2.5-1.2B-Instruct/sima_files/sdk/LFM2.5-1.2B-Instruct_language_n1_layer9_conv.sima 2026-03-07 10:16:23,352 - afe.apis.model - WARNING - A deepcopy is disabled hence model graph will be mutated and no other APIs can be called after the compile step. 
2026-03-07 10:16:23,352 - afe.apis.model - INFO - Compiling quantized net "LFM2.5-1.2B-Instruct_language_n1_layer9_conv" 2026-03-07 10:16:23,355 - afe.core.compile_networks - INFO - The model is split into 1 segments for MLA and APU 2026-03-07 10:16:23,355 - afe.backends.mla.afe_to_n2a_compiler.n2a_compiler_operations - INFO - Stage 1 of 1: compiling for MLA 2026-03-07 10:16:23,369 - mlc.compiler.model_graph.l1_based - INFO - Get model compilation properties 2026-03-07 10:16:23,370 - mlc.compiler.model_graph.l1_based - INFO - Checking if layers fit into memory without splitting 2026-03-07 10:16:23,370 - mlc.compiler.model_graph.l1_based - INFO - Generating the list of feasible layouts 2026-03-07 10:16:23,370 - mlc.compiler.model_graph.l1_based - INFO - Setting layer parameters 2026-03-07 10:16:23,445 - mlc.compiler.model_graph.l1_based - INFO - Setting tile layouts 2026-03-07 10:16:23,473 - mlc.compiler.model_graph.l1_based - INFO - Allocating memory for IFM/OFM tensors 2026-03-07 10:16:23,494 - mlc.compiler.model_graph.l1_based - INFO - Get model compilation properties done 2026-03-07 10:16:23,670 - afe.backends.mla.afe_to_n2a_compiler.n2a_backend_runner - INFO - Start evaluate process to generate check file 2026-03-07 10:16:23,671 - mlc.kernel.layout - INFO - L2 caching mode: L2CachingMode.NONE 2026-03-07 10:16:23,684 - mlc.compiler.model_graph.l1_based - INFO - Generating model code 2026-03-07 10:16:23,867 - mlc.test_util.test_context - INFO - Compression done in 5s. 
Compression ratio: 0.976 2026-03-07 10:16:23,867 - mlc.test_util.test_context - INFO - Re-allocate dram memory 2026-03-07 10:16:24,010 - mlc.test_util.test_context - INFO - Generating metrics 2026-03-07 10:16:24,094 - mlc.test_util.test_context - INFO - Writing report to MLC file 2026-03-07 10:16:24,094 - mlc.test_util.test_context - INFO - Writing instructions to MLC file 2026-03-07 10:16:30,134 - mlc.compiler.model_graph.l1_based - INFO - Generating model code done 2026-03-07 10:16:30,252 - mlc.test_util.test_context - INFO - Scheduling instructions 2026-03-07 10:16:30,267 - mlc.compiler.model_graph.l1_based - INFO - Generating model code done 2026-03-07 10:16:30,303 - mlc.test_util.test_context - INFO - Scheduling instructions 2026-03-07 10:16:30,331 - mlc.test_util.test_context - INFO - Code generation done 2026-03-07 10:16:30,343 - afe.backends.mla.afe_to_n2a_compiler.n2a_compiler_operations - INFO - Evaluate model and writing chk file done in 13s. 2026-03-07 10:16:30,940 - mlc.test_util.test_context - INFO - Code generation done 2026-03-07 10:16:30,940 - afe.backends.mla.afe_to_n2a_compiler.n2a_compiler_operations - INFO - Evaluate model and writing chk file done in 12s. 2026-03-07 10:16:33,334 - mlc.test_util.test_context - INFO - Performing DRAM/L2 synchronization 2026-03-07 10:16:34,294 - mlc.test_util.test_context - INFO - Inserting and merging NOPs 2026-03-07 10:16:34,483 - mlc.test_util.test_context - INFO - Setting IQ sync bits 2026-03-07 10:16:34,519 - mlc.test_util.test_context - INFO - Run compression 2026-03-07 10:16:39,586 - mlc.test_util.test_context - INFO - Code generation done 2026-03-07 10:16:39,594 - afe.backends.mla.afe_to_n2a_compiler.n2a_compiler_operations - INFO - Evaluate model and writing chk file done in 1s. 
2026-03-07 10:16:40,612 - mlc.test_util.test_context - INFO - Performing DRAM/L2 synchronization 2026-03-07 10:16:40,820 - afe.backends.mpk.interface - INFO - ============================== 2026-03-07 10:16:40,820 - afe.backends.mpk.interface - INFO - Compilation summary: 2026-03-07 10:16:40,820 - afe.backends.mpk.interface - INFO - ------------------------------ 2026-03-07 10:16:40,820 - afe.backends.mpk.interface - INFO - Desired batch size: 1 2026-03-07 10:16:40,820 - afe.backends.mpk.interface - INFO - Achieved batch size: 1 2026-03-07 10:16:40,820 - afe.backends.mpk.interface - INFO - ------------------------------ 2026-03-07 10:16:40,820 - afe.backends.mpk.interface - INFO - Plugin distribution per backend: 2026-03-07 10:16:40,820 - afe.backends.mpk.interface - INFO - MLA : 1 2026-03-07 10:16:40,820 - afe.backends.mpk.interface - INFO - EV74: 5 2026-03-07 10:16:40,820 - afe.backends.mpk.interface - INFO - A65 : 0 2026-03-07 10:16:40,820 - afe.backends.mpk.interface - INFO - ------------------------------ 2026-03-07 10:16:40,820 - afe.backends.mpk.interface - INFO - Generated files: LFM2.5-1.2B-Instruct_language_n128_layer6_conv_mpk.json, LFM2.5-1.2B-Instruct_language_n128_layer6_conv_stage1_mla.elf, LFM2.5-1.2B-Instruct_language_n128_layer6_conv_stage1_mla_stats.yaml, llima-compile 2026-03-07 10:16:41,232 - afe.ir.serializer.api - INFO - Loading model from file: CompiledModels/LFM2.5-1.2B-Instruct/sima_files/sdk/LFM2.5-1.2B-Instruct_language_n128_pre_layer10.sima 2026-03-07 10:16:41,260 - afe.apis.model - WARNING - A deepcopy is disabled hence model graph will be mutated and no other APIs can be called after the compile step. 
2026-03-07 10:16:41,260 - afe.apis.model - INFO - Compiling quantized net "LFM2.5-1.2B-Instruct_language_n128_pre_layer10" 2026-03-07 10:16:41,264 - afe.core.compile_networks - INFO - The model is split into 1 segments for MLA and APU 2026-03-07 10:16:41,264 - afe.backends.mla.afe_to_n2a_compiler.n2a_compiler_operations - INFO - Stage 1 of 1: compiling for MLA 2026-03-07 10:16:41,269 - mlc.compiler.model_graph.l1_based - INFO - Get model compilation properties 2026-03-07 10:16:41,270 - mlc.compiler.model_graph.l1_based - INFO - Checking if layers fit into memory without splitting 2026-03-07 10:16:41,270 - mlc.compiler.model_graph.l1_based - INFO - Generating the list of feasible layouts 2026-03-07 10:16:41,270 - mlc.compiler.model_graph.l1_based - INFO - Setting layer parameters 2026-03-07 10:16:41,348 - afe.backends.mpk.interface - INFO - ============================== 2026-03-07 10:16:41,348 - afe.backends.mpk.interface - INFO - Compilation summary: 2026-03-07 10:16:41,349 - afe.backends.mpk.interface - INFO - ------------------------------ 2026-03-07 10:16:41,349 - afe.backends.mpk.interface - INFO - Desired batch size: 1 2026-03-07 10:16:41,349 - afe.backends.mpk.interface - INFO - Achieved batch size: 1 2026-03-07 10:16:41,349 - afe.backends.mpk.interface - INFO - ------------------------------ 2026-03-07 10:16:41,349 - afe.backends.mpk.interface - INFO - Plugin distribution per backend: 2026-03-07 10:16:41,349 - afe.backends.mpk.interface - INFO - MLA : 1 2026-03-07 10:16:41,349 - afe.backends.mpk.interface - INFO - EV74: 5 2026-03-07 10:16:41,349 - afe.backends.mpk.interface - INFO - A65 : 0 2026-03-07 10:16:41,349 - afe.backends.mpk.interface - INFO - ------------------------------ 2026-03-07 10:16:41,349 - afe.backends.mpk.interface - INFO - Generated files: LFM2.5-1.2B-Instruct_language_n128_layer7_conv_mpk.json, LFM2.5-1.2B-Instruct_language_n128_layer7_conv_stage1_mla_stats.yaml, LFM2.5-1.2B-Instruct_language_n128_layer7_conv_stage1_mla.elf, 
llima-compile 2026-03-07 10:16:41,754 - mlc.test_util.test_context - INFO - Compression done in 7s. Compression ratio: 0.971 2026-03-07 10:16:41,754 - mlc.test_util.test_context - INFO - Re-allocate dram memory 2026-03-07 10:16:41,756 - afe.ir.serializer.api - INFO - Loading model from file: CompiledModels/LFM2.5-1.2B-Instruct/sima_files/sdk/LFM2.5-1.2B-Instruct_language_n128_post_layer10.sima 2026-03-07 10:16:41,820 - afe.apis.model - WARNING - A deepcopy is disabled hence model graph will be mutated and no other APIs can be called after the compile step. 2026-03-07 10:16:41,820 - afe.apis.model - INFO - Compiling quantized net "LFM2.5-1.2B-Instruct_language_n128_post_layer10" 2026-03-07 10:16:41,823 - afe.core.compile_networks - INFO - The model is split into 1 segments for MLA and APU 2026-03-07 10:16:41,823 - afe.backends.mla.afe_to_n2a_compiler.n2a_compiler_operations - INFO - Stage 1 of 1: compiling for MLA 2026-03-07 10:16:41,838 - mlc.test_util.test_context - INFO - Inserting and merging NOPs 2026-03-07 10:16:41,838 - mlc.compiler.model_graph.l1_based - INFO - Get model compilation properties 2026-03-07 10:16:41,839 - mlc.compiler.model_graph.l1_based - INFO - Checking if layers fit into memory without splitting 2026-03-07 10:16:41,839 - mlc.compiler.model_graph.l1_based - INFO - Generating the list of feasible layouts 2026-03-07 10:16:41,839 - mlc.compiler.model_graph.l1_based - INFO - Setting layer parameters 2026-03-07 10:16:41,934 - mlc.test_util.test_context - INFO - Generating metrics 2026-03-07 10:16:42,042 - mlc.test_util.test_context - INFO - Writing report to MLC file 2026-03-07 10:16:42,042 - mlc.test_util.test_context - INFO - Writing instructions to MLC file 2026-03-07 10:16:42,398 - mlc.test_util.test_context - INFO - Setting IQ sync bits 2026-03-07 10:16:42,480 - mlc.test_util.test_context - INFO - Run compression 2026-03-07 10:16:42,993 - afe.backends.mpk.interface - INFO - ============================== 2026-03-07 10:16:42,993 - 
afe.backends.mpk.interface - INFO - Compilation summary: 2026-03-07 10:16:42,993 - afe.backends.mpk.interface - INFO - ------------------------------ 2026-03-07 10:16:42,993 - afe.backends.mpk.interface - INFO - Desired batch size: 1 2026-03-07 10:16:42,993 - afe.backends.mpk.interface - INFO - Achieved batch size: 1 2026-03-07 10:16:42,993 - afe.backends.mpk.interface - INFO - ------------------------------ 2026-03-07 10:16:42,993 - afe.backends.mpk.interface - INFO - Plugin distribution per backend: 2026-03-07 10:16:42,993 - afe.backends.mpk.interface - INFO - MLA : 1 2026-03-07 10:16:42,993 - afe.backends.mpk.interface - INFO - EV74: 3 2026-03-07 10:16:42,993 - afe.backends.mpk.interface - INFO - A65 : 0 2026-03-07 10:16:42,994 - afe.backends.mpk.interface - INFO - ------------------------------ 2026-03-07 10:16:42,994 - afe.backends.mpk.interface - INFO - Generated files: LFM2.5-1.2B-Instruct_language_n1_post_layer8_stage1_mla.elf, LFM2.5-1.2B-Instruct_language_n1_post_layer8_stage1_mla_stats.yaml, LFM2.5-1.2B-Instruct_language_n1_post_layer8_mpk.json, llima-compile 2026-03-07 10:16:43,397 - afe.ir.serializer.api - INFO - Loading model from file: CompiledModels/LFM2.5-1.2B-Instruct/sima_files/sdk/LFM2.5-1.2B-Instruct_language_n1_pre_layer10.sima 2026-03-07 10:16:43,417 - afe.apis.model - WARNING - A deepcopy is disabled hence model graph will be mutated and no other APIs can be called after the compile step. 
2026-03-07 10:16:43,417 - afe.apis.model - INFO - Compiling quantized net "LFM2.5-1.2B-Instruct_language_n1_pre_layer10" 2026-03-07 10:16:43,421 - afe.core.compile_networks - INFO - The model is split into 1 segments for MLA and APU 2026-03-07 10:16:43,421 - afe.backends.mla.afe_to_n2a_compiler.n2a_compiler_operations - INFO - Stage 1 of 1: compiling for MLA 2026-03-07 10:16:43,426 - mlc.compiler.model_graph.l1_based - INFO - Get model compilation properties 2026-03-07 10:16:43,427 - mlc.compiler.model_graph.l1_based - INFO - Checking if layers fit into memory without splitting 2026-03-07 10:16:43,427 - mlc.compiler.model_graph.l1_based - INFO - Generating the list of feasible layouts 2026-03-07 10:16:43,427 - mlc.compiler.model_graph.l1_based - INFO - Setting layer parameters 2026-03-07 10:16:43,599 - mlc.compiler.model_graph.l1_based - INFO - Setting tile layouts 2026-03-07 10:16:43,620 - mlc.compiler.model_graph.l1_based - INFO - Allocating memory for IFM/OFM tensors 2026-03-07 10:16:43,653 - mlc.compiler.model_graph.l1_based - INFO - Get model compilation properties done 2026-03-07 10:16:43,677 - afe.backends.mla.afe_to_n2a_compiler.n2a_backend_runner - INFO - Start evaluate process to generate check file 2026-03-07 10:16:43,678 - mlc.kernel.layout - INFO - L2 caching mode: L2CachingMode.NONE 2026-03-07 10:16:43,688 - mlc.compiler.model_graph.l1_based - INFO - Generating model code 2026-03-07 10:16:46,154 - mlc.compiler.model_graph.l1_based - INFO - Setting tile layouts 2026-03-07 10:16:46,198 - mlc.compiler.model_graph.l1_based - INFO - Generating model code done 2026-03-07 10:16:46,214 - mlc.compiler.model_graph.l1_based - INFO - Allocating memory for IFM/OFM tensors 2026-03-07 10:16:46,228 - mlc.test_util.test_context - INFO - Scheduling instructions 2026-03-07 10:16:46,251 - mlc.compiler.model_graph.l1_based - INFO - Get model compilation properties done 2026-03-07 10:16:46,634 - afe.backends.mla.afe_to_n2a_compiler.n2a_backend_runner - INFO - Start 
evaluate process to generate check file 2026-03-07 10:16:46,635 - mlc.kernel.layout - INFO - L2 caching mode: L2CachingMode.NONE 2026-03-07 10:16:46,644 - mlc.compiler.model_graph.l1_based - INFO - Generating model code 2026-03-07 10:16:48,089 - mlc.compiler.model_graph.l1_based - INFO - Setting tile layouts 2026-03-07 10:16:48,279 - mlc.test_util.test_context - INFO - Performing DRAM/L2 synchronization 2026-03-07 10:16:48,519 - mlc.compiler.model_graph.l1_based - INFO - Allocating memory for IFM/OFM tensors 2026-03-07 10:16:48,555 - mlc.compiler.model_graph.l1_based - INFO - Get model compilation properties done 2026-03-07 10:16:48,629 - mlc.test_util.test_context - INFO - Inserting and merging NOPs 2026-03-07 10:16:48,726 - afe.backends.mla.afe_to_n2a_compiler.n2a_backend_runner - INFO - Start evaluate process to generate check file 2026-03-07 10:16:48,727 - mlc.kernel.layout - INFO - L2 caching mode: L2CachingMode.NONE 2026-03-07 10:16:48,740 - mlc.test_util.test_context - INFO - Setting IQ sync bits 2026-03-07 10:16:48,741 - mlc.compiler.model_graph.l1_based - INFO - Generating model code 2026-03-07 10:16:48,752 - mlc.test_util.test_context - INFO - Run compression 2026-03-07 10:16:48,936 - mlc.test_util.test_context - INFO - Code generation done 2026-03-07 10:16:48,936 - afe.backends.mla.afe_to_n2a_compiler.n2a_compiler_operations - INFO - Evaluate model and writing chk file done in 8s. 2026-03-07 10:16:49,468 - mlc.test_util.test_context - INFO - Compression done in 0s. 
Compression ratio: 0.968 2026-03-07 10:16:49,468 - mlc.test_util.test_context - INFO - Re-allocate dram memory 2026-03-07 10:16:49,497 - mlc.test_util.test_context - INFO - Generating metrics 2026-03-07 10:16:49,536 - mlc.test_util.test_context - INFO - Writing report to MLC file 2026-03-07 10:16:49,536 - mlc.test_util.test_context - INFO - Writing instructions to MLC file 2026-03-07 10:16:51,896 - mlc.test_util.test_context - INFO - Code generation done 2026-03-07 10:16:55,836 - mlc.compiler.model_graph.l1_based - INFO - Generating model code done 2026-03-07 10:16:55,926 - mlc.test_util.test_context - INFO - Scheduling instructions 2026-03-07 10:16:56,527 - mlc.test_util.test_context - INFO - Compression done in 14s. Compression ratio: 0.95 2026-03-07 10:16:56,527 - mlc.test_util.test_context - INFO - Re-allocate dram memory 2026-03-07 10:16:56,692 - mlc.test_util.test_context - INFO - Generating metrics 2026-03-07 10:16:56,971 - mlc.test_util.test_context - INFO - Writing report to MLC file 2026-03-07 10:16:56,971 - mlc.test_util.test_context - INFO - Writing instructions to MLC file 2026-03-07 10:16:57,164 - afe.backends.mpk.interface - INFO - ============================== 2026-03-07 10:16:57,164 - afe.backends.mpk.interface - INFO - Compilation summary: 2026-03-07 10:16:57,164 - afe.backends.mpk.interface - INFO - ------------------------------ 2026-03-07 10:16:57,164 - afe.backends.mpk.interface - INFO - Desired batch size: 1 2026-03-07 10:16:57,164 - afe.backends.mpk.interface - INFO - Achieved batch size: 1 2026-03-07 10:16:57,164 - afe.backends.mpk.interface - INFO - ------------------------------ 2026-03-07 10:16:57,164 - afe.backends.mpk.interface - INFO - Plugin distribution per backend: 2026-03-07 10:16:57,164 - afe.backends.mpk.interface - INFO - MLA : 1 2026-03-07 10:16:57,164 - afe.backends.mpk.interface - INFO - EV74: 3 2026-03-07 10:16:57,164 - afe.backends.mpk.interface - INFO - A65 : 0 2026-03-07 10:16:57,164 - afe.backends.mpk.interface - INFO 
- ------------------------------ 2026-03-07 10:16:57,164 - afe.backends.mpk.interface - INFO - Generated files: LFM2.5-1.2B-Instruct_language_n128_post_layer8_stage1_mla_stats.yaml, LFM2.5-1.2B-Instruct_language_n128_post_layer8_stage1_mla.elf, LFM2.5-1.2B-Instruct_language_n128_post_layer8_mpk.json, llima-compile 2026-03-07 10:16:57,465 - afe.ir.serializer.api - INFO - Loading model from file: CompiledModels/LFM2.5-1.2B-Instruct/sima_files/sdk/LFM2.5-1.2B-Instruct_language_n1_post_layer10.sima 2026-03-07 10:16:57,512 - afe.apis.model - WARNING - A deepcopy is disabled hence model graph will be mutated and no other APIs can be called after the compile step. 2026-03-07 10:16:57,513 - afe.apis.model - INFO - Compiling quantized net "LFM2.5-1.2B-Instruct_language_n1_post_layer10" 2026-03-07 10:16:57,514 - afe.core.compile_networks - INFO - The model is split into 1 segments for MLA and APU 2026-03-07 10:16:57,514 - afe.backends.mla.afe_to_n2a_compiler.n2a_compiler_operations - INFO - Stage 1 of 1: compiling for MLA 2026-03-07 10:16:57,535 - mlc.compiler.model_graph.l1_based - INFO - Get model compilation properties 2026-03-07 10:16:57,535 - mlc.compiler.model_graph.l1_based - INFO - Checking if layers fit into memory without splitting 2026-03-07 10:16:57,535 - mlc.compiler.model_graph.l1_based - INFO - Generating the list of feasible layouts 2026-03-07 10:16:57,536 - mlc.compiler.model_graph.l1_based - INFO - Setting layer parameters 2026-03-07 10:16:57,577 - mlc.compiler.model_graph.l1_based - INFO - Setting tile layouts 2026-03-07 10:16:57,581 - mlc.compiler.model_graph.l1_based - INFO - Allocating memory for IFM/OFM tensors 2026-03-07 10:16:57,592 - mlc.compiler.model_graph.l1_based - INFO - Get model compilation properties done 2026-03-07 10:16:57,734 - afe.backends.mla.afe_to_n2a_compiler.n2a_backend_runner - INFO - Start evaluate process to generate check file 2026-03-07 10:16:57,735 - mlc.kernel.layout - INFO - L2 caching mode: L2CachingMode.NONE 2026-03-07 
10:16:57,743 - mlc.compiler.model_graph.l1_based - INFO - Generating model code 2026-03-07 10:16:58,039 - mlc.compiler.model_graph.l1_based - INFO - Generating model code done 2026-03-07 10:16:58,173 - mlc.test_util.test_context - INFO - Scheduling instructions 2026-03-07 10:17:00,858 - mlc.test_util.test_context - INFO - Code generation done 2026-03-07 10:17:00,858 - afe.backends.mla.afe_to_n2a_compiler.n2a_compiler_operations - INFO - Evaluate model and writing chk file done in 3s. 2026-03-07 10:17:01,088 - afe.backends.mla.afe_to_n2a_compiler.n2a_compiler_operations - INFO - Evaluate model and writing chk file done in 0s. 2026-03-07 10:17:02,115 - afe.backends.mpk.interface - INFO - ============================== 2026-03-07 10:17:02,116 - afe.backends.mpk.interface - INFO - Compilation summary: 2026-03-07 10:17:02,116 - afe.backends.mpk.interface - INFO - ------------------------------ 2026-03-07 10:17:02,116 - afe.backends.mpk.interface - INFO - Desired batch size: 1 2026-03-07 10:17:02,116 - afe.backends.mpk.interface - INFO - Achieved batch size: 1 2026-03-07 10:17:02,116 - afe.backends.mpk.interface - INFO - ------------------------------ 2026-03-07 10:17:02,116 - afe.backends.mpk.interface - INFO - Plugin distribution per backend: 2026-03-07 10:17:02,116 - afe.backends.mpk.interface - INFO - MLA : 1 2026-03-07 10:17:02,116 - afe.backends.mpk.interface - INFO - EV74: 7 2026-03-07 10:17:02,116 - afe.backends.mpk.interface - INFO - A65 : 0 2026-03-07 10:17:02,116 - afe.backends.mpk.interface - INFO - ------------------------------ 2026-03-07 10:17:02,116 - afe.backends.mpk.interface - INFO - Generated files: LFM2.5-1.2B-Instruct_language_n1_pre_layer10_stage1_mla.elf, LFM2.5-1.2B-Instruct_language_n1_pre_layer10_stage1_mla_stats.yaml, LFM2.5-1.2B-Instruct_language_n1_pre_layer10_mpk.json, llima-compile 2026-03-07 10:17:02,708 - afe.ir.serializer.api - INFO - Loading model from file: 
CompiledModels/LFM2.5-1.2B-Instruct/sima_files/sdk/LFM2.5-1.2B-Instruct_language_n128_layer11_conv.sima 2026-03-07 10:17:02,773 - afe.apis.model - WARNING - A deepcopy is disabled hence model graph will be mutated and no other APIs can be called after the compile step. 2026-03-07 10:17:02,773 - afe.apis.model - INFO - Compiling quantized net "LFM2.5-1.2B-Instruct_language_n128_layer11_conv" 2026-03-07 10:17:02,777 - afe.core.compile_networks - INFO - The model is split into 1 segments for MLA and APU 2026-03-07 10:17:02,777 - afe.backends.mla.afe_to_n2a_compiler.n2a_compiler_operations - INFO - Stage 1 of 1: compiling for MLA 2026-03-07 10:17:02,799 - mlc.compiler.model_graph.l1_based - INFO - Get model compilation properties 2026-03-07 10:17:02,799 - mlc.compiler.model_graph.l1_based - INFO - Checking if layers fit into memory without splitting 2026-03-07 10:17:02,799 - mlc.compiler.model_graph.l1_based - INFO - Generating the list of feasible layouts 2026-03-07 10:17:02,799 - mlc.compiler.model_graph.l1_based - INFO - Setting layer parameters 2026-03-07 10:17:03,089 - mlc.compiler.model_graph.l1_based - INFO - Generating model code done 2026-03-07 10:17:03,116 - mlc.test_util.test_context - INFO - Scheduling instructions 2026-03-07 10:17:04,089 - mlc.test_util.test_context - INFO - Performing DRAM/L2 synchronization 2026-03-07 10:17:05,103 - mlc.test_util.test_context - INFO - Inserting and merging NOPs 2026-03-07 10:17:05,258 - afe.backends.mpk.interface - INFO - ============================== 2026-03-07 10:17:05,259 - afe.backends.mpk.interface - INFO - Compilation summary: 2026-03-07 10:17:05,259 - afe.backends.mpk.interface - INFO - ------------------------------ 2026-03-07 10:17:05,259 - afe.backends.mpk.interface - INFO - Desired batch size: 1 2026-03-07 10:17:05,259 - afe.backends.mpk.interface - INFO - Achieved batch size: 1 2026-03-07 10:17:05,259 - afe.backends.mpk.interface - INFO - ------------------------------ 2026-03-07 10:17:05,259 - 
afe.backends.mpk.interface - INFO - Plugin distribution per backend: 2026-03-07 10:17:05,259 - afe.backends.mpk.interface - INFO - MLA : 1 2026-03-07 10:17:05,259 - afe.backends.mpk.interface - INFO - EV74: 5 2026-03-07 10:17:05,259 - afe.backends.mpk.interface - INFO - A65 : 0 2026-03-07 10:17:05,259 - afe.backends.mpk.interface - INFO - ------------------------------ 2026-03-07 10:17:05,259 - afe.backends.mpk.interface - INFO - Generated files: LFM2.5-1.2B-Instruct_language_n1_layer9_conv_stage1_mla.elf, LFM2.5-1.2B-Instruct_language_n1_layer9_conv_stage1_mla_stats.yaml, LFM2.5-1.2B-Instruct_language_n1_layer9_conv_mpk.json, llima-compile 2026-03-07 10:17:05,420 - mlc.test_util.test_context - INFO - Performing DRAM/L2 synchronization 2026-03-07 10:17:05,539 - mlc.test_util.test_context - INFO - Setting IQ sync bits 2026-03-07 10:17:05,609 - mlc.test_util.test_context - INFO - Run compression 2026-03-07 10:17:05,730 - afe.ir.serializer.api - INFO - Loading model from file: CompiledModels/LFM2.5-1.2B-Instruct/sima_files/sdk/LFM2.5-1.2B-Instruct_language_n1_layer11_conv.sima 2026-03-07 10:17:05,798 - afe.apis.model - WARNING - A deepcopy is disabled hence model graph will be mutated and no other APIs can be called after the compile step. 
2026-03-07 10:17:05,798 - afe.apis.model - INFO - Compiling quantized net "LFM2.5-1.2B-Instruct_language_n1_layer11_conv" 2026-03-07 10:17:05,803 - afe.core.compile_networks - INFO - The model is split into 1 segments for MLA and APU 2026-03-07 10:17:05,803 - afe.backends.mla.afe_to_n2a_compiler.n2a_compiler_operations - INFO - Stage 1 of 1: compiling for MLA 2026-03-07 10:17:05,832 - mlc.compiler.model_graph.l1_based - INFO - Get model compilation properties 2026-03-07 10:17:05,833 - mlc.compiler.model_graph.l1_based - INFO - Checking if layers fit into memory without splitting 2026-03-07 10:17:05,833 - mlc.compiler.model_graph.l1_based - INFO - Generating the list of feasible layouts 2026-03-07 10:17:05,833 - mlc.compiler.model_graph.l1_based - INFO - Setting layer parameters 2026-03-07 10:17:05,909 - mlc.compiler.model_graph.l1_based - INFO - Setting tile layouts 2026-03-07 10:17:05,937 - mlc.compiler.model_graph.l1_based - INFO - Allocating memory for IFM/OFM tensors 2026-03-07 10:17:05,958 - mlc.compiler.model_graph.l1_based - INFO - Get model compilation properties done 2026-03-07 10:17:06,096 - mlc.test_util.test_context - INFO - Inserting and merging NOPs 2026-03-07 10:17:06,135 - afe.backends.mla.afe_to_n2a_compiler.n2a_backend_runner - INFO - Start evaluate process to generate check file 2026-03-07 10:17:06,136 - mlc.kernel.layout - INFO - L2 caching mode: L2CachingMode.NONE 2026-03-07 10:17:06,149 - mlc.compiler.model_graph.l1_based - INFO - Generating model code 2026-03-07 10:17:06,235 - mlc.test_util.test_context - INFO - Setting IQ sync bits 2026-03-07 10:17:06,259 - mlc.test_util.test_context - INFO - Run compression 2026-03-07 10:17:07,980 - mlc.compiler.model_graph.l1_based - INFO - Setting tile layouts 2026-03-07 10:17:08,391 - mlc.compiler.model_graph.l1_based - INFO - Allocating memory for IFM/OFM tensors 2026-03-07 10:17:08,440 - mlc.compiler.model_graph.l1_based - INFO - Get model compilation properties done 2026-03-07 10:17:08,762 - 
afe.backends.mla.afe_to_n2a_compiler.n2a_backend_runner - INFO - Start evaluate process to generate check file 2026-03-07 10:17:08,764 - mlc.kernel.layout - INFO - L2 caching mode: L2CachingMode.NONE 2026-03-07 10:17:08,776 - mlc.compiler.model_graph.l1_based - INFO - Generating model code 2026-03-07 10:17:09,035 - mlc.test_util.test_context - INFO - Performing DRAM/L2 synchronization 2026-03-07 10:17:10,083 - mlc.test_util.test_context - INFO - Inserting and merging NOPs 2026-03-07 10:17:10,604 - mlc.test_util.test_context - INFO - Setting IQ sync bits 2026-03-07 10:17:10,650 - mlc.test_util.test_context - INFO - Run compression 2026-03-07 10:17:11,976 - mlc.test_util.test_context - INFO - Compression done in 1s. Compression ratio: 0.941 2026-03-07 10:17:11,977 - mlc.test_util.test_context - INFO - Re-allocate dram memory 2026-03-07 10:17:11,991 - mlc.test_util.test_context - INFO - Generating metrics 2026-03-07 10:17:12,142 - mlc.test_util.test_context - INFO - Writing report to MLC file 2026-03-07 10:17:12,142 - mlc.test_util.test_context - INFO - Writing instructions to MLC file 2026-03-07 10:17:12,181 - mlc.test_util.test_context - INFO - Compression done in 5s. Compression ratio: 0.974 2026-03-07 10:17:12,181 - mlc.test_util.test_context - INFO - Re-allocate dram memory 2026-03-07 10:17:12,325 - mlc.test_util.test_context - INFO - Generating metrics 2026-03-07 10:17:12,408 - mlc.test_util.test_context - INFO - Writing report to MLC file 2026-03-07 10:17:12,409 - mlc.test_util.test_context - INFO - Writing instructions to MLC file 2026-03-07 10:17:12,687 - mlc.compiler.model_graph.l1_based - INFO - Generating model code done 2026-03-07 10:17:12,723 - mlc.test_util.test_context - INFO - Scheduling instructions 2026-03-07 10:17:15,755 - mlc.test_util.test_context - INFO - Performing DRAM/L2 synchronization 2026-03-07 10:17:17,028 - mlc.test_util.test_context - INFO - Compression done in 11s. 
Compression ratio: 0.952 2026-03-07 10:17:17,028 - mlc.test_util.test_context - INFO - Re-allocate dram memory 2026-03-07 10:17:17,149 - mlc.test_util.test_context - INFO - Generating metrics 2026-03-07 10:17:17,373 - mlc.test_util.test_context - INFO - Inserting and merging NOPs 2026-03-07 10:17:17,374 - mlc.test_util.test_context - INFO - Writing report to MLC file 2026-03-07 10:17:17,374 - mlc.test_util.test_context - INFO - Writing instructions to MLC file 2026-03-07 10:17:17,561 - mlc.test_util.test_context - INFO - Setting IQ sync bits 2026-03-07 10:17:17,595 - mlc.test_util.test_context - INFO - Run compression 2026-03-07 10:17:18,152 - mlc.test_util.test_context - INFO - Code generation done 2026-03-07 10:17:18,155 - afe.backends.mla.afe_to_n2a_compiler.n2a_compiler_operations - INFO - Evaluate model and writing chk file done in 2s. 2026-03-07 10:17:20,981 - mlc.compiler.model_graph.l1_based - INFO - Generating model code done 2026-03-07 10:17:21,099 - mlc.test_util.test_context - INFO - Scheduling instructions 2026-03-07 10:17:22,662 - afe.backends.mpk.interface - INFO - ============================== 2026-03-07 10:17:22,662 - afe.backends.mpk.interface - INFO - Compilation summary: 2026-03-07 10:17:22,662 - afe.backends.mpk.interface - INFO - ------------------------------ 2026-03-07 10:17:22,662 - afe.backends.mpk.interface - INFO - Desired batch size: 1 2026-03-07 10:17:22,662 - afe.backends.mpk.interface - INFO - Achieved batch size: 1 2026-03-07 10:17:22,662 - afe.backends.mpk.interface - INFO - ------------------------------ 2026-03-07 10:17:22,662 - afe.backends.mpk.interface - INFO - Plugin distribution per backend: 2026-03-07 10:17:22,662 - afe.backends.mpk.interface - INFO - MLA : 1 2026-03-07 10:17:22,662 - afe.backends.mpk.interface - INFO - EV74: 7 2026-03-07 10:17:22,662 - afe.backends.mpk.interface - INFO - A65 : 0 2026-03-07 10:17:22,662 - afe.backends.mpk.interface - INFO - ------------------------------ 2026-03-07 10:17:22,662 - 
afe.backends.mpk.interface - INFO - Generated files: LFM2.5-1.2B-Instruct_language_n128_pre_layer10_stage1_mla.elf, LFM2.5-1.2B-Instruct_language_n128_pre_layer10_stage1_mla_stats.yaml, LFM2.5-1.2B-Instruct_language_n128_pre_layer10_mpk.json, llima-compile 2026-03-07 10:17:23,071 - afe.ir.serializer.api - INFO - Loading model from file: CompiledModels/LFM2.5-1.2B-Instruct/sima_files/sdk/LFM2.5-1.2B-Instruct_language_n128_pre_layer12.sima 2026-03-07 10:17:23,091 - afe.apis.model - WARNING - A deepcopy is disabled hence model graph will be mutated and no other APIs can be called after the compile step. 2026-03-07 10:17:23,091 - afe.apis.model - INFO - Compiling quantized net "LFM2.5-1.2B-Instruct_language_n128_pre_layer12" 2026-03-07 10:17:23,094 - afe.core.compile_networks - INFO - The model is split into 1 segments for MLA and APU 2026-03-07 10:17:23,095 - afe.backends.mla.afe_to_n2a_compiler.n2a_compiler_operations - INFO - Stage 1 of 1: compiling for MLA 2026-03-07 10:17:23,100 - mlc.compiler.model_graph.l1_based - INFO - Get model compilation properties 2026-03-07 10:17:23,100 - mlc.compiler.model_graph.l1_based - INFO - Checking if layers fit into memory without splitting 2026-03-07 10:17:23,100 - mlc.compiler.model_graph.l1_based - INFO - Generating the list of feasible layouts 2026-03-07 10:17:23,100 - mlc.compiler.model_graph.l1_based - INFO - Setting layer parameters 2026-03-07 10:17:24,830 - mlc.test_util.test_context - INFO - Compression done in 7s. 
Compression ratio: 0.973 2026-03-07 10:17:24,830 - mlc.test_util.test_context - INFO - Re-allocate dram memory 2026-03-07 10:17:25,008 - mlc.test_util.test_context - INFO - Generating metrics 2026-03-07 10:17:25,117 - mlc.test_util.test_context - INFO - Writing report to MLC file 2026-03-07 10:17:25,117 - mlc.test_util.test_context - INFO - Writing instructions to MLC file 2026-03-07 10:17:27,862 - mlc.test_util.test_context - INFO - Code generation done 2026-03-07 10:17:27,863 - afe.backends.mla.afe_to_n2a_compiler.n2a_compiler_operations - INFO - Evaluate model and writing chk file done in 1s. 2026-03-07 10:17:29,734 - mlc.compiler.model_graph.l1_based - INFO - Setting tile layouts 2026-03-07 10:17:30,167 - mlc.compiler.model_graph.l1_based - INFO - Allocating memory for IFM/OFM tensors 2026-03-07 10:17:30,203 - mlc.compiler.model_graph.l1_based - INFO - Get model compilation properties done 2026-03-07 10:17:30,375 - afe.backends.mla.afe_to_n2a_compiler.n2a_backend_runner - INFO - Start evaluate process to generate check file 2026-03-07 10:17:30,376 - mlc.kernel.layout - INFO - L2 caching mode: L2CachingMode.NONE 2026-03-07 10:17:30,390 - mlc.compiler.model_graph.l1_based - INFO - Generating model code 2026-03-07 10:17:30,604 - mlc.test_util.test_context - INFO - Performing DRAM/L2 synchronization 2026-03-07 10:17:31,293 - afe.backends.mpk.interface - INFO - ============================== 2026-03-07 10:17:31,293 - afe.backends.mpk.interface - INFO - Compilation summary: 2026-03-07 10:17:31,293 - afe.backends.mpk.interface - INFO - ------------------------------ 2026-03-07 10:17:31,293 - afe.backends.mpk.interface - INFO - Desired batch size: 1 2026-03-07 10:17:31,293 - afe.backends.mpk.interface - INFO - Achieved batch size: 1 2026-03-07 10:17:31,293 - afe.backends.mpk.interface - INFO - ------------------------------ 2026-03-07 10:17:31,293 - afe.backends.mpk.interface - INFO - Plugin distribution per backend: 2026-03-07 10:17:31,293 - afe.backends.mpk.interface 
- INFO - MLA : 1 2026-03-07 10:17:31,293 - afe.backends.mpk.interface - INFO - EV74: 3 2026-03-07 10:17:31,293 - afe.backends.mpk.interface - INFO - A65 : 0 2026-03-07 10:17:31,293 - afe.backends.mpk.interface - INFO - ------------------------------ 2026-03-07 10:17:31,293 - afe.backends.mpk.interface - INFO - Generated files: LFM2.5-1.2B-Instruct_language_n1_post_layer10_stage1_mla_stats.yaml, LFM2.5-1.2B-Instruct_language_n1_post_layer10_mpk.json, LFM2.5-1.2B-Instruct_language_n1_post_layer10_stage1_mla.elf, llima-compile 2026-03-07 10:17:31,720 - afe.ir.serializer.api - INFO - Loading model from file: CompiledModels/LFM2.5-1.2B-Instruct/sima_files/sdk/LFM2.5-1.2B-Instruct_language_n128_post_layer12.sima 2026-03-07 10:17:31,772 - afe.apis.model - WARNING - A deepcopy is disabled hence model graph will be mutated and no other APIs can be called after the compile step. 2026-03-07 10:17:31,772 - afe.apis.model - INFO - Compiling quantized net "LFM2.5-1.2B-Instruct_language_n128_post_layer12" 2026-03-07 10:17:31,775 - afe.core.compile_networks - INFO - The model is split into 1 segments for MLA and APU 2026-03-07 10:17:31,775 - afe.backends.mla.afe_to_n2a_compiler.n2a_compiler_operations - INFO - Stage 1 of 1: compiling for MLA 2026-03-07 10:17:31,796 - mlc.compiler.model_graph.l1_based - INFO - Get model compilation properties 2026-03-07 10:17:31,796 - mlc.compiler.model_graph.l1_based - INFO - Checking if layers fit into memory without splitting 2026-03-07 10:17:31,796 - mlc.compiler.model_graph.l1_based - INFO - Generating the list of feasible layouts 2026-03-07 10:17:31,796 - mlc.compiler.model_graph.l1_based - INFO - Setting layer parameters 2026-03-07 10:17:31,825 - mlc.test_util.test_context - INFO - Inserting and merging NOPs 2026-03-07 10:17:32,384 - mlc.test_util.test_context - INFO - Setting IQ sync bits 2026-03-07 10:17:32,468 - mlc.test_util.test_context - INFO - Run compression 2026-03-07 10:17:34,707 - mlc.test_util.test_context - INFO - Code 
generation done 2026-03-07 10:17:34,719 - afe.backends.mla.afe_to_n2a_compiler.n2a_compiler_operations - INFO - Evaluate model and writing chk file done in 12s. 2026-03-07 10:17:36,072 - mlc.compiler.model_graph.l1_based - INFO - Setting tile layouts 2026-03-07 10:17:36,131 - mlc.compiler.model_graph.l1_based - INFO - Allocating memory for IFM/OFM tensors 2026-03-07 10:17:36,168 - mlc.compiler.model_graph.l1_based - INFO - Get model compilation properties done 2026-03-07 10:17:36,588 - afe.backends.mla.afe_to_n2a_compiler.n2a_backend_runner - INFO - Start evaluate process to generate check file 2026-03-07 10:17:36,589 - mlc.kernel.layout - INFO - L2 caching mode: L2CachingMode.NONE 2026-03-07 10:17:36,599 - mlc.compiler.model_graph.l1_based - INFO - Generating model code 2026-03-07 10:17:39,703 - mlc.compiler.model_graph.l1_based - INFO - Generating model code done 2026-03-07 10:17:39,834 - mlc.test_util.test_context - INFO - Scheduling instructions 2026-03-07 10:17:44,147 - mlc.test_util.test_context - INFO - Code generation done 2026-03-07 10:17:44,157 - afe.backends.mla.afe_to_n2a_compiler.n2a_compiler_operations - INFO - Evaluate model and writing chk file done in 3s. 
2026-03-07 10:17:44,576 - afe.backends.mpk.interface - INFO - ============================== 2026-03-07 10:17:44,576 - afe.backends.mpk.interface - INFO - Compilation summary: 2026-03-07 10:17:44,576 - afe.backends.mpk.interface - INFO - ------------------------------ 2026-03-07 10:17:44,576 - afe.backends.mpk.interface - INFO - Desired batch size: 1 2026-03-07 10:17:44,576 - afe.backends.mpk.interface - INFO - Achieved batch size: 1 2026-03-07 10:17:44,576 - afe.backends.mpk.interface - INFO - ------------------------------ 2026-03-07 10:17:44,576 - afe.backends.mpk.interface - INFO - Plugin distribution per backend: 2026-03-07 10:17:44,576 - afe.backends.mpk.interface - INFO - MLA : 1 2026-03-07 10:17:44,576 - afe.backends.mpk.interface - INFO - EV74: 5 2026-03-07 10:17:44,576 - afe.backends.mpk.interface - INFO - A65 : 0 2026-03-07 10:17:44,576 - afe.backends.mpk.interface - INFO - ------------------------------ 2026-03-07 10:17:44,576 - afe.backends.mpk.interface - INFO - Generated files: LFM2.5-1.2B-Instruct_language_n128_layer9_conv_stage1_mla.elf, LFM2.5-1.2B-Instruct_language_n128_layer9_conv_mpk.json, LFM2.5-1.2B-Instruct_language_n128_layer9_conv_stage1_mla_stats.yaml, llima-compile 2026-03-07 10:17:44,989 - afe.ir.serializer.api - INFO - Loading model from file: CompiledModels/LFM2.5-1.2B-Instruct/sima_files/sdk/LFM2.5-1.2B-Instruct_language_n1_pre_layer12.sima 2026-03-07 10:17:45,009 - afe.apis.model - WARNING - A deepcopy is disabled hence model graph will be mutated and no other APIs can be called after the compile step. 
2026-03-07 10:17:45,009 - afe.apis.model - INFO - Compiling quantized net "LFM2.5-1.2B-Instruct_language_n1_pre_layer12" 2026-03-07 10:17:45,013 - afe.core.compile_networks - INFO - The model is split into 1 segments for MLA and APU 2026-03-07 10:17:45,013 - afe.backends.mla.afe_to_n2a_compiler.n2a_compiler_operations - INFO - Stage 1 of 1: compiling for MLA 2026-03-07 10:17:45,018 - mlc.compiler.model_graph.l1_based - INFO - Get model compilation properties 2026-03-07 10:17:45,018 - mlc.compiler.model_graph.l1_based - INFO - Checking if layers fit into memory without splitting 2026-03-07 10:17:45,018 - mlc.compiler.model_graph.l1_based - INFO - Generating the list of feasible layouts 2026-03-07 10:17:45,018 - mlc.compiler.model_graph.l1_based - INFO - Setting layer parameters 2026-03-07 10:17:45,191 - mlc.compiler.model_graph.l1_based - INFO - Setting tile layouts 2026-03-07 10:17:45,212 - mlc.compiler.model_graph.l1_based - INFO - Allocating memory for IFM/OFM tensors 2026-03-07 10:17:45,245 - mlc.compiler.model_graph.l1_based - INFO - Get model compilation properties done 2026-03-07 10:17:45,270 - afe.backends.mla.afe_to_n2a_compiler.n2a_backend_runner - INFO - Start evaluate process to generate check file 2026-03-07 10:17:45,271 - mlc.kernel.layout - INFO - L2 caching mode: L2CachingMode.NONE 2026-03-07 10:17:45,280 - mlc.compiler.model_graph.l1_based - INFO - Generating model code 2026-03-07 10:17:45,956 - mlc.compiler.model_graph.l1_based - INFO - Generating model code done 2026-03-07 10:17:46,046 - mlc.test_util.test_context - INFO - Scheduling instructions 2026-03-07 10:17:46,549 - mlc.test_util.test_context - INFO - Compression done in 14s. 
Compression ratio: 0.952 2026-03-07 10:17:46,549 - mlc.test_util.test_context - INFO - Re-allocate dram memory 2026-03-07 10:17:46,714 - mlc.test_util.test_context - INFO - Generating metrics 2026-03-07 10:17:46,985 - mlc.test_util.test_context - INFO - Writing report to MLC file 2026-03-07 10:17:46,985 - mlc.test_util.test_context - INFO - Writing instructions to MLC file 2026-03-07 10:17:47,807 - mlc.compiler.model_graph.l1_based - INFO - Generating model code done 2026-03-07 10:17:47,839 - mlc.test_util.test_context - INFO - Scheduling instructions 2026-03-07 10:17:48,482 - mlc.test_util.test_context - INFO - Code generation done 2026-03-07 10:17:48,493 - afe.backends.mla.afe_to_n2a_compiler.n2a_compiler_operations - INFO - Evaluate model and writing chk file done in 8s. 2026-03-07 10:17:48,553 - afe.backends.mpk.interface - INFO - ============================== 2026-03-07 10:17:48,553 - afe.backends.mpk.interface - INFO - Compilation summary: 2026-03-07 10:17:48,553 - afe.backends.mpk.interface - INFO - ------------------------------ 2026-03-07 10:17:48,553 - afe.backends.mpk.interface - INFO - Desired batch size: 1 2026-03-07 10:17:48,553 - afe.backends.mpk.interface - INFO - Achieved batch size: 1 2026-03-07 10:17:48,553 - afe.backends.mpk.interface - INFO - ------------------------------ 2026-03-07 10:17:48,553 - afe.backends.mpk.interface - INFO - Plugin distribution per backend: 2026-03-07 10:17:48,553 - afe.backends.mpk.interface - INFO - MLA : 1 2026-03-07 10:17:48,553 - afe.backends.mpk.interface - INFO - EV74: 5 2026-03-07 10:17:48,553 - afe.backends.mpk.interface - INFO - A65 : 0 2026-03-07 10:17:48,553 - afe.backends.mpk.interface - INFO - ------------------------------ 2026-03-07 10:17:48,553 - afe.backends.mpk.interface - INFO - Generated files: LFM2.5-1.2B-Instruct_language_n1_layer11_conv_mpk.json, LFM2.5-1.2B-Instruct_language_n1_layer11_conv_stage1_mla.elf, LFM2.5-1.2B-Instruct_language_n1_layer11_conv_stage1_mla_stats.yaml, llima-compile 
2026-03-07 10:17:48,823 - afe.ir.serializer.api - INFO - Loading model from file: CompiledModels/LFM2.5-1.2B-Instruct/sima_files/sdk/LFM2.5-1.2B-Instruct_language_n1_post_layer12.sima 2026-03-07 10:17:48,870 - afe.apis.model - WARNING - A deepcopy is disabled hence model graph will be mutated and no other APIs can be called after the compile step. 2026-03-07 10:17:48,870 - afe.apis.model - INFO - Compiling quantized net "LFM2.5-1.2B-Instruct_language_n1_post_layer12" 2026-03-07 10:17:48,871 - afe.core.compile_networks - INFO - The model is split into 1 segments for MLA and APU 2026-03-07 10:17:48,872 - afe.backends.mla.afe_to_n2a_compiler.n2a_compiler_operations - INFO - Stage 1 of 1: compiling for MLA 2026-03-07 10:17:48,882 - mlc.compiler.model_graph.l1_based - INFO - Get model compilation properties 2026-03-07 10:17:48,882 - mlc.compiler.model_graph.l1_based - INFO - Checking if layers fit into memory without splitting 2026-03-07 10:17:48,882 - mlc.compiler.model_graph.l1_based - INFO - Generating the list of feasible layouts 2026-03-07 10:17:48,882 - mlc.compiler.model_graph.l1_based - INFO - Setting layer parameters 2026-03-07 10:17:48,924 - mlc.compiler.model_graph.l1_based - INFO - Setting tile layouts 2026-03-07 10:17:48,929 - mlc.compiler.model_graph.l1_based - INFO - Allocating memory for IFM/OFM tensors 2026-03-07 10:17:48,940 - mlc.compiler.model_graph.l1_based - INFO - Get model compilation properties done 2026-03-07 10:17:49,078 - afe.backends.mla.afe_to_n2a_compiler.n2a_backend_runner - INFO - Start evaluate process to generate check file 2026-03-07 10:17:49,081 - mlc.kernel.layout - INFO - L2 caching mode: L2CachingMode.NONE 2026-03-07 10:17:49,090 - mlc.compiler.model_graph.l1_based - INFO - Generating model code 2026-03-07 10:17:49,898 - mlc.test_util.test_context - INFO - Performing DRAM/L2 synchronization 2026-03-07 10:17:50,249 - mlc.test_util.test_context - INFO - Inserting and merging NOPs 2026-03-07 10:17:50,365 - mlc.test_util.test_context 
- INFO - Setting IQ sync bits 2026-03-07 10:17:50,376 - mlc.test_util.test_context - INFO - Run compression 2026-03-07 10:17:50,600 - mlc.test_util.test_context - INFO - Performing DRAM/L2 synchronization 2026-03-07 10:17:51,094 - mlc.test_util.test_context - INFO - Compression done in 0s. Compression ratio: 0.958 2026-03-07 10:17:51,095 - mlc.test_util.test_context - INFO - Re-allocate dram memory 2026-03-07 10:17:51,126 - mlc.test_util.test_context - INFO - Generating metrics 2026-03-07 10:17:51,166 - mlc.test_util.test_context - INFO - Writing report to MLC file 2026-03-07 10:17:51,166 - mlc.test_util.test_context - INFO - Writing instructions to MLC file 2026-03-07 10:17:51,678 - mlc.test_util.test_context - INFO - Inserting and merging NOPs 2026-03-07 10:17:52,202 - mlc.test_util.test_context - INFO - Setting IQ sync bits 2026-03-07 10:17:52,249 - mlc.test_util.test_context - INFO - Run compression 2026-03-07 10:17:53,486 - mlc.test_util.test_context - INFO - Code generation done 2026-03-07 10:17:53,551 - mlc.test_util.test_context - INFO - Performing DRAM/L2 synchronization 2026-03-07 10:17:53,586 - mlc.test_util.test_context - INFO - Compression done in 1s. 
Compression ratio: 0.927 2026-03-07 10:17:53,586 - mlc.test_util.test_context - INFO - Re-allocate dram memory 2026-03-07 10:17:53,601 - mlc.test_util.test_context - INFO - Generating metrics 2026-03-07 10:17:53,753 - mlc.test_util.test_context - INFO - Writing report to MLC file 2026-03-07 10:17:53,753 - mlc.test_util.test_context - INFO - Writing instructions to MLC file 2026-03-07 10:17:54,025 - mlc.compiler.model_graph.l1_based - INFO - Generating model code done 2026-03-07 10:17:54,051 - mlc.test_util.test_context - INFO - Scheduling instructions 2026-03-07 10:17:54,499 - mlc.test_util.test_context - INFO - Inserting and merging NOPs 2026-03-07 10:17:54,943 - mlc.test_util.test_context - INFO - Setting IQ sync bits 2026-03-07 10:17:55,011 - mlc.test_util.test_context - INFO - Run compression 2026-03-07 10:17:56,321 - mlc.test_util.test_context - INFO - Performing DRAM/L2 synchronization 2026-03-07 10:17:56,686 - afe.backends.mpk.interface - INFO - ============================== 2026-03-07 10:17:56,686 - afe.backends.mpk.interface - INFO - Compilation summary: 2026-03-07 10:17:56,686 - afe.backends.mpk.interface - INFO - ------------------------------ 2026-03-07 10:17:56,686 - afe.backends.mpk.interface - INFO - Desired batch size: 1 2026-03-07 10:17:56,686 - afe.backends.mpk.interface - INFO - Achieved batch size: 1 2026-03-07 10:17:56,686 - afe.backends.mpk.interface - INFO - ------------------------------ 2026-03-07 10:17:56,686 - afe.backends.mpk.interface - INFO - Plugin distribution per backend: 2026-03-07 10:17:56,686 - afe.backends.mpk.interface - INFO - MLA : 1 2026-03-07 10:17:56,686 - afe.backends.mpk.interface - INFO - EV74: 3 2026-03-07 10:17:56,686 - afe.backends.mpk.interface - INFO - A65 : 0 2026-03-07 10:17:56,686 - afe.backends.mpk.interface - INFO - ------------------------------ 2026-03-07 10:17:56,686 - afe.backends.mpk.interface - INFO - Generated files: LFM2.5-1.2B-Instruct_language_n128_post_layer10_stage1_mla_stats.yaml, 
LFM2.5-1.2B-Instruct_language_n128_post_layer10_mpk.json, LFM2.5-1.2B-Instruct_language_n128_post_layer10_stage1_mla.elf, llima-compile 2026-03-07 10:17:57,001 - mlc.test_util.test_context - INFO - Inserting and merging NOPs 2026-03-07 10:17:57,143 - mlc.test_util.test_context - INFO - Setting IQ sync bits 2026-03-07 10:17:57,169 - mlc.test_util.test_context - INFO - Run compression 2026-03-07 10:17:57,256 - afe.ir.serializer.api - INFO - Loading model from file: CompiledModels/LFM2.5-1.2B-Instruct/sima_files/sdk/LFM2.5-1.2B-Instruct_language_n128_layer13_conv.sima 2026-03-07 10:17:57,322 - afe.apis.model - WARNING - A deepcopy is disabled hence model graph will be mutated and no other APIs can be called after the compile step. 2026-03-07 10:17:57,322 - afe.apis.model - INFO - Compiling quantized net "LFM2.5-1.2B-Instruct_language_n128_layer13_conv" 2026-03-07 10:17:57,326 - afe.core.compile_networks - INFO - The model is split into 1 segments for MLA and APU 2026-03-07 10:17:57,326 - afe.backends.mla.afe_to_n2a_compiler.n2a_compiler_operations - INFO - Stage 1 of 1: compiling for MLA 2026-03-07 10:17:57,367 - mlc.compiler.model_graph.l1_based - INFO - Get model compilation properties 2026-03-07 10:17:57,368 - mlc.compiler.model_graph.l1_based - INFO - Checking if layers fit into memory without splitting 2026-03-07 10:17:57,368 - mlc.compiler.model_graph.l1_based - INFO - Generating the list of feasible layouts 2026-03-07 10:17:57,368 - mlc.compiler.model_graph.l1_based - INFO - Setting layer parameters 2026-03-07 10:17:59,653 - mlc.test_util.test_context - INFO - Code generation done 2026-03-07 10:17:59,658 - afe.backends.mla.afe_to_n2a_compiler.n2a_compiler_operations - INFO - Evaluate model and writing chk file done in 2s. 
2026-03-07 10:18:02,205 - mlc.compiler.model_graph.l1_based - INFO - Setting tile layouts 2026-03-07 10:18:02,628 - mlc.compiler.model_graph.l1_based - INFO - Allocating memory for IFM/OFM tensors 2026-03-07 10:18:02,678 - mlc.compiler.model_graph.l1_based - INFO - Get model compilation properties done 2026-03-07 10:18:02,830 - afe.backends.mla.afe_to_n2a_compiler.n2a_compiler_operations - INFO - Evaluate model and writing chk file done in 0s. 2026-03-07 10:18:02,977 - afe.backends.mla.afe_to_n2a_compiler.n2a_backend_runner - INFO - Start evaluate process to generate check file 2026-03-07 10:18:02,978 - mlc.kernel.layout - INFO - L2 caching mode: L2CachingMode.NONE 2026-03-07 10:18:02,990 - mlc.compiler.model_graph.l1_based - INFO - Generating model code 2026-03-07 10:18:03,035 - mlc.test_util.test_context - INFO - Compression done in 5s. Compression ratio: 0.976 2026-03-07 10:18:03,035 - mlc.test_util.test_context - INFO - Re-allocate dram memory 2026-03-07 10:18:03,178 - mlc.test_util.test_context - INFO - Generating metrics 2026-03-07 10:18:03,263 - mlc.test_util.test_context - INFO - Writing report to MLC file 2026-03-07 10:18:03,263 - mlc.test_util.test_context - INFO - Writing instructions to MLC file 2026-03-07 10:18:03,779 - afe.backends.mpk.interface - INFO - ============================== 2026-03-07 10:18:03,779 - afe.backends.mpk.interface - INFO - Compilation summary: 2026-03-07 10:18:03,779 - afe.backends.mpk.interface - INFO - ------------------------------ 2026-03-07 10:18:03,780 - afe.backends.mpk.interface - INFO - Desired batch size: 1 2026-03-07 10:18:03,780 - afe.backends.mpk.interface - INFO - Achieved batch size: 1 2026-03-07 10:18:03,780 - afe.backends.mpk.interface - INFO - ------------------------------ 2026-03-07 10:18:03,780 - afe.backends.mpk.interface - INFO - Plugin distribution per backend: 2026-03-07 10:18:03,780 - afe.backends.mpk.interface - INFO - MLA : 1 2026-03-07 10:18:03,780 - afe.backends.mpk.interface - INFO - EV74: 7 
2026-03-07 10:18:03,780 - afe.backends.mpk.interface - INFO - A65 : 0 2026-03-07 10:18:03,780 - afe.backends.mpk.interface - INFO - ------------------------------ 2026-03-07 10:18:03,780 - afe.backends.mpk.interface - INFO - Generated files: LFM2.5-1.2B-Instruct_language_n1_pre_layer12_stage1_mla_stats.yaml, LFM2.5-1.2B-Instruct_language_n1_pre_layer12_mpk.json, LFM2.5-1.2B-Instruct_language_n1_pre_layer12_stage1_mla.elf, llima-compile 2026-03-07 10:18:04,178 - afe.backends.mpk.interface - INFO - ============================== 2026-03-07 10:18:04,178 - afe.backends.mpk.interface - INFO - Compilation summary: 2026-03-07 10:18:04,178 - afe.backends.mpk.interface - INFO - ------------------------------ 2026-03-07 10:18:04,178 - afe.backends.mpk.interface - INFO - Desired batch size: 1 2026-03-07 10:18:04,178 - afe.backends.mpk.interface - INFO - Achieved batch size: 1 2026-03-07 10:18:04,178 - afe.backends.mpk.interface - INFO - ------------------------------ 2026-03-07 10:18:04,178 - afe.backends.mpk.interface - INFO - Plugin distribution per backend: 2026-03-07 10:18:04,178 - afe.backends.mpk.interface - INFO - MLA : 1 2026-03-07 10:18:04,178 - afe.backends.mpk.interface - INFO - EV74: 7 2026-03-07 10:18:04,178 - afe.backends.mpk.interface - INFO - A65 : 0 2026-03-07 10:18:04,178 - afe.backends.mpk.interface - INFO - ------------------------------ 2026-03-07 10:18:04,178 - afe.backends.mpk.interface - INFO - Generated files: LFM2.5-1.2B-Instruct_language_n128_pre_layer12_stage1_mla.elf, LFM2.5-1.2B-Instruct_language_n128_pre_layer12_stage1_mla_stats.yaml, LFM2.5-1.2B-Instruct_language_n128_pre_layer12_mpk.json, llima-compile 2026-03-07 10:18:04,222 - afe.ir.serializer.api - INFO - Loading model from file: CompiledModels/LFM2.5-1.2B-Instruct/sima_files/sdk/LFM2.5-1.2B-Instruct_language_n1_layer13_conv.sima 2026-03-07 10:18:04,279 - afe.apis.model - WARNING - A deepcopy is disabled hence model graph will be mutated and no other APIs can be called after the compile 
step. 2026-03-07 10:18:04,279 - afe.apis.model - INFO - Compiling quantized net "LFM2.5-1.2B-Instruct_language_n1_layer13_conv" 2026-03-07 10:18:04,282 - afe.core.compile_networks - INFO - The model is split into 1 segments for MLA and APU 2026-03-07 10:18:04,282 - afe.backends.mla.afe_to_n2a_compiler.n2a_compiler_operations - INFO - Stage 1 of 1: compiling for MLA 2026-03-07 10:18:04,298 - mlc.compiler.model_graph.l1_based - INFO - Get model compilation properties 2026-03-07 10:18:04,299 - mlc.compiler.model_graph.l1_based - INFO - Checking if layers fit into memory without splitting 2026-03-07 10:18:04,299 - mlc.compiler.model_graph.l1_based - INFO - Generating the list of feasible layouts 2026-03-07 10:18:04,299 - mlc.compiler.model_graph.l1_based - INFO - Setting layer parameters 2026-03-07 10:18:04,375 - mlc.compiler.model_graph.l1_based - INFO - Setting tile layouts 2026-03-07 10:18:04,403 - mlc.compiler.model_graph.l1_based - INFO - Allocating memory for IFM/OFM tensors 2026-03-07 10:18:04,423 - mlc.compiler.model_graph.l1_based - INFO - Get model compilation properties done 2026-03-07 10:18:04,586 - afe.ir.serializer.api - INFO - Loading model from file: CompiledModels/LFM2.5-1.2B-Instruct/sima_files/sdk/LFM2.5-1.2B-Instruct_language_n128_pre_layer14.sima 2026-03-07 10:18:04,598 - afe.backends.mla.afe_to_n2a_compiler.n2a_backend_runner - INFO - Start evaluate process to generate check file 2026-03-07 10:18:04,598 - mlc.kernel.layout - INFO - L2 caching mode: L2CachingMode.NONE 2026-03-07 10:18:04,607 - afe.apis.model - WARNING - A deepcopy is disabled hence model graph will be mutated and no other APIs can be called after the compile step. 
2026-03-07 10:18:04,607 - afe.apis.model - INFO - Compiling quantized net "LFM2.5-1.2B-Instruct_language_n128_pre_layer14" 2026-03-07 10:18:04,610 - afe.core.compile_networks - INFO - The model is split into 1 segments for MLA and APU 2026-03-07 10:18:04,610 - afe.backends.mla.afe_to_n2a_compiler.n2a_compiler_operations - INFO - Stage 1 of 1: compiling for MLA 2026-03-07 10:18:04,611 - mlc.compiler.model_graph.l1_based - INFO - Generating model code 2026-03-07 10:18:04,615 - mlc.compiler.model_graph.l1_based - INFO - Get model compilation properties 2026-03-07 10:18:04,616 - mlc.compiler.model_graph.l1_based - INFO - Checking if layers fit into memory without splitting 2026-03-07 10:18:04,616 - mlc.compiler.model_graph.l1_based - INFO - Generating the list of feasible layouts 2026-03-07 10:18:04,616 - mlc.compiler.model_graph.l1_based - INFO - Setting layer parameters 2026-03-07 10:18:07,397 - mlc.test_util.test_context - INFO - Compression done in 12s. Compression ratio: 0.956 2026-03-07 10:18:07,397 - mlc.test_util.test_context - INFO - Re-allocate dram memory 2026-03-07 10:18:07,521 - mlc.test_util.test_context - INFO - Generating metrics 2026-03-07 10:18:07,755 - mlc.test_util.test_context - INFO - Writing report to MLC file 2026-03-07 10:18:07,755 - mlc.test_util.test_context - INFO - Writing instructions to MLC file 2026-03-07 10:18:11,067 - mlc.compiler.model_graph.l1_based - INFO - Setting tile layouts 2026-03-07 10:18:11,100 - mlc.compiler.model_graph.l1_based - INFO - Generating model code done 2026-03-07 10:18:11,134 - mlc.test_util.test_context - INFO - Scheduling instructions 2026-03-07 10:18:11,494 - mlc.compiler.model_graph.l1_based - INFO - Allocating memory for IFM/OFM tensors 2026-03-07 10:18:11,530 - mlc.compiler.model_graph.l1_based - INFO - Get model compilation properties done 2026-03-07 10:18:11,699 - afe.backends.mla.afe_to_n2a_compiler.n2a_backend_runner - INFO - Start evaluate process to generate check file 2026-03-07 10:18:11,700 - 
mlc.kernel.layout - INFO - L2 caching mode: L2CachingMode.NONE 2026-03-07 10:18:11,714 - mlc.compiler.model_graph.l1_based - INFO - Generating model code 2026-03-07 10:18:14,709 - mlc.test_util.test_context - INFO - Performing DRAM/L2 synchronization 2026-03-07 10:18:15,127 - mlc.compiler.model_graph.l1_based - INFO - Generating model code done 2026-03-07 10:18:15,245 - mlc.test_util.test_context - INFO - Scheduling instructions 2026-03-07 10:18:15,679 - mlc.test_util.test_context - INFO - Inserting and merging NOPs 2026-03-07 10:18:15,866 - mlc.test_util.test_context - INFO - Setting IQ sync bits 2026-03-07 10:18:15,895 - mlc.test_util.test_context - INFO - Run compression 2026-03-07 10:18:18,950 - mlc.test_util.test_context - INFO - Code generation done 2026-03-07 10:18:18,951 - afe.backends.mla.afe_to_n2a_compiler.n2a_compiler_operations - INFO - Evaluate model and writing chk file done in 1s. 2026-03-07 10:18:21,845 - mlc.compiler.model_graph.l1_based - INFO - Generating model code done 2026-03-07 10:18:21,976 - mlc.test_util.test_context - INFO - Scheduling instructions 2026-03-07 10:18:22,399 - afe.backends.mpk.interface - INFO - ============================== 2026-03-07 10:18:22,399 - afe.backends.mpk.interface - INFO - Compilation summary: 2026-03-07 10:18:22,399 - afe.backends.mpk.interface - INFO - ------------------------------ 2026-03-07 10:18:22,399 - afe.backends.mpk.interface - INFO - Desired batch size: 1 2026-03-07 10:18:22,399 - afe.backends.mpk.interface - INFO - Achieved batch size: 1 2026-03-07 10:18:22,399 - afe.backends.mpk.interface - INFO - ------------------------------ 2026-03-07 10:18:22,399 - afe.backends.mpk.interface - INFO - Plugin distribution per backend: 2026-03-07 10:18:22,399 - afe.backends.mpk.interface - INFO - MLA : 1 2026-03-07 10:18:22,399 - afe.backends.mpk.interface - INFO - EV74: 3 2026-03-07 10:18:22,399 - afe.backends.mpk.interface - INFO - A65 : 0 2026-03-07 10:18:22,399 - afe.backends.mpk.interface - INFO - 
------------------------------ 2026-03-07 10:18:22,399 - afe.backends.mpk.interface - INFO - Generated files: LFM2.5-1.2B-Instruct_language_n1_post_layer12_stage1_mla.elf, LFM2.5-1.2B-Instruct_language_n1_post_layer12_mpk.json, LFM2.5-1.2B-Instruct_language_n1_post_layer12_stage1_mla_stats.yaml, llima-compile 2026-03-07 10:18:22,823 - afe.ir.serializer.api - INFO - Loading model from file: CompiledModels/LFM2.5-1.2B-Instruct/sima_files/sdk/LFM2.5-1.2B-Instruct_language_n128_post_layer14.sima 2026-03-07 10:18:22,872 - afe.apis.model - WARNING - A deepcopy is disabled hence model graph will be mutated and no other APIs can be called after the compile step. 2026-03-07 10:18:22,872 - afe.apis.model - INFO - Compiling quantized net "LFM2.5-1.2B-Instruct_language_n128_post_layer14" 2026-03-07 10:18:22,875 - afe.core.compile_networks - INFO - The model is split into 1 segments for MLA and APU 2026-03-07 10:18:22,875 - afe.backends.mla.afe_to_n2a_compiler.n2a_compiler_operations - INFO - Stage 1 of 1: compiling for MLA 2026-03-07 10:18:22,890 - mlc.compiler.model_graph.l1_based - INFO - Get model compilation properties 2026-03-07 10:18:22,890 - mlc.compiler.model_graph.l1_based - INFO - Checking if layers fit into memory without splitting 2026-03-07 10:18:22,890 - mlc.compiler.model_graph.l1_based - INFO - Generating the list of feasible layouts 2026-03-07 10:18:22,890 - mlc.compiler.model_graph.l1_based - INFO - Setting layer parameters 2026-03-07 10:18:23,107 - mlc.test_util.test_context - INFO - Compression done in 7s. 
Compression ratio: 0.976 2026-03-07 10:18:23,107 - mlc.test_util.test_context - INFO - Re-allocate dram memory 2026-03-07 10:18:23,289 - mlc.test_util.test_context - INFO - Generating metrics 2026-03-07 10:18:23,400 - mlc.test_util.test_context - INFO - Writing report to MLC file 2026-03-07 10:18:23,400 - mlc.test_util.test_context - INFO - Writing instructions to MLC file 2026-03-07 10:18:24,639 - mlc.test_util.test_context - INFO - Code generation done 2026-03-07 10:18:24,652 - afe.backends.mla.afe_to_n2a_compiler.n2a_compiler_operations - INFO - Evaluate model and writing chk file done in 12s. 2026-03-07 10:18:24,788 - mlc.test_util.test_context - INFO - Performing DRAM/L2 synchronization 2026-03-07 10:18:26,004 - mlc.test_util.test_context - INFO - Inserting and merging NOPs 2026-03-07 10:18:26,580 - mlc.test_util.test_context - INFO - Setting IQ sync bits 2026-03-07 10:18:26,663 - mlc.test_util.test_context - INFO - Run compression 2026-03-07 10:18:27,103 - mlc.compiler.model_graph.l1_based - INFO - Setting tile layouts 2026-03-07 10:18:27,161 - mlc.compiler.model_graph.l1_based - INFO - Allocating memory for IFM/OFM tensors 2026-03-07 10:18:27,197 - mlc.compiler.model_graph.l1_based - INFO - Get model compilation properties done 2026-03-07 10:18:27,585 - afe.backends.mla.afe_to_n2a_compiler.n2a_backend_runner - INFO - Start evaluate process to generate check file 2026-03-07 10:18:27,586 - mlc.kernel.layout - INFO - L2 caching mode: L2CachingMode.NONE 2026-03-07 10:18:27,595 - mlc.compiler.model_graph.l1_based - INFO - Generating model code 2026-03-07 10:18:32,813 - mlc.test_util.test_context - INFO - Performing DRAM/L2 synchronization 2026-03-07 10:18:33,873 - mlc.test_util.test_context - INFO - Inserting and merging NOPs 2026-03-07 10:18:34,384 - mlc.test_util.test_context - INFO - Setting IQ sync bits 2026-03-07 10:18:34,431 - mlc.test_util.test_context - INFO - Run compression 2026-03-07 10:18:34,483 - afe.backends.mpk.interface - INFO - 
============================== 2026-03-07 10:18:34,483 - afe.backends.mpk.interface - INFO - Compilation summary: 2026-03-07 10:18:34,483 - afe.backends.mpk.interface - INFO - ------------------------------ 2026-03-07 10:18:34,483 - afe.backends.mpk.interface - INFO - Desired batch size: 1 2026-03-07 10:18:34,483 - afe.backends.mpk.interface - INFO - Achieved batch size: 1 2026-03-07 10:18:34,483 - afe.backends.mpk.interface - INFO - ------------------------------ 2026-03-07 10:18:34,483 - afe.backends.mpk.interface - INFO - Plugin distribution per backend: 2026-03-07 10:18:34,483 - afe.backends.mpk.interface - INFO - MLA : 1 2026-03-07 10:18:34,483 - afe.backends.mpk.interface - INFO - EV74: 5 2026-03-07 10:18:34,483 - afe.backends.mpk.interface - INFO - A65 : 0 2026-03-07 10:18:34,483 - afe.backends.mpk.interface - INFO - ------------------------------ 2026-03-07 10:18:34,483 - afe.backends.mpk.interface - INFO - Generated files: LFM2.5-1.2B-Instruct_language_n128_layer11_conv_stage1_mla.elf, LFM2.5-1.2B-Instruct_language_n128_layer11_conv_mpk.json, LFM2.5-1.2B-Instruct_language_n128_layer11_conv_stage1_mla_stats.yaml, llima-compile 2026-03-07 10:18:34,904 - afe.ir.serializer.api - INFO - Loading model from file: CompiledModels/LFM2.5-1.2B-Instruct/sima_files/sdk/LFM2.5-1.2B-Instruct_language_n1_pre_layer14.sima 2026-03-07 10:18:34,928 - afe.apis.model - WARNING - A deepcopy is disabled hence model graph will be mutated and no other APIs can be called after the compile step. 
2026-03-07 10:18:34,928 - afe.apis.model - INFO - Compiling quantized net "LFM2.5-1.2B-Instruct_language_n1_pre_layer14" 2026-03-07 10:18:34,932 - afe.core.compile_networks - INFO - The model is split into 1 segments for MLA and APU 2026-03-07 10:18:34,932 - afe.backends.mla.afe_to_n2a_compiler.n2a_compiler_operations - INFO - Stage 1 of 1: compiling for MLA 2026-03-07 10:18:34,937 - mlc.compiler.model_graph.l1_based - INFO - Get model compilation properties 2026-03-07 10:18:34,937 - mlc.compiler.model_graph.l1_based - INFO - Checking if layers fit into memory without splitting 2026-03-07 10:18:34,937 - mlc.compiler.model_graph.l1_based - INFO - Generating the list of feasible layouts 2026-03-07 10:18:34,937 - mlc.compiler.model_graph.l1_based - INFO - Setting layer parameters 2026-03-07 10:18:35,109 - mlc.compiler.model_graph.l1_based - INFO - Setting tile layouts 2026-03-07 10:18:35,130 - mlc.compiler.model_graph.l1_based - INFO - Allocating memory for IFM/OFM tensors 2026-03-07 10:18:35,163 - mlc.compiler.model_graph.l1_based - INFO - Get model compilation properties done 2026-03-07 10:18:35,186 - afe.backends.mla.afe_to_n2a_compiler.n2a_backend_runner - INFO - Start evaluate process to generate check file 2026-03-07 10:18:35,187 - mlc.kernel.layout - INFO - L2 caching mode: L2CachingMode.NONE 2026-03-07 10:18:35,197 - mlc.compiler.model_graph.l1_based - INFO - Generating model code 2026-03-07 10:18:35,771 - mlc.test_util.test_context - INFO - Compression done in 1s. 
Compression ratio: 0.888 2026-03-07 10:18:35,771 - mlc.test_util.test_context - INFO - Re-allocate dram memory 2026-03-07 10:18:35,795 - mlc.test_util.test_context - INFO - Generating metrics 2026-03-07 10:18:35,991 - mlc.test_util.test_context - INFO - Writing report to MLC file 2026-03-07 10:18:35,992 - mlc.test_util.test_context - INFO - Writing instructions to MLC file 2026-03-07 10:18:37,227 - mlc.compiler.model_graph.l1_based - INFO - Generating model code done 2026-03-07 10:18:37,316 - mlc.test_util.test_context - INFO - Scheduling instructions 2026-03-07 10:18:37,713 - mlc.compiler.model_graph.l1_based - INFO - Generating model code done 2026-03-07 10:18:37,744 - mlc.test_util.test_context - INFO - Scheduling instructions 2026-03-07 10:18:39,249 - mlc.test_util.test_context - INFO - Code generation done 2026-03-07 10:18:39,262 - afe.backends.mla.afe_to_n2a_compiler.n2a_compiler_operations - INFO - Evaluate model and writing chk file done in 8s. 2026-03-07 10:18:40,254 - mlc.test_util.test_context - INFO - Performing DRAM/L2 synchronization 2026-03-07 10:18:40,608 - mlc.test_util.test_context - INFO - Inserting and merging NOPs 2026-03-07 10:18:40,698 - mlc.test_util.test_context - INFO - Compression done in 13s. Compression ratio: 0.957 2026-03-07 10:18:40,698 - mlc.test_util.test_context - INFO - Re-allocate dram memory 2026-03-07 10:18:40,720 - mlc.test_util.test_context - INFO - Setting IQ sync bits 2026-03-07 10:18:40,731 - mlc.test_util.test_context - INFO - Run compression 2026-03-07 10:18:40,864 - mlc.test_util.test_context - INFO - Generating metrics 2026-03-07 10:18:41,143 - mlc.test_util.test_context - INFO - Writing report to MLC file 2026-03-07 10:18:41,143 - mlc.test_util.test_context - INFO - Writing instructions to MLC file 2026-03-07 10:18:41,451 - mlc.test_util.test_context - INFO - Compression done in 0s. 
Compression ratio: 0.936 2026-03-07 10:18:41,451 - mlc.test_util.test_context - INFO - Re-allocate dram memory 2026-03-07 10:18:41,482 - mlc.test_util.test_context - INFO - Generating metrics 2026-03-07 10:18:41,521 - mlc.test_util.test_context - INFO - Writing report to MLC file 2026-03-07 10:18:41,521 - mlc.test_util.test_context - INFO - Writing instructions to MLC file 2026-03-07 10:18:41,885 - mlc.test_util.test_context - INFO - Code generation done 2026-03-07 10:18:41,889 - afe.backends.mla.afe_to_n2a_compiler.n2a_compiler_operations - INFO - Evaluate model and writing chk file done in 2s. 2026-03-07 10:18:42,418 - mlc.test_util.test_context - INFO - Code generation done 2026-03-07 10:18:42,429 - afe.backends.mla.afe_to_n2a_compiler.n2a_compiler_operations - INFO - Evaluate model and writing chk file done in 3s. 2026-03-07 10:18:43,823 - mlc.test_util.test_context - INFO - Code generation done 2026-03-07 10:18:44,719 - mlc.test_util.test_context - INFO - Performing DRAM/L2 synchronization 2026-03-07 10:18:45,650 - mlc.test_util.test_context - INFO - Inserting and merging NOPs 2026-03-07 10:18:46,074 - mlc.test_util.test_context - INFO - Setting IQ sync bits 2026-03-07 10:18:46,141 - mlc.test_util.test_context - INFO - Run compression 2026-03-07 10:18:46,761 - afe.backends.mpk.interface - INFO - ============================== 2026-03-07 10:18:46,761 - afe.backends.mpk.interface - INFO - Compilation summary: 2026-03-07 10:18:46,761 - afe.backends.mpk.interface - INFO - ------------------------------ 2026-03-07 10:18:46,761 - afe.backends.mpk.interface - INFO - Desired batch size: 1 2026-03-07 10:18:46,762 - afe.backends.mpk.interface - INFO - Achieved batch size: 1 2026-03-07 10:18:46,762 - afe.backends.mpk.interface - INFO - ------------------------------ 2026-03-07 10:18:46,762 - afe.backends.mpk.interface - INFO - Plugin distribution per backend: 2026-03-07 10:18:46,762 - afe.backends.mpk.interface - INFO - MLA : 1 2026-03-07 10:18:46,762 - 
afe.backends.mpk.interface - INFO - EV74: 7 2026-03-07 10:18:46,762 - afe.backends.mpk.interface - INFO - A65 : 0 2026-03-07 10:18:46,762 - afe.backends.mpk.interface - INFO - ------------------------------ 2026-03-07 10:18:46,762 - afe.backends.mpk.interface - INFO - Generated files: LFM2.5-1.2B-Instruct_language_n128_pre_layer14_stage1_mla_stats.yaml, LFM2.5-1.2B-Instruct_language_n128_pre_layer14_mpk.json, LFM2.5-1.2B-Instruct_language_n128_pre_layer14_stage1_mla.elf, llima-compile 2026-03-07 10:18:47,001 - afe.backends.mpk.interface - INFO - ============================== 2026-03-07 10:18:47,001 - afe.backends.mpk.interface - INFO - Compilation summary: 2026-03-07 10:18:47,001 - afe.backends.mpk.interface - INFO - ------------------------------ 2026-03-07 10:18:47,001 - afe.backends.mpk.interface - INFO - Desired batch size: 1 2026-03-07 10:18:47,001 - afe.backends.mpk.interface - INFO - Achieved batch size: 1 2026-03-07 10:18:47,001 - afe.backends.mpk.interface - INFO - ------------------------------ 2026-03-07 10:18:47,001 - afe.backends.mpk.interface - INFO - Plugin distribution per backend: 2026-03-07 10:18:47,001 - afe.backends.mpk.interface - INFO - MLA : 1 2026-03-07 10:18:47,001 - afe.backends.mpk.interface - INFO - EV74: 5 2026-03-07 10:18:47,001 - afe.backends.mpk.interface - INFO - A65 : 0 2026-03-07 10:18:47,001 - afe.backends.mpk.interface - INFO - ------------------------------ 2026-03-07 10:18:47,001 - afe.backends.mpk.interface - INFO - Generated files: LFM2.5-1.2B-Instruct_language_n1_layer13_conv_mpk.json, LFM2.5-1.2B-Instruct_language_n1_layer13_conv_stage1_mla.elf, LFM2.5-1.2B-Instruct_language_n1_layer13_conv_stage1_mla_stats.yaml, llima-compile 2026-03-07 10:18:47,032 - afe.ir.serializer.api - INFO - Loading model from file: CompiledModels/LFM2.5-1.2B-Instruct/sima_files/sdk/LFM2.5-1.2B-Instruct_language_n1_post_layer14.sima 2026-03-07 10:18:47,078 - afe.apis.model - WARNING - A deepcopy is disabled hence model graph will be mutated and no 
other APIs can be called after the compile step. 2026-03-07 10:18:47,078 - afe.apis.model - INFO - Compiling quantized net "LFM2.5-1.2B-Instruct_language_n1_post_layer14" 2026-03-07 10:18:47,080 - afe.core.compile_networks - INFO - The model is split into 1 segments for MLA and APU 2026-03-07 10:18:47,080 - afe.backends.mla.afe_to_n2a_compiler.n2a_compiler_operations - INFO - Stage 1 of 1: compiling for MLA 2026-03-07 10:18:47,091 - mlc.compiler.model_graph.l1_based - INFO - Get model compilation properties 2026-03-07 10:18:47,091 - mlc.compiler.model_graph.l1_based - INFO - Checking if layers fit into memory without splitting 2026-03-07 10:18:47,091 - mlc.compiler.model_graph.l1_based - INFO - Generating the list of feasible layouts 2026-03-07 10:18:47,091 - mlc.compiler.model_graph.l1_based - INFO - Setting layer parameters 2026-03-07 10:18:47,132 - mlc.compiler.model_graph.l1_based - INFO - Setting tile layouts 2026-03-07 10:18:47,137 - mlc.compiler.model_graph.l1_based - INFO - Allocating memory for IFM/OFM tensors 2026-03-07 10:18:47,148 - mlc.compiler.model_graph.l1_based - INFO - Get model compilation properties done 2026-03-07 10:18:47,210 - afe.ir.serializer.api - INFO - Loading model from file: CompiledModels/LFM2.5-1.2B-Instruct/sima_files/sdk/LFM2.5-1.2B-Instruct_language_n128_layer15_conv.sima 2026-03-07 10:18:47,242 - afe.apis.model - WARNING - A deepcopy is disabled hence model graph will be mutated and no other APIs can be called after the compile step. 
2026-03-07 10:18:47,242 - afe.apis.model - INFO - Compiling quantized net "LFM2.5-1.2B-Instruct_language_n128_layer15_conv" 2026-03-07 10:18:47,244 - afe.core.compile_networks - INFO - The model is split into 1 segments for MLA and APU 2026-03-07 10:18:47,244 - afe.backends.mla.afe_to_n2a_compiler.n2a_compiler_operations - INFO - Stage 1 of 1: compiling for MLA 2026-03-07 10:18:47,249 - mlc.compiler.model_graph.l1_based - INFO - Get model compilation properties 2026-03-07 10:18:47,250 - mlc.compiler.model_graph.l1_based - INFO - Checking if layers fit into memory without splitting 2026-03-07 10:18:47,250 - mlc.compiler.model_graph.l1_based - INFO - Generating the list of feasible layouts 2026-03-07 10:18:47,250 - mlc.compiler.model_graph.l1_based - INFO - Setting layer parameters 2026-03-07 10:18:47,292 - afe.backends.mla.afe_to_n2a_compiler.n2a_backend_runner - INFO - Start evaluate process to generate check file 2026-03-07 10:18:47,293 - mlc.kernel.layout - INFO - L2 caching mode: L2CachingMode.NONE 2026-03-07 10:18:47,302 - mlc.compiler.model_graph.l1_based - INFO - Generating model code 2026-03-07 10:18:47,483 - afe.backends.mpk.interface - INFO - ============================== 2026-03-07 10:18:47,483 - afe.backends.mpk.interface - INFO - Compilation summary: 2026-03-07 10:18:47,483 - afe.backends.mpk.interface - INFO - ------------------------------ 2026-03-07 10:18:47,483 - afe.backends.mpk.interface - INFO - Desired batch size: 1 2026-03-07 10:18:47,483 - afe.backends.mpk.interface - INFO - Achieved batch size: 1 2026-03-07 10:18:47,483 - afe.backends.mpk.interface - INFO - ------------------------------ 2026-03-07 10:18:47,483 - afe.backends.mpk.interface - INFO - Plugin distribution per backend: 2026-03-07 10:18:47,483 - afe.backends.mpk.interface - INFO - MLA : 1 2026-03-07 10:18:47,483 - afe.backends.mpk.interface - INFO - EV74: 3 2026-03-07 10:18:47,484 - afe.backends.mpk.interface - INFO - A65 : 0 2026-03-07 10:18:47,484 - afe.backends.mpk.interface - 
INFO - ------------------------------ 2026-03-07 10:18:47,484 - afe.backends.mpk.interface - INFO - Generated files: LFM2.5-1.2B-Instruct_language_n128_post_layer12_mpk.json, LFM2.5-1.2B-Instruct_language_n128_post_layer12_stage1_mla_stats.yaml, LFM2.5-1.2B-Instruct_language_n128_post_layer12_stage1_mla.elf, llima-compile 2026-03-07 10:18:47,696 - afe.ir.serializer.api - INFO - Loading model from file: CompiledModels/LFM2.5-1.2B-Instruct/sima_files/sdk/LFM2.5-1.2B-Instruct_language_n1_layer15_conv.sima 2026-03-07 10:18:47,720 - afe.apis.model - WARNING - A deepcopy is disabled hence model graph will be mutated and no other APIs can be called after the compile step. 2026-03-07 10:18:47,720 - afe.apis.model - INFO - Compiling quantized net "LFM2.5-1.2B-Instruct_language_n1_layer15_conv" 2026-03-07 10:18:47,722 - afe.core.compile_networks - INFO - The model is split into 1 segments for MLA and APU 2026-03-07 10:18:47,722 - afe.backends.mla.afe_to_n2a_compiler.n2a_compiler_operations - INFO - Stage 1 of 1: compiling for MLA 2026-03-07 10:18:47,727 - mlc.compiler.model_graph.l1_based - INFO - Get model compilation properties 2026-03-07 10:18:47,727 - mlc.compiler.model_graph.l1_based - INFO - Checking if layers fit into memory without splitting 2026-03-07 10:18:47,727 - mlc.compiler.model_graph.l1_based - INFO - Generating the list of feasible layouts 2026-03-07 10:18:47,727 - mlc.compiler.model_graph.l1_based - INFO - Setting layer parameters 2026-03-07 10:18:47,774 - mlc.compiler.model_graph.l1_based - INFO - Setting tile layouts 2026-03-07 10:18:47,799 - mlc.compiler.model_graph.l1_based - INFO - Allocating memory for IFM/OFM tensors 2026-03-07 10:18:47,813 - mlc.compiler.model_graph.l1_based - INFO - Get model compilation properties done 2026-03-07 10:18:47,855 - afe.backends.mla.afe_to_n2a_compiler.n2a_backend_runner - INFO - Start evaluate process to generate check file 2026-03-07 10:18:47,856 - mlc.kernel.layout - INFO - L2 caching mode: L2CachingMode.NONE 
2026-03-07 10:18:47,865 - mlc.compiler.model_graph.l1_based - INFO - Generating model code 2026-03-07 10:18:48,148 - mlc.compiler.model_graph.l1_based - INFO - Setting tile layouts 2026-03-07 10:18:48,519 - mlc.compiler.model_graph.l1_based - INFO - Allocating memory for IFM/OFM tensors 2026-03-07 10:18:48,535 - mlc.compiler.model_graph.l1_based - INFO - Get model compilation properties done 2026-03-07 10:18:48,709 - afe.backends.mla.afe_to_n2a_compiler.n2a_backend_runner - INFO - Start evaluate process to generate check file 2026-03-07 10:18:48,710 - mlc.kernel.layout - INFO - L2 caching mode: L2CachingMode.NONE 2026-03-07 10:18:49,082 - mlc.compiler.model_graph.l1_based - INFO - Generating model code 2026-03-07 10:18:49,575 - mlc.compiler.model_graph.l1_based - INFO - Generating model code done 2026-03-07 10:18:49,588 - mlc.test_util.test_context - INFO - Scheduling instructions 2026-03-07 10:18:50,659 - mlc.test_util.test_context - INFO - Performing DRAM/L2 synchronization 2026-03-07 10:18:51,140 - mlc.test_util.test_context - INFO - Inserting and merging NOPs 2026-03-07 10:18:51,201 - mlc.test_util.test_context - INFO - Setting IQ sync bits 2026-03-07 10:18:51,211 - mlc.test_util.test_context - INFO - Run compression 2026-03-07 10:18:52,168 - mlc.compiler.model_graph.l1_based - INFO - Generating model code done 2026-03-07 10:18:52,194 - mlc.test_util.test_context - INFO - Scheduling instructions 2026-03-07 10:18:52,376 - afe.backends.mla.afe_to_n2a_compiler.n2a_compiler_operations - INFO - Evaluate model and writing chk file done in 0s. 2026-03-07 10:18:53,022 - mlc.compiler.model_graph.l1_based - INFO - Generating model code done 2026-03-07 10:18:53,025 - mlc.test_util.test_context - INFO - Compression done in 1s. 
Compression ratio: 0.978 2026-03-07 10:18:53,025 - mlc.test_util.test_context - INFO - Re-allocate dram memory 2026-03-07 10:18:53,068 - mlc.test_util.test_context - INFO - Scheduling instructions 2026-03-07 10:18:53,070 - mlc.test_util.test_context - INFO - Generating metrics 2026-03-07 10:18:53,107 - mlc.test_util.test_context - INFO - Writing report to MLC file 2026-03-07 10:18:53,107 - mlc.test_util.test_context - INFO - Writing instructions to MLC file 2026-03-07 10:18:53,356 - afe.backends.mpk.interface - INFO - ============================== 2026-03-07 10:18:53,356 - afe.backends.mpk.interface - INFO - Compilation summary: 2026-03-07 10:18:53,356 - afe.backends.mpk.interface - INFO - ------------------------------ 2026-03-07 10:18:53,356 - afe.backends.mpk.interface - INFO - Desired batch size: 1 2026-03-07 10:18:53,356 - afe.backends.mpk.interface - INFO - Achieved batch size: 1 2026-03-07 10:18:53,356 - afe.backends.mpk.interface - INFO - ------------------------------ 2026-03-07 10:18:53,356 - afe.backends.mpk.interface - INFO - Plugin distribution per backend: 2026-03-07 10:18:53,356 - afe.backends.mpk.interface - INFO - MLA : 1 2026-03-07 10:18:53,357 - afe.backends.mpk.interface - INFO - EV74: 7 2026-03-07 10:18:53,357 - afe.backends.mpk.interface - INFO - A65 : 0 2026-03-07 10:18:53,357 - afe.backends.mpk.interface - INFO - ------------------------------ 2026-03-07 10:18:53,357 - afe.backends.mpk.interface - INFO - Generated files: LFM2.5-1.2B-Instruct_language_n1_pre_layer14_stage1_mla_stats.yaml, LFM2.5-1.2B-Instruct_language_n1_pre_layer14_stage1_mla.elf, LFM2.5-1.2B-Instruct_language_n1_pre_layer14_mpk.json, llima-compile 2026-03-07 10:18:53,633 - afe.ir.serializer.api - INFO - Loading model from file: CompiledModels/LFM2.5-1.2B-Instruct/sima_files/sdk/LFM2.5-1.2B-Instruct_language_n128_cache_token0.sima 2026-03-07 10:18:53,644 - afe.apis.model - WARNING - A deepcopy is disabled hence model graph will be mutated and no other APIs can be called 
after the compile step. 2026-03-07 10:18:53,644 - afe.apis.model - INFO - Compiling quantized net "LFM2.5-1.2B-Instruct_language_n128_cache_token0" 2026-03-07 10:18:53,648 - afe.core.compile_networks - INFO - The model is split into 1 segments for MLA and APU 2026-03-07 10:18:53,648 - afe.backends.mla.afe_to_n2a_compiler.n2a_compiler_operations - INFO - Stage 1 of 1: compiling for MLA 2026-03-07 10:18:53,649 - mlc.compiler.model_graph.l1_based - INFO - Get model compilation properties 2026-03-07 10:18:53,649 - mlc.compiler.model_graph.l1_based - INFO - Checking if layers fit into memory without splitting 2026-03-07 10:18:53,649 - mlc.compiler.model_graph.l1_based - INFO - Generating the list of feasible layouts 2026-03-07 10:18:53,649 - mlc.compiler.model_graph.l1_based - INFO - Setting layer parameters 2026-03-07 10:18:54,495 - mlc.test_util.test_context - INFO - Performing DRAM/L2 synchronization 2026-03-07 10:18:55,170 - mlc.test_util.test_context - INFO - Inserting and merging NOPs 2026-03-07 10:18:55,308 - mlc.test_util.test_context - INFO - Setting IQ sync bits 2026-03-07 10:18:55,334 - mlc.test_util.test_context - INFO - Run compression 2026-03-07 10:18:57,481 - mlc.test_util.test_context - INFO - Compression done in 11s. 
Compression ratio: 0.958 2026-03-07 10:18:57,481 - mlc.test_util.test_context - INFO - Re-allocate dram memory 2026-03-07 10:18:57,511 - mlc.test_util.test_context - INFO - Performing DRAM/L2 synchronization 2026-03-07 10:18:57,568 - mlc.compiler.model_graph.l1_based - INFO - Setting tile layouts 2026-03-07 10:18:57,603 - mlc.test_util.test_context - INFO - Generating metrics 2026-03-07 10:18:57,778 - mlc.compiler.model_graph.l1_based - INFO - Allocating memory for IFM/OFM tensors 2026-03-07 10:18:57,789 - mlc.compiler.model_graph.l1_based - INFO - Get model compilation properties done 2026-03-07 10:18:57,827 - mlc.test_util.test_context - INFO - Writing report to MLC file 2026-03-07 10:18:57,827 - mlc.test_util.test_context - INFO - Writing instructions to MLC file 2026-03-07 10:18:57,996 - afe.backends.mla.afe_to_n2a_compiler.n2a_backend_runner - INFO - Start evaluate process to generate check file 2026-03-07 10:18:57,996 - mlc.kernel.layout - INFO - L2 caching mode: L2CachingMode.NONE 2026-03-07 10:18:58,001 - mlc.compiler.model_graph.l1_based - INFO - Generating model code 2026-03-07 10:18:58,036 - mlc.test_util.test_context - INFO - Code generation done 2026-03-07 10:18:58,089 - mlc.test_util.test_context - INFO - Inserting and merging NOPs 2026-03-07 10:18:58,283 - mlc.test_util.test_context - INFO - Setting IQ sync bits 2026-03-07 10:18:58,309 - mlc.test_util.test_context - INFO - Run compression 2026-03-07 10:19:01,811 - mlc.test_util.test_context - INFO - Compression done in 3s. Compression ratio: 0.957 2026-03-07 10:19:01,812 - mlc.test_util.test_context - INFO - Re-allocate dram memory 2026-03-07 10:19:01,837 - mlc.test_util.test_context - INFO - Compression done in 6s. 
Compression ratio: 0.978 2026-03-07 10:19:01,837 - mlc.test_util.test_context - INFO - Re-allocate dram memory 2026-03-07 10:19:01,852 - mlc.test_util.test_context - INFO - Generating metrics 2026-03-07 10:19:01,939 - mlc.test_util.test_context - INFO - Writing report to MLC file 2026-03-07 10:19:01,939 - mlc.test_util.test_context - INFO - Writing instructions to MLC file 2026-03-07 10:19:01,981 - mlc.test_util.test_context - INFO - Generating metrics 2026-03-07 10:19:02,067 - mlc.test_util.test_context - INFO - Writing report to MLC file 2026-03-07 10:19:02,067 - mlc.test_util.test_context - INFO - Writing instructions to MLC file 2026-03-07 10:19:07,014 - afe.backends.mla.afe_to_n2a_compiler.n2a_compiler_operations - INFO - Evaluate model and writing chk file done in 2s. 2026-03-07 10:19:08,338 - afe.backends.mpk.interface - INFO - ============================== 2026-03-07 10:19:08,338 - afe.backends.mpk.interface - INFO - Compilation summary: 2026-03-07 10:19:08,338 - afe.backends.mpk.interface - INFO - ------------------------------ 2026-03-07 10:19:08,338 - afe.backends.mpk.interface - INFO - Desired batch size: 1 2026-03-07 10:19:08,338 - afe.backends.mpk.interface - INFO - Achieved batch size: 1 2026-03-07 10:19:08,338 - afe.backends.mpk.interface - INFO - ------------------------------ 2026-03-07 10:19:08,338 - afe.backends.mpk.interface - INFO - Plugin distribution per backend: 2026-03-07 10:19:08,338 - afe.backends.mpk.interface - INFO - MLA : 1 2026-03-07 10:19:08,338 - afe.backends.mpk.interface - INFO - EV74: 5 2026-03-07 10:19:08,338 - afe.backends.mpk.interface - INFO - A65 : 0 2026-03-07 10:19:08,338 - afe.backends.mpk.interface - INFO - ------------------------------ 2026-03-07 10:19:08,338 - afe.backends.mpk.interface - INFO - Generated files: LFM2.5-1.2B-Instruct_language_n1_layer15_conv_mpk.json, LFM2.5-1.2B-Instruct_language_n1_layer15_conv_stage1_mla.elf, LFM2.5-1.2B-Instruct_language_n1_layer15_conv_stage1_mla_stats.yaml, llima-compile 
2026-03-07 10:19:08,615 - afe.ir.serializer.api - INFO - Loading model from file: CompiledModels/LFM2.5-1.2B-Instruct/sima_files/sdk/LFM2.5-1.2B-Instruct_language_n128_cache_token128.sima 2026-03-07 10:19:08,626 - afe.apis.model - WARNING - A deepcopy is disabled hence model graph will be mutated and no other APIs can be called after the compile step. 2026-03-07 10:19:08,626 - afe.apis.model - INFO - Compiling quantized net "LFM2.5-1.2B-Instruct_language_n128_cache_token128" 2026-03-07 10:19:08,629 - afe.core.compile_networks - INFO - The model is split into 1 segments for MLA and APU 2026-03-07 10:19:08,630 - afe.backends.mla.afe_to_n2a_compiler.n2a_compiler_operations - INFO - Stage 1 of 1: compiling for MLA 2026-03-07 10:19:08,631 - mlc.compiler.model_graph.l1_based - INFO - Get model compilation properties 2026-03-07 10:19:08,631 - mlc.compiler.model_graph.l1_based - INFO - Checking if layers fit into memory without splitting 2026-03-07 10:19:08,631 - mlc.compiler.model_graph.l1_based - INFO - Generating the list of feasible layouts 2026-03-07 10:19:08,631 - mlc.compiler.model_graph.l1_based - INFO - Setting layer parameters 2026-03-07 10:19:11,118 - mlc.compiler.model_graph.l1_based - INFO - Generating model code done 2026-03-07 10:19:11,185 - mlc.test_util.test_context - INFO - Scheduling instructions 2026-03-07 10:19:11,844 - mlc.test_util.test_context - INFO - Code generation done 2026-03-07 10:19:11,851 - afe.backends.mla.afe_to_n2a_compiler.n2a_compiler_operations - INFO - Evaluate model and writing chk file done in 5s. 
2026-03-07 10:19:12,577 - mlc.compiler.model_graph.l1_based - INFO - Setting tile layouts 2026-03-07 10:19:12,808 - mlc.compiler.model_graph.l1_based - INFO - Allocating memory for IFM/OFM tensors 2026-03-07 10:19:12,820 - mlc.compiler.model_graph.l1_based - INFO - Get model compilation properties done 2026-03-07 10:19:13,090 - afe.backends.mla.afe_to_n2a_compiler.n2a_backend_runner - INFO - Start evaluate process to generate check file 2026-03-07 10:19:13,091 - mlc.kernel.layout - INFO - L2 caching mode: L2CachingMode.NONE 2026-03-07 10:19:13,096 - mlc.compiler.model_graph.l1_based - INFO - Generating model code 2026-03-07 10:19:15,223 - afe.backends.mpk.interface - INFO - ============================== 2026-03-07 10:19:15,223 - afe.backends.mpk.interface - INFO - Compilation summary: 2026-03-07 10:19:15,223 - afe.backends.mpk.interface - INFO - ------------------------------ 2026-03-07 10:19:15,223 - afe.backends.mpk.interface - INFO - Desired batch size: 1 2026-03-07 10:19:15,223 - afe.backends.mpk.interface - INFO - Achieved batch size: 1 2026-03-07 10:19:15,223 - afe.backends.mpk.interface - INFO - ------------------------------ 2026-03-07 10:19:15,223 - afe.backends.mpk.interface - INFO - Plugin distribution per backend: 2026-03-07 10:19:15,223 - afe.backends.mpk.interface - INFO - MLA : 1 2026-03-07 10:19:15,223 - afe.backends.mpk.interface - INFO - EV74: 5 2026-03-07 10:19:15,223 - afe.backends.mpk.interface - INFO - A65 : 0 2026-03-07 10:19:15,223 - afe.backends.mpk.interface - INFO - ------------------------------ 2026-03-07 10:19:15,223 - afe.backends.mpk.interface - INFO - Generated files: LFM2.5-1.2B-Instruct_language_n128_layer15_conv_mpk.json, LFM2.5-1.2B-Instruct_language_n128_layer15_conv_stage1_mla.elf, LFM2.5-1.2B-Instruct_language_n128_layer15_conv_stage1_mla_stats.yaml, llima-compile 2026-03-07 10:19:15,495 - afe.ir.serializer.api - INFO - Loading model from file: 
CompiledModels/LFM2.5-1.2B-Instruct/sima_files/sdk/LFM2.5-1.2B-Instruct_language_n128_cache_token256.sima 2026-03-07 10:19:15,505 - afe.apis.model - WARNING - A deepcopy is disabled hence model graph will be mutated and no other APIs can be called after the compile step. 2026-03-07 10:19:15,505 - afe.apis.model - INFO - Compiling quantized net "LFM2.5-1.2B-Instruct_language_n128_cache_token256" 2026-03-07 10:19:15,508 - afe.core.compile_networks - INFO - The model is split into 1 segments for MLA and APU 2026-03-07 10:19:15,508 - afe.backends.mla.afe_to_n2a_compiler.n2a_compiler_operations - INFO - Stage 1 of 1: compiling for MLA 2026-03-07 10:19:15,509 - mlc.compiler.model_graph.l1_based - INFO - Get model compilation properties 2026-03-07 10:19:15,510 - mlc.compiler.model_graph.l1_based - INFO - Checking if layers fit into memory without splitting 2026-03-07 10:19:15,510 - mlc.compiler.model_graph.l1_based - INFO - Generating the list of feasible layouts 2026-03-07 10:19:15,510 - mlc.compiler.model_graph.l1_based - INFO - Setting layer parameters 2026-03-07 10:19:16,799 - mlc.test_util.test_context - INFO - Performing DRAM/L2 synchronization 2026-03-07 10:19:17,394 - mlc.test_util.test_context - INFO - Inserting and merging NOPs 2026-03-07 10:19:17,647 - mlc.test_util.test_context - INFO - Setting IQ sync bits 2026-03-07 10:19:17,663 - mlc.test_util.test_context - INFO - Code generation done 2026-03-07 10:19:17,664 - mlc.test_util.test_context - INFO - Run compression 2026-03-07 10:19:17,675 - afe.backends.mla.afe_to_n2a_compiler.n2a_compiler_operations - INFO - Evaluate model and writing chk file done in 1s. 2026-03-07 10:19:17,680 - mlc.test_util.test_context - INFO - Compression done in 0s. 
Compression ratio: 0.555 2026-03-07 10:19:17,681 - mlc.test_util.test_context - INFO - Re-allocate dram memory 2026-03-07 10:19:17,683 - mlc.test_util.test_context - INFO - Generating metrics 2026-03-07 10:19:17,741 - mlc.test_util.test_context - INFO - Writing report to MLC file 2026-03-07 10:19:17,741 - mlc.test_util.test_context - INFO - Writing instructions to MLC file 2026-03-07 10:19:18,915 - mlc.test_util.test_context - INFO - Code generation done 2026-03-07 10:19:18,917 - afe.backends.mla.afe_to_n2a_compiler.n2a_compiler_operations - INFO - Evaluate model and writing chk file done in 0s. 2026-03-07 10:19:18,988 - mlc.compiler.model_graph.l1_based - INFO - Trigger large tensor support due to: MLA_0/tuple_einsum_1, 522 2026-03-07 10:19:18,988 - mlc.compiler.model_graph.l1_based - INFO - Segmenting layers by spatial/channel dimension 2026-03-07 10:19:19,193 - mlc.test_util.test_context - INFO - Code generation done 2026-03-07 10:19:19,193 - afe.backends.mla.afe_to_n2a_compiler.n2a_compiler_operations - INFO - Evaluate model and writing chk file done in 12s. 
2026-03-07 10:19:20,958 - afe.backends.mpk.interface - INFO - ============================== 2026-03-07 10:19:20,958 - afe.backends.mpk.interface - INFO - Compilation summary: 2026-03-07 10:19:20,958 - afe.backends.mpk.interface - INFO - ------------------------------ 2026-03-07 10:19:20,958 - afe.backends.mpk.interface - INFO - Desired batch size: 1 2026-03-07 10:19:20,958 - afe.backends.mpk.interface - INFO - Achieved batch size: 1 2026-03-07 10:19:20,958 - afe.backends.mpk.interface - INFO - ------------------------------ 2026-03-07 10:19:20,959 - afe.backends.mpk.interface - INFO - Plugin distribution per backend: 2026-03-07 10:19:20,959 - afe.backends.mpk.interface - INFO - MLA : 1 2026-03-07 10:19:20,959 - afe.backends.mpk.interface - INFO - EV74: 4 2026-03-07 10:19:20,959 - afe.backends.mpk.interface - INFO - A65 : 0 2026-03-07 10:19:20,959 - afe.backends.mpk.interface - INFO - ------------------------------ 2026-03-07 10:19:20,959 - afe.backends.mpk.interface - INFO - Generated files: LFM2.5-1.2B-Instruct_language_n128_cache_token0_mpk.json, LFM2.5-1.2B-Instruct_language_n128_cache_token0_stage1_mla.elf, LFM2.5-1.2B-Instruct_language_n128_cache_token0_stage1_mla_stats.yaml, llima-compile 2026-03-07 10:19:21,109 - afe.backends.mpk.interface - INFO - ============================== 2026-03-07 10:19:21,109 - afe.backends.mpk.interface - INFO - Compilation summary: 2026-03-07 10:19:21,109 - afe.backends.mpk.interface - INFO - ------------------------------ 2026-03-07 10:19:21,109 - afe.backends.mpk.interface - INFO - Desired batch size: 1 2026-03-07 10:19:21,109 - afe.backends.mpk.interface - INFO - Achieved batch size: 1 2026-03-07 10:19:21,109 - afe.backends.mpk.interface - INFO - ------------------------------ 2026-03-07 10:19:21,109 - afe.backends.mpk.interface - INFO - Plugin distribution per backend: 2026-03-07 10:19:21,109 - afe.backends.mpk.interface - INFO - MLA : 1 2026-03-07 10:19:21,109 - afe.backends.mpk.interface - INFO - EV74: 3 2026-03-07 
10:19:21,109 - afe.backends.mpk.interface - INFO - A65 : 0 2026-03-07 10:19:21,109 - afe.backends.mpk.interface - INFO - ------------------------------ 2026-03-07 10:19:21,109 - afe.backends.mpk.interface - INFO - Generated files: LFM2.5-1.2B-Instruct_language_n1_post_layer14_stage1_mla.elf, LFM2.5-1.2B-Instruct_language_n1_post_layer14_stage1_mla_stats.yaml, LFM2.5-1.2B-Instruct_language_n1_post_layer14_mpk.json, llima-compile 2026-03-07 10:19:21,229 - afe.ir.serializer.api - INFO - Loading model from file: CompiledModels/LFM2.5-1.2B-Instruct/sima_files/sdk/LFM2.5-1.2B-Instruct_language_n128_cache_token384.sima 2026-03-07 10:19:21,240 - afe.apis.model - WARNING - A deepcopy is disabled hence model graph will be mutated and no other APIs can be called after the compile step. 2026-03-07 10:19:21,240 - afe.apis.model - INFO - Compiling quantized net "LFM2.5-1.2B-Instruct_language_n128_cache_token384" 2026-03-07 10:19:21,242 - afe.core.compile_networks - INFO - The model is split into 1 segments for MLA and APU 2026-03-07 10:19:21,242 - afe.backends.mla.afe_to_n2a_compiler.n2a_compiler_operations - INFO - Stage 1 of 1: compiling for MLA 2026-03-07 10:19:21,243 - mlc.compiler.model_graph.l1_based - INFO - Get model compilation properties 2026-03-07 10:19:21,243 - mlc.compiler.model_graph.l1_based - INFO - Checking if layers fit into memory without splitting 2026-03-07 10:19:21,243 - mlc.compiler.model_graph.l1_based - INFO - Generating the list of feasible layouts 2026-03-07 10:19:21,243 - mlc.compiler.model_graph.l1_based - INFO - Setting layer parameters 2026-03-07 10:19:21,383 - afe.ir.serializer.api - INFO - Loading model from file: CompiledModels/LFM2.5-1.2B-Instruct/sima_files/sdk/LFM2.5-1.2B-Instruct_language_n128_cache_token512.sima 2026-03-07 10:19:21,392 - afe.apis.model - WARNING - A deepcopy is disabled hence model graph will be mutated and no other APIs can be called after the compile step. 
2026-03-07 10:19:21,393 - afe.apis.model - INFO - Compiling quantized net "LFM2.5-1.2B-Instruct_language_n128_cache_token512" 2026-03-07 10:19:21,396 - afe.core.compile_networks - INFO - The model is split into 1 segments for MLA and APU 2026-03-07 10:19:21,396 - afe.backends.mla.afe_to_n2a_compiler.n2a_compiler_operations - INFO - Stage 1 of 1: compiling for MLA 2026-03-07 10:19:21,397 - mlc.compiler.model_graph.l1_based - INFO - Get model compilation properties 2026-03-07 10:19:21,397 - mlc.compiler.model_graph.l1_based - INFO - Checking if layers fit into memory without splitting 2026-03-07 10:19:21,397 - mlc.compiler.model_graph.l1_based - INFO - Generating the list of feasible layouts 2026-03-07 10:19:21,397 - mlc.compiler.model_graph.l1_based - INFO - Setting layer parameters 2026-03-07 10:19:24,406 - mlc.compiler.model_graph.l1_based - INFO - Trigger large tensor support due to: MLA_0/tuple_einsum_1, 415 2026-03-07 10:19:24,406 - mlc.compiler.model_graph.l1_based - INFO - Segmenting layers by spatial/channel dimension 2026-03-07 10:19:24,773 - mlc.compiler.model_graph.l1_based - INFO - Trigger large tensor support due to: MLA_0/tuple_einsum_1, 397 2026-03-07 10:19:24,773 - mlc.compiler.model_graph.l1_based - INFO - Segmenting layers by spatial/channel dimension 2026-03-07 10:19:25,525 - mlc.compiler.model_graph.large_tensor_helper - INFO - Setting tile layouts 2026-03-07 10:19:25,721 - mlc.compiler.model_graph.large_tensor_helper - INFO - Allocating memory for IFM/OFM tensors 2026-03-07 10:19:25,737 - mlc.compiler.model_graph.l1_based - INFO - Get model compilation properties done 2026-03-07 10:19:26,092 - afe.backends.mla.afe_to_n2a_compiler.n2a_backend_runner - INFO - Start evaluate process to generate check file 2026-03-07 10:19:26,093 - mlc.kernel.layout - INFO - L2 caching mode: L2CachingMode.NONE 2026-03-07 10:19:26,100 - mlc.compiler.model_graph.l1_based - INFO - Generating model code 2026-03-07 10:19:28,622 - mlc.test_util.test_context - INFO - Code 
generation done 2026-03-07 10:19:28,622 - afe.backends.mla.afe_to_n2a_compiler.n2a_compiler_operations - INFO - Evaluate model and writing chk file done in 8s. 2026-03-07 10:19:28,885 - afe.backends.mpk.interface - INFO - ============================== 2026-03-07 10:19:28,886 - afe.backends.mpk.interface - INFO - Compilation summary: 2026-03-07 10:19:28,886 - afe.backends.mpk.interface - INFO - ------------------------------ 2026-03-07 10:19:28,886 - afe.backends.mpk.interface - INFO - Desired batch size: 1 2026-03-07 10:19:28,886 - afe.backends.mpk.interface - INFO - Achieved batch size: 1 2026-03-07 10:19:28,886 - afe.backends.mpk.interface - INFO - ------------------------------ 2026-03-07 10:19:28,886 - afe.backends.mpk.interface - INFO - Plugin distribution per backend: 2026-03-07 10:19:28,886 - afe.backends.mpk.interface - INFO - MLA : 1 2026-03-07 10:19:28,886 - afe.backends.mpk.interface - INFO - EV74: 5 2026-03-07 10:19:28,886 - afe.backends.mpk.interface - INFO - A65 : 0 2026-03-07 10:19:28,886 - afe.backends.mpk.interface - INFO - ------------------------------ 2026-03-07 10:19:28,886 - afe.backends.mpk.interface - INFO - Generated files: LFM2.5-1.2B-Instruct_language_n128_layer13_conv_stage1_mla.elf, LFM2.5-1.2B-Instruct_language_n128_layer13_conv_mpk.json, LFM2.5-1.2B-Instruct_language_n128_layer13_conv_stage1_mla_stats.yaml, llima-compile 2026-03-07 10:19:29,158 - afe.ir.serializer.api - INFO - Loading model from file: CompiledModels/LFM2.5-1.2B-Instruct/sima_files/sdk/LFM2.5-1.2B-Instruct_language_n128_cache_token640.sima 2026-03-07 10:19:29,169 - afe.apis.model - WARNING - A deepcopy is disabled hence model graph will be mutated and no other APIs can be called after the compile step. 
2026-03-07 10:19:29,169 - afe.apis.model - INFO - Compiling quantized net "LFM2.5-1.2B-Instruct_language_n128_cache_token640" 2026-03-07 10:19:29,173 - afe.core.compile_networks - INFO - The model is split into 1 segments for MLA and APU 2026-03-07 10:19:29,173 - afe.backends.mla.afe_to_n2a_compiler.n2a_compiler_operations - INFO - Stage 1 of 1: compiling for MLA 2026-03-07 10:19:29,174 - mlc.compiler.model_graph.l1_based - INFO - Get model compilation properties 2026-03-07 10:19:29,174 - mlc.compiler.model_graph.l1_based - INFO - Checking if layers fit into memory without splitting 2026-03-07 10:19:29,174 - mlc.compiler.model_graph.l1_based - INFO - Generating the list of feasible layouts 2026-03-07 10:19:29,174 - mlc.compiler.model_graph.l1_based - INFO - Setting layer parameters 2026-03-07 10:19:31,280 - mlc.compiler.model_graph.large_tensor_helper - INFO - Setting tile layouts 2026-03-07 10:19:31,410 - mlc.compiler.model_graph.large_tensor_helper - INFO - Allocating memory for IFM/OFM tensors 2026-03-07 10:19:31,427 - mlc.compiler.model_graph.l1_based - INFO - Get model compilation properties done 2026-03-07 10:19:31,852 - afe.backends.mla.afe_to_n2a_compiler.n2a_backend_runner - INFO - Start evaluate process to generate check file 2026-03-07 10:19:31,853 - mlc.kernel.layout - INFO - L2 caching mode: L2CachingMode.NONE 2026-03-07 10:19:31,860 - mlc.compiler.model_graph.l1_based - INFO - Generating model code 2026-03-07 10:19:32,094 - mlc.compiler.model_graph.l1_based - INFO - Generating model code done 2026-03-07 10:19:32,199 - mlc.test_util.test_context - INFO - Scheduling instructions 2026-03-07 10:19:32,667 - mlc.compiler.model_graph.l1_based - INFO - Trigger large tensor support due to: MLA_0/tuple_einsum_1, 309 2026-03-07 10:19:32,668 - mlc.compiler.model_graph.l1_based - INFO - Segmenting layers by spatial/channel dimension 2026-03-07 10:19:33,289 - mlc.compiler.model_graph.large_tensor_helper - INFO - Setting tile layouts 2026-03-07 10:19:33,420 - 
mlc.compiler.model_graph.large_tensor_helper - INFO - Allocating memory for IFM/OFM tensors 2026-03-07 10:19:33,441 - mlc.compiler.model_graph.l1_based - INFO - Get model compilation properties done 2026-03-07 10:19:33,931 - afe.backends.mla.afe_to_n2a_compiler.n2a_backend_runner - INFO - Start evaluate process to generate check file 2026-03-07 10:19:33,932 - mlc.kernel.layout - INFO - L2 caching mode: L2CachingMode.NONE 2026-03-07 10:19:33,941 - mlc.compiler.model_graph.l1_based - INFO - Generating model code 2026-03-07 10:19:36,749 - afe.backends.mpk.interface - INFO - ============================== 2026-03-07 10:19:36,750 - afe.backends.mpk.interface - INFO - Compilation summary: 2026-03-07 10:19:36,750 - afe.backends.mpk.interface - INFO - ------------------------------ 2026-03-07 10:19:36,750 - afe.backends.mpk.interface - INFO - Desired batch size: 1 2026-03-07 10:19:36,750 - afe.backends.mpk.interface - INFO - Achieved batch size: 1 2026-03-07 10:19:36,750 - afe.backends.mpk.interface - INFO - ------------------------------ 2026-03-07 10:19:36,750 - afe.backends.mpk.interface - INFO - Plugin distribution per backend: 2026-03-07 10:19:36,750 - afe.backends.mpk.interface - INFO - MLA : 1 2026-03-07 10:19:36,750 - afe.backends.mpk.interface - INFO - EV74: 3 2026-03-07 10:19:36,750 - afe.backends.mpk.interface - INFO - A65 : 0 2026-03-07 10:19:36,750 - afe.backends.mpk.interface - INFO - ------------------------------ 2026-03-07 10:19:36,750 - afe.backends.mpk.interface - INFO - Generated files: LFM2.5-1.2B-Instruct_language_n128_post_layer14_stage1_mla_stats.yaml, LFM2.5-1.2B-Instruct_language_n128_post_layer14_mpk.json, LFM2.5-1.2B-Instruct_language_n128_post_layer14_stage1_mla.elf, llima-compile 2026-03-07 10:19:37,020 - afe.ir.serializer.api - INFO - Loading model from file: CompiledModels/LFM2.5-1.2B-Instruct/sima_files/sdk/LFM2.5-1.2B-Instruct_language_n128_cache_token768.sima 2026-03-07 10:19:37,042 - afe.apis.model - WARNING - A deepcopy is disabled 
hence model graph will be mutated and no other APIs can be called after the compile step. 2026-03-07 10:19:37,042 - afe.apis.model - INFO - Compiling quantized net "LFM2.5-1.2B-Instruct_language_n128_cache_token768" 2026-03-07 10:19:37,049 - afe.core.compile_networks - INFO - The model is split into 1 segments for MLA and APU 2026-03-07 10:19:37,049 - afe.backends.mla.afe_to_n2a_compiler.n2a_compiler_operations - INFO - Stage 1 of 1: compiling for MLA 2026-03-07 10:19:37,050 - mlc.compiler.model_graph.l1_based - INFO - Get model compilation properties 2026-03-07 10:19:37,050 - mlc.compiler.model_graph.l1_based - INFO - Checking if layers fit into memory without splitting 2026-03-07 10:19:37,050 - mlc.compiler.model_graph.l1_based - INFO - Generating the list of feasible layouts 2026-03-07 10:19:37,050 - mlc.compiler.model_graph.l1_based - INFO - Setting layer parameters 2026-03-07 10:19:40,510 - mlc.compiler.model_graph.l1_based - INFO - Trigger large tensor support due to: MLA_0/tuple_einsum_1, 344 2026-03-07 10:19:40,511 - mlc.compiler.model_graph.l1_based - INFO - Segmenting layers by spatial/channel dimension 2026-03-07 10:19:41,914 - mlc.test_util.test_context - INFO - Performing DRAM/L2 synchronization 2026-03-07 10:19:42,635 - mlc.test_util.test_context - INFO - Inserting and merging NOPs 2026-03-07 10:19:43,061 - mlc.test_util.test_context - INFO - Setting IQ sync bits 2026-03-07 10:19:43,098 - mlc.test_util.test_context - INFO - Run compression 2026-03-07 10:19:43,118 - mlc.test_util.test_context - INFO - Compression done in 0s. 
Compression ratio: 0.391 2026-03-07 10:19:43,118 - mlc.test_util.test_context - INFO - Re-allocate dram memory 2026-03-07 10:19:43,124 - mlc.test_util.test_context - INFO - Generating metrics 2026-03-07 10:19:43,240 - mlc.test_util.test_context - INFO - Writing report to MLC file 2026-03-07 10:19:43,240 - mlc.test_util.test_context - INFO - Writing instructions to MLC file 2026-03-07 10:19:43,625 - mlc.compiler.model_graph.large_tensor_helper - INFO - Setting tile layouts 2026-03-07 10:19:43,915 - mlc.compiler.model_graph.large_tensor_helper - INFO - Allocating memory for IFM/OFM tensors 2026-03-07 10:19:43,944 - mlc.compiler.model_graph.l1_based - INFO - Get model compilation properties done 2026-03-07 10:19:44,504 - afe.backends.mla.afe_to_n2a_compiler.n2a_backend_runner - INFO - Start evaluate process to generate check file 2026-03-07 10:19:44,505 - mlc.kernel.layout - INFO - L2 caching mode: L2CachingMode.NONE 2026-03-07 10:19:44,515 - mlc.compiler.model_graph.l1_based - INFO - Generating model code 2026-03-07 10:19:45,644 - mlc.test_util.test_context - INFO - Code generation done 2026-03-07 10:19:45,646 - afe.backends.mla.afe_to_n2a_compiler.n2a_compiler_operations - INFO - Evaluate model and writing chk file done in 1s. 
2026-03-07 10:19:48,470 - mlc.compiler.model_graph.l1_based - INFO - Generating model code done 2026-03-07 10:19:48,582 - mlc.test_util.test_context - INFO - Scheduling instructions 2026-03-07 10:19:49,702 - afe.backends.mpk.interface - INFO - ============================== 2026-03-07 10:19:49,702 - afe.backends.mpk.interface - INFO - Compilation summary: 2026-03-07 10:19:49,702 - afe.backends.mpk.interface - INFO - ------------------------------ 2026-03-07 10:19:49,702 - afe.backends.mpk.interface - INFO - Desired batch size: 1 2026-03-07 10:19:49,702 - afe.backends.mpk.interface - INFO - Achieved batch size: 1 2026-03-07 10:19:49,702 - afe.backends.mpk.interface - INFO - ------------------------------ 2026-03-07 10:19:49,702 - afe.backends.mpk.interface - INFO - Plugin distribution per backend: 2026-03-07 10:19:49,702 - afe.backends.mpk.interface - INFO - MLA : 1 2026-03-07 10:19:49,702 - afe.backends.mpk.interface - INFO - EV74: 4 2026-03-07 10:19:49,702 - afe.backends.mpk.interface - INFO - A65 : 0 2026-03-07 10:19:49,702 - afe.backends.mpk.interface - INFO - ------------------------------ 2026-03-07 10:19:49,702 - afe.backends.mpk.interface - INFO - Generated files: LFM2.5-1.2B-Instruct_language_n128_cache_token128_stage1_mla_stats.yaml, LFM2.5-1.2B-Instruct_language_n128_cache_token128_stage1_mla.elf, LFM2.5-1.2B-Instruct_language_n128_cache_token128_mpk.json, llima-compile 2026-03-07 10:19:49,975 - afe.ir.serializer.api - INFO - Loading model from file: CompiledModels/LFM2.5-1.2B-Instruct/sima_files/sdk/LFM2.5-1.2B-Instruct_language_n128_cache_token896.sima 2026-03-07 10:19:49,986 - afe.apis.model - WARNING - A deepcopy is disabled hence model graph will be mutated and no other APIs can be called after the compile step. 
2026-03-07 10:19:49,986 - afe.apis.model - INFO - Compiling quantized net "LFM2.5-1.2B-Instruct_language_n128_cache_token896" 2026-03-07 10:19:49,988 - afe.core.compile_networks - INFO - The model is split into 1 segments for MLA and APU 2026-03-07 10:19:49,988 - afe.backends.mla.afe_to_n2a_compiler.n2a_compiler_operations - INFO - Stage 1 of 1: compiling for MLA 2026-03-07 10:19:49,989 - mlc.compiler.model_graph.l1_based - INFO - Get model compilation properties 2026-03-07 10:19:49,989 - mlc.compiler.model_graph.l1_based - INFO - Checking if layers fit into memory without splitting 2026-03-07 10:19:49,989 - mlc.compiler.model_graph.l1_based - INFO - Generating the list of feasible layouts 2026-03-07 10:19:49,989 - mlc.compiler.model_graph.l1_based - INFO - Setting layer parameters 2026-03-07 10:19:50,998 - mlc.compiler.model_graph.large_tensor_helper - INFO - Setting tile layouts 2026-03-07 10:19:51,240 - mlc.compiler.model_graph.large_tensor_helper - INFO - Allocating memory for IFM/OFM tensors 2026-03-07 10:19:51,269 - mlc.compiler.model_graph.l1_based - INFO - Get model compilation properties done 2026-03-07 10:19:51,892 - afe.backends.mla.afe_to_n2a_compiler.n2a_backend_runner - INFO - Start evaluate process to generate check file 2026-03-07 10:19:51,893 - mlc.kernel.layout - INFO - L2 caching mode: L2CachingMode.NONE 2026-03-07 10:19:51,904 - mlc.compiler.model_graph.l1_based - INFO - Generating model code 2026-03-07 10:19:53,089 - mlc.compiler.model_graph.l1_based - INFO - Trigger large tensor support due to: MLA_0/tuple_einsum_1, 304 2026-03-07 10:19:53,089 - mlc.compiler.model_graph.l1_based - INFO - Segmenting layers by spatial/channel dimension 2026-03-07 10:19:56,806 - mlc.compiler.model_graph.l1_based - INFO - Generating model code done 2026-03-07 10:19:56,945 - mlc.test_util.test_context - INFO - Scheduling instructions 2026-03-07 10:19:58,287 - mlc.test_util.test_context - INFO - Performing DRAM/L2 synchronization 2026-03-07 10:19:58,835 - 
mlc.compiler.model_graph.l1_based - INFO - Generating model code done 2026-03-07 10:19:58,985 - mlc.test_util.test_context - INFO - Scheduling instructions 2026-03-07 10:19:59,060 - mlc.test_util.test_context - INFO - Inserting and merging NOPs 2026-03-07 10:19:59,502 - mlc.test_util.test_context - INFO - Setting IQ sync bits 2026-03-07 10:19:59,541 - mlc.test_util.test_context - INFO - Run compression 2026-03-07 10:19:59,564 - mlc.test_util.test_context - INFO - Compression done in 0s. Compression ratio: 0.318 2026-03-07 10:19:59,564 - mlc.test_util.test_context - INFO - Re-allocate dram memory 2026-03-07 10:19:59,570 - mlc.test_util.test_context - INFO - Generating metrics 2026-03-07 10:19:59,692 - mlc.test_util.test_context - INFO - Writing report to MLC file 2026-03-07 10:19:59,692 - mlc.test_util.test_context - INFO - Writing instructions to MLC file 2026-03-07 10:20:02,252 - mlc.test_util.test_context - INFO - Code generation done 2026-03-07 10:20:02,253 - afe.backends.mla.afe_to_n2a_compiler.n2a_compiler_operations - INFO - Evaluate model and writing chk file done in 2s. 
2026-03-07 10:20:03,206 - mlc.compiler.model_graph.large_tensor_helper - INFO - Setting tile layouts 2026-03-07 10:20:03,384 - mlc.compiler.model_graph.large_tensor_helper - INFO - Allocating memory for IFM/OFM tensors 2026-03-07 10:20:03,415 - mlc.compiler.model_graph.l1_based - INFO - Get model compilation properties done 2026-03-07 10:20:04,106 - afe.backends.mla.afe_to_n2a_compiler.n2a_backend_runner - INFO - Start evaluate process to generate check file 2026-03-07 10:20:04,107 - mlc.kernel.layout - INFO - L2 caching mode: L2CachingMode.NONE 2026-03-07 10:20:04,129 - mlc.compiler.model_graph.l1_based - INFO - Generating model code 2026-03-07 10:20:06,722 - afe.backends.mpk.interface - INFO - ============================== 2026-03-07 10:20:06,722 - afe.backends.mpk.interface - INFO - Compilation summary: 2026-03-07 10:20:06,722 - afe.backends.mpk.interface - INFO - ------------------------------ 2026-03-07 10:20:06,722 - afe.backends.mpk.interface - INFO - Desired batch size: 1 2026-03-07 10:20:06,722 - afe.backends.mpk.interface - INFO - Achieved batch size: 1 2026-03-07 10:20:06,722 - afe.backends.mpk.interface - INFO - ------------------------------ 2026-03-07 10:20:06,722 - afe.backends.mpk.interface - INFO - Plugin distribution per backend: 2026-03-07 10:20:06,722 - afe.backends.mpk.interface - INFO - MLA : 1 2026-03-07 10:20:06,722 - afe.backends.mpk.interface - INFO - EV74: 4 2026-03-07 10:20:06,722 - afe.backends.mpk.interface - INFO - A65 : 0 2026-03-07 10:20:06,722 - afe.backends.mpk.interface - INFO - ------------------------------ 2026-03-07 10:20:06,722 - afe.backends.mpk.interface - INFO - Generated files: LFM2.5-1.2B-Instruct_language_n128_cache_token256_stage1_mla_stats.yaml, LFM2.5-1.2B-Instruct_language_n128_cache_token256_mpk.json, LFM2.5-1.2B-Instruct_language_n128_cache_token256_stage1_mla.elf, llima-compile 2026-03-07 10:20:06,994 - afe.ir.serializer.api - INFO - Loading model from file: 
CompiledModels/LFM2.5-1.2B-Instruct/sima_files/sdk/LFM2.5-1.2B-Instruct_language_n128_cache_token1024.sima 2026-03-07 10:20:07,004 - afe.apis.model - WARNING - A deepcopy is disabled hence model graph will be mutated and no other APIs can be called after the compile step. 2026-03-07 10:20:07,004 - afe.apis.model - INFO - Compiling quantized net "LFM2.5-1.2B-Instruct_language_n128_cache_token1024" 2026-03-07 10:20:07,006 - afe.core.compile_networks - INFO - The model is split into 1 segments for MLA and APU 2026-03-07 10:20:07,006 - afe.backends.mla.afe_to_n2a_compiler.n2a_compiler_operations - INFO - Stage 1 of 1: compiling for MLA 2026-03-07 10:20:07,007 - mlc.compiler.model_graph.l1_based - INFO - Get model compilation properties 2026-03-07 10:20:07,007 - mlc.compiler.model_graph.l1_based - INFO - Checking if layers fit into memory without splitting 2026-03-07 10:20:07,007 - mlc.compiler.model_graph.l1_based - INFO - Generating the list of feasible layouts 2026-03-07 10:20:07,007 - mlc.compiler.model_graph.l1_based - INFO - Setting layer parameters 2026-03-07 10:20:09,418 - mlc.test_util.test_context - INFO - Performing DRAM/L2 synchronization 2026-03-07 10:20:10,006 - mlc.compiler.model_graph.l1_based - INFO - Trigger large tensor support due to: MLA_0/tuple_einsum_1, 560 2026-03-07 10:20:10,006 - mlc.compiler.model_graph.l1_based - INFO - Segmenting layers by spatial/channel dimension 2026-03-07 10:20:10,211 - mlc.test_util.test_context - INFO - Inserting and merging NOPs 2026-03-07 10:20:10,766 - mlc.test_util.test_context - INFO - Setting IQ sync bits 2026-03-07 10:20:10,812 - mlc.test_util.test_context - INFO - Run compression 2026-03-07 10:20:10,839 - mlc.test_util.test_context - INFO - Compression done in 0s. 
Compression ratio: 0.292 2026-03-07 10:20:10,839 - mlc.test_util.test_context - INFO - Re-allocate dram memory 2026-03-07 10:20:10,846 - mlc.test_util.test_context - INFO - Generating metrics 2026-03-07 10:20:10,999 - mlc.test_util.test_context - INFO - Writing report to MLC file 2026-03-07 10:20:10,999 - mlc.test_util.test_context - INFO - Writing instructions to MLC file 2026-03-07 10:20:11,211 - mlc.test_util.test_context - INFO - Performing DRAM/L2 synchronization 2026-03-07 10:20:12,048 - mlc.test_util.test_context - INFO - Inserting and merging NOPs 2026-03-07 10:20:12,663 - mlc.test_util.test_context - INFO - Setting IQ sync bits 2026-03-07 10:20:12,715 - mlc.test_util.test_context - INFO - Run compression 2026-03-07 10:20:12,746 - mlc.test_util.test_context - INFO - Compression done in 0s. Compression ratio: 0.27 2026-03-07 10:20:12,746 - mlc.test_util.test_context - INFO - Re-allocate dram memory 2026-03-07 10:20:12,752 - mlc.test_util.test_context - INFO - Generating metrics 2026-03-07 10:20:12,910 - mlc.test_util.test_context - INFO - Writing report to MLC file 2026-03-07 10:20:12,910 - mlc.test_util.test_context - INFO - Writing instructions to MLC file 2026-03-07 10:20:13,005 - mlc.compiler.model_graph.l1_based - INFO - Generating model code done 2026-03-07 10:20:13,177 - mlc.test_util.test_context - INFO - Scheduling instructions 2026-03-07 10:20:14,024 - mlc.test_util.test_context - INFO - Code generation done 2026-03-07 10:20:14,026 - afe.backends.mla.afe_to_n2a_compiler.n2a_compiler_operations - INFO - Evaluate model and writing chk file done in 3s. 2026-03-07 10:20:16,995 - mlc.test_util.test_context - INFO - Code generation done 2026-03-07 10:20:16,996 - afe.backends.mla.afe_to_n2a_compiler.n2a_compiler_operations - INFO - Evaluate model and writing chk file done in 3s. 
2026-03-07 10:20:19,343 - afe.backends.mpk.interface - INFO - ============================== 2026-03-07 10:20:19,343 - afe.backends.mpk.interface - INFO - Compilation summary: 2026-03-07 10:20:19,343 - afe.backends.mpk.interface - INFO - ------------------------------ 2026-03-07 10:20:19,343 - afe.backends.mpk.interface - INFO - Desired batch size: 1 2026-03-07 10:20:19,343 - afe.backends.mpk.interface - INFO - Achieved batch size: 1 2026-03-07 10:20:19,343 - afe.backends.mpk.interface - INFO - ------------------------------ 2026-03-07 10:20:19,343 - afe.backends.mpk.interface - INFO - Plugin distribution per backend: 2026-03-07 10:20:19,343 - afe.backends.mpk.interface - INFO - MLA : 1 2026-03-07 10:20:19,343 - afe.backends.mpk.interface - INFO - EV74: 4 2026-03-07 10:20:19,343 - afe.backends.mpk.interface - INFO - A65 : 0 2026-03-07 10:20:19,343 - afe.backends.mpk.interface - INFO - ------------------------------ 2026-03-07 10:20:19,343 - afe.backends.mpk.interface - INFO - Generated files: LFM2.5-1.2B-Instruct_language_n128_cache_token384_stage1_mla_stats.yaml, LFM2.5-1.2B-Instruct_language_n128_cache_token384_mpk.json, LFM2.5-1.2B-Instruct_language_n128_cache_token384_stage1_mla.elf, llima-compile 2026-03-07 10:20:19,615 - afe.ir.serializer.api - INFO - Loading model from file: CompiledModels/LFM2.5-1.2B-Instruct/sima_files/sdk/LFM2.5-1.2B-Instruct_language_n128_cache_token1152.sima 2026-03-07 10:20:19,625 - afe.apis.model - WARNING - A deepcopy is disabled hence model graph will be mutated and no other APIs can be called after the compile step. 
2026-03-07 10:20:19,625 - afe.apis.model - INFO - Compiling quantized net "LFM2.5-1.2B-Instruct_language_n128_cache_token1152" 2026-03-07 10:20:19,627 - afe.core.compile_networks - INFO - The model is split into 1 segments for MLA and APU 2026-03-07 10:20:19,627 - afe.backends.mla.afe_to_n2a_compiler.n2a_compiler_operations - INFO - Stage 1 of 1: compiling for MLA 2026-03-07 10:20:19,628 - mlc.compiler.model_graph.l1_based - INFO - Get model compilation properties 2026-03-07 10:20:19,628 - mlc.compiler.model_graph.l1_based - INFO - Checking if layers fit into memory without splitting 2026-03-07 10:20:19,628 - mlc.compiler.model_graph.l1_based - INFO - Generating the list of feasible layouts 2026-03-07 10:20:19,628 - mlc.compiler.model_graph.l1_based - INFO - Setting layer parameters 2026-03-07 10:20:19,694 - mlc.compiler.model_graph.l1_based - INFO - Generating model code done 2026-03-07 10:20:19,860 - mlc.test_util.test_context - INFO - Scheduling instructions 2026-03-07 10:20:22,306 - mlc.compiler.model_graph.l1_based - INFO - Trigger large tensor support due to: MLA_0/tuple_einsum_1, 435 2026-03-07 10:20:22,307 - mlc.compiler.model_graph.l1_based - INFO - Segmenting layers by spatial/channel dimension 2026-03-07 10:20:22,850 - afe.backends.mpk.interface - INFO - ============================== 2026-03-07 10:20:22,851 - afe.backends.mpk.interface - INFO - Compilation summary: 2026-03-07 10:20:22,851 - afe.backends.mpk.interface - INFO - ------------------------------ 2026-03-07 10:20:22,851 - afe.backends.mpk.interface - INFO - Desired batch size: 1 2026-03-07 10:20:22,851 - afe.backends.mpk.interface - INFO - Achieved batch size: 1 2026-03-07 10:20:22,851 - afe.backends.mpk.interface - INFO - ------------------------------ 2026-03-07 10:20:22,851 - afe.backends.mpk.interface - INFO - Plugin distribution per backend: 2026-03-07 10:20:22,851 - afe.backends.mpk.interface - INFO - MLA : 1 2026-03-07 10:20:22,851 - afe.backends.mpk.interface - INFO - EV74: 4 
2026-03-07 10:20:22,851 - afe.backends.mpk.interface - INFO - A65 : 0 2026-03-07 10:20:22,851 - afe.backends.mpk.interface - INFO - ------------------------------ 2026-03-07 10:20:22,851 - afe.backends.mpk.interface - INFO - Generated files: LFM2.5-1.2B-Instruct_language_n128_cache_token512_mpk.json, LFM2.5-1.2B-Instruct_language_n128_cache_token512_stage1_mla.elf, LFM2.5-1.2B-Instruct_language_n128_cache_token512_stage1_mla_stats.yaml, llima-compile 2026-03-07 10:20:23,123 - afe.ir.serializer.api - INFO - Loading model from file: CompiledModels/LFM2.5-1.2B-Instruct/sima_files/sdk/LFM2.5-1.2B-Instruct_language_n128_cache_token1280.sima 2026-03-07 10:20:23,133 - afe.apis.model - WARNING - A deepcopy is disabled hence model graph will be mutated and no other APIs can be called after the compile step. 2026-03-07 10:20:23,133 - afe.apis.model - INFO - Compiling quantized net "LFM2.5-1.2B-Instruct_language_n128_cache_token1280" 2026-03-07 10:20:23,135 - afe.core.compile_networks - INFO - The model is split into 1 segments for MLA and APU 2026-03-07 10:20:23,135 - afe.backends.mla.afe_to_n2a_compiler.n2a_compiler_operations - INFO - Stage 1 of 1: compiling for MLA 2026-03-07 10:20:23,136 - mlc.compiler.model_graph.l1_based - INFO - Get model compilation properties 2026-03-07 10:20:23,136 - mlc.compiler.model_graph.l1_based - INFO - Checking if layers fit into memory without splitting 2026-03-07 10:20:23,136 - mlc.compiler.model_graph.l1_based - INFO - Generating the list of feasible layouts 2026-03-07 10:20:23,136 - mlc.compiler.model_graph.l1_based - INFO - Setting layer parameters 2026-03-07 10:20:23,667 - mlc.compiler.model_graph.large_tensor_helper - INFO - Setting tile layouts 2026-03-07 10:20:23,840 - mlc.compiler.model_graph.large_tensor_helper - INFO - Allocating memory for IFM/OFM tensors 2026-03-07 10:20:23,879 - mlc.compiler.model_graph.l1_based - INFO - Get model compilation properties done 2026-03-07 10:20:24,635 - 
afe.backends.mla.afe_to_n2a_compiler.n2a_backend_runner - INFO - Start evaluate process to generate check file 2026-03-07 10:20:24,635 - mlc.kernel.layout - INFO - L2 caching mode: L2CachingMode.NONE 2026-03-07 10:20:24,655 - mlc.compiler.model_graph.l1_based - INFO - Generating model code 2026-03-07 10:20:25,789 - mlc.compiler.model_graph.l1_based - INFO - Trigger large tensor support due to: MLA_0/tuple_einsum_1, 460 2026-03-07 10:20:25,789 - mlc.compiler.model_graph.l1_based - INFO - Segmenting layers by spatial/channel dimension 2026-03-07 10:20:28,365 - mlc.test_util.test_context - INFO - Performing DRAM/L2 synchronization 2026-03-07 10:20:29,273 - mlc.test_util.test_context - INFO - Inserting and merging NOPs 2026-03-07 10:20:29,987 - mlc.test_util.test_context - INFO - Setting IQ sync bits 2026-03-07 10:20:30,049 - mlc.test_util.test_context - INFO - Run compression 2026-03-07 10:20:30,083 - mlc.test_util.test_context - INFO - Compression done in 0s. Compression ratio: 0.258 2026-03-07 10:20:30,083 - mlc.test_util.test_context - INFO - Re-allocate dram memory 2026-03-07 10:20:30,090 - mlc.test_util.test_context - INFO - Generating metrics 2026-03-07 10:20:30,273 - mlc.test_util.test_context - INFO - Writing report to MLC file 2026-03-07 10:20:30,274 - mlc.test_util.test_context - INFO - Writing instructions to MLC file 2026-03-07 10:20:33,339 - mlc.compiler.model_graph.l1_based - INFO - Generating model code done 2026-03-07 10:20:33,512 - mlc.test_util.test_context - INFO - Scheduling instructions 2026-03-07 10:20:33,661 - mlc.test_util.test_context - INFO - Performing DRAM/L2 synchronization 2026-03-07 10:20:34,159 - mlc.test_util.test_context - INFO - Code generation done 2026-03-07 10:20:34,161 - afe.backends.mla.afe_to_n2a_compiler.n2a_compiler_operations - INFO - Evaluate model and writing chk file done in 4s. 
2026-03-07 10:20:34,521 - mlc.test_util.test_context - INFO - Inserting and merging NOPs 2026-03-07 10:20:35,185 - mlc.test_util.test_context - INFO - Setting IQ sync bits 2026-03-07 10:20:35,246 - mlc.test_util.test_context - INFO - Run compression 2026-03-07 10:20:35,284 - mlc.test_util.test_context - INFO - Compression done in 0s. Compression ratio: 0.246 2026-03-07 10:20:35,284 - mlc.test_util.test_context - INFO - Re-allocate dram memory 2026-03-07 10:20:35,291 - mlc.test_util.test_context - INFO - Generating metrics 2026-03-07 10:20:35,470 - mlc.test_util.test_context - INFO - Writing report to MLC file 2026-03-07 10:20:35,470 - mlc.test_util.test_context - INFO - Writing instructions to MLC file 2026-03-07 10:20:35,813 - mlc.compiler.model_graph.large_tensor_helper - INFO - Setting tile layouts 2026-03-07 10:20:35,949 - mlc.compiler.model_graph.large_tensor_helper - INFO - Allocating memory for IFM/OFM tensors 2026-03-07 10:20:35,988 - mlc.compiler.model_graph.l1_based - INFO - Get model compilation properties done 2026-03-07 10:20:36,839 - afe.backends.mla.afe_to_n2a_compiler.n2a_backend_runner - INFO - Start evaluate process to generate check file 2026-03-07 10:20:36,839 - mlc.kernel.layout - INFO - L2 caching mode: L2CachingMode.NONE 2026-03-07 10:20:36,870 - mlc.compiler.model_graph.l1_based - INFO - Generating model code 2026-03-07 10:20:39,251 - mlc.test_util.test_context - INFO - Code generation done 2026-03-07 10:20:39,254 - afe.backends.mla.afe_to_n2a_compiler.n2a_compiler_operations - INFO - Evaluate model and writing chk file done in 5s. 
2026-03-07 10:20:39,541 - mlc.compiler.model_graph.large_tensor_helper - INFO - Setting tile layouts 2026-03-07 10:20:39,652 - mlc.compiler.model_graph.large_tensor_helper - INFO - Allocating memory for IFM/OFM tensors 2026-03-07 10:20:39,692 - mlc.compiler.model_graph.l1_based - INFO - Get model compilation properties done 2026-03-07 10:20:40,583 - afe.backends.mla.afe_to_n2a_compiler.n2a_backend_runner - INFO - Start evaluate process to generate check file 2026-03-07 10:20:40,583 - mlc.kernel.layout - INFO - L2 caching mode: L2CachingMode.NONE 2026-03-07 10:20:40,616 - mlc.compiler.model_graph.l1_based - INFO - Generating model code 2026-03-07 10:20:41,001 - afe.backends.mpk.interface - INFO - ============================== 2026-03-07 10:20:41,001 - afe.backends.mpk.interface - INFO - Compilation summary: 2026-03-07 10:20:41,001 - afe.backends.mpk.interface - INFO - ------------------------------ 2026-03-07 10:20:41,001 - afe.backends.mpk.interface - INFO - Desired batch size: 1 2026-03-07 10:20:41,001 - afe.backends.mpk.interface - INFO - Achieved batch size: 1 2026-03-07 10:20:41,001 - afe.backends.mpk.interface - INFO - ------------------------------ 2026-03-07 10:20:41,001 - afe.backends.mpk.interface - INFO - Plugin distribution per backend: 2026-03-07 10:20:41,001 - afe.backends.mpk.interface - INFO - MLA : 1 2026-03-07 10:20:41,001 - afe.backends.mpk.interface - INFO - EV74: 4 2026-03-07 10:20:41,001 - afe.backends.mpk.interface - INFO - A65 : 0 2026-03-07 10:20:41,001 - afe.backends.mpk.interface - INFO - ------------------------------ 2026-03-07 10:20:41,001 - afe.backends.mpk.interface - INFO - Generated files: LFM2.5-1.2B-Instruct_language_n128_cache_token640_mpk.json, LFM2.5-1.2B-Instruct_language_n128_cache_token640_stage1_mla_stats.yaml, LFM2.5-1.2B-Instruct_language_n128_cache_token640_stage1_mla.elf, llima-compile 2026-03-07 10:20:41,275 - afe.ir.serializer.api - INFO - Loading model from file: 
CompiledModels/LFM2.5-1.2B-Instruct/sima_files/sdk/LFM2.5-1.2B-Instruct_language_n128_cache_token1408.sima 2026-03-07 10:20:41,290 - afe.apis.model - WARNING - A deepcopy is disabled hence model graph will be mutated and no other APIs can be called after the compile step. 2026-03-07 10:20:41,290 - afe.apis.model - INFO - Compiling quantized net "LFM2.5-1.2B-Instruct_language_n128_cache_token1408" 2026-03-07 10:20:41,292 - afe.core.compile_networks - INFO - The model is split into 1 segments for MLA and APU 2026-03-07 10:20:41,292 - afe.backends.mla.afe_to_n2a_compiler.n2a_compiler_operations - INFO - Stage 1 of 1: compiling for MLA 2026-03-07 10:20:41,295 - mlc.compiler.model_graph.l1_based - INFO - Get model compilation properties 2026-03-07 10:20:41,295 - mlc.compiler.model_graph.l1_based - INFO - Checking if layers fit into memory without splitting 2026-03-07 10:20:41,295 - mlc.compiler.model_graph.l1_based - INFO - Generating the list of feasible layouts 2026-03-07 10:20:41,295 - mlc.compiler.model_graph.l1_based - INFO - Setting layer parameters 2026-03-07 10:20:43,877 - mlc.compiler.model_graph.l1_based - INFO - Trigger large tensor support due to: MLA_0/tuple_einsum_1, 367 2026-03-07 10:20:43,877 - mlc.compiler.model_graph.l1_based - INFO - Segmenting layers by spatial/channel dimension 2026-03-07 10:20:46,163 - afe.backends.mpk.interface - INFO - ============================== 2026-03-07 10:20:46,163 - afe.backends.mpk.interface - INFO - Compilation summary: 2026-03-07 10:20:46,163 - afe.backends.mpk.interface - INFO - ------------------------------ 2026-03-07 10:20:46,163 - afe.backends.mpk.interface - INFO - Desired batch size: 1 2026-03-07 10:20:46,163 - afe.backends.mpk.interface - INFO - Achieved batch size: 1 2026-03-07 10:20:46,163 - afe.backends.mpk.interface - INFO - ------------------------------ 2026-03-07 10:20:46,163 - afe.backends.mpk.interface - INFO - Plugin distribution per backend: 2026-03-07 10:20:46,163 - afe.backends.mpk.interface - 
INFO - MLA : 1 2026-03-07 10:20:46,163 - afe.backends.mpk.interface - INFO - EV74: 4 2026-03-07 10:20:46,163 - afe.backends.mpk.interface - INFO - A65 : 0 2026-03-07 10:20:46,163 - afe.backends.mpk.interface - INFO - ------------------------------ 2026-03-07 10:20:46,163 - afe.backends.mpk.interface - INFO - Generated files: LFM2.5-1.2B-Instruct_language_n128_cache_token768_stage1_mla.elf, LFM2.5-1.2B-Instruct_language_n128_cache_token768_stage1_mla_stats.yaml, LFM2.5-1.2B-Instruct_language_n128_cache_token768_mpk.json, llima-compile 2026-03-07 10:20:46,822 - afe.ir.serializer.api - INFO - Loading model from file: CompiledModels/LFM2.5-1.2B-Instruct/sima_files/sdk/LFM2.5-1.2B-Instruct_language_n128_cache_token1536.sima 2026-03-07 10:20:46,833 - afe.apis.model - WARNING - A deepcopy is disabled hence model graph will be mutated and no other APIs can be called after the compile step. 2026-03-07 10:20:46,833 - afe.apis.model - INFO - Compiling quantized net "LFM2.5-1.2B-Instruct_language_n128_cache_token1536" 2026-03-07 10:20:46,834 - afe.core.compile_networks - INFO - The model is split into 1 segments for MLA and APU 2026-03-07 10:20:46,835 - afe.backends.mla.afe_to_n2a_compiler.n2a_compiler_operations - INFO - Stage 1 of 1: compiling for MLA 2026-03-07 10:20:46,836 - mlc.compiler.model_graph.l1_based - INFO - Get model compilation properties 2026-03-07 10:20:46,836 - mlc.compiler.model_graph.l1_based - INFO - Checking if layers fit into memory without splitting 2026-03-07 10:20:46,836 - mlc.compiler.model_graph.l1_based - INFO - Generating the list of feasible layouts 2026-03-07 10:20:46,836 - mlc.compiler.model_graph.l1_based - INFO - Setting layer parameters 2026-03-07 10:20:48,656 - mlc.test_util.test_context - INFO - Performing DRAM/L2 synchronization 2026-03-07 10:20:49,306 - mlc.compiler.model_graph.l1_based - INFO - Trigger large tensor support due to: MLA_0/slice_concat_0, 401 2026-03-07 10:20:49,307 - mlc.compiler.model_graph.l1_based - INFO - Segmenting 
layers by spatial/channel dimension 2026-03-07 10:20:49,581 - mlc.test_util.test_context - INFO - Inserting and merging NOPs 2026-03-07 10:20:50,295 - mlc.test_util.test_context - INFO - Setting IQ sync bits 2026-03-07 10:20:50,359 - mlc.test_util.test_context - INFO - Run compression 2026-03-07 10:20:50,400 - mlc.test_util.test_context - INFO - Compression done in 0s. Compression ratio: 0.241 2026-03-07 10:20:50,400 - mlc.test_util.test_context - INFO - Re-allocate dram memory 2026-03-07 10:20:50,416 - mlc.test_util.test_context - INFO - Generating metrics 2026-03-07 10:20:50,605 - mlc.test_util.test_context - INFO - Writing report to MLC file 2026-03-07 10:20:50,606 - mlc.test_util.test_context - INFO - Writing instructions to MLC file 2026-03-07 10:20:54,592 - mlc.test_util.test_context - INFO - Code generation done 2026-03-07 10:20:54,596 - afe.backends.mla.afe_to_n2a_compiler.n2a_compiler_operations - INFO - Evaluate model and writing chk file done in 6s. 2026-03-07 10:21:01,872 - afe.backends.mpk.interface - INFO - ============================== 2026-03-07 10:21:01,872 - afe.backends.mpk.interface - INFO - Compilation summary: 2026-03-07 10:21:01,872 - afe.backends.mpk.interface - INFO - ------------------------------ 2026-03-07 10:21:01,872 - afe.backends.mpk.interface - INFO - Desired batch size: 1 2026-03-07 10:21:01,872 - afe.backends.mpk.interface - INFO - Achieved batch size: 1 2026-03-07 10:21:01,873 - afe.backends.mpk.interface - INFO - ------------------------------ 2026-03-07 10:21:01,873 - afe.backends.mpk.interface - INFO - Plugin distribution per backend: 2026-03-07 10:21:01,873 - afe.backends.mpk.interface - INFO - MLA : 1 2026-03-07 10:21:01,873 - afe.backends.mpk.interface - INFO - EV74: 4 2026-03-07 10:21:01,873 - afe.backends.mpk.interface - INFO - A65 : 0 2026-03-07 10:21:01,873 - afe.backends.mpk.interface - INFO - ------------------------------ 2026-03-07 10:21:01,873 - afe.backends.mpk.interface - INFO - Generated files: 
LFM2.5-1.2B-Instruct_language_n128_cache_token896_mpk.json, LFM2.5-1.2B-Instruct_language_n128_cache_token896_stage1_mla_stats.yaml, LFM2.5-1.2B-Instruct_language_n128_cache_token896_stage1_mla.elf, llima-compile 2026-03-07 10:21:01,888 - mlc.compiler.model_graph.large_tensor_helper - INFO - Setting tile layouts 2026-03-07 10:21:02,124 - mlc.compiler.model_graph.large_tensor_helper - INFO - Allocating memory for IFM/OFM tensors 2026-03-07 10:21:02,144 - afe.ir.serializer.api - INFO - Loading model from file: CompiledModels/LFM2.5-1.2B-Instruct/sima_files/sdk/LFM2.5-1.2B-Instruct_language_n128_cache_token1664.sima 2026-03-07 10:21:02,161 - afe.apis.model - WARNING - A deepcopy is disabled hence model graph will be mutated and no other APIs can be called after the compile step. 2026-03-07 10:21:02,161 - afe.apis.model - INFO - Compiling quantized net "LFM2.5-1.2B-Instruct_language_n128_cache_token1664" 2026-03-07 10:21:02,163 - afe.core.compile_networks - INFO - The model is split into 1 segments for MLA and APU 2026-03-07 10:21:02,163 - afe.backends.mla.afe_to_n2a_compiler.n2a_compiler_operations - INFO - Stage 1 of 1: compiling for MLA 2026-03-07 10:21:02,164 - mlc.compiler.model_graph.l1_based - INFO - Get model compilation properties 2026-03-07 10:21:02,164 - mlc.compiler.model_graph.l1_based - INFO - Checking if layers fit into memory without splitting 2026-03-07 10:21:02,164 - mlc.compiler.model_graph.l1_based - INFO - Generating the list of feasible layouts 2026-03-07 10:21:02,164 - mlc.compiler.model_graph.l1_based - INFO - Setting layer parameters 2026-03-07 10:21:02,183 - mlc.compiler.model_graph.l1_based - INFO - Get model compilation properties done 2026-03-07 10:21:03,168 - afe.backends.mla.afe_to_n2a_compiler.n2a_backend_runner - INFO - Start evaluate process to generate check file 2026-03-07 10:21:03,170 - mlc.kernel.layout - INFO - L2 caching mode: L2CachingMode.NONE 2026-03-07 10:21:03,239 - mlc.compiler.model_graph.l1_based - INFO - Generating model 
code 2026-03-07 10:21:04,683 - mlc.compiler.model_graph.l1_based - INFO - Trigger large tensor support due to: MLA_0/slice_concat_0, 361 2026-03-07 10:21:04,683 - mlc.compiler.model_graph.l1_based - INFO - Segmenting layers by spatial/channel dimension 2026-03-07 10:21:04,955 - mlc.compiler.model_graph.l1_based - INFO - Generating model code done 2026-03-07 10:21:05,187 - mlc.test_util.test_context - INFO - Scheduling instructions 2026-03-07 10:21:12,890 - mlc.compiler.model_graph.large_tensor_helper - INFO - Setting tile layouts 2026-03-07 10:21:13,216 - mlc.compiler.model_graph.large_tensor_helper - INFO - Allocating memory for IFM/OFM tensors 2026-03-07 10:21:13,300 - mlc.compiler.model_graph.l1_based - INFO - Get model compilation properties done 2026-03-07 10:21:14,349 - afe.backends.mla.afe_to_n2a_compiler.n2a_backend_runner - INFO - Start evaluate process to generate check file 2026-03-07 10:21:14,350 - mlc.kernel.layout - INFO - L2 caching mode: L2CachingMode.NONE 2026-03-07 10:21:14,450 - mlc.compiler.model_graph.l1_based - INFO - Generating model code 2026-03-07 10:21:20,807 - mlc.compiler.model_graph.l1_based - INFO - Generating model code done 2026-03-07 10:21:21,147 - mlc.test_util.test_context - INFO - Scheduling instructions 2026-03-07 10:21:26,065 - mlc.test_util.test_context - INFO - Performing DRAM/L2 synchronization 2026-03-07 10:21:26,508 - mlc.compiler.model_graph.l1_based - INFO - Generating model code done 2026-03-07 10:21:26,838 - mlc.test_util.test_context - INFO - Scheduling instructions 2026-03-07 10:21:27,270 - mlc.test_util.test_context - INFO - Inserting and merging NOPs 2026-03-07 10:21:28,221 - mlc.test_util.test_context - INFO - Setting IQ sync bits 2026-03-07 10:21:28,301 - mlc.test_util.test_context - INFO - Run compression 2026-03-07 10:21:28,346 - mlc.test_util.test_context - INFO - Compression done in 0s. 
Compression ratio: 0.233 2026-03-07 10:21:28,346 - mlc.test_util.test_context - INFO - Re-allocate dram memory 2026-03-07 10:21:28,349 - mlc.compiler.model_graph.large_tensor_helper - INFO - Setting tile layouts 2026-03-07 10:21:28,359 - mlc.test_util.test_context - INFO - Generating metrics 2026-03-07 10:21:28,602 - mlc.test_util.test_context - INFO - Writing report to MLC file 2026-03-07 10:21:28,602 - mlc.test_util.test_context - INFO - Writing instructions to MLC file 2026-03-07 10:21:28,654 - mlc.compiler.model_graph.large_tensor_helper - INFO - Allocating memory for IFM/OFM tensors 2026-03-07 10:21:28,741 - mlc.compiler.model_graph.l1_based - INFO - Get model compilation properties done 2026-03-07 10:21:29,856 - afe.backends.mla.afe_to_n2a_compiler.n2a_backend_runner - INFO - Start evaluate process to generate check file 2026-03-07 10:21:29,857 - mlc.kernel.layout - INFO - L2 caching mode: L2CachingMode.NONE 2026-03-07 10:21:29,965 - mlc.compiler.model_graph.l1_based - INFO - Generating model code 2026-03-07 10:21:33,645 - mlc.test_util.test_context - INFO - Code generation done 2026-03-07 10:21:33,649 - afe.backends.mla.afe_to_n2a_compiler.n2a_compiler_operations - INFO - Evaluate model and writing chk file done in 7s. 
2026-03-07 10:21:41,530 - mlc.compiler.model_graph.l1_based - INFO - Generating model code done 2026-03-07 10:21:41,884 - mlc.test_util.test_context - INFO - Scheduling instructions 2026-03-07 10:21:42,409 - afe.backends.mpk.interface - INFO - ============================== 2026-03-07 10:21:42,409 - afe.backends.mpk.interface - INFO - Compilation summary: 2026-03-07 10:21:42,409 - afe.backends.mpk.interface - INFO - ------------------------------ 2026-03-07 10:21:42,409 - afe.backends.mpk.interface - INFO - Desired batch size: 1 2026-03-07 10:21:42,409 - afe.backends.mpk.interface - INFO - Achieved batch size: 1 2026-03-07 10:21:42,409 - afe.backends.mpk.interface - INFO - ------------------------------ 2026-03-07 10:21:42,409 - afe.backends.mpk.interface - INFO - Plugin distribution per backend: 2026-03-07 10:21:42,409 - afe.backends.mpk.interface - INFO - MLA : 1 2026-03-07 10:21:42,409 - afe.backends.mpk.interface - INFO - EV74: 4 2026-03-07 10:21:42,409 - afe.backends.mpk.interface - INFO - A65 : 0 2026-03-07 10:21:42,409 - afe.backends.mpk.interface - INFO - ------------------------------ 2026-03-07 10:21:42,409 - afe.backends.mpk.interface - INFO - Generated files: LFM2.5-1.2B-Instruct_language_n128_cache_token1024_stage1_mla_stats.yaml, LFM2.5-1.2B-Instruct_language_n128_cache_token1024_mpk.json, LFM2.5-1.2B-Instruct_language_n128_cache_token1024_stage1_mla.elf, llima-compile 2026-03-07 10:21:42,681 - afe.ir.serializer.api - INFO - Loading model from file: CompiledModels/LFM2.5-1.2B-Instruct/sima_files/sdk/LFM2.5-1.2B-Instruct_language_n128_cache_token1792.sima 2026-03-07 10:21:42,694 - afe.apis.model - WARNING - A deepcopy is disabled hence model graph will be mutated and no other APIs can be called after the compile step. 
2026-03-07 10:21:42,694 - afe.apis.model - INFO - Compiling quantized net "LFM2.5-1.2B-Instruct_language_n128_cache_token1792" 2026-03-07 10:21:42,696 - afe.core.compile_networks - INFO - The model is split into 1 segments for MLA and APU 2026-03-07 10:21:42,696 - afe.backends.mla.afe_to_n2a_compiler.n2a_compiler_operations - INFO - Stage 1 of 1: compiling for MLA 2026-03-07 10:21:42,697 - mlc.compiler.model_graph.l1_based - INFO - Get model compilation properties 2026-03-07 10:21:42,698 - mlc.compiler.model_graph.l1_based - INFO - Checking if layers fit into memory without splitting 2026-03-07 10:21:42,698 - mlc.compiler.model_graph.l1_based - INFO - Generating the list of feasible layouts 2026-03-07 10:21:42,698 - mlc.compiler.model_graph.l1_based - INFO - Setting layer parameters 2026-03-07 10:21:44,807 - mlc.compiler.model_graph.l1_based - INFO - Trigger large tensor support due to: MLA_0/slice_concat_0, 630 2026-03-07 10:21:44,808 - mlc.compiler.model_graph.l1_based - INFO - Segmenting layers by spatial/channel dimension 2026-03-07 10:21:50,253 - mlc.test_util.test_context - INFO - Performing DRAM/L2 synchronization 2026-03-07 10:21:51,814 - mlc.test_util.test_context - INFO - Inserting and merging NOPs 2026-03-07 10:21:53,194 - mlc.test_util.test_context - INFO - Setting IQ sync bits 2026-03-07 10:21:53,299 - mlc.test_util.test_context - INFO - Run compression 2026-03-07 10:21:53,347 - mlc.test_util.test_context - INFO - Compression done in 0s. 
Compression ratio: 0.23 2026-03-07 10:21:53,347 - mlc.test_util.test_context - INFO - Re-allocate dram memory 2026-03-07 10:21:53,367 - mlc.test_util.test_context - INFO - Generating metrics 2026-03-07 10:21:53,666 - mlc.test_util.test_context - INFO - Writing report to MLC file 2026-03-07 10:21:53,666 - mlc.test_util.test_context - INFO - Writing instructions to MLC file 2026-03-07 10:21:54,408 - mlc.test_util.test_context - INFO - Performing DRAM/L2 synchronization 2026-03-07 10:21:55,969 - mlc.test_util.test_context - INFO - Inserting and merging NOPs 2026-03-07 10:21:57,356 - mlc.test_util.test_context - INFO - Setting IQ sync bits 2026-03-07 10:21:57,462 - mlc.test_util.test_context - INFO - Run compression 2026-03-07 10:21:57,513 - mlc.test_util.test_context - INFO - Compression done in 0s. Compression ratio: 0.225 2026-03-07 10:21:57,513 - mlc.test_util.test_context - INFO - Re-allocate dram memory 2026-03-07 10:21:57,541 - mlc.test_util.test_context - INFO - Generating metrics 2026-03-07 10:21:57,860 - mlc.test_util.test_context - INFO - Writing report to MLC file 2026-03-07 10:21:57,861 - mlc.test_util.test_context - INFO - Writing instructions to MLC file 2026-03-07 10:21:59,854 - mlc.test_util.test_context - INFO - Code generation done 2026-03-07 10:21:59,865 - afe.backends.mla.afe_to_n2a_compiler.n2a_compiler_operations - INFO - Evaluate model and writing chk file done in 8s. 2026-03-07 10:22:03,756 - mlc.compiler.model_graph.l1_based - INFO - Generating model code done 2026-03-07 10:22:04,126 - mlc.test_util.test_context - INFO - Scheduling instructions 2026-03-07 10:22:04,629 - mlc.test_util.test_context - INFO - Code generation done 2026-03-07 10:22:04,635 - afe.backends.mla.afe_to_n2a_compiler.n2a_compiler_operations - INFO - Evaluate model and writing chk file done in 8s. 
2026-03-07 10:22:08,332 - mlc.compiler.model_graph.large_tensor_helper - INFO - Setting tile layouts 2026-03-07 10:22:08,603 - mlc.compiler.model_graph.large_tensor_helper - INFO - Allocating memory for IFM/OFM tensors 2026-03-07 10:22:08,689 - mlc.compiler.model_graph.l1_based - INFO - Get model compilation properties done 2026-03-07 10:22:09,873 - afe.backends.mla.afe_to_n2a_compiler.n2a_backend_runner - INFO - Start evaluate process to generate check file 2026-03-07 10:22:09,875 - mlc.kernel.layout - INFO - L2 caching mode: L2CachingMode.NONE 2026-03-07 10:22:09,954 - mlc.compiler.model_graph.l1_based - INFO - Generating model code 2026-03-07 10:22:10,202 - afe.backends.mpk.interface - INFO - ============================== 2026-03-07 10:22:10,202 - afe.backends.mpk.interface - INFO - Compilation summary: 2026-03-07 10:22:10,202 - afe.backends.mpk.interface - INFO - ------------------------------ 2026-03-07 10:22:10,202 - afe.backends.mpk.interface - INFO - Desired batch size: 1 2026-03-07 10:22:10,202 - afe.backends.mpk.interface - INFO - Achieved batch size: 1 2026-03-07 10:22:10,202 - afe.backends.mpk.interface - INFO - ------------------------------ 2026-03-07 10:22:10,202 - afe.backends.mpk.interface - INFO - Plugin distribution per backend: 2026-03-07 10:22:10,202 - afe.backends.mpk.interface - INFO - MLA : 1 2026-03-07 10:22:10,202 - afe.backends.mpk.interface - INFO - EV74: 4 2026-03-07 10:22:10,202 - afe.backends.mpk.interface - INFO - A65 : 0 2026-03-07 10:22:10,202 - afe.backends.mpk.interface - INFO - ------------------------------ 2026-03-07 10:22:10,202 - afe.backends.mpk.interface - INFO - Generated files: LFM2.5-1.2B-Instruct_language_n128_cache_token1152_stage1_mla.elf, LFM2.5-1.2B-Instruct_language_n128_cache_token1152_stage1_mla_stats.yaml, LFM2.5-1.2B-Instruct_language_n128_cache_token1152_mpk.json, llima-compile 2026-03-07 10:22:10,478 - afe.ir.serializer.api - INFO - Loading model from file: 
CompiledModels/LFM2.5-1.2B-Instruct/sima_files/sdk/LFM2.5-1.2B-Instruct_language_n128_cache_token1920.sima 2026-03-07 10:22:10,491 - afe.apis.model - WARNING - A deepcopy is disabled hence model graph will be mutated and no other APIs can be called after the compile step. 2026-03-07 10:22:10,491 - afe.apis.model - INFO - Compiling quantized net "LFM2.5-1.2B-Instruct_language_n128_cache_token1920" 2026-03-07 10:22:10,493 - afe.core.compile_networks - INFO - The model is split into 1 segments for MLA and APU 2026-03-07 10:22:10,493 - afe.backends.mla.afe_to_n2a_compiler.n2a_compiler_operations - INFO - Stage 1 of 1: compiling for MLA 2026-03-07 10:22:10,494 - mlc.compiler.model_graph.l1_based - INFO - Get model compilation properties 2026-03-07 10:22:10,495 - mlc.compiler.model_graph.l1_based - INFO - Checking if layers fit into memory without splitting 2026-03-07 10:22:10,495 - mlc.compiler.model_graph.l1_based - INFO - Generating the list of feasible layouts 2026-03-07 10:22:10,495 - mlc.compiler.model_graph.l1_based - INFO - Setting layer parameters 2026-03-07 10:22:12,648 - mlc.compiler.model_graph.l1_based - INFO - Trigger large tensor support due to: MLA_0/slice_concat_0, 505 2026-03-07 10:22:12,648 - mlc.compiler.model_graph.l1_based - INFO - Segmenting layers by spatial/channel dimension 2026-03-07 10:22:14,369 - mlc.test_util.test_context - INFO - Performing DRAM/L2 synchronization 2026-03-07 10:22:16,053 - afe.backends.mpk.interface - INFO - ============================== 2026-03-07 10:22:16,053 - afe.backends.mpk.interface - INFO - Compilation summary: 2026-03-07 10:22:16,053 - afe.backends.mpk.interface - INFO - ------------------------------ 2026-03-07 10:22:16,053 - afe.backends.mpk.interface - INFO - Desired batch size: 1 2026-03-07 10:22:16,053 - afe.backends.mpk.interface - INFO - Achieved batch size: 1 2026-03-07 10:22:16,053 - afe.backends.mpk.interface - INFO - ------------------------------ 2026-03-07 10:22:16,053 - afe.backends.mpk.interface - 
INFO - Plugin distribution per backend: 2026-03-07 10:22:16,053 - afe.backends.mpk.interface - INFO - MLA : 1 2026-03-07 10:22:16,053 - afe.backends.mpk.interface - INFO - EV74: 4 2026-03-07 10:22:16,053 - afe.backends.mpk.interface - INFO - A65 : 0 2026-03-07 10:22:16,053 - afe.backends.mpk.interface - INFO - ------------------------------ 2026-03-07 10:22:16,054 - afe.backends.mpk.interface - INFO - Generated files: LFM2.5-1.2B-Instruct_language_n128_cache_token1280_mpk.json, LFM2.5-1.2B-Instruct_language_n128_cache_token1280_stage1_mla_stats.yaml, LFM2.5-1.2B-Instruct_language_n128_cache_token1280_stage1_mla.elf, llima-compile 2026-03-07 10:22:16,315 - mlc.test_util.test_context - INFO - Inserting and merging NOPs 2026-03-07 10:22:16,337 - afe.ir.serializer.api - INFO - Loading model from file: CompiledModels/LFM2.5-1.2B-Instruct/sima_files/sdk/LFM2.5-1.2B-Instruct_language_n1_cache_token127.sima 2026-03-07 10:22:16,346 - afe.apis.model - WARNING - A deepcopy is disabled hence model graph will be mutated and no other APIs can be called after the compile step. 
2026-03-07 10:22:16,346 - afe.apis.model - INFO - Compiling quantized net "LFM2.5-1.2B-Instruct_language_n1_cache_token127" 2026-03-07 10:22:16,348 - afe.core.compile_networks - INFO - The model is split into 1 segments for MLA and APU 2026-03-07 10:22:16,348 - afe.backends.mla.afe_to_n2a_compiler.n2a_compiler_operations - INFO - Stage 1 of 1: compiling for MLA 2026-03-07 10:22:16,349 - mlc.compiler.model_graph.l1_based - INFO - Get model compilation properties 2026-03-07 10:22:16,349 - mlc.compiler.model_graph.l1_based - INFO - Checking if layers fit into memory without splitting 2026-03-07 10:22:16,349 - mlc.compiler.model_graph.l1_based - INFO - Generating the list of feasible layouts 2026-03-07 10:22:16,349 - mlc.compiler.model_graph.l1_based - INFO - Setting layer parameters 2026-03-07 10:22:17,784 - mlc.test_util.test_context - INFO - Setting IQ sync bits 2026-03-07 10:22:17,903 - mlc.compiler.model_graph.l1_based - INFO - Setting tile layouts 2026-03-07 10:22:17,916 - mlc.test_util.test_context - INFO - Run compression 2026-03-07 10:22:17,971 - mlc.test_util.test_context - INFO - Compression done in 0s. 
Compression ratio: 0.223 2026-03-07 10:22:17,971 - mlc.test_util.test_context - INFO - Re-allocate dram memory 2026-03-07 10:22:18,014 - mlc.test_util.test_context - INFO - Generating metrics 2026-03-07 10:22:18,029 - mlc.compiler.model_graph.l1_based - INFO - Allocating memory for IFM/OFM tensors 2026-03-07 10:22:18,041 - mlc.compiler.model_graph.l1_based - INFO - Get model compilation properties done 2026-03-07 10:22:18,121 - afe.backends.mla.afe_to_n2a_compiler.n2a_backend_runner - INFO - Start evaluate process to generate check file 2026-03-07 10:22:18,122 - mlc.kernel.layout - INFO - L2 caching mode: L2CachingMode.NONE 2026-03-07 10:22:18,127 - mlc.compiler.model_graph.l1_based - INFO - Generating model code 2026-03-07 10:22:18,400 - mlc.test_util.test_context - INFO - Writing report to MLC file 2026-03-07 10:22:18,400 - mlc.test_util.test_context - INFO - Writing instructions to MLC file 2026-03-07 10:22:22,245 - mlc.compiler.model_graph.l1_based - INFO - Generating model code done 2026-03-07 10:22:22,632 - mlc.test_util.test_context - INFO - Scheduling instructions 2026-03-07 10:22:26,491 - mlc.test_util.test_context - INFO - Code generation done 2026-03-07 10:22:26,499 - afe.backends.mla.afe_to_n2a_compiler.n2a_compiler_operations - INFO - Evaluate model and writing chk file done in 10s. 2026-03-07 10:22:26,907 - mlc.compiler.model_graph.l1_based - INFO - Generating model code done 2026-03-07 10:22:26,955 - mlc.test_util.test_context - INFO - Scheduling instructions 2026-03-07 10:22:30,938 - mlc.test_util.test_context - INFO - Performing DRAM/L2 synchronization 2026-03-07 10:22:31,330 - mlc.test_util.test_context - INFO - Inserting and merging NOPs 2026-03-07 10:22:31,494 - mlc.test_util.test_context - INFO - Setting IQ sync bits 2026-03-07 10:22:31,504 - mlc.test_util.test_context - INFO - Run compression 2026-03-07 10:22:31,508 - mlc.test_util.test_context - INFO - Compression done in 0s. 
Compression ratio: 0.404 2026-03-07 10:22:31,508 - mlc.test_util.test_context - INFO - Re-allocate dram memory 2026-03-07 10:22:31,508 - mlc.test_util.test_context - INFO - Generating metrics 2026-03-07 10:22:31,544 - mlc.test_util.test_context - INFO - Writing report to MLC file 2026-03-07 10:22:31,544 - mlc.test_util.test_context - INFO - Writing instructions to MLC file 2026-03-07 10:22:32,307 - mlc.test_util.test_context - INFO - Code generation done 2026-03-07 10:22:34,502 - afe.backends.mla.afe_to_n2a_compiler.n2a_compiler_operations - INFO - Evaluate model and writing chk file done in 0s. 2026-03-07 10:22:35,406 - afe.backends.mpk.interface - INFO - ============================== 2026-03-07 10:22:35,406 - afe.backends.mpk.interface - INFO - Compilation summary: 2026-03-07 10:22:35,406 - afe.backends.mpk.interface - INFO - ------------------------------ 2026-03-07 10:22:35,406 - afe.backends.mpk.interface - INFO - Desired batch size: 1 2026-03-07 10:22:35,406 - afe.backends.mpk.interface - INFO - Achieved batch size: 1 2026-03-07 10:22:35,406 - afe.backends.mpk.interface - INFO - ------------------------------ 2026-03-07 10:22:35,406 - afe.backends.mpk.interface - INFO - Plugin distribution per backend: 2026-03-07 10:22:35,406 - afe.backends.mpk.interface - INFO - MLA : 1 2026-03-07 10:22:35,406 - afe.backends.mpk.interface - INFO - EV74: 5 2026-03-07 10:22:35,406 - afe.backends.mpk.interface - INFO - A65 : 0 2026-03-07 10:22:35,406 - afe.backends.mpk.interface - INFO - ------------------------------ 2026-03-07 10:22:35,406 - afe.backends.mpk.interface - INFO - Generated files: LFM2.5-1.2B-Instruct_language_n1_cache_token127_stage1_mla_stats.yaml, LFM2.5-1.2B-Instruct_language_n1_cache_token127_stage1_mla.elf, LFM2.5-1.2B-Instruct_language_n1_cache_token127_mpk.json, llima-compile 2026-03-07 10:22:35,689 - afe.ir.serializer.api - INFO - Loading model from file: 
CompiledModels/LFM2.5-1.2B-Instruct/sima_files/sdk/LFM2.5-1.2B-Instruct_language_n1_cache_token255.sima 2026-03-07 10:22:35,699 - afe.apis.model - WARNING - A deepcopy is disabled hence model graph will be mutated and no other APIs can be called after the compile step. 2026-03-07 10:22:35,699 - afe.apis.model - INFO - Compiling quantized net "LFM2.5-1.2B-Instruct_language_n1_cache_token255" 2026-03-07 10:22:35,701 - afe.core.compile_networks - INFO - The model is split into 1 segments for MLA and APU 2026-03-07 10:22:35,701 - afe.backends.mla.afe_to_n2a_compiler.n2a_compiler_operations - INFO - Stage 1 of 1: compiling for MLA 2026-03-07 10:22:35,702 - mlc.compiler.model_graph.l1_based - INFO - Get model compilation properties 2026-03-07 10:22:35,702 - mlc.compiler.model_graph.l1_based - INFO - Checking if layers fit into memory without splitting 2026-03-07 10:22:35,702 - mlc.compiler.model_graph.l1_based - INFO - Generating the list of feasible layouts 2026-03-07 10:22:35,702 - mlc.compiler.model_graph.l1_based - INFO - Setting layer parameters 2026-03-07 10:22:36,377 - mlc.test_util.test_context - INFO - Performing DRAM/L2 synchronization 2026-03-07 10:22:37,402 - mlc.compiler.model_graph.l1_based - INFO - Setting tile layouts 2026-03-07 10:22:37,534 - mlc.compiler.model_graph.l1_based - INFO - Allocating memory for IFM/OFM tensors 2026-03-07 10:22:37,546 - mlc.compiler.model_graph.l1_based - INFO - Get model compilation properties done 2026-03-07 10:22:37,688 - afe.backends.mla.afe_to_n2a_compiler.n2a_backend_runner - INFO - Start evaluate process to generate check file 2026-03-07 10:22:37,689 - mlc.kernel.layout - INFO - L2 caching mode: L2CachingMode.NONE 2026-03-07 10:22:37,695 - mlc.compiler.model_graph.l1_based - INFO - Generating model code 2026-03-07 10:22:37,860 - mlc.compiler.model_graph.large_tensor_helper - INFO - Setting tile layouts 2026-03-07 10:22:38,365 - mlc.compiler.model_graph.large_tensor_helper - INFO - Allocating memory for IFM/OFM tensors 
2026-03-07 10:22:38,462 - mlc.compiler.model_graph.l1_based - INFO - Get model compilation properties done 2026-03-07 10:22:39,193 - mlc.test_util.test_context - INFO - Inserting and merging NOPs 2026-03-07 10:22:39,717 - afe.backends.mla.afe_to_n2a_compiler.n2a_backend_runner - INFO - Start evaluate process to generate check file 2026-03-07 10:22:39,718 - mlc.kernel.layout - INFO - L2 caching mode: L2CachingMode.NONE 2026-03-07 10:22:39,795 - afe.backends.mpk.interface - INFO - ============================== 2026-03-07 10:22:39,795 - afe.backends.mpk.interface - INFO - Compilation summary: 2026-03-07 10:22:39,795 - afe.backends.mpk.interface - INFO - ------------------------------ 2026-03-07 10:22:39,795 - afe.backends.mpk.interface - INFO - Desired batch size: 1 2026-03-07 10:22:39,795 - afe.backends.mpk.interface - INFO - Achieved batch size: 1 2026-03-07 10:22:39,795 - afe.backends.mpk.interface - INFO - ------------------------------ 2026-03-07 10:22:39,795 - afe.backends.mpk.interface - INFO - Plugin distribution per backend: 2026-03-07 10:22:39,795 - afe.backends.mpk.interface - INFO - MLA : 1 2026-03-07 10:22:39,795 - afe.backends.mpk.interface - INFO - EV74: 4 2026-03-07 10:22:39,795 - afe.backends.mpk.interface - INFO - A65 : 0 2026-03-07 10:22:39,795 - afe.backends.mpk.interface - INFO - ------------------------------ 2026-03-07 10:22:39,795 - afe.backends.mpk.interface - INFO - Generated files: LFM2.5-1.2B-Instruct_language_n128_cache_token1408_stage1_mla.elf, LFM2.5-1.2B-Instruct_language_n128_cache_token1408_stage1_mla_stats.yaml, LFM2.5-1.2B-Instruct_language_n128_cache_token1408_mpk.json, llima-compile 2026-03-07 10:22:39,846 - mlc.compiler.model_graph.l1_based - INFO - Generating model code 2026-03-07 10:22:40,077 - afe.ir.serializer.api - INFO - Loading model from file: CompiledModels/LFM2.5-1.2B-Instruct/sima_files/sdk/LFM2.5-1.2B-Instruct_language_n1_cache_token383.sima 2026-03-07 10:22:40,087 - afe.apis.model - WARNING - A deepcopy is disabled 
hence model graph will be mutated and no other APIs can be called after the compile step. 2026-03-07 10:22:40,087 - afe.apis.model - INFO - Compiling quantized net "LFM2.5-1.2B-Instruct_language_n1_cache_token383" 2026-03-07 10:22:40,089 - afe.core.compile_networks - INFO - The model is split into 1 segments for MLA and APU 2026-03-07 10:22:40,089 - afe.backends.mla.afe_to_n2a_compiler.n2a_compiler_operations - INFO - Stage 1 of 1: compiling for MLA 2026-03-07 10:22:40,090 - mlc.compiler.model_graph.l1_based - INFO - Get model compilation properties 2026-03-07 10:22:40,090 - mlc.compiler.model_graph.l1_based - INFO - Checking if layers fit into memory without splitting 2026-03-07 10:22:40,090 - mlc.compiler.model_graph.l1_based - INFO - Generating the list of feasible layouts 2026-03-07 10:22:40,090 - mlc.compiler.model_graph.l1_based - INFO - Setting layer parameters 2026-03-07 10:22:40,719 - mlc.test_util.test_context - INFO - Setting IQ sync bits 2026-03-07 10:22:40,855 - mlc.test_util.test_context - INFO - Run compression 2026-03-07 10:22:40,923 - mlc.test_util.test_context - INFO - Compression done in 0s. 
Compression ratio: 0.219 2026-03-07 10:22:40,923 - mlc.test_util.test_context - INFO - Re-allocate dram memory 2026-03-07 10:22:40,997 - mlc.test_util.test_context - INFO - Generating metrics 2026-03-07 10:22:41,101 - mlc.compiler.model_graph.l1_based - INFO - Setting tile layouts 2026-03-07 10:22:41,227 - mlc.compiler.model_graph.l1_based - INFO - Allocating memory for IFM/OFM tensors 2026-03-07 10:22:41,239 - mlc.compiler.model_graph.l1_based - INFO - Get model compilation properties done 2026-03-07 10:22:41,380 - mlc.test_util.test_context - INFO - Writing report to MLC file 2026-03-07 10:22:41,380 - mlc.test_util.test_context - INFO - Writing instructions to MLC file 2026-03-07 10:22:41,451 - afe.backends.mla.afe_to_n2a_compiler.n2a_backend_runner - INFO - Start evaluate process to generate check file 2026-03-07 10:22:41,452 - mlc.kernel.layout - INFO - L2 caching mode: L2CachingMode.NONE 2026-03-07 10:22:41,457 - mlc.compiler.model_graph.l1_based - INFO - Generating model code 2026-03-07 10:22:49,251 - mlc.test_util.test_context - INFO - Code generation done 2026-03-07 10:22:49,260 - afe.backends.mla.afe_to_n2a_compiler.n2a_compiler_operations - INFO - Evaluate model and writing chk file done in 11s. 
2026-03-07 10:22:51,447 - mlc.compiler.model_graph.l1_based - INFO - Generating model code done 2026-03-07 10:22:51,534 - mlc.test_util.test_context - INFO - Scheduling instructions 2026-03-07 10:22:54,242 - mlc.compiler.model_graph.l1_based - INFO - Generating model code done 2026-03-07 10:22:54,317 - mlc.test_util.test_context - INFO - Scheduling instructions 2026-03-07 10:22:55,313 - mlc.test_util.test_context - INFO - Performing DRAM/L2 synchronization 2026-03-07 10:22:58,334 - mlc.test_util.test_context - INFO - Inserting and merging NOPs 2026-03-07 10:22:59,262 - mlc.test_util.test_context - INFO - Performing DRAM/L2 synchronization 2026-03-07 10:22:59,791 - mlc.test_util.test_context - INFO - Inserting and merging NOPs 2026-03-07 10:22:59,990 - mlc.test_util.test_context - INFO - Setting IQ sync bits 2026-03-07 10:23:00,122 - mlc.test_util.test_context - INFO - Setting IQ sync bits 2026-03-07 10:23:00,126 - mlc.test_util.test_context - INFO - Run compression 2026-03-07 10:23:00,151 - mlc.test_util.test_context - INFO - Run compression 2026-03-07 10:23:00,154 - mlc.test_util.test_context - INFO - Compression done in 0s. Compression ratio: 0.404 2026-03-07 10:23:00,154 - mlc.test_util.test_context - INFO - Re-allocate dram memory 2026-03-07 10:23:00,154 - mlc.test_util.test_context - INFO - Generating metrics 2026-03-07 10:23:00,188 - mlc.test_util.test_context - INFO - Compression done in 0s. 
Compression ratio: 0.218 2026-03-07 10:23:00,188 - mlc.test_util.test_context - INFO - Re-allocate dram memory 2026-03-07 10:23:00,245 - mlc.test_util.test_context - INFO - Writing report to MLC file 2026-03-07 10:23:00,245 - mlc.test_util.test_context - INFO - Writing instructions to MLC file 2026-03-07 10:23:00,270 - mlc.test_util.test_context - INFO - Generating metrics 2026-03-07 10:23:00,669 - mlc.test_util.test_context - INFO - Writing report to MLC file 2026-03-07 10:23:00,669 - mlc.test_util.test_context - INFO - Writing instructions to MLC file 2026-03-07 10:23:00,895 - mlc.test_util.test_context - INFO - Performing DRAM/L2 synchronization 2026-03-07 10:23:00,908 - mlc.compiler.model_graph.l1_based - INFO - Generating model code done 2026-03-07 10:23:01,272 - mlc.test_util.test_context - INFO - Scheduling instructions 2026-03-07 10:23:01,428 - mlc.test_util.test_context - INFO - Inserting and merging NOPs 2026-03-07 10:23:01,712 - mlc.test_util.test_context - INFO - Setting IQ sync bits 2026-03-07 10:23:01,737 - mlc.test_util.test_context - INFO - Run compression 2026-03-07 10:23:01,740 - mlc.test_util.test_context - INFO - Compression done in 0s. Compression ratio: 0.404 2026-03-07 10:23:01,740 - mlc.test_util.test_context - INFO - Re-allocate dram memory 2026-03-07 10:23:01,740 - mlc.test_util.test_context - INFO - Generating metrics 2026-03-07 10:23:01,822 - mlc.test_util.test_context - INFO - Writing report to MLC file 2026-03-07 10:23:01,822 - mlc.test_util.test_context - INFO - Writing instructions to MLC file 2026-03-07 10:23:02,187 - mlc.test_util.test_context - INFO - Code generation done 2026-03-07 10:23:02,189 - afe.backends.mla.afe_to_n2a_compiler.n2a_compiler_operations - INFO - Evaluate model and writing chk file done in 0s. 
2026-03-07 10:23:02,258 - afe.backends.mpk.interface - INFO - ============================== 2026-03-07 10:23:02,259 - afe.backends.mpk.interface - INFO - Compilation summary: 2026-03-07 10:23:02,259 - afe.backends.mpk.interface - INFO - ------------------------------ 2026-03-07 10:23:02,259 - afe.backends.mpk.interface - INFO - Desired batch size: 1 2026-03-07 10:23:02,259 - afe.backends.mpk.interface - INFO - Achieved batch size: 1 2026-03-07 10:23:02,259 - afe.backends.mpk.interface - INFO - ------------------------------ 2026-03-07 10:23:02,259 - afe.backends.mpk.interface - INFO - Plugin distribution per backend: 2026-03-07 10:23:02,259 - afe.backends.mpk.interface - INFO - MLA : 1 2026-03-07 10:23:02,259 - afe.backends.mpk.interface - INFO - EV74: 4 2026-03-07 10:23:02,259 - afe.backends.mpk.interface - INFO - A65 : 0 2026-03-07 10:23:02,259 - afe.backends.mpk.interface - INFO - ------------------------------ 2026-03-07 10:23:02,259 - afe.backends.mpk.interface - INFO - Generated files: LFM2.5-1.2B-Instruct_language_n128_cache_token1536_mpk.json, LFM2.5-1.2B-Instruct_language_n128_cache_token1536_stage1_mla_stats.yaml, LFM2.5-1.2B-Instruct_language_n128_cache_token1536_stage1_mla.elf, llima-compile 2026-03-07 10:23:02,541 - afe.ir.serializer.api - INFO - Loading model from file: CompiledModels/LFM2.5-1.2B-Instruct/sima_files/sdk/LFM2.5-1.2B-Instruct_language_n1_cache_token511.sima 2026-03-07 10:23:02,551 - afe.apis.model - WARNING - A deepcopy is disabled hence model graph will be mutated and no other APIs can be called after the compile step. 
2026-03-07 10:23:02,551 - afe.apis.model - INFO - Compiling quantized net "LFM2.5-1.2B-Instruct_language_n1_cache_token511" 2026-03-07 10:23:02,553 - afe.core.compile_networks - INFO - The model is split into 1 segments for MLA and APU 2026-03-07 10:23:02,553 - afe.backends.mla.afe_to_n2a_compiler.n2a_compiler_operations - INFO - Stage 1 of 1: compiling for MLA 2026-03-07 10:23:02,554 - mlc.compiler.model_graph.l1_based - INFO - Get model compilation properties 2026-03-07 10:23:02,554 - mlc.compiler.model_graph.l1_based - INFO - Checking if layers fit into memory without splitting 2026-03-07 10:23:02,554 - mlc.compiler.model_graph.l1_based - INFO - Generating the list of feasible layouts 2026-03-07 10:23:02,554 - mlc.compiler.model_graph.l1_based - INFO - Setting layer parameters 2026-03-07 10:23:03,579 - mlc.test_util.test_context - INFO - Code generation done 2026-03-07 10:23:03,580 - afe.backends.mla.afe_to_n2a_compiler.n2a_compiler_operations - INFO - Evaluate model and writing chk file done in 0s. 
2026-03-07 10:23:03,615 - mlc.compiler.model_graph.l1_based - INFO - Setting tile layouts 2026-03-07 10:23:03,746 - mlc.compiler.model_graph.l1_based - INFO - Allocating memory for IFM/OFM tensors 2026-03-07 10:23:03,758 - mlc.compiler.model_graph.l1_based - INFO - Get model compilation properties done 2026-03-07 10:23:04,034 - afe.backends.mla.afe_to_n2a_compiler.n2a_backend_runner - INFO - Start evaluate process to generate check file 2026-03-07 10:23:04,034 - mlc.kernel.layout - INFO - L2 caching mode: L2CachingMode.NONE 2026-03-07 10:23:04,041 - mlc.compiler.model_graph.l1_based - INFO - Generating model code 2026-03-07 10:23:05,080 - afe.backends.mpk.interface - INFO - ============================== 2026-03-07 10:23:05,080 - afe.backends.mpk.interface - INFO - Compilation summary: 2026-03-07 10:23:05,080 - afe.backends.mpk.interface - INFO - ------------------------------ 2026-03-07 10:23:05,080 - afe.backends.mpk.interface - INFO - Desired batch size: 1 2026-03-07 10:23:05,081 - afe.backends.mpk.interface - INFO - Achieved batch size: 1 2026-03-07 10:23:05,081 - afe.backends.mpk.interface - INFO - ------------------------------ 2026-03-07 10:23:05,081 - afe.backends.mpk.interface - INFO - Plugin distribution per backend: 2026-03-07 10:23:05,081 - afe.backends.mpk.interface - INFO - MLA : 1 2026-03-07 10:23:05,081 - afe.backends.mpk.interface - INFO - EV74: 5 2026-03-07 10:23:05,081 - afe.backends.mpk.interface - INFO - A65 : 0 2026-03-07 10:23:05,081 - afe.backends.mpk.interface - INFO - ------------------------------ 2026-03-07 10:23:05,081 - afe.backends.mpk.interface - INFO - Generated files: LFM2.5-1.2B-Instruct_language_n1_cache_token255_stage1_mla.elf, LFM2.5-1.2B-Instruct_language_n1_cache_token255_stage1_mla_stats.yaml, LFM2.5-1.2B-Instruct_language_n1_cache_token255_mpk.json, llima-compile 2026-03-07 10:23:05,367 - afe.ir.serializer.api - INFO - Loading model from file: 
CompiledModels/LFM2.5-1.2B-Instruct/sima_files/sdk/LFM2.5-1.2B-Instruct_language_n1_cache_token639.sima 2026-03-07 10:23:05,382 - afe.apis.model - WARNING - A deepcopy is disabled hence model graph will be mutated and no other APIs can be called after the compile step. 2026-03-07 10:23:05,382 - afe.apis.model - INFO - Compiling quantized net "LFM2.5-1.2B-Instruct_language_n1_cache_token639" 2026-03-07 10:23:05,384 - afe.core.compile_networks - INFO - The model is split into 1 segments for MLA and APU 2026-03-07 10:23:05,384 - afe.backends.mla.afe_to_n2a_compiler.n2a_compiler_operations - INFO - Stage 1 of 1: compiling for MLA 2026-03-07 10:23:05,385 - mlc.compiler.model_graph.l1_based - INFO - Get model compilation properties 2026-03-07 10:23:05,385 - mlc.compiler.model_graph.l1_based - INFO - Checking if layers fit into memory without splitting 2026-03-07 10:23:05,385 - mlc.compiler.model_graph.l1_based - INFO - Generating the list of feasible layouts 2026-03-07 10:23:05,385 - mlc.compiler.model_graph.l1_based - INFO - Setting layer parameters 2026-03-07 10:23:06,297 - afe.backends.mpk.interface - INFO - ============================== 2026-03-07 10:23:06,297 - afe.backends.mpk.interface - INFO - Compilation summary: 2026-03-07 10:23:06,297 - afe.backends.mpk.interface - INFO - ------------------------------ 2026-03-07 10:23:06,297 - afe.backends.mpk.interface - INFO - Desired batch size: 1 2026-03-07 10:23:06,297 - afe.backends.mpk.interface - INFO - Achieved batch size: 1 2026-03-07 10:23:06,297 - afe.backends.mpk.interface - INFO - ------------------------------ 2026-03-07 10:23:06,297 - afe.backends.mpk.interface - INFO - Plugin distribution per backend: 2026-03-07 10:23:06,297 - afe.backends.mpk.interface - INFO - MLA : 1 2026-03-07 10:23:06,297 - afe.backends.mpk.interface - INFO - EV74: 5 2026-03-07 10:23:06,297 - afe.backends.mpk.interface - INFO - A65 : 0 2026-03-07 10:23:06,297 - afe.backends.mpk.interface - INFO - ------------------------------ 
2026-03-07 10:23:06,297 - afe.backends.mpk.interface - INFO - Generated files: LFM2.5-1.2B-Instruct_language_n1_cache_token383_mpk.json, LFM2.5-1.2B-Instruct_language_n1_cache_token383_stage1_mla.elf, LFM2.5-1.2B-Instruct_language_n1_cache_token383_stage1_mla_stats.yaml, llima-compile 2026-03-07 10:23:06,580 - afe.ir.serializer.api - INFO - Loading model from file: CompiledModels/LFM2.5-1.2B-Instruct/sima_files/sdk/LFM2.5-1.2B-Instruct_language_n1_cache_token767.sima 2026-03-07 10:23:06,591 - afe.apis.model - WARNING - A deepcopy is disabled hence model graph will be mutated and no other APIs can be called after the compile step. 2026-03-07 10:23:06,591 - afe.apis.model - INFO - Compiling quantized net "LFM2.5-1.2B-Instruct_language_n1_cache_token767" 2026-03-07 10:23:06,593 - afe.core.compile_networks - INFO - The model is split into 1 segments for MLA and APU 2026-03-07 10:23:06,593 - afe.backends.mla.afe_to_n2a_compiler.n2a_compiler_operations - INFO - Stage 1 of 1: compiling for MLA 2026-03-07 10:23:06,594 - mlc.compiler.model_graph.l1_based - INFO - Get model compilation properties 2026-03-07 10:23:06,594 - mlc.compiler.model_graph.l1_based - INFO - Checking if layers fit into memory without splitting 2026-03-07 10:23:06,594 - mlc.compiler.model_graph.l1_based - INFO - Generating the list of feasible layouts 2026-03-07 10:23:06,594 - mlc.compiler.model_graph.l1_based - INFO - Setting layer parameters 2026-03-07 10:23:07,672 - mlc.compiler.model_graph.l1_based - INFO - Setting tile layouts 2026-03-07 10:23:07,784 - mlc.compiler.model_graph.l1_based - INFO - Allocating memory for IFM/OFM tensors 2026-03-07 10:23:07,797 - mlc.compiler.model_graph.l1_based - INFO - Get model compilation properties done 2026-03-07 10:23:08,139 - afe.backends.mla.afe_to_n2a_compiler.n2a_backend_runner - INFO - Start evaluate process to generate check file 2026-03-07 10:23:08,140 - mlc.kernel.layout - INFO - L2 caching mode: L2CachingMode.NONE 2026-03-07 10:23:08,145 - 
mlc.compiler.model_graph.l1_based - INFO - Generating model code 2026-03-07 10:23:08,531 - mlc.compiler.model_graph.l1_based - INFO - Setting tile layouts 2026-03-07 10:23:08,652 - mlc.compiler.model_graph.l1_based - INFO - Allocating memory for IFM/OFM tensors 2026-03-07 10:23:08,665 - mlc.compiler.model_graph.l1_based - INFO - Get model compilation properties done 2026-03-07 10:23:08,974 - mlc.test_util.test_context - INFO - Code generation done 2026-03-07 10:23:08,984 - afe.backends.mla.afe_to_n2a_compiler.n2a_compiler_operations - INFO - Evaluate model and writing chk file done in 12s. 2026-03-07 10:23:09,087 - afe.backends.mla.afe_to_n2a_compiler.n2a_backend_runner - INFO - Start evaluate process to generate check file 2026-03-07 10:23:09,091 - mlc.kernel.layout - INFO - L2 caching mode: L2CachingMode.NONE 2026-03-07 10:23:09,099 - mlc.compiler.model_graph.l1_based - INFO - Generating model code 2026-03-07 10:23:18,301 - mlc.compiler.model_graph.l1_based - INFO - Generating model code done 2026-03-07 10:23:18,397 - mlc.test_util.test_context - INFO - Scheduling instructions 2026-03-07 10:23:23,163 - afe.backends.mpk.interface - INFO - ============================== 2026-03-07 10:23:23,163 - afe.backends.mpk.interface - INFO - Compilation summary: 2026-03-07 10:23:23,163 - afe.backends.mpk.interface - INFO - ------------------------------ 2026-03-07 10:23:23,163 - afe.backends.mpk.interface - INFO - Desired batch size: 1 2026-03-07 10:23:23,163 - afe.backends.mpk.interface - INFO - Achieved batch size: 1 2026-03-07 10:23:23,163 - afe.backends.mpk.interface - INFO - ------------------------------ 2026-03-07 10:23:23,163 - afe.backends.mpk.interface - INFO - Plugin distribution per backend: 2026-03-07 10:23:23,163 - afe.backends.mpk.interface - INFO - MLA : 1 2026-03-07 10:23:23,163 - afe.backends.mpk.interface - INFO - EV74: 4 2026-03-07 10:23:23,163 - afe.backends.mpk.interface - INFO - A65 : 0 2026-03-07 10:23:23,163 - afe.backends.mpk.interface - INFO - 
------------------------------ 2026-03-07 10:23:23,163 - afe.backends.mpk.interface - INFO - Generated files: LFM2.5-1.2B-Instruct_language_n128_cache_token1664_stage1_mla.elf, LFM2.5-1.2B-Instruct_language_n128_cache_token1664_stage1_mla_stats.yaml, LFM2.5-1.2B-Instruct_language_n128_cache_token1664_mpk.json, llima-compile 2026-03-07 10:23:23,180 - mlc.compiler.model_graph.l1_based - INFO - Generating model code done 2026-03-07 10:23:23,261 - mlc.test_util.test_context - INFO - Scheduling instructions 2026-03-07 10:23:23,451 - afe.ir.serializer.api - INFO - Loading model from file: CompiledModels/LFM2.5-1.2B-Instruct/sima_files/sdk/LFM2.5-1.2B-Instruct_language_n1_cache_token895.sima 2026-03-07 10:23:23,461 - afe.apis.model - WARNING - A deepcopy is disabled hence model graph will be mutated and no other APIs can be called after the compile step. 2026-03-07 10:23:23,461 - afe.apis.model - INFO - Compiling quantized net "LFM2.5-1.2B-Instruct_language_n1_cache_token895" 2026-03-07 10:23:23,462 - afe.core.compile_networks - INFO - The model is split into 1 segments for MLA and APU 2026-03-07 10:23:23,463 - afe.backends.mla.afe_to_n2a_compiler.n2a_compiler_operations - INFO - Stage 1 of 1: compiling for MLA 2026-03-07 10:23:23,464 - mlc.compiler.model_graph.l1_based - INFO - Get model compilation properties 2026-03-07 10:23:23,464 - mlc.compiler.model_graph.l1_based - INFO - Checking if layers fit into memory without splitting 2026-03-07 10:23:23,464 - mlc.compiler.model_graph.l1_based - INFO - Generating the list of feasible layouts 2026-03-07 10:23:23,464 - mlc.compiler.model_graph.l1_based - INFO - Setting layer parameters 2026-03-07 10:23:25,058 - mlc.compiler.model_graph.l1_based - INFO - Setting tile layouts 2026-03-07 10:23:25,157 - mlc.compiler.model_graph.l1_based - INFO - Allocating memory for IFM/OFM tensors 2026-03-07 10:23:25,169 - mlc.compiler.model_graph.l1_based - INFO - Get model compilation properties done 2026-03-07 10:23:25,636 - 
afe.backends.mla.afe_to_n2a_compiler.n2a_backend_runner - INFO - Start evaluate process to generate check file 2026-03-07 10:23:25,637 - mlc.kernel.layout - INFO - L2 caching mode: L2CachingMode.NONE 2026-03-07 10:23:25,642 - mlc.compiler.model_graph.l1_based - INFO - Generating model code 2026-03-07 10:23:25,852 - mlc.compiler.model_graph.l1_based - INFO - Generating model code done 2026-03-07 10:23:25,948 - mlc.test_util.test_context - INFO - Scheduling instructions 2026-03-07 10:23:26,592 - mlc.test_util.test_context - INFO - Performing DRAM/L2 synchronization 2026-03-07 10:23:27,186 - mlc.test_util.test_context - INFO - Inserting and merging NOPs 2026-03-07 10:23:27,555 - mlc.test_util.test_context - INFO - Setting IQ sync bits 2026-03-07 10:23:27,587 - mlc.test_util.test_context - INFO - Run compression 2026-03-07 10:23:27,590 - mlc.test_util.test_context - INFO - Compression done in 0s. Compression ratio: 0.404 2026-03-07 10:23:27,590 - mlc.test_util.test_context - INFO - Re-allocate dram memory 2026-03-07 10:23:27,591 - mlc.test_util.test_context - INFO - Generating metrics 2026-03-07 10:23:27,688 - mlc.test_util.test_context - INFO - Writing report to MLC file 2026-03-07 10:23:27,688 - mlc.test_util.test_context - INFO - Writing instructions to MLC file 2026-03-07 10:23:29,701 - mlc.test_util.test_context - INFO - Code generation done 2026-03-07 10:23:29,703 - afe.backends.mla.afe_to_n2a_compiler.n2a_compiler_operations - INFO - Evaluate model and writing chk file done in 0s. 
2026-03-07 10:23:30,570 - mlc.test_util.test_context - INFO - Performing DRAM/L2 synchronization 2026-03-07 10:23:30,905 - mlc.compiler.model_graph.l1_based - INFO - Generating model code done 2026-03-07 10:23:31,108 - mlc.test_util.test_context - INFO - Inserting and merging NOPs 2026-03-07 10:23:31,190 - mlc.test_util.test_context - INFO - Performing DRAM/L2 synchronization 2026-03-07 10:23:31,304 - mlc.test_util.test_context - INFO - Scheduling instructions 2026-03-07 10:23:31,427 - mlc.test_util.test_context - INFO - Setting IQ sync bits 2026-03-07 10:23:31,454 - mlc.test_util.test_context - INFO - Run compression 2026-03-07 10:23:31,456 - mlc.test_util.test_context - INFO - Compression done in 0s. Compression ratio: 0.404 2026-03-07 10:23:31,457 - mlc.test_util.test_context - INFO - Re-allocate dram memory 2026-03-07 10:23:31,457 - mlc.test_util.test_context - INFO - Generating metrics 2026-03-07 10:23:31,545 - mlc.test_util.test_context - INFO - Writing report to MLC file 2026-03-07 10:23:31,546 - mlc.test_util.test_context - INFO - Writing instructions to MLC file 2026-03-07 10:23:33,007 - afe.backends.mpk.interface - INFO - ============================== 2026-03-07 10:23:33,007 - afe.backends.mpk.interface - INFO - Compilation summary: 2026-03-07 10:23:33,007 - afe.backends.mpk.interface - INFO - ------------------------------ 2026-03-07 10:23:33,007 - afe.backends.mpk.interface - INFO - Desired batch size: 1 2026-03-07 10:23:33,007 - afe.backends.mpk.interface - INFO - Achieved batch size: 1 2026-03-07 10:23:33,007 - afe.backends.mpk.interface - INFO - ------------------------------ 2026-03-07 10:23:33,007 - afe.backends.mpk.interface - INFO - Plugin distribution per backend: 2026-03-07 10:23:33,007 - afe.backends.mpk.interface - INFO - MLA : 1 2026-03-07 10:23:33,007 - afe.backends.mpk.interface - INFO - EV74: 5 2026-03-07 10:23:33,007 - afe.backends.mpk.interface - INFO - A65 : 0 2026-03-07 10:23:33,007 - afe.backends.mpk.interface - INFO - 
------------------------------ 2026-03-07 10:23:33,007 - afe.backends.mpk.interface - INFO - Generated files: LFM2.5-1.2B-Instruct_language_n1_cache_token511_stage1_mla.elf, LFM2.5-1.2B-Instruct_language_n1_cache_token511_stage1_mla_stats.yaml, LFM2.5-1.2B-Instruct_language_n1_cache_token511_mpk.json, llima-compile 2026-03-07 10:23:33,289 - afe.ir.serializer.api - INFO - Loading model from file: CompiledModels/LFM2.5-1.2B-Instruct/sima_files/sdk/LFM2.5-1.2B-Instruct_language_n1_cache_token1023.sima 2026-03-07 10:23:33,300 - afe.apis.model - WARNING - A deepcopy is disabled hence model graph will be mutated and no other APIs can be called after the compile step. 2026-03-07 10:23:33,300 - afe.apis.model - INFO - Compiling quantized net "LFM2.5-1.2B-Instruct_language_n1_cache_token1023" 2026-03-07 10:23:33,302 - afe.core.compile_networks - INFO - The model is split into 1 segments for MLA and APU 2026-03-07 10:23:33,302 - afe.backends.mla.afe_to_n2a_compiler.n2a_compiler_operations - INFO - Stage 1 of 1: compiling for MLA 2026-03-07 10:23:33,303 - mlc.compiler.model_graph.l1_based - INFO - Get model compilation properties 2026-03-07 10:23:33,303 - mlc.compiler.model_graph.l1_based - INFO - Checking if layers fit into memory without splitting 2026-03-07 10:23:33,303 - mlc.compiler.model_graph.l1_based - INFO - Generating the list of feasible layouts 2026-03-07 10:23:33,303 - mlc.compiler.model_graph.l1_based - INFO - Setting layer parameters 2026-03-07 10:23:33,354 - mlc.test_util.test_context - INFO - Code generation done 2026-03-07 10:23:33,355 - afe.backends.mla.afe_to_n2a_compiler.n2a_compiler_operations - INFO - Evaluate model and writing chk file done in 0s. 
2026-03-07 10:23:34,506 - mlc.test_util.test_context - INFO - Inserting and merging NOPs 2026-03-07 10:23:34,539 - mlc.test_util.test_context - INFO - Performing DRAM/L2 synchronization 2026-03-07 10:23:35,154 - mlc.test_util.test_context - INFO - Inserting and merging NOPs 2026-03-07 10:23:35,228 - mlc.compiler.model_graph.l1_based - INFO - Setting tile layouts 2026-03-07 10:23:35,327 - mlc.compiler.model_graph.l1_based - INFO - Allocating memory for IFM/OFM tensors 2026-03-07 10:23:35,339 - mlc.compiler.model_graph.l1_based - INFO - Get model compilation properties done 2026-03-07 10:23:35,524 - mlc.test_util.test_context - INFO - Setting IQ sync bits 2026-03-07 10:23:35,555 - mlc.test_util.test_context - INFO - Run compression 2026-03-07 10:23:35,557 - mlc.test_util.test_context - INFO - Compression done in 0s. Compression ratio: 0.404 2026-03-07 10:23:35,558 - mlc.test_util.test_context - INFO - Re-allocate dram memory 2026-03-07 10:23:35,558 - mlc.test_util.test_context - INFO - Generating metrics 2026-03-07 10:23:35,656 - mlc.test_util.test_context - INFO - Writing report to MLC file 2026-03-07 10:23:35,656 - mlc.test_util.test_context - INFO - Writing instructions to MLC file 2026-03-07 10:23:35,874 - afe.backends.mla.afe_to_n2a_compiler.n2a_backend_runner - INFO - Start evaluate process to generate check file 2026-03-07 10:23:35,875 - mlc.kernel.layout - INFO - L2 caching mode: L2CachingMode.NONE 2026-03-07 10:23:35,882 - mlc.compiler.model_graph.l1_based - INFO - Generating model code 2026-03-07 10:23:36,014 - mlc.test_util.test_context - INFO - Setting IQ sync bits 2026-03-07 10:23:36,142 - mlc.test_util.test_context - INFO - Run compression 2026-03-07 10:23:36,208 - mlc.test_util.test_context - INFO - Compression done in 0s. 
Compression ratio: 0.215 2026-03-07 10:23:36,208 - mlc.test_util.test_context - INFO - Re-allocate dram memory 2026-03-07 10:23:36,391 - afe.backends.mpk.interface - INFO - ============================== 2026-03-07 10:23:36,391 - afe.backends.mpk.interface - INFO - Compilation summary: 2026-03-07 10:23:36,391 - afe.backends.mpk.interface - INFO - ------------------------------ 2026-03-07 10:23:36,391 - afe.backends.mpk.interface - INFO - Desired batch size: 1 2026-03-07 10:23:36,391 - afe.backends.mpk.interface - INFO - Achieved batch size: 1 2026-03-07 10:23:36,391 - afe.backends.mpk.interface - INFO - ------------------------------ 2026-03-07 10:23:36,391 - afe.backends.mpk.interface - INFO - Plugin distribution per backend: 2026-03-07 10:23:36,391 - afe.backends.mpk.interface - INFO - MLA : 1 2026-03-07 10:23:36,391 - afe.backends.mpk.interface - INFO - EV74: 5 2026-03-07 10:23:36,391 - afe.backends.mpk.interface - INFO - A65 : 0 2026-03-07 10:23:36,391 - afe.backends.mpk.interface - INFO - ------------------------------ 2026-03-07 10:23:36,391 - afe.backends.mpk.interface - INFO - Generated files: LFM2.5-1.2B-Instruct_language_n1_cache_token639_stage1_mla.elf, LFM2.5-1.2B-Instruct_language_n1_cache_token639_mpk.json, LFM2.5-1.2B-Instruct_language_n1_cache_token639_stage1_mla_stats.yaml, llima-compile 2026-03-07 10:23:36,675 - afe.ir.serializer.api - INFO - Loading model from file: CompiledModels/LFM2.5-1.2B-Instruct/sima_files/sdk/LFM2.5-1.2B-Instruct_language_n1_cache_token1151.sima 2026-03-07 10:23:36,684 - afe.apis.model - WARNING - A deepcopy is disabled hence model graph will be mutated and no other APIs can be called after the compile step. 
2026-03-07 10:23:36,684 - afe.apis.model - INFO - Compiling quantized net "LFM2.5-1.2B-Instruct_language_n1_cache_token1151" 2026-03-07 10:23:36,686 - afe.core.compile_networks - INFO - The model is split into 1 segments for MLA and APU 2026-03-07 10:23:36,686 - afe.backends.mla.afe_to_n2a_compiler.n2a_compiler_operations - INFO - Stage 1 of 1: compiling for MLA 2026-03-07 10:23:36,687 - mlc.compiler.model_graph.l1_based - INFO - Get model compilation properties 2026-03-07 10:23:36,687 - mlc.compiler.model_graph.l1_based - INFO - Checking if layers fit into memory without splitting 2026-03-07 10:23:36,687 - mlc.compiler.model_graph.l1_based - INFO - Generating the list of feasible layouts 2026-03-07 10:23:36,687 - mlc.compiler.model_graph.l1_based - INFO - Setting layer parameters 2026-03-07 10:23:37,739 - mlc.test_util.test_context - INFO - Code generation done 2026-03-07 10:23:37,741 - afe.backends.mla.afe_to_n2a_compiler.n2a_compiler_operations - INFO - Evaluate model and writing chk file done in 0s. 
2026-03-07 10:23:37,986 - mlc.test_util.test_context - INFO - Generating metrics 2026-03-07 10:23:38,227 - mlc.compiler.model_graph.l1_based - INFO - Setting tile layouts 2026-03-07 10:23:38,310 - mlc.compiler.model_graph.l1_based - INFO - Allocating memory for IFM/OFM tensors 2026-03-07 10:23:38,322 - mlc.compiler.model_graph.l1_based - INFO - Get model compilation properties done 2026-03-07 10:23:38,361 - mlc.test_util.test_context - INFO - Writing report to MLC file 2026-03-07 10:23:38,361 - mlc.test_util.test_context - INFO - Writing instructions to MLC file 2026-03-07 10:23:38,924 - afe.backends.mla.afe_to_n2a_compiler.n2a_backend_runner - INFO - Start evaluate process to generate check file 2026-03-07 10:23:38,925 - mlc.kernel.layout - INFO - L2 caching mode: L2CachingMode.NONE 2026-03-07 10:23:38,931 - mlc.compiler.model_graph.l1_based - INFO - Generating model code 2026-03-07 10:23:40,782 - mlc.compiler.model_graph.l1_based - INFO - Generating model code done 2026-03-07 10:23:40,863 - mlc.test_util.test_context - INFO - Scheduling instructions 2026-03-07 10:23:41,200 - afe.backends.mpk.interface - INFO - ============================== 2026-03-07 10:23:41,200 - afe.backends.mpk.interface - INFO - Compilation summary: 2026-03-07 10:23:41,200 - afe.backends.mpk.interface - INFO - ------------------------------ 2026-03-07 10:23:41,200 - afe.backends.mpk.interface - INFO - Desired batch size: 1 2026-03-07 10:23:41,200 - afe.backends.mpk.interface - INFO - Achieved batch size: 1 2026-03-07 10:23:41,200 - afe.backends.mpk.interface - INFO - ------------------------------ 2026-03-07 10:23:41,200 - afe.backends.mpk.interface - INFO - Plugin distribution per backend: 2026-03-07 10:23:41,200 - afe.backends.mpk.interface - INFO - MLA : 1 2026-03-07 10:23:41,200 - afe.backends.mpk.interface - INFO - EV74: 5 2026-03-07 10:23:41,200 - afe.backends.mpk.interface - INFO - A65 : 0 2026-03-07 10:23:41,200 - afe.backends.mpk.interface - INFO - ------------------------------ 
2026-03-07 10:23:41,200 - afe.backends.mpk.interface - INFO - Generated files: LFM2.5-1.2B-Instruct_language_n1_cache_token767_stage1_mla.elf, LFM2.5-1.2B-Instruct_language_n1_cache_token767_stage1_mla_stats.yaml, LFM2.5-1.2B-Instruct_language_n1_cache_token767_mpk.json, llima-compile 2026-03-07 10:23:41,481 - afe.ir.serializer.api - INFO - Loading model from file: CompiledModels/LFM2.5-1.2B-Instruct/sima_files/sdk/LFM2.5-1.2B-Instruct_language_n1_cache_token1279.sima 2026-03-07 10:23:41,490 - afe.apis.model - WARNING - A deepcopy is disabled hence model graph will be mutated and no other APIs can be called after the compile step. 2026-03-07 10:23:41,490 - afe.apis.model - INFO - Compiling quantized net "LFM2.5-1.2B-Instruct_language_n1_cache_token1279" 2026-03-07 10:23:41,492 - afe.core.compile_networks - INFO - The model is split into 1 segments for MLA and APU 2026-03-07 10:23:41,492 - afe.backends.mla.afe_to_n2a_compiler.n2a_compiler_operations - INFO - Stage 1 of 1: compiling for MLA 2026-03-07 10:23:41,493 - mlc.compiler.model_graph.l1_based - INFO - Get model compilation properties 2026-03-07 10:23:41,493 - mlc.compiler.model_graph.l1_based - INFO - Checking if layers fit into memory without splitting 2026-03-07 10:23:41,493 - mlc.compiler.model_graph.l1_based - INFO - Generating the list of feasible layouts 2026-03-07 10:23:41,493 - mlc.compiler.model_graph.l1_based - INFO - Setting layer parameters 2026-03-07 10:23:43,075 - mlc.compiler.model_graph.l1_based - INFO - Setting tile layouts 2026-03-07 10:23:43,168 - mlc.compiler.model_graph.l1_based - INFO - Allocating memory for IFM/OFM tensors 2026-03-07 10:23:43,181 - mlc.compiler.model_graph.l1_based - INFO - Get model compilation properties done 2026-03-07 10:23:43,851 - afe.backends.mla.afe_to_n2a_compiler.n2a_backend_runner - INFO - Start evaluate process to generate check file 2026-03-07 10:23:43,851 - mlc.kernel.layout - INFO - L2 caching mode: L2CachingMode.NONE 2026-03-07 10:23:43,857 - 
mlc.compiler.model_graph.l1_based - INFO - Generating model code 2026-03-07 10:23:46,151 - mlc.test_util.test_context - INFO - Code generation done 2026-03-07 10:23:46,163 - afe.backends.mla.afe_to_n2a_compiler.n2a_compiler_operations - INFO - Evaluate model and writing chk file done in 13s. 2026-03-07 10:23:48,117 - mlc.test_util.test_context - INFO - Performing DRAM/L2 synchronization 2026-03-07 10:23:48,649 - mlc.test_util.test_context - INFO - Inserting and merging NOPs 2026-03-07 10:23:48,968 - mlc.test_util.test_context - INFO - Setting IQ sync bits 2026-03-07 10:23:48,996 - mlc.test_util.test_context - INFO - Run compression 2026-03-07 10:23:48,998 - mlc.test_util.test_context - INFO - Compression done in 0s. Compression ratio: 0.404 2026-03-07 10:23:48,998 - mlc.test_util.test_context - INFO - Re-allocate dram memory 2026-03-07 10:23:48,998 - mlc.test_util.test_context - INFO - Generating metrics 2026-03-07 10:23:49,087 - mlc.test_util.test_context - INFO - Writing report to MLC file 2026-03-07 10:23:49,087 - mlc.test_util.test_context - INFO - Writing instructions to MLC file 2026-03-07 10:23:50,992 - mlc.test_util.test_context - INFO - Code generation done 2026-03-07 10:23:50,993 - afe.backends.mla.afe_to_n2a_compiler.n2a_compiler_operations - INFO - Evaluate model and writing chk file done in 0s. 
2026-03-07 10:23:52,495 - mlc.compiler.model_graph.l1_based - INFO - Generating model code done 2026-03-07 10:23:52,592 - mlc.test_util.test_context - INFO - Scheduling instructions 2026-03-07 10:23:54,848 - afe.backends.mpk.interface - INFO - ============================== 2026-03-07 10:23:54,848 - afe.backends.mpk.interface - INFO - Compilation summary: 2026-03-07 10:23:54,849 - afe.backends.mpk.interface - INFO - ------------------------------ 2026-03-07 10:23:54,849 - afe.backends.mpk.interface - INFO - Desired batch size: 1 2026-03-07 10:23:54,849 - afe.backends.mpk.interface - INFO - Achieved batch size: 1 2026-03-07 10:23:54,849 - afe.backends.mpk.interface - INFO - ------------------------------ 2026-03-07 10:23:54,849 - afe.backends.mpk.interface - INFO - Plugin distribution per backend: 2026-03-07 10:23:54,849 - afe.backends.mpk.interface - INFO - MLA : 1 2026-03-07 10:23:54,849 - afe.backends.mpk.interface - INFO - EV74: 5 2026-03-07 10:23:54,849 - afe.backends.mpk.interface - INFO - A65 : 0 2026-03-07 10:23:54,849 - afe.backends.mpk.interface - INFO - ------------------------------ 2026-03-07 10:23:54,849 - afe.backends.mpk.interface - INFO - Generated files: LFM2.5-1.2B-Instruct_language_n1_cache_token895_stage1_mla.elf, LFM2.5-1.2B-Instruct_language_n1_cache_token895_mpk.json, LFM2.5-1.2B-Instruct_language_n1_cache_token895_stage1_mla_stats.yaml, llima-compile 2026-03-07 10:23:55,131 - afe.ir.serializer.api - INFO - Loading model from file: CompiledModels/LFM2.5-1.2B-Instruct/sima_files/sdk/LFM2.5-1.2B-Instruct_language_n1_cache_token1407.sima 2026-03-07 10:23:55,140 - afe.apis.model - WARNING - A deepcopy is disabled hence model graph will be mutated and no other APIs can be called after the compile step. 
2026-03-07 10:23:55,141 - afe.apis.model - INFO - Compiling quantized net "LFM2.5-1.2B-Instruct_language_n1_cache_token1407" 2026-03-07 10:23:55,142 - afe.core.compile_networks - INFO - The model is split into 1 segments for MLA and APU 2026-03-07 10:23:55,142 - afe.backends.mla.afe_to_n2a_compiler.n2a_compiler_operations - INFO - Stage 1 of 1: compiling for MLA 2026-03-07 10:23:55,143 - mlc.compiler.model_graph.l1_based - INFO - Get model compilation properties 2026-03-07 10:23:55,143 - mlc.compiler.model_graph.l1_based - INFO - Checking if layers fit into memory without splitting 2026-03-07 10:23:55,143 - mlc.compiler.model_graph.l1_based - INFO - Generating the list of feasible layouts 2026-03-07 10:23:55,143 - mlc.compiler.model_graph.l1_based - INFO - Setting layer parameters 2026-03-07 10:23:55,717 - mlc.compiler.model_graph.l1_based - INFO - Generating model code done 2026-03-07 10:23:55,800 - mlc.test_util.test_context - INFO - Scheduling instructions 2026-03-07 10:23:56,660 - mlc.compiler.model_graph.l1_based - INFO - Setting tile layouts 2026-03-07 10:23:56,736 - mlc.compiler.model_graph.l1_based - INFO - Allocating memory for IFM/OFM tensors 2026-03-07 10:23:56,748 - mlc.compiler.model_graph.l1_based - INFO - Get model compilation properties done 2026-03-07 10:23:57,473 - afe.backends.mla.afe_to_n2a_compiler.n2a_backend_runner - INFO - Start evaluate process to generate check file 2026-03-07 10:23:57,474 - mlc.kernel.layout - INFO - L2 caching mode: L2CachingMode.NONE 2026-03-07 10:23:57,479 - mlc.compiler.model_graph.l1_based - INFO - Generating model code 2026-03-07 10:23:59,623 - afe.backends.mpk.interface - INFO - ============================== 2026-03-07 10:23:59,623 - afe.backends.mpk.interface - INFO - Compilation summary: 2026-03-07 10:23:59,623 - afe.backends.mpk.interface - INFO - ------------------------------ 2026-03-07 10:23:59,623 - afe.backends.mpk.interface - INFO - Desired batch size: 1 2026-03-07 10:23:59,623 - 
afe.backends.mpk.interface - INFO - Achieved batch size: 1 2026-03-07 10:23:59,623 - afe.backends.mpk.interface - INFO - ------------------------------ 2026-03-07 10:23:59,623 - afe.backends.mpk.interface - INFO - Plugin distribution per backend: 2026-03-07 10:23:59,623 - afe.backends.mpk.interface - INFO - MLA : 1 2026-03-07 10:23:59,623 - afe.backends.mpk.interface - INFO - EV74: 4 2026-03-07 10:23:59,623 - afe.backends.mpk.interface - INFO - A65 : 0 2026-03-07 10:23:59,623 - afe.backends.mpk.interface - INFO - ------------------------------ 2026-03-07 10:23:59,623 - afe.backends.mpk.interface - INFO - Generated files: LFM2.5-1.2B-Instruct_language_n128_cache_token1792_mpk.json, LFM2.5-1.2B-Instruct_language_n128_cache_token1792_stage1_mla_stats.yaml, LFM2.5-1.2B-Instruct_language_n128_cache_token1792_stage1_mla.elf, llima-compile 2026-03-07 10:23:59,913 - afe.ir.serializer.api - INFO - Loading model from file: CompiledModels/LFM2.5-1.2B-Instruct/sima_files/sdk/LFM2.5-1.2B-Instruct_language_n1_cache_token1535.sima 2026-03-07 10:23:59,922 - afe.apis.model - WARNING - A deepcopy is disabled hence model graph will be mutated and no other APIs can be called after the compile step. 
2026-03-07 10:23:59,922 - afe.apis.model - INFO - Compiling quantized net "LFM2.5-1.2B-Instruct_language_n1_cache_token1535" 2026-03-07 10:23:59,924 - afe.core.compile_networks - INFO - The model is split into 1 segments for MLA and APU 2026-03-07 10:23:59,924 - afe.backends.mla.afe_to_n2a_compiler.n2a_compiler_operations - INFO - Stage 1 of 1: compiling for MLA 2026-03-07 10:23:59,925 - mlc.compiler.model_graph.l1_based - INFO - Get model compilation properties 2026-03-07 10:23:59,925 - mlc.compiler.model_graph.l1_based - INFO - Checking if layers fit into memory without splitting 2026-03-07 10:23:59,925 - mlc.compiler.model_graph.l1_based - INFO - Generating the list of feasible layouts 2026-03-07 10:23:59,925 - mlc.compiler.model_graph.l1_based - INFO - Setting layer parameters 2026-03-07 10:24:01,099 - mlc.compiler.model_graph.l1_based - INFO - Setting tile layouts 2026-03-07 10:24:01,110 - mlc.test_util.test_context - INFO - Performing DRAM/L2 synchronization 2026-03-07 10:24:01,174 - mlc.compiler.model_graph.l1_based - INFO - Allocating memory for IFM/OFM tensors 2026-03-07 10:24:01,186 - mlc.compiler.model_graph.l1_based - INFO - Get model compilation properties done 2026-03-07 10:24:01,756 - mlc.test_util.test_context - INFO - Inserting and merging NOPs 2026-03-07 10:24:01,993 - afe.backends.mla.afe_to_n2a_compiler.n2a_backend_runner - INFO - Start evaluate process to generate check file 2026-03-07 10:24:01,994 - mlc.kernel.layout - INFO - L2 caching mode: L2CachingMode.NONE 2026-03-07 10:24:02,001 - mlc.compiler.model_graph.l1_based - INFO - Generating model code 2026-03-07 10:24:02,133 - mlc.test_util.test_context - INFO - Setting IQ sync bits 2026-03-07 10:24:02,166 - mlc.test_util.test_context - INFO - Run compression 2026-03-07 10:24:02,169 - mlc.test_util.test_context - INFO - Compression done in 0s. 
Compression ratio: 0.404 2026-03-07 10:24:02,169 - mlc.test_util.test_context - INFO - Re-allocate dram memory 2026-03-07 10:24:02,170 - mlc.test_util.test_context - INFO - Generating metrics 2026-03-07 10:24:02,270 - mlc.test_util.test_context - INFO - Writing report to MLC file 2026-03-07 10:24:02,270 - mlc.test_util.test_context - INFO - Writing instructions to MLC file 2026-03-07 10:24:02,438 - mlc.compiler.model_graph.l1_based - INFO - Generating model code done 2026-03-07 10:24:02,536 - mlc.test_util.test_context - INFO - Scheduling instructions 2026-03-07 10:24:03,121 - mlc.test_util.test_context - INFO - Performing DRAM/L2 synchronization 2026-03-07 10:24:03,689 - mlc.test_util.test_context - INFO - Inserting and merging NOPs 2026-03-07 10:24:04,015 - mlc.test_util.test_context - INFO - Setting IQ sync bits 2026-03-07 10:24:04,042 - mlc.test_util.test_context - INFO - Run compression 2026-03-07 10:24:04,045 - mlc.test_util.test_context - INFO - Compression done in 0s. Compression ratio: 0.404 2026-03-07 10:24:04,045 - mlc.test_util.test_context - INFO - Re-allocate dram memory 2026-03-07 10:24:04,045 - mlc.test_util.test_context - INFO - Generating metrics 2026-03-07 10:24:04,132 - mlc.test_util.test_context - INFO - Writing report to MLC file 2026-03-07 10:24:04,132 - mlc.test_util.test_context - INFO - Writing instructions to MLC file 2026-03-07 10:24:04,349 - mlc.test_util.test_context - INFO - Code generation done 2026-03-07 10:24:04,351 - afe.backends.mla.afe_to_n2a_compiler.n2a_compiler_operations - INFO - Evaluate model and writing chk file done in 0s. 2026-03-07 10:24:05,552 - mlc.test_util.test_context - INFO - Performing DRAM/L2 synchronization 2026-03-07 10:24:05,999 - mlc.test_util.test_context - INFO - Code generation done 2026-03-07 10:24:06,000 - afe.backends.mla.afe_to_n2a_compiler.n2a_compiler_operations - INFO - Evaluate model and writing chk file done in 0s. 
2026-03-07 10:24:07,946 - afe.backends.mpk.interface - INFO - ==============================
2026-03-07 10:24:07,947 - afe.backends.mpk.interface - INFO - Compilation summary:
2026-03-07 10:24:07,947 - afe.backends.mpk.interface - INFO - ------------------------------
2026-03-07 10:24:07,947 - afe.backends.mpk.interface - INFO - Desired batch size: 1
2026-03-07 10:24:07,947 - afe.backends.mpk.interface - INFO - Achieved batch size: 1
2026-03-07 10:24:07,947 - afe.backends.mpk.interface - INFO - ------------------------------
2026-03-07 10:24:07,947 - afe.backends.mpk.interface - INFO - Plugin distribution per backend:
2026-03-07 10:24:07,947 - afe.backends.mpk.interface - INFO - MLA : 1
2026-03-07 10:24:07,947 - afe.backends.mpk.interface - INFO - EV74: 5
2026-03-07 10:24:07,947 - afe.backends.mpk.interface - INFO - A65 : 0
2026-03-07 10:24:07,947 - afe.backends.mpk.interface - INFO - ------------------------------
2026-03-07 10:24:07,947 - afe.backends.mpk.interface - INFO - Generated files: LFM2.5-1.2B-Instruct_language_n1_cache_token1023_mpk.json, LFM2.5-1.2B-Instruct_language_n1_cache_token1023_stage1_mla.elf, LFM2.5-1.2B-Instruct_language_n1_cache_token1023_stage1_mla_stats.yaml, llima-compile
2026-03-07 10:24:08,225 - afe.ir.serializer.api - INFO - Loading model from file: CompiledModels/LFM2.5-1.2B-Instruct/sima_files/sdk/LFM2.5-1.2B-Instruct_language_n1_cache_token1663.sima
2026-03-07 10:24:08,235 - afe.apis.model - WARNING - A deepcopy is disabled hence model graph will be mutated and no other APIs can be called after the compile step.
2026-03-07 10:24:08,235 - afe.apis.model - INFO - Compiling quantized net "LFM2.5-1.2B-Instruct_language_n1_cache_token1663" 2026-03-07 10:24:08,236 - afe.core.compile_networks - INFO - The model is split into 1 segments for MLA and APU 2026-03-07 10:24:08,236 - afe.backends.mla.afe_to_n2a_compiler.n2a_compiler_operations - INFO - Stage 1 of 1: compiling for MLA 2026-03-07 10:24:08,237 - mlc.compiler.model_graph.l1_based - INFO - Get model compilation properties 2026-03-07 10:24:08,238 - mlc.compiler.model_graph.l1_based - INFO - Checking if layers fit into memory without splitting 2026-03-07 10:24:08,238 - mlc.compiler.model_graph.l1_based - INFO - Generating the list of feasible layouts 2026-03-07 10:24:08,238 - mlc.compiler.model_graph.l1_based - INFO - Setting layer parameters 2026-03-07 10:24:09,256 - mlc.test_util.test_context - INFO - Inserting and merging NOPs 2026-03-07 10:24:09,360 - mlc.compiler.model_graph.l1_based - INFO - Trigger large tensor support due to: MLA_0/slice_concat_0, 581 2026-03-07 10:24:09,360 - mlc.compiler.model_graph.l1_based - INFO - Segmenting layers by spatial/channel dimension 2026-03-07 10:24:09,380 - afe.backends.mpk.interface - INFO - ============================== 2026-03-07 10:24:09,380 - afe.backends.mpk.interface - INFO - Compilation summary: 2026-03-07 10:24:09,380 - afe.backends.mpk.interface - INFO - ------------------------------ 2026-03-07 10:24:09,380 - afe.backends.mpk.interface - INFO - Desired batch size: 1 2026-03-07 10:24:09,380 - afe.backends.mpk.interface - INFO - Achieved batch size: 1 2026-03-07 10:24:09,380 - afe.backends.mpk.interface - INFO - ------------------------------ 2026-03-07 10:24:09,380 - afe.backends.mpk.interface - INFO - Plugin distribution per backend: 2026-03-07 10:24:09,380 - afe.backends.mpk.interface - INFO - MLA : 1 2026-03-07 10:24:09,380 - afe.backends.mpk.interface - INFO - EV74: 5 2026-03-07 10:24:09,380 - afe.backends.mpk.interface - INFO - A65 : 0 2026-03-07 10:24:09,380 - 
afe.backends.mpk.interface - INFO - ------------------------------ 2026-03-07 10:24:09,380 - afe.backends.mpk.interface - INFO - Generated files: LFM2.5-1.2B-Instruct_language_n1_cache_token1151_stage1_mla.elf, LFM2.5-1.2B-Instruct_language_n1_cache_token1151_stage1_mla_stats.yaml, LFM2.5-1.2B-Instruct_language_n1_cache_token1151_mpk.json, llima-compile 2026-03-07 10:24:09,668 - afe.ir.serializer.api - INFO - Loading model from file: CompiledModels/LFM2.5-1.2B-Instruct/sima_files/sdk/LFM2.5-1.2B-Instruct_language_n1_cache_token1791.sima 2026-03-07 10:24:09,681 - afe.apis.model - WARNING - A deepcopy is disabled hence model graph will be mutated and no other APIs can be called after the compile step. 2026-03-07 10:24:09,681 - afe.apis.model - INFO - Compiling quantized net "LFM2.5-1.2B-Instruct_language_n1_cache_token1791" 2026-03-07 10:24:09,682 - afe.core.compile_networks - INFO - The model is split into 1 segments for MLA and APU 2026-03-07 10:24:09,682 - afe.backends.mla.afe_to_n2a_compiler.n2a_compiler_operations - INFO - Stage 1 of 1: compiling for MLA 2026-03-07 10:24:09,683 - mlc.compiler.model_graph.l1_based - INFO - Get model compilation properties 2026-03-07 10:24:09,684 - mlc.compiler.model_graph.l1_based - INFO - Checking if layers fit into memory without splitting 2026-03-07 10:24:09,684 - mlc.compiler.model_graph.l1_based - INFO - Generating the list of feasible layouts 2026-03-07 10:24:09,684 - mlc.compiler.model_graph.l1_based - INFO - Setting layer parameters 2026-03-07 10:24:10,547 - mlc.test_util.test_context - INFO - Performing DRAM/L2 synchronization 2026-03-07 10:24:10,950 - mlc.test_util.test_context - INFO - Setting IQ sync bits 2026-03-07 10:24:11,098 - mlc.test_util.test_context - INFO - Run compression 2026-03-07 10:24:11,167 - mlc.test_util.test_context - INFO - Compression done in 0s. 
Compression ratio: 0.214 2026-03-07 10:24:11,167 - mlc.test_util.test_context - INFO - Re-allocate dram memory 2026-03-07 10:24:11,174 - mlc.test_util.test_context - INFO - Inserting and merging NOPs 2026-03-07 10:24:11,198 - mlc.compiler.model_graph.l1_based - INFO - Trigger large tensor support due to: MLA_0/slice_concat_0, 606 2026-03-07 10:24:11,198 - mlc.compiler.model_graph.l1_based - INFO - Segmenting layers by spatial/channel dimension 2026-03-07 10:24:11,269 - mlc.test_util.test_context - INFO - Generating metrics 2026-03-07 10:24:11,566 - mlc.test_util.test_context - INFO - Setting IQ sync bits 2026-03-07 10:24:11,598 - mlc.test_util.test_context - INFO - Run compression 2026-03-07 10:24:11,601 - mlc.test_util.test_context - INFO - Compression done in 0s. Compression ratio: 0.404 2026-03-07 10:24:11,601 - mlc.test_util.test_context - INFO - Re-allocate dram memory 2026-03-07 10:24:11,601 - mlc.test_util.test_context - INFO - Generating metrics 2026-03-07 10:24:11,700 - mlc.test_util.test_context - INFO - Writing report to MLC file 2026-03-07 10:24:11,700 - mlc.test_util.test_context - INFO - Writing instructions to MLC file 2026-03-07 10:24:11,701 - mlc.test_util.test_context - INFO - Writing report to MLC file 2026-03-07 10:24:11,701 - mlc.test_util.test_context - INFO - Writing instructions to MLC file 2026-03-07 10:24:13,459 - mlc.compiler.model_graph.l1_based - INFO - Generating model code done 2026-03-07 10:24:13,535 - mlc.test_util.test_context - INFO - Scheduling instructions 2026-03-07 10:24:13,817 - mlc.test_util.test_context - INFO - Code generation done 2026-03-07 10:24:13,819 - afe.backends.mla.afe_to_n2a_compiler.n2a_compiler_operations - INFO - Evaluate model and writing chk file done in 1s. 
2026-03-07 10:24:17,007 - mlc.compiler.model_graph.large_tensor_helper - INFO - Setting tile layouts 2026-03-07 10:24:17,234 - mlc.compiler.model_graph.large_tensor_helper - INFO - Allocating memory for IFM/OFM tensors 2026-03-07 10:24:17,267 - mlc.compiler.model_graph.l1_based - INFO - Get model compilation properties done 2026-03-07 10:24:17,533 - afe.backends.mpk.interface - INFO - ============================== 2026-03-07 10:24:17,533 - afe.backends.mpk.interface - INFO - Compilation summary: 2026-03-07 10:24:17,533 - afe.backends.mpk.interface - INFO - ------------------------------ 2026-03-07 10:24:17,533 - afe.backends.mpk.interface - INFO - Desired batch size: 1 2026-03-07 10:24:17,533 - afe.backends.mpk.interface - INFO - Achieved batch size: 1 2026-03-07 10:24:17,533 - afe.backends.mpk.interface - INFO - ------------------------------ 2026-03-07 10:24:17,533 - afe.backends.mpk.interface - INFO - Plugin distribution per backend: 2026-03-07 10:24:17,533 - afe.backends.mpk.interface - INFO - MLA : 1 2026-03-07 10:24:17,533 - afe.backends.mpk.interface - INFO - EV74: 5 2026-03-07 10:24:17,533 - afe.backends.mpk.interface - INFO - A65 : 0 2026-03-07 10:24:17,533 - afe.backends.mpk.interface - INFO - ------------------------------ 2026-03-07 10:24:17,533 - afe.backends.mpk.interface - INFO - Generated files: LFM2.5-1.2B-Instruct_language_n1_cache_token1279_stage1_mla.elf, LFM2.5-1.2B-Instruct_language_n1_cache_token1279_stage1_mla_stats.yaml, LFM2.5-1.2B-Instruct_language_n1_cache_token1279_mpk.json, llima-compile 2026-03-07 10:24:18,147 - afe.backends.mla.afe_to_n2a_compiler.n2a_backend_runner - INFO - Start evaluate process to generate check file 2026-03-07 10:24:18,148 - mlc.kernel.layout - INFO - L2 caching mode: L2CachingMode.NONE 2026-03-07 10:24:18,161 - mlc.compiler.model_graph.l1_based - INFO - Generating model code 2026-03-07 10:24:18,169 - afe.ir.serializer.api - INFO - Loading model from file: 
CompiledModels/LFM2.5-1.2B-Instruct/sima_files/sdk/LFM2.5-1.2B-Instruct_language_n1_cache_token1919.sima 2026-03-07 10:24:18,182 - afe.apis.model - WARNING - A deepcopy is disabled hence model graph will be mutated and no other APIs can be called after the compile step. 2026-03-07 10:24:18,182 - afe.apis.model - INFO - Compiling quantized net "LFM2.5-1.2B-Instruct_language_n1_cache_token1919" 2026-03-07 10:24:18,184 - afe.core.compile_networks - INFO - The model is split into 1 segments for MLA and APU 2026-03-07 10:24:18,184 - afe.backends.mla.afe_to_n2a_compiler.n2a_compiler_operations - INFO - Stage 1 of 1: compiling for MLA 2026-03-07 10:24:18,185 - mlc.compiler.model_graph.l1_based - INFO - Get model compilation properties 2026-03-07 10:24:18,185 - mlc.compiler.model_graph.l1_based - INFO - Checking if layers fit into memory without splitting 2026-03-07 10:24:18,185 - mlc.compiler.model_graph.l1_based - INFO - Generating the list of feasible layouts 2026-03-07 10:24:18,185 - mlc.compiler.model_graph.l1_based - INFO - Setting layer parameters 2026-03-07 10:24:19,022 - mlc.compiler.model_graph.large_tensor_helper - INFO - Setting tile layouts 2026-03-07 10:24:19,260 - mlc.compiler.model_graph.large_tensor_helper - INFO - Allocating memory for IFM/OFM tensors 2026-03-07 10:24:19,293 - mlc.compiler.model_graph.l1_based - INFO - Get model compilation properties done 2026-03-07 10:24:19,327 - mlc.compiler.model_graph.l1_based - INFO - Trigger large tensor support due to: MLA_0/slice_concat_0, 527 2026-03-07 10:24:19,327 - mlc.compiler.model_graph.l1_based - INFO - Segmenting layers by spatial/channel dimension 2026-03-07 10:24:20,147 - mlc.compiler.model_graph.l1_based - INFO - Generating model code done 2026-03-07 10:24:20,240 - afe.backends.mla.afe_to_n2a_compiler.n2a_backend_runner - INFO - Start evaluate process to generate check file 2026-03-07 10:24:20,241 - mlc.kernel.layout - INFO - L2 caching mode: L2CachingMode.NONE 2026-03-07 10:24:20,242 - 
mlc.test_util.test_context - INFO - Scheduling instructions 2026-03-07 10:24:20,253 - mlc.compiler.model_graph.l1_based - INFO - Generating model code 2026-03-07 10:24:20,365 - mlc.test_util.test_context - INFO - Code generation done 2026-03-07 10:24:20,377 - afe.backends.mla.afe_to_n2a_compiler.n2a_compiler_operations - INFO - Evaluate model and writing chk file done in 14s. 2026-03-07 10:24:20,745 - mlc.test_util.test_context - INFO - Performing DRAM/L2 synchronization 2026-03-07 10:24:21,275 - mlc.test_util.test_context - INFO - Inserting and merging NOPs 2026-03-07 10:24:21,604 - mlc.test_util.test_context - INFO - Setting IQ sync bits 2026-03-07 10:24:21,637 - mlc.test_util.test_context - INFO - Run compression 2026-03-07 10:24:21,639 - mlc.test_util.test_context - INFO - Compression done in 0s. Compression ratio: 0.404 2026-03-07 10:24:21,639 - mlc.test_util.test_context - INFO - Re-allocate dram memory 2026-03-07 10:24:21,640 - mlc.test_util.test_context - INFO - Generating metrics 2026-03-07 10:24:21,752 - mlc.test_util.test_context - INFO - Writing report to MLC file 2026-03-07 10:24:21,753 - mlc.test_util.test_context - INFO - Writing instructions to MLC file 2026-03-07 10:24:24,188 - mlc.test_util.test_context - INFO - Code generation done 2026-03-07 10:24:24,189 - afe.backends.mla.afe_to_n2a_compiler.n2a_compiler_operations - INFO - Evaluate model and writing chk file done in 1s. 
2026-03-07 10:24:27,121 - mlc.compiler.model_graph.large_tensor_helper - INFO - Setting tile layouts 2026-03-07 10:24:27,352 - mlc.compiler.model_graph.large_tensor_helper - INFO - Allocating memory for IFM/OFM tensors 2026-03-07 10:24:27,386 - mlc.compiler.model_graph.l1_based - INFO - Get model compilation properties done 2026-03-07 10:24:28,399 - afe.backends.mla.afe_to_n2a_compiler.n2a_backend_runner - INFO - Start evaluate process to generate check file 2026-03-07 10:24:28,400 - mlc.kernel.layout - INFO - L2 caching mode: L2CachingMode.NONE 2026-03-07 10:24:28,412 - mlc.compiler.model_graph.l1_based - INFO - Generating model code 2026-03-07 10:24:28,869 - mlc.test_util.test_context - INFO - Performing DRAM/L2 synchronization 2026-03-07 10:24:29,239 - afe.backends.mpk.interface - INFO - ============================== 2026-03-07 10:24:29,239 - afe.backends.mpk.interface - INFO - Compilation summary: 2026-03-07 10:24:29,239 - afe.backends.mpk.interface - INFO - ------------------------------ 2026-03-07 10:24:29,239 - afe.backends.mpk.interface - INFO - Desired batch size: 1 2026-03-07 10:24:29,239 - afe.backends.mpk.interface - INFO - Achieved batch size: 1 2026-03-07 10:24:29,239 - afe.backends.mpk.interface - INFO - ------------------------------ 2026-03-07 10:24:29,239 - afe.backends.mpk.interface - INFO - Plugin distribution per backend: 2026-03-07 10:24:29,239 - afe.backends.mpk.interface - INFO - MLA : 1 2026-03-07 10:24:29,239 - afe.backends.mpk.interface - INFO - EV74: 5 2026-03-07 10:24:29,239 - afe.backends.mpk.interface - INFO - A65 : 0 2026-03-07 10:24:29,239 - afe.backends.mpk.interface - INFO - ------------------------------ 2026-03-07 10:24:29,239 - afe.backends.mpk.interface - INFO - Generated files: LFM2.5-1.2B-Instruct_language_n1_cache_token1407_stage1_mla_stats.yaml, LFM2.5-1.2B-Instruct_language_n1_cache_token1407_stage1_mla.elf, LFM2.5-1.2B-Instruct_language_n1_cache_token1407_mpk.json, llima-compile 2026-03-07 10:24:29,499 - 
mlc.test_util.test_context - INFO - Inserting and merging NOPs 2026-03-07 10:24:29,524 - afe.ir.serializer.api - INFO - Loading model from file: CompiledModels/LFM2.5-1.2B-Instruct/sima_files/sdk/LFM2.5-1.2B-Instruct_language_n1_cache_token2047.sima 2026-03-07 10:24:29,534 - afe.apis.model - WARNING - A deepcopy is disabled hence model graph will be mutated and no other APIs can be called after the compile step. 2026-03-07 10:24:29,534 - afe.apis.model - INFO - Compiling quantized net "LFM2.5-1.2B-Instruct_language_n1_cache_token2047" 2026-03-07 10:24:29,536 - afe.core.compile_networks - INFO - The model is split into 1 segments for MLA and APU 2026-03-07 10:24:29,536 - afe.backends.mla.afe_to_n2a_compiler.n2a_compiler_operations - INFO - Stage 1 of 1: compiling for MLA 2026-03-07 10:24:29,537 - mlc.compiler.model_graph.l1_based - INFO - Get model compilation properties 2026-03-07 10:24:29,537 - mlc.compiler.model_graph.l1_based - INFO - Checking if layers fit into memory without splitting 2026-03-07 10:24:29,537 - mlc.compiler.model_graph.l1_based - INFO - Generating the list of feasible layouts 2026-03-07 10:24:29,537 - mlc.compiler.model_graph.l1_based - INFO - Setting layer parameters 2026-03-07 10:24:29,894 - mlc.test_util.test_context - INFO - Setting IQ sync bits 2026-03-07 10:24:29,932 - mlc.test_util.test_context - INFO - Run compression 2026-03-07 10:24:29,936 - mlc.test_util.test_context - INFO - Compression done in 0s. 
Compression ratio: 0.404 2026-03-07 10:24:29,936 - mlc.test_util.test_context - INFO - Re-allocate dram memory 2026-03-07 10:24:29,936 - mlc.test_util.test_context - INFO - Generating metrics 2026-03-07 10:24:30,063 - mlc.test_util.test_context - INFO - Writing report to MLC file 2026-03-07 10:24:30,063 - mlc.test_util.test_context - INFO - Writing instructions to MLC file 2026-03-07 10:24:31,101 - mlc.compiler.model_graph.l1_based - INFO - Trigger large tensor support due to: MLA_0/slice_concat_0, 541 2026-03-07 10:24:31,101 - mlc.compiler.model_graph.l1_based - INFO - Segmenting layers by spatial/channel dimension 2026-03-07 10:24:32,710 - mlc.test_util.test_context - INFO - Code generation done 2026-03-07 10:24:32,711 - afe.backends.mla.afe_to_n2a_compiler.n2a_compiler_operations - INFO - Evaluate model and writing chk file done in 1s. 2026-03-07 10:24:35,746 - afe.backends.mpk.interface - INFO - ============================== 2026-03-07 10:24:35,746 - afe.backends.mpk.interface - INFO - Compilation summary: 2026-03-07 10:24:35,746 - afe.backends.mpk.interface - INFO - ------------------------------ 2026-03-07 10:24:35,746 - afe.backends.mpk.interface - INFO - Desired batch size: 1 2026-03-07 10:24:35,746 - afe.backends.mpk.interface - INFO - Achieved batch size: 1 2026-03-07 10:24:35,746 - afe.backends.mpk.interface - INFO - ------------------------------ 2026-03-07 10:24:35,746 - afe.backends.mpk.interface - INFO - Plugin distribution per backend: 2026-03-07 10:24:35,746 - afe.backends.mpk.interface - INFO - MLA : 1 2026-03-07 10:24:35,746 - afe.backends.mpk.interface - INFO - EV74: 4 2026-03-07 10:24:35,746 - afe.backends.mpk.interface - INFO - A65 : 0 2026-03-07 10:24:35,746 - afe.backends.mpk.interface - INFO - ------------------------------ 2026-03-07 10:24:35,746 - afe.backends.mpk.interface - INFO - Generated files: LFM2.5-1.2B-Instruct_language_n128_cache_token1920_stage1_mla.elf, LFM2.5-1.2B-Instruct_language_n128_cache_token1920_mpk.json, 
LFM2.5-1.2B-Instruct_language_n128_cache_token1920_stage1_mla_stats.yaml, llima-compile 2026-03-07 10:24:36,449 - afe.ir.serializer.api - INFO - Loading model from file: CompiledModels/LFM2.5-1.2B-Instruct/sima_files/sdk/LFM2.5-1.2B-Instruct_language_n1_post_layer15_conv_final.sima 2026-03-07 10:24:36,605 - afe.apis.model - WARNING - A deepcopy is disabled hence model graph will be mutated and no other APIs can be called after the compile step. 2026-03-07 10:24:36,605 - afe.apis.model - INFO - Compiling quantized net "LFM2.5-1.2B-Instruct_language_n1_post_layer15_conv_final" 2026-03-07 10:24:36,607 - afe.core.compile_networks - INFO - The model is split into 2 segments for MLA and APU 2026-03-07 10:24:36,607 - afe.backends.mla.afe_to_n2a_compiler.n2a_compiler_operations - INFO - Stage 1 of 2: compiling for MLA 2026-03-07 10:24:36,675 - mlc.compiler.model_graph.l1_based - INFO - Get model compilation properties 2026-03-07 10:24:36,675 - mlc.compiler.model_graph.l1_based - INFO - Checking if layers fit into memory without splitting 2026-03-07 10:24:36,676 - mlc.compiler.model_graph.l1_based - INFO - Generating the list of feasible layouts 2026-03-07 10:24:36,676 - mlc.compiler.model_graph.l1_based - INFO - Setting layer parameters 2026-03-07 10:24:36,718 - mlc.compiler.model_graph.l1_based - INFO - Setting tile layouts 2026-03-07 10:24:36,725 - mlc.compiler.model_graph.l1_based - INFO - Allocating memory for IFM/OFM tensors 2026-03-07 10:24:36,738 - mlc.compiler.model_graph.l1_based - INFO - Get model compilation properties done 2026-03-07 10:24:37,210 - afe.backends.mla.afe_to_n2a_compiler.n2a_backend_runner - INFO - Start evaluate process to generate check file 2026-03-07 10:24:37,210 - mlc.kernel.layout - INFO - L2 caching mode: L2CachingMode.NONE 2026-03-07 10:24:37,218 - mlc.compiler.model_graph.l1_based - INFO - Generating model code 2026-03-07 10:24:37,652 - afe.backends.mpk.interface - INFO - ============================== 2026-03-07 10:24:37,652 - 
afe.backends.mpk.interface - INFO - Compilation summary: 2026-03-07 10:24:37,652 - afe.backends.mpk.interface - INFO - ------------------------------ 2026-03-07 10:24:37,652 - afe.backends.mpk.interface - INFO - Desired batch size: 1 2026-03-07 10:24:37,652 - afe.backends.mpk.interface - INFO - Achieved batch size: 1 2026-03-07 10:24:37,652 - afe.backends.mpk.interface - INFO - ------------------------------ 2026-03-07 10:24:37,653 - afe.backends.mpk.interface - INFO - Plugin distribution per backend: 2026-03-07 10:24:37,653 - afe.backends.mpk.interface - INFO - MLA : 1 2026-03-07 10:24:37,653 - afe.backends.mpk.interface - INFO - EV74: 5 2026-03-07 10:24:37,653 - afe.backends.mpk.interface - INFO - A65 : 0 2026-03-07 10:24:37,653 - afe.backends.mpk.interface - INFO - ------------------------------ 2026-03-07 10:24:37,653 - afe.backends.mpk.interface - INFO - Generated files: LFM2.5-1.2B-Instruct_language_n1_cache_token1535_mpk.json, LFM2.5-1.2B-Instruct_language_n1_cache_token1535_stage1_mla.elf, LFM2.5-1.2B-Instruct_language_n1_cache_token1535_stage1_mla_stats.yaml, llima-compile 2026-03-07 10:24:39,075 - mlc.compiler.model_graph.large_tensor_helper - INFO - Setting tile layouts 2026-03-07 10:24:39,311 - mlc.compiler.model_graph.large_tensor_helper - INFO - Allocating memory for IFM/OFM tensors 2026-03-07 10:24:39,345 - mlc.compiler.model_graph.l1_based - INFO - Get model compilation properties done 2026-03-07 10:24:40,421 - afe.backends.mla.afe_to_n2a_compiler.n2a_backend_runner - INFO - Start evaluate process to generate check file 2026-03-07 10:24:40,422 - mlc.kernel.layout - INFO - L2 caching mode: L2CachingMode.NONE 2026-03-07 10:24:40,435 - mlc.compiler.model_graph.l1_based - INFO - Generating model code 2026-03-07 10:24:42,608 - mlc.compiler.model_graph.l1_based - INFO - Generating model code done 2026-03-07 10:24:42,742 - mlc.test_util.test_context - INFO - Scheduling instructions 2026-03-07 10:24:46,429 - mlc.compiler.model_graph.l1_based - INFO - 
Generating model code done 2026-03-07 10:24:46,566 - mlc.test_util.test_context - INFO - Scheduling instructions 2026-03-07 10:24:52,525 - mlc.compiler.model_graph.l1_based - INFO - Generating model code done 2026-03-07 10:24:52,602 - mlc.test_util.test_context - INFO - Scheduling instructions 2026-03-07 10:24:52,965 - mlc.compiler.model_graph.l1_based - INFO - Generating model code done 2026-03-07 10:24:53,099 - mlc.test_util.test_context - INFO - Scheduling instructions 2026-03-07 10:24:54,028 - mlc.test_util.test_context - INFO - Performing DRAM/L2 synchronization 2026-03-07 10:24:54,719 - mlc.test_util.test_context - INFO - Inserting and merging NOPs 2026-03-07 10:24:55,267 - mlc.test_util.test_context - INFO - Setting IQ sync bits 2026-03-07 10:24:55,315 - mlc.test_util.test_context - INFO - Run compression 2026-03-07 10:24:55,317 - mlc.test_util.test_context - INFO - Compression done in 0s. Compression ratio: 0.404 2026-03-07 10:24:55,317 - mlc.test_util.test_context - INFO - Re-allocate dram memory 2026-03-07 10:24:55,318 - mlc.test_util.test_context - INFO - Generating metrics 2026-03-07 10:24:55,466 - mlc.test_util.test_context - INFO - Writing report to MLC file 2026-03-07 10:24:55,466 - mlc.test_util.test_context - INFO - Writing instructions to MLC file 2026-03-07 10:24:57,813 - mlc.test_util.test_context - INFO - Performing DRAM/L2 synchronization 2026-03-07 10:24:58,479 - mlc.test_util.test_context - INFO - Code generation done 2026-03-07 10:24:58,481 - afe.backends.mla.afe_to_n2a_compiler.n2a_compiler_operations - INFO - Evaluate model and writing chk file done in 1s. 2026-03-07 10:24:58,491 - mlc.test_util.test_context - INFO - Inserting and merging NOPs 2026-03-07 10:24:59,045 - mlc.test_util.test_context - INFO - Setting IQ sync bits 2026-03-07 10:24:59,091 - mlc.test_util.test_context - INFO - Run compression 2026-03-07 10:24:59,094 - mlc.test_util.test_context - INFO - Compression done in 0s. 
Compression ratio: 0.404 2026-03-07 10:24:59,094 - mlc.test_util.test_context - INFO - Re-allocate dram memory 2026-03-07 10:24:59,095 - mlc.test_util.test_context - INFO - Generating metrics 2026-03-07 10:24:59,239 - mlc.test_util.test_context - INFO - Writing report to MLC file 2026-03-07 10:24:59,239 - mlc.test_util.test_context - INFO - Writing instructions to MLC file 2026-03-07 10:25:00,257 - mlc.test_util.test_context - INFO - Performing DRAM/L2 synchronization 2026-03-07 10:25:02,068 - mlc.test_util.test_context - INFO - Inserting and merging NOPs 2026-03-07 10:25:02,265 - mlc.test_util.test_context - INFO - Code generation done 2026-03-07 10:25:02,266 - afe.backends.mla.afe_to_n2a_compiler.n2a_compiler_operations - INFO - Evaluate model and writing chk file done in 1s. 2026-03-07 10:25:02,486 - mlc.test_util.test_context - INFO - Setting IQ sync bits 2026-03-07 10:25:02,563 - mlc.test_util.test_context - INFO - Run compression 2026-03-07 10:25:03,713 - afe.backends.mpk.interface - INFO - ============================== 2026-03-07 10:25:03,713 - afe.backends.mpk.interface - INFO - Compilation summary: 2026-03-07 10:25:03,713 - afe.backends.mpk.interface - INFO - ------------------------------ 2026-03-07 10:25:03,713 - afe.backends.mpk.interface - INFO - Desired batch size: 1 2026-03-07 10:25:03,713 - afe.backends.mpk.interface - INFO - Achieved batch size: 1 2026-03-07 10:25:03,713 - afe.backends.mpk.interface - INFO - ------------------------------ 2026-03-07 10:25:03,713 - afe.backends.mpk.interface - INFO - Plugin distribution per backend: 2026-03-07 10:25:03,713 - afe.backends.mpk.interface - INFO - MLA : 1 2026-03-07 10:25:03,713 - afe.backends.mpk.interface - INFO - EV74: 5 2026-03-07 10:25:03,713 - afe.backends.mpk.interface - INFO - A65 : 0 2026-03-07 10:25:03,713 - afe.backends.mpk.interface - INFO - ------------------------------ 2026-03-07 10:25:03,713 - afe.backends.mpk.interface - INFO - Generated files: 
LFM2.5-1.2B-Instruct_language_n1_cache_token1663_stage1_mla_stats.yaml, LFM2.5-1.2B-Instruct_language_n1_cache_token1663_mpk.json, LFM2.5-1.2B-Instruct_language_n1_cache_token1663_stage1_mla.elf, llima-compile 2026-03-07 10:25:04,461 - mlc.test_util.test_context - INFO - Performing DRAM/L2 synchronization 2026-03-07 10:25:05,139 - mlc.test_util.test_context - INFO - Inserting and merging NOPs 2026-03-07 10:25:05,686 - mlc.test_util.test_context - INFO - Setting IQ sync bits 2026-03-07 10:25:05,733 - mlc.test_util.test_context - INFO - Run compression 2026-03-07 10:25:05,736 - mlc.test_util.test_context - INFO - Compression done in 0s. Compression ratio: 0.404 2026-03-07 10:25:05,736 - mlc.test_util.test_context - INFO - Re-allocate dram memory 2026-03-07 10:25:05,736 - mlc.test_util.test_context - INFO - Generating metrics 2026-03-07 10:25:05,818 - mlc.compiler.model_graph.l1_based - INFO - Generating model code done 2026-03-07 10:25:05,882 - mlc.test_util.test_context - INFO - Writing report to MLC file 2026-03-07 10:25:05,882 - mlc.test_util.test_context - INFO - Writing instructions to MLC file 2026-03-07 10:25:05,953 - mlc.test_util.test_context - INFO - Scheduling instructions 2026-03-07 10:25:07,525 - afe.backends.mpk.interface - INFO - ============================== 2026-03-07 10:25:07,526 - afe.backends.mpk.interface - INFO - Compilation summary: 2026-03-07 10:25:07,526 - afe.backends.mpk.interface - INFO - ------------------------------ 2026-03-07 10:25:07,526 - afe.backends.mpk.interface - INFO - Desired batch size: 1 2026-03-07 10:25:07,526 - afe.backends.mpk.interface - INFO - Achieved batch size: 1 2026-03-07 10:25:07,526 - afe.backends.mpk.interface - INFO - ------------------------------ 2026-03-07 10:25:07,526 - afe.backends.mpk.interface - INFO - Plugin distribution per backend: 2026-03-07 10:25:07,526 - afe.backends.mpk.interface - INFO - MLA : 1 2026-03-07 10:25:07,526 - afe.backends.mpk.interface - INFO - EV74: 5 2026-03-07 10:25:07,526 - 
afe.backends.mpk.interface - INFO - A65 : 0
2026-03-07 10:25:07,526 - afe.backends.mpk.interface - INFO - ------------------------------
2026-03-07 10:25:07,526 - afe.backends.mpk.interface - INFO - Generated files: LFM2.5-1.2B-Instruct_language_n1_cache_token1791_mpk.json, LFM2.5-1.2B-Instruct_language_n1_cache_token1791_stage1_mla.elf, LFM2.5-1.2B-Instruct_language_n1_cache_token1791_stage1_mla_stats.yaml, llima-compile
2026-03-07 10:25:08,859 - mlc.test_util.test_context - INFO - Code generation done
2026-03-07 10:25:08,860 - afe.backends.mla.afe_to_n2a_compiler.n2a_compiler_operations - INFO - Evaluate model and writing chk file done in 1s.
2026-03-07 10:25:14,176 - afe.backends.mpk.interface - INFO - ==============================
2026-03-07 10:25:14,176 - afe.backends.mpk.interface - INFO - Compilation summary:
2026-03-07 10:25:14,176 - afe.backends.mpk.interface - INFO - ------------------------------
2026-03-07 10:25:14,176 - afe.backends.mpk.interface - INFO - Desired batch size: 1
2026-03-07 10:25:14,176 - afe.backends.mpk.interface - INFO - Achieved batch size: 1
2026-03-07 10:25:14,176 - afe.backends.mpk.interface - INFO - ------------------------------
2026-03-07 10:25:14,176 - afe.backends.mpk.interface - INFO - Plugin distribution per backend:
2026-03-07 10:25:14,176 - afe.backends.mpk.interface - INFO - MLA : 1
2026-03-07 10:25:14,176 - afe.backends.mpk.interface - INFO - EV74: 5
2026-03-07 10:25:14,176 - afe.backends.mpk.interface - INFO - A65 : 0
2026-03-07 10:25:14,176 - afe.backends.mpk.interface - INFO - ------------------------------
2026-03-07 10:25:14,176 - afe.backends.mpk.interface - INFO - Generated files: LFM2.5-1.2B-Instruct_language_n1_cache_token1919_mpk.json, LFM2.5-1.2B-Instruct_language_n1_cache_token1919_stage1_mla.elf, LFM2.5-1.2B-Instruct_language_n1_cache_token1919_stage1_mla_stats.yaml, llima-compile
2026-03-07 10:25:17,014 - mlc.test_util.test_context - INFO - Performing DRAM/L2 synchronization
2026-03-07 10:25:17,772 - mlc.test_util.test_context - INFO - Inserting and merging NOPs
2026-03-07 10:25:18,328 - mlc.test_util.test_context - INFO - Setting IQ sync bits
2026-03-07 10:25:18,376 - mlc.test_util.test_context - INFO - Run compression
2026-03-07 10:25:18,379 - mlc.test_util.test_context - INFO - Compression done in 0s. Compression ratio: 0.404
2026-03-07 10:25:18,380 - mlc.test_util.test_context - INFO - Re-allocate dram memory
2026-03-07 10:25:18,380 - mlc.test_util.test_context - INFO - Generating metrics
2026-03-07 10:25:18,528 - mlc.test_util.test_context - INFO - Writing report to MLC file
2026-03-07 10:25:18,528 - mlc.test_util.test_context - INFO - Writing instructions to MLC file
2026-03-07 10:25:21,497 - mlc.test_util.test_context - INFO - Code generation done
2026-03-07 10:25:21,499 - afe.backends.mla.afe_to_n2a_compiler.n2a_compiler_operations - INFO - Evaluate model and writing chk file done in 1s.
2026-03-07 10:25:22,504 - mlc.test_util.test_context - INFO - Compression done in 19s. Compression ratio: 0.966
2026-03-07 10:25:22,504 - mlc.test_util.test_context - INFO - Re-allocate dram memory
2026-03-07 10:25:23,112 - mlc.test_util.test_context - INFO - Generating metrics
2026-03-07 10:25:23,360 - mlc.test_util.test_context - INFO - Writing report to MLC file
2026-03-07 10:25:23,361 - mlc.test_util.test_context - INFO - Writing instructions to MLC file
2026-03-07 10:25:26,932 - afe.backends.mpk.interface - INFO - ==============================
2026-03-07 10:25:26,932 - afe.backends.mpk.interface - INFO - Compilation summary:
2026-03-07 10:25:26,932 - afe.backends.mpk.interface - INFO - ------------------------------
2026-03-07 10:25:26,932 - afe.backends.mpk.interface - INFO - Desired batch size: 1
2026-03-07 10:25:26,932 - afe.backends.mpk.interface - INFO - Achieved batch size: 1
2026-03-07 10:25:26,932 - afe.backends.mpk.interface - INFO - ------------------------------
2026-03-07 10:25:26,932 - afe.backends.mpk.interface - INFO - Plugin distribution per backend:
2026-03-07 10:25:26,932 - afe.backends.mpk.interface - INFO - MLA : 1
2026-03-07 10:25:26,932 - afe.backends.mpk.interface - INFO - EV74: 5
2026-03-07 10:25:26,932 - afe.backends.mpk.interface - INFO - A65 : 0
2026-03-07 10:25:26,932 - afe.backends.mpk.interface - INFO - ------------------------------
2026-03-07 10:25:26,932 - afe.backends.mpk.interface - INFO - Generated files: LFM2.5-1.2B-Instruct_language_n1_cache_token2047_stage1_mla_stats.yaml, LFM2.5-1.2B-Instruct_language_n1_cache_token2047_mpk.json, LFM2.5-1.2B-Instruct_language_n1_cache_token2047_stage1_mla.elf, llima-compile
2026-03-07 10:26:14,485 - mlc.test_util.test_context - INFO - Code generation done
2026-03-07 10:26:14,511 - afe.backends.mla.afe_to_n2a_compiler.n2a_compiler_operations - INFO - Evaluate model and writing chk file done in 2s.
2026-03-07 10:26:17,783 - afe.backends.mla.afe_to_n2a_compiler.n2a_compiler_operations - INFO - Stage 2 of 2: backend is APU
2026-03-07 10:26:17,785 - afe.core.compile_networks - INFO - Stage 2 of 2: compiling for APU
2026-03-07 10:26:28,694 - afe.backends.mpk.interface - INFO - ==============================
2026-03-07 10:26:28,694 - afe.backends.mpk.interface - INFO - Compilation summary:
2026-03-07 10:26:28,694 - afe.backends.mpk.interface - INFO - ------------------------------
2026-03-07 10:26:28,694 - afe.backends.mpk.interface - INFO - Desired batch size: 1
2026-03-07 10:26:28,694 - afe.backends.mpk.interface - INFO - Achieved batch size: 1
2026-03-07 10:26:28,694 - afe.backends.mpk.interface - INFO - ------------------------------
2026-03-07 10:26:28,694 - afe.backends.mpk.interface - INFO - Plugin distribution per backend:
2026-03-07 10:26:28,694 - afe.backends.mpk.interface - INFO - MLA : 1
2026-03-07 10:26:28,694 - afe.backends.mpk.interface - INFO - EV74: 2
2026-03-07 10:26:28,694 - afe.backends.mpk.interface - INFO - A65 : 1
2026-03-07 10:26:28,694 - afe.backends.mpk.interface - INFO - ------------------------------
2026-03-07 10:26:28,694 - afe.backends.mpk.interface - INFO - Generated files: LFM2.5-1.2B-Instruct_language_n1_post_layer15_conv_final_stage1_mla.elf, LFM2.5-1.2B-Instruct_language_n1_post_layer15_conv_final_mpk.json, LFM2.5-1.2B-Instruct_language_n1_post_layer15_conv_final_stage2_a65.so, LFM2.5-1.2B-Instruct_language_n1_post_layer15_conv_final_stage1_mla_stats.yaml, llima-compile
2026-03-07 10:26:30,951 - sima_lmm.model.vision_language_model - INFO - FileGenMode.MODEL_SDK_COMPILE files generation completed.
Generated all mode=MODEL_SDK_COMPILE files