yongqiang committed
Commit 7c12e87 · 1 Parent(s): 5f3ba5a

update readme and add a new config
README.md CHANGED
---
license: bsd-3-clause
---

# Z-Image-Turbo on AXERA AX650N

This project provides a complete implementation for deploying the Z-Image-Turbo diffusion model on the AXERA AX650N NPU. Z-Image-Turbo is a high-performance text-to-image generation model that uses diffusion techniques to produce high-quality images with fast inference.

## Table of Contents

- [Overview](#overview)
- [Requirements](#requirements)
- [Project Structure](#project-structure)
- [Model Components](#model-components)
  - [1. Transformer Module](#1-transformer-module)
  - [2. VAE Decoder Module](#2-vae-decoder-module)
- [Complete Inference Pipeline](#complete-inference-pipeline)
- [Advanced Usage](#advanced-usage)
- [Technical Support](#technical-support)

## Overview

The Z-Image-Turbo model consists of three main components:

1. **Text Encoder**: Converts text prompts into embeddings
2. **Transformer**: Core diffusion model that processes latent representations
3. **VAE (Variational Autoencoder)**: Encodes/decodes between pixel space and latent space

### Deployment Strategy

The deployment architecture is tailored to the AXERA AX650N with the following design decisions:

- **Text Encoder**: Currently runs in PyTorch for simplicity and faster development iteration. This component uses the Qwen3 model and can be converted to axmodel format in a future release for end-to-end NPU acceleration.
- **Transformer**: Fully converted to axmodel format and runs on the NPU through model partitioning and subgraph optimization.
- **VAE**: Both encoder and decoder are converted to axmodel format, enabling fast image encoding and decoding on the NPU.
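Conceptually, one generation run wires the three components together as below. This is an illustrative dataflow sketch with stand-in stub functions, not the project's actual API:

```python
# Illustrative dataflow sketch: stand-in stubs, not the real launcher API.

def encode_text(prompt: str) -> list[float]:
    # Stand-in for the PyTorch Qwen3 text encoder.
    return [float(len(prompt))]

def denoise(latent: list[float], text_emb: list[float], steps: int = 9) -> list[float]:
    # Stand-in for the NPU transformer: iteratively refines the latent.
    for _ in range(steps):
        latent = [0.5 * x + 0.1 * e for x, e in zip(latent, text_emb)]
    return latent

def vae_decode(latent: list[float]) -> list[float]:
    # Stand-in for the NPU VAE decoder: maps latent values to pixel range.
    return [min(max(x, 0.0), 1.0) for x in latent]

image = vae_decode(denoise([1.0], encode_text("a mountain sunrise")))
print(len(image))  # one "pixel" in this toy sketch
```

The real pipeline moves tensors rather than Python lists, but the text → latent → pixel flow is the same.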

## Requirements

This project requires the following Python environment and dependencies:

```sh
Python        3.9.20
torch         2.7.0
torchvision   0.22.0
transformers  4.53.1
diffusers     0.32.1
```

**Additional Dependencies:**

- ONNX Runtime (for ONNX model inference and validation)
- onnxslim (for ONNX model optimization)
- numpy (for numerical operations and calibration data handling)
- Pulsar2 toolchain (for AXERA AX650N model compilation)

**Hardware Requirements:**

- AXERA AX650N development board for deployment
- x86/ARM Linux host for model conversion and compilation
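A quick, stdlib-only sanity check for the interpreter version (package versions are best verified with `pip list`):

```python
import sys

# The project was validated on Python 3.9; warn on older interpreters.
ok = sys.version_info >= (3, 9)
print("Python version OK" if ok else "Python >= 3.9 required")
```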

## Project Structure

```sh
Z-Image-Turbo/
├── original_onnx/           # Exported ONNX models (original format)
│   ├── vae_decoder_simp_slim.onnx
│   ├── vae_encoder_simp_slim.onnx
│   └── z_image_transformer_body_only_simp_slim.onnx
├── text_encoder_axmodel/    # Text encoder models in axmodel format
│   ├── model.embed_tokens.weight.npy
│   ├── qwen3_p128_l0_together.axmodel
│   ├── qwen3_p128_l1_together.axmodel
│   └── ... (36 layer models for Qwen3)
├── transformer_axmodel/     # Transformer subgraph models in axmodel format
│   ├── auto_00_model_layers_29_Add_4_output_0_to_sample_auto.axmodel
│   ├── cfg_00_timestep_to_model_t_embedder_mlp_mlp_2_Gemm_output_0_config.axmodel
│   └── ... (compiled subgraph models)
├── transformer_onnx/        # Transformer models in ONNX format
├── vae_model/               # VAE models (ONNX and axmodel formats)
├── VideoX-Fun/              # Main conversion and inference code
└── README.md                # This documentation
```

## Model Components

### 1. Transformer Module

The transformer module is the core of the diffusion process: it iteratively refines latent representations to generate high-quality images from noise. Because of the model's size and complexity, we use a subgraph partitioning strategy to deploy it on the AX650N NPU.

#### Step 1: Export to ONNX Format

First, export the transformer model to ONNX format (without ControlNet support):

```sh
python scripts/z_image/export_transformer_body_onnx.py \
    --output onnx-models-512x512/z_image_transformer_body_only_512x512.onnx \
    --height 512 --width 512 --sequence-length 128 \
    --latent-downsample-factor 8 \
    --dtype fp32 \
    --skip-slim
```

**Parameters:**
- `--output`: Output path for the ONNX model
- `--height`, `--width`: Target image dimensions (512x512)
- `--sequence-length`: Maximum sequence length for text embeddings (128 tokens)
- `--latent-downsample-factor`: VAE downsample factor (8x)
- `--dtype`: Data type (fp32 for highest accuracy)
- `--skip-slim`: Skip ONNX simplification (optional)

> **Note:** If you omit `--skip-slim`, the model is simplified automatically and the output is named `z_image_transformer_body_only_512x512_simp_slim.onnx`.

#### Step 2: Collect Calibration Data

Collect a calibration dataset from the original model for quantization. This step generates representative input data used during the quantization process:

```sh
python ./examples/z_image_fun/collect_onnx_inputs.py \
    --model_name models/Diffusion_Transformer/Z-Image-Turbo/ \
    --output_dir transformer_body_only_512x512_simp_slim/calibration \
    --height 512 --width 512 \
    --max_sequence_length 128
```

This command runs the model with various prompts and diffusion steps, capturing the actual input distributions the model will encounter during inference.
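The calibration files used later in this guide follow a `prompt`/`step` naming pattern (e.g. `transformer_inputs_prompt000_step00.npy`). A small sketch of that enumeration; the zero-padding widths are assumptions inferred from the sample path:

```python
# Enumerate calibration filenames for a few prompts and denoising steps.
# Padding widths (3 for prompt, 2 for step) are inferred from the sample path.
def calib_filenames(num_prompts: int, num_steps: int) -> list[str]:
    return [
        f"transformer_inputs_prompt{p:03d}_step{s:02d}.npy"
        for p in range(num_prompts)
        for s in range(num_steps)
    ]

names = calib_filenames(num_prompts=2, num_steps=2)
print(names[0])  # transformer_inputs_prompt000_step00.npy
```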

#### Step 3: Split ONNX Model into Subgraphs

Split the monolithic ONNX model into multiple subgraphs for better memory management and compilation optimization:

```sh
python ./scripts/split_onnx_by_subconfig.py \
    --model ./onnx-models-512x512/z_image_transformer_body_only_512x512_simp_slim.onnx \
    --config ./pulsar2_configs/transformers_subgraph_512x512.json \
    --output-dir ./transformers_body_only_512_512_split_onnx \
    --verify \
    --input-data ./transformer_body_only_512x512_simp_slim/calibration/transformer_inputs_prompt000_step00.npy \
    --providers CPUExecutionProvider
```

The subgraph configuration file (`transformers_subgraph_512x512.json`) defines the splitting strategy: how the model is partitioned into smaller, manageable pieces that fit within the NPU's constraints.
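The `--verify` flag checks that chaining the subgraphs reproduces the full model's outputs. The underlying comparison is essentially an element-wise tolerance check, sketched here in plain Python (the actual script presumably uses numpy's equivalent):

```python
# Element-wise closeness check, in the spirit of numpy.allclose.
def allclose(a: list[float], b: list[float],
             rtol: float = 1e-5, atol: float = 1e-6) -> bool:
    return all(abs(x - y) <= atol + rtol * abs(y) for x, y in zip(a, b))

full_model_out = [0.1234567, -2.5, 3.0]
chained_subgraph_out = [0.1234568, -2.5, 3.0]
print(allclose(full_model_out, chained_subgraph_out))  # True
```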

#### Step 4: Collect Subgraph Calibration Data

After splitting, collect calibration data for each individual subgraph:

```sh
python examples/z_image_fun/collect_subgraph_inputs.py \
    --onnx ./onnx-models-512x512/z_image_transformer_body_only_512x512_simp_slim.onnx \
    --subgraph-config ./pulsar2_configs/transformers_subgraph_512x512.json \
    --output-dir ./transformer_body_only_512x512_simp_slim/subgraph-calib \
    --tar-list-file ./transformer_body_only_512x512_simp_slim/subgraph-calib/paths.txt \
    --skip-existing
```

To collect additional calibration data at a different resolution (for example, 1728x992):

```sh
python examples/z_image_fun/collect_subgraph_inputs.py \
    --onnx ./onnx-models-1728x992/z_image_transformer_body_only_1728x992_simp_slim.onnx \
    --subgraph-config ./pulsar2_configs/transformers_subgraph_1728x992.json \
    --output-dir ./transformer_body_only_1728x992_simp_slim/subgraph-calib \
    --tar-list-file ./transformer_body_only_1728x992_simp_slim/subgraph-calib/paths.txt \
    --sample-size 1728 992 \
    --max-seq-len 256
```

#### Step 5: Generate Compilation Configuration Files

Automatically generate an individual compilation configuration file for each subgraph:

```sh
python ./scripts/generate_subgraph_configs.py \
    --tar-list-file ./transformer_body_only_512x512_simp_slim/subgraph-calib/paths.txt \
    --output-config-dir pulsar2_configs/subgraphs_512x512
```

This step creates a tailored configuration file per subgraph, specifying quantization settings, calibration data paths, and compilation options.

> **Important:** After generating the sub-ONNX files, apply ONNX simplification (`onnxslim`) to each subgraph for optimal performance.
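One way to batch that simplification step is to generate one `onnxslim <in> <out>` command per split ONNX file. This sketch only builds and prints the commands (a dry run); the `_slim` output suffix is an assumption, adjust it to your own naming:

```python
from pathlib import Path

def slim_commands(onnx_dir: str) -> list[str]:
    # Build one `onnxslim <input> <output>` command per subgraph ONNX file.
    # The `_slim` output suffix is an assumption; adjust to taste.
    return [
        f"onnxslim {f} {f.with_name(f.stem + '_slim.onnx')}"
        for f in sorted(Path(onnx_dir).glob("*.onnx"))
    ]

# Dry run: print the commands instead of executing them.
for cmd in slim_commands("./transformers_body_only_512_512_split_onnx"):
    print(cmd)
```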

#### Step 6: Compile All Subgraphs

Compile all subgraphs using the Pulsar2 toolchain:

```sh
./compile_all_subgraphs.sh \
    --onnx-dir ./transformers_body_only_512_512_split_onnx \
    --config-dir pulsar2_configs/subgraphs_512x512 \
    --output-base-dir ./compiled_transformers_body_only_512x512/out_all \
    --final-output-dir ./compiled_transformers_body_only_512x512/out_final
```

**Output Directories:**
- `out_all`: Compilation logs and intermediate files for all subgraphs
- `out_final`: Only the successfully compiled axmodel files, ready for deployment

The compilation process converts each ONNX subgraph into an optimized axmodel that runs efficiently on the AX650N NPU.

### 2. VAE Decoder Module

The Variational Autoencoder (VAE) converts between the latent-space representation and pixel space. The decoder takes the denoised latent from the transformer and produces the final RGB image.

#### Step 1: Export VAE to ONNX Format

Export both the VAE encoder and decoder to ONNX format:

```sh
python scripts/z_image_fun/export_vae_onnx.py \
    --model-root models/Diffusion_Transformer/Z-Image-Turbo/ \
    --height 512 --width 512 \
    --encoder-output onnx-models-512x512/vae_encoder.onnx \
    --decoder-output onnx-models-512x512/vae_decoder.onnx \
    --dtype fp32 \
    --save-calib-inputs \
    --calib-dir onnx-calibration-512x512 \
    --skip-ort-check
```

**Parameters:**
- `--model-root`: Path to the Z-Image-Turbo model
- `--encoder-output`, `--decoder-output`: Output paths for the encoder and decoder ONNX models
- `--save-calib-inputs`: Save calibration inputs for quantization
- `--calib-dir`: Directory for the calibration data
- `--skip-ort-check`: Skip ONNX Runtime validation (useful when ORT has compatibility issues)

#### Step 2: Create Compilation Configuration

Create a configuration file for the VAE decoder compilation, e.g. `pulsar2_configs/vae_decoder.json`.

This configuration should specify:
- Input/output tensor names and shapes
- Quantization strategy (e.g., int8, mixed precision)
- Calibration data paths
- Hardware target (AX650)
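A minimal sketch of what such a configuration might look like, modeled on the transformer subgraph config added in this commit; the calibration tarball path, calibration size, and U16 precision are assumptions to adapt to your setup:

```json
{
  "model_type": "ONNX",
  "npu_mode": "NPU3",
  "quant": {
    "input_configs": [
      {
        "tensor_name": "DEFAULT",
        "calibration_dataset": "./onnx-calibration-512x512/vae_decoder.tar",
        "calibration_size": 4,
        "calibration_format": "NumpyObject"
      }
    ],
    "calibration_method": "MinMax",
    "layer_configs": [
      {
        "start_tensor_names": ["DEFAULT"],
        "end_tensor_names": ["DEFAULT"],
        "data_type": "U16"
      }
    ]
  }
}
```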
235
+ #### Step 3: Compile VAE Decoder
236
+
237
+ Compile the ONNX model to axmodel format using Pulsar2:
238
+
239
+ ```sh
240
+ pulsar2 build \
241
+ --output_dir ./compiled_output_vae_decoder \
242
+ --config pulsar2_configs/vae_decoder.json \
243
+ --npu_mode NPU3 \
244
+ --input onnx-models/vae_decoder_simp_slim.onnx \
245
+ --target_hardware AX650
246
+ ```
247
+
248
+ **Parameters:**
249
+ - `--output_dir`: Output directory for compiled models
250
+ - `--config`: Path to the compilation configuration file
251
+ - `--npu_mode`: NPU mode (NPU3 for maximum performance on AX650N)
252
+ - `--target_hardware`: Target hardware platform (AX650)
253
+
254
+ The compiled VAE decoder will be saved in the output directory and can be deployed to the AX650N board.
255
+
256
+ ## Complete Inference Pipeline
257
+
258
+ After compiling all components, you can run the complete text-to-image inference pipeline on the AXERA AX650N development board.
259
+
260
+ ### Running on the Development Board
261
+
262
+ 1. Transfer all compiled axmodel files to the development board
263
+ 2. Ensure all dependencies are installed
264
+ 3. Run the inference script:
265
+
266
+ ```sh
267
+ python3 examples/z_image_fun/launcher_axmodel.py \
268
+ --transformer-config pulsar2_configs/transformers_subgraph.json \
269
+ --transformer-subgraph-dir ../transformer_axmodel \
270
+ --vae-axmodel ../vae_model/vae_decoder.axmodel
271
+ ```
272
+
273
+ **Parameters:**
274
+ - `--transformer-config`: Configuration file that defines the subgraph structure
275
+ - `--transformer-subgraph-dir`: Directory containing all compiled transformer subgraph axmodels
276
+ - `--vae-axmodel`: Path to the compiled VAE decoder axmodel
277
+
278
+ The launcher script will:
279
+ 1. Load the text encoder (PyTorch)
280
+ 2. Process input prompts into embeddings
281
+ 3. Run the transformer subgraphs sequentially on NPU
282
+ 4. Decode the latent representation using VAE decoder on NPU
283
+ 5. Output the final generated image
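Running the subgraphs sequentially amounts to threading one activation tensor through an ordered list of subgraph sessions. An illustrative stdlib-only sketch with stand-in callables (not the actual launcher code):

```python
from typing import Callable

# Stand-ins for compiled subgraph sessions; each consumes the previous
# subgraph's output, mirroring the cfg_00 -> cfg_01 -> ... -> cfg_32 chain.
def make_subgraph(scale: float) -> Callable[[list[float]], list[float]]:
    return lambda x: [scale * v for v in x]

subgraphs = [make_subgraph(1.0 + i / 100) for i in range(33)]

def run_transformer_step(latent: list[float]) -> list[float]:
    # Thread the activation through every subgraph in order.
    for session in subgraphs:
        latent = session(latent)
    return latent

out = run_transformer_step([1.0])
print(len(out))  # still a single-element "latent" in this toy sketch
```

In the real launcher each "session" is an axmodel loaded on the NPU, and this chain runs once per denoising step.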

### Example Output

Here is an example of the inference process running on the AX650N development board:

```sh
root@ax650 Z-Image-Turbo/VideoX-Fun $ python3 examples/z_image_fun/launcher_axmodel.py \
    --transformer-config pulsar2_configs/transformers_subgraph.json \
    --transformer-subgraph-dir ../transformer_axmodel \
    --vae-axmodel ../vae_model/vae_decoder.axmodel

[INFO] Available providers: ['AxEngineExecutionProvider']
/root/yongqiang/push_hugging_face/Z-Image-Turbo/VideoX-Fun/videox_fun/dist/wan_xfuser.py:22: FutureWarning: `torch.cuda.amp.autocast(args...)` is deprecated. Please use `torch.amp.autocast('cuda', args...)` instead.
  @amp.autocast(enabled=False)
...
/root/yongqiang/push_hugging_face/Z-Image-Turbo/VideoX-Fun/videox_fun/models/wan_audio_injector.py:114: FutureWarning: `torch.cuda.amp.autocast(args...)` is deprecated. Please use `torch.amp.autocast('cuda', args...)` instead.
  @amp.autocast(enabled=False)
/root/yongqiang/push_hugging_face/Z-Image-Turbo/VideoX-Fun/videox_fun/models/wan_transformer3d_s2v.py:55: FutureWarning: `torch.cuda.amp.autocast(args...)` is deprecated. Please use `torch.amp.autocast('cuda', args...)` instead.
  @amp.autocast(enabled=False)
2026-01-15 15:55:55.577 | INFO | __main__:main:425 - Prompt in use: sunrise over alpine mountains, low clouds in valleys, god rays, ultra-detailed landscape
`torch_dtype` is deprecated! Use `dtype` instead!
Loading checkpoint shards: 100%|████████| 3/3 [00:01<00:00,  2.26it/s]
The module name (originally ) is not a valid Python identifier. Please rename the original module to avoid import issues.
[INFO] Using provider: AxEngineExecutionProvider
[INFO] Chip type: ChipType.MC50
[INFO] VNPU type: VNPUType.DISABLED
[INFO] Engine version: 2.12.0s
[INFO] Model type: 2 (triple core)
[INFO] Compiler version: 5.1-patch1-dirty 5c5e711b-dirty
AX Denoising:   0%|          | 0/9 [00:00<?, ?it/s][INFO] Using provider: AxEngineExecutionProvider
[INFO] Model type: 2 (triple core)
[INFO] Compiler version: 5.1-patch1-dirty 5c5e711b-dirty
2026-01-15 15:58:44.111 | INFO | __main__:_get_session:301 - Loading subgraph session: cfg_00 from cfg_00_timestep_to_model_t_embedder_mlp_mlp_2_Gemm_output_0_config.axmodel
[INFO] Using provider: AxEngineExecutionProvider
[INFO] Model type: 2 (triple core)
[INFO] Compiler version: 5.1-patch1-dirty 5c5e711b-dirty
2026-01-15 15:58:48.882 | INFO | __main__:_get_session:301 - Loading subgraph session: cfg_01 from cfg_01_prompt_embeds_to_model_Slice_1_output_0_config.axmodel
[INFO] Using provider: AxEngineExecutionProvider
[INFO] Model type: 2 (triple core)
[INFO] Compiler version: 5.1-patch1-dirty 5c5e711b-dirty
...
2026-01-15 16:00:08.612 | INFO | __main__:_get_session:301 - Loading subgraph session: cfg_30 from cfg_30_model_layers_26_Add_4_output_0_to_model_layers_27_Add_4_output_0_config.axmodel
[INFO] Using provider: AxEngineExecutionProvider
[INFO] Model type: 2 (triple core)
[INFO] Compiler version: 5.1-patch1 5c5e711b
2026-01-15 16:00:11.179 | INFO | __main__:_get_session:301 - Loading subgraph session: cfg_31 from cfg_31_model_layers_27_Add_4_output_0_to_model_layers_28_Add_4_output_0_config.axmodel
[INFO] Using provider: AxEngineExecutionProvider
[INFO] Model type: 2 (triple core)
[INFO] Compiler version: 5.1-patch1 5c5e711b
2026-01-15 16:00:13.868 | INFO | __main__:_get_session:301 - Loading subgraph session: cfg_32 from cfg_32_model_layers_28_Add_4_output_0_to_model_layers_29_Add_4_output_0_config.axmodel
AX Denoising:  22%|██▏       | 2/9 [01:36<04:45, 40.84s/it]AX Denoising: 100%|██████████| 9/9 [02:20<00:00, 15.60s/it]
[INFO] Using provider: AxEngineExecutionProvider
[INFO] Model type: 2 (triple core)
[INFO] Compiler version: 5.1-patch1 5c5e711b
2026-01-15 16:01:06.972 | INFO | __main__:main:537 - AXModel inference finished; result saved to /root/yongqiang/push_hugging_face/Z-Image-Turbo/VideoX-Fun/samples/z-image-t2i-axmodel/z_image_axmodel_2.png
```

This run demonstrates the complete pipeline working on the hardware, including:
- Model loading and initialization (~3 minutes for all 33 subgraphs)
- Denoising iterations (9 steps, ~2 minutes 20 seconds total)
- Final image generation and saving

### Known Limitations

**Quantization Accuracy**: Due to quantization precision limits, axmodel inference results differ somewhat from the original ONNX model outputs. This is a trade-off between inference speed and numerical precision when deploying on NPU hardware. Future work may include:
- Fine-tuning quantization parameters to improve accuracy
- Exploring mixed-precision quantization strategies
- Calibrating with more diverse datasets
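To quantify the drift between ONNX and axmodel outputs, a simple per-tensor metric such as cosine similarity is often used; a stdlib-only sketch (the sample values below are made up for illustration):

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    # 1.0 means identical direction; values noticeably below 1.0
    # signal quantization degradation worth investigating.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

onnx_out = [0.50, -1.20, 0.33]      # hypothetical float reference
axmodel_out = [0.49, -1.21, 0.35]   # hypothetical quantized output
print(round(cosine_similarity(onnx_out, axmodel_out), 4))
```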

## Advanced Usage

### Frontend-Only Export for Graph Analysis

For debugging and graph analysis, you can export only the frontend graph without compiling:

```sh
ENABLE_COMPILER=0 DUMP_FRONTEND_GRAPH=1 \
pulsar2 build \
    --output_dir ./compiled_output_trans_body_only_frontend \
    --config pulsar2_configs/config_controlnet.json \
    --npu_mode NPU3 \
    --input ../original_onnx/z_image_transformer_body_only_simp_slim.onnx \
    --target_hardware AX650
```

This is useful for:
- Analyzing the graph structure before compilation
- Debugging subgraph partitioning strategies
- Verifying model transformations
374
+ ### Compile from Quantized ONNX
375
+
376
+ If you already have a quantized ONNX model, you can compile it directly:
377
+
378
+ ```sh
379
+ pulsar2 build \
380
+ --input compiled_output_trans_body_only_use_calibration/quant/quant_axmodel.onnx \
381
+ --model_type QuantAxModel \
382
+ --output_dir compiled_subgraph_from_quant_onnx \
383
+ --output_name transformers.axmodel \
384
+ --config pulsar2_configs/transformers_subgraph.json \
385
+ --target_hardware AX650 \
386
+ --npu_mode NPU3
387
+ ```

## Technical Support

If you encounter issues or have questions about the implementation:

- **GitHub Issues**: [Create an issue](https://github.com/AXERA-TECH) for bug reports and feature requests
- **QQ Group**: 139953715 (Chinese community support)

## License

This project is licensed under the BSD-3-Clause License. See the LICENSE file for details.

---

**Note:** This implementation is optimized for AXERA AX650N hardware. Performance and compatibility may vary on other platforms.
VideoX-Fun/pulsar2_configs/transformers_subgraph_512x512.json ADDED

```json
{
  "model_type": "ONNX",
  "npu_mode": "NPU3",
  "quant": {
    "input_configs": [
      {
        "tensor_name": "DEFAULT",
        "calibration_dataset": "./onnx-calibration-no-controlnet/transformer.tar",
        "calibration_size": 4,
        "calibration_format": "NumpyObject"
      }
    ],
    "calibration_method": "MinMax",
    "precision_analysis": true,
    "precision_analysis_method": "EndToEnd",
    "layer_configs": [
      {
        "start_tensor_names": ["DEFAULT"],
        "end_tensor_names": ["DEFAULT"],
        "data_type": "U16"
      }
    ]
  },
  "input_processors": [
    {
      "tensor_name": "DEFAULT",
      "tensor_format": "AutoColorSpace",
      "tensor_layout": "NCHW"
    }
  ],
  "compiler": {
    "check": 0,
    "sub_configs": [
      {
        "start_tensor_names": ["timestep"],
        "end_tensor_names": ["/model/t_embedder/mlp/mlp.2/Gemm_output_0"],
        "check_mode": "CheckPerLayer"
      },
      {
        "start_tensor_names": ["prompt_embeds"],
        "end_tensor_names": ["/model/Slice_1_output_0"],
        "check_mode": "CheckPerLayer"
      },
      {
        "start_tensor_names": ["latent_model_input", "/model/t_embedder/mlp/mlp.2/Gemm_output_0"],
        "end_tensor_names": ["/model/Slice_output_0"],
        "check_mode": "CheckPerLayer"
      },
      {
        "start_tensor_names": ["/model/Slice_1_output_0", "/model/Slice_output_0", "/model/t_embedder/mlp/mlp.2/Gemm_output_0"],
        "end_tensor_names": ["/model/layers.0/Add_4_output_0"],
        "check_mode": "CheckPerLayer"
      },
      {
        "start_tensor_names": ["/model/layers.0/Add_4_output_0", "/model/t_embedder/mlp/mlp.2/Gemm_output_0"],
        "end_tensor_names": ["/model/layers.1/Add_4_output_0"],
        "check_mode": "CheckPerLayer"
      },
      {
        "start_tensor_names": ["/model/layers.1/Add_4_output_0", "/model/t_embedder/mlp/mlp.2/Gemm_output_0"],
        "end_tensor_names": ["/model/layers.2/Add_4_output_0"],
        "check_mode": "CheckPerLayer"
      },
      {
        "start_tensor_names": ["/model/layers.2/Add_4_output_0", "/model/t_embedder/mlp/mlp.2/Gemm_output_0"],
        "end_tensor_names": ["/model/layers.3/Add_4_output_0"],
        "check_mode": "CheckPerLayer"
      },
      {
        "start_tensor_names": ["/model/layers.3/Add_4_output_0", "/model/t_embedder/mlp/mlp.2/Gemm_output_0"],
        "end_tensor_names": ["/model/layers.4/Add_4_output_0"],
        "check_mode": "CheckPerLayer"
      },
      {
        "start_tensor_names": ["/model/layers.4/Add_4_output_0", "/model/t_embedder/mlp/mlp.2/Gemm_output_0"],
        "end_tensor_names": ["/model/layers.5/Add_4_output_0"],
        "check_mode": "CheckPerLayer"
      },
      {
        "start_tensor_names": ["/model/layers.5/Add_4_output_0", "/model/t_embedder/mlp/mlp.2/Gemm_output_0"],
        "end_tensor_names": ["/model/layers.6/Add_4_output_0"],
        "check_mode": "CheckPerLayer"
      },
      {
        "start_tensor_names": ["/model/layers.6/Add_4_output_0", "/model/t_embedder/mlp/mlp.2/Gemm_output_0"],
        "end_tensor_names": ["/model/layers.7/Add_4_output_0"],
        "check_mode": "CheckPerLayer"
      },
      {
        "start_tensor_names": ["/model/layers.7/Add_4_output_0", "/model/t_embedder/mlp/mlp.2/Gemm_output_0"],
        "end_tensor_names": ["/model/layers.8/Add_4_output_0"],
        "check_mode": "CheckPerLayer"
      },
      {
        "start_tensor_names": ["/model/layers.8/Add_4_output_0", "/model/t_embedder/mlp/mlp.2/Gemm_output_0"],
        "end_tensor_names": ["/model/layers.9/Add_4_output_0"],
        "check_mode": "CheckPerLayer"
      },
      {
        "start_tensor_names": ["/model/layers.9/Add_4_output_0", "/model/t_embedder/mlp/mlp.2/Gemm_output_0"],
        "end_tensor_names": ["/model/layers.10/Add_4_output_0"],
        "check_mode": "CheckPerLayer"
      },
      {
        "start_tensor_names": ["/model/layers.10/Add_4_output_0", "/model/t_embedder/mlp/mlp.2/Gemm_output_0"],
        "end_tensor_names": ["/model/layers.11/Add_4_output_0"],
        "check_mode": "CheckPerLayer"
      },
      {
        "start_tensor_names": ["/model/layers.11/Add_4_output_0", "/model/t_embedder/mlp/mlp.2/Gemm_output_0"],
        "end_tensor_names": ["/model/layers.12/Add_4_output_0"],
        "check_mode": "CheckPerLayer"
      },
      {
        "start_tensor_names": ["/model/layers.12/Add_4_output_0", "/model/t_embedder/mlp/mlp.2/Gemm_output_0"],
        "end_tensor_names": ["/model/layers.13/Add_4_output_0"],
        "check_mode": "CheckPerLayer"
      },
      {
        "start_tensor_names": ["/model/layers.13/Add_4_output_0", "/model/t_embedder/mlp/mlp.2/Gemm_output_0"],
        "end_tensor_names": ["/model/layers.14/Add_4_output_0"],
        "check_mode": "CheckPerLayer"
      },
      {
        "start_tensor_names": ["/model/layers.14/Add_4_output_0", "/model/t_embedder/mlp/mlp.2/Gemm_output_0"],
        "end_tensor_names": ["/model/layers.15/Add_4_output_0"],
        "check_mode": "CheckPerLayer"
      },
      {
        "start_tensor_names": ["/model/layers.15/Add_4_output_0", "/model/t_embedder/mlp/mlp.2/Gemm_output_0"],
        "end_tensor_names": ["/model/layers.16/Add_4_output_0"],
        "check_mode": "CheckPerLayer"
      },
      {
        "start_tensor_names": ["/model/layers.16/Add_4_output_0", "/model/t_embedder/mlp/mlp.2/Gemm_output_0"],
        "end_tensor_names": ["/model/layers.17/Add_4_output_0"],
        "check_mode": "CheckPerLayer"
      },
      {
        "start_tensor_names": ["/model/layers.17/Add_4_output_0", "/model/t_embedder/mlp/mlp.2/Gemm_output_0"],
        "end_tensor_names": ["/model/layers.18/Add_4_output_0"],
        "check_mode": "CheckPerLayer"
      },
      {
        "start_tensor_names": ["/model/layers.18/Add_4_output_0", "/model/t_embedder/mlp/mlp.2/Gemm_output_0"],
        "end_tensor_names": ["/model/layers.19/Add_4_output_0"],
        "check_mode": "CheckPerLayer"
      },
      {
        "start_tensor_names": ["/model/layers.19/Add_4_output_0", "/model/t_embedder/mlp/mlp.2/Gemm_output_0"],
        "end_tensor_names": ["/model/layers.20/Add_4_output_0"],
        "check_mode": "CheckPerLayer"
      },
      {
        "start_tensor_names": ["/model/layers.20/Add_4_output_0", "/model/t_embedder/mlp/mlp.2/Gemm_output_0"],
        "end_tensor_names": ["/model/layers.21/Add_4_output_0"],
        "check_mode": "CheckPerLayer"
      },
      {
        "start_tensor_names": ["/model/layers.21/Add_4_output_0", "/model/t_embedder/mlp/mlp.2/Gemm_output_0"],
        "end_tensor_names": ["/model/layers.22/Add_4_output_0"],
        "check_mode": "CheckPerLayer"
      },
      {
        "start_tensor_names": ["/model/layers.22/Add_4_output_0", "/model/t_embedder/mlp/mlp.2/Gemm_output_0"],
        "end_tensor_names": ["/model/layers.23/Add_4_output_0"],
        "check_mode": "CheckPerLayer"
      },
      {
        "start_tensor_names": ["/model/layers.23/Add_4_output_0", "/model/t_embedder/mlp/mlp.2/Gemm_output_0"],
        "end_tensor_names": ["/model/layers.24/Add_4_output_0"],
        "check_mode": "CheckPerLayer"
      },
      {
        "start_tensor_names": ["/model/layers.24/Add_4_output_0", "/model/t_embedder/mlp/mlp.2/Gemm_output_0"],
        "end_tensor_names": ["/model/layers.25/Add_4_output_0"],
        "check_mode": "CheckPerLayer"
      },
      {
        "start_tensor_names": ["/model/layers.25/Add_4_output_0", "/model/t_embedder/mlp/mlp.2/Gemm_output_0"],
        "end_tensor_names": ["/model/layers.26/Add_4_output_0"],
        "check_mode": "CheckPerLayer"
      },
      {
        "start_tensor_names": ["/model/layers.26/Add_4_output_0", "/model/t_embedder/mlp/mlp.2/Gemm_output_0"],
        "end_tensor_names": ["/model/layers.27/Add_4_output_0"],
        "check_mode": "CheckPerLayer"
      },
      {
        "start_tensor_names": ["/model/layers.27/Add_4_output_0", "/model/t_embedder/mlp/mlp.2/Gemm_output_0"],
        "end_tensor_names": ["/model/layers.28/Add_4_output_0"],
        "check_mode": "CheckPerLayer"
      },
      {
        "start_tensor_names": ["/model/layers.28/Add_4_output_0", "/model/t_embedder/mlp/mlp.2/Gemm_output_0"],
        "end_tensor_names": ["/model/layers.29/Add_4_output_0"],
        "check_mode": "CheckPerLayer"
      }
    ]
  }
}
```