	---
	language:
	- en
	library_name: hls4ml
	datasets:
	- lithobench
	tags:
	- pytorch
	- hls4ml
	- fpga
	- neural-network
	- quantization
	- xilinx
	- mask-optimization
	- lithography
	- inverse-lithography
	license: gpl
	---

	# Penumbra UNet: FPGA-Accelerated Mask Optimization

	A compressed U-Net neural network for on-chip FPGA acceleration of Inverse Lithography Technology (ILT) mask optimization, targeting the Xilinx VU47P (AWS F2).

	## Overview

	Penumbra UNet compresses a full-size teacher network by 64× (7.8M → 122K parameters) to fit entirely in on-chip BRAM, enabling a fully on-chip dataflow that eliminates external DRAM access.

	## Architecture

	### Network Structure

	U-Net encoder-decoder with extreme parameter compression:

	Encoder:
	- Conv 1→8 channels, 64×64 + MaxPool → 32×32
	- Conv 8→16 channels, 32×32 + MaxPool → 16×16
	- Conv 16→32 channels, 16×16 + MaxPool → 8×8

	Bottleneck:
	- Conv 32→64 channels, 8×8

	Decoder:
	- Upsample + skip concatenation + Conv 96→32 channels, 16×16
	- Upsample + skip concatenation + Conv 48→16 channels, 32×32
	- Upsample + skip concatenation + Conv 24→8 channels, 64×64

	Output:
	- Conv 1×1 + Sigmoid → 64×64
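
The channel counts at the decoder inputs (96, 48, 24) follow from concatenating each upsampled feature map with the encoder skip at the same resolution. The sketch below only traces shapes; it is not the model code, and `trace_unet` and its row format are hypothetical names chosen for illustration.

```python
def trace_unet(size=64, in_ch=1, widths=(8, 16, 32), bottleneck=64):
    """Return (stage, in_channels, out_channels, resolution) for each conv stage."""
    rows, skips = [], []
    ch = in_ch
    for width in widths:
        rows.append(("enc", ch, width, size))
        skips.append((width, size))      # feature map saved for the skip connection
        ch, size = width, size // 2      # 2x2 max-pool halves the resolution
    rows.append(("bottleneck", ch, bottleneck, size))
    ch = bottleneck
    for skip_ch, skip_size in reversed(skips):
        size *= 2                        # upsample back to the skip's resolution
        assert size == skip_size
        rows.append(("dec", ch + skip_ch, skip_ch, size))  # concat, then conv
        ch = skip_ch
    rows.append(("out", ch, 1, size))    # 1x1 conv + sigmoid
    return rows


rows = trace_unet()
for stage, c_in, c_out, res in rows:
    print(f"{stage:<10} {c_in:>3} -> {c_out:<3} @ {res}x{res}")
```

For example, the first decoder stage sees 64 bottleneck channels plus 32 skip channels, giving the 96 input channels listed above.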

	Compression summary:

	| Metric | Full model | Penumbra UNet |
	|--------|------------|---------------|
	| Parameters | 7.8M | 122K |
	| Input tile | 512×512 | 64×64 |
	| Max channels | 512 | 64 |
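
As a sanity check on the parameter figure, the count can be reproduced from the layer list with plain arithmetic. The block structure is an assumption, since the card does not state it: two 3×3 convolutions with bias per stage, parameter-free upsampling, and no normalization layers. Under those assumptions the total is 121,969, which is consistent with the 122K in the table.

```python
def conv_params(c_in, c_out, kernel):
    """Weights plus biases of one 2-D convolution."""
    return c_in * c_out * kernel * kernel + c_out


# (in_channels, out_channels) per stage, taken from the layer list in this card.
STAGES = [(1, 8), (8, 16), (16, 32), (32, 64), (96, 32), (48, 16), (24, 8)]


def penumbra_param_count(stages=STAGES, kernel=3, convs_per_stage=2):
    # ASSUMPTION: each stage is `convs_per_stage` 3x3 convs with bias;
    # the first maps c_in -> c_out, the rest map c_out -> c_out.
    total = 0
    for c_in, c_out in stages:
        total += conv_params(c_in, c_out, kernel)
        total += (convs_per_stage - 1) * conv_params(c_out, c_out, kernel)
    total += conv_params(stages[-1][1], 1, 1)  # final 1x1 output conv
    return total


total = penumbra_param_count()
ratio = 7.8e6 / total  # compression relative to the 7.8M-parameter teacher
print(total, round(ratio, 1))
```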

	### Tiling & Reassembly

	Input 512×512 masks are decomposed into a 16×16 grid of 64×64 tiles (256 total):
	- Overlap: 16-pixel margin on every side of each tile; reflection padding supplies the margin at the mask boundary
	- Usable core: 32×32 center pixels per tile
	- Batch processing: 256 tiles → 4 sequential batches of 64
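
The tiling geometry above can be sketched with NumPy. This is an illustration under assumptions, not the repository's code: the stride equals the 32-pixel core, the mask is reflection-padded by 16 pixels on each side (giving 544×544), and tiles are ordered row-major. `tile_mask` is a hypothetical helper name.

```python
import numpy as np

TILE, CORE, GRID = 64, 32, 16
PAD = (TILE - CORE) // 2  # 16-pixel margin around each 32x32 core


def tile_mask(mask):
    """Split a 512x512 mask into 256 overlapping 64x64 tiles (row-major order)."""
    assert mask.shape == (GRID * CORE, GRID * CORE)
    padded = np.pad(mask, PAD, mode="reflect")  # 544x544
    tiles = np.empty((GRID * GRID, 1, TILE, TILE), dtype=mask.dtype)
    for row in range(GRID):
        for col in range(GRID):
            y, x = row * CORE, col * CORE  # stride equals the core size
            tiles[row * GRID + col, 0] = padded[y:y + TILE, x:x + TILE]
    return tiles


mask = np.random.default_rng(0).random((512, 512)).astype(np.float32)
tiles = tile_mask(mask)
batches = np.split(tiles, 4)  # 4 sequential batches of 64 tiles
```

With this layout, the centre 32×32 of tile `(row, col)` is exactly `mask[32*row:32*row+32, 32*col:32*col+32]`, so the cores cover the mask without gaps or overlap.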

	Reassembly uses only differentiable operations (slice, reshape, permute) to enable end-to-end gradient flow:
	```
	(256, 1, 64, 64) [all tiles]
	↓ center-crop
	(256, 1, 32, 32) [usable cores]
	↓ reshape + permute
	(1, 1, 512, 512) [full mask]
	```
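
A minimal NumPy sketch of the crop, reshape and permute steps, assuming tiles are stored in row-major grid order (the card does not state the ordering). NumPy's `transpose` plays the role of `permute` here; in PyTorch the same slice, `reshape` and `permute` sequence keeps gradients flowing. `reassemble` is a hypothetical name.

```python
import numpy as np

TILE, CORE, GRID = 64, 32, 16
PAD = (TILE - CORE) // 2


def reassemble(tiles):
    """(256, 1, 64, 64) tiles -> (1, 1, 512, 512) mask using only slice/reshape/transpose."""
    cores = tiles[:, :, PAD:PAD + CORE, PAD:PAD + CORE]  # (256, 1, 32, 32)
    grid = cores.reshape(GRID, GRID, CORE, CORE)         # (tile_row, tile_col, y, x)
    grid = grid.transpose(0, 2, 1, 3)                    # (tile_row, y, tile_col, x)
    return grid.reshape(1, 1, GRID * CORE, GRID * CORE)


# Round-trip check: tile a known image, then rebuild it from the tile cores.
full = np.arange(512 * 512, dtype=np.float32).reshape(512, 512)
padded = np.pad(full, PAD, mode="reflect")
tiles = np.stack([
    padded[r * CORE:r * CORE + TILE, c * CORE:c * CORE + TILE][None]
    for r in range(GRID)
    for c in range(GRID)
])
restored = reassemble(tiles)
```

The transpose is what interleaves tile rows with pixel rows; reshaping without it would place each tile's pixels in one contiguous run and scramble the image.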

	## Training

	### Phase 1: Knowledge Distillation
	- Epochs: 16
	- Input: 64×64 crops
	- Loss: α-blended (α decays 0.7→0)
	```
	L = α·MSE(student, teacher) + (1-α)·MSE(student, ground_truth)
	```
	- Optimizer: Adam (lr=1e-3), cosine-annealing schedule
	- Teacher: Frozen full-size NeuralILT model
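
The blended loss can be sketched in plain Python. The card gives only the endpoints of the α schedule (0.7 → 0), so the linear decay used here is an assumption, and `alpha_at` and `distillation_loss` are hypothetical names. The values are toy pixel lists, not real data.

```python
def alpha_at(epoch, total_epochs=16, alpha_start=0.7):
    """ASSUMPTION: linear decay from alpha_start at epoch 0 to 0 at the last epoch."""
    return alpha_start * (1.0 - epoch / (total_epochs - 1))


def mse(a, b):
    """Mean squared error between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def distillation_loss(student, teacher, ground_truth, alpha):
    # L = alpha * MSE(student, teacher) + (1 - alpha) * MSE(student, ground_truth)
    return alpha * mse(student, teacher) + (1.0 - alpha) * mse(student, ground_truth)


# Toy values for illustration only.
student = [0.2, 0.9, 0.4]
teacher = [0.1, 0.8, 0.6]
ground_truth = [0.0, 1.0, 1.0]
losses = [
    distillation_loss(student, teacher, ground_truth, alpha_at(epoch))
    for epoch in range(16)
]
```

Early epochs weight the teacher's output more heavily; by the final epoch α is 0 and the loss reduces to the plain ground-truth MSE.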

	### Phase 2: Physics-Informed Fine-Tuning
	- Epochs: 4
	- Pipeline: Full tiled forward pass through differentiable lithography simulator
	- Loss: Print fidelity (nominal print vs. target) + process variation (difference between the max and min process-corner prints)
	```
	L = MSE(P_nom, target) + MSE(P_max, P_min)
	```
	- Optimizer: Adam (lr=1e-4), StepLR (γ=0.1 at epoch 2)
	- Gradients: Flow through tiled reassembly to all network weights
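
A small sketch of the Phase 2 objective and learning-rate schedule in plain Python. The lithography simulator is not modelled: `p_nom`, `p_max` and `p_min` stand in for its printed images, here as toy pixel lists. Reading "γ=0.1 at epoch 2" as a step size of 2 is an assumption; `physics_loss` and `step_lr` are hypothetical names.

```python
def mse(a, b):
    """Mean squared error between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def physics_loss(p_nom, p_max, p_min, target):
    fidelity = mse(p_nom, target)   # nominal print should match the target
    variation = mse(p_max, p_min)   # the two other prints should agree
    return fidelity + variation


def step_lr(epoch, base_lr=1e-4, gamma=0.1, step_size=2):
    """Step decay: multiply the rate by gamma every step_size epochs (assumed step_size=2)."""
    return base_lr * gamma ** (epoch // step_size)


# Toy printed images, flattened to lists for illustration.
target = [0.0, 1.0, 1.0, 0.0]
p_nom = [0.1, 0.9, 0.8, 0.0]
p_max = [0.2, 1.0, 0.9, 0.1]
p_min = [0.0, 0.8, 0.7, 0.0]

loss = physics_loss(p_nom, p_max, p_min, target)
lrs = [step_lr(epoch) for epoch in range(4)]
```

Over the four epochs this gives a learning rate of 1e-4 for epochs 0 and 1 and 1e-5 for epochs 2 and 3.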

	## Code Organization

	```
	hls4ml_penumbra/
	├── firmware/              # Generated HLS C++ project
	│   ├── myproject.cpp      # Top-level module
	│   ├── myproject.h        # Interface & config
	│   ├── weights/           # Quantized weights
	│   ├── ap_types/          # Xilinx AP types (ap_fixed, ap_int)
	│   └── utils/             # HLS utilities
	├── myproject_prj/         # Vivado HLS project
	│   └── solution1/
	│       └── impl/          # Implementation artifacts
	├── logs/                  # Build logs
	└── [HLS build outputs]
	```

	---

	Author: Roberto Treviño Cervantes