# Differences Between Reference Code and Current Implementation

## Critical Differences Affecting Results

### 1. **First Iteration Handling** ⚠️ **CRITICAL**

**Reference Code:**
```python
if itr == 0:
    # Don't add priors or diffusion noise on the first iteration
    output = model(image_tensor)
    # ... just get predictions, no gradient update
else:
    # Calculate loss and gradients
    if loss_infer == 'PGDD':
        loss = torch.nn.functional.mse_loss(features, noisy_features)
    grad = torch.autograd.grad(loss, image_tensor)[0]
    adjusted_grad = inferstep.step(image_tensor, grad)
    # ... apply gradient and noise
```

**Current Implementation:**
- **MISSING**: no check for `itr == 0` (or `i == 0`)
- Applies gradients and diffusion noise from the very first iteration
- This causes different starting behavior

### 2. **Model Extraction for PGDD**

**Reference Code:**
```python
new_model = extract_middle_layers(model.module, top_layer)
```

**Current Implementation:**
- Complex logic to handle Sequential models with normalizers
- Extracts from `model[1]` if the model is Sequential, otherwise from `model`
- May handle DataParallel differently

### 3. **Gradient Calculation**

**Reference Code:**
```python
grad = torch.autograd.grad(loss, image_tensor)[0]  # No retain_graph for PGDD
```

**Current Implementation:**
- Same for PGDD (no `retain_graph`)
- But uses `retain_graph=True` for IncreaseConfidence

### 4. **Normalization Handling**

**Reference Code:**
- Normalization is applied in the transform at the beginning
- `inference_normalization` controls whether the transform includes normalization
- The model forward pass uses the already-normalized tensor

**Current Implementation:**
- Complex logic checking whether the model is a Sequential wrapping `NormalizeByChannelMeanStd`
- May apply normalization multiple times or inconsistently
- Different paths for Sequential vs. non-Sequential models

### 5. **Variable Naming and Structure**

**Reference Code:**
- Uses `image_tensor` throughout the loop
- Directly modifies `image_tensor` with `requires_grad=True`

**Current Implementation:**
- Creates a separate `x = image_tensor.clone().detach().requires_grad_(True)`
- Uses `x` in the loop instead of `image_tensor`

### 6. **Loss Function for IncreaseConfidence**

**Reference Code:**
```python
loss = calculate_loss(features, least_confident_classes[0], loss_function)
# Uses CrossEntropyLoss or MSELoss depending on loss_function
```

**Current Implementation:**
```python
# Creates one-hot targets and uses MSE on softmax outputs
loss = loss + F.mse_loss(F.softmax(output, dim=1), one_hot)
```
- Different loss calculation method
- Uses MSE on softmax probabilities vs. CrossEntropy on logits

### 7. **Diffusion Noise Application**

**Reference Code:**
```python
if itr == 0:
    pass  # skip noise on the first iteration
else:
    diffusion_noise = diffusion_noise_ratio * torch.randn_like(image_tensor).cuda()
    if loss_infer == 'GradModulation':
        image_tensor = inferstep.project(
            image_tensor.clone()
            + adjusted_grad * grad_modulation
            + diffusion_noise * grad_modulation
        )
    else:
        image_tensor = inferstep.project(
            image_tensor.clone() + adjusted_grad + diffusion_noise
        )
```

**Current Implementation:**
- No `itr == 0` check, so diffusion noise is applied in every iteration, including the first
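To make the intended control flow concrete, here is a minimal sketch stitching together the excerpts from sections 1 and 7. It is a reconstruction, not the actual reference source: `new_model`, `noisy_features`, `inferstep`, `diffusion_noise_ratio`, `loss_infer`, and `grad_modulation` come from those excerpts, while the function wrapper and the gradient re-attachment are illustrative assumptions.

```python
import torch

def pgdd_loop(new_model, image_tensor, noisy_features, inferstep,
              num_iterations, diffusion_noise_ratio, loss_infer,
              grad_modulation=1.0):
    """Illustrative reconstruction of the reference loop (sections 1 and 7)."""
    for itr in range(num_iterations):
        # project() returns a plain tensor, so re-attach gradients each pass
        image_tensor = image_tensor.detach().requires_grad_(True)
        new_model.zero_grad()
        features = new_model(image_tensor)

        if itr == 0:
            # First iteration: forward pass only; no gradient step, no noise
            continue

        loss = torch.nn.functional.mse_loss(features, noisy_features)
        grad = torch.autograd.grad(loss, image_tensor)[0]
        adjusted_grad = inferstep.step(image_tensor, grad)
        # randn_like already matches image_tensor's device, so the
        # reference's extra .cuda() call is omitted here
        diffusion_noise = diffusion_noise_ratio * torch.randn_like(image_tensor)

        if loss_infer == 'GradModulation':
            image_tensor = inferstep.project(
                image_tensor.clone()
                + adjusted_grad * grad_modulation
                + diffusion_noise * grad_modulation
            )
        else:
            image_tensor = inferstep.project(
                image_tensor.clone() + adjusted_grad + diffusion_noise
            )
    return image_tensor.detach()
```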
### 8. **Model Forward Pass in Loop**

**Reference Code:**
```python
if inference_config['misc_info'].get('smooth_inference', False):
    ...  # smooth inference logic
else:
    new_model.zero_grad()
    features = new_model(image_tensor)
```

**Current Implementation:**
```python
x.grad = None  # Instead of new_model.zero_grad()
if config['loss_infer'] == 'Prior-Guided Drift Diffusion' and layer_model is not None:
    output = layer_model(x)
else:
    output = model(x)
```

## Summary of Impact

1. **First iteration difference** (most critical): the reference skips the gradient update on iteration 0
2. **Normalization**: different application may cause numerical differences
3. **Loss calculation**: different methods for IncreaseConfidence
4. **Model extraction**: may extract different layers due to Sequential handling

## Recommended Fixes

1. Add an `if i == 0:` check to skip the gradient update on the first iteration
2. Simplify model extraction to match the reference: `extract_middle_layers(model.module, top_layer)`
3. Align the IncreaseConfidence loss calculation with the reference (see the sketch after this list)
4. Ensure normalization is applied consistently
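As a concrete illustration of fix 3, the sketch below mirrors the `calculate_loss(features, least_confident_classes[0], loss_function)` call shown in section 6. The function name and signature come from that excerpt, but the body is an assumption about how the reference's CrossEntropyLoss/MSELoss switch behaves.

```python
import torch.nn.functional as F

def calculate_loss(features, target_classes, loss_function):
    """Hypothetical reconstruction of the reference's loss switch (section 6)."""
    if loss_function == 'CrossEntropyLoss':
        # Reference behavior: cross-entropy directly on the logits
        return F.cross_entropy(features, target_classes)
    elif loss_function == 'MSELoss':
        # MSE variant: one-hot targets against softmax probabilities,
        # which is what the current implementation always does
        one_hot = F.one_hot(target_classes, num_classes=features.shape[1]).float()
        return F.mse_loss(F.softmax(features, dim=1), one_hot)
    raise ValueError(f'Unknown loss_function: {loss_function}')
```

Under this reading, the current `loss = loss + F.mse_loss(F.softmax(output, dim=1), one_hot)` would be replaced by `loss = calculate_loss(output, least_confident_classes[0], loss_function)`, so the loss matches the reference whichever branch `loss_function` selects.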