# Differences Between Reference Code and Current Implementation

## Critical Differences Affecting Results

### 1. **First Iteration Handling** ⚠️ **CRITICAL**

**Reference Code:**
```python
if itr == 0:
    # Don't add priors or diffusion noise on the first iteration
    output = model(image_tensor)
    # ... just get predictions, no gradient update
else:
    # Calculate loss and gradients
    if loss_infer == 'PGDD':
        loss = torch.nn.functional.mse_loss(features, noisy_features)
    grad = torch.autograd.grad(loss, image_tensor)[0]
    adjusted_grad = inferstep.step(image_tensor, grad)
    # ... apply gradient and noise
```

**Current Implementation:**
- **MISSING**: no check for `itr == 0` (or `i == 0`)
- Applies gradients and diffusion noise from the very first iteration
- This causes different starting behavior

### 2. **Model Extraction for PGDD**

**Reference Code:**
```python
new_model = extract_middle_layers(model.module, top_layer)
```

**Current Implementation:**
- Complex logic to handle Sequential models with normalizers
- Extracts from `model[1]` if the model is Sequential, otherwise from `model`
- May handle DataParallel differently

### 3. **Gradient Calculation**

**Reference Code:**
```python
grad = torch.autograd.grad(loss, image_tensor)[0]  # No retain_graph for PGDD
```

**Current Implementation:**
- Same for PGDD (no `retain_graph`)
- But uses `retain_graph=True` for IncreaseConfidence

### 4. **Normalization Handling**

**Reference Code:**
- Normalization is applied in the transform at the beginning
- `inference_normalization` controls whether the transform includes normalization
- The model forward pass uses the already-normalized tensor

**Current Implementation:**
- Complex logic checking whether the model is a Sequential wrapping `NormalizeByChannelMeanStd`
- May apply normalization multiple times or inconsistently
- Different paths for Sequential vs. non-Sequential models

### 5. **Variable Naming and Structure**

**Reference Code:**
- Uses `image_tensor` throughout the loop
- Directly modifies `image_tensor` with `requires_grad=True`

**Current Implementation:**
- Creates a separate `x = image_tensor.clone().detach().requires_grad_(True)`
- Uses `x` in the loop instead of `image_tensor`

### 6. **Loss Function for IncreaseConfidence**

**Reference Code:**
```python
loss = calculate_loss(features, least_confident_classes[0], loss_function)
# Uses CrossEntropyLoss or MSELoss depending on loss_function
```

**Current Implementation:**
```python
# Creates one-hot targets and uses MSE on softmax outputs
loss = loss + F.mse_loss(F.softmax(output, dim=1), one_hot)
```
- Different loss calculation method
- Uses MSE on softmax probabilities vs. CrossEntropy on logits

### 7. **Diffusion Noise Application**

**Reference Code:**
```python
if itr == 0:
    pass  # skip noise on the first iteration
else:
    diffusion_noise = diffusion_noise_ratio * torch.randn_like(image_tensor).cuda()
    if loss_infer == 'GradModulation':
        image_tensor = inferstep.project(
            image_tensor.clone()
            + adjusted_grad * grad_modulation
            + diffusion_noise * grad_modulation
        )
    else:
        image_tensor = inferstep.project(
            image_tensor.clone() + adjusted_grad + diffusion_noise
        )
```

**Current Implementation:**
- No `itr == 0` check, so diffusion noise is applied in every iteration, including the first
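To make the intended control flow concrete, here is a minimal sketch stitching together the excerpts from sections 1 and 7. It is a reconstruction, not the actual reference source: `new_model`, `noisy_features`, `inferstep`, `diffusion_noise_ratio`, `loss_infer`, and `grad_modulation` come from those excerpts, while the function wrapper and the gradient re-attachment are illustrative assumptions.

```python
import torch

def pgdd_loop(new_model, image_tensor, noisy_features, inferstep,
              num_iterations, diffusion_noise_ratio, loss_infer,
              grad_modulation=1.0):
    """Illustrative reconstruction of the reference loop (sections 1 and 7)."""
    for itr in range(num_iterations):
        # project() returns a plain tensor, so re-attach gradients each pass
        image_tensor = image_tensor.detach().requires_grad_(True)
        new_model.zero_grad()
        features = new_model(image_tensor)

        if itr == 0:
            # First iteration: forward pass only; no gradient step, no noise
            continue

        loss = torch.nn.functional.mse_loss(features, noisy_features)
        grad = torch.autograd.grad(loss, image_tensor)[0]
        adjusted_grad = inferstep.step(image_tensor, grad)
        # randn_like already matches image_tensor's device, so the
        # reference's extra .cuda() call is omitted here
        diffusion_noise = diffusion_noise_ratio * torch.randn_like(image_tensor)

        if loss_infer == 'GradModulation':
            image_tensor = inferstep.project(
                image_tensor.clone()
                + adjusted_grad * grad_modulation
                + diffusion_noise * grad_modulation
            )
        else:
            image_tensor = inferstep.project(
                image_tensor.clone() + adjusted_grad + diffusion_noise
            )
    return image_tensor.detach()
```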
### 8. **Model Forward Pass in Loop**

**Reference Code:**
```python
if inference_config['misc_info'].get('smooth_inference', False):
    ...  # smooth inference logic
else:
    new_model.zero_grad()
    features = new_model(image_tensor)
```

**Current Implementation:**
```python
x.grad = None  # Instead of new_model.zero_grad()
if config['loss_infer'] == 'Prior-Guided Drift Diffusion' and layer_model is not None:
    output = layer_model(x)
else:
    output = model(x)
```

## Summary of Impact

1. **First iteration difference** (most critical): the reference skips the gradient update on iteration 0
2. **Normalization**: different application may cause numerical differences
3. **Loss calculation**: different methods for IncreaseConfidence
4. **Model extraction**: may extract different layers due to Sequential handling

## Recommended Fixes

1. Add an `if i == 0:` check to skip the gradient update on the first iteration
2. Simplify model extraction to match the reference: `extract_middle_layers(model.module, top_layer)`
3. Align the IncreaseConfidence loss calculation with the reference (see the sketch after this list)
4. Ensure normalization is applied consistently
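As a concrete illustration of fix 3, the sketch below mirrors the `calculate_loss(features, least_confident_classes[0], loss_function)` call shown in section 6. The function name and signature come from that excerpt, but the body is an assumption about how the reference's CrossEntropyLoss/MSELoss switch behaves.

```python
import torch.nn.functional as F

def calculate_loss(features, target_classes, loss_function):
    """Hypothetical reconstruction of the reference's loss switch (section 6)."""
    if loss_function == 'CrossEntropyLoss':
        # Reference behavior: cross-entropy directly on the logits
        return F.cross_entropy(features, target_classes)
    elif loss_function == 'MSELoss':
        # MSE variant: one-hot targets against softmax probabilities,
        # which is what the current implementation always does
        one_hot = F.one_hot(target_classes, num_classes=features.shape[1]).float()
        return F.mse_loss(F.softmax(features, dim=1), one_hot)
    raise ValueError(f'Unknown loss_function: {loss_function}')
```

Under this reading, the current `loss = loss + F.mse_loss(F.softmax(output, dim=1), one_hot)` would be replaced by `loss = calculate_loss(output, least_confident_classes[0], loss_function)`, so the loss matches the reference whichever branch `loss_function` selects.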