# Differences Between Reference Code and Current Implementation

## Critical Differences Affecting Results

### 1. **First Iteration Handling** ⚠️ **CRITICAL**
**Reference Code:**
```python
if itr == 0:
    # Don't add priors or diffusion noise to the first iteration
    output = model(image_tensor)
    # ... just get predictions, no gradient update
else:
    # Calculate loss and gradients
    if loss_infer == 'PGDD':
        loss = torch.nn.functional.mse_loss(features, noisy_features)
        grad = torch.autograd.grad(loss, image_tensor)[0]
        adjusted_grad = inferstep.step(image_tensor, grad)
    # ... apply gradient and noise
```

**Current Implementation:**
- **MISSING**: No check for `itr == 0` or `i == 0`
- Applies gradients and diffusion noise from the very first iteration
- Because the very first update already differs, the two trajectories diverge from step one
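
As a concrete illustration, a minimal sketch of the missing guard; the loop variable `i`, the iteration count `num_iterations`, and the input tensor `x` follow names used elsewhere in this document, and the update step itself is elided:

```python
for i in range(num_iterations):
    output = model(x)
    if i == 0:
        # Reference behavior: record predictions only; apply no gradient
        # step and no diffusion noise on the first iteration.
        continue
    # ... compute loss, gradient, and the projected update as in the
    # reference snippet above
```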

### 2. **Model Extraction for PGDD**
**Reference Code:**
```python
new_model = extract_middle_layers(model.module, top_layer)
```

**Current Implementation:**
- Complex logic to handle Sequential models with normalizers
- Extracts from `model[1]` if Sequential, otherwise from `model`
- May handle DataParallel differently
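
One way to reconcile the two, sketched under the assumption that the current code wraps the classifier as `Sequential(normalizer, net)` and sometimes in `DataParallel` (the helper name `unwrap_for_extraction` is hypothetical):

```python
import torch.nn as nn

def unwrap_for_extraction(model):
    # Mirror the reference's `model.module` access for DataParallel models.
    if isinstance(model, nn.DataParallel):
        model = model.module
    # If the model is stored as Sequential(normalizer, net), extract from the
    # bare net, as the reference does. The length-2 check is an assumption
    # about how the current code builds the wrapper.
    if isinstance(model, nn.Sequential) and len(model) == 2:
        model = model[1]
    return model

# layer_model = extract_middle_layers(unwrap_for_extraction(model), top_layer)
```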

### 3. **Gradient Calculation**
**Reference Code:**
```python
grad = torch.autograd.grad(loss, image_tensor)[0]  # No retain_graph for PGDD
```

**Current Implementation:**
- Same for PGDD (no retain_graph)
- But uses `retain_graph=True` for IncreaseConfidence
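
The distinction matters because a graph can only be backpropagated through once unless it is retained. A self-contained toy example:

```python
import torch

x = torch.randn(1, 3, 8, 8, requires_grad=True)
w = torch.randn(3 * 8 * 8, 10, requires_grad=True)

# PGDD case: one grad call per graph, so the default (free the graph) is fine.
loss = (x.flatten(1) @ w).pow(2).mean()
grad_x = torch.autograd.grad(loss, x)[0]

# IncreaseConfidence case: a second grad call on the same graph requires the
# first call to pass retain_graph=True; otherwise PyTorch raises a
# RuntimeError because the intermediate buffers were freed.
loss = (x.flatten(1) @ w).pow(2).mean()
first = torch.autograd.grad(loss, x, retain_graph=True)[0]
second = torch.autograd.grad(loss, x)[0]  # succeeds: graph was retained
```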

### 4. **Normalization Handling**
**Reference Code:**
- Normalization is applied in the transform at the beginning
- `inference_normalization` controls whether transform includes normalization
- Model forward pass uses the already-normalized tensor

**Current Implementation:**
- Complex logic checking if model is Sequential with NormalizeByChannelMeanStd
- May apply normalization multiple times or inconsistently
- Different paths for sequential vs non-sequential models
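
To make the contrast concrete, a sketch of the two placements (ImageNet statistics assumed for illustration; `NormalizeByChannelMeanStd` in the current code plays the role of the wrapper below):

```python
import torch
import torchvision.transforms as T

MEAN, STD = [0.485, 0.456, 0.406], [0.229, 0.224, 0.225]  # assumed stats

# Reference style: normalize exactly once, at transform time.
transform = T.Compose([T.ToTensor(), T.Normalize(MEAN, STD)])

# Wrapped-model style: the network normalizes internally, so the transform
# must NOT normalize as well; doing both is the double-normalization hazard
# this section describes.
class NormalizedModel(torch.nn.Module):
    def __init__(self, net, mean, std):
        super().__init__()
        self.net = net
        self.register_buffer("mean", torch.tensor(mean).view(1, 3, 1, 1))
        self.register_buffer("std", torch.tensor(std).view(1, 3, 1, 1))

    def forward(self, x):
        return self.net((x - self.mean) / self.std)
```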

### 5. **Variable Naming and Structure**
**Reference Code:**
- Uses `image_tensor` throughout the loop
- Directly modifies `image_tensor` with `requires_grad=True`

**Current Implementation:**
- Creates separate `x = image_tensor.clone().detach().requires_grad_(True)`
- Uses `x` in the loop instead of `image_tensor`

### 6. **Loss Function for IncreaseConfidence**
**Reference Code:**
```python
loss = calculate_loss(features, least_confident_classes[0], loss_function)
# Uses CrossEntropyLoss or MSELoss based on loss_function
```

**Current Implementation:**
```python
# Creates one-hot targets and uses MSE on softmax outputs
loss = loss + F.mse_loss(F.softmax(output, dim=1), one_hot)
```
- Uses MSE on softmax probabilities instead of cross-entropy on logits, which changes both the loss landscape and the gradient magnitudes
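
A self-contained comparison of the two losses on toy logits shows why they optimize differently:

```python
import torch
import torch.nn.functional as F

logits = torch.randn(4, 10, requires_grad=True)  # batch of 4, 10 classes
target = torch.randint(0, 10, (4,))              # e.g. least-confident classes

# Reference style: cross-entropy directly on logits.
loss_ce = F.cross_entropy(logits, target)

# Current style: MSE between softmax probabilities and one-hot targets.
one_hot = F.one_hot(target, num_classes=10).float()
loss_mse = F.mse_loss(F.softmax(logits, dim=1), one_hot)
```

Both losses fall as the target class gains probability, but the MSE-on-softmax gradient vanishes as the probabilities saturate, so the two versions trace different optimization paths.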

### 7. **Diffusion Noise Application**
**Reference Code:**
```python
if itr == 0:
    # Skip noise
else:
    diffusion_noise = diffusion_noise_ratio * torch.randn_like(image_tensor).cuda()
    if loss_infer == 'GradModulation':
        image_tensor = inferstep.project(
            image_tensor.clone() +
            adjusted_grad * grad_modulation +
            diffusion_noise * grad_modulation
        )
    else:
        image_tensor = inferstep.project(
            image_tensor.clone() + adjusted_grad + diffusion_noise
        )
```

**Current Implementation:**
- **MISSING**: no `itr == 0` check, so diffusion noise is applied in every iteration, including the first

### 8. **Model Forward Pass in Loop**
**Reference Code:**
```python
if inference_config['misc_info'].get('smooth_inference', False):
    # Smooth inference logic
else:
    new_model.zero_grad()
    features = new_model(image_tensor)
```

**Current Implementation:**
```python
x.grad = None  # Instead of new_model.zero_grad()
if config['loss_infer'] == 'Prior-Guided Drift Diffusion' and layer_model is not None:
    output = layer_model(x)
else:
    output = model(x)
```
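
Note that, as far as the snippets above show, both versions obtain gradients through `torch.autograd.grad` rather than `loss.backward()`, so neither `x.grad` nor the parameter `.grad` buffers are populated in the first place; both resets are defensive. A sketch that clears both before each forward pass:

```python
x.grad = None      # clears any gradient accumulated on the input tensor
model.zero_grad()  # clears parameter gradients (what the reference resets)
output = model(x)
```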

## Summary of Impact

1. **First iteration difference**: Most critical; the reference skips both the gradient update and the diffusion noise on iteration 0
2. **Normalization**: Applying it in a different place (or twice) changes the inputs the network sees on every forward pass
3. **Loss calculation**: Different methods for IncreaseConfidence
4. **Model extraction**: May extract different layers due to Sequential handling

## Recommended Fixes

1. Add an `if i == 0:` check that skips both the gradient update and the diffusion noise on the first iteration (see the consolidated sketch after this list)
2. Simplify model extraction to match reference: `extract_middle_layers(model.module, top_layer)`
3. Align loss calculation for IncreaseConfidence with reference
4. Ensure normalization is applied consistently
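
Pulling fixes 1–3 together, a sketch of the aligned inner loop; it assumes the document's names (`extract_middle_layers`, `inferstep`, `top_layer`, `noisy_features`, `diffusion_noise_ratio`) exist in scope, that `target` holds the IncreaseConfidence target classes, and that normalization (fix 4) happens once in the transform:

```python
import torch
import torch.nn.functional as F

layer_model = extract_middle_layers(model.module, top_layer)  # fix 2
net = layer_model if loss_infer == 'PGDD' else model          # logits for confidence loss

x = image_tensor.clone().detach().requires_grad_(True)
for i in range(num_iterations):
    net.zero_grad()
    out = net(x)                 # features (PGDD) or logits (IncreaseConfidence)
    if i == 0:
        continue                 # fix 1: no gradient step or noise on iteration 0
    if loss_infer == 'PGDD':
        loss = F.mse_loss(out, noisy_features)
    else:
        loss = F.cross_entropy(out, target)   # fix 3: cross-entropy on logits
    grad = torch.autograd.grad(loss, x)[0]
    adjusted_grad = inferstep.step(x, grad)
    noise = diffusion_noise_ratio * torch.randn_like(x)
    x = inferstep.project(x.detach() + adjusted_grad + noise).requires_grad_(True)
```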