# Differences Between Reference Code and Current Implementation
## Critical Differences Affecting Results
### 1. **First Iteration Handling** ⚠️ **CRITICAL**
**Reference Code:**
```python
if itr == 0:
# Don't add priors or diffusion noise to the first iteration
output = model(image_tensor)
# ... just get predictions, no gradient update
else:
# Calculate loss and gradients
if loss_infer == 'PGDD':
loss = torch.nn.functional.mse_loss(features, noisy_features)
grad = torch.autograd.grad(loss, image_tensor)[0]
adjusted_grad = inferstep.step(image_tensor, grad)
# ... apply gradient and noise
```
**Current Implementation:**
- **MISSING**: No check for `itr == 0` or `i == 0`
- Applies gradients and diffusion noise from the very first iteration
- This produces a different starting point for the optimization, so results diverge from the reference from the first step
### 2. **Model Extraction for PGDD**
**Reference Code:**
```python
new_model = extract_middle_layers(model.module, top_layer)
```
**Current Implementation:**
- Complex logic to handle Sequential models with normalizers
- Extracts from `model[1]` if Sequential, otherwise from `model`
- May handle DataParallel differently
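
A rough sketch of what the description above implies (`NormalizeByChannelMeanStd` and `extract_middle_layers` are project-specific names taken from this document, and the exact branching is an assumption):

```python
import torch.nn as nn

def get_extraction_backbone(model, top_layer):
    # Unwrap DataParallel if present (the reference assumes model.module).
    base = model.module if isinstance(model, nn.DataParallel) else model
    # If the model is Sequential(normalizer, backbone), skip the normalizer.
    if isinstance(base, nn.Sequential) and isinstance(base[0], NormalizeByChannelMeanStd):
        base = base[1]
    return extract_middle_layers(base, top_layer)
```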
### 3. **Gradient Calculation**
**Reference Code:**
```python
grad = torch.autograd.grad(loss, image_tensor)[0] # No retain_graph for PGDD
```
**Current Implementation:**
- Same for PGDD (no retain_graph)
- But uses `retain_graph=True` for IncreaseConfidence
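
Written out, the two calls differ only in the `retain_graph` flag (variable names taken from the snippets in this document):

```python
# PGDD: one backward pass per iteration, so the graph can be freed immediately.
grad = torch.autograd.grad(loss, image_tensor)[0]

# IncreaseConfidence (current implementation): the graph is kept alive,
# e.g. because the loss is accumulated over several terms before the step.
grad = torch.autograd.grad(loss, x, retain_graph=True)[0]
```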
### 4. **Normalization Handling**
**Reference Code:**
- Normalization is applied in the transform at the beginning
- `inference_normalization` controls whether transform includes normalization
- Model forward pass uses the already-normalized tensor
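
A sketch of the reference path, assuming a standard torchvision pipeline gated by `inference_normalization` (the ImageNet statistics are shown only as an example):

```python
import torchvision.transforms as T

steps = [T.ToTensor()]
if inference_normalization:
    # Normalization happens exactly once, inside the input transform.
    steps.append(T.Normalize(mean=[0.485, 0.456, 0.406],
                             std=[0.229, 0.224, 0.225]))
transform = T.Compose(steps)

image_tensor = transform(pil_image).unsqueeze(0).cuda()
output = model(image_tensor)  # the model receives an already-normalized tensor
```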
**Current Implementation:**
- Complex logic checking if model is Sequential with NormalizeByChannelMeanStd
- May apply normalization multiple times or inconsistently
- Different paths for sequential vs non-sequential models
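
The hazard can be made explicit with a small guard (a sketch, not the actual code; `NormalizeByChannelMeanStd` is the project class named above):

```python
import torch.nn as nn

model_has_normalizer = (isinstance(model, nn.Sequential)
                        and isinstance(model[0], NormalizeByChannelMeanStd))

# If the transform already normalized the input AND the model starts with a
# normalizer, the data is normalized twice on every forward pass.
if model_has_normalizer and inference_normalization:
    raise RuntimeError("Normalization would be applied twice (transform + model)")
```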
### 5. **Variable Naming and Structure**
**Reference Code:**
- Uses `image_tensor` throughout the loop
- Directly modifies `image_tensor` with `requires_grad=True`
**Current Implementation:**
- Creates separate `x = image_tensor.clone().detach().requires_grad_(True)`
- Uses `x` in the loop instead of `image_tensor`
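
The two set-ups differ only in which tensor is optimized (both lines are close to what the document quotes above):

```python
# Reference: image_tensor itself requires grad and is updated in the loop.
image_tensor.requires_grad_(True)

# Current implementation: a separate leaf tensor is optimized instead,
# so the original image_tensor is left untouched.
x = image_tensor.clone().detach().requires_grad_(True)
```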
### 6. **Loss Function for IncreaseConfidence**
**Reference Code:**
```python
loss = calculate_loss(features, least_confident_classes[0], loss_function)
# Uses CrossEntropyLoss or MSELoss based on loss_function
```
**Current Implementation:**
```python
# Creates one-hot targets and uses MSE on softmax outputs
loss = loss + F.mse_loss(F.softmax(output, dim=1), one_hot)
```
- Different loss calculation method
- Uses MSE on softmax probabilities vs CrossEntropy on logits
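
For concreteness, the two loss computations look roughly like this (tensor names are assumptions based on the snippets above):

```python
import torch.nn.functional as F

logits = output                      # raw model outputs, shape (N, num_classes)
target = least_confident_classes[0]  # class indices, shape (N,)

# Reference: cross-entropy (or MSE, depending on `loss_function`) on the logits.
ce_loss = F.cross_entropy(logits, target)

# Current implementation: MSE between softmax probabilities and one-hot targets.
one_hot = F.one_hot(target, num_classes=logits.shape[1]).float()
mse_loss = F.mse_loss(F.softmax(logits, dim=1), one_hot)
```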
### 7. **Diffusion Noise Application**
**Reference Code:**
```python
if itr == 0:
    pass  # skip the gradient step and the diffusion noise on the first iteration
else:
diffusion_noise = diffusion_noise_ratio * torch.randn_like(image_tensor).cuda()
if loss_infer == 'GradModulation':
image_tensor = inferstep.project(
image_tensor.clone() +
adjusted_grad * grad_modulation +
diffusion_noise * grad_modulation
)
else:
image_tensor = inferstep.project(
image_tensor.clone() + adjusted_grad + diffusion_noise
)
```
**Current Implementation:**
- **MISSING**: no `itr == 0` check, so diffusion noise (and the gradient step) is applied in every iteration, including the first
### 8. **Model Forward Pass in Loop**
**Reference Code:**
```python
if inference_config['misc_info'].get('smooth_inference', False):
    ...  # smooth inference logic (omitted)
else:
new_model.zero_grad()
features = new_model(image_tensor)
```
**Current Implementation:**
```python
x.grad = None # Instead of new_model.zero_grad()
if config['loss_infer'] == 'Prior-Guided Drift Diffusion' and layer_model is not None:
output = layer_model(x)
else:
output = model(x)
```
## Summary of Impact
1. **First iteration difference**: most critical; the reference skips the gradient update and noise on iteration 0
2. **Normalization**: Different application may cause numerical differences
3. **Loss calculation**: Different methods for IncreaseConfidence
4. **Model extraction**: May extract different layers due to Sequential handling
## Recommended Fixes
1. Add an `if i == 0:` check that skips the gradient update and diffusion noise on the first iteration (see the sketch after this list)
2. Simplify model extraction to match reference: `extract_middle_layers(model.module, top_layer)`
3. Align loss calculation for IncreaseConfidence with reference
4. Ensure normalization is applied exactly once, either in the input transform or in the model's leading normalizer, not both
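
A minimal sketch of fix 1, assuming the current loop structure (names such as `x`, `layer_model`, `inferstep`, and `diffusion_noise_ratio` come from the snippets above; `num_iterations` stands for the configured iteration count, and `compute_inference_loss` is a hypothetical placeholder for whichever loss branch is active):

```python
import torch

for i in range(num_iterations):
    x.grad = None
    output = layer_model(x) if layer_model is not None else model(x)

    if i == 0:
        # Match the reference: only a plain forward pass on the first
        # iteration, no gradient step and no diffusion noise.
        continue

    loss = compute_inference_loss(output)          # hypothetical helper
    grad = torch.autograd.grad(loss, x)[0]
    adjusted_grad = inferstep.step(x, grad)
    diffusion_noise = diffusion_noise_ratio * torch.randn_like(x)
    x = inferstep.project(x.clone() + adjusted_grad + diffusion_noise)
    # Detach so each iteration builds a fresh graph on the updated tensor.
    x = x.detach().requires_grad_(True)
```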