Task 4 Report: Diffusion Step Ablation
1. Objective
Task 4 studies how the number of diffusion steps affects meaning preservation, speed, and robustness. The hypothesis is that fewer denoising steps improve generation speed, but that too few degrade output quality. This ablation matters for mentor evaluation because it tests a core design parameter of the D3PM model.
Unlike the earlier tasks, this one requires retraining separate checkpoints for each step count. This is not optional. A model trained at T=128 cannot be evaluated fairly at T=4 or T=8 without retraining, because the timestep distribution seen during training changes fundamentally.
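The reason retraining is unavoidable can be seen in how timesteps are sampled during training. The sketch below is illustrative (it assumes the standard D3PM setup where each example is paired with a uniformly sampled timestep; the function name is hypothetical), but it shows why the training distribution itself depends on T:

```python
import random

def sample_training_timesteps(batch_size, T):
    # Each training example is paired with a timestep drawn uniformly
    # from [0, T). A model trained with T=128 therefore never sees the
    # coarse 4-step noise schedule, which is why fair evaluation at a
    # new T requires retraining rather than just changing inference.
    return [random.randrange(T) for _ in range(batch_size)]

coarse = sample_training_timesteps(1024, 4)    # values in {0, 1, 2, 3}
fine = sample_training_timesteps(1024, 128)    # values in {0, ..., 127}
```

The supports of the two distributions do not even overlap in scale: the T=4 model must learn to remove large chunks of noise per step, which the T=128 model was never trained to do.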
2. Implementation Approach
The implementation is in analysis/step_ablation.py. I patched the workflow so it is safe for this repository:
- it no longer overwrites config.py
- it uses environment variables for DIFFUSION_STEPS
- each training run writes directly to ablation_results/T*
Training Script Generation Snippet
```python
for T in STEP_COUNTS:  # the ablated step counts, e.g. [4, 8, 16, 32, 64]
    f.write(
        f'MODEL_TYPE="$MODEL_TYPE" INCLUDE_NEG="$INCLUDE_NEG" '
        f'TRAIN_DEVICE="$TRAIN_DEVICE" '
        f"DIFFUSION_STEPS={T} INFERENCE_NUM_STEPS={T} "
        f'TRAIN_OUTPUT_DIR="ablation_results/T{T}" '
        f"python train.py\n\n"
    )
```
This makes the ablation workflow reproducible without mutating repository files between runs.
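For this to work, the training entry point has to pick up the overrides at startup. A minimal sketch of that pattern, assuming train.py reads the same variable names the generated script exports (the default values shown here are assumptions, not the repository's actual defaults):

```python
import os

# Read ablation overrides from the environment instead of mutating
# config.py. Variable names match those written into train_all.sh;
# the fallback defaults below are illustrative assumptions.
DIFFUSION_STEPS = int(os.environ.get("DIFFUSION_STEPS", "128"))
INFERENCE_NUM_STEPS = int(
    os.environ.get("INFERENCE_NUM_STEPS", str(DIFFUSION_STEPS))
)
TRAIN_OUTPUT_DIR = os.environ.get("TRAIN_OUTPUT_DIR", "ablation_results/default")
```

Because the overrides live only in the process environment, two runs with different T values can never leave the repository in an inconsistent state.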
3. Current Workflow
Task 4 now supports the following sequence:
```bash
uv run --active analysis/run_analysis.py --task 4 --phase generate_configs
bash ablation_configs/train_all.sh
uv run --active analysis/run_analysis.py --task 4 --phase analyze
```
The generated script trains models at T=4, 8, 16, 32, and 64, with outputs saved to ablation_results/T4, T8, T16, T32, and T64.
4. Current Result Status
At the moment, no trained ablation checkpoints exist in ablation_results/T*/best_model.pt. Therefore, the analysis phase has no quantitative result yet. That means Task 4 currently has a correct implementation pipeline, but not a completed experiment.
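This status is easy to verify mechanically. A small sketch (helper name is hypothetical; the paths and step counts come from this report) that reports which runs still lack a checkpoint:

```python
from pathlib import Path

STEP_COUNTS = [4, 8, 16, 32, 64]

def missing_checkpoints(root="ablation_results"):
    # Return the step counts whose run directory does not yet contain
    # a trained checkpoint, using the layout described above.
    return [
        T for T in STEP_COUNTS
        if not (Path(root) / f"T{T}" / "best_model.pt").exists()
    ]

# Before any training has run, all five checkpoints are missing,
# so missing_checkpoints() returns [4, 8, 16, 32, 64].
```

Running this check before the analyze phase makes the "pipeline correct, experiment incomplete" state explicit rather than a silent failure.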
This distinction matters for evaluation:
- the workflow is correct
- the experiment has not yet produced final numbers
5. Evaluation Value
For mentor evaluation, Task 4 can still be included, but it should be presented as:
- a completed experimental setup
- a validated retraining workflow
- pending final quantitative results
This is still useful because ablation design is part of research rigor. It shows that the project is set up to test the effect of a critical modeling choice instead of assuming the default step count is optimal.
6. Benefits
Once the checkpoints are trained, this task will answer:
- how much generation speed improves as diffusion steps decrease
- how meaning preservation changes with fewer steps
- where the best quality-speed tradeoff lies
- whether the current choice of diffusion steps is over- or under-provisioned
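Once per-T metrics exist, the quality-speed tradeoff question can be framed concretely: pick the fastest step count whose error stays within a tolerance of the best observed. A sketch under stated assumptions (the metric names, the CER/latency pairing, and all numbers below are illustrative placeholders, since no real results exist yet):

```python
def best_tradeoff(metrics, cer_slack=0.02):
    # metrics: {T: (cer, seconds_per_sample)}. Among settings whose CER
    # is within `cer_slack` of the best CER, return the fastest one.
    best_cer = min(cer for cer, _ in metrics.values())
    acceptable = {T: v for T, v in metrics.items() if v[0] <= best_cer + cer_slack}
    return min(acceptable, key=lambda T: acceptable[T][1])

# Illustrative placeholder numbers only (no trained checkpoints yet):
example = {4: (0.21, 0.05), 8: (0.14, 0.09), 16: (0.11, 0.17),
           32: (0.10, 0.33), 64: (0.10, 0.65)}
# With the default slack, T=16 would win: near-best CER at half the cost of T=32.
```

The slack parameter makes the tradeoff criterion explicit instead of leaving "best quality-speed point" as a judgment call.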
7. Limitations
The limitation is straightforward: there are no ablation checkpoints yet, so there are no real results to defend. It should not be presented as a finished evaluation experiment at this stage.
8. Conclusion
Task 4 is structurally correct and now safe to run in this repository. It is suitable for mentor evaluation as an experimental design and workflow contribution, but not yet as a result section. The next milestone is to train the five ablation checkpoints and run the analysis phase to generate the actual CER-speed comparison.