Update ArXiv link.
README.md
CHANGED
@@ -40,13 +40,13 @@ The checkpoints are intended for academic researchers who want to reproduce the
 
 **Finite Difference Flow Optimization for RL Post-Training of Text-to-Image Models**<br/>
 David McAllister, Miika Aittala, Tero Karras, Janne Hellsten, Angjoo Kanazawa, Timo Aila, Samuli Laine<br/>
-https://arxiv.org/abs/
+https://arxiv.org/abs/2603.12893
 
 ## Release date
-
+March 16, 2026
 
 ## References
-**Research paper:** https://arxiv.org/abs/
+**Research paper:** https://arxiv.org/abs/2603.12893<br/>
 **Source code:** https://github.com/NVlabs/finite-difference-flow-optimization<br/>
 **Checkpoints:** https://huggingface.co/nvidia/finite-difference-flow-optimization<br/>
 
@@ -55,7 +55,7 @@ TODO
 **Network architecture:** Low-rank adapter for Stable Diffusion 3.5 Medium<br/>
 **Number of model parameters:** 1.9*10^7<br/>
 
-The low-rank adapter was initialized to zero and trained using Finite Difference Flow Optimization for 1000 RL epochs, where one RL epoch corresponds to 864 reward evaluations. See the associated [research paper](https://arxiv.org/abs/) for further details.
+The low-rank adapter was initialized to zero and trained using Finite Difference Flow Optimization for 1000 RL epochs, where one RL epoch corresponds to 864 reward evaluations. See the associated [research paper](https://arxiv.org/abs/2603.12893) for further details.
 
 ## Input
 **Input type:** Text<br/>
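The training-details hunk above packs two numbers worth unpacking: the total reward-evaluation budget (1000 RL epochs at 864 evaluations each) and the roughly 1.9*10^7 adapter parameters. A minimal sketch of both follows; the rank and projection width used in the LoRA illustration are assumptions for the example, not the published Stable Diffusion 3.5 Medium configuration.

```python
# Total reward evaluations for the run described in the README:
# 1000 RL epochs, each costing 864 reward evaluations.
epochs = 1000
evals_per_epoch = 864
print(epochs * evals_per_epoch)  # 864000

# Low-rank adapter (LoRA) parameter count per adapted weight matrix:
# the update is factored as B @ A, with A of shape (rank, d_in) and
# B of shape (d_out, rank), adding rank * (d_in + d_out) parameters.
# NOTE: rank=32 and d=1536 are illustrative assumptions only.
def lora_params(d_in: int, d_out: int, rank: int) -> int:
    return rank * (d_in + d_out)

print(lora_params(1536, 1536, 32))  # 98304
```

Summing this per-matrix count over all adapted layers is what would have to land near the stated 1.9*10^7 total.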