Update ArXiv link.
README.md
CHANGED
@@ -40,13 +40,13 @@ The checkpoints are intended for academic researchers who want to reproduce the
 
 **Finite Difference Flow Optimization for RL Post-Training of Text-to-Image Models**<br/>
 David McAllister, Miika Aittala, Tero Karras, Janne Hellsten, Angjoo Kanazawa, Timo Aila, Samuli Laine<br/>
-https://arxiv.org/abs/
+https://arxiv.org/abs/2603.12893
 
 ## Release date
-
+March 16, 2026
 
 ## References
-**Research paper:** https://arxiv.org/abs/
+**Research paper:** https://arxiv.org/abs/2603.12893<br/>
 **Source code:** https://github.com/NVlabs/finite-difference-flow-optimization<br/>
 **Checkpoints:** https://huggingface.co/nvidia/finite-difference-flow-optimization<br/>
 
@@ -55,7 +55,7 @@ TODO
 **Network architecture:** Low-rank adapter for Stable Diffusion 3.5 Medium<br/>
 **Number of model parameters:** 1.9*10^7<br/>
 
-The low-rank adapter was initialized to zero and trained using Finite Difference Flow Optimization for 1000 RL epochs, where one RL epoch corresponds to 864 reward evaluations. See the associated [research paper](https://arxiv.org/abs/) for further details.
+The low-rank adapter was initialized to zero and trained using Finite Difference Flow Optimization for 1000 RL epochs, where one RL epoch corresponds to 864 reward evaluations. See the associated [research paper](https://arxiv.org/abs/2603.12893) for further details.
 
 ## Input
 **Input type:** Text<br/>
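The training-details hunk above packs two numbers worth unpacking: the total reward-evaluation budget (1000 RL epochs at 864 evaluations each) and the roughly 1.9*10^7 adapter parameters. A minimal sketch of both follows; the rank and projection width used in the LoRA illustration are assumptions for the example, not the published Stable Diffusion 3.5 Medium configuration.

```python
# Total reward evaluations for the run described in the README:
# 1000 RL epochs, each costing 864 reward evaluations.
epochs = 1000
evals_per_epoch = 864
print(epochs * evals_per_epoch)  # 864000

# Low-rank adapter (LoRA) parameter count per adapted weight matrix:
# the update is factored as B @ A, with A of shape (rank, d_in) and
# B of shape (d_out, rank), adding rank * (d_in + d_out) parameters.
# NOTE: rank=32 and d=1536 are illustrative assumptions only.
def lora_params(d_in: int, d_out: int, rank: int) -> int:
    return rank * (d_in + d_out)

print(lora_params(1536, 1536, 32))  # 98304
```

Summing this per-matrix count over all adapted layers is what would have to land near the stated 1.9*10^7 total.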