---
license: cc-by-nc-4.0
datasets:
- salamnocap/ml-figs
language:
- en
pipeline_tag: text-to-image
---

# [ML-FIGS-LDM](https://github.com/salamnocap/ml-figs-ldm)

## EDUCATIONAL FIGURE GENERATION USING TEXT PERCEPTUAL LOSS

ML-FIGS-LDM is a Latent Diffusion Model (LDM) for generating educational figures. The AutoencoderKL is trained using a Text Perceptual Loss to reconstruct more readable text within the figures.

### Autoencoder Performance Comparison:

**Note**:
- `AutoencoderKL_SD`: Stable Diffusion v1-4 autoencoder trained on LAION.
- `AutoencoderKL_TPL`: Autoencoders trained with Text Perceptual Loss (TPL).
  - [Model A](https://huggingface.co/salamnocap/ml-figs-ldm/blob/main/AutoencoderKL_MlFigs.ckpt) is trained on **ML-Figs**.
  - [Model B](https://huggingface.co/salamnocap/ml-figs-ldm/blob/main/AutoencoderKL_MlFigsSciCap.ckpt) is trained on **ML-Figs + SciCap**.

| Dataset                   | Method                | PSNR ↑ | SSIM ↑ | FID ↓ | LPIPS ↓ | MSE ↓ | TPL ↓ |
|---------------------------|-----------------------|--------|--------|-------|---------|-------|-------|
| **ML-Figs Test**          | `AutoencoderKL_SD`    | 33.01  | 0.970  | 20.51 | 0.022   | 0.003 | 0.043 |
|                           | `AutoencoderKL_TPL` A | 30.71  | 0.954  | 16.13 | 0.056   | 0.002 | 0.017 |
| **ML-Figs + SciCap Test** | `AutoencoderKL_SD`    | 32.60  | 0.970  | 12.69 | 0.023   | 0.004 | 0.061 |
|                           | `AutoencoderKL_TPL` A | 29.94  | 0.954  | 9.235 | 0.057   | 0.003 | 0.028 |
|                           | `AutoencoderKL_TPL` B | 31.47  | 0.979  | 6.256 | 0.016   | 0.001 | 0.010 |

### Latent Diffusion Model:

[MlFigs_LDM_12.ckpt](https://huggingface.co/salamnocap/ml-figs-ldm/blob/main/MlFigs_LDM_12.ckpt)
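
The checkpoint files linked in this card can also be fetched programmatically. The following is a minimal sketch, assuming the `huggingface_hub` package is installed; the helper names `checkpoint_url` and `download_checkpoint` and the dictionary keys are hypothetical and chosen only for illustration. The repository ID and file names are taken from the links in this card. How the `.ckpt` files are loaded into a model is not described here, so that step is left to the code in the linked GitHub repository.

```python
from pathlib import Path

# Repository and file names as they appear in the links of this card.
REPO_ID = "salamnocap/ml-figs-ldm"
CHECKPOINTS = {
    "autoencoder_a": "AutoencoderKL_MlFigs.ckpt",        # trained on ML-Figs
    "autoencoder_b": "AutoencoderKL_MlFigsSciCap.ckpt",  # trained on ML-Figs + SciCap
    "ldm": "MlFigs_LDM_12.ckpt",                         # latent diffusion model
}


def checkpoint_url(key: str) -> str:
    """Return the direct-download URL for a checkpoint.

    The links in the card use /blob/ (the web viewer); the Hub serves the
    raw file under /resolve/ instead.
    """
    return f"https://huggingface.co/{REPO_ID}/resolve/main/{CHECKPOINTS[key]}"


def download_checkpoint(key: str) -> Path:
    """Download a checkpoint into the local Hugging Face cache and return its path."""
    # Imported lazily so the helpers above work without the dependency installed.
    from huggingface_hub import hf_hub_download

    return Path(hf_hub_download(repo_id=REPO_ID, filename=CHECKPOINTS[key]))
```

For example, `download_checkpoint("ldm")` would return the local path of `MlFigs_LDM_12.ckpt` after downloading it (or reusing a cached copy), while `checkpoint_url("ldm")` only builds the URL and needs no network access.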