---
license: cc-by-nc-4.0
datasets:
- salamnocap/ml-figs
language:
- en
pipeline_tag: text-to-image
---

# [ML-FIGS-LDM](https://github.com/salamnocap/ml-figs-ldm)
## EDUCATIONAL FIGURE GENERATION USING TEXT PERCEPTUAL LOSS

ML-FIGS-LDM is a Latent Diffusion Model (LDM) for generating educational figures. Its AutoencoderKL is trained with a Text Perceptual Loss (TPL) so that text inside the figures is reconstructed more legibly.

### Autoencoder Performance Comparison:
**Note**: ↑ means higher is better, ↓ means lower is better.
- `AutoencoderKL_SD`: Stable Diffusion v1-4 autoencoder trained on LAION.
- `AutoencoderKL_TPL`: autoencoders trained with Text Perceptual Loss (TPL), in two variants:
  - [Model A](https://huggingface.co/salamnocap/ml-figs-ldm/blob/main/AutoencoderKL_MlFigs.ckpt) is trained on **ML-Figs**.
  - [Model B](https://huggingface.co/salamnocap/ml-figs-ldm/blob/main/AutoencoderKL_MlFigsSciCap.ckpt) is trained on **ML-Figs + SciCap**.

| Dataset                  | Method                    | PSNR ↑ | SSIM ↑ | FID ↓   | LPIPS ↓ | MSE ↓  | TPL ↓  |
|--------------------------|---------------------------|--------|--------|---------|---------|--------|--------|
| **ML-Figs Test**         | `AutoencoderKL_SD`        | 33.01  | 0.970  | 20.51   | 0.022   | 0.003  | 0.043  |
|                          | `AutoencoderKL_TPL` A     | 30.71  | 0.954  | 16.13   | 0.056   | 0.002  | 0.017  |
| **ML-Figs + SciCap Test**| `AutoencoderKL_SD`        | 32.60  | 0.970  | 12.69   | 0.023   | 0.004  | 0.061  |
|                          | `AutoencoderKL_TPL` A     | 29.94  | 0.954  | 9.235   | 0.057   | 0.003  | 0.028  |
|                          | `AutoencoderKL_TPL` B     | 31.47  | 0.979  | 6.256   | 0.016   | 0.001  | 0.010  |
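
For readers unfamiliar with the PSNR and MSE columns, the following is a minimal sketch of their standard definitions, written in plain Python over flattened pixel values. It is not the evaluation code behind the table: this card does not state the pixel value range or how scores were averaged, so the `max_value=1.0` default and the function names `mse` and `psnr` are assumptions made for illustration.

```python
import math
from typing import Sequence


def mse(reference: Sequence[float], reconstruction: Sequence[float]) -> float:
    """Mean squared error between two equally sized, flattened images."""
    if len(reference) != len(reconstruction):
        raise ValueError("inputs must have the same number of pixels")
    if len(reference) == 0:
        raise ValueError("inputs must not be empty")
    squared_errors = ((r - x) ** 2 for r, x in zip(reference, reconstruction))
    return sum(squared_errors) / len(reference)


def psnr(
    reference: Sequence[float],
    reconstruction: Sequence[float],
    max_value: float = 1.0,  # assumed pixel range [0, 1]; not stated in this card
) -> float:
    """Peak signal-to-noise ratio in dB: 10 * log10(max_value^2 / MSE)."""
    error = mse(reference, reconstruction)
    if error == 0.0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / error)


# Toy example with four "pixels" (hypothetical values, not from the datasets).
original = [0.0, 0.5, 1.0, 0.25]
decoded = [0.1, 0.5, 0.9, 0.25]
print(f"MSE:  {mse(original, decoded):.4f}")
print(f"PSNR: {psnr(original, decoded):.2f} dB")
```

Because PSNR depends on the assumed value range and on whether it is averaged per image or computed from a pooled MSE, values computed this way will not necessarily match the table.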

### Latent Diffusion Model:
LDM checkpoint: [MlFigs_LDM_12.ckpt](https://huggingface.co/salamnocap/ml-figs-ldm/blob/main/MlFigs_LDM_12.ckpt)
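
The checkpoint links in this card point to the Hub's file viewer (`blob`); the raw files are served from the corresponding `resolve` URL. A minimal sketch of fetching them with the Python standard library might look like the following. The dictionary keys and the helper names `resolve_url` and `download_checkpoint` are hypothetical, chosen only for this example; the repository ID and filenames are taken from the links above.

```python
import os
import urllib.request
from urllib.parse import quote

REPO_ID = "salamnocap/ml-figs-ldm"

# Filenames as linked in this card; the short keys are illustrative only.
CHECKPOINTS = {
    "autoencoder_a": "AutoencoderKL_MlFigs.ckpt",
    "autoencoder_b": "AutoencoderKL_MlFigsSciCap.ckpt",
    "ldm": "MlFigs_LDM_12.ckpt",
}


def resolve_url(filename: str, repo_id: str = REPO_ID, revision: str = "main") -> str:
    """Build the direct-download URL for a file in a Hugging Face repository."""
    return f"https://huggingface.co/{repo_id}/resolve/{revision}/{quote(filename)}"


def download_checkpoint(key: str, dest_dir: str = ".") -> str:
    """Download one checkpoint into dest_dir (skipped if already present)."""
    filename = CHECKPOINTS[key]
    os.makedirs(dest_dir, exist_ok=True)
    dest_path = os.path.join(dest_dir, filename)
    if not os.path.exists(dest_path):
        urllib.request.urlretrieve(resolve_url(filename), dest_path)
    return dest_path


# Example (commented out because the files are large):
# path = download_checkpoint("ldm", dest_dir="checkpoints")
```

This only retrieves the files. How the checkpoints are loaded is not described in this card; the GitHub repository linked in the title is the place to look for the model code.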