---
license: cc-by-nc-4.0
datasets:
- salamnocap/ml-figs
language:
- en
pipeline_tag: text-to-image
---
# [ML-FIGS-LDM](https://github.com/salamnocap/ml-figs-ldm)
## EDUCATIONAL FIGURE GENERATION USING TEXT PERCEPTUAL LOSS
ML-FIGS-LDM is a Latent Diffusion Model (LDM) for generating educational figures. Its autoencoder (AutoencoderKL) is trained with a Text Perceptual Loss (TPL) so that text inside reconstructed figures remains more readable.
### Autoencoder Performance Comparison
**Note**:
- `AutoencoderKL_SD`: Stable Diffusion v1-4 autoencoder trained on LAION.
- `AutoencoderKL_TPL`: Autoencoders trained with Text Perceptual Loss (TPL).
- [Model A](https://huggingface.co/salamnocap/ml-figs-ldm/blob/main/AutoencoderKL_MlFigs.ckpt) is trained on **ML-Figs**.
- [Model B](https://huggingface.co/salamnocap/ml-figs-ldm/blob/main/AutoencoderKL_MlFigsSciCap.ckpt) is trained on **ML-Figs + SciCap**.

| Dataset | Method | PSNR ↑ | SSIM ↑ | FID ↓ | LPIPS ↓ | MSE ↓ | TPL ↓ |
|--------------------------|---------------------------|--------|--------|---------|---------|--------|--------|
| **ML-Figs Test** | `AutoencoderKL_SD` | 33.01 | 0.970 | 20.51 | 0.022 | 0.003 | 0.043 |
| | `AutoencoderKL_TPL` A | 30.71 | 0.954 | 16.13 | 0.056 | 0.002 | 0.017 |
| **ML-Figs + SciCap Test**| `AutoencoderKL_SD` | 32.60 | 0.970 | 12.69 | 0.023 | 0.004 | 0.061 |
| | `AutoencoderKL_TPL` A | 29.94 | 0.954 | 9.235 | 0.057 | 0.003 | 0.028 |
| | `AutoencoderKL_TPL` B | 31.47 | 0.979 | 6.256 | 0.016 | 0.001 | 0.010 |
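PSNR and MSE are the simplest metrics in the table to reproduce. Below is a minimal sketch of their standard definitions, assuming images are NumPy arrays scaled to [0, 1] (`data_range=1.0`). The card does not state the pixel range or how scores were averaged over the test set, so values from this sketch are not expected to match the table exactly. SSIM, FID, LPIPS, and TPL depend on additional models or reference implementations and are not sketched here.

```python
import numpy as np


def mse(x: np.ndarray, y: np.ndarray) -> float:
    """Mean squared error between an original image and its reconstruction."""
    x = np.asarray(x, dtype=np.float64)
    y = np.asarray(y, dtype=np.float64)
    if x.shape != y.shape:
        raise ValueError(f"shape mismatch: {x.shape} vs {y.shape}")
    return float(np.mean((x - y) ** 2))


def psnr(x: np.ndarray, y: np.ndarray, data_range: float = 1.0) -> float:
    """Peak signal-to-noise ratio in dB: 10 * log10(data_range**2 / MSE).

    Assumes pixel values span [0, data_range]; identical images give +inf.
    """
    err = mse(x, y)
    if err == 0.0:
        return float("inf")
    return float(10.0 * np.log10(data_range**2 / err))


if __name__ == "__main__":
    original = np.zeros((4, 4))
    reconstruction = np.full((4, 4), 0.1)
    print(mse(original, reconstruction), psnr(original, reconstruction))
```

For example, an all-zero image compared with a constant 0.1 image gives an MSE of 0.01 and a PSNR of about 20 dB. Because PSNR grows as MSE shrinks, the PSNR column is marked ↑ while MSE is marked ↓.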
### Latent Diffusion Model

Checkpoint: [MlFigs_LDM_12.ckpt](https://huggingface.co/salamnocap/ml-figs-ldm/blob/main/MlFigs_LDM_12.ckpt)
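The links above open the Hugging Face web viewer (`/blob/`). Below is a minimal sketch for fetching the files programmatically, assuming the `huggingface_hub` package is installed. The dictionary keys (`autoencoder_mlfigs`, `autoencoder_mlfigs_scicap`, `ldm`) and the helper names are hypothetical labels for illustration. How each checkpoint is instantiated (model configs, loading code) is defined in the linked GitHub repository and is not assumed here.

```python
from pathlib import Path

REPO_ID = "salamnocap/ml-figs-ldm"

# Checkpoint filenames listed in this model card; the keys are hypothetical labels.
CHECKPOINTS = {
    "autoencoder_mlfigs": "AutoencoderKL_MlFigs.ckpt",  # Model A (ML-Figs)
    "autoencoder_mlfigs_scicap": "AutoencoderKL_MlFigsSciCap.ckpt",  # Model B (ML-Figs + SciCap)
    "ldm": "MlFigs_LDM_12.ckpt",  # latent diffusion model
}


def resolve_url(filename: str, revision: str = "main") -> str:
    """Direct-download URL for a file in the repo (/resolve/ serves the raw file)."""
    return f"https://huggingface.co/{REPO_ID}/resolve/{revision}/{filename}"


def download_checkpoint(key: str) -> Path:
    """Download one checkpoint into the local Hugging Face cache and return its path."""
    # Imported lazily so the URL helper works without huggingface_hub installed.
    from huggingface_hub import hf_hub_download

    return Path(hf_hub_download(repo_id=REPO_ID, filename=CHECKPOINTS[key]))
```

Calling `download_checkpoint("ldm")` would cache `MlFigs_LDM_12.ckpt` locally and reuse the cached copy on later calls. `resolve_url` is useful for tools such as `wget` that need a plain URL.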