---
license: cc-by-nc-4.0
datasets:
- salamnocap/ml-figs
language:
- en
pipeline_tag: text-to-image
---

# [ML-FIGS-LDM](https://github.com/salamnocap/ml-figs-ldm)

## EDUCATIONAL FIGURE GENERATION USING TEXT PERCEPTUAL LOSS

ML-FIGS-LDM is a Latent Diffusion Model (LDM) for generating educational figures. Its AutoencoderKL is trained with a Text Perceptual Loss (TPL) so that text inside the figures is reconstructed more legibly.

### Autoencoder Performance Comparison:

**Note**:

- `AutoencoderKL_SD`: Stable Diffusion v1-4 autoencoder trained on LAION.
- `AutoencoderKL_TPL`: Autoencoders trained with Text Perceptual Loss (TPL).
  - [Model A](https://huggingface.co/salamnocap/ml-figs-ldm/blob/main/AutoencoderKL_MlFigs.ckpt) is trained on **ML-Figs**.
  - [Model B](https://huggingface.co/salamnocap/ml-figs-ldm/blob/main/AutoencoderKL_MlFigsSciCap.ckpt) is trained on **ML-Figs + SciCap**.

| Dataset                   | Method                | PSNR ↑ | SSIM ↑ | FID ↓ | LPIPS ↓ | MSE ↓ | TPL ↓ |
|---------------------------|-----------------------|--------|--------|-------|---------|-------|-------|
| **ML-Figs Test**          | `AutoencoderKL_SD`    | 33.01  | 0.970  | 20.51 | 0.022   | 0.003 | 0.043 |
|                           | `AutoencoderKL_TPL` A | 30.71  | 0.954  | 16.13 | 0.056   | 0.002 | 0.017 |
| **ML-Figs + SciCap Test** | `AutoencoderKL_SD`    | 32.60  | 0.970  | 12.69 | 0.023   | 0.004 | 0.061 |
|                           | `AutoencoderKL_TPL` A | 29.94  | 0.954  | 9.235 | 0.057   | 0.003 | 0.028 |
|                           | `AutoencoderKL_TPL` B | 31.47  | 0.979  | 6.256 | 0.016   | 0.001 | 0.010 |

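For readers unfamiliar with the pixel-level metrics in the table, the following is a minimal sketch of how MSE and PSNR are commonly computed for a reference image and its reconstruction. It is not the evaluation code behind the reported numbers, which this card does not describe. The function names and the assumption that pixel values lie in [0, 1] are illustrative only.

```python
import numpy as np


def mse(reference: np.ndarray, reconstruction: np.ndarray) -> float:
    """Mean squared error between two images of the same shape."""
    diff = reference.astype(np.float64) - reconstruction.astype(np.float64)
    return float(np.mean(diff ** 2))


def psnr(reference: np.ndarray, reconstruction: np.ndarray, max_value: float = 1.0) -> float:
    """Peak signal-to-noise ratio in dB; higher means a closer reconstruction.

    max_value is the largest possible pixel value (assumed 1.0 here).
    """
    error = mse(reference, reconstruction)
    if error == 0.0:
        return float("inf")  # identical images
    return float(10.0 * np.log10(max_value ** 2 / error))


# Synthetic example: every pixel is off by 0.1.
reference = np.zeros((4, 4))
reconstruction = np.full((4, 4), 0.1)
print(f"MSE:  {mse(reference, reconstruction):.4f}")
print(f"PSNR: {psnr(reference, reconstruction):.2f} dB")
```

Metrics averaged per image over a test set generally do not satisfy this PSNR/MSE relation exactly, so the columns in the table should be read independently.
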
### Latent Diffusion Model:

[MlFigs_LDM_12.ckpt](https://huggingface.co/salamnocap/ml-figs-ldm/blob/main/MlFigs_LDM_12.ckpt)
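
One possible way to fetch the checkpoints linked in this card is through the `huggingface_hub` client. The sketch below assumes that package is installed. The short keys in `CHECKPOINTS` are hypothetical labels chosen for this example; only the repository ID and file names come from the links above.

```python
REPO_ID = "salamnocap/ml-figs-ldm"

# File names taken from the links in this card; the keys are illustrative labels.
CHECKPOINTS = {
    "autoencoder_a": "AutoencoderKL_MlFigs.ckpt",
    "autoencoder_b": "AutoencoderKL_MlFigsSciCap.ckpt",
    "ldm": "MlFigs_LDM_12.ckpt",
}


def download_checkpoint(name: str) -> str:
    """Download one checkpoint into the local Hugging Face cache and return its path."""
    # Imported lazily so the mapping above is usable without huggingface_hub installed.
    from huggingface_hub import hf_hub_download

    return hf_hub_download(repo_id=REPO_ID, filename=CHECKPOINTS[name])
```

Calling `download_checkpoint("ldm")` would return a local file path. How the checkpoint is then loaded depends on the training code in the linked GitHub repository, which this card does not describe.
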