	---
	license: cc-by-nc-4.0
	datasets:
	- salamnocap/ml-figs
	language:
	- en
	pipeline_tag: text-to-image
	---

	# [ML-FIGS-LDM](https://github.com/salamnocap/ml-figs-ldm)
	## EDUCATIONAL FIGURE GENERATION USING TEXT PERCEPTUAL LOSS

	ML-FIGS-LDM is a Latent Diffusion Model (LDM) for generating educational figures. Its AutoencoderKL is trained with a Text Perceptual Loss (TPL) so that text inside the reconstructed figures is more readable.

	### Autoencoder Performance Comparison:
	Note:
	- `AutoencoderKL_SD`: Stable Diffusion v1-4 autoencoder trained on LAION.
	- `AutoencoderKL_TPL`: Autoencoders trained with Text Perceptual Loss (TPL), in two variants (A and B).
	- [Model A](https://huggingface.co/salamnocap/ml-figs-ldm/blob/main/AutoencoderKL_MlFigs.ckpt) (`AutoencoderKL_TPL` A) is trained on ML-Figs.
	- [Model B](https://huggingface.co/salamnocap/ml-figs-ldm/blob/main/AutoencoderKL_MlFigsSciCap.ckpt) (`AutoencoderKL_TPL` B) is trained on ML-Figs + SciCap.

	| Dataset               | Method                | PSNR ↑ | SSIM ↑ | FID ↓ | LPIPS ↓ | MSE ↓ | TPL ↓ |
	|-----------------------|-----------------------|--------|--------|-------|---------|-------|-------|
	| ML-Figs Test          | `AutoencoderKL_SD`    | 33.01  | 0.970  | 20.51 | 0.022   | 0.003 | 0.043 |
	|                       | `AutoencoderKL_TPL` A | 30.71  | 0.954  | 16.13 | 0.056   | 0.002 | 0.017 |
	| ML-Figs + SciCap Test | `AutoencoderKL_SD`    | 32.60  | 0.970  | 12.69 | 0.023   | 0.004 | 0.061 |
	|                       | `AutoencoderKL_TPL` A | 29.94  | 0.954  | 9.235 | 0.057   | 0.003 | 0.028 |
	|                       | `AutoencoderKL_TPL` B | 31.47  | 0.979  | 6.256 | 0.016   | 0.001 | 0.010 |
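
For readers unfamiliar with the reconstruction metrics, here is a minimal sketch of the textbook definitions of MSE and PSNR. It is not the evaluation code behind the table: this card does not state the pixel range, the averaging scheme, or the library used, so `max_value` and the flat-list image representation are assumptions made for illustration only, and the numbers will not necessarily reproduce the reported values.

```python
import math


def mse(original, reconstructed):
    """Mean squared error between two equally sized sequences of pixel values."""
    if len(original) != len(reconstructed) or not original:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((x - y) ** 2 for x, y in zip(original, reconstructed)) / len(original)


def psnr(original, reconstructed, max_value=1.0):
    """Peak signal-to-noise ratio in dB.

    max_value is the largest possible pixel value (assumed 1.0 here;
    it would be 255 for 8-bit images).
    """
    err = mse(original, reconstructed)
    if err == 0:
        return math.inf  # identical inputs
    return 10 * math.log10(max_value ** 2 / err)


# Toy "images" flattened to lists of pixel intensities in [0, 1].
original = [0.0, 0.5, 1.0, 0.25]
reconstructed = [0.1, 0.5, 0.9, 0.25]

print(f"MSE:  {mse(original, reconstructed):.4f}")
print(f"PSNR: {psnr(original, reconstructed):.2f} dB")
```

Lower MSE gives higher PSNR, which is why the two columns carry opposite arrows in the table.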

	### Latent Diffusion Model:
	[MlFigs_LDM_12.ckpt](https://huggingface.co/salamnocap/ml-figs-ldm/blob/main/MlFigs_LDM_12.ckpt)
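
The checkpoint links in this card point to `/blob/main/` pages, which open the file viewer and do not serve the file itself. Below is a minimal sketch for fetching a checkpoint directly, relying on the Hugging Face Hub convention that the same path under `/resolve/` serves the raw file. The helper names `blob_to_resolve` and `download` are hypothetical and not part of this repository.

```python
from urllib.parse import urlsplit, urlunsplit
from urllib.request import urlretrieve


def blob_to_resolve(url: str) -> str:
    """Turn a Hub file-viewer URL (/blob/) into a direct-download URL (/resolve/)."""
    parts = urlsplit(url)
    # Expected path layout: /<user>/<repo>/blob/<revision>/<file path...>
    segments = parts.path.split("/")
    if len(segments) < 6 or segments[3] != "blob":
        raise ValueError(f"not a Hub blob URL: {url}")
    segments[3] = "resolve"
    return urlunsplit(parts._replace(path="/".join(segments)))


def download(url: str, dest: str) -> str:
    """Download the file behind a blob URL to dest (hypothetical helper)."""
    urlretrieve(blob_to_resolve(url), dest)
    return dest


LDM_BLOB_URL = "https://huggingface.co/salamnocap/ml-figs-ldm/blob/main/MlFigs_LDM_12.ckpt"
print(blob_to_resolve(LDM_BLOB_URL))

# Uncomment to fetch the checkpoint; it may be a large file.
# download(LDM_BLOB_URL, "MlFigs_LDM_12.ckpt")
```

If the `huggingface_hub` package is installed, `hf_hub_download(repo_id="salamnocap/ml-figs-ldm", filename="MlFigs_LDM_12.ckpt")` is an alternative that also caches the file locally.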