fquattrini committed · Commit cb7d4bc · 1 Parent(s): 5f51095

fixed normalization mistake in the README, added links

README.md CHANGED

@@ -13,6 +13,7 @@ metrics:
 - MAE
 - KL
 - CER
+- CE
 library_name: diffusers
 ---
 
@@ -22,7 +23,8 @@ library_name: diffusers
 
 ![Image 2](samples/lam_sample_reconstructed.png)
 
-This repository hosts the **Emuru Convolutional VAE**, described in our paper. The model features a convolutional encoder and decoder, each with four layers. The output channels for these layers are 32, 64, 128, and 256, respectively. The encoder downsamples an input RGB image (with three channels and dimensions width and height) to a latent representation with a single channel and spatial dimensions that are one-eighth of the original height and width. This design compresses the style information in the image, allowing a lightweight Transformer Decoder to efficiently process the latent features.
+This repository hosts the **Emuru Convolutional VAE**, described in our [paper](https://arxiv.org/pdf/2503.17074). The model features a convolutional encoder and decoder, each with four layers. The output channels for these layers are 32, 64, 128, and 256, respectively. The encoder downsamples an input RGB image (with three channels and dimensions width and height) to a latent representation with a single channel and spatial dimensions that are one-eighth of the original height and width. This design compresses the style information in the image, allowing a lightweight Transformer Decoder to efficiently process the latent features.
+Training code is released on [GitHub](https://github.com/aimagelab/Emuru).
 
 ### Training Details
 
samples/lam_sample_reconstructed.png CHANGED
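The README paragraph above describes the encoder geometry: four convolutional layers with output channels 32, 64, 128, and 256, mapping an RGB image to a single-channel latent at one-eighth of the input height and width. The following is a minimal PyTorch sketch of that shape arithmetic only — it is hypothetical, not the released Emuru code; the kernel sizes, stride placement, and the 1×1 projection to the latent channel are assumptions made to match the stated input/output shapes.

```python
import torch
import torch.nn as nn

class ToyEncoder(nn.Module):
    """Hypothetical 4-layer conv encoder matching the shapes in the README
    (channels 32/64/128/256; three stride-2 layers give an 8x downsample)."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, stride=2, padding=1),    # H/2, W/2
            nn.Conv2d(32, 64, kernel_size=3, stride=2, padding=1),   # H/4, W/4
            nn.Conv2d(64, 128, kernel_size=3, stride=2, padding=1),  # H/8, W/8
            nn.Conv2d(128, 256, kernel_size=3, stride=1, padding=1), # keep size
        )
        # Project the 256 feature maps down to the single latent channel.
        self.to_latent = nn.Conv2d(256, 1, kernel_size=1)

    def forward(self, x):
        return self.to_latent(self.net(x))

x = torch.randn(1, 3, 64, 256)  # a batch of one RGB text-line image
z = ToyEncoder()(x)
print(z.shape)                  # torch.Size([1, 1, 8, 32])
```

The key point is the compression: a 3×64×256 image becomes a 1×8×32 latent, so the downstream Transformer Decoder attends over 256 latent positions instead of 49,152 pixels.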