blowing-up-groundhogs/emuru_vae


Vittorio Pippi committed on Feb 5, 2025

Commit dd90e2c · 1 Parent(s): 9b27178

Fix the YAML metadata

Files changed (1)

README.md +1 -5


@@ -1,6 +1,3 @@
-# Emuru Convolutional VAE
-```yaml
 ---
 language:
   - "en"
@@ -18,9 +15,8 @@ metrics:
   - CER
 library_name: diffusers
 ---
-```
-## Model Description
+## Emuru Convolutional VAE
 This repository hosts the **Emuru Convolutional VAE**, described in our paper. The model features a convolutional encoder and decoder, each with four layers. The output channels for these layers are 32, 64, 128, and 256, respectively. The encoder downsamples an input RGB image \( I \in \mathbb{R}^{3 \times W \times H} \) to a latent representation with a single channel and spatial dimensions \( h \times w \) (where \( h = H/8 \) and \( w = W/8 \)). This design compresses the style information in the image, allowing a lightweight Transformer Decoder to efficiently process the latent features.
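
The shape arithmetic in the README paragraph can be checked with a small sketch. It is not the repository's code: the function name `latent_shape` and its parameters are hypothetical, and it assumes the input height and width are exact multiples of 8, since the README does not say how other sizes are handled.

```python
def latent_shape(height: int, width: int, factor: int = 8, latent_channels: int = 1):
    """Return (channels, h, w) of the latent for an RGB image of the given size.

    Assumption: height and width are exact multiples of `factor`.
    The 8x downsampling and single latent channel come from the README text.
    """
    if height % factor != 0 or width % factor != 0:
        raise ValueError("height and width must be multiples of the downsampling factor")
    return (latent_channels, height // factor, width // factor)


def compression_ratio(height: int, width: int) -> float:
    """Ratio of input values (3 * H * W) to latent values (1 * h * w)."""
    channels, h, w = latent_shape(height, width)
    return (3 * height * width) / (channels * h * w)


# Example: a 64 x 768 text-line image (sizes chosen for illustration only).
shape = latent_shape(64, 768)
ratio = compression_ratio(64, 768)
```

Under these assumptions every latent value stands in for 3 × 8 × 8 = 192 input values, which is why a lightweight Transformer decoder can work on the much shorter latent sequence.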