shreenithi20 committed c588c3e (parent b8ad30c): Upload README.md with huggingface_hub

# Fashion MNIST Text-to-Image Diffusion Model

A transformer-based diffusion model for text-to-image generation, trained on VAE latents of Fashion MNIST images and conditioned on text embeddings of class labels.

## Model Information

- **Architecture**: Transformer-based diffusion model
- **Input**: 8×8×4 VAE latents
- **Conditioning**: Text embeddings (class labels)
- **Training Steps**: 8,500
- **Dataset**: [Fashion MNIST 8×8 Latents](https://huggingface.co/datasets/shreenithi20/fmnist-8x8-latents)
- **Framework**: PyTorch
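
The conditioning text comes from Fashion MNIST class labels. The sketch below maps the dataset's ten integer labels (standard Fashion MNIST order) to their class names as the text to embed. The `label_to_prompt` helper and its template are hypothetical, and the text encoder that turns these strings into embeddings is not documented in this README.

```python
# Standard Fashion MNIST label order (0-9).
FASHION_MNIST_CLASSES = [
    "T-shirt/top", "Trouser", "Pullover", "Dress", "Coat",
    "Sandal", "Shirt", "Sneaker", "Bag", "Ankle boot",
]


def label_to_prompt(label: int, template: str = "{}") -> str:
    """Hypothetical helper: turn an integer label into the text to be embedded.

    The template is an assumption; check how the training captions were built.
    """
    if not 0 <= label < len(FASHION_MNIST_CLASSES):
        raise ValueError(f"label must be in [0, 9], got {label}")
    return template.format(FASHION_MNIST_CLASSES[label])


# One prompt per class, e.g. to generate a full 10-image grid.
prompts = [label_to_prompt(i) for i in range(10)]
```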

## Checkpoints

- `model-1000.safetensors`: Early training (1k steps)
- `model-3000.safetensors`: Mid training (3k steps)
- `model-5000.safetensors`: Advanced training (5k steps)
- `model-8500.safetensors`: Final model (8.5k steps)
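
To work with a specific checkpoint, a sketch like the following downloads one file and reads its tensors into a state dict using `huggingface_hub.hf_hub_download` and `safetensors.torch.load_file`. It assumes the checkpoints live in the `shreenithi20/fmnist-t2i-diffusion` repo used in the usage example below. The `checkpoint_filename` and `load_checkpoint_state` helpers are hypothetical, and loading the state dict into a model is left out because the model class is not documented here.

```python
# Checkpoint steps listed in this README.
AVAILABLE_STEPS = (1000, 3000, 5000, 8500)

# Assumption: the checkpoints are stored in the same repo as the usage example.
REPO_ID = "shreenithi20/fmnist-t2i-diffusion"


def checkpoint_filename(step: int) -> str:
    """Hypothetical helper: build the checkpoint file name for a training step."""
    if step not in AVAILABLE_STEPS:
        raise ValueError(f"no checkpoint for step {step}; choose from {AVAILABLE_STEPS}")
    return f"model-{step}.safetensors"


def load_checkpoint_state(step: int = 8500, repo_id: str = REPO_ID):
    """Download one checkpoint and return its tensors as a dict (name -> tensor)."""
    # Imported lazily so the filename helper works without these packages.
    from huggingface_hub import hf_hub_download
    from safetensors.torch import load_file

    path = hf_hub_download(repo_id=repo_id, filename=checkpoint_filename(step))
    return load_file(path)
```

For example, `load_checkpoint_state(5000)` would fetch the 5k-step weights for comparison against the final model.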
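
## Usage

```python
import torch
from transformers import AutoModel

# Load model. A custom architecture like this one usually needs
# trust_remote_code=True so transformers can import the repo's modeling code.
model = AutoModel.from_pretrained("shreenithi20/fmnist-t2i-diffusion", trust_remote_code=True)
model.eval()

# Placeholder: text embeddings for the target class labels.
# The expected embedding format is not documented in this README.
class_labels = ...

# Generate 8×8×4 latents (decode them with the VAE to obtain images)
with torch.no_grad():
    generated_latents = model.generate(
        text_embeddings=class_labels,
        num_inference_steps=25,
        guidance_scale=7.5,
    )
```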
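
The `guidance_scale=7.5` argument suggests classifier-free guidance. Assuming the standard formulation (the README does not spell it out), each denoising step runs the model once with the class conditioning and once with a null conditioning, then extrapolates from the unconditional prediction toward the conditional one. `apply_guidance` is a hypothetical name for that combination step:

```python
def apply_guidance(pred_uncond, pred_cond, guidance_scale: float = 7.5):
    """Classifier-free guidance: move from the unconditional prediction toward
    (and past, when guidance_scale > 1) the conditional prediction.

    Works elementwise on floats, NumPy arrays, or torch tensors.
    """
    return pred_uncond + guidance_scale * (pred_cond - pred_uncond)


# guidance_scale = 1 reproduces the conditional prediction,
# guidance_scale = 0 reproduces the unconditional one.
print(apply_guidance(1.0, 2.0, guidance_scale=7.5))  # prints 8.5
```

Larger scales typically push samples to match the class more closely at the cost of diversity; 7.5 is simply the value used in the usage example.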

## Model Architecture

- **Patch Size**: 1×1
- **Embedding Dimension**: 384
- **Transformer Layers**: 12
- **Attention Heads**: 6
- **Cross-Attention Heads**: 4
- **MLP Multiplier**: 4
- **Timesteps**: Continuous, sampled from a Beta distribution
- **Beta Distribution Parameters**: a = 1.0, b = 2.5
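
With a 1×1 patch size, every position of the 8×8 latent becomes one token, so each latent yields 64 tokens of 4 channels that are projected to the 384-dimensional embedding. The sketch below stacks 12 layers with the listed hyperparameters: 6-head self-attention, 4-head cross-attention to the text embeddings, and a 4× MLP. The pre-norm layout, GELU, the `patchify` and `TextCrossAttnBlock` names, and the text embedding shape (one 384-dim token per label) are assumptions; timestep conditioning and the output projection back to 4 channels are omitted, so this is not the repository's actual implementation.

```python
import torch
from torch import nn

EMBED_DIM = 384   # Embedding Dimension
SELF_HEADS = 6    # Attention Heads
CROSS_HEADS = 4   # Cross-Attention Heads
MLP_MULT = 4      # MLP Multiplier
NUM_LAYERS = 12   # Transformer Layers


def patchify(latents: torch.Tensor) -> torch.Tensor:
    """1x1 patches: (B, C, H, W) -> (B, H*W, C), one token per latent pixel."""
    return latents.flatten(2).transpose(1, 2)


class TextCrossAttnBlock(nn.Module):
    """Hypothetical pre-norm block: self-attention, cross-attention, then MLP."""

    def __init__(self, dim: int = EMBED_DIM, text_dim: int = EMBED_DIM):
        super().__init__()
        self.norm1 = nn.LayerNorm(dim)
        self.self_attn = nn.MultiheadAttention(dim, SELF_HEADS, batch_first=True)
        self.norm2 = nn.LayerNorm(dim)
        self.cross_attn = nn.MultiheadAttention(
            dim, CROSS_HEADS, kdim=text_dim, vdim=text_dim, batch_first=True
        )
        self.norm3 = nn.LayerNorm(dim)
        self.mlp = nn.Sequential(
            nn.Linear(dim, MLP_MULT * dim),
            nn.GELU(),
            nn.Linear(MLP_MULT * dim, dim),
        )

    def forward(self, x: torch.Tensor, text: torch.Tensor) -> torch.Tensor:
        # Image tokens attend to each other.
        h = self.norm1(x)
        x = x + self.self_attn(h, h, h, need_weights=False)[0]
        # Image tokens (queries) attend to the text embeddings (keys/values).
        h = self.norm2(x)
        x = x + self.cross_attn(h, text, text, need_weights=False)[0]
        return x + self.mlp(self.norm3(x))


# 8x8x4 latents -> 64 tokens -> 4 channels projected to 384.
latents = torch.randn(2, 4, 8, 8)
text = torch.randn(2, 1, EMBED_DIM)  # assumed: one embedding token per class label
tokens = nn.Linear(4, EMBED_DIM)(patchify(latents))
blocks = nn.ModuleList(TextCrossAttnBlock() for _ in range(NUM_LAYERS))
for block in blocks:
    tokens = block(tokens, text)
print(tokens.shape)  # torch.Size([2, 64, 384])
```

Both head counts divide 384 evenly (64 dimensions per self-attention head, 96 per cross-attention head), which `nn.MultiheadAttention` requires.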

## Training Details

- **Learning Rate**: 1e-3 (constant)
- **Batch Size**: 128
- **Optimizer**: AdamW
- **Mixed Precision**: Yes
- **Gradient Accumulation Steps**: 1
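
A minimal sketch of how these settings combine in PyTorch, including the Beta(1.0, 2.5) timestep sampling listed under Model Architecture. The `train_step` helper, the toy model, and `toy_loss` are placeholders: the README states neither the noise schedule, the prediction target, nor the reduced-precision dtype, so the sketch runs `torch.autocast` with bfloat16 on CPU and leaves the real diffusion loss to the caller.

```python
import torch
from torch import nn

BATCH_SIZE = 128
LEARNING_RATE = 1e-3  # constant: no LR scheduler is attached
BETA_A, BETA_B = 1.0, 2.5

# Continuous timesteps in [0, 1]. For Beta(1, 2.5) the density is
# 2.5 * (1 - t) ** 1.5, highest near t = 0 and decaying toward t = 1.
timestep_dist = torch.distributions.Beta(BETA_A, BETA_B)


def train_step(model, optimizer, loss_fn, batch, device_type="cpu"):
    """One optimizer update per batch (gradient accumulation steps = 1).

    loss_fn(model, batch, t) is a placeholder for the diffusion loss.
    """
    t = timestep_dist.sample((batch.shape[0],)).to(batch.device)
    optimizer.zero_grad(set_to_none=True)
    # Mixed precision: run the forward pass under autocast.
    with torch.autocast(device_type=device_type, dtype=torch.bfloat16):
        loss = loss_fn(model, batch, t)
    loss.backward()
    optimizer.step()
    return loss.item()


# Toy stand-ins so the sketch runs end to end.
toy_model = nn.Linear(4, 4)
optimizer = torch.optim.AdamW(toy_model.parameters(), lr=LEARNING_RATE)


def toy_loss(model, batch, t):
    # Placeholder objective only, not the model's actual diffusion loss.
    tokens = batch.flatten(2).transpose(1, 2)  # (B, 64, 4)
    noisy = tokens + t[:, None, None] * torch.randn_like(tokens)
    return nn.functional.mse_loss(model(noisy).float(), tokens)


batch = torch.randn(BATCH_SIZE, 4, 8, 8)
print(train_step(toy_model, optimizer, toy_loss, batch))
```

On a GPU, `device_type="cuda"` with `torch.float16` would usually be paired with a gradient scaler; bfloat16 keeps the sketch simple because it does not need one.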

## Results

The model generates high-quality Fashion MNIST images conditioned on class labels. It operates on 8×8 latents, which the VAE decodes to 64×64 pixel images.
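
Decoding needs the VAE that produced the dataset latents, and this README does not name it. An 8×8×4 latent decoding to 64×64 matches the 8× spatial downsampling and 4 latent channels of a Stable Diffusion-style `AutoencoderKL` from `diffusers`, so the sketch below assumes that family. The checkpoint id `stabilityai/sd-vae-ft-mse` and the 0.18215 scaling factor are assumptions to verify against the dataset card; `decoded_size` and `decode_latents` are hypothetical helpers.

```python
import torch

DOWNSAMPLE_FACTOR = 8  # 8x8 latents -> 64x64 pixels
# Assumptions: neither the VAE checkpoint nor the scaling factor is given here.
VAE_ID = "stabilityai/sd-vae-ft-mse"
LATENT_SCALE = 0.18215


def decoded_size(latent_hw: int, factor: int = DOWNSAMPLE_FACTOR) -> int:
    """Pixel height/width produced from a square latent of side latent_hw."""
    return latent_hw * factor


def decode_latents(latents: torch.Tensor, vae_id: str = VAE_ID) -> torch.Tensor:
    """Decode (B, 4, 8, 8) latents to (B, 3, 64, 64) images in [0, 1]."""
    from diffusers import AutoencoderKL  # imported lazily; requires `diffusers`

    vae = AutoencoderKL.from_pretrained(vae_id).eval()
    with torch.no_grad():
        images = vae.decode(latents / LATENT_SCALE).sample
    # SD-style VAEs output roughly [-1, 1]; map to [0, 1] for display.
    return (images.clamp(-1, 1) + 1) / 2
```

If the dataset latents were stored unscaled, the division by `LATENT_SCALE` should be dropped.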