shreenithi20/fmnist-t2i-diffusion

fmnist_t2i_diffusion

shreenithi20 committed on Aug 19, 2025

Commit b3bf2eb (verified) · 1 Parent(s): c93e607

Update README.md

Files changed (1)

README.md +64 -60

@@ -1,61 +1,65 @@
-# Fashion MNIST Text-to-Image Diffusion Model
-A transformer-based diffusion model trained on Fashion MNIST latent representations for text-to-image generation.
-## Model Information
-- **Architecture**: Transformer-based diffusion model
-- **Input**: 8×8×4 VAE latents
-- **Conditioning**: Text embeddings (class labels)
-- **Training Steps**: 8,500
-- **Dataset**: [Fashion MNIST 8×8 Latents](https://huggingface.co/datasets/shreenithi20/fmnist-8x8-latents)
-- **Framework**: PyTorch
-## Checkpoints
-- `model-1000.safetensors`: Early training (1k steps)
-- `model-3000.safetensors`: Mid training (3k steps)
-- `model-5000.safetensors`: Advanced training (5k steps)
-- `model-8500.safetensors`: Final model (8.5k steps)
-## Usage
-```python
-from transformers import AutoConfig, AutoModel
-import torch
-# Load model
-model = AutoModel.from_pretrained("shreenithi20/fmnist-t2i-diffusion")
-model.eval()
-# Generate images
-with torch.no_grad():
-    generated_latents = model.generate(
-        text_embeddings=class_labels,
-        num_inference_steps=25,
-        guidance_scale=7.5
-    )
-```
-## Model Architecture
-- **Patch Size**: 1×1
-- **Embedding Dimension**: 384
-- **Transformer Layers**: 12
-- **Attention Heads**: 6
-- **Cross Attention Heads**: 4
-- **MLP Multiplier**: 4
-- **Timesteps**: Continuous (beta distribution)
-- **Beta Distribution**: a=1.0, b=2.5
-## Training Details
-- **Learning Rate**: 1e-3 (Constant)
-- **Batch Size**: 128
-- **Optimizer**: AdamW
-- **Mixed Precision**: Yes
-- **Gradient Accumulation**: 1
-## Results
 The model generates high-quality Fashion MNIST images conditioned on class labels, with 8×8 latent resolution that can be decoded to 64×64 pixel images.

+---
+datasets:
+- shreenithi20/fmnist-8x8-latents
+---
+# Fashion MNIST Text-to-Image Diffusion Model
+A transformer-based diffusion model trained on Fashion MNIST latent representations for text-to-image generation.
+## Model Information
+- **Architecture**: Transformer-based diffusion model
+- **Input**: 8×8×4 VAE latents
+- **Conditioning**: Text embeddings (class labels)
+- **Training Steps**: 8,500
+- **Dataset**: [Fashion MNIST 8×8 Latents](https://huggingface.co/datasets/shreenithi20/fmnist-8x8-latents)
+- **Framework**: PyTorch
+## Checkpoints
+- `model-1000.safetensors`: Early training (1k steps)
+- `model-3000.safetensors`: Mid training (3k steps)
+- `model-5000.safetensors`: Advanced training (5k steps)
+- `model-8500.safetensors`: Final model (8.5k steps)
+## Usage
+```python
+from transformers import AutoConfig, AutoModel
+import torch
+# Load model
+model = AutoModel.from_pretrained("shreenithi20/fmnist-t2i-diffusion")
+model.eval()
+# Generate images
+with torch.no_grad():
+    generated_latents = model.generate(
+        text_embeddings=class_labels,
+        num_inference_steps=25,
+        guidance_scale=7.5
+    )
+```
+## Model Architecture
+- **Patch Size**: 1×1
+- **Embedding Dimension**: 384
+- **Transformer Layers**: 12
+- **Attention Heads**: 6
+- **Cross Attention Heads**: 4
+- **MLP Multiplier**: 4
+- **Timesteps**: Continuous (beta distribution)
+- **Beta Distribution**: a=1.0, b=2.5
+## Training Details
+- **Learning Rate**: 1e-3 (Constant)
+- **Batch Size**: 128
+- **Optimizer**: AdamW
+- **Mixed Precision**: Yes
+- **Gradient Accumulation**: 1
+## Results
 The model generates high-quality Fashion MNIST images conditioned on class labels, with 8×8 latent resolution that can be decoded to 64×64 pixel images.
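
The shape and timestep figures listed in the README can be sanity-checked with a few lines of arithmetic. The sketch below is not taken from the repository: the helper names (`num_tokens`, `head_dim`, `sample_timesteps`) are hypothetical, and it assumes that "Continuous (beta distribution)" means training timesteps are drawn from Beta(a, b) on [0, 1]. It uses only the Python standard library.

```python
import random

# Values copied from the README's "Model Information" and "Model Architecture" lists.
LATENT_H, LATENT_W, LATENT_C = 8, 8, 4   # 8x8x4 VAE latents
PATCH_SIZE = 1                           # 1x1 patches
EMBED_DIM = 384
NUM_HEADS = 6
BETA_A, BETA_B = 1.0, 2.5
PIXEL_SIZE = 64                          # decoded image is 64x64


def num_tokens(height: int, width: int, patch: int) -> int:
    """Number of patch tokens the transformer sees for one latent."""
    return (height // patch) * (width // patch)


def head_dim(embed_dim: int, heads: int) -> int:
    """Per-head width; the embedding must split evenly across heads."""
    if embed_dim % heads != 0:
        raise ValueError("embed_dim must be divisible by the number of heads")
    return embed_dim // heads


def sample_timesteps(n: int, a: float = BETA_A, b: float = BETA_B, seed: int = 0) -> list[float]:
    """Draw n continuous timesteps in [0, 1] from Beta(a, b) (assumed sampling scheme)."""
    rng = random.Random(seed)
    return [rng.betavariate(a, b) for _ in range(n)]


tokens = num_tokens(LATENT_H, LATENT_W, PATCH_SIZE)   # 8 * 8 = 64 tokens
per_head = head_dim(EMBED_DIM, NUM_HEADS)             # 384 / 6 = 64
upscale = PIXEL_SIZE // LATENT_H                      # 64 / 8 = 8x spatial upsampling in the decoder

timesteps = sample_timesteps(10_000)
mean_t = sum(timesteps) / len(timesteps)              # analytic mean is a / (a + b) = 1 / 3.5

print(tokens, per_head, upscale, round(mean_t, 3))
```

With a=1.0 and b=2.5 the analytic mean is 1/3.5 ≈ 0.286, so under this assumption sampled timesteps concentrate in the lower part of [0, 1]. Whether low values correspond to low or high noise depends on the training code's convention, which the README does not state.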