channudam committed
Commit f954cd8 · verified · Parent: d444cfa

Update README.md

Files changed (1): README.md (+33 −7)
README.md CHANGED
@@ -4,14 +4,40 @@ language:
  - km
  pipeline_tag: text-to-image
  ---
- 
- ## Model Details
- 
- ### Model Description
- 
- This project focuses on generating images from Khmer text. Inspired by the Stable Diffusion architecture, we enhance a base model called channudam/unet2dcon-khm-35 by integrating key components from Stable Diffusion on top of it. This approach helps improve the quality and control of the generated images and also more flexibility on downstream tasks.
- 
- - **Developed by:** Mr. Channudam Ray
- - **Funded by:** Factory.io
- - **Model type:** StableDiffusion
- - **Language:** Khmer Central
+ ## Model Description
+ 
+ This project explores Khmer text-to-image generation, inspired by the architecture of Stable Diffusion. It builds upon the base model [`channudam/unet2dcon-khm-35`](https://huggingface.co/channudam/unet2dcon-khm-35) by integrating key components from the Stable Diffusion framework. This setup enhances image quality, provides finer control over generation, and offers more flexibility for downstream tasks.
+ 
+ - **Developed by:** Mr. Channudam Ray
+ - **Funded by:** Factory.io
+ - **Model Type:** Stable Diffusion-based
+ - **Language:** Khmer (Central dialect)
+ 
+ ## Fine-Tuning
+ 
+ This is a base model intended to be fine-tuned for specific tasks or datasets. It was trained on images at a resolution of **128×64**, but the resolution can be adjusted during fine-tuning to match your desired output size.
+ 
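When picking an alternative fine-tuning resolution, note that Stable Diffusion-style VAEs typically downsample images spatially by a factor of 8, so candidate resolutions should be divisible by that factor. The factor of 8 is the common `AutoencoderKL` default and an assumption here, not something this README states; a minimal sketch of the check:

```python
# Sketch: verify a candidate fine-tuning resolution is compatible with the
# VAE's spatial downsampling factor (assumed 8, the usual Stable Diffusion
# default -- verify against your AutoencoderKL config).
def latent_size(width: int, height: int, factor: int = 8) -> tuple[int, int]:
    if width % factor or height % factor:
        raise ValueError(f"{width}x{height} is not divisible by {factor}")
    return width // factor, height // factor

print(latent_size(128, 64))  # the base model's training resolution -> (16, 8)
```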
+ For best results, fine-tune all three main components rather than the core UNet alone:
+ 
+ - **Text Encoder** – [`RobertaModel`]
+ - **Variational Autoencoder** – [`AutoencoderKL`]
+ - **Image Generation Model** – [`UNet2DConditionModel`]
+ 
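As a starting point, the three components above can be loaded individually and unfrozen before training. This is only a sketch under assumptions: it presumes the checkpoint uses the usual `diffusers` subfolder layout (`text_encoder`, `vae`, `unet`) at a local path named `stable_diffusion_v1`, neither of which this README confirms.

```python
# Sketch only: load the three components for fine-tuning. The local path
# and the subfolder names are assumptions, not confirmed by this README.
from transformers import RobertaModel
from diffusers import AutoencoderKL, UNet2DConditionModel

text_encoder = RobertaModel.from_pretrained("stable_diffusion_v1", subfolder="text_encoder")
vae = AutoencoderKL.from_pretrained("stable_diffusion_v1", subfolder="vae")
unet = UNet2DConditionModel.from_pretrained("stable_diffusion_v1", subfolder="unet")

# Unfreeze all three so they are updated during fine-tuning,
# rather than training the UNet alone.
for module in (text_encoder, vae, unet):
    module.requires_grad_(True)
    module.train()
```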
+ ## Usage (with GPU)
+ 
+ ```python
+ from diffusers import StableDiffusionPipeline
+ import matplotlib.pyplot as plt
+ import torch
+ 
+ # Load the pipeline in half precision and move it to the GPU.
+ pipe = StableDiffusionPipeline.from_pretrained(
+     "stable_diffusion_v1",
+     torch_dtype=torch.float16,
+ ).to("cuda")
+ 
+ # Generate an image from the Khmer prompt "បាត់ដំបង" ("Battambang").
+ image = pipe("បាត់ដំបង", guidance_scale=2).images[0]
+ plt.imshow(image)
+ plt.axis("off")
+ plt.show()
+ ```
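The `guidance_scale` argument controls classifier-free guidance, the standard Stable Diffusion mechanism that pushes each denoising step toward the text prompt. As a toy illustration (scalar stand-ins, not the pipeline's internals), the guided prediction is the unconditional one plus `scale` times the conditional-minus-unconditional difference:

```python
# Toy illustration of classifier-free guidance: scalars stand in for the
# real noise-prediction tensors combined inside the pipeline each step.
def cfg(uncond: float, cond: float, scale: float) -> float:
    return uncond + scale * (cond - uncond)

print(cfg(0.2, 0.6, 2.0))  # scale 2, as in the usage example -> 1.0
```

Higher scales follow the prompt more strictly at the cost of diversity; a scale of 1 reproduces the conditional prediction unchanged.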