channudam committed
Commit f954cd8 · verified · Parent: d444cfa

Update README.md

Files changed (1): README.md (+33 −7)
README.md CHANGED
@@ -4,14 +4,40 @@ language:
  - km
  pipeline_tag: text-to-image
  ---
- 
- ## Model Details
- 
- ### Model Description
- 
- This project focuses on generating images from Khmer text. Inspired by the Stable Diffusion architecture, we enhance a base model called channudam/unet2dcon-khm-35 by integrating key components from Stable Diffusion on top of it. This approach helps improve the quality and control of the generated images and also more flexibility on downstream tasks.
- 
- - **Developed by:** Mr. Channudam Ray
- - **Funded by:** Factory.io
- - **Model type:** StableDiffusion
- - **Language:** Khmer Central
+ ## Model Description
+ 
+ This project explores Khmer text-to-image generation, inspired by the architecture of Stable Diffusion. It builds upon the base model [`channudam/unet2dcon-khm-35`](https://huggingface.co/channudam/unet2dcon-khm-35) by integrating key components from the Stable Diffusion framework. This setup enhances image quality, provides finer control over generation, and offers more flexibility for downstream tasks.
+ 
+ - **Developed by:** Mr. Channudam Ray
+ - **Funded by:** Factory.io
+ - **Model Type:** Stable Diffusion-based
+ - **Language:** Khmer (Central dialect)
+ 
+ ## Fine-Tuning
+ 
+ This is a base model intended to be fine-tuned for specific tasks or datasets. It was trained on images at a resolution of **128×64**, but the resolution can be adjusted during fine-tuning to match your desired output size.
+ 
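When picking an alternative fine-tuning resolution, note that Stable Diffusion-style VAEs typically downsample images spatially by a factor of 8, so candidate resolutions should be divisible by that factor. The factor of 8 is the common `AutoencoderKL` default and an assumption here, not something this README states; a minimal sketch of the check:

```python
# Sketch: verify a candidate fine-tuning resolution is compatible with the
# VAE's spatial downsampling factor (assumed 8, the usual Stable Diffusion
# default -- verify against your AutoencoderKL config).
def latent_size(width: int, height: int, factor: int = 8) -> tuple[int, int]:
    if width % factor or height % factor:
        raise ValueError(f"{width}x{height} is not divisible by {factor}")
    return width // factor, height // factor

print(latent_size(128, 64))  # the base model's training resolution -> (16, 8)
```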
+ For best results, fine-tune all three main components rather than the core UNet alone:
+ 
+ - **Text Encoder** – [`RobertaModel`]
+ - **Variational Autoencoder** – [`AutoencoderKL`]
+ - **Image Generation Model** – [`UNet2DConditionModel`]
+ 
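As a starting point, the three components above can be loaded individually and unfrozen before training. This is only a sketch under assumptions: it presumes the checkpoint uses the usual `diffusers` subfolder layout (`text_encoder`, `vae`, `unet`) at a local path named `stable_diffusion_v1`, neither of which this README confirms.

```python
# Sketch only: load the three components for fine-tuning. The local path
# and the subfolder names are assumptions, not confirmed by this README.
from transformers import RobertaModel
from diffusers import AutoencoderKL, UNet2DConditionModel

text_encoder = RobertaModel.from_pretrained("stable_diffusion_v1", subfolder="text_encoder")
vae = AutoencoderKL.from_pretrained("stable_diffusion_v1", subfolder="vae")
unet = UNet2DConditionModel.from_pretrained("stable_diffusion_v1", subfolder="unet")

# Unfreeze all three so they are updated during fine-tuning,
# rather than training the UNet alone.
for module in (text_encoder, vae, unet):
    module.requires_grad_(True)
    module.train()
```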
+ ## Usage (with GPU)
+ 
+ ```python
+ from diffusers import StableDiffusionPipeline
+ import matplotlib.pyplot as plt
+ import torch
+ 
+ # Load the pipeline in half precision and move it to the GPU.
+ pipe = StableDiffusionPipeline.from_pretrained(
+     "stable_diffusion_v1",
+     torch_dtype=torch.float16,
+ ).to("cuda")
+ 
+ # Generate an image from the Khmer prompt "បាត់ដំបង" ("Battambang").
+ image = pipe("បាត់ដំបង", guidance_scale=2).images[0]
+ plt.imshow(image)
+ plt.axis("off")
+ plt.show()
+ ```
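The `guidance_scale` argument controls classifier-free guidance, the standard Stable Diffusion mechanism that pushes each denoising step toward the text prompt. As a toy illustration (scalar stand-ins, not the pipeline's internals), the guided prediction is the unconditional one plus `scale` times the conditional-minus-unconditional difference:

```python
# Toy illustration of classifier-free guidance: scalars stand in for the
# real noise-prediction tensors combined inside the pipeline each step.
def cfg(uncond: float, cond: float, scale: float) -> float:
    return uncond + scale * (cond - uncond)

print(cfg(0.2, 0.6, 2.0))  # scale 2, as in the usage example -> 1.0
```

Higher scales follow the prompt more strictly at the cost of diversity; a scale of 1 reproduces the conditional prediction unchanged.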