---
license: apache-2.0
language:
- en
pipeline_tag: text-to-image
library_name: diffusers
tags:
- diffusion
- text-to-image
- photoroom
- prx
- open-source
- image-generation
- flow-matching
base_model:
- Photoroom/prx-1024-t2i-beta
inference: true
widget:
- text: "A woman wearing a red baseball cap, an eye-catching top hat, and has a striking pink bust. She is playfully posing with one finger under her chin, suggesting a casual and relaxed atmosphere. She appears to be sitting or standing in a dimly lit environment"
  output:
    url: >-
      ./content (1).png
---

![content (1)](https://cdn-uploads.huggingface.co/production/uploads/632fa63d2636f057d5896af3/vErUVwioThtqU-gnJ06bo.jpeg)

**Kreamy** is a lightweight text-to-image model derived from Photoroom/prx-1024-t2i-beta. It is designed to be compact and fast, prioritizing practical usability and efficient inference. With only 1.3 billion parameters, it suits users who need a smaller, faster alternative to larger diffusion models, although outputs may occasionally deviate from the prompt.

Kreamy was trained on a limited amount of curated data. Although the dataset is relatively small, the model meets the author’s practical needs and provides usable results for real-world generation scenarios.

The model supports direct anime-style image generation (SFW/NSFW) from short, concise prompts, making it convenient for users who prefer simple prompt workflows.

Feedback and encouragement are highly appreciated and will help guide further improvements and future development of this model.

### Model description ###

**Kreamy** is designed to be:

- Lightweight and fast for inference
- Practical for everyday usage
- Easy to experiment with and extend
- Optimized for short and concise prompts
- Capable of anime-style image generation (SFW/NSFW)

It inherits core characteristics from the PRX family while being adapted and fine-tuned for the author’s specific use cases and preferences.

## Model details ##

This checkpoint is based on Photoroom/prx-1024-t2i-beta with additional fine-tuning:

- Base model: Photoroom/prx-1024-t2i-beta
- Resolution: W=896, H=1152 pixels
- Architecture: PRX (MMDiT-like diffusion transformer variant)
- Latent backbone: Flux VAE
- Text encoder: T5-Gemma-2B-2B-UL2
- Training stage: Fine-tuning on a limited custom dataset
- Parameters: ~1.3B
- License: Inherited from base model (Apache 2.0)

```python
import torch
from diffusers import PRXPipeline

# Load the checkpoint in bfloat16 and move it to the GPU.
pipe = PRXPipeline.from_pretrained(
    "kpsss34/Kreamy",
    torch_dtype=torch.bfloat16,
).to("cuda")

prompt = "A woman wearing a red baseball cap, an eye-catching top hat, and has a striking pink bust. She is playfully posing with one finger under her chin, suggesting a casual and relaxed atmosphere. She appears to be sitting or standing in a dimly lit environment"
negative_prompt = "blurry, low quality, low resolution, out of focus, distorted face, extra limbs, extra fingers, missing fingers, mutated hands, deformed, disfigured, asymmetrical face, bad anatomy, unnatural body, unnatural pose, duplicate person, extra arms, extra legs, broken limbs, incorrect proportions, bad perspective, warped body, glitch, grainy, noisy, pixelated, watermark"

# Generate at the resolution listed in the model details (W=896, H=1152).
image = pipe(
    prompt,
    negative_prompt=negative_prompt,
    num_inference_steps=30,
    width=896,
    height=1152,
    guidance_scale=7.0,
).images[0]
image.save("sample.png")
```
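
The example above hard-codes its generation settings. As a minimal sketch that is not part of the original card, the helper below gathers those same values in one dictionary so individual settings can be overridden per call. The names `CARD_DEFAULTS` and `build_generation_kwargs` are hypothetical, and the defaults are simply the values from the example, not tuned recommendations.

```python
# Hypothetical helper: collects the settings used in the example above.
CARD_DEFAULTS = {
    "num_inference_steps": 30,
    "width": 896,
    "height": 1152,
    "guidance_scale": 7.0,
}


def build_generation_kwargs(prompt, negative_prompt=None, **overrides):
    """Return keyword arguments for a pipeline call, starting from the example's values."""
    # Reject keys that the example call does not use, to catch typos early.
    unknown = set(overrides) - set(CARD_DEFAULTS)
    if unknown:
        raise ValueError(f"Unsupported override(s): {sorted(unknown)}")

    # Later entries win, so overrides replace the defaults.
    kwargs = {"prompt": prompt, **CARD_DEFAULTS, **overrides}
    if negative_prompt is not None:
        kwargs["negative_prompt"] = negative_prompt
    return kwargs
```

With `pipe` loaded as in the example, `pipe(**build_generation_kwargs("a girl with silver hair, anime style", num_inference_steps=20)).images[0]` would run a shorter generation at the same resolution.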