GLM-IMAGE-PRO

fantos committed on Jan 19

Commit ac28453 (verified)

1 Parent(s): f0ba3c7

Update README.md

Files changed (1)

README.md +19 -135

@@ -1,153 +1,37 @@
 ---
-title: FLUX Fast & Furious
-emoji: 🖼🏆
 colorFrom: purple
 colorTo: red
 sdk: gradio
-sdk_version: 5.35.0
 app_file: app.py
 pinned: false
 license: openrail++
 short_description: 'FLUX 8 Step Fast & High Quality Mode'
 ---
-I'll create comprehensive documentation for this FLUX Fast & Furious image generation code in both English and Korean.
-## English Documentation
-### FLUX: Fast & Furious - Hyper-Speed Image Generation
-This application implements an accelerated version of the FLUX.1-dev image generation model, optimized by ByteDance's AutoML team using their Hyper-SD technology to achieve high-quality image generation in just 8 steps instead of the typical 20-50 steps.
-#### Key Features
-1. **Hyper-Speed Generation**
-   - Utilizes Hyper-SD LoRA (Low-Rank Adaptation) technology from ByteDance
-   - Reduces inference steps from 20-50 to just 6-25 steps (default: 8)
-   - Maintains high image quality while dramatically reducing generation time
-   - Optimized for CUDA with TF32 precision enabled for maximum performance
-2. **Neon-Themed User Interface**
-   - Custom cyberpunk-inspired design with glowing neon effects
-   - Animated hover effects and dynamic visual feedback
-   - Dark theme with blue, cyan, and magenta color accents
-   - Responsive layout optimized for both desktop and mobile devices
-3. **User-Friendly Features**
-   - **Example Prompts**: Five pre-written creative prompts covering various genres:
-     - Cyberpunk cityscapes
-     - Fantasy fairy scenes
-     - Epic dragon imagery
-     - Sci-fi space stations
-     - Underwater ancient cities
-   - **Click-to-Use Examples**: Simply click any example to instantly populate the prompt field
-   - **Advanced Settings**: Collapsible panel for fine-tuning generation parameters
-4. **Customizable Generation Parameters**
-   - **Image Dimensions**: Adjustable width and height (256-1152 pixels)
-   - **Inference Steps**: Control speed vs. quality trade-off (6-25 steps)
-   - **Guidance Scale**: Adjust prompt adherence (0.0-5.0)
-   - **Seed Control**: Reproducible results with manual seed input
-#### Technical Implementation
-The application leverages cutting-edge technologies:
-- **FLUX.1-dev**: State-of-the-art diffusion model from Black Forest Labs
-- **Hyper-SD LoRA**: ByteDance's acceleration technology achieving 5-10x speedup
-- **BFloat16 Precision**: Reduced memory usage while maintaining quality
-- **Gradio Spaces**: GPU-accelerated deployment with automatic resource management
-- **Custom CSS**: Neon-themed styling with glow effects and animations
-The generation pipeline:
-1. Loads the base FLUX.1-dev model in bfloat16 precision
-2. Applies Hyper-SD LoRA weights with 0.125 scaling factor
-3. Fuses LoRA weights for optimal performance
-4. Generates images using accelerated inference with custom parameters
-5. Outputs high-quality 1024x1024 images (default) in seconds
-#### Performance Optimization
-- **GPU Acceleration**: Automatic CUDA optimization with @spaces.GPU decorator
-- **Memory Efficiency**: BFloat16 precision reduces VRAM usage by 50%
-- **Inference Mode**: Torch inference mode and autocast for maximum speed
-- **TF32 Support**: Enabled for compatible GPUs for additional speedup
-- **Cached Models**: Local model caching to reduce loading times
-#### Use Cases
-Perfect for:
-- Rapid prototyping of visual concepts
-- Creative exploration with instant feedback
-- Production of high-quality images for various projects
-- Testing different artistic styles and compositions
-- Educational purposes to understand AI image generation
----
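-## Korean Documentation
-### FLUX: Fast & Furious - Ultra-Fast Image Generator
-This application is an accelerated version of the FLUX.1-dev image generation model that uses Hyper-SD technology developed by ByteDance's AutoML team, generating high-quality images in just 8 steps instead of the 20-50 steps previously required.
-#### Key Features
-1. **Ultra-Fast Generation**
-   - Uses ByteDance's Hyper-SD LoRA (Low-Rank Adaptation) technology
-   - Cuts inference steps from 20-50 down to 6-25 (default: 8)
-   - Maintains high image quality while drastically shortening generation time
-   - CUDA optimization with TF32 precision enabled for maximum performance
-2. **Neon-Themed User Interface**
-   - Custom cyberpunk-style design with glowing neon effects
-   - Animated hover effects and dynamic visual feedback
-   - Dark theme with blue, cyan, and magenta color accents
-   - Responsive layout optimized for both desktop and mobile devices
-3. **User-Friendly Features**
-   - **Example Prompts**: Five creative prompts covering a variety of genres:
-     - Cyberpunk cityscapes
-     - Fantasy fairy scenes
-     - Majestic dragon imagery
-     - Sci-fi space stations
-     - Underwater ancient cities
-   - **Click to Use**: Clicking an example immediately fills the prompt field
-   - **Advanced Settings**: Collapsible panel for fine-tuning generation parameters
-4. **Customizable Generation Parameters**
-   - **Image Dimensions**: Adjustable width and height (256-1152 pixels)
-   - **Inference Steps**: Control the speed vs. quality balance (6-25 steps)
-   - **Guidance Scale**: Adjust prompt adherence (0.0-5.0)
-   - **Seed Control**: Reproducible results with manual seed input
-#### Technical Implementation
-The application uses cutting-edge technologies:
-- **FLUX.1-dev**: The latest diffusion model from Black Forest Labs
-- **Hyper-SD LoRA**: ByteDance's acceleration technology achieving a 5-10x speedup
-- **BFloat16 Precision**: Reduced memory usage while maintaining quality
-- **Gradio Spaces**: GPU-accelerated deployment with automatic resource management
-- **Custom CSS**: Neon-themed styling with glow effects and animations
-The generation pipeline:
-1. Loads the base FLUX.1-dev model in bfloat16 precision
-2. Applies Hyper-SD LoRA weights with a 0.125 scaling factor
-3. Fuses the LoRA weights for optimal performance
-4. Generates images using accelerated inference with user-defined parameters
-5. Outputs high-quality 1024x1024 images (default) in seconds
-#### Performance Optimization
-- **GPU Acceleration**: Automatic CUDA optimization with the @spaces.GPU decorator
-- **Memory Efficiency**: BFloat16 precision reduces VRAM usage by 50%
-- **Inference Mode**: Torch inference mode and autocast for maximum speed
-- **TF32 Support**: Enabled on compatible GPUs for additional speedup
-- **Cached Models**: Local model caching to reduce loading times
-#### Use Cases
-Suitable for the following uses:
-- Rapid prototyping of visual concepts
-- Creative exploration with instant feedback
-- Production of high-quality images for various projects
-- Testing different artistic styles and compositions
-- Educational purposes to understand AI image generation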

 ---
+title: GLM Image
+emoji: 🏆
 colorFrom: purple
 colorTo: red
 sdk: gradio
+sdk_version: 6.3.0
 app_file: app.py
 pinned: false
 license: openrail++
 short_description: 'FLUX 8 Step Fast & High Quality Mode'
 ---
+## Introduction
+GLM-Image is an image generation model that adopts a hybrid autoregressive + diffusion decoder architecture. In general image generation quality, GLM-Image is on par with mainstream latent diffusion approaches, but it shows significant advantages in text rendering and knowledge-intensive generation scenarios. It performs especially well on tasks that require precise semantic understanding and the expression of complex information, while maintaining strong high-fidelity, fine-grained detail generation. In addition to text-to-image generation, GLM-Image supports a rich set of image-to-image tasks, including image editing, style transfer, identity-preserving generation, and multi-subject consistency.
+- Model architecture: a hybrid autoregressive + diffusion decoder design.
+  - Autoregressive generator: a 9B-parameter model initialized from GLM-4-9B-0414, with an expanded vocabulary to incorporate visual tokens. The model first generates a compact encoding of approximately 256 tokens, then expands to 1K–4K tokens, corresponding to 1K–2K high-resolution image outputs.
+  - Diffusion decoder: a 7B-parameter decoder based on a single-stream DiT architecture for latent-space image decoding. It is equipped with a Glyph Encoder text module, significantly improving the accuracy of text rendering within images.
+[Figure: architecture_2]
+- Post-training with decoupled reinforcement learning: the model introduces a fine-grained, modular feedback strategy using the GRPO algorithm, substantially enhancing both semantic understanding and visual detail quality.
+  - Autoregressive module: provides low-frequency feedback signals focused on aesthetics and semantic alignment, improving instruction following and artistic expressiveness.
+  - Decoder module: delivers high-frequency feedback targeting detail fidelity and text accuracy, resulting in highly realistic textures and more precise text rendering.
+- GLM-Image supports both text-to-image and image-to-image generation within a single model.
+  - Text-to-image: generates high-detail images from textual descriptions, with particularly strong performance in information-dense scenarios.
+  - Image-to-image: supports a wide range of tasks, including image editing, style transfer, multi-subject consistency, and identity-preserving generation for people and objects.
+## License
+The overall GLM-Image model is released under the MIT License.
+This project incorporates the VQ tokenizer weights and ViT weights from X-Omni/X-Omni-En, which are licensed under the Apache License, Version 2.0.
+The VQ tokenizer and ViT weights remain subject to the original Apache-2.0 terms. Users should comply with the respective licenses when using these components.
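
Note on the token counts quoted for the autoregressive generator in the new README: the figures (about 256 tokens for the compact encoding, then 1K–4K tokens for 1K–2K outputs) are consistent with a square token grid in which each visual token covers a 32×32 pixel area. The README does not state this factor; `px_per_token = 32` and the helper name `token_grid` below are assumptions chosen only because they reproduce the quoted numbers. A minimal sketch of the arithmetic:

```python
def token_grid(side_px: int, px_per_token: int = 32) -> tuple[int, int]:
    """Return (tokens_per_side, total_tokens) for a square image.

    px_per_token is a hypothetical downsampling factor; it is not
    stated in the README and is used here purely for illustration.
    """
    if side_px <= 0 or px_per_token <= 0:
        raise ValueError("side_px and px_per_token must be positive")
    if side_px % px_per_token != 0:
        raise ValueError("side_px must be a multiple of px_per_token")
    per_side = side_px // px_per_token
    return per_side, per_side * per_side


if __name__ == "__main__":
    # 512 px -> 16 x 16 = 256 tokens, 1024 px -> 32 x 32 = 1024 tokens,
    # 2048 px -> 64 x 64 = 4096 tokens under the assumed factor of 32.
    for side in (512, 1024, 2048):
        per_side, total = token_grid(side)
        print(f"{side}px: {per_side} x {per_side} = {total} tokens")
```

Under that assumption, a 1024-pixel image maps to 1,024 tokens and a 2048-pixel image to 4,096 tokens, matching the 1K–4K range, while a 16×16 grid gives the 256-token compact encoding. Whether the model actually uses this factor should be checked against the upstream GLM-Image documentation.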