How to use Felldude/ERNIE-Image with Diffusers:

```shell
pip install -U diffusers transformers accelerate
```

```python
import torch
from diffusers import DiffusionPipeline

# switch to "mps" for apple devices
pipe = DiffusionPipeline.from_pretrained("Felldude/ERNIE-Image", dtype=torch.bfloat16, device_map="cuda")

prompt = "Astronaut in a jungle, cold color palette, muted colors, detailed, 8k"
image = pipe(prompt).images[0]
```
---
license: apache-2.0
language:
- en
tags:
- mistral
- fp32
- adamw
- transformer
- monte-carlo
- dit
- ernie
pipeline_tag: text-to-image
---
# **Model Card**

# **Overview**

This repository documents two separate training methodologies and precision strategies: native FP32 training for the Mistral LLM, and Monte Carlo estimation of FP32 behavior for the DIT Ernie model.

---
# **Mistral LLM Training**

- **Fully trained in native FP32 precision**
- Optimization performed using standard **AdamW**
- **No Adam8bit**, quantized optimizer states, or reduced-precision optimizer approximations were used during training
- Intended to preserve **numerical stability** and **high-fidelity gradient accumulation** throughout all training phases

---

# **DIT Ernie Model**

- Uses a **Monte Carlo estimation** approach to approximate **FP32 behavior**

---
# **Training Details**

# **Mistral LLM**

## **Precision**

- **Full FP32 training**
- **FP32 activations**
- **FP32 optimizer states**
- **FP32 gradients**

## **Optimizer**

- **AdamW**
- Weight decay enabled
- **No 8-bit optimizer compression**
- **No low-rank optimizer approximation**
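For readers unfamiliar with the optimizer named above, the AdamW update rule can be sketched in a few lines. This is a minimal pure-Python illustration of the published algorithm (decoupled weight decay plus bias-corrected moment estimates), not the training code used for this model. The function name `adamw_step`, the hyperparameter values, and the toy objective are all assumptions made for illustration, and Python floats are 64-bit rather than FP32.

```python
import math

def adamw_step(params, grads, m, v, step, lr=1e-3, betas=(0.9, 0.999),
               eps=1e-8, weight_decay=0.01):
    """Apply one AdamW update in place (illustrative helper, not a library API)."""
    beta1, beta2 = betas
    for i, g in enumerate(grads):
        # Decoupled weight decay: shrink the weight directly, not via the gradient.
        params[i] *= 1.0 - lr * weight_decay
        # The two per-parameter optimizer states: moving averages of g and g^2.
        m[i] = beta1 * m[i] + (1.0 - beta1) * g
        v[i] = beta2 * v[i] + (1.0 - beta2) * g * g
        # Bias correction compensates for the zero-initialised moments.
        m_hat = m[i] / (1.0 - beta1 ** step)
        v_hat = v[i] / (1.0 - beta2 ** step)
        params[i] -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return params

# Toy objective (assumption): minimise f(w) = w^2 from w = 1.0; the gradient is 2w.
w, m, v = [1.0], [0.0], [0.0]
for t in range(1, 501):
    adamw_step(w, [2.0 * w[0]], m, v, step=t, lr=1e-2)
print(w[0])
```

The two state lists `m` and `v` are what an 8-bit optimizer would compress; keeping them at full precision is the choice this section describes.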
## **Notes**

The Mistral configuration prioritizes:

- **numerical consistency**
- **deterministic convergence behavior**
- **stable long-context optimization**
- **reduced quantization-induced gradient noise**

This setup is computationally expensive but provides **high-fidelity optimization dynamics** during pretraining and finetuning.
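To give a rough sense of that cost, the memory held by weights, gradients, and AdamW states can be estimated with simple arithmetic: FP32 uses 4 bytes per value, and AdamW keeps two moment buffers per parameter. The sketch below is a back-of-the-envelope estimate only. The parameter count is a hypothetical placeholder (the model size is not stated here), and activations, buffers, and the small metadata overhead of quantized optimizers are ignored.

```python
BYTES_FP32 = 4

def adamw_training_bytes(num_params, state_bytes=BYTES_FP32):
    """Bytes for weights + gradients + the two AdamW moment buffers."""
    weights = num_params * BYTES_FP32
    grads = num_params * BYTES_FP32
    states = 2 * num_params * state_bytes  # first and second moments
    return weights + grads + states

num_params = 7_000_000_000  # hypothetical model size, for illustration only

full_fp32 = adamw_training_bytes(num_params)               # 16 bytes per parameter
eight_bit = adamw_training_bytes(num_params, state_bytes=1)  # 10 bytes per parameter

print(f"{full_fp32 / 1e9:.0f} GB vs {eight_bit / 1e9:.0f} GB")  # 112 GB vs 70 GB
```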
---

# **DIT Ernie**

## **Precision Strategy**

The DIT Ernie architecture utilizes:

- **Monte Carlo estimation techniques**
- **probabilistic FP32 approximation**
- **stochastic numerical reconstruction**

Rather than maintaining strict FP32 execution across the entire training stack, the model estimates FP32-equivalent statistical behavior through sampling-based computation.
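The exact sampling scheme is not specified here. One well-known technique that fits this description is stochastic rounding, where a value is rounded up or down at random so that its expected value equals the original FP32 number. The sketch below assumes bfloat16 as the low-precision format and uses hypothetical helper names; it illustrates the general idea and should not be read as the method actually used for this model.

```python
import random
import struct

def float32_bits(x):
    """Return the IEEE 754 binary32 bit pattern of x as an unsigned int."""
    return struct.unpack("<I", struct.pack("<f", x))[0]

def bits_to_float32(bits):
    """Inverse of float32_bits."""
    return struct.unpack("<f", struct.pack("<I", bits))[0]

def stochastic_round_bf16(x, rng):
    """Round a finite FP32 value to bfloat16 precision, unbiased in expectation."""
    bits = float32_bits(x)
    # bfloat16 keeps the top 16 bits. Adding uniform noise to the 16 discarded
    # bits before truncating rounds away from zero with probability equal to
    # the discarded fraction.
    noisy = (bits + rng.getrandbits(16)) & 0xFFFF0000
    return bits_to_float32(noisy)

def monte_carlo_mean(x, n_samples, seed=0):
    """Average many stochastic roundings to estimate the FP32 value."""
    rng = random.Random(seed)
    total = sum(stochastic_round_bf16(x, rng) for _ in range(n_samples))
    return total / n_samples

x = 1.001
print(bits_to_float32(float32_bits(x) & 0xFFFF0000))  # plain truncation gives 1.0
print(monte_carlo_mean(x, 20_000))                    # sample mean, close to x
```

Each individual sample is only bfloat16-accurate, but the average over samples converges toward the FP32 value, which is the sense in which a sampling-based computation can approximate FP32 statistics.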
## **Goals**

- reduce memory bandwidth requirements
- improve throughput efficiency
- retain approximate FP32 convergence characteristics
- balance numerical quality with hardware scalability

## **Notes**

This methodology may introduce:

- **stochastic variance between runs**
- **approximation noise**
- **non-deterministic optimization characteristics**

However, it can significantly reduce training cost relative to native FP32 execution.
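The trade-off described in these notes can be illustrated with a toy experiment. The sketch below accumulates many small updates into a weight stored on a coarse grid, comparing round-to-nearest with stochastic rounding across several seeds. Everything in it is an assumption made for illustration: the grid spacing of 2^-7 (the bfloat16 spacing for values between 1 and 2), the update size, the step count, and the helper names. It is not a reproduction of the author's training setup.

```python
import math
import random

STEP = 2.0 ** -7  # spacing of bfloat16 values in [1, 2); assumed storage format

def round_nearest(x):
    """Deterministic rounding to the storage grid."""
    return round(x / STEP) * STEP

def round_stochastic(x, rng):
    """Round up with probability equal to the fractional position on the grid."""
    return math.floor(x / STEP + rng.random()) * STEP

def train(n_steps, update, seed=None):
    """Add `update` to a low-precision weight n_steps times (toy training loop)."""
    rng = random.Random(seed) if seed is not None else None
    w = 1.0
    for _ in range(n_steps):
        if rng is not None:
            w = round_stochastic(w + update, rng)
        else:
            w = round_nearest(w + update)
    return w

reference = 1.0 + 1000 * 1e-4                             # high-precision result
nearest = train(1000, 1e-4)                               # every update is rounded away
runs = [train(1000, 1e-4, seed=s) for s in range(20)]     # one result per seed

print("reference:", reference)
print("round-to-nearest:", nearest)
print("stochastic, min/mean/max:", min(runs), sum(runs) / len(runs), max(runs))
```

In this toy setting round-to-nearest never moves the weight because each update is smaller than half the grid spacing, while the stochastic runs track the reference on average but differ from seed to seed.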
---

# **Intended Use**

This repository is intended for:

- research documentation
- training methodology comparison
- optimizer precision analysis
- numerical stability benchmarking
- transformer architecture experimentation

---

# **Limitations**

Results can vary depending on:

- sampling strategy
- hardware backend
- distributed training topology
- random seed initialization

---

# **License**

**Apache License 2.0**