---
license: apache-2.0
language:
- en
tags:
- mistral
- fp32
- adamw
- transformer
- monte-carlo
- dit
- ernie
pipeline_tag: text-to-image
---
# **Model Card**
# **Overview**
This repository documents two separate training methodologies and precision strategies: native FP32 training for the Mistral LLM, and a Monte Carlo approximation of FP32 behavior for the DIT Ernie model.

---
# **Mistral LLM Training**
- **Fully trained in native FP32 precision**
- Optimization performed using standard **AdamW**
- **No Adam8bit**, quantized optimizer states, or reduced-precision optimizer approximations were used during training
- Intended to preserve **numerical stability** and **high-fidelity gradient accumulation** throughout all training phases
---
# **DIT Ernie Model**
- Uses a **Monte Carlo estimation** approach to approximate **FP32 behavior**
---
# **Training Details**
# **Mistral LLM**
## **Precision**
- **Full FP32 training**
- **FP32 activations**
- **FP32 optimizer states**
- **FP32 gradients**
## **Optimizer**
- **AdamW**
- Weight decay enabled
- **No 8-bit optimizer compression**
- **No low-rank optimizer approximation**
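The optimizer settings above correspond to the standard AdamW update: decoupled weight decay plus two full-precision moment buffers per parameter. As a minimal sketch of that rule (not the training code used for this model), the step below is written in plain Python. The function name `adamw_step` and its default hyperparameters are illustrative assumptions, and Python floats are double precision, so the block illustrates the update rule rather than FP32 arithmetic itself.

```python
import math


def adamw_step(params, grads, m, v, step,
               lr=1e-3, betas=(0.9, 0.999), eps=1e-8, weight_decay=1e-2):
    """Apply one in-place AdamW update to a list of scalar parameters.

    `m` and `v` are the first- and second-moment buffers (same length as
    `params`), kept at full precision with no quantization or low-rank
    compression. `step` is the 1-based update count.
    """
    beta1, beta2 = betas
    for i, g in enumerate(grads):
        # Decoupled weight decay: shrink the weight directly, not via the gradient.
        params[i] -= lr * weight_decay * params[i]

        # Exponential moving averages of the gradient and squared gradient.
        m[i] = beta1 * m[i] + (1.0 - beta1) * g
        v[i] = beta2 * v[i] + (1.0 - beta2) * g * g

        # Bias correction for the zero-initialized moment buffers.
        m_hat = m[i] / (1.0 - beta1 ** step)
        v_hat = v[i] / (1.0 - beta2 ** step)

        params[i] -= lr * m_hat / (math.sqrt(v_hat) + eps)


# Usage: minimize f(w) = (w - 3)^2 starting from w = 0 (hypothetical toy problem).
w, m, v = [0.0], [0.0], [0.0]
for t in range(1, 201):
    grad = [2.0 * (w[0] - 3.0)]
    adamw_step(w, grad, m, v, t, lr=0.1, weight_decay=0.0)
print(w[0])
```

An 8-bit optimizer would store `m` and `v` in a quantized form and dequantize them on each step; in the configuration described here both buffers stay in FP32.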
## **Notes**
The Mistral configuration prioritizes:
- **numerical consistency**
- **deterministic convergence behavior**
- **stable long-context optimization**
- **reduced quantization-induced gradient noise**
This setup is computationally expensive, because weights, gradients, activations, and optimizer states are all held in FP32, but it provides **high-fidelity optimization dynamics** during pretraining and fine-tuning.

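To give a rough sense of that cost, the sketch below counts the bytes needed for weights, gradients, and the two AdamW moment buffers when everything is stored in FP32 (4 bytes per value). The helper name `training_state_bytes` and the 7-billion-parameter figure are hypothetical and chosen only for illustration; activations and framework overhead are not counted.

```python
FP32_BYTES = 4  # size of one 32-bit float


def training_state_bytes(num_params, optimizer_state_bytes=FP32_BYTES):
    """Estimate persistent training memory for an AdamW-trained model.

    Weights and gradients are assumed to be FP32. `optimizer_state_bytes`
    is the per-value size of each of the two AdamW moment buffers
    (4 for FP32, 1 for a hypothetical 8-bit optimizer, ignoring the small
    overhead of quantization scales).
    """
    weights = num_params * FP32_BYTES
    gradients = num_params * FP32_BYTES
    adam_moments = 2 * num_params * optimizer_state_bytes
    return {
        "weights": weights,
        "gradients": gradients,
        "adam_moments": adam_moments,
        "total": weights + gradients + adam_moments,
    }


# Hypothetical 7B-parameter model: 16 bytes/param in full FP32,
# versus 10 bytes/param if the moment buffers were stored in 8 bits.
n = 7_000_000_000
full_fp32 = training_state_bytes(n)
eight_bit = training_state_bytes(n, optimizer_state_bytes=1)
print(f"FP32 AdamW states:  {full_fp32['total'] / 1e9:.0f} GB")
print(f"8-bit AdamW states: {eight_bit['total'] / 1e9:.0f} GB")
```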
---
# **DIT Ernie**
## **Precision Strategy**
The DIT Ernie architecture utilizes:
- **Monte Carlo estimation techniques**
- **probabilistic FP32 approximation**
- **stochastic numerical reconstruction**
Rather than maintaining strict FP32 execution across the entire training stack, the model estimates FP32-equivalent statistical behavior through sampling-based computation.
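The card does not say which sampling scheme is used. One widely used technique in this family is stochastic rounding, where a value is rounded up or down at random with probabilities chosen so that the result is unbiased. The sketch below assumes BF16 as the low-precision format and stochastic rounding as the estimator; both are assumptions made for illustration, and the function names are hypothetical.

```python
import random
import struct


def f32_to_bits(x):
    """Return the IEEE-754 binary32 bit pattern of x as an unsigned int."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def bits_to_f32(bits):
    """Interpret an unsigned 32-bit pattern as a binary32 value."""
    return struct.unpack("<f", struct.pack("<I", bits & 0xFFFFFFFF))[0]


def stochastic_round_bf16(x, rng):
    """Round an FP32 value to BF16 precision using stochastic rounding.

    BF16 keeps the upper 16 bits of the FP32 pattern. Adding a uniform
    16-bit integer before truncating carries into the kept bits with
    probability proportional to the discarded fraction, so the expected
    result equals x (for finite values away from the overflow threshold).
    """
    bits = f32_to_bits(x) + rng.getrandbits(16)
    return bits_to_f32(bits & 0xFFFF0000)


# Usage: each sample is one of the two BF16 neighbours of x,
# while the Monte Carlo average recovers x to well below one BF16 step.
rng = random.Random(0)
x = 1.001
samples = [stochastic_round_bf16(x, rng) for _ in range(20_000)]
print(sorted(set(samples)))
print(sum(samples) / len(samples))
```

Any single rounded value carries BF16-sized error, but the average over many draws tracks the FP32 value, which is the sense in which sampling can approximate FP32 behavior.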
## **Goals**
- reduce memory bandwidth requirements
- improve throughput efficiency
- retain approximate FP32 convergence characteristics
- balance numerical quality with hardware scalability
## **Notes**
This methodology may introduce:
- **stochastic variance between runs**
- **approximation noise**
- **non-deterministic optimization characteristics**
However, it can significantly reduce training cost relative to native FP32 execution.
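The trade-off above can be seen in a toy accumulation experiment. This is a sketch under the same assumptions as before (BF16 storage and stochastic rounding, neither of which the card confirms), with hypothetical function names. It adds many small updates to a weight stored at BF16 precision: round-to-nearest discards every update, while stochastic rounding follows the FP32 result on average but gives a seed-dependent answer.

```python
import random
import struct


def _f32_to_bits(x):
    return struct.unpack("<I", struct.pack("<f", x))[0]


def _bits_to_f32(bits):
    return struct.unpack("<f", struct.pack("<I", bits & 0xFFFFFFFF))[0]


def round_nearest_bf16(x):
    """Deterministic round-to-nearest-even from FP32 to BF16 precision."""
    bits = _f32_to_bits(x)
    bits += 0x7FFF + ((bits >> 16) & 1)
    return _bits_to_f32(bits & 0xFFFF0000)


def round_stochastic_bf16(x, rng):
    """Unbiased stochastic rounding from FP32 to BF16 precision."""
    bits = _f32_to_bits(x) + rng.getrandbits(16)
    return _bits_to_f32(bits & 0xFFFF0000)


def accumulate(update, steps, rounder, start=1.0):
    """Repeatedly add `update` to a weight that is re-rounded after each step."""
    w = start
    for _ in range(steps):
        w = rounder(w + update)
    return w


def run_stochastic(seed, update=1e-4, steps=1000):
    rng = random.Random(seed)
    return accumulate(update, steps, lambda x: round_stochastic_bf16(x, rng))


# 1000 updates of 1e-4 should move the weight from 1.0 to about 1.1.
nearest = accumulate(1e-4, 1000, round_nearest_bf16)
print("round-to-nearest:", nearest)  # each update is below half a BF16 step
for seed in (0, 1, 2):
    print(f"stochastic, seed {seed}:", run_stochastic(seed))
```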
---
# **Intended Use**
This repository is intended for:
- research documentation
- training methodology comparison
- optimizer precision analysis
- numerical stability benchmarking
- transformer architecture experimentation
---
# **Limitations**
Results can vary depending on:
- sampling strategy
- hardware backend
- distributed training topology
- random seed initialization
---
# **License**
**Apache License 2.0**