# BwETAF-IID-400M Model Card

**Boring's Experimental Transformer for Autoregression (Flax)**

A 378M-parameter autoregressive transformer built with a custom training pipeline and questionable life choices.
Trained on determination, fueled by suffering, powered by free TPUs.
## Model Overview
- Name: BwETAF-IID-400M
- Parameters: 378,769,408
- Tokens seen: 6,200,754,176
- Training time: 63,883.53 s (≈17.7 hours)
- Framework: Flax + JAX
- Context window: 512 tokens
- Tokenizer: GPT-2 BPE (vocab size 50,257)
- Positional encoding: sinusoidal (sin/cos)
- Activation function: SwiGLU
- Final validation loss: ~3.4
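The overview mentions sinusoidal positional encodings and a SwiGLU activation. As a rough illustration of those two pieces (a NumPy sketch, not the model's actual Flax code; the dimensions and weight names are made up), they look like this:

```python
import numpy as np

def sincos_positional_encoding(seq_len, d_model):
    """Fixed sin/cos positional encodings: even channels sine, odd channels cosine."""
    pos = np.arange(seq_len)[:, None]                 # (seq_len, 1)
    i = np.arange(d_model // 2)[None, :]              # (1, d_model/2)
    angles = pos / (10000.0 ** (2 * i / d_model))
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles)
    pe[:, 1::2] = np.cos(angles)
    return pe

def swiglu(x, w_gate, w_up, w_down):
    """SwiGLU feed-forward block: (swish(x @ w_gate) * (x @ w_up)) @ w_down."""
    swish = lambda z: z / (1.0 + np.exp(-z))          # swish(z) = z * sigmoid(z)
    return (swish(x @ w_gate) * (x @ w_up)) @ w_down

# Toy dimensions, illustrative only
seq_len, d_model, d_ff = 512, 64, 256
rng = np.random.default_rng(0)
x = rng.normal(size=(seq_len, d_model)) + sincos_positional_encoding(seq_len, d_model)
out = swiglu(x,
             rng.normal(size=(d_model, d_ff)),
             rng.normal(size=(d_model, d_ff)),
             rng.normal(size=(d_ff, d_model)))
print(out.shape)  # (512, 64)
```

The gated form is why SwiGLU FFNs carry three weight matrices instead of the usual two.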
## Training & Validation Loss

*(Training-loss and validation-loss curve plots.)*
## Why BwETAF?
- Built from scratch – no Hugging Face Trainer shortcuts here.
- Flexible architecture – swap blocks, change depths, scale it how you want.
- Experimental core – try weird ideas without breaking a corporate repo.
- TPU-optimized – trained on free Google TPUs with custom memory-efficient formats.
- Lightweight-ish – you can actually run this model without a data center.
## Quickstart
Install the package:

```shell
pip install BwETAF==0.4.2
```

```python
import BwETAF

# One-shot generation through the hosted API
prompt = "The meaning of life is"
output = BwETAF.SetUpAPI(prompt, "WICKED4950/BwETAF-IID-400M")
print(output)

# Load the model from the Hugging Face Hub...
model = BwETAF.load_hf("WICKED4950/BwETAF-IID-400M")

# ...or from a local path, and save it back to disk
model = BwETAF.load_model("path/to/model")
model.save_model("path/to/save")

# Inspect parameters and architecture
params = model.trainable_variables
structure = model.model_struct
```
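Assuming `model.trainable_variables` returns a JAX-style pytree (nested dicts of arrays – an assumption about BwETAF's internals, not something the package documents here), counting parameters is a short recursion. Below, a toy NumPy pytree with a hypothetical layout stands in for the real thing:

```python
import numpy as np

def count_params(tree):
    """Recursively sum array sizes in a nested dict of weight arrays."""
    if isinstance(tree, dict):
        return sum(count_params(v) for v in tree.values())
    return np.asarray(tree).size

# Toy stand-in for `model.trainable_variables` (hypothetical layout and shapes)
params = {
    "embed": {"kernel": np.zeros((50257, 1024))},
    "block_0": {
        "attn": {"qkv": np.zeros((1024, 3 * 1024))},
        "ffn": {"w_gate": np.zeros((1024, 4096))},
    },
}
print(f"{count_params(params):,}")  # 58,803,200
```

With the real pytree this is how you would verify the 378,769,408 figure from the overview.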
Note: the Google Colab notebooks are not up to date yet.
## Known Limitations
- Missed the target benchmark (aimed for a validation loss ≤ 2.7, reached ~3.4)
- No fine-tuning or task-specific optimization
- Training stopped early once the loss saturated
- Works, but won't win any LLM trophies (yet)
## Reach Out
Got questions, bugs, or chaos to share? Ping me on Instagram: Here.
I like weird LLM experiments and random ML convos.
## Upcoming Experiments
- BwETAF-IID-1B: scaling this mess further
- Layer rewrite tests: because the FFN deserves some drama
- Rotary + sparse attention tests
- Norm variations for training stability
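Rotary embeddings (one of the planned experiments above) rotate each query/key channel pair by a position-dependent angle instead of adding a position vector, so relative position shows up as a phase difference in the attention dot product. A minimal NumPy sketch of the idea (not BwETAF code):

```python
import numpy as np

def rope(x, base=10000.0):
    """Apply rotary position embedding to x of shape (seq_len, d), d even.

    Channel pairs (2i, 2i+1) at position `pos` are rotated by the angle
    pos * base**(-2i/d).
    """
    seq_len, d = x.shape
    pos = np.arange(seq_len)[:, None]             # (seq_len, 1)
    freqs = base ** (-np.arange(0, d, 2) / d)     # (d/2,)
    theta = pos * freqs                           # (seq_len, d/2)
    cos, sin = np.cos(theta), np.sin(theta)
    x1, x2 = x[:, 0::2], x[:, 1::2]
    out = np.empty_like(x)
    out[:, 0::2] = x1 * cos - x2 * sin
    out[:, 1::2] = x1 * sin + x2 * cos
    return out

x = np.random.default_rng(0).normal(size=(8, 4))
y = rope(x)
# A pure rotation preserves each position's vector norm
print(np.allclose(np.linalg.norm(x, axis=-1), np.linalg.norm(y, axis=-1)))  # True
```

Because the rotation at position 0 has angle zero, the first row passes through unchanged, which is a handy sanity check when wiring this into an attention block.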