|
|
--- |
|
|
license: mit |
|
|
datasets: |
|
|
- allenai/c4 |
|
|
pipeline_tag: text-generation |
|
|
--- |
|
|
Wigip-1: A 473M Parameter Language Model |
|
|
This repository contains the code and documentation for Wigip-1, a ~500M parameter GPT-style language model built from scratch in JAX/Flax. |
|
|
|
|
|
Project Overview |
|
|
This project was an end-to-end journey into building and training a large language model on public resources. It involved: |
|
|
|
|
|
Architecture: A 24-layer, 1280-embedding dimension Transformer. |
|
|
Training: Trained on the C4 dataset for over 500,000 steps (~8 hours on a TPU v3-8). |
|
|
Frameworks: Built with JAX, Flax, and Optax. |
|
|
Deployment: A live demo was created using Gradio. |
|
|
See this please: |
|
|
https://github.com/skibidibrainrot20times/wigip-1-jax-model.git |
|
|
|
|
|
My Journey |
|
|
This project was a deep dive into the real-world challenges of MLOps, including debugging file corruption, solving JAX compiler errors (XlaRuntimeError), and managing long-running jobs in a cloud environment. It was built with the help of an AI assistant for debugging and guidance. |