Nottybro committed
Commit b9b3ba6 · verified · 1 Parent(s): 62f56c3

Update README.md

Files changed (1): README.md (+16, -1)
README.md CHANGED
@@ -3,4 +3,19 @@ license: mit
  datasets:
  - allenai/c4
  pipeline_tag: text-generation
- ---
+ ---
+ Wigip-1: A 473M Parameter Language Model
+ This repository contains the code and documentation for Wigip-1, a ~500M parameter GPT-style language model built from scratch in JAX/Flax.
+
+ Project Overview
+ This project was an end-to-end journey into building and training a large language model on public resources. It involved:
+
+ Architecture: A 24-layer Transformer with a 1280-dimensional embedding.
+ Training: Trained on the C4 dataset for over 500,000 steps (~8 hours on a TPU v3-8).
+ Frameworks: Built with JAX, Flax, and Optax.
+ Deployment: A live demo was created using Gradio.
+ The trained model weights are hosted separately on the Hugging Face Hub, as they are too large for a standard Git repository:
+ https://huggingface.co/Nottybro/wigip-1
+
+ My Journey
+ This project was a deep dive into the real-world challenges of MLOps, including debugging file corruption, solving JAX compiler errors (XlaRuntimeError), and managing long-running jobs in a cloud environment. It was built with the help of an AI assistant for debugging and guidance.