avneeshjadhav04 committed on
Commit 4032d74 · verified · 1 Parent(s): 3d18b0d

Update README.md

Files changed (1)
  1. README.md +6 -2
README.md CHANGED
@@ -16,8 +16,11 @@ datasets:
16
  ---
17
  # LLM from Scratch (124M)
18
  A clean, **from-scratch implementation** of a 124M-parameter decoder-only Transformer in PyTorch. No `nn.Transformer`, no shortcuts: every layer (attention, FFN, LayerNorm, embeddings) is built manually. Trained on **FineWeb-Edu** with mixed precision, gradient accumulation, and automatic checkpoint resume.
 
19
  🚀 **Live Demo**: [https://avneeshjadhav04--llm-api.modal.run](https://avneeshjadhav04--llm-api.modal.run)
 
20
  📂 **Training Code**: [github.com/avneeshjadhav04/llm-from-scratch](https://github.com/avneeshjadhav04/llm-from-scratch)
 
21
  ## Model Description
22
  This is a **GPT-2-style causal language model** trained entirely from random initialization on the FineWeb-Edu dataset. It was built as an educational and portfolio project to demonstrate deep understanding of Transformer internals and large-scale training loops.
23
  | Property | Value |
@@ -78,10 +81,11 @@ If you use this model or code, please cite:
78
  journal = {GitHub repository},
79
  howpublished = {\url{https://github.com/avneeshjadhav04/llm-from-scratch}}
80
  }
81
- Acknowledgments
 
82
  - Inspired by Andrej Karpathy's nanoGPT (https://github.com/karpathy/nanoGPT)
83
  - Transformer architecture from Attention Is All You Need (https://arxiv.org/abs/1706.03762)
84
  - FineWeb-Edu dataset by HuggingFace (https://huggingface.co/datasets/HuggingFaceFW/fineweb-edu)
85
- License
86
  - Code: MIT License
87
  - Dataset: FineWeb-Edu is released under ODC-By v1.0 (https://opendatacommons.org/licenses/by/1-0/)
 
16
  ---
17
  # LLM from Scratch (124M)
18
  A clean, **from-scratch implementation** of a 124M-parameter decoder-only Transformer in PyTorch. No `nn.Transformer`, no shortcuts: every layer (attention, FFN, LayerNorm, embeddings) is built manually. Trained on **FineWeb-Edu** with mixed precision, gradient accumulation, and automatic checkpoint resume.
19
+
20
  🚀 **Live Demo**: [https://avneeshjadhav04--llm-api.modal.run](https://avneeshjadhav04--llm-api.modal.run)
21
+
22
  📂 **Training Code**: [github.com/avneeshjadhav04/llm-from-scratch](https://github.com/avneeshjadhav04/llm-from-scratch)
23
+
24
  ## Model Description
25
  This is a **GPT-2-style causal language model** trained entirely from random initialization on the FineWeb-Edu dataset. It was built as an educational and portfolio project to demonstrate deep understanding of Transformer internals and large-scale training loops.
26
  | Property | Value |
 
81
  journal = {GitHub repository},
82
  howpublished = {\url{https://github.com/avneeshjadhav04/llm-from-scratch}}
83
  }
84
+ ```
85
+ # Acknowledgments
86
  - Inspired by Andrej Karpathy's nanoGPT (https://github.com/karpathy/nanoGPT)
87
  - Transformer architecture from Attention Is All You Need (https://arxiv.org/abs/1706.03762)
88
  - FineWeb-Edu dataset by HuggingFace (https://huggingface.co/datasets/HuggingFaceFW/fineweb-edu)
89
+ # License
90
  - Code: MIT License
91
  - Dataset: FineWeb-Edu is released under ODC-By v1.0 (https://opendatacommons.org/licenses/by/1-0/)