allenporter committed
Commit 3040360 · verified · 1 Parent(s): 33c0ee7

Upload README.md with huggingface_hub

Files changed (1): README.md (+113, −0)
README.md ADDED

---
language:
- en
license: apache-2.0
library_name: pytorch
tags:
- nano-gpt
datasets:
- fineweb-edu
metrics:
- accuracy
---
# Model Card for gpt2

<!-- Provide a quick summary of what the model is/does. -->

This model is a reproduction of gpt2 following Andrej Karpathy's GPT tutorial series.

The model was trained using the [nano-gpt](https://github.com/allenporter/nano-gpt/) library,
which follows the pattern from Karpathy's excellent content, with some additional
packaging and infrastructure work to make it more maintainable and reusable.

## Model Details

### Model Description

GPT-2 is a transformer model pretrained on a large corpus of English-only text
with no labels. This is the smallest version of GPT-2, with 124M parameters.

The model was trained on a 10B-token sample of FineWeb-Edu, a dataset of
educational web pages.

- **Developed by:** Allen Porter

### Model Sources

<!-- Provide the basic links for the model. -->

- **Repository:** https://github.com/allenporter/nano-gpt/

## How to Get Started with the Model

This model is stored in safetensors format and uses the same weight layout as
the gpt2 model released by OpenAI.

The easiest way to load this model is to use the `nano-gpt` command line
tool. You can install the package from PyPI. Here is an example using
a virtual environment with `uv`:

```bash
$ uv venv --python=3.13
$ source .venv/bin/activate
$ uv pip install nano-gpt
```

You can then sample from the pretrained model:

```bash
$ nano-gpt sample --pretrained=allenporter/gpt2
> Hello, I'm a language model, you're doing your application, I've put your main program and you want to model. Here are some things
> Hello, I'm a language model, so let's have a look at a few very old and popular dialects with some basic information about some of
> Hello, I'm a language model, but I also use a number of core vocabulary from the Python language and some data structures from the web to
> Hello, I'm a language model, so this is about building a language to help my students to express themselves in all possible situations when they are in
> Hello, I'm a language model, who wrote my first 'hello' and never used it, but my first 'hello' can't be in
```
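
Since the checkpoint uses the same layout as OpenAI's gpt2 release, it should also load with standard Hugging Face tooling. A minimal sketch, assuming full `transformers` compatibility (the `gpt2` tokenizer is an assumption; this repository may not ship its own):

```python
# Sketch: load the checkpoint with Hugging Face transformers rather than
# the nano-gpt CLI. Assumes the weights are gpt2-compatible, as stated
# above; the "gpt2" tokenizer is an assumption, not part of this repo.
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("allenporter/gpt2")

inputs = tokenizer("Hello, I'm a language model,", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=30, do_sample=True)
print(tokenizer.decode(outputs[0]))
```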

## Training Details

### Training Data

<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->

This model was trained on the 10B-token sample of
https://huggingface.co/datasets/HuggingFaceFW/fineweb-edu.

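For reference, the raw data can be inspected with the `datasets` library; a sketch, where the `sample-10BT` config name is taken from the dataset card and should be verified there:

```python
# Sketch: stream the FineWeb-Edu 10B-token sample without a full download.
# The "sample-10BT" config name is an assumption from the dataset card.
from datasets import load_dataset

ds = load_dataset(
    "HuggingFaceFW/fineweb-edu",
    name="sample-10BT",
    split="train",
    streaming=True,
)
for example in ds.take(1):
    print(example["text"][:200])  # each record is an educational web page
```
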
### Training Procedure

<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->

#### Preprocessing

The data was pre-tokenized using the `nano-gpt prepare_dataset` command line
tool.

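The internals of `prepare_dataset` are not reproduced here; as a rough sketch of the GPT-2-style pre-tokenization step it performs (assuming `tiktoken`, which Karpathy's pipeline uses):

```python
# Sketch: GPT-2 BPE pre-tokenization of one document, in the style of
# Karpathy's data pipeline. The real `nano-gpt prepare_dataset` tool may
# differ in details such as sharding and delimiter placement.
import numpy as np
import tiktoken

enc = tiktoken.get_encoding("gpt2")
eot = enc.eot_token  # end-of-text delimiter between documents (id 50256)

def tokenize(doc: str) -> np.ndarray:
    tokens = [eot] + enc.encode_ordinary(doc)
    return np.array(tokens, dtype=np.uint16)  # gpt2 vocab fits in uint16

print(tokenize("An educational web page.")[:8])
```
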
#### Training Hyperparameters

- **Training regime:** See `train_config` in `config.json` for the hyperparameters.

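For example, the recorded hyperparameters can be inspected directly from the hub; a small sketch assuming the `train_config` key is present as described:

```python
# Sketch: fetch config.json from the hub and print the train_config block.
import json
from huggingface_hub import hf_hub_download

path = hf_hub_download("allenporter/gpt2", "config.json")
with open(path) as f:
    config = json.load(f)
print(json.dumps(config["train_config"], indent=2))
```
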
#### Speeds, Sizes, Times

The model was trained on 8 × A100 GPUs for one full epoch of the 10B-token
dataset (19,072 steps), which took about 2 hours.

<!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->

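Those figures imply a batch of roughly half a million tokens per step and an aggregate throughput on the order of 1.4M tokens/s; a quick sanity check:

```python
# Sanity-check the reported schedule: 10B tokens, 19,072 steps, ~2 hours.
total_tokens = 10e9
steps = 19072
hours = 2

print(f"{total_tokens / steps:,.0f} tokens/step")  # ~524k, consistent with a 2**19-token batch
print(f"{total_tokens / (hours * 3600):,.0f} tokens/s across 8 GPUs")
print(f"{total_tokens / (hours * 3600) / 8:,.0f} tokens/s per A100")
```
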
## Evaluation

<!-- This section describes the evaluation protocols and provides the results. -->

The model was evaluated using the HellaSwag dataset; results are TBD.

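No numbers are reported yet. The usual HellaSwag protocol (and the one Karpathy's series follows) scores each of the four candidate endings by its completion likelihood under the model and picks the most probable one. A hedged sketch using `transformers` and `datasets`, not the nano-gpt evaluation code itself:

```python
# Sketch: HellaSwag accuracy via per-token completion loss, assuming the
# checkpoint loads as a gpt2-compatible model (see "How to Get Started").
import torch
import torch.nn.functional as F
from datasets import load_dataset
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("allenporter/gpt2").eval()

def is_correct(ex) -> bool:
    ctx_ids = tokenizer(ex["ctx"]).input_ids
    losses = []
    for ending in ex["endings"]:
        end_ids = tokenizer(" " + ending).input_ids
        ids = torch.tensor([ctx_ids + end_ids])
        with torch.no_grad():
            logits = model(ids).logits
        # average cross-entropy over the ending tokens only
        pred = logits[0, len(ctx_ids) - 1 : -1]
        target = ids[0, len(ctx_ids):]
        losses.append(F.cross_entropy(pred, target).item())
    return losses.index(min(losses)) == int(ex["label"])

val = load_dataset("Rowan/hellaswag", split="validation")
subset = val.select(range(100))  # small slice to keep the sketch fast
acc = sum(is_correct(ex) for ex in subset) / len(subset)
print(f"accuracy on {len(subset)} examples: {acc:.2%}")
```
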
## Environmental Impact

<!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->

Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700). A rough back-of-the-envelope estimate is sketched after the list below.

- **Hardware Type:** 8 × A100
- **Hours used:** 2
- **Cloud Provider:** Lambda Labs
- **Compute Region:** Arizona
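
Plugging the figures above into a back-of-the-envelope estimate (the GPU power draw and grid carbon intensity are illustrative assumptions, not measured values):

```python
# Rough CO2 estimate for the run above. The 400 W average draw and the
# 0.4 kg/kWh grid intensity are assumptions; use the MLCO2 calculator
# for a provider- and region-specific figure.
gpus = 8
hours = 2
watts_per_gpu = 400          # assumed average A100 draw under load
kg_co2_per_kwh = 0.4         # assumed grid carbon intensity

energy_kwh = gpus * hours * watts_per_gpu / 1000
print(f"{energy_kwh:.1f} kWh")                            # 6.4 kWh
print(f"~{energy_kwh * kg_co2_per_kwh:.1f} kg CO2eq")     # ~2.6 kg
```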