Commit e0cd09a (verified), committed by houcine-bdk · 1 Parent(s): 00b9b77

Update model card with proper metadata

Files changed (1)
  1. README.md +20 -3
README.md CHANGED
@@ -1,14 +1,30 @@
+ ---
+ language: en
+ tags:
+ - pytorch
+ - gpt2
+ - text-generation
+ - nanoGPT
+ license: mit
+ datasets:
+ - custom
+ model-index:
+ - name: chatMachineProto
+   results: []
+ ---
+
  # NanoGPT Personal Experiment

- This repository contains my personal experiment with training and fine-tuning a GPT-2 style language model. This project was undertaken as a learning exercise to understand transformer-based language models and explore the capabilities of modern AI architectures.
+ This repository contains my personal experiment with training and fine-tuning a GPT-2 style language model using the nanoGPT architecture. This project was undertaken as a learning exercise to understand transformer-based language models and explore the capabilities of modern AI architectures.

  ## Model Description

- The architecture follows the original GPT-2 design principles while being more accessible and easier to understand.
+ This model is based on the nanoGPT implementation, which is a minimal, clean implementation of GPT-2 style models. The architecture follows the original GPT-2 design principles while being more accessible and easier to understand.

  ### Technical Details

  - Base Architecture: GPT-2
+ - Implementation: nanoGPT
  - Training Infrastructure: 8x A100 80GB GPUs
  - Parameters: ~124M (similar to GPT-2 small)

@@ -16,7 +32,8 @@ The architecture follows the original GPT-2 design principles while being more a

  The model underwent a multi-stage training process:
  1. Initial training on a subset of the OpenWebText dataset
- 2. Experimentation with different hyperparameters and optimization techniques
+ 2. Fine-tuning experiments on various datasets including Shakespeare's works
+ 3. Experimentation with different hyperparameters and optimization techniques

  ### Features
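The "~124M (similar to GPT-2 small)" figure in the Technical Details follows directly from the standard GPT-2 small hyperparameters. A minimal sketch of that arithmetic, assuming the usual GPT-2 small / nanoGPT settings (none of these values are read from this repository's actual config):

```python
# Back-of-the-envelope parameter count for a GPT-2-small-sized model.
# Assumed hyperparameters (standard GPT-2 small), not taken from this repository:
n_layer = 12        # transformer blocks
n_head = 12         # attention heads per block
n_embd = 768        # hidden / embedding size
block_size = 1024   # context length
vocab_size = 50257  # GPT-2 BPE vocabulary

# Token + position embeddings
embeddings = vocab_size * n_embd + block_size * n_embd

# Per block: attention (QKV + output projection, ~4*d^2) + MLP (~8*d^2),
# ignoring biases and layer-norm parameters.
per_block = 12 * n_embd * n_embd

total = embeddings + n_layer * per_block
print(f"~{total / 1e6:.0f}M parameters")  # prints "~124M parameters"
```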
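The multi-stage training list (OpenWebText pretraining, then Shakespeare fine-tuning, then hyperparameter experiments) matches the usual nanoGPT workflow, where each stage is driven by a small plain-Python config file. A hedged sketch of what the Shakespeare fine-tuning stage could look like; the file layout and every value below are assumptions in the style of the upstream nanoGPT configs, not files from this commit:

```python
# Hypothetical nanoGPT-style fine-tuning config for the Shakespeare stage.
# nanoGPT config files are plain Python variable assignments; all values
# here are illustrative defaults, NOT taken from this repository.
out_dir = 'out-shakespeare'        # where fine-tuned checkpoints are written
init_from = 'gpt2'                 # start from a pretrained ~124M checkpoint
dataset = 'shakespeare'            # expects data/shakespeare/train.bin and val.bin
eval_interval = 250
eval_iters = 200
batch_size = 4
gradient_accumulation_steps = 8
max_iters = 2000
learning_rate = 3e-5               # small learning rate, typical for fine-tuning
decay_lr = False                   # a constant LR is fine for a short run
```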
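For reference, a usage sketch assuming the trained weights have been exported to the Hugging Face `transformers` GPT-2 format; the repository id is inferred from the `model-index` name added in this commit and may not match the actual repo:

```python
# Hypothetical usage example; assumes a transformers-compatible GPT-2 export.
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

repo_id = "houcine-bdk/chatMachineProto"  # assumed repo id, inferred from the metadata

tokenizer = GPT2TokenizerFast.from_pretrained(repo_id)
model = GPT2LMHeadModel.from_pretrained(repo_id)

prompt = "Once upon a time"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(
    **inputs,
    max_new_tokens=50,
    do_sample=True,
    temperature=0.8,
    top_k=200,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```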