itsme-nishanth committed on
Commit d440d7b · 1 Parent(s): e6c7f97

updated with stories train data

README.md CHANGED
@@ -1,57 +1,55 @@
  ---
  library_name: transformers
- license: apache-2.0
  ---
- # 🧠 JAT-GPT: Just Another Tiny GPT
-
- Welcome to **JAT-GPT**, the world's most underwhelming large language model — clocking in at a mighty **74 million parameters** (yes, million, not billion — stop laughing).
-
- ## 📦 Model Details
-
- - **Model type**: GPT-2-based decoder-only transformer
- - **Architecture**: GPT-2
- - **Library**: Hugging Face 🤗 Transformers
- - **Parameters**: 74 million (size isn't everything... right?)
- - **Training Objective**: Learn to predict the next word — and sometimes even the *right* one!
- - **Pretrained on**: A secret* dataset (*"secret" means the dataset was just some text I could find lying around)
- - **Training Purpose**: Solely educational. Also for flexing on friends who haven't trained a language model from scratch.
-
- ## 🚀 Capabilities
-
- - Can generate short sentences. ("Please lower your expectations.")
- - Can hallucinate confidently, but in a very short and polite way.
- - Can generate random words after a few tokens.
-
- ## 🙅 Limitations
-
- - Not very smart.
- - Only pretrained; not fine-tuned.
- - Understands context... if it fits within a few tokens.
- - Cannot replace ChatGPT. (But look how cute it is!)
-
- ## 🤷 Why Train This?
-
- > "Because I could." – :-)
-
- - To understand the internals of language modeling.
- - To cry less when training real models later.
- - To appreciate just how powerful modern LLMs are by comparison.
-
- ## 🛠️ Usage
-
- ```python
- # Load the model and tokenizer directly
- from transformers import AutoTokenizer, AutoModelForCausalLM
-
- tokenizer = AutoTokenizer.from_pretrained("itsme-nishanth/JAT-GPT")
- model = AutoModelForCausalLM.from_pretrained("itsme-nishanth/JAT-GPT")
-
- input_ids = tokenizer.encode("Hi there,", return_tensors="pt")
- output = model.generate(input_ids, max_length=20, do_sample=True)
- print(tokenizer.decode(output[0]))
-
- # Or use a pipeline as a high-level helper
- from transformers import pipeline
-
- pipe = pipeline("text-generation", model="itsme-nishanth/JAT-GPT")
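
The removed usage snippet above creates a `pipe` but never calls it. Below is a minimal, self-contained sketch of invoking the pipeline; the prompt and generation length are illustrative choices, not taken from the original card.

```python
from transformers import pipeline

# Hypothetical quick check: generate a short continuation with the pipeline.
# The prompt and max_new_tokens are illustrative, not from the model card.
pipe = pipeline("text-generation", model="itsme-nishanth/JAT-GPT")
result = pipe("Hi there,", max_new_tokens=20, do_sample=True)
print(result[0]["generated_text"])
```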
  ---
  library_name: transformers
+ license: mit
+ base_model: gpt2
+ tags:
+ - generated_from_trainer
+ model-index:
+ - name: JAT-GPT2-trainer
+   results: []
  ---

+ <!-- This model card has been generated automatically according to the information the Trainer had access to. You
+ should probably proofread and complete it, then remove this comment. -->

+ # JAT-GPT2-trainer

+ This model is a fine-tuned version of [gpt2](https://huggingface.co/gpt2) on an unknown dataset.

+ ## Model description

+ More information needed

+ ## Intended uses & limitations

+ More information needed

+ ## Training and evaluation data

+ More information needed

+ ## Training procedure

+ ### Training hyperparameters

+ The following hyperparameters were used during training:
+ - learning_rate: 2e-05
+ - train_batch_size: 32
+ - eval_batch_size: 32
+ - seed: 42
+ - optimizer: AdamW (adamw_torch) with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
+ - lr_scheduler_type: linear
+ - lr_scheduler_warmup_steps: 500
+ - num_epochs: 10
+ - mixed_precision_training: Native AMP
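
For reference, a minimal sketch of how the hyperparameters listed above would map onto a Hugging Face `TrainingArguments` object. The output directory is a placeholder, and the model and dataset objects are assumed to exist; only the listed values come from the card.

```python
from transformers import TrainingArguments, Trainer

# Sketch of a TrainingArguments setup matching the hyperparameters above.
# output_dir is a placeholder; model, train_dataset and eval_dataset are assumed.
args = TrainingArguments(
    output_dir="JAT-GPT2-trainer",       # placeholder path
    learning_rate=2e-5,
    per_device_train_batch_size=32,
    per_device_eval_batch_size=32,
    seed=42,
    optim="adamw_torch",                 # AdamW (torch) with default betas/epsilon
    lr_scheduler_type="linear",
    warmup_steps=500,
    num_train_epochs=10,
    fp16=True,                           # "Native AMP" mixed precision
)

# trainer = Trainer(model=model, args=args,
#                   train_dataset=train_dataset, eval_dataset=eval_dataset)
# trainer.train()
```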
+ ### Training results

+ ### Framework versions
+
+ - Transformers 4.53.2
+ - Pytorch 2.6.0+cu124
+ - Datasets 4.0.0
+ - Tokenizers 0.21.2
config.json CHANGED
@@ -1,5 +1,4 @@
  {
-   "_name_or_path": "itsme-nishanth/JAT-GPT",
    "activation_function": "gelu_new",
    "architectures": [
      "GPT2LMHeadModel"
@@ -33,7 +32,7 @@
      }
    },
    "torch_dtype": "float32",
-   "transformers_version": "4.47.1",
+   "transformers_version": "4.53.2",
    "use_cache": true,
    "vocab_size": 50257
  }
generation_config.json CHANGED
@@ -2,5 +2,5 @@
    "_from_model_config": true,
    "bos_token_id": 50256,
    "eos_token_id": 50256,
-   "transformers_version": "4.47.1"
+   "transformers_version": "4.53.2"
  }
model.safetensors CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:6a299ba7ef71d2bf15a202ee86470aa5d88dc05117c5571fe9c56a207de1acb9
+ oid sha256:19fc462d738ec4f0753b036c27325368b17fd10d32e339cb41fdd1b7f6cec357
  size 71475528
optimizer.pth → training_args.bin RENAMED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:337e38b1fca79700230e72b05b4e7e5df625a9ba37c8a3252400c9ecf1309844
- size 142980858
+ oid sha256:4b260083aec1104bea74d678c6f37dc3f32586a5c95b740c77adc7d2de070456
+ size 5304