FarhanAK128 committed · Commit 320c527 · verified · 1 Parent(s): f76e395

Update README.md

---
language: en
license: mit
tags:
- text-generation
- gpt2
- transformers
- pytorch
- custom-architecture
- tiktoken
library_name: transformers
---

# CustomGPT

## Model Summary

**CustomGPT** is an LLM that was built, trained, and instruction-finetuned from scratch, then evaluated using the LLM-as-a-judge method. This project documents my learning in developing a custom LLM architecture from scratch and deploying it for convenient use. Note that this model is not intended for production; it is a demo that showcases my learning of LLM engineering. Pretrained GPT-2 weights were used and further fine-tuned on a small instruction dataset.

This model is fully compatible with the Hugging Face `transformers` ecosystem and can be loaded using `AutoModel.from_pretrained`.
---

## How to Get Started with the Model

### Inference Example (Transformers + tiktoken)

```python
from transformers import AutoModel
import tiktoken

# Load the GPT-2 BPE tokenizer
tokenizer = tiktoken.get_encoding("gpt2")

# Load the model (custom architecture, so remote code must be trusted)
model_id = "FarhanAK128/CustomGPT"
model = AutoModel.from_pretrained(
    model_id,
    trust_remote_code=True
)

# Example prediction ("entry" avoids shadowing the built-in `input`)
entry = {
    'instruction': 'Rewrite the sentence using a simile.',
    'input': 'The car is very fast.'
}

response = model.generate_response(entry, tokenizer)
print(response)  # The car is as fast as a cheetah.
```

**Note:** This model uses a custom `generate_response()` method defined in the repository and requires `trust_remote_code=True` to function.
---

## Model Details

### 📝 Model Description

- **Developed by:** Farhan Ali Khan
- **Model type:** GPT-2–based text generation model
- **Base architecture:** GPT-2 (OpenAI)
- **Framework:** PyTorch
- **Task:** Text Generation
- **Language:** English
- **License:** MIT

---

## Training Details

### Training Data

The model was fine-tuned on a small instruction dataset of 1,100 input–output pairs.

### Training Procedure

- **Base weights:** OpenAI GPT-2 (355 million parameters)
- **Fine-tuning strategy:** Full fine-tuning
- **Optimizer:** AdamW
- **Learning rate:** 0.00005
- **Weight decay:** 0.1
- **Epochs:** 3
- **Random seed:** 123
- **Loss function:** Cross-Entropy Loss
- **Training strategy:** Mixed precision
- **Total training time:** ~4.82 minutes
### 📈 Training Progress

#### Training and Validation Loss
![Training and Validation Loss](https://cdn-uploads.huggingface.co/production/uploads/65bc1af7ce846f8aa908a978/4MGZWJ10Y9CkrB-pzBDsD.png)

### 📊 Model Performance

The 355M custom LLM is evaluated using Llama-3.1-8b-instant as an automated judge. For each test input, the model's response is compared to the ground-truth output, and the judge assigns a score from 0 to 100 based on correctness. Scores are extracted as integers and averaged across the test dataset, yielding an average score of 44 out of 100.
---

## Model Card Authors
**Farhan Ali Khan**

## Model Card Contact
For questions or feedback, please reach out via my Hugging Face profile:
[FarhanAK128](https://huggingface.co/FarhanAK128)