---
library_name: transformers
tags: ["gpt2", "causal-lm", "fine-tuned", "chatbot"]
---

# Model Card for GPT2-Chat (Fine-tuned)

This is a fine-tuned version of **GPT-2** adapted for **chat-style generation**.  
It was trained on conversational data so that GPT-2 behaves more like ChatGPT, producing more interactive, coherent, and context-aware responses.  

---

## Model Details

### Model Description
- **Developed by:** Faijan Khan  
- **Shared by:** [faizack](https://huggingface.co/faizack)  
- **Model type:** Causal Language Model (decoder-only transformer)  
- **Language(s):** English  
- **License:** MIT (same as the base GPT-2 model)  
- **Finetuned from:** [gpt2](https://huggingface.co/gpt2)  

### Model Sources
- **Repository:** [https://huggingface.co/faizack/gpt2-chat-ft](https://huggingface.co/faizack/gpt2-chat-ft)  
- **Paper (GPT-2 original):** [Language Models are Unsupervised Multitask Learners](https://cdn.openai.com/better-language-models/language_models_are_unsupervised_multitask_learners.pdf)  

---

## Uses

### Direct Use
- Conversational AI experiments  
- Chatbot prototyping  
- Educational or research purposes  

### Downstream Use
- Further fine-tuning for domain-specific dialogue (e.g., customer support, tutoring, storytelling).  

### Out-of-Scope Use
- Not intended for production use without additional safety layers.  
- Not suitable for sensitive domains like medical, legal, or financial advice.  

---

## Bias, Risks, and Limitations
- May generate biased, offensive, or factually incorrect responses (limitations inherited from GPT-2).  
- Not aligned via RLHF (unlike ChatGPT), so safety guardrails are minimal.  

### Recommendations
- Use with human oversight.  
- Add filtering, moderation, or reinforcement learning with human feedback (RLHF) if deploying in production.  

---

## How to Get Started with the Model

```python
from transformers import pipeline

chatbot = pipeline("text-generation", model="faizack/gpt2-chat-ft")

prompt = "Hello, how are you?"
response = chatbot(prompt, max_new_tokens=100, do_sample=True, temperature=0.7)
print(response[0]["generated_text"])
```
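
The card does not document a prompt template for multi-turn conversations, so the turn format below is an assumption rather than the model's published format; the `User:`/`Assistant:` labels are illustrative only.

```python
from transformers import pipeline

chatbot = pipeline("text-generation", model="faizack/gpt2-chat-ft")

# Hypothetical turn format -- adjust the labels to match whatever
# convention the training data actually used.
history = (
    "User: Hello, how are you?\n"
    "Assistant: I'm doing well, thanks! How can I help?\n"
)
prompt = history + "User: Can you explain what fine-tuning is?\nAssistant:"

response = chatbot(
    prompt,
    max_new_tokens=100,
    do_sample=True,
    temperature=0.7,
    return_full_text=False,  # return only the newly generated continuation
)
print(response[0]["generated_text"])
```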

---

## Training Details

### Training Data

* Fine-tuned on conversational datasets (prompt → response pairs).
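
The exact datasets and separators are not published. A minimal sketch of one common way to flatten prompt → response pairs into causal-LM training text (the `User:`/`Assistant:` labels and the EOS terminator are assumptions) might look like:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")

# Hypothetical pair -- the actual training data is not published.
pair = {
    "prompt": "Hello, how are you?",
    "response": "I'm doing well, thank you! How can I help?",
}

# One common convention: concatenate prompt and response into a single
# sequence terminated by EOS, then train with the standard causal-LM loss.
text = f"User: {pair['prompt']}\nAssistant: {pair['response']}{tokenizer.eos_token}"
input_ids = tokenizer(text, truncation=True, max_length=1024)["input_ids"]
```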

### Training Procedure

* Base model: `gpt2`
* Objective: Causal LM (next token prediction).
* Mixed precision: fp16 training.
* Optimizer: AdamW.

#### Training Hyperparameters

* Learning rate: 5e-5
* Batch size: 4
* Epochs: 3
* Warmup steps: 500
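
Putting the stated procedure and hyperparameters together, a `Trainer`-based fine-tuning loop might look like the sketch below. Only the hyperparameters are taken from this card; the toy dataset is a stand-in for the unpublished training data.

```python
from datasets import Dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

tokenizer = AutoTokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 has no pad token by default
model = AutoModelForCausalLM.from_pretrained("gpt2")

# Toy stand-in for the (unpublished) conversational dataset.
texts = ["User: Hello!\nAssistant: Hi there! How can I help?" + tokenizer.eos_token]
train_dataset = (
    Dataset.from_dict({"text": texts})
    .map(lambda b: tokenizer(b["text"], truncation=True, max_length=1024),
         batched=True, remove_columns=["text"])
)

args = TrainingArguments(
    output_dir="gpt2-chat-ft",
    learning_rate=5e-5,              # from this card
    per_device_train_batch_size=4,   # from this card
    num_train_epochs=3,              # from this card
    warmup_steps=500,                # from this card
    fp16=True,                       # mixed-precision training
)

# AdamW is the Trainer default optimizer; mlm=False gives the causal-LM collator.
trainer = Trainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```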

---

## Evaluation

### Metrics

* **Perplexity (PPL)** for fluency.
* Manual qualitative evaluation for coherence.
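
Perplexity here is the exponential of the mean causal-LM loss on held-out text. A minimal sketch (the evaluation text is a stand-in; the actual evaluation set is not published):

```python
import math

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("faizack/gpt2-chat-ft")
model = AutoModelForCausalLM.from_pretrained("faizack/gpt2-chat-ft")
model.eval()

# Stand-in held-out text.
text = "User: Hello, how are you?\nAssistant: I'm doing well, thank you!"
input_ids = tokenizer(text, return_tensors="pt")["input_ids"]

with torch.no_grad():
    # With labels == input_ids, the model returns the mean next-token loss.
    loss = model(input_ids, labels=input_ids).loss

print(f"Perplexity: {math.exp(loss.item()):.2f}")
```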

### Results

* Lower perplexity on conversational prompts compared to base GPT-2.
* Produces more context-aware and fluent chat responses.

---

## Environmental Impact

* **Hardware Type:** NVIDIA A100 (40GB)
* **Training time:** ~2 hours
* **Cloud Provider:** Vast.ai (example)
* **Carbon Emitted:** Estimated <10 kg CO2eq

---

## Technical Specifications

### Model Architecture

* Transformer decoder-only (117M parameters).
* Context length: 1024 tokens.
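
These values can be verified from the hosted configuration; the expected GPT-2 small settings are noted in the comments.

```python
from transformers import AutoConfig

config = AutoConfig.from_pretrained("faizack/gpt2-chat-ft")
print(config.n_layer, config.n_head, config.n_embd)  # 12 layers, 12 heads, 768-dim
print(config.n_positions)                            # context length: 1024 tokens
```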

### Compute Infrastructure

* **Hardware:** 1x NVIDIA A100
* **Software:** PyTorch, Hugging Face Transformers, Accelerate.

---

## Citation

If you use this model, please cite GPT-2 and this fine-tuned version:

**BibTeX:**

```bibtex
@misc{faizack2025gpt2chat,
  author = {Faijan Khan},
  title = {GPT2-Chat Fine-tuned Model},
  year = {2025},
  publisher = {Hugging Face},
  howpublished = {\url{https://huggingface.co/faizack/gpt2-chat-ft}}
}
```