---
language:
- en
license: mit
tags:
- text-generation
- causal-lm
- gpt2
- chat
- conversational
pipeline_tag: text-generation
---

# FuadeAI-50M

A 50 million parameter causal language model trained for conversational chat, built on a GPT-2 architecture with a custom tokenizer.

## Model Details

| Property | Value |
|---|---|
| Parameters | ~50M |
| Architecture | GPT-2 (custom config) |
| Hidden size | 512 |
| Layers | 8 |
| Attention heads | 8 |
| Context length | 1024 tokens |
| Tokenizer | GPT-2 + custom special tokens |
| Training precision | FP16 |

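The ~50M figure can be sanity-checked from the config above. A rough back-of-envelope count for a GPT-2-style model follows; the vocabulary size is an assumption (standard GPT-2 vocabulary plus a few custom special tokens), since the card does not state the exact value:

```python
# Back-of-envelope GPT-2 parameter count from the table above.
# vocab_size is an assumption (GPT-2 vocab + a few custom special tokens).
vocab_size = 50_257 + 6
hidden = 512
n_layers = 8
n_positions = 1024

# Token and position embeddings
embeddings = vocab_size * hidden + n_positions * hidden

# Per block: QKV + output projections, a 4x-wide MLP, two layer norms
attn = 4 * hidden * hidden + 4 * hidden          # weights + biases
mlp = 2 * (hidden * 4 * hidden) + 4 * hidden + hidden
layer_norms = 2 * 2 * hidden                     # gain + bias, two norms
per_layer = attn + mlp + layer_norms

total = embeddings + n_layers * per_layer + 2 * hidden  # + final layer norm
print(f"~{total / 1e6:.1f}M parameters")  # roughly 51.5M, consistent with ~50M
```

Note that about half of the parameters sit in the embedding matrix, which is typical at this scale.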
## Special Tokens

| Token | Purpose |
|---|---|
| `<\|startoftext\|>` | Beginning of conversation |
| `<user>` / `</user>` | Wraps user message |
| `<assistant>` / `</assistant>` | Wraps assistant response |
| `<\|endoftext\|>` | End of conversation |

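Putting the tokens together, a single exchange is serialized as one flat string. A minimal sketch (the exact spacing between tags is an assumption the card does not specify, and `format_turn` is an illustrative helper, not part of the repo):

```python
BOS, EOS = "<|startoftext|>", "<|endoftext|>"

def format_turn(user_msg, assistant_msg=None):
    """Serialize one exchange in the model's chat format.

    With assistant_msg=None, the string ends at an open <assistant>
    tag, ready for the model to complete.
    """
    text = f"{BOS}<user>{user_msg}</user><assistant>"
    if assistant_msg is not None:
        text += f"{assistant_msg}</assistant>{EOS}"
    return text

print(format_turn("Hello!"))
# <|startoftext|><user>Hello!</user><assistant>
```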
## Training Data

- [LucidexAi/VIBE-2K](https://huggingface.co/datasets/LucidexAi/VIBE-2K): conversational prompts and responses
- [HuggingFaceTB/instruct-data-basics-smollm-H4](https://huggingface.co/datasets/HuggingFaceTB/instruct-data-basics-smollm-H4): instruction following
- [MuskumPillerum/General-Knowledge](https://huggingface.co/datasets/MuskumPillerum/General-Knowledge): general knowledge QA
- Custom synthetic dataset for identity and conversational grounding

## How To Use

### Installation
```bash
pip install transformers torch
```

### Basic Inference
```python
from transformers import GPT2Tokenizer, GPT2LMHeadModel
import torch

# Load model and tokenizer
tokenizer = GPT2Tokenizer.from_pretrained("your-username/FuadeAI-50M")
model = GPT2LMHeadModel.from_pretrained("your-username/FuadeAI-50M")
model.eval()

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = model.to(device)

# Chat function
def chat(prompt, temperature=0.7, top_p=0.9, max_new_tokens=100):
    formatted = (
        f"{tokenizer.bos_token}"
        f"<user>{prompt}</user>"
        "<assistant>"
    )
    inputs = tokenizer(formatted, return_tensors="pt").to(device)

    with torch.no_grad():
        output = model.generate(
            **inputs,
            max_new_tokens=max_new_tokens,
            do_sample=True,
            temperature=temperature,
            top_p=top_p,
            repetition_penalty=1.3,
            no_repeat_ngram_size=3,
            eos_token_id=tokenizer.eos_token_id,
            pad_token_id=tokenizer.pad_token_id,
        )

    # Decode only the newly generated tokens, not the prompt
    generated = output[0][inputs["input_ids"].shape[-1]:]
    return tokenizer.decode(generated, skip_special_tokens=True).strip()

# Example usage
print(chat("Hello!"))
print(chat("What is photosynthesis?"))
print(chat("Who are you?"))
```

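The example above is single-turn; whether the model saw multi-turn transcripts during training is not stated in this card. If you want to experiment with multi-turn chat anyway, one option is to concatenate earlier turns in the same tag format before the new user message (`build_prompt` is a hypothetical helper, not part of the repo):

```python
def build_prompt(history, next_user_msg, bos="<|startoftext|>"):
    """history: list of (user, assistant) string pairs from earlier turns."""
    parts = [bos]
    for user_msg, assistant_msg in history:
        parts.append(f"<user>{user_msg}</user>")
        parts.append(f"<assistant>{assistant_msg}</assistant>")
    # Leave the final <assistant> tag open for the model to complete.
    parts.append(f"<user>{next_user_msg}</user><assistant>")
    return "".join(parts)

prompt = build_prompt([("Hi!", "Hello! How can I help?")], "Tell me a joke")
print(prompt)
```

The resulting string can be tokenized and generated from exactly as `formatted` is in the `chat` function above.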
### Generation Tips

- `temperature=0.7`: balanced creativity and coherence (recommended)
- `temperature=0.3`: more focused and deterministic answers
- `temperature=1.0`: more creative but less reliable
- `repetition_penalty=1.3`: keeps responses from looping (recommended; do not remove)
- `max_new_tokens=200`: increase for longer responses

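These settings can be grouped into named presets and splatted into `model.generate`. A sketch (the preset names are my own; the values come from the tips above, with `top_p=0.9` carried over from the inference example):

```python
# Named generation presets built from the tips above; names are illustrative.
PRESETS = {
    "focused":  dict(temperature=0.3, max_new_tokens=100),
    "balanced": dict(temperature=0.7, max_new_tokens=100),
    "creative": dict(temperature=1.0, max_new_tokens=200),
}

# Settings the card recommends keeping in place for every call.
COMMON = dict(do_sample=True, top_p=0.9,
              repetition_penalty=1.3, no_repeat_ngram_size=3)

def generation_kwargs(preset="balanced"):
    """Merge a preset with the always-on settings."""
    return {**COMMON, **PRESETS[preset]}

print(generation_kwargs("focused"))
```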
## Limitations

- **50M parameters is small**: factual recall is imperfect and some answers may be incorrect. Always verify factual claims made by this model.
- **Trained on ~10k samples**: topic coverage is limited compared to large-scale models.
- **Not suitable for**: factual research; medical, legal, or financial advice; or any high-stakes decision making.
- **Context window**: limited to 1024 tokens total (prompt + response).

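Because the prompt and the response share one 1024-token window, long conversations eventually need trimming. A simple sketch that drops the oldest turns first, using a rough ~4-characters-per-token heuristic (an assumption; for exact budgeting, count tokens with the model's tokenizer):

```python
def trim_history(history, max_tokens=1024, reserve_for_reply=200,
                 chars_per_token=4):
    """Drop the oldest (user, assistant) turns until the history is
    estimated to fit the context window, leaving room for the reply.

    chars_per_token is a crude heuristic; tokenize each turn for
    exact counts.
    """
    budget_chars = (max_tokens - reserve_for_reply) * chars_per_token
    trimmed = list(history)
    while trimmed and sum(len(u) + len(a) for u, a in trimmed) > budget_chars:
        trimmed.pop(0)  # discard the oldest turn first
    return trimmed

history = [("a" * 3000, "b" * 3000), ("short question", "short answer")]
print(len(trim_history(history)))  # the long oldest turn is dropped -> 1
```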
## Intended Use

- Learning and experimentation with small language models
- Lightweight conversational agent for low-stakes applications
- Fine-tuning base for domain-specific chat applications

## License

MIT: free to use, modify, and distribute with attribution.