---
tags:
- chat
- conversational
pipeline_tag: text-generation
datasets:
- LucidexAi/VIBE-2K
- HuggingFaceTB/instruct-data-basics-smollm-H4
- MuskumPillerum/General-Knowledge
library_name: transformers
---

# FuadeAI-50M

A 50 million parameter causal language model trained for conversational chat, built on a custom GPT-2 configuration.

| Property | Value |
|---|---|
| Parameters | 51M |
| Architecture | GPT-2 (custom config) |
| Hidden size | 512 |
| Layers | 8 |

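As a sanity check, the 51M figure is consistent with the table above under standard GPT-2 assumptions. The vocabulary size is not stated in this card, so the sketch below assumes the stock GPT-2 BPE vocabulary of 50257 tokens, plus the 1024-token context noted under Limitations:

```python
# Rough parameter estimate for the config above.
# Assumptions (not stated in the card): GPT-2 BPE vocab of 50257,
# 1024 learned position embeddings; biases and LayerNorms ignored.
vocab_size = 50257
n_positions = 1024
hidden = 512
layers = 8

embeddings = vocab_size * hidden + n_positions * hidden
# Per transformer block: attention (4 * h^2) + MLP (8 * h^2)
per_layer = 12 * hidden ** 2
total = embeddings + layers * per_layer

print(f"~{total / 1e6:.1f}M parameters")  # ~51.4M
```

The token embeddings account for roughly half the parameters at this scale, which is typical for small GPT-2 variants.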
## Training Data

- [LucidexAi/VIBE-2K](https://huggingface.co/datasets/LucidexAi/VIBE-2K)
- [HuggingFaceTB/instruct-data-basics-smollm-H4](https://huggingface.co/datasets/HuggingFaceTB/instruct-data-basics-smollm-H4)
- [MuskumPillerum/General-Knowledge](https://huggingface.co/datasets/MuskumPillerum/General-Knowledge)
- Custom synthetic dataset for identity and conversational grounding

## How To Use

### Transformers

```bash
pip install transformers torch
```

```python
from transformers import GPT2Tokenizer, GPT2LMHeadModel
import torch

# Load model and tokenizer
tokenizer = GPT2Tokenizer.from_pretrained("Fu01978/FuadeAI-50M")
model = GPT2LMHeadModel.from_pretrained("Fu01978/FuadeAI-50M")
model.eval()

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = model.to(device)

# Chat function
def chat(prompt, temperature=0.4, top_p=0.9, max_new_tokens=100):
    formatted = (
        f"{tokenizer.bos_token}"
        f"<user>{prompt}</user>"
        f"<assistant>"
    )
    inputs = tokenizer(formatted, return_tensors="pt").to(device)
    with torch.no_grad():
        output = model.generate(
            **inputs,
            max_new_tokens=max_new_tokens,
            do_sample=True,
            temperature=temperature,
            top_p=top_p,
            repetition_penalty=1.2,
            no_repeat_ngram_size=3,
            eos_token_id=tokenizer.eos_token_id,
            pad_token_id=tokenizer.pad_token_id,
        )
    # Decode only the newly generated tokens, not the prompt
    new_tokens = output[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)

# Example usage
print(chat("Hello!"))
print(chat("Who invented the first telephone?"))
print(chat("Who are you?"))
```

### Generation Tips

- `temperature=0.45`: balanced creativity and coherence (recommended)
- `temperature=0.2`: more focused and deterministic answers
- `temperature=0.8`: more creative but less reliable
- `repetition_penalty=1.2`: keeps responses from looping (recommended)
- `max_new_tokens=100`: increase for longer responses

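The tips above can be made concrete with a small, model-free sketch. It uses plain Python and toy scores (not the model's real logits) to show how `temperature` reshapes the next-token distribution and which candidates survive `top_p` (nucleus) filtering:

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature < 1 sharpens the distribution; > 1 flattens it."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)
    exps = [math.exp(l - m) for l in scaled]
    s = sum(exps)
    return [e / s for e in exps]

def top_p_keep(probs, top_p=0.9):
    """Indices of the smallest set of tokens whose cumulative prob >= top_p."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cum = [], 0.0
    for i in order:
        kept.append(i)
        cum += probs[i]
        if cum >= top_p:
            break
    return kept

logits = [2.0, 1.0, 0.5, -1.0]            # toy next-token scores
sharp = softmax(logits, temperature=0.2)  # low T: near-deterministic
soft = softmax(logits, temperature=0.8)   # higher T: flatter, more varied
print(max(sharp), max(soft))              # the low-T top probability is larger
print(top_p_keep(soft, top_p=0.9))        # candidates kept by nucleus filtering
```

At low temperature almost all probability mass lands on the top token, which is why low-temperature answers feel deterministic; `top_p` then discards the long tail of unlikely tokens regardless of temperature.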
## Limitations

- **50M parameters is small**: factual recall is imperfect and some answers may be incorrect. Always verify factual claims from this model.
- **Coverage of topics** is limited compared to large-scale models.
- **Not suitable for** factual research, medical/legal/financial advice, or any high-stakes decision making.
- **Context window**: limited to 1024 tokens total (prompt + response).
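Because the 1024-token window must hold both the prompt and the response, long prompts can silently crowd out the reply. A minimal budgeting sketch (plain Python; the word list is a stand-in for the model's real BPE token ids):

```python
CONTEXT_WINDOW = 1024  # total budget: prompt + generated tokens

def fits_in_context(prompt_tokens, max_new_tokens=100, window=CONTEXT_WINDOW):
    """True if the prompt leaves room for max_new_tokens of response."""
    return len(prompt_tokens) + max_new_tokens <= window

def truncate_prompt(prompt_tokens, max_new_tokens=100, window=CONTEXT_WINDOW):
    """Keep only the most recent tokens so the response still fits."""
    budget = window - max_new_tokens
    return prompt_tokens[-budget:]

# Stand-in tokens; in practice use len(tokenizer(prompt)["input_ids"])
tokens = ["tok"] * 1500
assert not fits_in_context(tokens)
trimmed = truncate_prompt(tokens)
print(len(trimmed))  # 924: leaves room for a 100-token reply
```

Keeping the most recent tokens preserves the latest conversational turns, which usually matter most for a chat model.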

## Intended Use

- Learning and experimentation with small language models
- Lightweight conversational agent for low-stakes applications
- Fine-tuning base for domain-specific chat applications

## License

MIT: free to use, modify, and distribute with attribution.
|