---
language:
- en
license: mit
tags:
- text-generation
- causal-lm
- gpt2
- chat
- conversational
pipeline_tag: text-generation
datasets:
- LucidexAi/VIBE-2K
- HuggingFaceTB/instruct-data-basics-smollm-H4
- MuskumPillerum/General-Knowledge
library_name: transformers
---

# FuadeAI-50M

A 50 million parameter causal language model trained for conversational chat, built on a GPT-2 architecture with a custom tokenizer.

## Model Details

| Property | Value |
|---|---|
| Parameters | 51.5M |
| Architecture | GPT-2 (custom config) |
| Hidden size | 512 |
| Layers | 8 |
| Attention heads | 8 |
| Context length | 1024 tokens |
| Tokenizer | GPT-2 + custom special tokens |
| Training precision | FP16 |

## Special Tokens

| Token | Purpose |
|---|---|
| `<\|startoftext\|>` | Beginning of conversation |
| `` / `` | Wraps user message |
| `` / `` | Wraps assistant response |
| `<\|endoftext\|>` | End of conversation |

## Training Data

- [LucidexAi/VIBE-2K](https://huggingface.co/datasets/LucidexAi/VIBE-2K)
- [HuggingFaceTB/instruct-data-basics-smollm-H4](https://huggingface.co/datasets/HuggingFaceTB/instruct-data-basics-smollm-H4)
- [MuskumPillerum/General-Knowledge](https://huggingface.co/datasets/MuskumPillerum/General-Knowledge) (4k random rows)
- Custom synthetic dataset for identity and conversational grounding

## How To Use

### Transformers

```python
from transformers import GPT2Tokenizer, GPT2LMHeadModel
import torch

# Load model and tokenizer
tokenizer = GPT2Tokenizer.from_pretrained("Fu01978/FuadeAI-50M")
model = GPT2LMHeadModel.from_pretrained("Fu01978/FuadeAI-50M")
model.eval()

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = model.to(device)

# Chat function
def chat(prompt, temperature=0.4, top_p=0.9, max_new_tokens=100):
    formatted = (
        f"{tokenizer.bos_token}"
        f"{prompt}"
        f""
    )
    inputs = tokenizer(formatted, return_tensors="pt").to(device)

    with torch.no_grad():
        output = model.generate(
            **inputs,
            max_new_tokens=max_new_tokens,
            do_sample=True,
            temperature=temperature,
            top_p=top_p,
            repetition_penalty=1.2,
            no_repeat_ngram_size=3,
            eos_token_id=tokenizer.eos_token_id,
            pad_token_id=tokenizer.pad_token_id,
        )

    # Keep only the newly generated tokens, dropping the prompt
    generated = output[0][inputs["input_ids"].shape[-1]:]
    return tokenizer.decode(generated, skip_special_tokens=True).strip()

# Example usage
print(chat("Hello!"))
print(chat("Who invented the first telephone?"))
print(chat("Who are you?"))
```

### Generation Tips

- `temperature=0.45`: balanced creativity and coherence (recommended)
- `temperature=0.2`: more focused and deterministic answers
- `temperature=0.8`: more creative but less reliable
- `repetition_penalty=1.2`: keeps responses from looping (recommended)
- `max_new_tokens=100`: increase for longer responses

## Limitations

- **50M parameters is small**: factual recall is imperfect and some answers may be incorrect. Always verify factual claims from this model.
- **Coverage of topics** is limited compared to large-scale models.
- **Not suitable for** factual research, medical/legal/financial advice, or any high-stakes decision making.
- **Context window**: limited to 1024 tokens total (prompt + response).

## Intended Use

- Learning and experimentation with small language models
- Lightweight conversational agent for low-stakes applications
- Fine-tuning base for domain-specific chat applications
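
As a sanity check on the Model Details table, the parameter count can be roughly reproduced from the listed hyperparameters. The sketch below is not taken from the model's training code: it assumes the standard GPT-2 layout (learned position embeddings, a 4x MLP expansion, and an output head tied to the token embeddings) and the stock GPT-2 vocabulary of 50,257 tokens. The function name `gpt2_param_count` is hypothetical, and the actual vocabulary is slightly larger because of the custom special tokens, each of which adds one 512-dimensional embedding row.

```python
def gpt2_param_count(vocab_size, n_positions, n_embd, n_layer):
    """Estimate parameters of a standard GPT-2 model with a tied LM head."""
    d = n_embd
    token_emb = vocab_size * d   # token embedding matrix (shared with the LM head)
    pos_emb = n_positions * d    # learned position embeddings
    # Attention: fused QKV projection plus output projection, each with a bias
    attn = (d * 3 * d + 3 * d) + (d * d + d)
    # MLP: expand to 4*d, then project back to d, each with a bias
    mlp = (d * 4 * d + 4 * d) + (4 * d * d + d)
    # Two LayerNorms per block, each with a scale and a shift vector
    layer_norms = 2 * (2 * d)
    per_block = attn + mlp + layer_norms
    final_ln = 2 * d             # final LayerNorm before the LM head
    return token_emb + pos_emb + n_layer * per_block + final_ln


# Assumption: stock GPT-2 vocabulary size, without the custom special tokens
total = gpt2_param_count(vocab_size=50257, n_positions=1024, n_embd=512, n_layer=8)
print(f"{total:,} parameters (~{total / 1e6:.1f}M)")
```

Under these assumptions the estimate is about 51.5M, in line with the table. The number of attention heads does not enter the count, because the heads only partition the 512-dimensional hidden state.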