---
language:
  - en
license: mit
tags:
  - text-generation
  - causal-lm
  - gpt2
  - chat
  - conversational
pipeline_tag: text-generation
datasets:
  - LucidexAi/VIBE-2K
  - HuggingFaceTB/instruct-data-basics-smollm-H4
  - MuskumPillerum/General-Knowledge
library_name: transformers
---

FuadeAI-50M

A 50M-parameter (51.5M) causal language model trained for conversational chat. It uses the GPT-2 architecture with a custom configuration, and the GPT-2 tokenizer extended with custom special tokens.

Model Details

| Property | Value |
|---|---|
| Parameters | 51.5M |
| Architecture | GPT-2 (custom config) |
| Hidden size | 512 |
| Layers | 8 |
| Attention heads | 8 |
| Context length | 1024 tokens |
| Tokenizer | GPT-2 + custom special tokens |
| Training precision | FP16 |
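
As a sanity check, the 51.5M figure can be reproduced from the table above. This sketch makes several assumptions. It treats each layer as a standard GPT-2 block, with an MLP inner size of 4 × hidden size and biases on every linear layer. It assumes tied input and output embeddings and learned position embeddings. It also assumes a vocabulary of 50,262 tokens: GPT-2's 50,257 plus the five new special tokens in the Special Tokens table. The exact vocabulary size of the released tokenizer is not stated on this card.

```python
def gpt2_param_count(vocab_size: int, n_positions: int, n_embd: int, n_layer: int) -> int:
    """Parameter count of a standard GPT-2 LM with tied input/output embeddings."""
    n_inner = 4 * n_embd  # assumed default MLP width
    layer_norm = 2 * n_embd  # weight + bias
    attn_qkv = n_embd * 3 * n_embd + 3 * n_embd  # fused Q/K/V projection
    attn_out = n_embd * n_embd + n_embd
    mlp_in = n_embd * n_inner + n_inner
    mlp_out = n_inner * n_embd + n_embd
    per_block = 2 * layer_norm + attn_qkv + attn_out + mlp_in + mlp_out

    token_emb = vocab_size * n_embd  # shared with the LM head (tied weights)
    pos_emb = n_positions * n_embd
    final_ln = layer_norm
    return token_emb + pos_emb + n_layer * per_block + final_ln


# Values from the Model Details table; vocab_size = 50257 + 5 is an assumption
total = gpt2_param_count(vocab_size=50257 + 5, n_positions=1024, n_embd=512, n_layer=8)
print(f"{total:,} parameters (~{total / 1e6:.1f}M)")
# prints: 51,478,528 parameters (~51.5M)
```

The number of attention heads (8) does not enter the count. The heads split the same 512-wide projections rather than adding weights.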

Special Tokens

| Token | Purpose |
|---|---|
| `<\|startoftext\|>` | Beginning of conversation |
| `<user>` / `</user>` | Wraps user message |
| `<assistant>` / `</assistant>` | Wraps assistant response |
| `<\|endoftext\|>` | End of conversation |
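
The table implies a plain string layout for conversations. The sketch below assembles that layout for inference prompts and for complete samples, such as fine-tuning data. Several details are assumptions inferred from the token descriptions:

- `format_conversation` is a hypothetical helper, not part of the model repository.
- The multi-turn layout places turns back to back between `<|startoftext|>` and `<|endoftext|>`.
- The tokenizer's `bos_token` is taken to be `<|startoftext|>`.

The single-turn prompt should match what the `chat()` function under How To Use builds.

```python
from typing import List, Optional, Tuple

# Special tokens from the table above (literal strings)
BOS = "<|startoftext|>"
EOS = "<|endoftext|>"


def format_conversation(
    history: List[Tuple[str, str]],
    next_user_message: Optional[str] = None,
) -> str:
    """Hypothetical helper: render (user, assistant) turns with the chat special tokens.

    With next_user_message set, the string ends in an open <assistant> tag for
    generation; otherwise it is a complete sample terminated by EOS.
    """
    parts = [BOS]
    for user, assistant in history:
        parts.append(f"<user>{user}</user><assistant>{assistant}</assistant>")
    if next_user_message is not None:
        parts.append(f"<user>{next_user_message}</user><assistant>")
    else:
        parts.append(EOS)
    return "".join(parts)


print(format_conversation([], "Hello!"))
# prints: <|startoftext|><user>Hello!</user><assistant>
```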

Training Data

- LucidexAi/VIBE-2K
- HuggingFaceTB/instruct-data-basics-smollm-H4
- MuskumPillerum/General-Knowledge (4k random rows)
- Custom synthetic dataset for identity and conversational grounding

How To Use

Transformers

```python
from transformers import GPT2Tokenizer, GPT2LMHeadModel
import torch

# Load model and tokenizer
tokenizer = GPT2Tokenizer.from_pretrained("Fu01978/FuadeAI-50M")
model = GPT2LMHeadModel.from_pretrained("Fu01978/FuadeAI-50M")
model.eval()

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = model.to(device)

# The stock GPT-2 tokenizer has no pad token; fall back to EOS if this one does not define one either
pad_token_id = tokenizer.pad_token_id if tokenizer.pad_token_id is not None else tokenizer.eos_token_id

# Chat function: wraps the prompt in the chat special tokens and samples a reply
def chat(prompt, temperature=0.4, top_p=0.9, max_new_tokens=100):
    formatted = (
        f"{tokenizer.bos_token}"
        f"<user>{prompt}</user>"
        f"<assistant>"
    )
    inputs = tokenizer(formatted, return_tensors="pt").to(device)

    with torch.no_grad():
        output = model.generate(
            **inputs,
            max_new_tokens=max_new_tokens,
            do_sample=True,
            temperature=temperature,
            top_p=top_p,
            repetition_penalty=1.2,
            no_repeat_ngram_size=3,
            eos_token_id=tokenizer.eos_token_id,
            pad_token_id=pad_token_id,
        )

    # Keep only the newly generated tokens (drop the prompt)
    generated = output[0][inputs["input_ids"].shape[-1]:]
    return tokenizer.decode(generated, skip_special_tokens=True).strip()

# Example usage
print(chat("Hello!"))
print(chat("Who invented the first telephone?"))
print(chat("Who are you?"))
```

Generation Tips

- `temperature=0.45`: balanced creativity and coherence (recommended)
- `temperature=0.2`: more focused and deterministic answers
- `temperature=0.8`: more creative but less reliable
- `repetition_penalty=1.2`: keeps responses from looping (recommended)
- `max_new_tokens=100`: increase for longer responses
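
The sketch below works through the standard math behind these settings on a made-up list of logits. It covers three steps: a CTRL-style repetition penalty, temperature scaling before the softmax, and nucleus (top-p) filtering. It is an illustration of the techniques, not the transformers implementation. The order mirrors `generate()`, which applies the repetition penalty before temperature and top-p.

```python
import math
from typing import Iterable, List


def apply_repetition_penalty(logits: List[float], seen_ids: Iterable[int], penalty: float = 1.2) -> List[float]:
    """CTRL-style penalty: make already-generated tokens less likely."""
    out = list(logits)
    for i in set(seen_ids):
        # Dividing a positive logit or multiplying a negative one both lower it
        out[i] = out[i] / penalty if out[i] > 0 else out[i] * penalty
    return out


def softmax_with_temperature(logits: List[float], temperature: float) -> List[float]:
    """T < 1 sharpens the distribution (more deterministic); T > 1 flattens it."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def top_p_filter(probs: List[float], top_p: float) -> List[float]:
    """Nucleus filtering: keep the smallest set of most likely tokens whose mass reaches top_p."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, mass = set(), 0.0
    for i in order:
        kept.add(i)
        mass += probs[i]
        if mass >= top_p:
            break
    # Renormalize the surviving tokens so they sum to 1
    return [probs[i] / mass if i in kept else 0.0 for i in range(len(probs))]


toy_logits = [2.0, 1.0, 0.5, -1.0]  # made-up scores for a 4-token vocabulary
penalized = apply_repetition_penalty(toy_logits, seen_ids=[0])  # token 0 was already generated
for t in (0.2, 0.45, 0.8):
    probs = top_p_filter(softmax_with_temperature(penalized, t), top_p=0.9)
    print(t, [round(p, 3) for p in probs])
```

At `temperature=0.2` almost all probability lands on the top token. At 0.8 several tokens survive the top-p cut, which is why higher temperatures read as more creative but less reliable.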

Limitations

- At about 50M parameters the model is small, so factual recall is imperfect and some answers may be incorrect. Always verify factual claims from this model.
- Topic coverage is limited compared to large-scale models.
- Not suitable for factual research, medical/legal/financial advice, or any high-stakes decision-making.
- The context window is limited to 1024 tokens in total (prompt plus response).
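
The prompt and response share the 1024-token window, so the room left for `max_new_tokens` shrinks as the prompt grows. Below is a minimal guard, assuming you already know the prompt length in tokens (for example, `inputs["input_ids"].shape[-1]` in the `chat()` function). `clamp_new_tokens` is a hypothetical helper, not part of transformers.

```python
CONTEXT_LENGTH = 1024  # prompt + response budget, from the Model Details table


def clamp_new_tokens(prompt_len: int, max_new_tokens: int, context_length: int = CONTEXT_LENGTH) -> int:
    """Return how many tokens can still be generated without overflowing the context window."""
    remaining = context_length - prompt_len
    if remaining <= 0:
        raise ValueError(
            f"Prompt uses {prompt_len} tokens; the context window holds only {context_length}."
        )
    return min(max_new_tokens, remaining)


print(clamp_new_tokens(prompt_len=40, max_new_tokens=100))  # prints 100
print(clamp_new_tokens(prompt_len=1000, max_new_tokens=100))  # prints 24
```

For multi-turn chats, a common alternative to raising is to drop the oldest turns until the prompt fits.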

Intended Use

- Learning and experimentation with small language models
- Lightweight conversational agent for low-stakes applications
- Fine-tuning base for domain-specific chat applications