Transformer Baseline (GPT-2 Style) - Turkish
This model serves as the baseline for a comparative study between Transformer and Mamba architectures on agglutinative languages (specifically Turkish). It is a decoder-only Transformer model (~111M parameters) trained on the Turkish Wikipedia dataset.
Model Description
- Architecture: GPT-2 (Small); see the config sketch after this list
- Parameters: ~111 Million
- Context Length: 1024 tokens
- Training Data: Turkish Wikipedia (Nov 2023)
- Purpose: To provide a performance benchmark for the Mamba architecture.
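The card does not ship a configuration excerpt, but the figures above pin down a GPT-2 Small layout. The sketch below is a hypothetical reconstruction: hidden size, depth, and head count are the standard GPT-2 Small values, while the vocabulary size is an assumption (a roughly 32k-token Turkish tokenizer is what brings the total near the stated ~111M parameters, versus ~124M with GPT-2's stock 50k vocabulary).

```python
from transformers import GPT2Config, GPT2LMHeadModel

# Hypothetical reconstruction of the baseline's configuration.
# vocab_size is an assumption (not stated in the card); check the published
# tokenizer with len(tokenizer) before relying on it.
config = GPT2Config(
    vocab_size=32_000,   # assumed; ~32k puts the total near ~111M parameters
    n_positions=1024,    # context length from the card
    n_embd=768,          # GPT-2 Small hidden size
    n_layer=12,          # GPT-2 Small depth
    n_head=12,           # GPT-2 Small attention heads
)
model = GPT2LMHeadModel(config)
print(f"{model.num_parameters() / 1e6:.1f}M parameters")
```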
Usage
```python
from transformers import GPT2LMHeadModel, PreTrainedTokenizerFast

model_id = "oguzatas/mamba-tr-project-transformer"
tokenizer_id = "oguzatas/mamba-tr-project-tokenizer"

# The tokenizer is published separately from the model weights.
tokenizer = PreTrainedTokenizerFast.from_pretrained(tokenizer_id)
model = GPT2LMHeadModel.from_pretrained(model_id)

# Greedy generation of 20 new tokens from a Turkish prompt.
text = "Türkiye'nin başkenti"
inputs = tokenizer(text, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0]))
```
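Since the stated purpose of this checkpoint is to serve as a benchmark against a Mamba model, a natural first comparison is held-out perplexity. The snippet below is a minimal sketch that reuses the `model` and `tokenizer` loaded above; the evaluation text and protocol are illustrative and not taken from the study.

```python
import torch

# Illustrative held-out sentence; substitute your own evaluation data.
eval_text = "Ankara, Türkiye'nin başkenti ve en kalabalık ikinci şehridir."
enc = tokenizer(eval_text, return_tensors="pt")

model.eval()
with torch.no_grad():
    # Passing labels makes the model return the mean cross-entropy over tokens.
    loss = model(**enc, labels=enc["input_ids"]).loss

print(f"Perplexity: {torch.exp(loss).item():.2f}")
```

Running the same loop on the corresponding Mamba checkpoint over identical text gives a like-for-like comparison between the two architectures.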