Transformer Baseline (GPT-2 Style) - Turkish

This model serves as the baseline for a comparative study between Transformer and Mamba architectures on agglutinative languages (specifically Turkish). It is a decoder-only Transformer model (~111M parameters) trained on the Turkish Wikipedia dataset.

Model Description

  • Architecture: GPT-2 (Small)
  • Parameters: ~111 Million (see the parameter-count sketch after this list)
  • Context Length: 1024 tokens
  • Training Data: Turkish Wikipedia (Nov 2023)
  • Purpose: To provide a performance benchmark for the Mamba architecture.
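
The ~111M figure is consistent with GPT-2 small dimensions paired with a smaller custom vocabulary. The sketch below shows one way to verify this with `GPT2Config`; the vocabulary size of 32,000 is an assumption for illustration (check the tokenizer repo for the actual value), not a confirmed detail of this model.

from transformers import GPT2Config, GPT2LMHeadModel

# Hypothetical config: GPT-2 small dimensions with an assumed ~32k-token
# Turkish vocabulary. The real vocab size may differ.
config = GPT2Config(
    vocab_size=32_000,
    n_positions=1024,
    n_embd=768,
    n_layer=12,
    n_head=12,
)
model = GPT2LMHeadModel(config)

# With these dimensions this prints ~110.4M, close to the reported ~111M.
print(sum(p.numel() for p in model.parameters()))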

Usage

from transformers import GPT2LMHeadModel, PreTrainedTokenizerFast

model_id = "oguzatas/mamba-tr-project-transformer"
tokenizer_id = "oguzatas/mamba-tr-project-tokenizer"

# Load the custom Turkish tokenizer and the trained model from the Hub
tokenizer = PreTrainedTokenizerFast.from_pretrained(tokenizer_id)
model = GPT2LMHeadModel.from_pretrained(model_id)

# Tokenize a Turkish prompt ("Türkiye'nin başkenti" = "The capital of Turkey")
text = "Türkiye'nin başkenti"
inputs = tokenizer(text, return_tensors="pt")

# Greedily generate up to 20 new tokens
outputs = model.generate(**inputs, max_new_tokens=20)

print(tokenizer.decode(outputs[0], skip_special_tokens=True))
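
Since this checkpoint serves as a benchmark baseline, perplexity on held-out text is the natural metric for comparison against the Mamba counterpart. Below is a minimal sketch of such an evaluation, assuming the `model` and `tokenizer` loaded above; the sample sentence is illustrative, and this is not the project's official evaluation script.

import torch

# Any held-out Turkish text works; this sentence is just an example.
eval_text = "Ankara, Türkiye'nin başkentidir."
inputs = tokenizer(eval_text, return_tensors="pt")

model.eval()
with torch.no_grad():
    # GPT2LMHeadModel shifts labels internally, so labels = input_ids
    loss = model(**inputs, labels=inputs["input_ids"]).loss

print(f"Perplexity: {torch.exp(loss).item():.2f}")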