⚠️ Notice (Latest Version)
(04.03.2026 11AM (UTC+3))
Details
Considering the model scale (370M) and the base checkpoint’s overall capability, mamba-370m-hf-turkish has been continued-pretrained with a decent level of success. Turkish fluency is generally acceptable; however, the model can still struggle with semantic coherence, context tracking, and meaningful reasoning in longer or more demanding prompts.
For practical use, we strongly recommend treating this checkpoint as a fine-tuning base (LoRA/QLoRA or supervised fine-tuning, depending on your setup) rather than a fully “ready-to-use” chat model.
Please note: I will not continue training or re-train this 370M checkpoint again. Future improvements (if any) will come from downstream fine-tuning by users rather than additional continued pre-training from my side.
Turkish Continued Pretraining of mamba-370m-hf
This repository provides a Turkish continued-pretrained variant of state-spaces/mamba-370m-hf (Transformers-compatible Mamba ~370M). The goal is to improve Turkish fluency and local domain robustness while preserving the original architecture and Hugging Face Transformers usage experience.
Developed by the LinguAI Team, affiliated with KTUN and the YAZGİT community.
What is Mamba?
Mamba is a selective State Space Model (SSM) architecture designed for efficient sequence modeling with linear-time scaling in sequence length. It was introduced by Gu & Dao in “Mamba: Linear-Time Sequence Modeling with Selective State Spaces”.
Training summary (this checkpoint)
- Base model: state-spaces/mamba-370m-hf
- Training type: Continued pretraining (CPT) / domain-adaptation pretraining for Turkish
- Hardware: Single GPU NVIDIA GeForce RTX 4060 Laptop GPU
- Raw text used: ~400 MB Turkish text (after preprocessing)
- Approximate token estimate: ~80M–120M tokens (rule of thumb: roughly 3–5 bytes/characters per token, depending on the tokenizer and text composition)
Notes on the token estimate: Without the exact tokenizer statistics (total input_ids count) and exact encoding composition (UTF-8, whitespace/punctuation density), the most honest representation is a range. If you want an exact number, compute it by summing tokenized lengths across your dataset shards and replace this line.
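As a sketch of that computation (the tokenizer stub usage and file path below are illustrative placeholders, not part of this repo):

```python
def count_tokens(tokenizer, texts):
    """Sum tokenized lengths over an iterable of raw-text strings."""
    total = 0
    for text in texts:
        total += len(tokenizer(text)["input_ids"])
    return total

# With the real tokenizer (commented out here to avoid a model download):
# from transformers import AutoTokenizer
# tok = AutoTokenizer.from_pretrained("serda-dev/mamba-370m-hf-turkish")
# with open("corpus_shard.txt", encoding="utf-8") as f:
#     print(count_tokens(tok, f))
```

Run this once per shard and sum the results to get the exact token count.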
Intended use
- Turkish text generation (base LM behavior; not instruction-tuned)
- Turkish domain adaptation for downstream fine-tuning (LoRA / full fine-tune)
- Experimentation with SSM-based backbones in transformers
Not intended for:
- Safety-critical decisions
- Legal/medical advice
- “Chat assistant” behavior out of the box (this is a base causal LM; you’ll need instruction tuning + safety alignment for assistant-like use)
Quickstart
Install requirements (recommended)
The upstream publisher notes you should install transformers from main (historically required until transformers 4.39.0, which added Mamba support), and recommends installing the CUDA-kernel dependencies causal-conv1d and mamba-ssm for best performance.
pip install git+https://github.com/huggingface/transformers@main
pip install "causal-conv1d>=1.2.0"
pip install mamba-ssm
If either causal-conv1d or mamba-ssm is missing, the "eager" implementation will be used; otherwise optimized CUDA kernels can be used when available.
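A quick way to check which path your environment will take (this helper is an illustration, not part of the upstream API):

```python
import importlib.util

def has_fast_mamba_kernels():
    """True only if both optional CUDA-kernel packages are importable."""
    return all(
        importlib.util.find_spec(name) is not None
        for name in ("mamba_ssm", "causal_conv1d")
    )

print("optimized kernels available:", has_fast_mamba_kernels())
```

If this prints False, the model still works, but inference falls back to the slower "eager" implementation.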
Usage (generation)
Below is the standard transformers generate workflow used by the upstream model card, adapted to this Turkish CPT checkpoint.
import torch
from transformers import AutoTokenizer, MambaForCausalLM
MODEL_ID = "serda-dev/mamba-370m-hf-turkish"
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = MambaForCausalLM.from_pretrained(MODEL_ID)
prompt = "Türkiye'de yazılım mühendisi olmak hakkında kısa bir paragraf yaz:"
inputs = tokenizer(prompt, return_tensors="pt")
with torch.no_grad():
    out = model.generate(
        **inputs,
        max_new_tokens=120,
        do_sample=True,
        temperature=0.9,
        top_p=0.95,
    )
print(tokenizer.decode(out[0], skip_special_tokens=True))
Tips
For fastest inference on NVIDIA GPUs, ensure CUDA kernels are enabled by installing mamba-ssm and causal-conv1d. If you run into build issues for these packages, double-check:
- Your PyTorch CUDA build matches your driver/runtime
- You have a compiler toolchain (e.g., build-essential) on Linux
- You’re using a compatible Python version
Fine-tuning (PEFT / LoRA)
The upstream model card includes a PEFT fine-tuning example and, in that example, recommends keeping the model in float32 during fine-tuning.
High-level LoRA recipe:
- Keep LR conservative for CPT-adapted models if your dataset is small
- Target Mamba modules similarly to upstream suggestions (e.g., x_proj, embeddings, in_proj, out_proj)
- Validate perplexity on a held-out Turkish set
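A minimal sketch of that recipe with peft, assuming the upstream target-module names apply to this checkpoint as well (hyperparameters are illustrative, not tuned):

```python
# Target modules follow the upstream suggestion; adjust for your setup.
TARGET_MODULES = ["x_proj", "embeddings", "in_proj", "out_proj"]

try:
    from peft import LoraConfig, get_peft_model

    lora_config = LoraConfig(
        r=8,                    # low rank keeps the adapter small
        lora_alpha=16,
        target_modules=TARGET_MODULES,
        task_type="CAUSAL_LM",
    )
    # Wrapping the model (commented out to avoid a model download here):
    # from transformers import MambaForCausalLM
    # model = MambaForCausalLM.from_pretrained("serda-dev/mamba-370m-hf-turkish")
    # model = get_peft_model(model, lora_config)
    # ...then train with the Trainer of your choice.
except ImportError:
    # peft is optional here; install it with: pip install peft
    pass
```

Keep the base model in float32 during fine-tuning, per the upstream example.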
Evaluation (what to check)
For a CPT’d base LM, common quick checks:
- Perplexity on a held-out Turkish slice
- Qualitative prompts: news style, conversational Turkish, formal writing, domain slang
- Degeneration: repetition loops, short cyclic outputs, odd token fragments
- Catastrophic forgetting: basic English capability (if you care)
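Perplexity here is just the exponential of the mean per-token negative log-likelihood; a minimal sketch (the transformers lines are commented out since they require the model download):

```python
import math

def corpus_perplexity(nll_per_token):
    """exp(mean negative log-likelihood per token); lower is better."""
    return math.exp(sum(nll_per_token) / len(nll_per_token))

# With transformers, the causal-LM loss is already the mean per-token NLL:
# enc = tokenizer(held_out_text, return_tensors="pt")
# loss = model(**enc, labels=enc["input_ids"]).loss
# ppl = math.exp(loss.item())
```

Compare perplexity before and after your fine-tuning run on the same held-out Turkish slice to measure the gain.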
Limitations & safety
This is a base language model (not instruction-tuned). It may:
- Hallucinate facts
- Produce biased or unsafe text
- Reflect issues present in training data
Use standard filtering and safety layers for deployments.
Acknowledgements (upstream credit)
This model is a continued-pretrained derivative of state-spaces/mamba-370m-hf. The installation and usage instructions above are based on the upstream Hugging Face model card for the Transformers-compatible Mamba checkpoints.
Mamba architecture reference:
- Albert Gu, Tri Dao. Mamba: Linear-Time Sequence Modeling with Selective State Spaces.
Citation
If you use this model in academic work, please cite the Mamba paper:
@article{gu2023mamba,
  title={Mamba: Linear-Time Sequence Modeling with Selective State Spaces},
  author={Gu, Albert and Dao, Tri},
  journal={arXiv preprint arXiv:2312.00752},
  year={2023}
}
Also consider citing the upstream HF checkpoint:
state-spaces/mamba-370m-hf
Team & Affiliations
This model was developed by the LinguAI Team (affiliated with Konya Technical University (KTUN) and operating under the YAZGİT community).
LinguAI Team Members (Core Contributors):
- Ahmet Furkan Kalle
- Alican Tanyeri
- Baris Icoz
- Behlul Bera Anik
- Murat Serda Çelik