## What is Mamba?
**Mamba** is a selective **State Space Model (SSM)** architecture designed for efficient sequence modeling with **linear-time** scaling in sequence length. It was introduced by Gu & Dao in *“Mamba: Linear-Time Sequence Modeling with Selective State Spaces”*.
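To make the "selective" and "linear-time" points concrete, here is a toy, pure-Python sketch of a selective-scan recurrence. All names and values here are illustrative (real Mamba uses multi-channel states, a learned discretization, and a fused hardware-aware scan), but the shape of the computation is the same: one state update per token, so cost grows linearly with sequence length, and the state-space parameters B and C depend on the input, which is what "selective" means.

```python
def toy_selective_scan(x, a, w_b, w_c):
    """Toy 1-channel selective SSM: h_t = a*h_{t-1} + B_t*x_t, y_t = C_t*h_t,
    where B_t and C_t are input-dependent (the 'selective' part)."""
    h, ys = 0.0, []
    for x_t in x:                # one step per token -> O(L) in sequence length
        b_t = w_b * x_t          # input-dependent B
        c_t = w_c * x_t          # input-dependent C
        h = a * h + b_t * x_t    # recurrent state update
        ys.append(c_t * h)
    return ys

y = toy_selective_scan([1.0, 2.0, 3.0], a=0.9, w_b=0.5, w_c=0.1)
```

Unlike attention, nothing here compares tokens pairwise, so there is no quadratic term in sequence length.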
---
## Training summary (this checkpoint)
- **Base model:** `state-spaces/mamba-130m-hf`
- **Training type:** Continued pretraining (CPT) / domain-adaptation pretraining for Turkish
- **Hardware:** Single GPU **NVIDIA GeForce RTX 4060 Laptop GPU**
- **Raw text used:** ~**400 MB** of Turkish text (after preprocessing)
### Install requirements (recommended)
The upstream model card recommends installing `transformers` from `main` (historically required until Mamba support landed in a stable release), plus the optimized CUDA-kernel packages for best performance: `causal-conv1d` and `mamba-ssm`.
```bash
pip install git+https://github.com/huggingface/transformers@main
pip install "causal-conv1d>=1.2.0"
pip install mamba-ssm
```
If `causal-conv1d` and/or `mamba-ssm` are not installed, Transformers will fall back to a slower pure-PyTorch implementation.
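A quick way to check whether both kernel packages are importable in your environment is shown below. This is a stdlib-only sketch; note the importable module names (`mamba_ssm`, `causal_conv1d`) use underscores, unlike the pip package names.

```python
import importlib.util

def has_cuda_kernels() -> bool:
    """True if both optimized-kernel packages are importable.
    When either is missing, Transformers uses the slower fallback path."""
    return all(
        importlib.util.find_spec(mod) is not None
        for mod in ("mamba_ssm", "causal_conv1d")
    )

print(has_cuda_kernels())
```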
## Usage (generation)
Below is the standard `transformers` generation workflow from the upstream model card.
```python
import torch
from transformers import AutoTokenizer, MambaForCausalLM

# Replace with this checkpoint's repo id; the base model id is shown as a placeholder.
model_id = "state-spaces/mamba-130m-hf"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = MambaForCausalLM.from_pretrained(model_id)

# Example Turkish prompt
input_ids = tokenizer("Merhaba, nasılsın?", return_tensors="pt")["input_ids"]
out = model.generate(input_ids, max_new_tokens=40)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```
### Tips
* For fastest inference on NVIDIA GPUs, ensure **CUDA kernels** are enabled by installing `mamba-ssm` + `causal-conv1d`.
* If you run into build issues for these packages, double-check:
  * Your PyTorch CUDA build matches your driver/runtime
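When debugging kernel-build failures, a stdlib-only sketch like the following surfaces the usual suspects: the Python version, the OS, and whether the CUDA compiler `nvcc` (needed to build `causal-conv1d` and `mamba-ssm` from source) is on `PATH`.

```python
import platform
import shutil
import sys

# Report the local pieces that commonly cause kernel-build failures.
print("python :", sys.version.split()[0])
print("os     :", platform.platform())
print("nvcc   :", shutil.which("nvcc") or "not on PATH")  # CUDA compiler needed to build the kernels
```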
## Fine-tuning (PEFT / LoRA)
The upstream model card includes a PEFT fine-tuning example and, in that example, recommends keeping the model in **float32** during fine-tuning.
High-level LoRA recipe:
* Keep the learning rate conservative for CPT-adapted models if your dataset is small
* Target Mamba projection modules similarly to upstream suggestions (e.g., `x_proj`, `in_proj`, `out_proj`, embeddings)
* Validate perplexity on a held-out Turkish set
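As a concrete starting point for the recipe above, here are hypothetical hyperparameters (a sketch, not tuned values) in the form of keyword arguments you would pass to `peft.LoraConfig`:

```python
# Hypothetical starting values for the LoRA recipe above -- not tuned.
lora_kwargs = dict(
    r=8,                    # adapter rank: small, conservative capacity
    lora_alpha=16,          # scaling factor; alpha/r = 2 is a common default
    lora_dropout=0.05,
    target_modules=["x_proj", "in_proj", "out_proj", "embeddings"],  # per upstream suggestion
    task_type="CAUSAL_LM",
)
```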
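The held-out perplexity check boils down to exponentiating the mean per-token negative log-likelihood. A minimal helper, assuming you already have per-token NLLs from your evaluation loop:

```python
import math

def perplexity(token_nlls):
    """Perplexity = exp(mean negative log-likelihood per token)."""
    return math.exp(sum(token_nlls) / len(token_nlls))

# Sanity check: a model assigning every token probability 1/2 has perplexity ~2.
print(perplexity([math.log(2)] * 4))
```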
## Acknowledgements (upstream credit)
This model is a **continued-pretrained derivative** of **`state-spaces/mamba-130m-hf`**. The installation and usage instructions above are based on the upstream Hugging Face model card for Transformers-compatible Mamba.
Mamba architecture reference:
* Albert Gu, Tri Dao. *Mamba: Linear-Time Sequence Modeling with Selective State Spaces*. arXiv:2312.00752, 2023.
---
If you use this model in academic work, please cite the Mamba paper.
Also consider citing the upstream HF checkpoint:
* `state-spaces/mamba-130m-hf`
---