## What is Mamba?
**Mamba** is a selective **State Space Model (SSM)** architecture designed for efficient sequence modeling with **linear-time** scaling in sequence length. It was introduced by Gu & Dao in *“Mamba: Linear-Time Sequence Modeling with Selective State Spaces”*.
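To make the "selective" and "linear-time" points concrete, here is a toy, pure-Python sketch of a selective-scan recurrence. All names and values here are illustrative (real Mamba uses multi-channel states, a learned discretization, and a fused hardware-aware scan), but the shape of the computation is the same: one state update per token, so cost grows linearly with sequence length, and the state-space parameters B and C depend on the input, which is what "selective" means.

```python
def toy_selective_scan(x, a, w_b, w_c):
    """Toy 1-channel selective SSM: h_t = a*h_{t-1} + B_t*x_t, y_t = C_t*h_t,
    where B_t and C_t are input-dependent (the 'selective' part)."""
    h, ys = 0.0, []
    for x_t in x:                # one step per token -> O(L) in sequence length
        b_t = w_b * x_t          # input-dependent B
        c_t = w_c * x_t          # input-dependent C
        h = a * h + b_t * x_t    # recurrent state update
        ys.append(c_t * h)
    return ys

y = toy_selective_scan([1.0, 2.0, 3.0], a=0.9, w_b=0.5, w_c=0.1)
```

Unlike attention, nothing here compares tokens pairwise, so there is no quadratic term in sequence length.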
---
## Training summary (this checkpoint)
- **Base model:** `state-spaces/mamba-130m-hf`
- **Training type:** Continued pretraining (CPT) / domain-adaptation pretraining for Turkish
- **Hardware:** Single GPU **NVIDIA GeForce RTX 4060 Laptop GPU**
- **Raw text used:** ~**400 MB** of Turkish text (after preprocessing)
### Install requirements (recommended)
The upstream model card recommends installing `transformers` from `main` (historically required until Mamba support landed in a stable release), plus the optimized CUDA-kernel packages for best performance: `causal-conv1d` and `mamba-ssm`.
```bash
pip install git+https://github.com/huggingface/transformers@main
pip install "causal-conv1d>=1.2.0"
pip install mamba-ssm
```
If `causal-conv1d` and/or `mamba-ssm` are not installed, Transformers will fall back to a slower pure-PyTorch implementation.
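A quick way to check whether both kernel packages are importable in your environment is shown below. This is a stdlib-only sketch; note the importable module names (`mamba_ssm`, `causal_conv1d`) use underscores, unlike the pip package names.

```python
import importlib.util

def has_cuda_kernels() -> bool:
    """True if both optimized-kernel packages are importable.
    When either is missing, Transformers uses the slower fallback path."""
    return all(
        importlib.util.find_spec(mod) is not None
        for mod in ("mamba_ssm", "causal_conv1d")
    )

print(has_cuda_kernels())
```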
## Usage (generation)
Below is the standard `transformers` generation workflow from the upstream model card.
```python
import torch
from transformers import AutoTokenizer, MambaForCausalLM

# Replace with this checkpoint's repo id; the base model id is shown as a placeholder.
model_id = "state-spaces/mamba-130m-hf"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = MambaForCausalLM.from_pretrained(model_id)

# Example Turkish prompt
input_ids = tokenizer("Merhaba, nasılsın?", return_tensors="pt")["input_ids"]
out = model.generate(input_ids, max_new_tokens=40)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```
### Tips
* For fastest inference on NVIDIA GPUs, ensure **CUDA kernels** are enabled by installing `mamba-ssm` + `causal-conv1d`.
* If you run into build issues for these packages, double-check:
  * Your PyTorch CUDA build matches your driver/runtime
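When debugging kernel-build failures, a stdlib-only sketch like the following surfaces the usual suspects: the Python version, the OS, and whether the CUDA compiler `nvcc` (needed to build `causal-conv1d` and `mamba-ssm` from source) is on `PATH`.

```python
import platform
import shutil
import sys

# Report the local pieces that commonly cause kernel-build failures.
print("python :", sys.version.split()[0])
print("os     :", platform.platform())
print("nvcc   :", shutil.which("nvcc") or "not on PATH")  # CUDA compiler needed to build the kernels
```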
## Fine-tuning (PEFT / LoRA)
The upstream model card includes a PEFT fine-tuning example and, in that example, recommends keeping the model in **float32** during fine-tuning.
High-level LoRA recipe:
* Keep the learning rate conservative for CPT-adapted models if your dataset is small
* Target Mamba projection modules similarly to upstream suggestions (e.g., `x_proj`, `in_proj`, `out_proj`, embeddings)
* Validate perplexity on a held-out Turkish set
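As a concrete starting point for the recipe above, here are hypothetical hyperparameters (a sketch, not tuned values) in the form of keyword arguments you would pass to `peft.LoraConfig`:

```python
# Hypothetical starting values for the LoRA recipe above -- not tuned.
lora_kwargs = dict(
    r=8,                    # adapter rank: small, conservative capacity
    lora_alpha=16,          # scaling factor; alpha/r = 2 is a common default
    lora_dropout=0.05,
    target_modules=["x_proj", "in_proj", "out_proj", "embeddings"],  # per upstream suggestion
    task_type="CAUSAL_LM",
)
```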
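The held-out perplexity check boils down to exponentiating the mean per-token negative log-likelihood. A minimal helper, assuming you already have per-token NLLs from your evaluation loop:

```python
import math

def perplexity(token_nlls):
    """Perplexity = exp(mean negative log-likelihood per token)."""
    return math.exp(sum(token_nlls) / len(token_nlls))

# Sanity check: a model assigning every token probability 1/2 has perplexity ~2.
print(perplexity([math.log(2)] * 4))
```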
## Acknowledgements (upstream credit)
This model is a **continued-pretrained derivative** of **`state-spaces/mamba-130m-hf`**. The installation and usage instructions above are based on the upstream Hugging Face model card for Transformers-compatible Mamba.
Mamba architecture reference:
* Albert Gu, Tri Dao. *Mamba: Linear-Time Sequence Modeling with Selective State Spaces*. arXiv:2312.00752, 2023.
---
If you use this model in academic work, please cite the Mamba paper.
Also consider citing the upstream HF checkpoint:
* `state-spaces/mamba-130m-hf`
---