serda-dev committed (verified)
Commit 70748ee · 1 Parent(s): b6d4a27

Update README.md

Files changed (1):
  1. README.md +10 −10

README.md CHANGED
@@ -25,13 +25,13 @@ This repository provides a **Turkish continued-pretrained** variant of **`state-
 
 ## What is Mamba?
 
-**Mamba** is a selective **State Space Model (SSM)** architecture designed for efficient sequence modeling with **linear-time** scaling in sequence length. It was introduced by Gu & Dao in *“Mamba: Linear-Time Sequence Modeling with Selective State Spaces”*. :contentReference[oaicite:0]{index=0}
+**Mamba** is a selective **State Space Model (SSM)** architecture designed for efficient sequence modeling with **linear-time** scaling in sequence length. It was introduced by Gu & Dao in *“Mamba: Linear-Time Sequence Modeling with Selective State Spaces”*.
 
 ---
 
 ## Training summary (this checkpoint)
 
-- **Base model:** `state-spaces/mamba-130m-hf` :contentReference[oaicite:1]{index=1}
+- **Base model:** `state-spaces/mamba-130m-hf`
 - **Training type:** Continued pretraining (CPT) / domain-adaptation pretraining for Turkish
 - **Hardware:** Single GPU **NVIDIA GeForce RTX 4060 Laptop GPU**
 - **Raw text used:** ~**400 MB** Turkish text (after your preprocessing)
@@ -58,7 +58,7 @@ This repository provides a **Turkish continued-pretrained** variant of **`state-
 
 ### Install requirements (recommended)
 
-The original publisher recommends installing `transformers` from `main` (historically required until a given release), plus the optimized CUDA-kernel dependencies for best performance: `causal-conv1d` and `mamba-ssm`. :contentReference[oaicite:2]{index=2}
+The original publisher recommends installing `transformers` from `main` (historically required until a given release), plus the optimized CUDA-kernel dependencies for best performance: `causal-conv1d` and `mamba-ssm`.
 
 ```bash
 pip install git+https://github.com/huggingface/transformers@main
@@ -72,7 +72,7 @@ If `causal-conv1d` and/or `mamba-ssm` are not installed, Transformers will fall
 
 ## Usage (generation)
 
-Below is the standard `transformers` generate workflow used by the upstream model card. ([Hugging Face][2])
+Below is the standard `transformers` generate workflow used by the upstream model card.
 
 ```python
 import torch
@@ -100,7 +100,7 @@ print(tokenizer.decode(out[0], skip_special_tokens=True))
 
 ### Tips
 
-* For fastest inference on NVIDIA GPUs, ensure **CUDA kernels** are enabled by installing `mamba-ssm` + `causal-conv1d`. ([Hugging Face][1])
+* For fastest inference on NVIDIA GPUs, ensure **CUDA kernels** are enabled by installing `mamba-ssm` + `causal-conv1d`.
 * If you run into build issues for these packages, double-check:
 
   * Your PyTorch CUDA build matches your driver/runtime
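
As an aside on the tip above: Transformers silently falls back to a slower pure-PyTorch ("eager") Mamba path when the kernel packages are absent, so it can be useful to check importability explicitly. A minimal sketch — the helper name is mine, not from the upstream card, and it only checks that the packages import, not that their CUDA builds match your driver:

```python
import importlib.util

def fast_mamba_kernels_available() -> bool:
    """Return True if both optimized-kernel packages are importable.

    `causal-conv1d` and `mamba-ssm` install the modules
    `causal_conv1d` and `mamba_ssm` respectively.
    """
    return all(
        importlib.util.find_spec(mod) is not None
        for mod in ("mamba_ssm", "causal_conv1d")
    )

print(fast_mamba_kernels_available())
```

If this prints `False`, generation still works but you are on the fallback path.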
@@ -111,12 +111,12 @@ print(tokenizer.decode(out[0], skip_special_tokens=True))
 
 ## Fine-tuning (PEFT / LoRA)
 
-The upstream model card includes a PEFT fine-tuning example and recommends keeping the model in **float32** during finetuning in that example context. ([Hugging Face][2])
+The upstream model card includes a PEFT fine-tuning example and recommends keeping the model in **float32** during finetuning in that example context.
 
 High-level LoRA recipe:
 
 * Keep LR conservative for CPT-adapted models if your dataset is small
-* Target Mamba projection modules similarly to upstream suggestions (e.g., `x_proj`, `in_proj`, `out_proj`, embeddings) ([Hugging Face][2])
+* Target Mamba projection modules similarly to upstream suggestions (e.g., `x_proj`, `in_proj`, `out_proj`, embeddings)
 * Validate perplexity on a held-out Turkish set
 
 *(If you want, you can paste your exact training script + config and I’ll write a “Reproducibility” section with command lines and hyperparameters.)*
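
To make the recipe above concrete, here is a hypothetical starting configuration in plain-dict form (the keys mirror PEFT's `LoraConfig` arguments; the rank/alpha/dropout values are illustrative choices of mine, only the target-module names come from the upstream suggestion):

```python
# Hypothetical LoRA starting point for a Mamba CPT checkpoint.
# Values are illustrative, not the upstream card's exact example.
lora_config = {
    "r": 8,                  # low rank keeps trainable-parameter count small
    "lora_alpha": 16,        # scaling factor; alpha/r = 2 is a common default
    "lora_dropout": 0.05,
    "target_modules": ["x_proj", "in_proj", "out_proj", "embeddings"],
    "bias": "none",
    "task_type": "CAUSAL_LM",
}
print(sorted(lora_config["target_modules"]))
```

Pass the same keys to `peft.LoraConfig(**...)` if you use PEFT; keep the base weights in float32 as the upstream example does.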
@@ -147,11 +147,11 @@ For a CPT’d base LM, common quick checks:
 
 ## Acknowledgements (upstream credit)
 
-This model is a **continued-pretrained derivative** of **`state-spaces/mamba-130m-hf`**. The installation and usage instructions above are based on the upstream Hugging Face model card for Transformers-compatible Mamba. ([Hugging Face][2])
+This model is a **continued-pretrained derivative** of **`state-spaces/mamba-130m-hf`**. The installation and usage instructions above are based on the upstream Hugging Face model card for Transformers-compatible Mamba.
 
 Mamba architecture reference:
 
-* Albert Gu, Tri Dao. *Mamba: Linear-Time Sequence Modeling with Selective State Spaces*. ([arXiv][3])
+* Albert Gu, Tri Dao. *Mamba: Linear-Time Sequence Modeling with Selective State Spaces*.
 
 ---
 
@@ -170,7 +170,7 @@ If you use this model in academic work, please cite the Mamba paper:
 
 Also consider citing the upstream HF checkpoint:
 
-* `state-spaces/mamba-130m-hf` ([Hugging Face][1])
+* `state-spaces/mamba-130m-hf`
 
 
 ---
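
One note on the "validate perplexity on a held-out Turkish set" step from the README's LoRA recipe: perplexity is just the exponential of the negative mean token log-likelihood, so it can be computed from any model's per-token log-probabilities. A minimal dependency-free sketch (the function name and list-of-floats input are my framing, not the README's):

```python
import math

def perplexity(token_logprobs: list[float]) -> float:
    """exp(-mean log-likelihood) over a held-out token stream."""
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    return math.exp(-sum(token_logprobs) / len(token_logprobs))

# Sanity check: a uniform model over a 4-symbol alphabet assigns
# log(1/4) to every token, so its perplexity is 4 (up to float rounding).
uniform = [math.log(0.25)] * 10
print(perplexity(uniform))
```

Compare this number on the same held-out Turkish text before and after fine-tuning; lower is better.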
 