hellosindh committed
Commit 4d59830 · verified · 1 Parent(s): 07eb368

Update README.md

Files changed (1): README.md (+105, −13)
README.md CHANGED
@@ -1,22 +1,114 @@
  ---
- base_model: unsloth/qwen3-8b-bnb-4bit
  tags:
- - text-generation-inference
- - transformers
- - unsloth
  - qwen3
- - trl
  license: apache-2.0
- language:
- - en
  ---

- # Uploaded model

- - **Developed by:** hellosindh
- - **License:** apache-2.0
- - **Finetuned from model :** unsloth/qwen3-8b-bnb-4bit

- This qwen3 model was trained 2x faster with [Unsloth](https://github.com/unslothai/unsloth)

- [<img src="https://raw.githubusercontent.com/unslothai/unsloth/main/images/unsloth%20made%20with%20love.png" width="200"/>](https://github.com/unslothai/unsloth)
---
language:
- sd
tags:
- sindhi
- qwen3
- continued-pretraining
- sindh-text-generation
- lora
base_model: unsloth/Qwen3-8B-bnb-4bit
library_name: peft
license: apache-2.0
---

# Qwen3-8B Sindhi CPT (Continued Pre-Training)

This is a **LoRA adapter** for [Qwen3-8B](https://huggingface.co/unsloth/Qwen3-8B-bnb-4bit), continued pre-trained on **~164M tokens of Sindhi text**.

---

## Model Details

| Property | Value |
|---|---|
| Base Model | `unsloth/Qwen3-8B-bnb-4bit` |
| Training Type | Continued Pre-Training (CPT) |
| Training Tokens | ~164M Sindhi tokens |
| LoRA Rank | 32 |
| LoRA Alpha | 64 |
| Sequence Length | 2048 |
| Quantization | 4-bit (bitsandbytes) |
| Framework | Unsloth + Hugging Face PEFT |

---
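For a rough sense of scale, the token budget in the table maps onto the following number of full-length training sequences. Packing and batching details are not stated in the card, so this is only a back-of-envelope estimate:

```python
# Back-of-envelope: how many 2048-token sequences fit in ~164M tokens.
tokens = 164_000_000   # training tokens from the table
seq_len = 2048         # sequence length from the table

sequences = tokens // seq_len
print(sequences)  # → 80078
```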

## Usage

### Option 1: Load with Unsloth (recommended, faster)

```python
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "hellosindh/qwen3-sindhi-cpt",
    max_seq_length = 2048,
    load_in_4bit = True,
)

# Enable fast inference
FastLanguageModel.for_inference(model)
```

### Option 2: Load base + adapter separately with PEFT

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import PeftModel

# Load the base model in 4-bit (quantization_config replaces the
# deprecated load_in_4bit / torch_dtype keyword arguments)
bnb_config = BitsAndBytesConfig(
    load_in_4bit = True,
    bnb_4bit_compute_dtype = torch.bfloat16,
)
base_model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen3-8B",
    device_map = "auto",
    quantization_config = bnb_config,
)

# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained("hellosindh/qwen3-sindhi-cpt")

# Apply the Sindhi adapter on top
model = PeftModel.from_pretrained(base_model, "hellosindh/qwen3-sindhi-cpt")
```

### Generate Sindhi text

```python
# Sindhi prompt: roughly "the people of Sindh"
inputs = tokenizer("سنڌ جي ماڻهو", return_tensors="pt").to("cuda")

outputs = model.generate(
    **inputs,
    max_new_tokens = 200,
    temperature = 0.8,
    do_sample = True,
    repetition_penalty = 1.1,
)

print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
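For intuition on the sampling settings above: a temperature below 1.0 sharpens the next-token distribution that `do_sample` draws from. A self-contained sketch with toy logits (not the model's real values):

```python
import math

def softmax(logits, temperature=1.0):
    # Scale logits by 1/temperature, then normalise to probabilities.
    z = [l / temperature for l in logits]
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

toy_logits = [2.0, 1.0, 0.5]           # stand-in for next-token logits
p_default = softmax(toy_logits)        # temperature = 1.0
p_sharper = softmax(toy_logits, 0.8)   # temperature used in the example above

# Lower temperature concentrates probability mass on the top token.
assert p_sharper[0] > p_default[0]
```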

---

## Training Details

- **Dataset**: ~164M Sindhi tokens from multiple sources
- **Tokenizer**: original Qwen3 tokenizer (no modifications)
- **Hardware**: NVIDIA A100 40GB
- **Framework**: [Unsloth](https://github.com/unslothai/unsloth) for efficient training
- **Optimizer**: AdamW 8-bit
- **Learning Rate**: `5e-5` with a cosine scheduler
- **Final Loss**: ~1.20

---
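The LoRA numbers above (rank 32, alpha 64) imply a scaling factor of alpha/r = 2 on the adapter update `W + (alpha / r) * (B @ A)`. A pure-Python toy at rank 1 with 2×2 matrices (illustrative shapes, not the model's real ones) showing why the adapter is an exact no-op at initialisation, since B starts at zero:

```python
# Toy LoRA update: W_eff = W + (alpha / r) * (B @ A).
alpha, r = 64, 32
scale = alpha / r                      # = 2.0, the scaling the card's config implies

W = [[1.0, 0.0], [0.0, 1.0]]           # frozen base weight (toy 2x2)
A = [[0.1, 0.2]]                       # LoRA A: 1 x 2 (toy rank 1)
B = [[0.0], [0.0]]                     # LoRA B: 2 x 1, initialised to zero

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

delta = matmul(B, A)                   # all zeros at init
W_eff = [[w + scale * d for w, d in zip(wr, dr)] for wr, dr in zip(W, delta)]

# With B = 0, the adapter leaves the base model unchanged until training updates B.
assert W_eff == W
```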

## Intended Use

- Sindhi text generation
- Synthetic data generation for low-resource Sindhi NLP
- Base for further fine-tuning on Sindhi tasks (NER, QA, summarization)
- Pretraining data augmentation for encoder models like [SindhiBERT](https://huggingface.co/hellosindh/sindhi-bert-base)

---

## Limitations

- This is a **continued pre-training** adapter, not an instruction-tuned model; it will not follow chat-style instructions
- Outputs may not be factually accurate; the model was trained for linguistic pattern learning, not factual recall
- Best used as a base for task-specific fine-tuning