docs: sync README with Colab quickstart workflow
README.md CHANGED

@@ -21,6 +21,7 @@ pipeline_tag: text-generation
 
 A Top-2 dynamic router activates 2 of 8 LoRA experts per transformer block — expanding effective capacity while keeping active compute identical to the dense baseline
 
+[](https://colab.research.google.com/drive/171reT1vWXN3-YIzKgvEY3j70rtNiRo_1)
 [](https://opensource.org/licenses/Apache-2.0)
 [](https://huggingface.co/Qwen/Qwen2.5-3B)
 [](https://huggingface.co/iamrahulreddy/Keiro)
@@ -179,7 +180,7 @@ from transformers import AutoModelForCausalLM, AutoTokenizer
 tokenizer = AutoTokenizer.from_pretrained(REPO_ID)
 base_model = AutoModelForCausalLM.from_pretrained(
     "Qwen/Qwen2.5-3B",
-
+    dtype=torch.bfloat16,
     device_map=device,
 )
 ```
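
Note on the README line touched by the first hunk: it describes a Top-2 router that activates 2 of 8 LoRA experts per transformer block. The sketch below illustrates that general idea in plain Python; it is not taken from the Keiro code. The names `top2_route` and `mix_expert_outputs` are hypothetical, and renormalizing the softmax over only the two selected experts is an assumption, since the README excerpt does not say how the routing weights are computed.

```python
import math

NUM_EXPERTS = 8  # from the README: 8 LoRA experts per block
TOP_K = 2        # from the README: 2 experts active per token


def top2_route(router_logits):
    """Return [(expert_index, weight), ...] for the two highest-scoring experts.

    Assumption: weights are a softmax over the two selected logits only.
    """
    if len(router_logits) != NUM_EXPERTS:
        raise ValueError(f"expected {NUM_EXPERTS} logits, got {len(router_logits)}")
    # Rank expert indices by logit, highest first, and keep the top two.
    ranked = sorted(range(NUM_EXPERTS), key=lambda i: router_logits[i], reverse=True)
    selected = ranked[:TOP_K]
    # Numerically stable softmax restricted to the selected experts.
    peak = max(router_logits[i] for i in selected)
    exps = [math.exp(router_logits[i] - peak) for i in selected]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(selected, exps)]


def mix_expert_outputs(router_logits, expert_outputs):
    """Combine the two selected experts' output vectors by their routing weights."""
    dim = len(expert_outputs[0])
    mixed = [0.0] * dim
    for index, weight in top2_route(router_logits):
        for d in range(dim):
            mixed[d] += weight * expert_outputs[index][d]
    return mixed
```

Only the two selected experts contribute to the mixed output, which is why active compute per token stays fixed regardless of how many experts exist in total.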