iamrahulreddy committed on
Commit 06cb25e · verified · 1 Parent(s): e733237

docs: sync README with Colab quickstart workflow

Files changed (1)
  1. README.md +2 -1
README.md CHANGED
@@ -21,6 +21,7 @@ pipeline_tag: text-generation
 
 A Top-2 dynamic router activates 2 of 8 LoRA experts per transformer block — expanding effective capacity while keeping active compute identical to the dense baseline
 
+[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/171reT1vWXN3-YIzKgvEY3j70rtNiRo_1)
 [![License](https://img.shields.io/badge/License-Apache_2.0-blue.svg)](https://opensource.org/licenses/Apache-2.0)
 [![Base Model](https://img.shields.io/badge/Base-Qwen2.5--3B-orange)](https://huggingface.co/Qwen/Qwen2.5-3B)
 [![Architecture](https://img.shields.io/badge/Architecture-Sparse_MoE-purple)](https://huggingface.co/iamrahulreddy/Keiro)
@@ -179,7 +180,7 @@ from transformers import AutoModelForCausalLM, AutoTokenizer
 tokenizer = AutoTokenizer.from_pretrained(REPO_ID)
 base_model = AutoModelForCausalLM.from_pretrained(
     "Qwen/Qwen2.5-3B",
-    torch_dtype=torch.bfloat16,
+    dtype=torch.bfloat16,
     device_map=device,
 )
 ```
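
Note on the `torch_dtype` to `dtype` rename: the README snippet now passes `dtype=`, which older `transformers` installs may not recognize. The sketch below is one possible way to pick the keyword from the installed version. The helper name `dtype_kwarg_name` is hypothetical, and the 4.56 cutoff is an assumption that should be checked against the release notes for your install; neither comes from this commit.

```python
from itertools import takewhile


def dtype_kwarg_name(transformers_version: str) -> str:
    """Return the keyword to use for the model dtype in from_pretrained.

    Assumption (not from the commit): `dtype` is accepted starting with
    transformers 4.56, and `torch_dtype` is the name before that.
    """
    parts = []
    for piece in transformers_version.split(".")[:2]:
        # Keep only the leading digits so strings like "56rc1" still parse.
        digits = "".join(takewhile(str.isdigit, piece))
        parts.append(int(digits) if digits else 0)
    while len(parts) < 2:
        parts.append(0)
    return "dtype" if tuple(parts) >= (4, 56) else "torch_dtype"


# Hypothetical usage with the README's loading call:
#   import torch, transformers
#   from transformers import AutoModelForCausalLM
#   kwargs = {dtype_kwarg_name(transformers.__version__): torch.bfloat16}
#   base_model = AutoModelForCausalLM.from_pretrained(
#       "Qwen/Qwen2.5-3B", device_map=device, **kwargs
#   )
```

If the Colab notebook pins a recent `transformers` release, the plain `dtype=` call in the README is sufficient and this helper is unnecessary.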