- every L3 token that decodes and re-encodes to multiple Qwen2 tokens is initialized with the mean of those embeddings
- there are no L3 tokens that cannot be translated to one or more Qwen2 tokens (both vocabularies are complete).

```python
for idx in range(target_vocab_size):
    # Decode each Llama-3 token ID to its string form (including special tokens)
    decode = tokenizer_target.decode(torch.tensor(idx, dtype = torch.long), decode_special_tokens = True)
    # Re-encode that string with the Qwen2 tokenizer
    encode = tokenizer_source.encode(decode, add_special_tokens = False, return_tensors = "pt")
    # New embedding and output-head rows are the mean of the corresponding Qwen2 rows
    new_emb[idx] = old_emb[encode.flatten()].mean(dim = 0)
    new_head[idx] = old_head[encode.flatten()].mean(dim = 0)
```

Full script is [here](https://huggingface.co/turboderp/Qwama-0.5B-Instruct/blob/main/vocab_transplant.py).

Swapping the vocabulary with the above method yields a mostly coherent but still very confused model. It especially
struggles with numbers, and of course the embeddings for the Llama-3 control tokens do not have the significance they
would in an instruct-tuned model.
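As a quick, self-contained illustration of the mean-initialization step, here is a toy sketch; the tensor shapes and the hard-coded `encode` result are made up for illustration (the real script derives `encode` from the tokenizers as above):

```python
import torch

# Toy stand-ins for the real matrices: a 4-row source (Qwen2) embedding
# matrix of width 3, and a 2-row target (Llama-3) matrix to fill in.
old_emb = torch.arange(12, dtype = torch.float).reshape(4, 3)
new_emb = torch.zeros(2, 3)

# Pretend target token 0 re-encodes to source tokens 1 and 3
encode = torch.tensor([[1, 3]])
new_emb[0] = old_emb[encode.flatten()].mean(dim = 0)

print(new_emb[0])  # mean of rows [3,4,5] and [9,10,11] -> tensor([6., 7., 8.])
```

Tokens with a one-to-one mapping reduce to the same operation with a single-element `encode`, so the mean is just a copy of the source row.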