---
library_name: transformers
tags: []
---

This repository contains the text-only LLM portion of `meta-llama/Llama-3.2-11B-Vision-Instruct`.
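A minimal usage sketch (`this_repo` below is a placeholder for this repository's id):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained(
    "this_repo",  # placeholder: substitute this repository's id
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("this_repo")

messages = [{"role": "user", "content": "Hello!"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
output = model.generate(input_ids, max_new_tokens=64)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```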
**How it was done**
```python
import torch
from collections import OrderedDict
from transformers import MllamaForConditionalGeneration, AutoModelForCausalLM
from transformers.models.mllama.modeling_mllama import MllamaCrossAttentionDecoderLayer

llama32_id = "meta-llama/Llama-3.2-11B-Vision-Instruct"
llama32 = MllamaForConditionalGeneration.from_pretrained(
    llama32_id,
    torch_dtype=torch.bfloat16,
    device_map="cuda:0",
)

# Drop the cross-attention decoder layers: they only take effect when an image
# is provided, so a text-only model does not need them.
new_layers = []
for layer in llama32.language_model.model.layers:
    if isinstance(layer, MllamaCrossAttentionDecoderLayer):
        continue
    new_layers.append(layer)
llama32.language_model.model.cross_attention_layers = []
llama32.language_model.model.layers = torch.nn.ModuleList(new_layers)

# Now llama32.language_model is identical to Llama-3.1-8B-Instruct, except for
# the embedding size (+8); see:
# https://github.com/huggingface/transformers/blob/a22a4378d97d06b7a1d9abad6e0086d30fdea199/src/transformers/models/mllama/modeling_mllama.py#L1667C9-L1667C26
new_llama32_state_dict = OrderedDict()
for k, v in llama32.language_model.state_dict().items():
    if k == "model.embed_tokens.weight":
        v = v[:128256, :]  # trim the 8 extra embedding rows
    new_llama32_state_dict[k] = v

# Load Llama-3.1-8B-Instruct to provide the target architecture
llama31_id = "meta-llama/Llama-3.1-8B-Instruct"
llama31 = AutoModelForCausalLM.from_pretrained(
    llama31_id,
    torch_dtype=torch.bfloat16,
    device_map="cuda:1",
)

llama31.load_state_dict(new_llama32_state_dict)
# <All keys matched successfully>

llama31.save_pretrained("./my-cool-llama3.2")
```
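As a quick sanity check (a sketch assuming `llama32` and `llama31` from the snippet above are still in memory, and that attribute names follow the linked `transformers` implementation), the pruned text stack should have the same number of decoder layers as Llama-3.1-8B, and the copied weights should match exactly:

```python
# 40 Mllama decoder layers minus 8 cross-attention layers leaves 32,
# matching Llama-3.1-8B-Instruct.
assert len(llama31.model.layers) == len(llama32.language_model.model.layers)

# Spot-check one copied weight; the two models live on different devices
# in the snippet above, so move one tensor over for the comparison.
src = llama32.language_model.model.layers[0].self_attn.q_proj.weight
dst = llama31.model.layers[0].self_attn.q_proj.weight
assert torch.equal(src.to(dst.device), dst)
```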
**Note:**
In the original tokenizer, `tokenizer.chat_template` contains a `date_string` variable, which appends the current date whenever `tokenizer.apply_chat_template(messages)` is called.

I removed this behavior in this repo. Please be aware of this when you use `AutoTokenizer.from_pretrained(this_repo)`.
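To illustrate the difference (a sketch; `this_repo` is again a placeholder, and the exact `"Today Date"` wording assumes the upstream Llama 3.2 chat template):

```python
from transformers import AutoTokenizer

messages = [{"role": "user", "content": "Hi"}]

# Upstream tokenizer: the rendered prompt includes a "Today Date: ..." header
# produced from date_string.
upstream = AutoTokenizer.from_pretrained("meta-llama/Llama-3.2-11B-Vision-Instruct")
print("Today Date" in upstream.apply_chat_template(messages, tokenize=False))  # True

# This repo's tokenizer: the date header is removed, so rendered prompts do
# not change from day to day.
mine = AutoTokenizer.from_pretrained("this_repo")
print("Today Date" in mine.apply_chat_template(messages, tokenize=False))  # False
```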