---
license: apache-2.0
---

A toy Llama adapted from [JackFram/llama-160m](https://huggingface.co/JackFram/llama-160m) with special tokens added.

This checkpoint can be loaded into MASE's `LlamaQuantizedForCausalLM`:

```python
from transformers.models.llama import LlamaTokenizer
from chop.models.manual.llama_quantized import (
    LlamaQuantizedConfig,
    LlamaQuantizedForCausalLM,
)

name = "Cheng98/llama-160m"
tokenizer = LlamaTokenizer.from_pretrained(name)

# Override quant_config to quantize the model;
# the default config does not quantize Llama.
config = LlamaQuantizedConfig.from_pretrained(
    name,
    # quant_config="./quant_config_na.toml",
)

llama = LlamaQuantizedForCausalLM.from_pretrained(
    name,
    config=config,
)
```