Model based on https://github.com/22-hours/cabrita

Install dependencies:

```
!pip install -q datasets loralib sentencepiece
!pip uninstall transformers -y
!pip install -q git+https://github.com/huggingface/transformers.git
!pip install -q git+https://github.com/huggingface/peft.git
!pip install -q bitsandbytes
```

Import the libraries:

```
from peft import PeftModel
from transformers import LlamaTokenizer, LlamaForCausalLM, GenerationConfig
import textwrap
```

Define the model (load the base LLaMA weights in 8-bit, then attach the Cabrita LoRA adapter):

```
tokenizer = LlamaTokenizer.from_pretrained("decapoda-research/llama-7b-hf")

model = LlamaForCausalLM.from_pretrained(
    "decapoda-research/llama-7b-hf",
    load_in_8bit=True,
    device_map="auto",
)
model = PeftModel.from_pretrained(model, "berchielli/cabrita-7b-pt-br")
```

Use the model for inference:

```
generation_config = GenerationConfig(
    temperature=0.9,
    top_p=0.75,
    num_beams=4,
)

prompt = "O que é um alpaca?"  # example prompt; replace with your own
inputs = tokenizer(prompt, return_tensors="pt")
input_ids = inputs["input_ids"].cuda()

generation_output = model.generate(
    input_ids=input_ids,
    generation_config=generation_config,
    return_dict_in_generate=True,
    output_scores=True,
    max_new_tokens=256,
)

# Decode the generated token ids back into text
print(tokenizer.decode(generation_output.sequences[0], skip_special_tokens=True))
```
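Cabrita was instruction-tuned on Alpaca-style data translated to Portuguese, so a bare question often works better when wrapped in an instruction template before tokenization. A minimal sketch of such a prompt builder — the `generate_prompt` helper and the exact template wording are assumptions for illustration, not part of the original notebook:

```python
# Hypothetical helper (not from the original notebook): wrap a question in
# an Alpaca-style instruction template before passing it to the tokenizer.
# The exact Portuguese wording below is an assumption.
def generate_prompt(instruction: str) -> str:
    return (
        "Abaixo está uma instrução que descreve uma tarefa. "
        "Escreva uma resposta que complete adequadamente o pedido.\n\n"
        f"### Instrução:\n{instruction}\n\n"
        "### Resposta:\n"
    )

prompt = generate_prompt("O que é um alpaca?")
print(prompt)
```

The model's answer then follows the final `### Resposta:` marker, which makes it easy to split the completion out of the decoded text.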