Instructions to use Fan21/Llama-mt-lora with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use Fan21/Llama-mt-lora with Transformers:
# Use a pipeline as a high-level helper from transformers import pipeline pipe = pipeline("question-answering", model="Fan21/Llama-mt-lora")# Load model directly from transformers import AutoTokenizer, AutoModelForCausalLM tokenizer = AutoTokenizer.from_pretrained("Fan21/Llama-mt-lora") model = AutoModelForCausalLM.from_pretrained("Fan21/Llama-mt-lora") - Notebooks
- Google Colab
- Kaggle
| license: mit | |
| language: | |
| - en | |
| pipeline_tag: question-answering | |
| # Llama-mt-lora | |
| <!-- Provide a quick summary of what the model is/does. --> | |
| This model is fine-tuned with LLaMA with 8 Nvidia A100-80G GPUs using 3,000,000 groups of conversations in the context of mathematics by students and facilitators on Algebra Nation (https://www.mathnation.com/). Llama-mt-lora consists of 32 layers and over 7 billion parameters, consuming up to 13.5 gigabytes of disk space. Researchers can experiment with and finetune the model to help construct math conversational AI that can effectively respond generation in a mathematical context. | |
| ### Here is how to use it with texts in HuggingFace | |
| ```python | |
| import torch | |
| import transformers | |
| from transformers import LlamaTokenizer, AutoModelForCausalLM | |
| tokenizer = LlamaTokenizer.from_pretrained("Fan21/Llama-mt-lora") | |
| mdoel = LlamaForCausalLM.from_pretrained( | |
| "Fan21/Llama-mt-lora", | |
| load_in_8bit=False, | |
| torch_dtype=torch.float16, | |
| device_map="auto", | |
| ) | |
| def generate_prompt(instruction, input=None): | |
| if input: | |
| return f"""Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request. | |
| ### Instruction: | |
| {instruction} | |
| ### Input: | |
| {input} | |
| ### Response:""" | |
| else: | |
| return f"""Below is an instruction that describes a task. Write a response that appropriately completes the request. | |
| ### Instruction: | |
| {instruction} | |
| ### Response:""" | |
| def evaluate( | |
| instruction, | |
| input=None, | |
| temperature=0.1, | |
| top_p=0.75, | |
| top_k=40, | |
| num_beams=4, | |
| max_new_tokens=128, | |
| **kwargs, | |
| ): | |
| prompt = generate_prompt(instruction, input) | |
| inputs = tokenizer(prompt, return_tensors="pt") | |
| input_ids = inputs["input_ids"].to(device) | |
| generation_config = GenerationConfig( | |
| temperature=temperature, | |
| top_p=top_p, | |
| top_k=top_k, | |
| num_beams=num_beams, | |
| **kwargs, | |
| ) | |
| with torch.no_grad(): | |
| generation_output = model.generate( | |
| input_ids=input_ids, | |
| generation_config=generation_config, | |
| return_dict_in_generate=True, | |
| output_scores=True, | |
| max_new_tokens=max_new_tokens, | |
| ) | |
| s = generation_output.sequences[0] | |
| output = tokenizer.decode(s) | |
| return output.split("### Response:")[1].strip() | |
| instruction = 'write your instruction here' | |
| inputs = 'write your inputs here' | |
| output= evaluate(instruction, | |
| input=inputs, | |
| temperature=0.1,#change the parameters by yourself | |
| top_p=0.75, | |
| top_k=40, | |
| num_beams=4, | |
| max_new_tokens=128,) | |
| ``` | |