---
license: apache-2.0
language:
- en
pipeline_tag: text-generation
library_name: transformers
---

# TD-Llama-OP

## TL;DR

**TD (ToolDial)-Llama-OP (OverallPerformance)** is the model used for the **Overall Performance Task** in the [ToolDial](https://arxiv.org/abs/2503.00564) paper. We encourage you to use this model to reproduce the results. Please refer to the **Experiments** section of our [GitHub page](https://github.com/holi-lab/ToolDial) to see how our evaluation proceeded.

**[Model Summary]**
- Trained with QLoRA quantization; the LoRA adapters are merged into the original weights.
- Trained for 1 epoch with the Adam-8bit optimizer, learning rate 1e-5, and betas of 0.9 and 0.995.

**[How to load the model]**
```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
from peft import PeftModel

device = "cuda:0"

quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
)

## 1. Load the base model (we use Llama-3-8B-Instruct) with the given quantization config.
base_model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Meta-Llama-3-8B-Instruct",
    quantization_config=quant_config,
    device_map={"": device},
)

tokenizer = AutoTokenizer.from_pretrained("HOLILAB/td-llama-op")
tokenizer.pad_token_id = tokenizer.eos_token_id

## 2. Load the LoRA adapter with PeftModel
model = PeftModel.from_pretrained(base_model, "HOLILAB/td-llama-op")
```
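Once the model and tokenizer are loaded, generation follows the standard `transformers` chat-template workflow. The sketch below is an illustrative example, not an official inference script from the paper: the dialogue message is a made-up placeholder, and the generation parameters (`max_new_tokens`, greedy decoding) are assumptions you may need to adjust to match the ToolDial evaluation setup.

```python
import torch

# Hypothetical usage sketch: assumes `model` and `tokenizer` were created
# as in the loading snippet above. The user message is a placeholder.
messages = [
    {"role": "user", "content": "Hello, can you help me find an API?"},
]

# Format the dialogue with the Llama-3 chat template and move it to the model's device.
input_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)

with torch.no_grad():
    output_ids = model.generate(
        input_ids,
        max_new_tokens=256,          # assumed budget; adjust as needed
        do_sample=False,             # greedy decoding for reproducibility
        pad_token_id=tokenizer.eos_token_id,
    )

# Decode only the newly generated tokens, dropping the prompt.
response = tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True)
print(response)
```

Note that running this requires a CUDA device and access to the gated `meta-llama/Meta-Llama-3-8B-Instruct` weights on the Hugging Face Hub.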