---
license: apache-2.0
language:
- en
pipeline_tag: text-generation
library_name: transformers
---
# TD-Llama-OP
## TL;DR
**TD (ToolDial)-Llama-OP (Overall Performance)** is the model used for the **Overall Performance Task** in the [ToolDial](https://arxiv.org/abs/2503.00564) paper. We encourage you to use this model to reproduce the results.
Please refer to the **Experiments** section of our [GitHub page](https://github.com/holi-lab/ToolDial) to see how the evaluation was carried out.
**[Model Summary]**
- Trained with QLoRA quantization; the LoRA adapters are merged into the original weights.
- Trained for 1 epoch with the 8-bit Adam optimizer, learning rate 1e-5, betas 0.9 and 0.995.
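For quick reference, the training hyperparameters above can be collected in one place. This is only a summary of what the model card states, not a full training script; the optimizer name string is an assumption about the usual bitsandbytes/`transformers` naming:

```python
# Training hyperparameters as reported in the model summary above.
# "adamw_bnb_8bit" is an assumption about the optimizer identifier;
# the card only says "Adam-8bit".
TRAIN_CONFIG = {
    "num_train_epochs": 1,
    "optimizer": "adamw_bnb_8bit",
    "learning_rate": 1e-5,
    "adam_beta1": 0.9,
    "adam_beta2": 0.995,
    "quantization": "QLoRA (4-bit NF4)",
}
```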
**[How to load the model]**
```python
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
from peft import PeftModel
import torch

device = "cuda:0"

quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
)

# 1. Load the base model (we use Llama-3-8B-Instruct) with the given quantization config.
base_model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Meta-Llama-3-8B-Instruct",
    quantization_config=quant_config,
    device_map={"": device},
)
tokenizer = AutoTokenizer.from_pretrained("HOLILAB/td-llama-op")
tokenizer.pad_token_id = tokenizer.eos_token_id

# 2. Load the LoRA adapter with PeftModel.
model = PeftModel.from_pretrained(base_model, "HOLILAB/td-llama-op")
```
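Once the model is loaded, it can be queried like any causal LM. The sketch below is a minimal, hedged example of running inference; `make_dialogue_prompt` and `generate_reply` are hypothetical helpers written for illustration (the Llama-3 special-token layout shown is the standard one for the base model, but the exact prompt format used in the ToolDial experiments is documented on the GitHub page):

```python
def make_dialogue_prompt(system: str, user: str) -> str:
    # Hypothetical helper: format a single-turn prompt in the standard
    # Llama-3 chat style with special header/end tokens.
    return (
        "<|begin_of_text|><|start_header_id|>system<|end_header_id|>\n\n"
        f"{system}<|eot_id|><|start_header_id|>user<|end_header_id|>\n\n"
        f"{user}<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n"
    )


def generate_reply(model, tokenizer, prompt: str, max_new_tokens: int = 256) -> str:
    # Greedy decoding on the quantized model; return only the newly
    # generated text, with special tokens stripped.
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output = model.generate(
        **inputs,
        max_new_tokens=max_new_tokens,
        do_sample=False,
        pad_token_id=tokenizer.eos_token_id,
    )
    new_tokens = output[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)
```

Usage would look like `generate_reply(model, tokenizer, make_dialogue_prompt("You are a tool-use assistant.", "Find me a weather API."))`, run on the `model` and `tokenizer` loaded above.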