---
language:
- en
license: mit
tags:
- lora
- tool-calling
- llama3
- instruction-tuning
- json-generation
base_model: meta-llama/Meta-Llama-3-8B-Instruct
---

# Tool-Calling LoRA for LLaMA-3-8B-Instruct

This is a LoRA (Low-Rank Adaptation) adapter for meta-llama/Meta-Llama-3-8B-Instruct, fine-tuned on a custom tool-calling dataset to improve the model's ability to generate structured JSON responses for tool execution.

## Model Details

- **Base Model**: meta-llama/Meta-Llama-3-8B-Instruct
- **Fine-tuning Method**: LoRA (Low-Rank Adaptation)
- **LoRA Rank**: 16
- **LoRA Alpha**: 32
- **Training Dataset**: Custom tool-calling dataset with 357 samples
- **Training Epochs**: 5
- **Learning Rate**: 5.0e-5

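To make the rank and alpha values concrete, the sketch below computes the scaling factor that standard LoRA applies to its low-rank update (alpha / rank) and the number of trainable parameters the two LoRA factors add to a single weight matrix. The 4096 x 4096 matrix shape is an assumption chosen only for illustration; this card does not list which modules were adapted.

```python
def lora_scaling(alpha: int, rank: int) -> float:
    """Scaling applied to the low-rank update B @ A in standard LoRA."""
    return alpha / rank


def lora_trainable_params(d_in: int, d_out: int, rank: int) -> int:
    """Parameters in the two LoRA factors: A is (rank x d_in), B is (d_out x rank)."""
    return rank * d_in + d_out * rank


RANK = 16   # from the model card
ALPHA = 32  # from the model card

# Assumption: a square 4096 x 4096 projection, used purely as an example shape.
D_IN = D_OUT = 4096

scaling = lora_scaling(ALPHA, RANK)
extra_params = lora_trainable_params(D_IN, D_OUT, RANK)
full_params = D_IN * D_OUT

print(f"scaling factor: {scaling}")
print(f"LoRA parameters for this matrix: {extra_params}")
print(f"share of the full matrix: {extra_params / full_params:.4%}")
```

Under these assumptions the adapter trains well under 1% of the parameters of each matrix it touches, which is why the adapter checkpoint is small compared with the base model.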
## Usage

### Load the Model

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel

# Load base model and tokenizer
base_model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Meta-Llama-3-8B-Instruct",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B-Instruct")

# Load the LoRA adapter and merge its weights into the base model
model = PeftModel.from_pretrained(base_model, "YOUR_USERNAME/llama-traces")
model = model.merge_and_unload()

# Generate tool-calling responses
def generate_tool_call(prompt):
    # Tokenize and move the inputs to the same device as the model
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    outputs = model.generate(
        **inputs,
        max_new_tokens=512,
        temperature=0.7,
        do_sample=True,
        pad_token_id=tokenizer.eos_token_id,
    )
    return tokenizer.decode(outputs[0], skip_special_tokens=True)

# Example usage
prompt = "Check the weather in New York"
response = generate_tool_call(prompt)
print(response)
```

### Expected Output Format

The model generates structured JSON responses like:

```json
{
  "trace_id": "002",
  "steps": [
    {
      "action": "call_api",
      "api": "weather_api",
      "arguments": {"location": "New York"}
    },
    {
      "action": "respond",
      "message": "The weather in New York is currently sunny with a temperature of 72°F."
    }
  ]
}
```

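A response in this format can be checked before any step is acted on. The following is a minimal sketch, assuming the only actions are `call_api` and `respond` with the keys shown in the example; the `validate_trace` helper and the `REQUIRED_KEYS` table are hypothetical names, not part of this model or any library.

```python
import json

# Assumption: action names and required keys are inferred from the example trace.
REQUIRED_KEYS = {
    "call_api": {"api", "arguments"},
    "respond": {"message"},
}


def validate_trace(raw: str) -> dict:
    """Parse a JSON trace and check each step against REQUIRED_KEYS.

    Raises ValueError if the text is not valid JSON or a step is malformed.
    """
    try:
        trace = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"not valid JSON: {exc}") from exc

    if not isinstance(trace, dict) or not isinstance(trace.get("steps"), list):
        raise ValueError("trace must be an object with a 'steps' list")

    for index, step in enumerate(trace["steps"]):
        if not isinstance(step, dict):
            raise ValueError(f"step {index}: expected an object")
        action = step.get("action")
        if not isinstance(action, str) or action not in REQUIRED_KEYS:
            raise ValueError(f"step {index}: unknown action {action!r}")
        missing = REQUIRED_KEYS[action] - set(step)
        if missing:
            raise ValueError(f"step {index}: missing keys {sorted(missing)}")
    return trace


example = (
    '{"trace_id": "002", "steps": ['
    '{"action": "call_api", "api": "weather_api", "arguments": {"location": "New York"}}, '
    '{"action": "respond", "message": "Sunny."}]}'
)
trace = validate_trace(example)
print([step["action"] for step in trace["steps"]])
```

Raising `ValueError` for every failure mode keeps the calling code simple: a single `except ValueError` can decide whether to retry generation or fall back to a default reply.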
## Training Details

- **Dataset**: Custom tool-calling dataset with instruction/input/output format
- **Template**: llama3 chat template
- **Cutoff Length**: 4096 tokens
- **Batch Size**: 2 (effective batch size: 8 with gradient accumulation)
- **Optimizer**: AdamW with cosine learning rate scheduling
- **Warmup Ratio**: 0.1

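The hyperparameters above imply a fairly short training run. The sketch below estimates the number of optimizer steps and evaluates a linear-warmup, cosine-decay learning rate at a given step. It assumes a single device, that all 357 samples are used for training, and that a partial final batch counts as a step; the actual trainer may round these quantities differently, so treat the numbers as approximate.

```python
import math

NUM_SAMPLES = 357      # from the model card
EPOCHS = 5             # from the model card
BATCH_SIZE = 2         # from the model card
EFFECTIVE_BATCH = 8    # from the model card
BASE_LR = 5.0e-5       # from the model card
WARMUP_RATIO = 0.1     # from the model card

# Assumption: single device, so accumulation steps = effective batch / batch size.
grad_accum_steps = EFFECTIVE_BATCH // BATCH_SIZE

# Assumption: no held-out split, and a partial last batch still triggers an update.
steps_per_epoch = math.ceil(NUM_SAMPLES / EFFECTIVE_BATCH)
total_steps = steps_per_epoch * EPOCHS
warmup_steps = math.ceil(total_steps * WARMUP_RATIO)


def lr_at(step: int) -> float:
    """Linear warmup to BASE_LR, then cosine decay to zero at total_steps."""
    if step < warmup_steps:
        return BASE_LR * step / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return BASE_LR * 0.5 * (1.0 + math.cos(math.pi * progress))


print(f"gradient accumulation steps: {grad_accum_steps}")
print(f"optimizer steps: {total_steps} ({warmup_steps} warmup)")
for step in (0, warmup_steps, total_steps // 2, total_steps):
    print(f"step {step:>3}: lr = {lr_at(step):.2e}")
```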
## Performance

The model shows improved capability in:
- Generating structured JSON responses
- Following tool-calling patterns
- Maintaining context for multi-step tool execution
- Producing consistent output formats

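## Limitations

- Requires the base meta-llama/Meta-Llama-3-8B-Instruct weights; the adapter cannot be used on its own
- May generate invalid JSON in some edge cases, so parse and validate the output before executing any tool call
- Performance depends on the quality and coverage of the training data, which is small (357 samples)
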
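Because the usage example decodes the full sequence, the returned text contains the prompt as well as the completion, and the JSON may be surrounded by other text. One way to cope with this is to scan for the first substring that parses as a JSON object. This is a minimal sketch using only the standard library; `extract_first_json_object` is a hypothetical helper name.

```python
import json
from typing import Optional


def extract_first_json_object(text: str) -> Optional[dict]:
    """Return the first parseable JSON object found in text, or None."""
    decoder = json.JSONDecoder()
    start = text.find("{")
    while start != -1:
        try:
            # raw_decode parses one JSON value starting at `start` and ignores trailing text
            candidate, _ = decoder.raw_decode(text, start)
        except json.JSONDecodeError:
            candidate = None
        if isinstance(candidate, dict):
            return candidate
        # Not a valid object here; try the next opening brace
        start = text.find("{", start + 1)
    return None


noisy = 'Check the weather in New York\n{"trace_id": "002", "steps": []}\nDone.'
print(extract_first_json_object(noisy))
print(extract_first_json_object("no JSON here {broken"))
```

Returning `None` instead of raising lets the caller treat a missing or malformed trace as an ordinary outcome, for example by regenerating with a lower temperature.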
## License

This model is released under the MIT License.