---
language:
- en
license: mit
tags:
- lora
- tool-calling
- llama3
- instruction-tuning
- json-generation
base_model: meta-llama/Meta-Llama-3-8B-Instruct
---

# Tool-Calling LoRA for LLaMA-3-8B-Instruct

This is a LoRA (Low-Rank Adaptation) adapter fine-tuned on tool-calling data to improve the base model's ability to generate structured JSON responses for tool execution.

## Model Details

- **Base Model**: meta-llama/Meta-Llama-3-8B-Instruct
- **Fine-tuning Method**: LoRA (Low-Rank Adaptation)
- **LoRA Rank**: 16
- **LoRA Alpha**: 32
- **Training Dataset**: Custom tool-calling dataset with 357 samples
- **Training Epochs**: 5
- **Learning Rate**: 5.0e-5

## Usage

### Load the Model

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel

# Load base model and tokenizer
base_model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Meta-Llama-3-8B-Instruct",
    torch_dtype=torch.bfloat16,
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B-Instruct")

# Load the LoRA adapter and merge it into the base weights
model = PeftModel.from_pretrained(base_model, "YOUR_USERNAME/llama-traces")
model = model.merge_and_unload()

# Generate tool-calling responses
def generate_tool_call(prompt):
    # Move inputs to the same device as the model (required with device_map="auto")
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    outputs = model.generate(
        **inputs,
        max_new_tokens=512,
        temperature=0.7,
        do_sample=True,
        pad_token_id=tokenizer.eos_token_id
    )
    return tokenizer.decode(outputs[0], skip_special_tokens=True)

# Example usage
prompt = "Check the weather in New York"
response = generate_tool_call(prompt)
print(response)
```

### Expected Output Format

The model generates structured JSON responses like:

```json
{
  "trace_id": "002",
  "steps": [
    {
      "action": "call_api",
      "api": "weather_api",
      "arguments": {"location": "New York"}
    },
    {
      "action": "respond",
      "message": "The weather in New York is currently sunny with a temperature of 72°F."
    }
  ]
}
```

## Training Details

- **Dataset**: Custom tool-calling dataset with instruction/input/output format
- **Template**: llama3 chat template
- **Cutoff Length**: 4096 tokens
- **Batch Size**: 2 (effective batch size: 8 with gradient accumulation)
- **Optimizer**: AdamW with cosine learning-rate scheduling
- **Warmup Ratio**: 0.1

## Performance

The model shows improved capability in:

- Generating structured JSON responses
- Following tool-calling patterns
- Maintaining context for multi-step tool execution
- Producing consistent output formats

## Limitations

- Requires the base LLaMA-3-8B-Instruct model to function
- May generate invalid JSON in some edge cases
- Performance depends on the quality of the training data

## License

This model is released under the MIT License.
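Since the model may emit invalid JSON in some edge cases (see Limitations), downstream code should parse its output defensively. Below is a minimal, hedged sketch of such a parser; the `extract_json` helper is a hypothetical example, not part of this release or its tooling. It scans the generated text for the first `{`, then tries progressively shorter spans ending at a `}` until one parses.

```python
import json

def extract_json(text):
    """Extract and parse the first JSON object found in generated text.

    Returns the parsed dict, or None if no valid JSON object is present.
    Hypothetical helper for defensive parsing of model output.
    """
    # Locate the first opening brace; bail out if there is none.
    start = text.find("{")
    if start == -1:
        return None
    # Try candidate spans from longest to shortest, each ending at a
    # closing brace, and return the first one that parses cleanly.
    for end in range(len(text), start, -1):
        if text[end - 1] != "}":
            continue
        try:
            return json.loads(text[start:end])
        except json.JSONDecodeError:
            continue
    return None

# Example: recover the trace object even when the model adds surrounding prose
raw = 'Sure, here is the trace: {"trace_id": "002", "steps": []} Done.'
trace = extract_json(raw)
```

Trying the longest span first means nested objects (such as the `steps` array above) are kept intact rather than truncated at an inner closing brace.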