llama-traces / README.md
pavan01729's picture
Update model card with comprehensive documentation
e0396b0 verified
---
language:
- en
license: mit
tags:
- lora
- tool-calling
- llama3
- instruction-tuning
- json-generation
base_model: meta-llama/Meta-Llama-3-8B-Instruct
---
# Tool-Calling LoRA for LLaMA-3-8B-Instruct
This is a LoRA (Low-Rank Adaptation) model fine-tuned on tool-calling datasets to enhance the model's ability to generate structured JSON responses for tool execution.
## Model Details
- **Base Model**: meta-llama/Meta-Llama-3-8B-Instruct
- **Fine-tuning Method**: LoRA (Low-Rank Adaptation)
- **LoRA Rank**: 16
- **LoRA Alpha**: 32
- **Training Dataset**: Custom tool-calling dataset with 357 samples
- **Training Epochs**: 5
- **Learning Rate**: 5.0e-5
## Usage
### Load the Model
```python
from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel
# Load base model and tokenizer
base_model = AutoModelForCausalLM.from_pretrained(
"meta-llama/Meta-Llama-3-8B-Instruct",
torch_dtype=torch.bfloat16,
device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B-Instruct")
# Load and merge LoRA
model = PeftModel.from_pretrained(base_model, "YOUR_USERNAME/llama-traces")
model = model.merge_and_unload()
# Generate tool-calling responses
def generate_tool_call(prompt):
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(
**inputs,
max_new_tokens=512,
temperature=0.7,
do_sample=True,
pad_token_id=tokenizer.eos_token_id
)
return tokenizer.decode(outputs[0], skip_special_tokens=True)
# Example usage
prompt = "Check the weather in New York"
response = generate_tool_call(prompt)
print(response)
```
### Expected Output Format
The model generates structured JSON responses like:
```json
{
"trace_id": "002",
"steps": [
{
"action": "call_api",
"api": "weather_api",
"arguments": {"location": "New York"}
},
{
"action": "respond",
"message": "The weather in New York is currently sunny with a temperature of 72°F."
}
]
}
```
## Training Details
- **Dataset**: Custom tool-calling dataset with instruction/input/output format
- **Template**: llama3 chat template
- **Cutoff Length**: 4096 tokens
- **Batch Size**: 2 (effective batch size: 8 with gradient accumulation)
- **Optimizer**: AdamW with cosine learning rate scheduling
- **Warmup Ratio**: 0.1
## Performance
The model shows improved capability in:
- Generating structured JSON responses
- Following tool-calling patterns
- Maintaining context for multi-step tool execution
- Producing consistent output formats
## Limitations
- Requires the base LLaMA-3-8B-Instruct model to function
- May generate invalid JSON in some edge cases
- Performance depends on the quality of the training data
## License
This model is released under the MIT License.