# Qwen2.5-7B-Agent-Mixed-Trajectory-AWQ
This repository provides a 4-bit AWQ-quantized version of a merged model fine-tuned from unsloth/Qwen2.5-7B-Instruct using LoRA + Unsloth.
The original LoRA adapter was trained on mixed agent trajectory data (ALFWorld + DBBench), then merged into the base model and quantized with AutoAWQ for faster inference.
## Quantization Details
| Parameter | Value |
|---|---|
| Method | AWQ (Activation-aware Weight Quantization) |
| Bits | 4-bit |
| Group size | 128 |
| Zero point | True |
| Version | GEMM |
| Library | autoawq 0.2.7.post3 |
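The exact quantization script is not published; the following is a minimal sketch of how the table above maps onto AutoAWQ's `quant_config` keys, with the heavyweight steps (which require a GPU and the merged full-precision model) shown as comments only.

```python
# AutoAWQ quantization config mirroring the table above
quant_config = {
    "w_bit": 4,           # 4-bit weights
    "q_group_size": 128,  # group size 128
    "zero_point": True,   # asymmetric quantization with a zero point
    "version": "GEMM",    # GEMM kernel variant
}

# The quantization itself would run roughly as follows (paths are
# placeholders; requires a GPU, so shown here for reference only):
#
#   from awq import AutoAWQForCausalLM
#   from transformers import AutoTokenizer
#
#   model = AutoAWQForCausalLM.from_pretrained("path/to/merged-model")
#   tokenizer = AutoTokenizer.from_pretrained("path/to/merged-model")
#   model.quantize(tokenizer, quant_config=quant_config)
#   model.save_quantized("path/to/awq-model")

print(quant_config)
```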
## Dataset Construction

Training data was built by mixing and preprocessing two trajectory datasets:

- ALFWorld (`u-10bei/sft_alfworld_trajectory_dataset_v5`): 2,327 samples after cleaning
- DBBench (`u-10bei/dbbench_sft_dataset_react_v4`): 1,200 samples after cleaning
Preprocessing steps applied to ALFWorld:

- Removal of `htags` template contamination
- Removal of hallucinated object IDs (e.g. `bowl 99`)
Category-level upsampling was applied to reinforce weak task types:
| Category | Multiplier |
|---|---|
| ALFWorld multi-object | ×3 |
| ALFWorld cool | ×2 |
| ALFWorld examine | ×1.5 |
| DBBench aggregation-MAX | ×3 |
| DBBench INSERT | ×2 |
| DBBench counting | ×2 |
Final dataset size: 5,169 samples
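The upsampling above can be sketched as follows. This is an illustrative assumption, not the published preprocessing code: fractional multipliers (e.g. ×1.5) are taken to mean "keep all originals, then duplicate the corresponding fraction of the category's samples".

```python
def upsample(samples, multiplier):
    """Repeat samples so the result has ~len(samples) * multiplier items."""
    whole, frac = divmod(multiplier, 1)
    out = samples * int(whole)              # full copies
    out += samples[: round(len(samples) * frac)]  # fractional remainder
    return out

# Example: the ALFWorld "examine" category at x1.5
examine = [f"examine_{i}" for i in range(10)]
print(len(upsample(examine, 1.5)))  # 15
print(len(upsample(examine, 3)))    # 30
```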
## Training Configuration
| Parameter | Value |
|---|---|
| Base model | unsloth/Qwen2.5-7B-Instruct |
| Method | LoRA + Unsloth (Colab Pro A100) |
| Max sequence length | 4096 |
| Epochs | 3 |
| Learning rate | 8e-6 |
| LoRA r / alpha | 64 / 128 |
| Effective batch size | 16 (bs=4 × grad_accum=4) |
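Two derived quantities from the table are worth making explicit; the sketch below (plain arithmetic, with hyperparameter names assumed to follow the usual transformers/TRL conventions) shows how they fall out:

```python
# Effective batch size = per-device batch size x gradient accumulation steps
# (single-GPU training assumed)
per_device_train_batch_size = 4
gradient_accumulation_steps = 4
effective_batch_size = per_device_train_batch_size * gradient_accumulation_steps
print(effective_batch_size)  # 16

# LoRA scaling factor = alpha / r; alpha = 2r is a common choice
lora_r, lora_alpha = 64, 128
print(lora_alpha / lora_r)  # 2.0
```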
## Usage

```python
from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer

model_id = "UtsuSl0th/mixed-lora-1-awq"

# Load the tokenizer and the AWQ-quantized weights
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoAWQForCausalLM.from_quantized(
    model_id,
    device_map="auto",
    fuse_layers=True,  # fuse attention/MLP layers for faster inference
)

# Generate a completion
inputs = tokenizer("Your prompt here", return_tensors="pt").to("cuda")
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
## Sources & Terms

Dataset license: MIT. Users must comply with the MIT License and the base model's original terms of use.