---
base_model: unsloth/Qwen2.5-7B-Instruct
datasets:
- u-10bei/sft_alfworld_trajectory_dataset_v5
- u-10bei/dbbench_sft_dataset_react_v4
language:
- en
license: apache-2.0
library_name: autoawq
pipeline_tag: text-generation
tags:
- awq
- 4bit
- quantized
- agent
- tool-use
- alfworld
- dbbench
---

# Qwen2.5-7B-Agent-Mixed-Trajectory-AWQ v3

This repository provides a **4-bit AWQ quantized** version of a merged model fine-tuned from **unsloth/Qwen2.5-7B-Instruct** using **LoRA + Unsloth**. The original LoRA adapter was trained on mixed agent trajectory data (ALFWorld + DBBench), merged into the base model, and then quantized with AutoAWQ for faster inference.

## Quantization Details

| Parameter | Value |
|---|---|
| Method | AWQ (Activation-aware Weight Quantization) |
| Bits | 4-bit |
| Group size | 128 |
| Zero point | True |
| Version | GEMM |
| Library | autoawq 0.2.7.post3 |

## Dataset Construction (v3)

Training data was built by mixing and preprocessing two trajectory datasets:

- **ALFWorld** (`u-10bei/sft_alfworld_trajectory_dataset_v5`): 1,845 samples after cleaning and success-only filtering
- **DBBench** (`u-10bei/dbbench_sft_dataset_react_v4`): 1,200 samples after cleaning

Preprocessing steps:

- Removal of htags template contamination
- Removal of hallucinated object IDs (e.g. `bowl 99`), ALFWorld only
- **[v3 new]** ALFWorld failed trajectories excluded (success-only filtering): 2,327 → 1,845 samples

Category-level upsampling was applied to reinforce weak task types:

| Category | Multiplier |
|---|---|
| ALFWorld multi-object | ×3 |
| ALFWorld cool | ×2 |
| ALFWorld examine | ×1.5 |
| DBBench aggregation-MAX | ×3 |
| DBBench INSERT | ×2 |
| DBBench counting | ×2 |

Final dataset size: **4,687 samples**

## Training Configuration

| Parameter | Value |
|---|---|
| Base model | unsloth/Qwen2.5-7B-Instruct |
| Method | LoRA + Unsloth (Colab Pro L4) |
| Max sequence length | 4096 |
| Epochs | 3 |
| Learning rate | 8e-6 |
| LoRA r / alpha | 64 / 128 |
| Effective batch size | 16 (bs=2 × grad_accum=8) |
| load_in_4bit | True |

## Usage

```python
from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer

model_id = "UtsuSl0th/mixed-lora-3-awq"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoAWQForCausalLM.from_quantized(
    model_id,
    device_map="auto",
    fuse_layers=True,
)

inputs = tokenizer("Your prompt here", return_tensors="pt").to("cuda")
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

## Sources & Terms

Dataset license: MIT. Users must comply with the MIT license and the base model's original terms of use.
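## Appendix: Upsampling Sketch (Illustrative)

The category-level upsampling described in the dataset section can be sketched in plain Python. This is a minimal illustration, not the actual training pipeline: the category key names, the `category` field on each sample, and the handling of fractional multipliers (repeat ⌊m⌋ times, then add one extra copy with probability equal to the fractional part) are all assumptions.

```python
import random

# Multipliers from the model card's upsampling table
# (the category key strings themselves are assumed names).
MULTIPLIERS = {
    "alfworld/multi-object": 3.0,
    "alfworld/cool": 2.0,
    "alfworld/examine": 1.5,
    "dbbench/aggregation-max": 3.0,
    "dbbench/insert": 2.0,
    "dbbench/counting": 2.0,
}

def upsample(samples, multipliers, seed=0):
    """Repeat each sample floor(m) times; with probability frac(m),
    emit one additional copy (one plausible way to honor a x1.5 factor)."""
    rng = random.Random(seed)
    out = []
    for s in samples:
        m = multipliers.get(s["category"], 1.0)  # unlisted categories: x1
        whole = int(m)
        frac = m - whole
        out.extend([s] * whole)
        if rng.random() < frac:
            out.append(s)
    return out

# Toy mix: one x3 category, one x2 category, one unlisted (x1) category.
toy = [
    {"category": "alfworld/multi-object"},
    {"category": "dbbench/insert"},
    {"category": "other"},
]
print(len(upsample(toy, MULTIPLIERS)))  # 3 + 2 + 1 = 6
```

With integer multipliers the result is deterministic; only fractional multipliers such as ×1.5 introduce the seeded coin flip, so two runs with the same seed reproduce the same dataset.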