# Qwen2.5-7B-Agent-Trajectory-LoRA (AWQ 4-bit)
This repository provides an AWQ 4-bit quantized version of UtsuSl0th/trajectory-lora-repo, a LoRA adapter fine-tuned from unsloth/Qwen2.5-7B-Instruct with Unsloth and subsequently merged into a single standalone model.
Note: This is the quantized, ready-to-run version. The original LoRA adapter weights (non-quantized) are available at the base repository above.
## What is AWQ?
AWQ (Activation-aware Weight Quantization) is a 4-bit quantization method that preserves model quality by identifying and protecting the most important weights. This quantization was performed using AutoAWQ with the following configuration:
| Parameter | Value |
|---|---|
| Bits | 4 |
| Group size | 128 |
| Zero point | True |
| Version | GEMM |
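The table above maps directly onto an AutoAWQ `quant_config` dictionary. The following is a sketch of that recipe, not the exact script used to produce this repository; the commented call sequence assumes access to the merged FP16 model and a GPU:

```python
# Quantization recipe (sketch): the table above expressed as an
# AutoAWQ quant_config dictionary.
quant_config = {
    "w_bit": 4,           # Bits
    "q_group_size": 128,  # Group size
    "zero_point": True,   # Zero point
    "version": "GEMM",    # Kernel version
}

# Applied roughly as follows (requires the merged FP16 model and a GPU):
#   from awq import AutoAWQForCausalLM
#   from transformers import AutoTokenizer
#   model = AutoAWQForCausalLM.from_pretrained(fp16_model_path)
#   tokenizer = AutoTokenizer.from_pretrained(fp16_model_path)
#   model.quantize(tokenizer, quant_config=quant_config)
#   model.save_quantized(output_path)
```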
## Training Objective (Original Model)
The source adapter was trained to improve multi-turn agent task performance on ALFWorld (household tasks) and DBBench (database operations).
Loss was applied to all assistant turns in the multi-turn trajectory, so the model learns to interpret environment observations, select actions, call tools, and recover from errors.
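In practice, "loss on all assistant turns" means the labels for non-assistant tokens are set to the ignore index (-100) so cross-entropy skips them. Below is a minimal, self-contained sketch of that masking; the token ids and the `build_labels` helper are illustrative (the actual pipeline tokenizes with the Qwen2.5 chat template):

```python
# Sketch of supervision masking for a multi-turn agent trajectory.
# Tokens from assistant turns keep their ids as labels (loss applies);
# everything else is replaced with -100 (ignored by cross-entropy).
IGNORE_INDEX = -100

def build_labels(turns):
    """turns: list of (role, token_ids). Returns (input_ids, labels)."""
    input_ids, labels = [], []
    for role, token_ids in turns:
        input_ids.extend(token_ids)
        if role == "assistant":
            labels.extend(token_ids)  # supervised span
        else:
            labels.extend([IGNORE_INDEX] * len(token_ids))  # masked span
    return input_ids, labels

# Illustrative trajectory (token ids are fake):
trajectory = [
    ("user", [101, 102]),            # environment observation
    ("assistant", [201, 202, 203]),  # action / tool call
    ("user", [103]),                 # new observation
    ("assistant", [204, 205]),       # recovery / next action
]
ids, labels = build_labels(trajectory)
```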
## Training Configuration (Original LoRA)
| Parameter | Value |
|---|---|
| Base model | unsloth/Qwen2.5-7B-Instruct |
| Method | LoRA + Unsloth (Colab Pro A100) |
| Dataset | u-10bei/sft_alfworld_trajectory_dataset_v5 |
| Max sequence length | 4096 |
| Epochs | 2 |
| Learning rate | 2e-5 |
| LoRA r | 64 |
| LoRA alpha | 128 |
| LoRA dropout | 0 |
| LoRA target modules | q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj |
| Per-device batch size | 4 |
| Gradient accumulation | 4 (effective batch size: 16) |
| Warmup ratio | 0.1 |
| Weight decay | 0.05 |
| Seed | 3407 |
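The LoRA hyperparameters above can be expressed as a `peft` `LoraConfig` for reproduction. This is a hedged reconstruction, not the original training script: the run used Unsloth's wrappers, so the exact call sites differ, and only the values shown in the table are taken from the source.

```python
# Reconstruction of the adapter configuration in peft terms (sketch).
# Note alpha/r = 128/64 = 2, so adapter updates are scaled by 2.
from peft import LoraConfig

lora_config = LoraConfig(
    r=64,
    lora_alpha=128,
    lora_dropout=0.0,
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
    task_type="CAUSAL_LM",
)
```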
## Usage

### With vLLM (recommended: fastest inference)
```python
from vllm import LLM, SamplingParams

llm = LLM(
    model="UtsuSl0th/trajectory-lora-repo-AWQ",
    quantization="awq",
    dtype="auto",
)

sampling_params = SamplingParams(temperature=0.7, max_tokens=512)
outputs = llm.generate(["Your prompt here"], sampling_params)
print(outputs[0].outputs[0].text)
```
### With AutoAWQ + Transformers
```python
from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer

model_id = "UtsuSl0th/trajectory-lora-repo-AWQ"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoAWQForCausalLM.from_quantized(
    model_id,
    fuse_layers=True,
    trust_remote_code=False,
    safetensors=True,
)

inputs = tokenizer("Your prompt here", return_tensors="pt").to("cuda")
output = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```
### With Transformers (standard pipeline)
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "UtsuSl0th/trajectory-lora-repo-AWQ"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
)

inputs = tokenizer("Your prompt here", return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```
## Sources & Terms (IMPORTANT)
- Original adapter: UtsuSl0th/trajectory-lora-repo
- Training data: u-10bei/sft_alfworld_trajectory_dataset_v5
- Dataset license: MIT License. The dataset is used and distributed under the terms of the MIT License.
- Compliance: users must comply with the MIT License (including preservation of the copyright notice) and with the base model's original terms of use.