Qwen2.5-7B-Agent-Trajectory-LoRA AWQ 4bit

This repository provides an AWQ 4-bit quantized version of UtsuSl0th/trajectory-lora-repo, a LoRA adapter fine-tuned from unsloth/Qwen2.5-7B-Instruct with LoRA + Unsloth and merged into a single standalone model before quantization.

Note: This is the quantized, ready-to-run version. The original LoRA adapter weights (non-quantized) are available at the base repository above.

What is AWQ?

AWQ (Activation-aware Weight Quantization) is a 4-bit quantization method that preserves model quality by identifying and protecting the most important weights. This quantization was performed using AutoAWQ with the following configuration:

| Parameter  | Value |
|------------|-------|
| Bits       | 4     |
| Group size | 128   |
| Zero point | True  |
| Version    | GEMM  |
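
The configuration above maps directly onto an AutoAWQ `quant_config` dictionary. A minimal sketch of such a quantization run is shown below; the output path is illustrative, and the calibration defaults are AutoAWQ's, not details confirmed by this repository:

```python
# AutoAWQ quantization settings matching the table above.
quant_config = {
    "zero_point": True,    # asymmetric quantization with zero points
    "q_group_size": 128,   # weights quantized in groups of 128
    "w_bit": 4,            # 4-bit weights
    "version": "GEMM",     # GEMM kernel variant
}

def quantize(model_path="UtsuSl0th/trajectory-lora-repo",
             out_path="trajectory-lora-repo-AWQ"):
    """Sketch of the quantization step; requires autoawq and a GPU."""
    from awq import AutoAWQForCausalLM
    from transformers import AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_path)
    model = AutoAWQForCausalLM.from_pretrained(model_path)
    model.quantize(tokenizer, quant_config=quant_config)  # calibrates and quantizes
    model.save_quantized(out_path)
    tokenizer.save_pretrained(out_path)
```
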

Training Objective (Original Model)

The source adapter was trained to improve multi-turn agent task performance on ALFWorld (household tasks) and DBBench (database operations).

Loss was applied to all assistant turns in the multi-turn trajectory, enabling the model to learn environment observation, action selection, tool use, and recovery from errors.
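
As an illustration of that objective, here is a minimal, framework-agnostic sketch of multi-turn loss masking: tokens from assistant turns keep their labels, while all other tokens are set to -100 (the ignore index used by PyTorch-style cross-entropy). The token IDs and turn layout are toy values, not the actual dataset:

```python
# Toy multi-turn trajectory: (role, token_ids) pairs after tokenization.
turns = [
    ("system",    [101, 102]),
    ("user",      [201, 202, 203]),   # environment observation
    ("assistant", [301, 302]),        # action selection / tool use
    ("user",      [204, 205]),        # next observation
    ("assistant", [303, 304, 305]),   # recovery / next action
]

IGNORE_INDEX = -100  # skipped by cross-entropy loss

input_ids, labels = [], []
for role, ids in turns:
    input_ids.extend(ids)
    # Loss is applied to all assistant turns; other roles are masked out.
    labels.extend(ids if role == "assistant" else [IGNORE_INDEX] * len(ids))

print(labels)
# Assistant tokens keep their IDs; system/user tokens become -100.
```
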

Training Configuration (Original LoRA)

| Parameter              | Value |
|------------------------|-------|
| Base model             | unsloth/Qwen2.5-7B-Instruct |
| Method                 | LoRA + Unsloth (Colab Pro A100) |
| Dataset                | u-10bei/sft_alfworld_trajectory_dataset_v5 |
| Max sequence length    | 4096 |
| Epochs                 | 2 |
| Learning rate          | 2e-5 |
| LoRA r                 | 64 |
| LoRA alpha             | 128 |
| LoRA dropout           | 0 |
| LoRA target modules    | q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj |
| Per-device batch size  | 4 |
| Gradient accumulation  | 4 (effective batch size: 16) |
| Warmup ratio           | 0.1 |
| Weight decay           | 0.05 |
| Seed                   | 3407 |
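
The LoRA rows of the table translate into a PEFT `LoraConfig` along these lines; this is a sketch of the adapter configuration only, not the full Unsloth training script:

```python
# LoRA hyperparameters from the table above.
LORA_R = 64
LORA_ALPHA = 128
LORA_DROPOUT = 0.0
TARGET_MODULES = ["q_proj", "k_proj", "v_proj", "o_proj",
                  "gate_proj", "up_proj", "down_proj"]

def build_lora_config():
    """Builds the adapter config; requires the peft package."""
    from peft import LoraConfig
    return LoraConfig(
        r=LORA_R,
        lora_alpha=LORA_ALPHA,
        lora_dropout=LORA_DROPOUT,
        target_modules=TARGET_MODULES,
        task_type="CAUSAL_LM",
    )

# Effective batch size: 4 per device x 4 accumulation steps = 16.
EFFECTIVE_BATCH_SIZE = 4 * 4
```
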

Usage

With vLLM (recommended for fastest inference)

from vllm import LLM, SamplingParams

llm = LLM(
    model="UtsuSl0th/trajectory-lora-repo-AWQ",
    quantization="awq",
    dtype="auto",
)

sampling_params = SamplingParams(temperature=0.7, max_tokens=512)
outputs = llm.generate(["Your prompt here"], sampling_params)
print(outputs[0].outputs[0].text)

With AutoAWQ + Transformers

from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer

model_id = "UtsuSl0th/trajectory-lora-repo-AWQ"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoAWQForCausalLM.from_quantized(
    model_id,
    fuse_layers=True,
    trust_remote_code=False,
    safetensors=True,
)

inputs = tokenizer("Your prompt here", return_tensors="pt").to("cuda")
output = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(output[0], skip_special_tokens=True))

With Transformers (standard pipeline)

from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "UtsuSl0th/trajectory-lora-repo-AWQ"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
)

inputs = tokenizer("Your prompt here", return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(output[0], skip_special_tokens=True))

Sources & Terms (IMPORTANT)

Dataset license: MIT. The training dataset is used and distributed under the terms of the MIT License.
Compliance: Users must comply with the MIT License (including retention of the copyright notice) and with the base model's original terms of use.

