UtsuSl0th committed
Commit 3adec59 · verified · 1 Parent(s): 9cc61f0

Upload README.md

Files changed (1):
  1. README.md +91 -12
README.md CHANGED
@@ -1,21 +1,100 @@
  ---
  base_model: unsloth/Qwen2.5-7B-Instruct
- tags:
- - text-generation-inference
- - transformers
- - unsloth
- - qwen2
- license: apache-2.0
  language:
  - en
  ---

- # Uploaded finetuned model

- - **Developed by:** UtsuSl0th
- - **License:** apache-2.0
- - **Finetuned from model :** unsloth/Qwen2.5-7B-Instruct

- This qwen2 model was trained 2x faster with [Unsloth](https://github.com/unslothai/unsloth) and Huggingface's TRL library.

- [<img src="https://raw.githubusercontent.com/unslothai/unsloth/main/images/unsloth%20made%20with%20love.png" width="200"/>](https://github.com/unslothai/unsloth)
  ---
  base_model: unsloth/Qwen2.5-7B-Instruct
+ datasets:
+ - u-10bei/sft_alfworld_trajectory_dataset_v5
+ - u-10bei/dbbench_sft_dataset_react_v4
  language:
  - en
+ license: apache-2.0
+ library_name: autoawq
+ pipeline_tag: text-generation
+ tags:
+ - awq
+ - 4bit
+ - quantized
+ - agent
+ - tool-use
+ - alfworld
+ - dbbench
  ---

+ # Qwen2.5-7B-Agent-Mixed-Trajectory-AWQ v3
+
+ This repository provides a **4-bit AWQ quantized** version of a merged model fine-tuned from
+ **unsloth/Qwen2.5-7B-Instruct** using **LoRA + Unsloth**.
+
+ The original LoRA adapter was trained on mixed agent trajectory data (ALFWorld + DBBench),
+ then merged into the base model and quantized with AutoAWQ for faster inference.
+
+ ## Quantization Details
+
+ | Parameter | Value |
+ |---|---|
+ | Method | AWQ (Activation-aware Weight Quantization) |
+ | Bits | 4-bit |
+ | Group size | 128 |
+ | Zero point | True |
+ | Version | GEMM |
+ | Library | autoawq 0.2.7.post3 |
+
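The table above maps directly onto AutoAWQ's `quant_config` dictionary. Below is a minimal sketch of how the quantization step may have looked; the model path and output directory are illustrative assumptions, not taken from the repo's actual script:

```python
# Quantization settings matching the table above (AutoAWQ key names).
quant_config = {
    "zero_point": True,   # asymmetric quantization with a zero point
    "q_group_size": 128,  # per-group quantization granularity
    "w_bit": 4,           # 4-bit weights
    "version": "GEMM",    # GEMM kernel variant
}

# The actual quantization run (requires a GPU and the merged FP16 model;
# paths below are hypothetical):
# from awq import AutoAWQForCausalLM
# from transformers import AutoTokenizer
# model = AutoAWQForCausalLM.from_pretrained("path/to/merged-fp16-model")
# tokenizer = AutoTokenizer.from_pretrained("path/to/merged-fp16-model")
# model.quantize(tokenizer, quant_config=quant_config)
# model.save_quantized("mixed-lora-3-awq")
```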
+ ## Dataset Construction (v3)
+
+ Training data was built by mixing and preprocessing two trajectory datasets:
+ - **ALFWorld** (`u-10bei/sft_alfworld_trajectory_dataset_v5`): 1,845 samples after cleaning and success-only filtering
+ - **DBBench** (`u-10bei/dbbench_sft_dataset_react_v4`): 1,200 samples after cleaning
+
+ Preprocessing steps:
+ - Removal of htags template contamination
+ - Removal of hallucinated object IDs (e.g. `bowl 99`; ALFWorld only)
+ - **[v3 new]** ALFWorld failed trajectories excluded (success-only filtering): 2,327 → 1,845 samples
+
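As an illustration of the hallucinated-object-ID step, a filter over ALFWorld-style trajectories might look like the following. The regex, index threshold, and function name are assumptions for illustration; the repo's actual cleaning script is not published:

```python
import re

# ALFWorld actions reference objects as "<name> <index>" (e.g. "bowl 2").
# A hallucinated reference like "bowl 99" uses an index far beyond anything
# the environment actually listed. The threshold here is illustrative.
OBJECT_ID = re.compile(r"\b([a-z]+) (\d+)\b")

def has_hallucinated_id(trajectory: str, max_index: int = 30) -> bool:
    """Return True if any referenced object index looks hallucinated."""
    return any(int(idx) > max_index for _, idx in OBJECT_ID.findall(trajectory))
```

Trajectories flagged by such a check would be dropped before mixing.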
+ Category-level upsampling was applied to reinforce weak task types:
+
+ | Category | Multiplier |
+ |---|---|
+ | ALFWorld multi-object | ×3 |
+ | ALFWorld cool | ×2 |
+ | ALFWorld examine | ×1.5 |
+ | DBBench aggregation-MAX | ×3 |
+ | DBBench INSERT | ×2 |
+ | DBBench counting | ×2 |
+
+ Final dataset size: **4,687 samples**
+
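The category-level upsampling above, including the fractional ×1.5 multiplier, can be sketched as follows; the function and the category keys are illustrative, not the repo's actual code:

```python
import random

# Hypothetical category keys mirroring the multiplier table above.
MULTIPLIERS = {
    "alfworld_multi_object": 3.0,
    "alfworld_cool": 2.0,
    "alfworld_examine": 1.5,
    "dbbench_aggregation_max": 3.0,
    "dbbench_insert": 2.0,
    "dbbench_counting": 2.0,
}

def upsample(samples, multiplier, rng=None):
    """Repeat samples floor(multiplier) times, then add a random subset
    covering the fractional remainder (so x1.5 yields 1.5x the samples)."""
    rng = rng or random.Random(0)
    whole, frac = int(multiplier), multiplier - int(multiplier)
    out = samples * whole
    if frac > 0:
        out.extend(rng.sample(samples, round(len(samples) * frac)))
    return out
```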
+ ## Training Configuration
+
+ | Parameter | Value |
+ |---|---|
+ | Base model | unsloth/Qwen2.5-7B-Instruct |
+ | Method | LoRA + Unsloth (Colab Pro L4) |
+ | Max sequence length | 4096 |
+ | Epochs | 3 |
+ | Learning rate | 8e-6 |
+ | LoRA r / alpha | 64 / 128 |
+ | Effective batch size | 16 (bs=2 × grad_accum=8) |
+ | load_in_4bit | True |
+
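For reference, the table translates into roughly the following hyperparameter layout. This is a sketch only: the keyword names follow common Unsloth/TRL conventions and the repo's actual training script is not published:

```python
# LoRA and trainer hyperparameters from the table above (illustrative layout).
lora_kwargs = dict(r=64, lora_alpha=128)
training_kwargs = dict(
    num_train_epochs=3,
    learning_rate=8e-6,
    per_device_train_batch_size=2,
    gradient_accumulation_steps=8,
    max_seq_length=4096,
)

# Effective batch size = per-device batch size x gradient accumulation steps.
effective_bs = (training_kwargs["per_device_train_batch_size"]
                * training_kwargs["gradient_accumulation_steps"])

# Typical Unsloth flow (requires a GPU; shown for orientation only):
# from unsloth import FastLanguageModel
# model, tokenizer = FastLanguageModel.from_pretrained(
#     "unsloth/Qwen2.5-7B-Instruct", max_seq_length=4096, load_in_4bit=True)
# model = FastLanguageModel.get_peft_model(model, **lora_kwargs)
```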
+ ## Usage
+
+ ```python
+ from awq import AutoAWQForCausalLM
+ from transformers import AutoTokenizer
+
+ model_id = "UtsuSl0th/mixed-lora-3-awq"
+
+ tokenizer = AutoTokenizer.from_pretrained(model_id)
+ model = AutoAWQForCausalLM.from_quantized(
+     model_id,
+     device_map="auto",
+     fuse_layers=True,
+ )
+
+ inputs = tokenizer("Your prompt here", return_tensors="pt").to("cuda")
+ outputs = model.generate(**inputs, max_new_tokens=256)
+ print(tokenizer.decode(outputs[0], skip_special_tokens=True))
+ ```
+
+ ## Sources & Terms
+
+ The training datasets are distributed under the MIT License.
+ Users must comply with the MIT License and with the base model's original terms of use (Apache-2.0).