UtsuSl0th committed
Commit 3adec59 · verified · 1 Parent(s): 9cc61f0

Upload README.md

Files changed (1):
  1. README.md +91 -12
README.md CHANGED
@@ -1,21 +1,100 @@
  ---
  base_model: unsloth/Qwen2.5-7B-Instruct
- tags:
- - text-generation-inference
- - transformers
- - unsloth
- - qwen2
- license: apache-2.0
  language:
  - en
  ---

- # Uploaded finetuned model

- - **Developed by:** UtsuSl0th
- - **License:** apache-2.0
- - **Finetuned from model :** unsloth/Qwen2.5-7B-Instruct

- This qwen2 model was trained 2x faster with [Unsloth](https://github.com/unslothai/unsloth) and Huggingface's TRL library.

- [<img src="https://raw.githubusercontent.com/unslothai/unsloth/main/images/unsloth%20made%20with%20love.png" width="200"/>](https://github.com/unslothai/unsloth)
  ---
  base_model: unsloth/Qwen2.5-7B-Instruct
+ datasets:
+ - u-10bei/sft_alfworld_trajectory_dataset_v5
+ - u-10bei/dbbench_sft_dataset_react_v4
  language:
  - en
+ license: apache-2.0
+ library_name: autoawq
+ pipeline_tag: text-generation
+ tags:
+ - awq
+ - 4bit
+ - quantized
+ - agent
+ - tool-use
+ - alfworld
+ - dbbench
  ---

+ # Qwen2.5-7B-Agent-Mixed-Trajectory-AWQ v3
+
+ This repository provides a **4-bit AWQ quantized** version of a merged model fine-tuned from
+ **unsloth/Qwen2.5-7B-Instruct** using **LoRA + Unsloth**.
+
+ The original LoRA adapter was trained on mixed agent trajectory data (ALFWorld + DBBench),
+ then merged into the base model and quantized with AutoAWQ for faster inference.
+
+ ## Quantization Details
+
+ | Parameter | Value |
+ |---|---|
+ | Method | AWQ (Activation-aware Weight Quantization) |
+ | Bits | 4-bit |
+ | Group size | 128 |
+ | Zero point | True |
+ | Version | GEMM |
+ | Library | autoawq 0.2.7.post3 |
+
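The table above maps directly onto AutoAWQ's `quant_config` dictionary. Below is a minimal sketch of how the quantization step may have looked; the model path and output directory are illustrative assumptions, not taken from the repo's actual script:

```python
# Quantization settings matching the table above (AutoAWQ key names).
quant_config = {
    "zero_point": True,   # asymmetric quantization with a zero point
    "q_group_size": 128,  # per-group quantization granularity
    "w_bit": 4,           # 4-bit weights
    "version": "GEMM",    # GEMM kernel variant
}

# The actual quantization run (requires a GPU and the merged FP16 model;
# paths below are hypothetical):
# from awq import AutoAWQForCausalLM
# from transformers import AutoTokenizer
# model = AutoAWQForCausalLM.from_pretrained("path/to/merged-fp16-model")
# tokenizer = AutoTokenizer.from_pretrained("path/to/merged-fp16-model")
# model.quantize(tokenizer, quant_config=quant_config)
# model.save_quantized("mixed-lora-3-awq")
```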
+ ## Dataset Construction (v3)
+
+ Training data was built by mixing and preprocessing two trajectory datasets:
+ - **ALFWorld** (`u-10bei/sft_alfworld_trajectory_dataset_v5`): 1,845 samples after cleaning and success-only filtering
+ - **DBBench** (`u-10bei/dbbench_sft_dataset_react_v4`): 1,200 samples after cleaning
+
+ Preprocessing steps:
+ - Removal of htags template contamination
+ - Removal of hallucinated object IDs (e.g. `bowl 99`; ALFWorld only)
+ - **[v3 new]** ALFWorld failed trajectories excluded (success-only filtering): 2,327 → 1,845 samples
+
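As an illustration of the hallucinated-object-ID step, a filter over ALFWorld-style trajectories might look like the following. The regex, index threshold, and function name are assumptions for illustration; the repo's actual cleaning script is not published:

```python
import re

# ALFWorld actions reference objects as "<name> <index>" (e.g. "bowl 2").
# A hallucinated reference like "bowl 99" uses an index far beyond anything
# the environment actually listed. The threshold here is illustrative.
OBJECT_ID = re.compile(r"\b([a-z]+) (\d+)\b")

def has_hallucinated_id(trajectory: str, max_index: int = 30) -> bool:
    """Return True if any referenced object index looks hallucinated."""
    return any(int(idx) > max_index for _, idx in OBJECT_ID.findall(trajectory))
```

Trajectories flagged by such a check would be dropped before mixing.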
+ Category-level upsampling was applied to reinforce weak task types:
+
+ | Category | Multiplier |
+ |---|---|
+ | ALFWorld multi-object | ×3 |
+ | ALFWorld cool | ×2 |
+ | ALFWorld examine | ×1.5 |
+ | DBBench aggregation-MAX | ×3 |
+ | DBBench INSERT | ×2 |
+ | DBBench counting | ×2 |
+
+ Final dataset size: **4,687 samples**
+
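The category-level upsampling above, including the fractional ×1.5 multiplier, can be sketched as follows; the function and the category keys are illustrative, not the repo's actual code:

```python
import random

# Hypothetical category keys mirroring the multiplier table above.
MULTIPLIERS = {
    "alfworld_multi_object": 3.0,
    "alfworld_cool": 2.0,
    "alfworld_examine": 1.5,
    "dbbench_aggregation_max": 3.0,
    "dbbench_insert": 2.0,
    "dbbench_counting": 2.0,
}

def upsample(samples, multiplier, rng=None):
    """Repeat samples floor(multiplier) times, then add a random subset
    covering the fractional remainder (so x1.5 yields 1.5x the samples)."""
    rng = rng or random.Random(0)
    whole, frac = int(multiplier), multiplier - int(multiplier)
    out = samples * whole
    if frac > 0:
        out.extend(rng.sample(samples, round(len(samples) * frac)))
    return out
```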
+ ## Training Configuration
+
+ | Parameter | Value |
+ |---|---|
+ | Base model | unsloth/Qwen2.5-7B-Instruct |
+ | Method | LoRA + Unsloth (Colab Pro L4) |
+ | Max sequence length | 4096 |
+ | Epochs | 3 |
+ | Learning rate | 8e-6 |
+ | LoRA r / alpha | 64 / 128 |
+ | Effective batch size | 16 (bs=2 × grad_accum=8) |
+ | load_in_4bit | True |
+
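For reference, the table translates into roughly the following hyperparameter layout. This is a sketch only: the keyword names follow common Unsloth/TRL conventions and the repo's actual training script is not published:

```python
# LoRA and trainer hyperparameters from the table above (illustrative layout).
lora_kwargs = dict(r=64, lora_alpha=128)
training_kwargs = dict(
    num_train_epochs=3,
    learning_rate=8e-6,
    per_device_train_batch_size=2,
    gradient_accumulation_steps=8,
    max_seq_length=4096,
)

# Effective batch size = per-device batch size x gradient accumulation steps.
effective_bs = (training_kwargs["per_device_train_batch_size"]
                * training_kwargs["gradient_accumulation_steps"])

# Typical Unsloth flow (requires a GPU; shown for orientation only):
# from unsloth import FastLanguageModel
# model, tokenizer = FastLanguageModel.from_pretrained(
#     "unsloth/Qwen2.5-7B-Instruct", max_seq_length=4096, load_in_4bit=True)
# model = FastLanguageModel.get_peft_model(model, **lora_kwargs)
```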
+ ## Usage
+
+ ```python
+ from awq import AutoAWQForCausalLM
+ from transformers import AutoTokenizer
+
+ model_id = "UtsuSl0th/mixed-lora-3-awq"
+
+ tokenizer = AutoTokenizer.from_pretrained(model_id)
+ model = AutoAWQForCausalLM.from_quantized(
+     model_id,
+     device_map="auto",
+     fuse_layers=True,
+ )
+
+ inputs = tokenizer("Your prompt here", return_tensors="pt").to("cuda")
+ outputs = model.generate(**inputs, max_new_tokens=256)
+ print(tokenizer.decode(outputs[0], skip_special_tokens=True))
+ ```
+
+ ## Sources & Terms
+
+ The training datasets are distributed under the MIT License.
+ Users must comply with the MIT License and with the base model's original terms of use (Apache-2.0).