Upload README.md with huggingface_hub
README.md CHANGED

@@ -1,21 +1,64 @@
Removed (previous model card):

---
base_model: unsloth/Qwen3-4B-Instruct-2507
tags:
- transformers
- unsloth
- qwen3
license: apache-2.0
language:
- en
---

- **License:** apache-2.0
- **Finetuned from model:** unsloth/Qwen3-4B-Instruct-2507
Added (new model card):

---
base_model: Qwen/Qwen3-4B-Instruct-2507
datasets:
- hara-CU/LLM2025_DB_base_AW_345NoEAd_ALFformat_QH5L4R5_1392
language:
- en
license: mit
library_name: peft
pipeline_tag: text-generation
tags:
- lora
- agent
- tool-use
- alfworld
- dbbench
---
# Qwen3-4B-DBbase_AW_345NoEAd_ALFformat_QH5L4R5_1392-r16a32-B16-2ep-5e6

This repository provides a **LoRA adapter** for **Qwen/Qwen3-4B-Instruct-2507**, fine-tuned with **Unsloth**.
## Training Objective

This model is trained to improve **multi-turn agent task performance** on ALFWorld (household tasks) and DBBench (database operations).

Loss is applied to **all assistant turns** in the multi-turn trajectory, enabling the model to learn environment observation, action selection, tool use, and recovery from errors.
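As a minimal illustration (not the authors' training code), the turn-level loss masking can be sketched as follows. The ChatML turn format matches Qwen's chat template; the helper function and its name are hypothetical:

```python
# Sketch: tokenize a multi-turn trajectory and mask every non-assistant
# token with -100, so the loss covers all assistant turns.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-4B-Instruct-2507")

def build_inputs_and_labels(messages):
    input_ids, labels = [], []
    for msg in messages:
        # One ChatML-formatted turn, e.g. "<|im_start|>user\n...<|im_end|>\n"
        turn = f"<|im_start|>{msg['role']}\n{msg['content']}<|im_end|>\n"
        ids = tokenizer(turn, add_special_tokens=False)["input_ids"]
        input_ids += ids
        # Learn on assistant tokens; ignore user/system/environment observations.
        labels += ids if msg["role"] == "assistant" else [-100] * len(ids)
    return {"input_ids": input_ids, "labels": labels}
```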
## Training Configuration

- Base model: Qwen/Qwen3-4B-Instruct-2507
- Method: LoRA (full-precision base)
- Max sequence length: 8192
- Epochs: 2
- Learning rate: 5e-06
- LoRA: r=16, alpha=32, use_rslora=False (see the configuration sketch below)
- Total (effective) batch size: 16
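For reference, the adapter settings above correspond to a `peft` `LoraConfig` roughly like the sketch below; the dropout value and `target_modules` are assumptions, not values read from this repository:

```python
from peft import LoraConfig

# Hypothetical reconstruction of the adapter configuration.
lora_config = LoraConfig(
    r=16,               # LoRA rank
    lora_alpha=32,      # scaling factor
    use_rslora=False,
    lora_dropout=0.0,   # assumption
    target_modules=[    # assumption: standard attention/MLP projections
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
    task_type="CAUSAL_LM",
)
```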
## Usage

The snippet below loads the adapter repository directly; with `peft` installed, `transformers` resolves the base model and applies the adapter automatically:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_id = "hara-CU/Qwen3-4B-DBbase_AW_345NoEAd_ALFformat_QH5L4R5_1392-r16a32-B16-2ep-5e6"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # half precision for inference
    device_map="auto",          # spread across available devices
)
```
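Alternatively, the adapter can be attached explicitly with `peft`. A minimal sketch, followed by an illustrative single-turn generation (the prompt is made up):

```python
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

base_id = "Qwen/Qwen3-4B-Instruct-2507"
adapter_id = "hara-CU/Qwen3-4B-DBbase_AW_345NoEAd_ALFformat_QH5L4R5_1392-r16a32-B16-2ep-5e6"

tokenizer = AutoTokenizer.from_pretrained(base_id)
base = AutoModelForCausalLM.from_pretrained(
    base_id, torch_dtype=torch.float16, device_map="auto"
)
# Attach the LoRA adapter on top of the frozen base model.
model = PeftModel.from_pretrained(base, adapter_id)

messages = [{"role": "user", "content": "List the steps to find a mug in the kitchen."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
output = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```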
## Sources & Terms (IMPORTANT)

- Training data: hara-CU/LLM2025_DB_base_AW_345NoEAd_ALFformat_QH5L4R5_1392
- Dataset license: MIT. The dataset is used and distributed under the terms of the MIT License.
- Compliance: users must comply with the MIT License (including retention of the copyright notice) and with the base model's original terms of use.