	---
	base_model: Qwen/Qwen2.5-7B-Instruct
	datasets:
	- u-10bei/sft_alfworld_trajectory_dataset_v5
	language:
	- en
	license: apache-2.0
	library_name: transformers
	pipeline_tag: text-generation
	tags:
	- agent
	- tool-use
	- alfworld
	- dbbench
	---

	# qwen25_7b_lora_agentbench_v21

	This repository provides a merged model fine-tuned from
	Qwen/Qwen2.5-7B-Instruct. Fine-tuning used LoRA with Unsloth, and the
	resulting adapter has been merged back into the base model weights.

	This repository contains the full model weights, so it can be used for
	inference without loading a separate adapter.
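
For intuition, merging a LoRA adapter folds the low-rank update into each adapted weight matrix as `W' = W + (alpha / r) * B @ A`. The sketch below shows this on toy matrices in plain Python. The helper names (`matmul`, `merge_lora`) and the toy values are illustrative assumptions, not the script used to produce this repository.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def merge_lora(weight, lora_a, lora_b, r, alpha):
    """Return weight + (alpha / r) * (lora_b @ lora_a) as a new matrix."""
    scale = alpha / r
    delta = matmul(lora_b, lora_a)
    return [
        [w + scale * d for w, d in zip(w_row, d_row)]
        for w_row, d_row in zip(weight, delta)
    ]


# Toy example: a 2x2 weight with a rank-1 adapter (B is 2x1, A is 1x2).
weight = [[1.0, 0.0], [0.0, 1.0]]
lora_a = [[0.5, 0.25]]
lora_b = [[1.0], [2.0]]
merged = merge_lora(weight, lora_a, lora_b, r=1, alpha=2)
```

After this step the adapter matrices are no longer needed at inference time, which is why a merged checkpoint loads like an ordinary model.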

	## Training Objective

	This model is optimized for multi-turn agent tasks, specifically for
	ALFWorld (household navigation/interaction) and DBBench (database operations).

	The training process applied loss to all assistant turns in the multi-turn
	trajectories, allowing the model to learn not just final answers, but also
	intermediate reasoning (Thought), environment observation processing,
	action selection, and error recovery.
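
One common way to apply loss only to assistant turns is to mask every other token's label with `-100`, the default `ignore_index` of PyTorch's cross-entropy loss. The following is a minimal sketch of that idea, assuming each turn is already tokenized. The function name `build_labels` and the token IDs are hypothetical, and the actual training code for this model may differ.

```python
IGNORE_INDEX = -100  # label value skipped by PyTorch's cross-entropy loss by default


def build_labels(turns):
    """Build (input_ids, labels) from a list of (role, token_ids) turns.

    Tokens from assistant turns keep their IDs as labels; all other
    tokens are masked so they contribute no loss.
    """
    input_ids, labels = [], []
    for role, token_ids in turns:
        input_ids.extend(token_ids)
        if role == "assistant":
            labels.extend(token_ids)
        else:
            labels.extend([IGNORE_INDEX] * len(token_ids))
    return input_ids, labels


# Hypothetical multi-turn trajectory with made-up token IDs.
turns = [
    ("system", [1, 2]),
    ("user", [3, 4, 5]),
    ("assistant", [6, 7]),
    ("user", [8]),
    ("assistant", [9, 10]),
]
input_ids, labels = build_labels(turns)
```

Because both assistant turns keep their labels, intermediate steps in the trajectory are supervised, not just the final reply.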

	## Training Configuration

	- Base model: Qwen/Qwen2.5-7B-Instruct
	- Method: LoRA (merged post-training)
	- Max sequence length: 2048
	- Epochs: 2
	- Learning rate: 2e-06
	- LoRA Parameters: r=64, alpha=128
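
To put the LoRA parameters in context: under standard LoRA scaling, the update is multiplied by `alpha / r`, and an adapter on a `d_out x d_in` weight adds `r * (d_in + d_out)` trainable parameters. The sketch below computes both. The 4096 x 4096 layer shape is a hypothetical example, not a dimension taken from this model.

```python
def lora_scale(r, alpha):
    """Scaling factor applied to the low-rank update under standard LoRA."""
    return alpha / r


def lora_param_count(d_in, d_out, r):
    """Trainable parameters added by one adapter: A is r x d_in, B is d_out x r."""
    return r * (d_in + d_out)


scale = lora_scale(r=64, alpha=128)
# Hypothetical 4096 x 4096 projection, used only for illustration.
extra = lora_param_count(d_in=4096, d_out=4096, r=64)
```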

	## Usage

	This model can be loaded using the standard `transformers` library or
	deployed with `vLLM` (recommended for evaluation).

	### Transformers
	```python
	from transformers import AutoModelForCausalLM, AutoTokenizer
	import torch

	model_id = "your_hf_id/your_repo_name"

	tokenizer = AutoTokenizer.from_pretrained(model_id)
	model = AutoModelForCausalLM.from_pretrained(
	    model_id,
	    torch_dtype=torch.bfloat16,
	    device_map="auto",
	)