---
base_model: Qwen/Qwen3.5-9B
library_name: peft
model_name: output-fujin
tags:
- base_model:adapter:Qwen/Qwen3.5-9B
- lora
- sft
- transformers
- trl
licence: license
pipeline_tag: text-generation
---

# output-fujin

This model is a LoRA adapter for [Qwen/Qwen3.5-9B](https://huggingface.co/Qwen/Qwen3.5-9B), trained with supervised fine-tuning (SFT).

**W&B run:** [https://wandb.ai/cooawoo-personal/huggingface/runs/sr7glk4m](https://wandb.ai/cooawoo-personal/huggingface/runs/sr7glk4m)
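A minimal loading sketch (not part of the original card): it assumes the adapter weights live at the Hub repo id or local path passed as `adapter_id`; the function name `load_finetuned` and the default path `"output-fujin"` are placeholders.

```python
def load_finetuned(adapter_id: str = "output-fujin"):
    """Load the base model with this LoRA adapter applied.

    `adapter_id` is a placeholder -- substitute the actual Hub repo id
    or local directory where the adapter was saved.
    """
    # Deferred imports keep the sketch importable without the heavy
    # torch/transformers/peft dependencies installed.
    import torch
    from peft import AutoPeftModelForCausalLM
    from transformers import AutoTokenizer

    model = AutoPeftModelForCausalLM.from_pretrained(
        adapter_id, torch_dtype=torch.bfloat16, device_map="auto"
    )
    tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3.5-9B")
    return model, tokenizer
```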
## Training procedure

### Hyperparameters

| Parameter | Value |
|-----------|-------|
| Learning rate | 0.0002 |
| LR scheduler | cosine |
| Per-device batch size | 1 |
| Gradient accumulation | 8 |
| Effective batch size | 8 |
| Epochs | 1 |
| Max sequence length | 2048 |
| Optimizer | paged_ademamix_8bit |
| Weight decay | 0.01 |
| Warmup ratio | 0.05 |
| Max gradient norm | 1.0 |
| Precision | bf16 |
| Label smoothing | 0.1 |
| Loss type | nll |

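The schedule length is not stated explicitly, but it can be estimated from the numbers above and the dataset statistics below. A back-of-envelope sketch, assuming the `split` truncation strategy packs the corpus into 2048-token chunks and that training ran on a single GPU (the device count is not stated in this card):

```python
import math

# Figures taken from this card; the chunk count assumes `split`
# truncation packs the corpus into max_length-sized pieces.
total_tokens = 65_084_381
max_length = 2048
per_device_batch = 1
grad_accum = 8
num_gpus = 1  # assumption: single-GPU run

chunks = math.ceil(total_tokens / max_length)               # ~31,780
effective_batch = per_device_batch * grad_accum * num_gpus  # 8
steps_per_epoch = math.ceil(chunks / effective_batch)       # ~3,973
warmup_steps = int(0.05 * steps_per_epoch)                  # ~198
```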
### LoRA configuration

| Parameter | Value |
|-----------|-------|
| Rank (r) | 128 |
| Alpha | 16 |
| Dropout | 0.05 |
| Target modules | attn.proj, down_proj, gate_proj, in_proj_a, in_proj_b, in_proj_qkv, in_proj_z, k_proj, linear_fc1, linear_fc2, o_proj, out_proj, q_proj, qkv, up_proj, v_proj |
| Quantization | 4-bit (nf4) |

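With r = 128 and alpha = 16, each adapter update is scaled by alpha / r = 0.125 before being added to its target weight. A toy sketch of the LoRA update rule W' = W + (alpha / r) · B · A, using rank-1 2×2 matrices purely for illustration (the real adapter uses r = 128 on the projection weights listed above):

```python
r, alpha = 128, 16
scaling = alpha / r  # 0.125: every adapter update is scaled down by this

# Toy matrices standing in for a real projection weight and its adapter.
W = [[1.0, 0.0], [0.0, 1.0]]  # frozen base weight (2x2)
A = [[0.5, 0.5]]              # toy rank-1 "A" factor (1 x in_features)
B = [[2.0], [4.0]]            # toy rank-1 "B" factor (out_features x 1)

# delta = scaling * B @ A, then W' = W + delta
delta = [[scaling * B[i][0] * A[0][j] for j in range(2)] for i in range(2)]
W_prime = [[W[i][j] + delta[i][j] for j in range(2)] for i in range(2)]
```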
### Dataset statistics

| Dataset | Samples | Total tokens | Trainable tokens |
|---------|--------:|-------------:|-----------------:|
| rpDungeon/some-revised-datasets/rosier_inf_strict_text.parquet | 36,438 | 65,084,381 | 65,084,381 |

<details>
<summary>Training config</summary>

```yaml
model_name_or_path: Qwen/Qwen3.5-9B
bf16: true
gradient_checkpointing: true
gradient_checkpointing_kwargs:
  use_reentrant: false
use_liger: true
max_length: 2048
learning_rate: 0.0002
warmup_ratio: 0.05
weight_decay: 0.01
lr_scheduler_type: cosine
label_smoothing_factor: 0.1
per_device_train_batch_size: 1
gradient_accumulation_steps: 8
optim: paged_ademamix_8bit
max_grad_norm: 1.0
use_peft: true
load_in_4bit: true
lora_r: 128
lora_alpha: 16
lora_dropout: 0.05
logging_steps: 1
disable_tqdm: true
save_strategy: steps
save_steps: 500
save_total_limit: 3
report_to: wandb
output_dir: output-fujin
data_config: data.yaml
prepared_dataset: prepared
num_train_epochs: 1
saves_per_epoch: 3
run_name: qwen35-9b-qlora
```

</details>
<details>
<summary>Data config</summary>

```yaml
datasets:
- path: rpDungeon/some-revised-datasets
  data_files: rosier_inf_strict_text.parquet
  type: text
  truncation_strategy: split
shuffle_datasets: true
shuffle_combined: true
shuffle_seed: 42
eval_split: 0.0
split_seed: 42
assistant_only_loss: false
```

</details>

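`truncation_strategy: split` suggests over-length examples are divided into `max_length`-sized chunks rather than cut off (consistent with total tokens equaling trainable tokens in the dataset statistics). A minimal sketch of that behavior, assuming a flat list of token ids; `split_into_chunks` is an illustrative helper, not part of any library used here:

```python
def split_into_chunks(token_ids, max_length=2048):
    """Divide an over-length sequence into max_length-sized chunks
    so no tokens are discarded (the `split` strategy)."""
    return [token_ids[i:i + max_length]
            for i in range(0, len(token_ids), max_length)]

chunks = split_into_chunks(list(range(5000)))
# 3 chunks: 2048 + 2048 + 904 tokens
```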
### Framework versions

- PEFT: 0.18.1
- Loft: 0.1.0
- Transformers: 5.2.0
- PyTorch: 2.10.0
- Datasets: 4.5.0
- Tokenizers: 0.22.2