Qwen3-8B FormFactory SFT+GRPO LoRA

QLoRA (4-bit NF4) adapter for Qwen/Qwen3-8B, fine-tuned on the FormFactory benchmark for web form-filling.

Training Details

  • Base model: Qwen/Qwen3-8B
  • Method: SFT initialization followed by online GRPO with real browser execution (group size G=4, learning rate 5e-5, KL coefficient 0.1)
  • LoRA: rank r=16, alpha=32, dropout=0.05, targeting all linear layers
  • Quantization: 4-bit NormalFloat (NF4) with double quantization
  • Compute: Single NVIDIA A10G (24 GB), AWS g5.xlarge on Anyscale Ray
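The quantization and adapter settings listed above correspond to a standard bitsandbytes + PEFT setup. The exact training script is not included in this card, so the following is a minimal sketch reconstructed from the listed hyperparameters, not the authors' actual code:

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

# 4-bit NormalFloat (NF4) quantization with double quantization, as listed above.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

# LoRA hyperparameters from the training details above.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules="all-linear",
    task_type="CAUSAL_LM",
)

base = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen3-8B",
    quantization_config=bnb_config,
    device_map="auto",
)
model = get_peft_model(base, lora_config)
```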

Usage

from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the base model, then attach the LoRA adapter on top of it.
base_model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen3-8B",
    device_map="auto",
    torch_dtype="auto",
)
model = PeftModel.from_pretrained(base_model, "billyenrizky/Qwen3-8B-FormFactory-GRPO-LoRA")

# The adapter adds no new tokens, so the base tokenizer is used as-is.
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-8B")

Results on FormFactory (124 held-out tasks)

Split   Nonzero Rate   Avg Reward
Val     100.0%         0.670
Test    100.0%         0.669
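"Nonzero Rate" is assumed here to mean the fraction of held-out tasks that earned a strictly positive reward, and "Avg Reward" the mean per-task reward. A small sketch of how these two numbers could be computed from per-task scores (the reward values below are illustrative, not the actual evaluation data):

```python
def summarize_rewards(rewards):
    """Return (nonzero_rate, avg_reward) for a list of per-task rewards."""
    n = len(rewards)
    nonzero_rate = sum(1 for r in rewards if r != 0) / n
    avg_reward = sum(rewards) / n
    return nonzero_rate, avg_reward

# Hypothetical per-task rewards, chosen only to illustrate the metrics.
rate, avg = summarize_rewards([0.5, 1.0, 0.75, 0.43])
print(f"Nonzero Rate: {rate:.1%}  Avg Reward: {avg:.3f}")
```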

Citation

If you use this model, please cite our paper and the FormFactory benchmark:

@article{brillian2026browser,
  title={Browser-in-the-Loop: Reinforcement Fine-Tuning LLM Agents for Web Form Filling},
  author={Brillian, Muhammad Enrizky},
  year={2026}
}

@article{li2025formfactory,
  title={FormFactory: An Interactive Benchmarking Suite for Multimodal Form-Filling Agents},
  author={Li, B. and Wang, Y. and Fei, H. and Li, J. and Ji, W. and Lee, M.-L. and Hsu, W.},
  journal={arXiv preprint arXiv:2506.01520},
  year={2025}
}