Qwen3-8B FormFactory SFT+GRPO LoRA

QLoRA (4-bit NF4) adapter for Qwen/Qwen3-8B, fine-tuned on the FormFactory benchmark for web form-filling.

Training Details

  • Base model: Qwen/Qwen3-8B
  • Method: SFT initialization followed by online GRPO with real browser execution (group size G=4, learning rate 5e-5, KL coefficient 0.1)
  • LoRA: rank r=16, alpha=32, dropout=0.05, targeting all linear layers
  • Quantization: 4-bit NormalFloat (NF4) with double quantization
  • Compute: Single NVIDIA A10G (24 GB), AWS g5.xlarge on Anyscale Ray
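The quantization and adapter settings listed above correspond to a standard bitsandbytes + PEFT setup. The exact training script is not included in this card, so the following is a minimal sketch reconstructed from the listed hyperparameters, not the authors' actual code:

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

# 4-bit NormalFloat (NF4) quantization with double quantization, as listed above.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

# LoRA hyperparameters from the training details above.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules="all-linear",
    task_type="CAUSAL_LM",
)

base = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen3-8B",
    quantization_config=bnb_config,
    device_map="auto",
)
model = get_peft_model(base, lora_config)
```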

Usage

from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the base model, then attach the LoRA adapter on top of it.
base_model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen3-8B",
    device_map="auto",
    torch_dtype="auto",
)
model = PeftModel.from_pretrained(base_model, "billyenrizky/Qwen3-8B-FormFactory-GRPO-LoRA")

# The adapter adds no new tokens, so the base tokenizer is used as-is.
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-8B")

Results on FormFactory (124 held-out tasks)

Split   Nonzero Rate   Avg Reward
Val     100.0%         0.670
Test    100.0%         0.669
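"Nonzero Rate" is assumed here to mean the fraction of held-out tasks that earned a strictly positive reward, and "Avg Reward" the mean per-task reward. A small sketch of how these two numbers could be computed from per-task scores (the reward values below are illustrative, not the actual evaluation data):

```python
def summarize_rewards(rewards):
    """Return (nonzero_rate, avg_reward) for a list of per-task rewards."""
    n = len(rewards)
    nonzero_rate = sum(1 for r in rewards if r != 0) / n
    avg_reward = sum(rewards) / n
    return nonzero_rate, avg_reward

# Hypothetical per-task rewards, chosen only to illustrate the metrics.
rate, avg = summarize_rewards([0.5, 1.0, 0.75, 0.43])
print(f"Nonzero Rate: {rate:.1%}  Avg Reward: {avg:.3f}")
```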

Citation

If you use this model, please cite our paper and the FormFactory benchmark:

@article{brillian2026browser,
  title={Browser-in-the-Loop: Reinforcement Fine-Tuning LLM Agents for Web Form Filling},
  author={Brillian, Muhammad Enrizky},
  year={2026}
}

@article{li2025formfactory,
  title={FormFactory: An Interactive Benchmarking Suite for Multimodal Form-Filling Agents},
  author={Li, B. and Wang, Y. and Fei, H. and Li, J. and Ji, W. and Lee, M.-L. and Hsu, W.},
  journal={arXiv preprint arXiv:2506.01520},
  year={2025}
}