BS4-Qwen3-8B Step 395

Qwen3-8B fine-tuned on BeautifulSoup HTML parsing tasks using reinforcement learning.

Training Details

  • Base model: Qwen/Qwen3-8B
  • Training method: RL with prime-rl
  • Training steps: 395
  • Hardware: 2x H100 80GB
  • LoRA config: rank=8, alpha=32
  • Target modules: q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
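
To make the LoRA settings above concrete, the following is a rough sketch of the arithmetic they imply: the scaling factor applied to the low-rank update and the number of trainable adapter parameters. It assumes the standard LoRA scaling of alpha / rank. The layer dimensions are assumptions about Qwen3-8B rather than values stated in this card, so check them against the base model's config.json. The helper names are hypothetical.

```python
# Sketch only: LoRA arithmetic for rank=8, alpha=32 on the listed target modules.
# ASSUMPTION: the dimensions below are the commonly reported Qwen3-8B values;
# they are not stated in this card and should be checked against config.json.
HIDDEN = 4096          # assumed hidden size
INTERMEDIATE = 12288   # assumed MLP intermediate size
NUM_LAYERS = 36        # assumed number of transformer layers
Q_OUT = 32 * 128       # assumed 32 query heads x head_dim 128
KV_OUT = 8 * 128       # assumed 8 key/value heads x head_dim 128

RANK = 8    # from the card
ALPHA = 32  # from the card

# (in_features, out_features) for each target module named in the card.
MODULE_SHAPES = {
    "q_proj": (HIDDEN, Q_OUT),
    "k_proj": (HIDDEN, KV_OUT),
    "v_proj": (HIDDEN, KV_OUT),
    "o_proj": (Q_OUT, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_scaling(rank: int, alpha: int) -> float:
    """Standard LoRA scaling factor applied to the low-rank update B @ A."""
    return alpha / rank


def lora_params_per_layer(shapes: dict, rank: int) -> int:
    """Each adapted linear layer adds A (rank x in) and B (out x rank)."""
    return sum(rank * (n_in + n_out) for n_in, n_out in shapes.values())


if __name__ == "__main__":
    per_layer = lora_params_per_layer(MODULE_SHAPES, RANK)
    print("scaling:", lora_scaling(RANK, ALPHA))
    print("trainable LoRA params per layer:", per_layer)
    print("trainable LoRA params in total:", per_layer * NUM_LAYERS)
```

Under these assumed dimensions the adapter trains a small fraction of the 8B base parameters, which is consistent with running RL on two 80 GB GPUs.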

License

Apache 2.0 (same as base model)

Safetensors

  • Model size: 8B params
  • Tensor type: BF16

Model tree for seconds-0/qwen3-8b-bs4-rl

  • Base model: Qwen/Qwen3-8B-Base
  • Finetuned: Qwen/Qwen3-8B
  • Adapter: this model