BS4-Qwen3-8B Step 395
Qwen3-8B fine-tuned with LoRA and reinforcement learning on BeautifulSoup HTML parsing tasks. This is the checkpoint from training step 395.
Training Details
- Base model: Qwen/Qwen3-8B
- Training method: RL with prime-rl
- Training steps: 395
- Hardware: 2x H100 80GB
- LoRA config: rank=8, alpha=32
- Target modules: q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
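To make the LoRA settings above concrete, here is a minimal sketch of what rank=8 and alpha=32 imply: the standard LoRA update is scaled by alpha / rank, and each adapted weight matrix gains rank * (d_in + d_out) trainable parameters. The class and function names are hypothetical, and the layer shapes and layer count are assumptions about the base model rather than values stated in this card; check them against the base model's config.json. In the peft library these settings would correspond to `LoraConfig(r=8, lora_alpha=32, target_modules=[...])`, although this card does not state which adapter implementation prime-rl used.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class LoraSettings:
    """Hypothetical container mirroring the values listed in this card."""
    rank: int = 8
    alpha: int = 32
    target_modules: tuple = (
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    )

    @property
    def scaling(self) -> float:
        # Standard LoRA scales the low-rank update B @ A by alpha / rank.
        return self.alpha / self.rank


def lora_param_count(d_in: int, d_out: int, rank: int) -> int:
    # A has shape (rank, d_in) and B has shape (d_out, rank).
    return rank * (d_in + d_out)


# ASSUMPTION: (d_in, d_out) per target module for the base model.
# These are not stated in the card; verify against config.json.
ASSUMED_SHAPES = {
    "q_proj": (4096, 4096),
    "k_proj": (4096, 1024),
    "v_proj": (4096, 1024),
    "o_proj": (4096, 4096),
    "gate_proj": (4096, 12288),
    "up_proj": (4096, 12288),
    "down_proj": (12288, 4096),
}
ASSUMED_NUM_LAYERS = 36  # ASSUMPTION: number of transformer layers.


def total_adapter_params(settings: LoraSettings, shapes: dict, num_layers: int) -> int:
    # Sum the adapter parameters over every targeted module in every layer.
    per_layer = sum(
        lora_param_count(*shapes[name], settings.rank)
        for name in settings.target_modules
    )
    return per_layer * num_layers


if __name__ == "__main__":
    settings = LoraSettings()
    print("LoRA scaling factor:", settings.scaling)
    print(
        "Trainable adapter parameters (under the assumed shapes):",
        total_adapter_params(settings, ASSUMED_SHAPES, ASSUMED_NUM_LAYERS),
    )
```

Because all seven attention and MLP projections are targeted, the adapter touches every linear layer in each transformer block while keeping the trainable parameter count a small fraction of the 8B base model.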
License
Apache 2.0 (same as base model)