Freakz3z
/

Qwen-JSON

Text Generation

reinforcement-learning

Model card Files Files and versions

Qwen-JSON / README.md

nielsr's picture

nielsr HF Staff

Add pipeline tag and link to paper

148b750 verified 3 months ago

|

2.3 kB

	---
	base_model: Qwen/Qwen3-4B-Instruct-2507
	language:
	- en
	library_name: transformers
	license: apache-2.0
	model_name: qwen-json
	pipeline_tag: text-generation
	tags:
	- unsloth
	- trl
	- grpo
	- reinforcement-learning
	- json
	- recipe
	---

	# RL-Struct: Bridging the Structure Gap

	[中文版本](./README_CN.md) \| [📚 Paper](https://huggingface.co/papers/2512.00319)

	We introduce RL-Struct, a lightweight Reinforcement Learning framework designed to solve the "Structure Gap"—the tension between probabilistic token generation and deterministic structured formats (e.g., JSON). By leveraging GRPO (Gradient Regularized Policy Optimization) and a Multi-dimensional Reward Function, our model achieves superior structural reliability without the high inference latency of constrained decoding.

	## 🚀 Key Features

	- Multi-dimensional Reward Function: Decomposes the objective into Structure, Format, Validity, Correctness, and Length.
	- Efficient Training: Uses GRPO to eliminate the critic network, reducing VRAM usage by ~40% compared to PPO.
	- Emergent Curriculum: The model spontaneously learns syntax (how to speak) before semantics (what to say).
	- High Performance: Achieves 89.7% Structural Accuracy and 92.1% JSON Validity on complex recipe generation, outperforming LLaMA-3-8B and GPT-3.5.

	## 📊 Model Details

	- Base Model: [Qwen/Qwen3-4B-Instruct-2507](https://huggingface.co/Qwen/Qwen3-4B-Instruct-2507)
	- Training Method: GRPO (Reinforcement Learning) + LoRA
	- Task: Structured Output Generation (JSON Recipes, GSM8K-JSON, ToolUse)
	- License: Apache-2.0

	## 🛠️ Usage

	The following is the system prompt:

	```text
	You are a precise recipe assistant. Always respond in the following JSON format:
	{
	"reasoning": "Your step-by-step reasoning here...",
	"answer": "{\"name\": \"Recipe Name\", \"nutrition\": \"Calories: ..., Protein: ..., Fat: ...\"}"
	}
	Do not include any other text, explanations, or markdown. Only output valid JSON.
	```

	## 📈 Performance

	\| Method \| Structural Acc. \| JSON Validity \| Content Acc. \|
	\| :--- \| :---: \| :---: \| :---: \|
	\| GPT-3.5 (Zero-shot) \| 45.5% \| 82.1% \| 88.0% \|
	\| LLaMA-3-8B (SFT) \| 78.2% \| 85.4% \| 86.0% \|
	\| RL-Struct (Ours) \| 89.7% \| 92.1% \| 84.5% \|