Freakz3z/Qwen-JSON

Freakz3z committed on Nov 28, 2025

Commit 826fc97 (verified) · 1 Parent(s): 75a7e15

Update README.md

Files changed (1)

README.md +55 -5

@@ -1,6 +1,56 @@
 ---
-license: mit
-base_model:
-- Qwen/Qwen3-4B-Instruct-2507
-pipeline_tag: translation
----

 ---
+base_model: Qwen/Qwen3-4B-Instruct-2507
+library_name: transformers
+model_name: qwen-json
+tags:
+- unsloth
+- trl
+- grpo
+- reinforcement-learning
+- json
+- recipe
+license: apache-2.0
+language:
+- en
+---
+# RL-Struct: Bridging the Structure Gap
+[中文版本](./README_CN.md)
+We introduce **RL-Struct**, a lightweight reinforcement learning framework designed to close the "Structure Gap": the tension between probabilistic token generation and deterministic structured formats (e.g., JSON). By combining **GRPO (Group Relative Policy Optimization)** with a **Multi-dimensional Reward Function**, our model achieves high structural reliability without the inference latency of constrained decoding.
+## 🚀 Key Features
+-   **Multi-dimensional Reward Function**: Decomposes the objective into Structure, Format, Validity, Correctness, and Length.
+-   **Efficient Training**: Uses GRPO to eliminate the critic network, reducing VRAM usage by ~40% compared to PPO.
+-   **Emergent Curriculum**: The model spontaneously learns syntax (how to speak) before semantics (what to say).
+-   **High Performance**: Achieves **89.7% Structural Accuracy** and **92.1% JSON Validity** on complex recipe generation, outperforming LLaMA-3-8B (SFT) and GPT-3.5 (zero-shot) on both metrics, though its Content Accuracy (84.5%) trails both baselines.
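The README names the five reward components but does not define them or say how they are combined. The following is a minimal sketch of what such a decomposed reward could look like, assuming binary component scores and a weighted sum; the weights, the `max_chars` limit, and the function names `component_rewards` and `total_reward` are hypothetical, not the author's implementation.

```python
import json

# Hypothetical weights: the README does not state how the components are combined.
WEIGHTS = {"structure": 1.0, "format": 1.0, "validity": 1.0,
           "correctness": 1.0, "length": 0.5}


def component_rewards(completion: str, reference_name: str,
                      max_chars: int = 2000) -> dict:
    """Score one completion on five assumed binary components."""
    text = completion.strip()
    rewards = dict.fromkeys(WEIGHTS, 0.0)

    # Format (assumed): a bare JSON object with no surrounding prose or markdown.
    if text.startswith("{") and text.endswith("}"):
        rewards["format"] = 1.0

    # Length (assumed): non-empty and within a character budget.
    if 0 < len(text) <= max_chars:
        rewards["length"] = 1.0

    # Validity: the whole completion parses as JSON.
    try:
        obj = json.loads(text)
    except json.JSONDecodeError:
        return rewards
    rewards["validity"] = 1.0

    # Structure: exactly the two top-level keys the system prompt asks for.
    if isinstance(obj, dict) and set(obj) == {"reasoning", "answer"}:
        rewards["structure"] = 1.0

        # Correctness (assumed): the nested recipe name matches a reference.
        try:
            answer = json.loads(obj["answer"])
        except (TypeError, json.JSONDecodeError):
            answer = None
        if isinstance(answer, dict) and answer.get("name") == reference_name:
            rewards["correctness"] = 1.0

    return rewards


def total_reward(completion: str, reference_name: str) -> float:
    """Weighted sum of the component scores."""
    scores = component_rewards(completion, reference_name)
    return sum(WEIGHTS[key] * scores[key] for key in WEIGHTS)
```

Keeping the components separate makes it possible to log each one during training, which is one way the syntax-before-semantics ordering described above could be observed.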
+## 📊 Model Details
+-   **Base Model:** [Qwen/Qwen3-4B-Instruct-2507](https://huggingface.co/Qwen/Qwen3-4B-Instruct-2507)
+-   **Training Method:** GRPO (Reinforcement Learning) + LoRA
+-   **Task:** Structured Output Generation (JSON Recipes, GSM8K-JSON, ToolUse)
+-   **License:** Apache-2.0
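GRPO avoids a critic by scoring each sampled completion relative to the other completions drawn for the same prompt. A minimal sketch of that group-relative advantage, assuming population standard deviation and a small `eps` for numerical stability (both are illustrative choices, and `group_relative_advantages` is a hypothetical helper, not code from this repository):

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize rewards within one prompt's group of sampled completions.

    Each advantage is (reward - group mean) / (group std + eps), so no
    learned value network is needed to provide a baseline.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Rewards for four completions sampled from the same prompt (made-up values).
advantages = group_relative_advantages([4.5, 3.5, 0.5, 3.5])
```

Completions scoring above the group mean receive a positive advantage and those below receive a negative one; if every completion in a group earns the same reward, all advantages are zero and that group contributes no learning signal.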
+## 🛠️ Usage
+The following is the system prompt:
+```text
+You are a precise recipe assistant. Always respond in the following JSON format:
+{
+  "reasoning": "Your step-by-step reasoning here...",
+  "answer": "{\"name\": \"Recipe Name\", \"nutrition\": \"Calories: ..., Protein: ..., Fat: ...\"}"
+}
+Do not include any other text, explanations, or markdown. Only output valid JSON.
+```
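In the format above, `answer` is itself a JSON-encoded string, so a consumer has to decode twice. A minimal sketch of a parser for that shape, using only the standard library; `parse_response` is a hypothetical helper, and the strictness of its checks is an assumption rather than something the README specifies.

```python
import json


def parse_response(raw: str) -> dict:
    """Decode a model response in the two-level format from the system prompt."""
    outer = json.loads(raw)  # raises json.JSONDecodeError on invalid JSON
    if not isinstance(outer, dict):
        raise ValueError("top-level value must be a JSON object")

    missing = {"reasoning", "answer"} - set(outer)
    if missing:
        raise ValueError(f"missing keys: {sorted(missing)}")

    if not isinstance(outer["answer"], str):
        raise ValueError("'answer' must be a JSON-encoded string")

    # Second decode: the recipe object is nested inside the answer string.
    recipe = json.loads(outer["answer"])
    if not isinstance(recipe, dict):
        raise ValueError("'answer' must decode to a JSON object")

    return {"reasoning": outer["reasoning"], "recipe": recipe}


# Example with a hand-written response (not real model output).
example = json.dumps({
    "reasoning": "Oats and milk are simmered, then topped with fruit.",
    "answer": json.dumps({"name": "Oat Porridge",
                          "nutrition": "Calories: 250, Protein: 9g, Fat: 6g"}),
})
parsed = parse_response(example)
print(parsed["recipe"]["name"])
```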
+## 📈 Performance
+| Method | Structural Acc. | JSON Validity | Content Acc. |
+| :--- | :---: | :---: | :---: |
+| GPT-3.5 (Zero-shot) | 45.5% | 82.1% | 88.0% |
+| LLaMA-3-8B (SFT) | 78.2% | 85.4% | 86.0% |
+| **RL-Struct (Ours)** | **89.7%** | **92.1%** | **84.5%** |