---
title: Retail RL Reco Explainer
emoji: 🛒
colorFrom: indigo
colorTo: pink
sdk: gradio
sdk_version: "4.44.0"
python_version: "3.10"
app_file: app.py
pinned: false
---

# Retail Recommendation Explainer: Before vs After RL (DPO)

This Space compares a base instruct model vs a DPO-trained LoRA adapter for retail product recommendations with concise, constraint-aware explanations.

## Environment Variables (optional)

- `BASE_MODEL` (default: `Qwen/Qwen2.5-0.5B-Instruct`)
- `ADAPTER_RL_PATH` (default: `./adapter_dpo`)

## How to use

1. Train and produce `adapter_dpo/` using the scripts in `trainer/`.
2. Copy the `adapter_dpo/` folder into the Space repo (same folder level as `app.py`).
3. Run the Space and click **Generate (Before vs After)**.
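
## Checking the configuration (sketch)

The environment variables and the copy step above can be sanity-checked before launching the Space. The snippet below is a minimal sketch, not the code in `app.py`: the helper names `resolve_settings` and `adapter_looks_ready` are hypothetical, and the check for `adapter_config.json` assumes the adapter was saved with PEFT's `save_pretrained`.

```python
import os
from pathlib import Path

# Defaults as documented in this README; the helpers below are illustrative only.
DEFAULT_BASE_MODEL = "Qwen/Qwen2.5-0.5B-Instruct"
DEFAULT_ADAPTER_PATH = "./adapter_dpo"


def resolve_settings(env=None):
    """Return the base model id and adapter path, falling back to the README defaults."""
    env = os.environ if env is None else env
    return {
        "base_model": env.get("BASE_MODEL", DEFAULT_BASE_MODEL),
        "adapter_path": env.get("ADAPTER_RL_PATH", DEFAULT_ADAPTER_PATH),
    }


def adapter_looks_ready(adapter_path):
    """Heuristic check: the folder exists and contains adapter_config.json.

    Assumption: the adapter was written by PEFT's save_pretrained, which
    stores adapter_config.json alongside the adapter weights.
    """
    folder = Path(adapter_path)
    return folder.is_dir() and (folder / "adapter_config.json").is_file()


if __name__ == "__main__":
    settings = resolve_settings()
    print("Base model:", settings["base_model"])
    if adapter_looks_ready(settings["adapter_path"]):
        print("Adapter found at", settings["adapter_path"])
    else:
        print("No adapter found at", settings["adapter_path"],
              "(copy adapter_dpo/ next to app.py or set ADAPTER_RL_PATH)")
```

Running this from the Space repo root with no variables set should report the default base model and say whether `./adapter_dpo` is in place. Setting `BASE_MODEL` or `ADAPTER_RL_PATH` in the environment overrides the corresponding default.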