metadata
title: Retail RL Reco Explainer
emoji: 🛒
colorFrom: indigo
colorTo: pink
sdk: gradio
sdk_version: 4.44.0
python_version: '3.10'
app_file: app.py
pinned: false
Retail Recommendation Explainer — Before vs After RL (DPO)
This Space compares a base instruct model vs a DPO-trained LoRA adapter for retail product recommendations with concise, constraint-aware explanations.
Environment Variables (optional)
- `BASE_MODEL` (default: `Qwen/Qwen2.5-0.5B-Instruct`)
- `ADAPTER_RL_PATH` (default: `./adapter_dpo`)
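As an illustration of how these two variables and their defaults might be read, here is a minimal sketch. The helper name `load_config` and the dictionary keys are hypothetical and are not taken from `app.py`; only the variable names and default values come from this README.

```python
import os


def load_config(env=None):
    """Return the two settings, falling back to the documented defaults.

    `load_config` is a hypothetical helper for illustration only;
    the real app.py may read these variables differently.
    """
    env = os.environ if env is None else env
    return {
        # Base instruct model used for the "before" side of the comparison
        "base_model": env.get("BASE_MODEL", "Qwen/Qwen2.5-0.5B-Instruct"),
        # Folder holding the DPO-trained LoRA adapter for the "after" side
        "adapter_path": env.get("ADAPTER_RL_PATH", "./adapter_dpo"),
    }


if __name__ == "__main__":
    print(load_config())
```

Passing a plain dictionary instead of `os.environ` makes it easy to try overrides locally without changing the Space settings.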
How to use
- Train and produce `adapter_dpo/` using the scripts in `trainer/`.
- Copy the `adapter_dpo/` folder into the Space repo (same folder level as `app.py`).
- Run the Space and click Generate (Before vs After).
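Before running the Space, it can help to confirm that the adapter folder was copied to the right place. The sketch below assumes the adapter was saved in the usual PEFT layout, which includes an `adapter_config.json` file; the helper name `adapter_ready` is hypothetical and is not part of this repo.

```python
from pathlib import Path


def adapter_ready(adapter_dir="./adapter_dpo"):
    """Return True if the folder exists and contains adapter_config.json.

    Assumption: the adapter was written in the standard PEFT layout.
    This only checks for the config file, not the weights.
    """
    path = Path(adapter_dir)
    return path.is_dir() and (path / "adapter_config.json").is_file()


if __name__ == "__main__":
    if adapter_ready():
        print("Adapter folder found.")
    else:
        print("Adapter folder missing or incomplete; check the copy step.")
```

Run it from the same folder as `app.py` so the relative default path resolves the way the Space would see it.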