---
title: Retail RL Reco Explainer
emoji: π
colorFrom: indigo
colorTo: pink
sdk: gradio
sdk_version: "4.44.0"
python_version: "3.10"
app_file: app.py
pinned: false
---

# Retail Recommendation Explainer: Before vs After RL (DPO)

This Space compares a base instruct model with the same model plus a DPO-trained LoRA adapter on retail product recommendation, where each recommendation comes with a concise, constraint-aware explanation.

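"DPO-trained" means the adapter was fine-tuned with Direct Preference Optimization. DPO trains the policy (here, base model plus LoRA adapter) on pairs of a preferred and a dispreferred response. It measures both against a frozen reference model, typically the base model itself. The sketch below computes the standard per-pair DPO loss in plain Python. It does not come from the scripts in `trainer/`. The function name, the `beta=0.1` default, and the assumption that inputs are summed per-response log-probabilities are illustrative.

```python
import math


def dpo_loss(
    policy_chosen_logp: float,
    policy_rejected_logp: float,
    ref_chosen_logp: float,
    ref_rejected_logp: float,
    beta: float = 0.1,  # illustrative default, not taken from trainer/
) -> float:
    """Standard DPO loss for a single preference pair (illustrative sketch).

    Each argument is the summed log-probability of a full response
    ("chosen" = preferred, "rejected" = dispreferred) under either the
    trainable policy or the frozen reference model.
    """
    # How far the policy has moved from the reference on each response.
    chosen_ratio = policy_chosen_logp - ref_chosen_logp
    rejected_ratio = policy_rejected_logp - ref_rejected_logp
    # Scaled preference margin; DPO pushes this to be large and positive.
    margin = beta * (chosen_ratio - rejected_ratio)
    # -log(sigmoid(margin)) == softplus(-margin), computed without overflow
    # for large positive or negative margins.
    x = -margin
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


if __name__ == "__main__":
    # Policy identical to reference: margin is 0, loss is log(2).
    print(dpo_loss(-10.0, -12.0, -10.0, -12.0))
```

When the policy matches the reference, the margin is zero and the loss is log 2. Training lowers the loss by raising the policy's relative likelihood of the chosen response over the rejected one.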
## Environment Variables (optional)
- `BASE_MODEL`: base instruct model to load (default: `Qwen/Qwen2.5-0.5B-Instruct`)
- `ADAPTER_RL_PATH`: path to the DPO-trained LoRA adapter (default: `./adapter_dpo`)

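Here is a minimal sketch of how these two settings could be resolved. It assumes the app reads them from the process environment and falls back to the defaults listed above. The `Settings` class and `load_settings` helper are hypothetical and do not come from `app.py`.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    base_model: str
    adapter_rl_path: str


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Hypothetical helper: read BASE_MODEL / ADAPTER_RL_PATH with README defaults."""
    # Use the real process environment unless a mapping is passed in (handy for tests).
    env = os.environ if env is None else env
    return Settings(
        base_model=env.get("BASE_MODEL", "Qwen/Qwen2.5-0.5B-Instruct"),
        adapter_rl_path=env.get("ADAPTER_RL_PATH", "./adapter_dpo"),
    )


if __name__ == "__main__":
    print(load_settings())
```

On Hugging Face Spaces, values like these are usually set as variables in the Space settings, which the app sees as environment variables.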
## How to use
1. Train the adapter with the scripts in `trainer/`; this produces an `adapter_dpo/` folder.
2. Copy `adapter_dpo/` into the Space repo at the same level as `app.py` (or set `ADAPTER_RL_PATH` to its location).
3. Run the Space and click **Generate (Before vs After)** to get both the base model's output and the adapter's output.
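A quick pre-flight check before step 3 can catch a misplaced or incomplete adapter folder. This sketch assumes the adapter was saved with PEFT's `save_pretrained`. That call writes an `adapter_config.json` next to the weights, which are in `adapter_model.safetensors` or, in older setups, `adapter_model.bin`. The `check_adapter_dir` helper is hypothetical.

```python
from pathlib import Path
from typing import List

# File names assumed from PEFT's save_pretrained output; the weights format may vary.
CONFIG_FILE = "adapter_config.json"
WEIGHT_FILES = ("adapter_model.safetensors", "adapter_model.bin")


def check_adapter_dir(path: str = "./adapter_dpo") -> List[str]:
    """Hypothetical pre-flight check; returns a list of problems (empty if OK)."""
    root = Path(path)
    if not root.is_dir():
        return [f"{root} is not a directory"]
    problems = []
    # The adapter config tells PEFT which base model and LoRA settings to use.
    if not (root / CONFIG_FILE).is_file():
        problems.append(f"missing {CONFIG_FILE} in {root}")
    # At least one recognised weights file must be present.
    if not any((root / name).is_file() for name in WEIGHT_FILES):
        problems.append(f"no adapter weights ({' or '.join(WEIGHT_FILES)}) in {root}")
    return problems


if __name__ == "__main__":
    for issue in check_adapter_dir() or ["adapter folder looks OK"]:
        print(issue)
```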