title: Retail RL Reco Explainer
emoji: 🛒
colorFrom: indigo
colorTo: pink
sdk: gradio
sdk_version: 4.44.0
python_version: '3.10'
app_file: app.py
pinned: false

Retail Recommendation Explainer — Before vs After RL (DPO)

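This Space compares two setups for retail product recommendations: a base instruct model, and the same base model with a DPO-trained LoRA adapter applied. Both generate a recommendation with a concise, constraint-aware explanation, so the outputs can be read side by side.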

Environment Variables (optional)

  • BASE_MODEL (default: Qwen/Qwen2.5-0.5B-Instruct)
  • ADAPTER_RL_PATH (default: ./adapter_dpo)
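
As a rough illustration of how these two variables might be resolved, the sketch below reads them with the defaults listed above. The function name `resolve_settings` and the returned dictionary keys are hypothetical and are not taken from app.py; only the variable names and default values come from this README.

```python
import os

# Defaults copied from the list above.
DEFAULT_BASE_MODEL = "Qwen/Qwen2.5-0.5B-Instruct"
DEFAULT_ADAPTER_RL_PATH = "./adapter_dpo"


def resolve_settings(env=None):
    """Return the base model name and adapter path (hypothetical helper).

    `env` is any mapping of variable names to values; it falls back to
    os.environ so the function can be tested without touching the
    real environment.
    """
    env = os.environ if env is None else env
    return {
        "base_model": env.get("BASE_MODEL", DEFAULT_BASE_MODEL),
        "adapter_rl_path": env.get("ADAPTER_RL_PATH", DEFAULT_ADAPTER_RL_PATH),
    }


if __name__ == "__main__":
    print(resolve_settings())
```

Since both variables are optional, leaving them unset should give the defaults; setting either one overrides only that value.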

How to use

  1. Train the DPO adapter using the scripts in trainer/; this produces the adapter_dpo/ folder.
  2. Copy the adapter_dpo/ folder into the root of the Space repo, at the same level as app.py.
  3. Run the Space and click Generate (Before vs After).
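
The steps above can be sanity-checked before launching the Space. The following is a minimal sketch, not the code in app.py: it assumes the adapter was saved in the standard PEFT format (which writes an `adapter_config.json` file into the folder) and that `transformers` and `peft` are installed. The helper names `adapter_is_present` and `load_before_and_after` are hypothetical.

```python
from pathlib import Path


def adapter_is_present(adapter_dir="./adapter_dpo"):
    """Return True if adapter_dir contains a PEFT adapter_config.json."""
    return (Path(adapter_dir) / "adapter_config.json").is_file()


def load_before_and_after(base_model_name, adapter_dir):
    """Load the 'before' (base) and 'after' (base + LoRA) models.

    Assumption: this mirrors the comparison described in this README,
    but the actual loading logic in app.py may differ.
    """
    # Imported lazily so the folder check above works without these packages.
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(base_model_name)
    before = AutoModelForCausalLM.from_pretrained(base_model_name)
    if not adapter_is_present(adapter_dir):
        # No adapter copied yet: only the base model is available.
        return tokenizer, before, None

    # A second copy of the base model is loaded because attaching the
    # adapter modifies the wrapped model's layers in place.
    after_base = AutoModelForCausalLM.from_pretrained(base_model_name)
    after = PeftModel.from_pretrained(after_base, adapter_dir)
    return tokenizer, before, after


if __name__ == "__main__":
    print("adapter found:", adapter_is_present("./adapter_dpo"))
```

Running the file from the repo root prints whether the adapter folder is in place; if it reports False, step 2 has most likely not been completed or the folder is at the wrong level.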