metadata
license: unknown

Intro

This is a reinforcement learning (RL) model, trained with Proximal Policy Optimization (PPO), for the DOOH (digital out-of-home) DSP bidder problem. The model should respect four rules:

  • even pacing over time.
  • a desired publisher distribution (which can differ from the publisher distribution in the raw bid request flow).
  • a desired venue type distribution (which can differ from the venue type distribution in the raw bid request flow).
  • a desired household size distribution (which can differ from the household size distribution in the raw bid request flow).
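The README does not say how these rules are turned into a training signal. As a hedged illustration only, adherence could be measured like this: even pacing as the normalized gap between cumulative spend and a straight-line budget curve, and each desired distribution as the total variation distance between the mix of won impressions and the target mix. The attribute names (`publisher`, `venue_type`, `household_size`), the function names, and the equal weighting of the four terms are assumptions, not the model's actual reward.

```python
from collections import Counter
from typing import Dict, Hashable, Iterable, List, Mapping

# Hypothetical attribute names for the three distribution rules.
ATTRIBUTES = ("publisher", "venue_type", "household_size")


def distribution(items: Iterable[Hashable]) -> Dict[Hashable, float]:
    """Empirical share of each category among won impressions."""
    counts = Counter(items)
    total = sum(counts.values())
    return {k: v / total for k, v in counts.items()} if total else {}


def total_variation(actual: Mapping[Hashable, float],
                    desired: Mapping[Hashable, float]) -> float:
    """Total variation distance: 0 = identical mixes, 1 = disjoint mixes."""
    keys = set(actual) | set(desired)
    return 0.5 * sum(abs(actual.get(k, 0.0) - desired.get(k, 0.0)) for k in keys)


def pacing_error(spend_per_step: List[float], budget: float) -> float:
    """Mean gap between cumulative spend and even (straight-line) pacing,
    normalized by the budget. 0 means perfectly even pacing."""
    n = len(spend_per_step)
    if n == 0 or budget <= 0:
        raise ValueError("need at least one step and a positive budget")
    cumulative, err = 0.0, 0.0
    for t, spend in enumerate(spend_per_step, start=1):
        cumulative += spend
        target = budget * t / n  # spend an even share of the budget per step
        err += abs(cumulative - target)
    return err / (n * budget)


def rule_penalty(wins: List[Mapping[str, Hashable]],
                 spend_per_step: List[float],
                 budget: float,
                 desired: Mapping[str, Mapping[Hashable, float]]) -> float:
    """Hypothetical combined penalty: pacing term plus one distribution
    term per attribute, all weighted equally (an assumption)."""
    penalty = pacing_error(spend_per_step, budget)
    for attr in ATTRIBUTES:
        actual = distribution(w[attr] for w in wins)
        penalty += total_variation(actual, desired[attr])
    return penalty
```

For example, winning publishers `["A", "A", "A", "B"]` against a desired 50/50 split gives a total variation of 0.25, and spending a whole budget of 40 in the first of four steps gives a pacing error of 0.375. A negated penalty like this could serve as one reward component, but how the actual model shapes its reward is not documented here.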

requirements.txt

torch==2.10.0
matplotlib==3.10.8
ipython==8.0.0
torchrl==0.11.1
tensordict==0.11.0
numpy==2.4.2
pandas==2.3.3

Training process

[Figure: training process diagram]
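PPO training optimizes the clipped surrogate objective, which limits how far each update can move the policy away from the one that collected the data. The dependency-free sketch below computes that loss for one batch of (new log-prob, old log-prob, advantage) triples. It is not taken from this repository's training code (which, per requirements.txt, depends on torch and torchrl), and `clip_eps=0.2` is only a common default, not a documented setting of this model.

```python
import math
from typing import Sequence


def ppo_clip_loss(logp_new: Sequence[float],
                  logp_old: Sequence[float],
                  advantages: Sequence[float],
                  clip_eps: float = 0.2) -> float:
    """Negative PPO clipped surrogate objective (minimize this).

    ratio   = exp(logp_new - logp_old)
    clipped = clip(ratio, 1 - clip_eps, 1 + clip_eps)
    loss    = -mean(min(ratio * A, clipped * A))
    """
    n = len(advantages)
    if n == 0 or len(logp_new) != n or len(logp_old) != n:
        raise ValueError("inputs must be non-empty and of equal length")
    total = 0.0
    for new, old, adv in zip(logp_new, logp_old, advantages):
        ratio = math.exp(new - old)  # probability ratio pi_new / pi_old
        clipped = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
        # Pessimistic bound: take the smaller of the unclipped and clipped terms.
        total += min(ratio * adv, clipped * adv)
    return -total / n
```

With a ratio of 2 and an advantage of +1, the clipped term caps the objective at 1.2 (loss -1.2); with an advantage of -1, the unclipped term -2 is smaller and is kept (loss 2). This asymmetry is what stops the policy from exploiting large ratios in the direction that looks profitable.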

Data flow

[Figure: data flow diagram]

Python all-in-one files