---
license: unknown
---

# Intro

This is a reinforcement learning (RL) model, trained with Proximal Policy Optimization (PPO), for the DOOH (digital out-of-home) DSP (demand-side platform) bidder problem. The model should respect 4 rules:
- even pacing over time
- a desired publisher distribution (which can differ from the publisher distribution in the raw bid request flow)
- a desired venue type distribution (which can differ from the venue type distribution in the raw bid request flow)
- a desired household size distribution (which can differ from the household size distribution in the raw bid request flow)

# Requirements.txt

```
torch==2.10.0
matplotlib==3.10.8
ipython==8.0.0
torchrl==0.11.1
tensordict==0.11.0
numpy==2.4.2
pandas==2.3.3
```

# Training process

![alt](training_500_014_300_pt_distr_GOOD.png)

# Data flow

![alt](bidder_transormer_4_001.png)

# Python all-in-one files

- [dsp_bidder_4_training.py](https://huggingface.co/StanislavKo28/AdTech_DSP_Bidder___RL_PPO_4_rules/blob/main/p561_dsp_bidder_4_ppo_training.py) - training
- [dsp_bidder_4_environment.py](https://huggingface.co/StanislavKo28/AdTech_DSP_Bidder___RL_PPO_4_rules/blob/main/ppoenv/p561_dsp_bidder_4_ppo_environment_003_pt_distr_GOOD.py) - environment
- [dsp_bidder_4_inference.py](https://huggingface.co/StanislavKo28/AdTech_DSP_Bidder___RL_PPO_4_rules/blob/main/p591_dsp_bidder_4_ppo_inference.py) - inference
- [dsp_bidder_4_inference_once.py](https://huggingface.co/StanislavKo28/AdTech_DSP_Bidder___RL_PPO_4_rules/blob/main/p592_dsp_bidder_4_ppo_inference_once.py) - inference for a single bid request
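
# Example: scoring the 4 rules

The 4 rules from the intro can each be read as a deviation to minimize: delivery versus elapsed time for pacing, and observed versus desired shares for publishers, venue types, and household sizes. The sketch below is one possible way to measure those deviations and is not the reward used in the environment file. The function names (`total_variation`, `pacing_error`, `rule_penalty`), the bid field names, the choice of total variation distance, and the sample values are all assumptions made for illustration.

```python
from collections import Counter


def total_variation(actual_counts, target_shares):
    """Total variation distance between observed and target shares (0 = match, 1 = disjoint)."""
    total = sum(actual_counts.values())
    if total == 0:
        return 0.0  # nothing won yet, so no measurable deviation
    keys = set(actual_counts) | set(target_shares)
    return 0.5 * sum(
        abs(actual_counts.get(k, 0) / total - target_shares.get(k, 0.0)) for k in keys
    )


def pacing_error(wins_so_far, elapsed_fraction, planned_wins):
    """Gap between the share of planned wins delivered and the share of time elapsed."""
    return abs(wins_so_far / planned_wins - elapsed_fraction)


def rule_penalty(won_bids, elapsed_fraction, planned_wins, targets):
    """Sum of the four deviations; lower is better (hypothetical scoring)."""
    penalty = pacing_error(len(won_bids), elapsed_fraction, planned_wins)
    for field, target_shares in targets.items():
        counts = Counter(bid[field] for bid in won_bids)
        penalty += total_variation(counts, target_shares)
    return penalty


# Hypothetical sample data: two won bids halfway through a 4-win campaign.
won = [
    {"publisher": "pub_a", "venue_type": "mall", "household_size": 2},
    {"publisher": "pub_b", "venue_type": "transit", "household_size": 4},
]
targets = {
    "publisher": {"pub_a": 0.5, "pub_b": 0.5},
    "venue_type": {"mall": 0.5, "transit": 0.5},
    "household_size": {2: 0.5, 4: 0.5},
}
print(rule_penalty(won, elapsed_fraction=0.5, planned_wins=4, targets=targets))
```

A penalty of this kind could be negated and used as part of a reward signal, but how the actual environment weights or combines the 4 rules is defined only in the environment file linked above.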