Fairness in the Autobidding World with Machine-learned Advice

Yuan Deng (Google, dengyuan@google.com), Negin Golrezaei (Sloan School of Management, Massachusetts Institute of Technology, golrezaei@mit.edu), Patrick Jaillet (Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, jaillet@mit.edu), Jason Cheuk Nam Liang (Operations Research Center, Massachusetts Institute of Technology, jcnliang@mit.edu), Vahab Mirrokni (Google, mirrokni@google.com)

Source: https://arxiv.org/html/2302.01523
Abstract
The increasing availability of real-time data has fueled the prevalence of algorithmic bidding (or autobidding) in online advertising markets, and has enabled online ad platforms to produce signals through machine learning techniques (i.e., ML advice) on advertisers' true perceived values for ad conversions. Previous works have studied the auction design problem while incorporating ML advice in various forms to improve the total welfare of advertisers. Yet such improvements could come at the cost of individual bidders' welfare, consequently eroding the fairness of the ad platform. Motivated by this, we study how ad platforms can utilize ML advice to improve welfare guarantees and fairness at the individual bidder level in the autobidding world. We focus on a practical setting where ML advice takes the form of lower confidence bounds (or confidence intervals). We motivate a simple approach that directly sets such advice as personalized reserve prices when the platform consists of value-maximizing autobidders, subject to return-on-ad-spend (ROAS) constraints, competing in multiple parallel auctions. Under parallel VCG auctions with ML advice-based reserves, we present a worst-case welfare lower-bound guarantee for individual agents, and show that platform fairness is positively correlated with ML advice quality. We also present an instance demonstrating that our welfare guarantee is tight. Further, we prove an impossibility result showing that no truthful (and possibly randomized) mechanism with anonymous allocations and ML advice as personalized reserves can achieve universally better fairness guarantees than VCG when coupled with ML advice of the same quality. Finally, we extend our fairness guarantees with ML advice to generalized first price (GFP) and generalized second price (GSP) auctions.
Keywords: Fairness, mechanism design, machine-learned advice, welfare maximization
1 Introduction
In today's online advertising world, advertisers (including but not limited to small businesses, marketing practitioners, and non-profits) have been embracing an expanding array of advertising platforms, such as search engines, social media platforms, and web publisher displays, which present a plenitude of channels through which advertisers can procure ad impressions and obtain traffic. In this growing multi-channel environment, booming online advertising activity has fueled extensive research and technological advancements in attribution analytics to answer questions such as: which channels are more effective at targeting certain users? Or, which channels produce more user conversions (e.g., ad clicks) or return-on-investment (ROI) for the same amount of investment? (See kannan2016path for a comprehensive survey on attribution analytics.) Yet, this area of research has largely left out a crucial phase in the workflow of advertisers' creation of a digital ad campaign, namely how advertisers interact with advertising channels, which is the physical starting point of a campaign.
To illustrate the significance of advertiser-channel interactions, consider for example a small business that is relatively well-informed through attribution research that Google Ads and Meta Ads are the two most effective channels for its products. The business instantiates its ad campaigns by interacting with the platforms' ad management interfaces (see Figure 1), on which it utilizes levers such as specifying a budget and a target ROI (target ROI is the numerical inverse of CPA, or cost per action, on Google Ads, and of the cost per result goal in Meta Ads) to control campaigns. Channels then input these specified parameters into their autobidding procedures, which procure impressions on the advertiser's behalf through automated blackbox algorithms. Eventually, channels report performance metrics such as expenditure and conversion back to the advertiser once the campaign ends. Therefore, one of the most important decisions advertisers need to make is how to optimize over these levers provided by channels. Unfortunately, this has rarely been addressed in attribution analytics and the related literature. Hence, this work contributes to filling this vacancy by addressing two themes of practical significance: How effective are these channel levers for advertisers to achieve their conversion goals? And how should advertisers optimize decisions for such levers?
Figure 1: Interfaces on Google Ads (left) and Meta Ads Manager (right) for creating advertising campaigns that allow advertisers to set budgets, target ROIs, and campaign duration. CPA, or cost per action on Google Ads, as well as cost per result goal on Meta Ads Manager, is effectively the inverse value for an advertiser’s per-channel target ROI. Meta Ads Manager specifically highlights that the impression procurement methodology via autobidding maximizes total conversion while respecting advertisers’ per-channel target ROI (see red box highlighted), providing evidence that supports the GL-OPT and CH-OPT models in Eqs. (1) and (3), respectively.
To answer these questions, we study a setting where an advertiser simultaneously procures ads on multiple channels, each of which consists of multiple ad auctions that sell ad impressions. The advertiser's global optimization problem is to maximize total conversion over all channels, while respecting a global budget constraint that limits total spend, and a global ROI constraint that ensures total conversion is at least the target ROI times total spend. However, channels operate as independent entities and conduct autobidding procurement on behalf of advertisers, so there are no realistic means for an advertiser to solve the global optimization problem by optimizing over individual auctions. Instead, advertisers can only use two levers, namely a per-channel ROI and a per-channel budget, to influence how channels autobid for impressions. Our goal is to understand how effective these levers are by comparing the total conversion attained by optimizing the levers with the globally optimal conversion, and also to present methodologies that help advertisers optimize the usage of these levers. We summarize our contributions as follows:
1.1 Main contributions
- Modelling ad procurement through per-channel ROI and budget levers.
In Section 2 we develop a novel model for online advertisers to optimize over the per-channel ROI and budget levers to maximize total conversion over channels while respecting a global ROI and budget constraint. This multi-channel optimization model closely imitates real-world practices (see Figure 1 for evidence), and to the best of our knowledge is the first of its kind to characterize advertisers’ interactions with channels to run ad campaigns.
- Solely optimizing per-channel budgets is sufficient to maximize conversion.
In Theorem LABEL:thm:roifail of Section LABEL:sec:futil, we show that solely optimizing per-channel ROIs is inadequate for optimizing conversion across all channels, possibly resulting in arbitrarily worse total conversion compared to the hypothetical global optimum where advertisers can optimize over individual auctions. In contrast, in Theorem LABEL:thm:budgonlysuff and Corollary LABEL:cor:roiredund we show that solely optimizing per-channel budgets allows an advertiser to achieve the global optimum.
- Algorithm to optimize per-channel budget levers.
Under a realistic bandit feedback structure, where advertisers can only observe the total conversion and spend in each channel after making a per-channel budget decision, in Section LABEL:sec:online we develop an algorithm that augments stochastic gradient descent (SGD) with the upper-confidence bound (UCB) algorithm, and within $T$ iterations outputs a per-channel budget profile with which advertisers can achieve $\mathcal{O}(T^{-1/3})$ approximation accuracy in total conversion relative to the optimal per-channel budget profile (see the schematic sketch after this list). Our algorithm relates to constrained convex optimization with uncertain constraints and bandit feedback under a "one point estimation" regime, and to the best of our knowledge our proposed algorithm is the first to handle such a setting; see more discussion in Section 1.2 and Remark LABEL:rmk:onepoint of Section LABEL:sec:online.
- Extensions to general advertiser objectives and multi-impression auctions.
In Sections LABEL:sec:multi and LABEL:sec:extend, we shed light on the applicability of our results in Sections LABEL:sec:futil and LABEL:sec:online to more general settings where auctions correspond to the sale of multiple impressions, or where advertisers aim to optimize a private cost model instead of conversion.
- Numerical studies.
In Section LABEL:sec:num, we conduct numerical studies to demonstrate that our proposed algorithm accurately approximates optimal per-channel budgets even with a moderately small number of data points, and that its performance gracefully deteriorates when channels do not optimally procure ads on advertisers’ behalf.
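The following is a minimal, self-contained schematic (in Python) of the SGD-with-UCB loop described in the contributions above. It is illustrative only: the grid discretization, step size, bonus constants, and all function names are our own assumptions rather than the paper's exact Algorithm; dual variables for the global ROI and budget constraints are updated by projected gradient steps, while per-channel budgets are chosen from a discretized grid via UCB scores.

```python
import numpy as np

def sgd_ucb(conversion_oracles, gamma, rho, T, eta=0.05, K=20):
    """conversion_oracles[j](budget) returns a noisy conversion observation."""
    M = len(conversion_oracles)
    arms = np.linspace(0.0, rho, K)             # discretized per-channel budgets
    pulls = np.zeros((M, K))
    means = np.zeros((M, K))
    lam = mu = 0.0                              # duals for global ROI / budget
    for t in range(1, T + 1):
        choice = np.zeros(M, dtype=int)
        g1 = 0.0
        for j in range(M):
            bonus = np.sqrt(2.0 * np.log(T) / np.maximum(pulls[j], 1.0))
            # UCB score on the Lagrangian reward (1+lam)*V_j - (lam*gamma + mu)*rho_j
            scores = (1 + lam) * (means[j] + bonus) - (lam * gamma + mu) * arms
            k = int(np.argmax(scores))
            choice[j] = k
            v = conversion_oracles[j](arms[k])  # bandit feedback: conversion only
            pulls[j, k] += 1
            means[j, k] += (v - means[j, k]) / pulls[j, k]
            g1 += v - gamma * arms[k]
        g2 = rho - arms[choice].sum()
        # projected dual (sub)gradient steps on the global constraints
        lam = max(0.0, lam - eta * g1)
        mu = max(0.0, mu - eta * g2)
    return arms[choice]                         # final budget profile (illustrative)
```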
1.2 Related works.
Generally speaking, our work focuses on advertisers' impression procurement process, i.e., the interactions between advertisers and impression sellers, which has been addressed in a vast body of literature on mechanism design and online learning; see e.g. braverman2018selling, deng2019prior, golrezaei2019dynamic, golrezaei2019incentive, balseiro2019black, golrezaei2021bidding, to name a few. Here, we review the literature that relates to the key themes of this work, namely autobidding, budget and ROI management, and constrained optimization with bandit feedback.
Autobidding.
There has been a rich line of research that models the autobidding setup as well as budget and ROI management strategies. The autobidding model was formally developed in aggarwal2019autobidding, and has been analyzed through the lens of welfare efficiency or price of anarchy in deng2021towards, balseiro2021robust, deng2022efficiency, mehta2022auction, as well as individual advertiser welfare in deng2022fairness. The autobidding model has also been compared to classic quasi-linear utility models in balseiro2021landscape. The autobidding models considered in these papers assume advertisers can directly optimize over individual auctions, whereas in this work we address a more realistic setting that mimics practice, where advertisers can only use levers provided by channels and let channels procure ads on their behalf. Finally, a recent paper alimohammadi2023incentive studies whether advertisers have incentives to misreport their target ROIs or budgets to a single autobidding platform, whereas in this paper we optimize per-channel budget decisions over multiple channels.
Budget and ROI management.
Budget and ROI management strategies have been widely studied in the context of mechanism design and online learning. balseiro2017budget studies the "system equilibria" of a range of budget management strategies in terms of platforms' profits and advertisers' utility; balseiro2019learning, balseiro2022best study online bidding algorithms (called pacing) that help advertisers achieve high utility in repeated second-price auctions while maintaining a budget constraint, whereas feng2022online studies similar algorithms but considers respecting a long-term ROI constraint in addition to a fixed budget. On the other hand, a recent line of work studies the setting where multiple budget- or ROI-constrained bidders run pacing-type algorithms, and analyzes time-average welfare guarantees among all bidders gaitonde2022budget, lucier2023autobidders. All of these works on budget and ROI management focus on bidding strategies in a single repeated auction, where advertisers' decisions are bid values submitted directly to the auctions. In contrast, this work focuses on the setting where advertisers procure ads from multiple auctions through channels, and make decisions on how to adjust the per-channel ROI and budget levers while leaving the bidding to channels' blackbox algorithms.
Online optimization.
Section LABEL:sec:online, where we develop an algorithm to optimize over per-channel target ROIs and budgets, relates to the area of convex constrained optimization with bandit feedback (also referred to as zeroth-order or gradient-free feedback), since in light of Lemma LABEL:lem:Vstruct in Section LABEL:sec:online our problem of interest is also constrained and convex. First, there is a plenitude of algorithms developed for deterministic constrained convex optimization under a bandit feedback structure where function evaluations of the objective and constraints are non-stochastic. Such algorithms include filter methods audet2004pattern, pourmohamad2020statistical, barrier-type methods fasano2014linesearch, dzahini2022constrained, as well as Nelder-Mead type algorithms bHurmen2006grid, audet2018mesh; see nguyen2022stochastic and references therein for a comprehensive survey. In contrast to these works, the optimization algorithm we develop in Section LABEL:sec:online handles noisy bandit feedback.
Regarding works that also address stochastic settings, flaxman2004online presents online optimization algorithms under the known-constraint regime, which assumes the optimizer can evaluate whether all constraints are satisfied, i.e., the constraints are analytically available; the algorithm achieves $\mathcal{O}(T^{-1/4})$ accuracy. In this work, our setting is more complex, as the optimizer (i.e., the advertiser) cannot tell whether the ROI constraint is satisfied (due to unknown value and cost distributions in each channel's auctions). Yet our proposed algorithm still achieves a superior $\mathcal{O}(T^{-1/3})$ accuracy.
Most relevant to this paper are the very recent works usmanova2019safe, nguyen2022stochastic, which consider a setting similar to ours: a constrained optimization problem in which the objective and constraints are available only through noisy function-value evaluations (i.e., unknown constraints). usmanova2019safe focuses on a special (unknown) linear constraint setting, and nguyen2022stochastic extends to general convex constraints. Although usmanova2019safe and nguyen2022stochastic achieve $\mathcal{O}(T^{-1})$ and $\mathcal{O}(T^{-1/2})$ approximation accuracy to the optimal solution, in contrast to our $\mathcal{O}(T^{-1/3})$ accuracy, these works impose several assumptions that are stronger than the ones we consider. First, the objective and constraint functions are strongly smooth (i.e., the gradients are Lipschitz continuous), and nguyen2022stochastic further assumes strong convexity; in our work, the objectives and constraints are piecewise linear and do not satisfy such salient properties. Second, and most importantly, both works consider a setting with "two point estimations" that allows the optimizer to access the objective and constraint function values twice in each iteration, enabling more efficient estimation. This work, however, lies in the one-point setting where we can access function values only once per iteration. Finally, we remark that the optimal accuracy/oracle complexity for the one-point setting in constrained (non-smooth) convex optimization with bandit feedback and unknown constraints remains an open question; see Remark LABEL:rmk:onepoint in Section LABEL:sec:online for more details. We refer readers to Table 4.1 in larson2019derivative for a survey of the best known bounds under different one-point bandit feedback settings.
2 Preliminaries
Advertisers' global optimization problem. Consider an advertiser running a digital ad campaign to procure ad impressions on $M\in\mathbb{N}$ platforms such as Google Ads, Meta Ads Manager, etc., each of which we call a channel. Each channel $j$ consists of $m_j\in\mathbb{N}$ parallel ad auctions, each of which corresponds to the sale of an ad impression. (Ad auctions for each channel may be run by the channel itself or by external ad inventory suppliers such as web publishers.) An ad auction $n\in[m_j]$ is associated with a value $v_{j,n}\ge 0$ that represents the expected conversion (e.g., number of clicks) of the impression on sale, and a cost $d_{j,n}\ge 0$ required to purchase the impression. For example, the cost in a single-slot second-price auction is the highest competing bid in the market, and in a posted-price auction the cost is simply the price posted by the seller of the impression. Writing $\boldsymbol{v}_j := (v_{j,n})_{n\in[m_j]}$ and $\boldsymbol{d}_j := (d_{j,n})_{n\in[m_j]}$, we assume that $\boldsymbol{z}_j := (\boldsymbol{v}_j,\boldsymbol{d}_j)$ is sampled from some discrete distribution $\boldsymbol{p}_j$ supported on a finite set $F_j\subseteq\mathbb{R}_+^{m_j}\times\mathbb{R}_+^{m_j}$.
The advertiser's goal is to maximize the total conversion of procured ad impressions, subject to a return-on-investment (ROI) constraint stating that total conversion across all channels is no less than $\gamma$ times total spend, for some pre-specified target ROI $0<\gamma<\infty$, as well as a budget constraint stating that total spend over all channels is no greater than the total budget $\rho\ge 0$. Mathematically, the advertiser's global optimization problem across all $M$ channels can be written as:
$$\textsf{GL-OPT} := \max_{\boldsymbol{x}_1,\ldots,\boldsymbol{x}_M}\ \sum_{j\in[M]}\mathbb{E}\big[\boldsymbol{v}_j^\top\boldsymbol{x}_j\big]
\quad\text{s.t.}\quad
\sum_{j\in[M]}\mathbb{E}\big[\boldsymbol{v}_j^\top\boldsymbol{x}_j\big] \ge \gamma\sum_{j\in[M]}\mathbb{E}\big[\boldsymbol{d}_j^\top\boldsymbol{x}_j\big],\qquad
\sum_{j\in[M]}\mathbb{E}\big[\boldsymbol{d}_j^\top\boldsymbol{x}_j\big] \le \rho,\qquad
\boldsymbol{x}_j\in[0,1]^{m_j}\ \ \forall j\in[M].\tag{1}$$
Here, the decision variable $\boldsymbol{x}_j\in[0,1]^{m_j}$ is a vector whose entry $x_{j,n}$ denotes whether the impression in auction $n$ of channel $j$ is procured. We remark that $\boldsymbol{x}$ depends on the realization of $\boldsymbol{z} := (\boldsymbol{v}_j,\boldsymbol{d}_j)_{j\in[M]}$ and is therefore also random. We note that the ROI and budget constraints are taken in expectation because an advertiser procures impressions from a very large number of auctions (the number of auctions on each platform is typically very large), so the advertiser only demands to satisfy the constraints in an average sense. We note that GL-OPT is a widely adopted formulation of autobidding practices in modern online advertising, representing advertisers' conversion-maximizing behavior while respecting certain financial targets for ROI and budget; see e.g. aggarwal2019autobidding, balseiro2021robust, deng2021towards, deng2022efficiency. In Section LABEL:subsec:genutil we discuss more general advertiser objectives, e.g., maximizing quasi-linear utility.
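For concreteness, here is a minimal sketch of GL-OPT in Eq. (1) for a single realization of $(\boldsymbol{v}_j,\boldsymbol{d}_j)_{j\in[M]}$ (so the expectations drop out), solved as a linear program. The numbers and the use of scipy are illustrative assumptions, not part of the paper.

```python
# GL-OPT for one realization z = (v, d):  max v@x  s.t.  v@x >= gamma*d@x,
# d@x <= rho, 0 <= x <= 1, with all channels' auctions concatenated into v, d.
import numpy as np
from scipy.optimize import linprog

v = np.array([1.0, 0.8, 0.5, 2.0])   # per-auction values (illustrative)
d = np.array([0.5, 0.6, 0.9, 1.5])   # per-auction costs (illustrative)
gamma, rho = 1.0, 2.5                # global target ROI and budget

res = linprog(
    c=-v,                                  # linprog minimizes, so negate values
    A_ub=np.vstack([gamma * d - v, d]),    # row 1: ROI constraint, row 2: budget
    b_ub=np.array([0.0, rho]),
    bounds=[(0.0, 1.0)] * len(v),
)
x_opt = res.x
print("optimal allocation:", x_opt, "conversion:", v @ x_opt)
```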
Our overarching goal in this work is to develop methodologies that enable an advertiser to achieve total campaign conversion matching GL-OPT while respecting her global ROI $\gamma$ and budget $\rho$. However, directly optimizing GL-OPT may not be possible in practice, as we discuss next.
Advertisers' levers to solve their global problems. To solve the global optimization problem GL-OPT, advertisers would ideally optimize over individual auctions across all channels. In reality, however, channels operate as independent entities and typically do not provide means for general advertisers to participate in specific individual auctions at their discretion. Instead, channels provide advertisers with specific levers to express their ad campaign goals for spend and conversion. In this work, we focus on two of the most widely used levers, namely the per-channel ROI target and the per-channel budget (see the illustration in Fig. 1). After an advertiser inputs these parameters to a channel, the channel procures impressions on the advertiser's behalf through autonomous programs (we call this programmatic process autobidding) to help the advertiser achieve procurement results that match the inputs. We elaborate on this process below.
Formally, we consider the setting where for each channel $j\in[M]$, an advertiser may input a per-channel target ROI $0\le\gamma_j<\infty$ and a per-channel budget $\rho_j\in[0,\rho]$, where we recall $\rho>0$ is the advertiser's total budget for the campaign. Then, the channel uses these inputs in its autobidding algorithm to procure ads, and returns to the advertiser the total conversion $V_j(\gamma_j,\rho_j;\boldsymbol{z}_j)\ge 0$ as well as the total spend $C_j(\gamma_j,\rho_j;\boldsymbol{z}_j)\ge 0$, where we recall $\boldsymbol{z}_j := (\boldsymbol{v}_j,\boldsymbol{d}_j)\in\mathbb{R}^{m_j}\times\mathbb{R}^{m_j}$ is the vector of value-cost pairs in channel $j$, sampled from the discrete support $F_j$ according to distribution $\boldsymbol{p}_j$; $V_j$ and $C_j$ will be further specified later.
As the advertiser has the freedom to input per-channel target ROIs, budgets, or both, we consider three options for the advertiser: 1. input only a per-channel target ROI for each channel; 2. input only a per-channel budget for each channel; 3. input both a per-channel target ROI and a budget for each channel. These options correspond to the following decision sets for $(\gamma_j,\rho_j)_{j\in[M]}$:
$$\begin{aligned}
\text{Per-channel budget only option: } \mathcal{I}_B &:= \big\{(\gamma_j,\rho_j)_{j\in[M]}\in\mathbb{R}_+^{2\times M} : \gamma_j = 0,\ \rho_j\in[0,\rho]\ \ \forall j\big\},\\
\text{Per-channel target ROI only option: } \mathcal{I}_R &:= \big\{(\gamma_j,\rho_j)_{j\in[M]}\in\mathbb{R}_+^{2\times M} : \gamma_j\ge 0,\ \rho_j = \infty\ \ \forall j\big\},\\
\text{General option: } \mathcal{I}_G &:= \big\{(\gamma_j,\rho_j)_{j\in[M]} : \gamma_j\ge 0,\ \rho_j\in[0,\rho]\ \ \forall j\big\}.
\end{aligned}\tag{2}$$
The advertiser's goal in practice is to maximize total conversion of procured ad impressions by optimizing over per-channel budgets and target ROIs, subject to global ROI and budget constraints analogous to those in GL-OPT. Mathematically, for any option $\mathcal{I}\in\{\mathcal{I}_B,\mathcal{I}_R,\mathcal{I}_G\}$, the advertiser's optimization problem through channels can be written as
$$\textsf{CH-OPT}(\mathcal{I}) := \max_{(\gamma_j,\rho_j)_{j\in[M]}\in\mathcal{I}}\ \sum_{j\in[M]}\mathbb{E}\big[V_j(\gamma_j,\rho_j;\boldsymbol{z}_j)\big]
\quad\text{s.t.}\quad
\sum_{j\in[M]}\mathbb{E}\big[V_j(\gamma_j,\rho_j;\boldsymbol{z}_j)\big] \ge \gamma\sum_{j\in[M]}\mathbb{E}\big[C_j(\gamma_j,\rho_j;\boldsymbol{z}_j)\big],\qquad
\sum_{j\in[M]}\mathbb{E}\big[C_j(\gamma_j,\rho_j;\boldsymbol{z}_j)\big] \le \rho,\tag{3}$$

where the expectation is taken w.r.t. randomness in $\boldsymbol{z}_j$. We remark that for any channel $j\in[M]$, the number of auctions $m_j$ and the distribution $\boldsymbol{p}_j$ are fixed and are not functions of the input parameters $\gamma_j,\rho_j$.
The functions $(V_j, C_j)$ that map the per-channel target ROI and budget $(\gamma_j,\rho_j)$ to total conversion and expenditure are determined by various factors, including but not limited to channel $j$'s autobidding algorithms deployed to procure ads on advertisers' behalf, as well as the auction mechanisms that sell impressions. In this work, we study a general setup that closely mimics industry practice: we assume that, on behalf of the advertiser, each channel aims to optimize conversion over all its $m_j$ auctions while respecting the advertiser's inputs (i.e., the per-channel target ROI and budget). (For example, Meta Ads Manager in Figure 1 specifically highlights that the channel's autobidding procurement maximizes total conversion while respecting the advertiser's inputs, providing evidence for this setup.) Hence, each channel $j$'s optimization problem can be written as
$$\boldsymbol{x}_j^*(\gamma_j,\rho_j;\boldsymbol{z}_j) := \arg\max_{\boldsymbol{x}\in[0,1]^{m_j}}\ \boldsymbol{v}_j^\top\boldsymbol{x}
\quad\text{s.t.}\quad \boldsymbol{v}_j^\top\boldsymbol{x} \ge \gamma_j\,\boldsymbol{d}_j^\top\boldsymbol{x},\qquad \boldsymbol{d}_j^\top\boldsymbol{x} \le \rho_j,\tag{4}$$

where $\boldsymbol{x} := (x_n)_{n\in[m_j]}\in[0,1]^{m_j}$ denotes the vector of probabilities of winning each of the parallel auctions, i.e., $x_n\in[0,1]$ is the probability of winning auction $n\in[m_j]$ in channel $j$. In light of this representation, the corresponding conversion and spend functions are given by

$$\begin{aligned}
V_j(\gamma_j,\rho_j;\boldsymbol{z}_j) &:= \boldsymbol{v}_j^\top\boldsymbol{x}_j^*(\gamma_j,\rho_j;\boldsymbol{z}_j)\quad\text{and}\quad V_j(\gamma_j,\rho_j) := \mathbb{E}\big[V_j(\gamma_j,\rho_j;\boldsymbol{z}_j)\big],\\
C_j(\gamma_j,\rho_j;\boldsymbol{z}_j) &:= \boldsymbol{d}_j^\top\boldsymbol{x}_j^*(\gamma_j,\rho_j;\boldsymbol{z}_j)\quad\text{and}\quad C_j(\gamma_j,\rho_j) := \mathbb{E}\big[C_j(\gamma_j,\rho_j;\boldsymbol{z}_j)\big].
\end{aligned}\tag{5}$$

Here, the expectation is taken w.r.t. randomness in $\boldsymbol{z}_j := (\boldsymbol{v}_j,\boldsymbol{d}_j)\in\mathbb{R}_+^{m_j}\times\mathbb{R}_+^{m_j}$. We assume that for any $(\gamma_j,\rho_j)$ and realization $\boldsymbol{z}_j$, $V_j(\gamma_j,\rho_j;\boldsymbol{z}_j)$ is bounded above by some absolute constant $\bar{V}\in(0,\infty)$ almost surely. We remark that Eq. (5) assumes channels are able to achieve optimal procurement performance. Later, in Section LABEL:sec:num, we conduct numerical studies that address setups where channels do not optimally solve Eq. (4), and in Section LABEL:subsec:nonoptautobid we briefly discuss theoretical approaches to handling non-optimal autobidding.
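As an illustration, the channel problem in Eq. (4) for a single realization $\boldsymbol{z}_j$ is again a small LP. The sketch below (illustrative data; scipy and the helper name are our own assumptions) computes $\boldsymbol{x}_j^*$ and the pre-expectation values of $V_j$ and $C_j$ for given levers $(\gamma_j,\rho_j)$.

```python
# One channel's problem, Eq. (4), for a single realization z_j:
# max v@x  s.t.  v@x >= gamma_j * d@x,  d@x <= rho_j,  0 <= x <= 1.
import numpy as np
from scipy.optimize import linprog

def channel_procurement(v, d, gamma_j, rho_j):
    res = linprog(
        c=-v,
        A_ub=np.vstack([gamma_j * d - v, d]),
        b_ub=np.array([0.0, rho_j]),
        bounds=[(0.0, 1.0)] * len(v),
    )
    x_star = res.x
    return v @ x_star, d @ x_star   # (V_j, C_j) of Eq. (5), before expectation

v_j = np.array([1.2, 0.7, 0.4]); d_j = np.array([0.6, 0.5, 0.8])
print(channel_procurement(v_j, d_j, gamma_j=1.0, rho_j=1.0))
```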
Key focuses and organization of this work. In this paper, we address two key topics:
How effective are the per-channel ROI and budget levers in helping advertisers achieve the globally optimal conversion GL-OPT while respecting the global ROI and budget constraints? In particular, for each of the advertiser options $\mathcal{I}\in\{\mathcal{I}_B,\mathcal{I}_R,\mathcal{I}_G\}$ defined in Eq. (2), what is the discrepancy between CH-OPT($\mathcal{I}$), i.e., the optimal conversion an advertiser can achieve in practice, and the optimum GL-OPT?
Since in reality advertisers can only use the two per-channel levers offered by channels, how can advertisers optimize per-channel target ROIs and budgets to solve CH-OPT($\mathcal{I}$)?
In Section LABEL:sec:futil, we address the first question by determining the gap between CH-OPT($\mathcal{I}$) and GL-OPT for the different advertiser options. In Section LABEL:sec:online, we develop an efficient algorithm to compute per-channel levers that optimize CH-OPT($\mathcal{I}$).
3 Fairness guarantees for VCG with ML advice

4 Impossibility result: VCG is the fairest

5 Extensions: fairness guarantees for GSP and GFP with ML advice
Appendices for "Multi-channel Autobidding with Budget and ROI Constraints"

6 Proofs for Section LABEL:sec:futil

6.1 Proof of Lemma LABEL:lem:glbest
Fix any option $\mathcal{I}\in\{\mathcal{I}_B,\mathcal{I}_R,\mathcal{I}_G\}$ defined in Eq. (2), and let $(\tilde{\boldsymbol{\gamma}},\tilde{\boldsymbol{\rho}})\in\mathcal{I}$ be the optimal solution to CH-OPT($\mathcal{I}$). Note that for the per-channel ROI only option $\mathcal{I}_R$ we have $\tilde{\rho}_j = \infty$, and for the per-channel budget only option we have $\tilde{\gamma}_j = 0$ for all $j\in[M]$. Further, for any realization of value-cost pairs over all auctions $\boldsymbol{z} := (\boldsymbol{v}_j,\boldsymbol{d}_j)_{j\in[M]}$, recall the optimal solution $\boldsymbol{x}_j^*(\tilde{\gamma}_j,\tilde{\rho}_j;\boldsymbol{z}_j)$ to $V_j(\tilde{\gamma}_j,\tilde{\rho}_j;\boldsymbol{z}_j)$ for each channel $j\in[M]$, as defined in Eq. (4).

By feasibility of $(\tilde{\boldsymbol{\gamma}},\tilde{\boldsymbol{\rho}})\in\mathcal{I}$ for CH-OPT($\mathcal{I}$), we have

$$\sum_{j\in[M]}\mathbb{E}\big[V_j(\tilde{\gamma}_j,\tilde{\rho}_j;\boldsymbol{z}_j)\big] \ge \gamma\sum_{j\in[M]}\mathbb{E}\big[C_j(\tilde{\gamma}_j,\tilde{\rho}_j;\boldsymbol{z}_j)\big]
\;\Longrightarrow\;
\sum_{j\in[M]}\mathbb{E}\big[\boldsymbol{v}_j^\top\boldsymbol{x}_j^*(\tilde{\gamma}_j,\tilde{\rho}_j;\boldsymbol{z}_j)\big] \ge \gamma\sum_{j\in[M]}\mathbb{E}\big[\boldsymbol{d}_j^\top\boldsymbol{x}_j^*(\tilde{\gamma}_j,\tilde{\rho}_j;\boldsymbol{z}_j)\big],$$

where we used the definitions $V_j(\tilde{\gamma}_j,\tilde{\rho}_j;\boldsymbol{z}_j) := \boldsymbol{v}_j^\top\boldsymbol{x}_j^*(\tilde{\gamma}_j,\tilde{\rho}_j;\boldsymbol{z}_j)$ and $C_j(\tilde{\gamma}_j,\tilde{\rho}_j;\boldsymbol{z}_j) := \boldsymbol{d}_j^\top\boldsymbol{x}_j^*(\tilde{\gamma}_j,\tilde{\rho}_j;\boldsymbol{z}_j)$ from Eq. (5). This implies that $(\boldsymbol{x}_j^*(\tilde{\gamma}_j,\tilde{\rho}_j;\boldsymbol{z}_j))_{j\in[M]}$ satisfies the ROI constraint in GL-OPT. A similar analysis shows that $(\boldsymbol{x}_j^*(\tilde{\gamma}_j,\tilde{\rho}_j;\boldsymbol{z}_j))_{j\in[M]}$ also satisfies the budget constraint in GL-OPT. Therefore, $(\boldsymbol{x}_j^*(\tilde{\gamma}_j,\tilde{\rho}_j;\boldsymbol{z}_j))_{j\in[M]}$ is feasible for GL-OPT. So

$$\textsf{GL-OPT} \ge \sum_{j\in[M]}\mathbb{E}\big[\boldsymbol{v}_j^\top\boldsymbol{x}_j^*(\tilde{\gamma}_j,\tilde{\rho}_j;\boldsymbol{z}_j)\big] = \sum_{j\in[M]}\mathbb{E}\big[V_j(\tilde{\gamma}_j,\tilde{\rho}_j;\boldsymbol{z}_j)\big] = \textsf{CH-OPT}(\mathcal{I}),\tag{6}$$

where the final equality follows from the assumption that $(\tilde{\boldsymbol{\gamma}},\tilde{\boldsymbol{\rho}})\in\mathcal{I}$ is the optimal solution to CH-OPT($\mathcal{I}$). $\square$
6.2 Proof of Theorem LABEL:thm:roifail
Let $\tilde{\boldsymbol{\gamma}} := (\tilde{\gamma}_1,\tilde{\gamma}_2)$ be the optimal solution to CH-OPT($\mathcal{I}_R$), and recall that under the option $\mathcal{I}_R$ the per-channel budgets are set to infinity. It is easy to see that $\tilde{\gamma}_1$ can be an arbitrary nonnegative number because the advertiser always wins auction 1, and that $\tilde{\gamma}_2 > \frac{X}{1+X}$: if instead $\tilde{\gamma}_2 \le \frac{X}{1+X}$, then the optimal outcome for channel 2 is to win both auctions 2 and 3. However, in this case the advertiser wins all auctions and acquires total value $1+X+2X = 1+3X$ while incurring total spend $0+(1+X)+2(1+X) = 3+3X$, which violates the ROI constraint in CH-OPT($\mathcal{I}_R$) because $\frac{1+3X}{3+3X} < 1$. Therefore the advertiser can only win auction 1, or in other words $\tilde{\gamma}_2 > \frac{X}{1+X}$. This implies that the optimal objective of CH-OPT($\mathcal{I}_R$) is $1$. On the other hand, it is easy to see that the optimal solution to GL-OPT is to win only auctions 1 and 2, yielding an optimal value of $1+X$. Therefore $\frac{\textsf{CH-OPT}(\mathcal{I}_R)}{\textsf{GL-OPT}} = \frac{1}{1+X}$. Taking $X\to\infty$ yields the desired result. $\square$
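A quick numeric check of the instance in this proof (with $\gamma = 1$ assumed, matching the ROI ratio used above): channel 1 holds auction 1 (value 1, cost 0), and channel 2 holds auctions 2 and 3 (values $X$ and $2X$, costs $1+X$ and $2(1+X)$).

```python
# CH-OPT(I_R)/GL-OPT shrinks like 1/(1+X) as X grows, as the proof shows.
for X in [1.0, 10.0, 100.0]:
    glopt = 1 + X       # win auctions 1 and 2: value 1+X, spend 1+X, ROI = 1
    chopt_roi = 1.0     # ROI-only levers: channel 2 must be shut off (see proof)
    print(f"X={X:>6}: CH-OPT(I_roi)/GL-OPT = {chopt_roi / glopt:.4f}")
```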
6.3 Proof of Theorem LABEL:thm:budgonlysuff
In light of Lemma LABEL:lem:glbest, we only need to show CH-OPT($\mathcal{I}_B$) $\ge$ GL-OPT. Let $\tilde{\boldsymbol{x}}(\boldsymbol{z}) := \{\tilde{\boldsymbol{x}}_j(\boldsymbol{z}_j)\}_{j\in[M]}$ be the optimal solution to GL-OPT, and define $\tilde{\gamma}_j := 0$ and $\tilde{\rho}_j := \mathbb{E}\big[\boldsymbol{d}_j^\top\tilde{\boldsymbol{x}}_j(\boldsymbol{z}_j)\big]$ to be the corresponding expected spend in each channel $j$ under the optimal solution $\tilde{\boldsymbol{x}}(\boldsymbol{z})$ to GL-OPT.

We first argue that $(\tilde{\gamma}_j,\tilde{\rho}_j)_{j\in[M]}$ is feasible for CH-OPT($\mathcal{I}_B$). Recall the optimal solution $\boldsymbol{x}_j^*(\tilde{\gamma}_j,\tilde{\rho}_j;\boldsymbol{z}_j)$ to $V_j(\tilde{\gamma}_j,\tilde{\rho}_j;\boldsymbol{z}_j)$ for each channel $j\in[M]$ as defined in Eq. (4), as well as the definitions $V_j(\tilde{\gamma}_j,\tilde{\rho}_j;\boldsymbol{z}_j) := \boldsymbol{v}_j^\top\boldsymbol{x}_j^*(\tilde{\gamma}_j,\tilde{\rho}_j;\boldsymbol{z}_j)$ and $C_j(\tilde{\gamma}_j,\tilde{\rho}_j;\boldsymbol{z}_j) := \boldsymbol{d}_j^\top\boldsymbol{x}_j^*(\tilde{\gamma}_j,\tilde{\rho}_j;\boldsymbol{z}_j)$ from Eq. (5). Then we have

$$\mathbb{E}\big[C_j(\tilde{\gamma}_j,\tilde{\rho}_j;\boldsymbol{z}_j)\big] = \mathbb{E}\big[\boldsymbol{d}_j^\top\boldsymbol{x}_j^*(\tilde{\gamma}_j,\tilde{\rho}_j;\boldsymbol{z}_j)\big] \overset{(i)}{\le} \tilde{\rho}_j = \mathbb{E}\big[\boldsymbol{d}_j^\top\tilde{\boldsymbol{x}}_j(\boldsymbol{z}_j)\big],\tag{7}$$

where (i) follows from the feasibility of $\boldsymbol{x}_j^*(\tilde{\gamma}_j,\tilde{\rho}_j;\boldsymbol{z}_j)$ for $V_j(\tilde{\gamma}_j,\tilde{\rho}_j;\boldsymbol{z}_j)$. Summing over $j\in[M]$, we conclude that $(\tilde{\gamma}_j,\tilde{\rho}_j)_{j\in[M]}$ satisfies the budget constraint in CH-OPT($\mathcal{I}_B$):

$$\sum_{j\in[M]}\mathbb{E}\big[C_j(\tilde{\gamma}_j,\tilde{\rho}_j;\boldsymbol{z}_j)\big] \le \sum_{j\in[M]}\mathbb{E}\big[\boldsymbol{d}_j^\top\tilde{\boldsymbol{x}}_j(\boldsymbol{z}_j)\big] \overset{(i)}{\le} \rho.\tag{8}$$

Here (i) follows from the feasibility of $\tilde{\boldsymbol{x}}(\boldsymbol{z}) := \{\tilde{\boldsymbol{x}}_j(\boldsymbol{z}_j)\}_{j\in[M]}$ for GL-OPT, since it is the optimal solution.

On the other hand, we have

$$V_j(\tilde{\gamma}_j,\tilde{\rho}_j;\boldsymbol{z}_j) = \boldsymbol{v}_j^\top\boldsymbol{x}_j^*(\tilde{\gamma}_j,\tilde{\rho}_j;\boldsymbol{z}_j) \overset{(i)}{\ge} \boldsymbol{v}_j^\top\tilde{\boldsymbol{x}}_j(\boldsymbol{z}_j),\tag{9}$$

where (i) follows from the optimality of $\boldsymbol{x}_j^*(\tilde{\gamma}_j,\tilde{\rho}_j;\boldsymbol{z}_j)$ for $V_j(\tilde{\gamma}_j,\tilde{\rho}_j;\boldsymbol{z}_j)$. Hence, we have

$$\sum_{j\in[M]}\mathbb{E}\big[V_j(\tilde{\gamma}_j,\tilde{\rho}_j;\boldsymbol{z}_j)\big] \ge \sum_{j\in[M]}\mathbb{E}\big[\boldsymbol{v}_j^\top\tilde{\boldsymbol{x}}_j(\boldsymbol{z}_j)\big] \overset{(i)}{\ge} \gamma\sum_{j\in[M]}\mathbb{E}\big[\boldsymbol{d}_j^\top\tilde{\boldsymbol{x}}_j(\boldsymbol{z}_j)\big] \overset{(ii)}{\ge} \gamma\sum_{j\in[M]}\mathbb{E}\big[C_j(\tilde{\gamma}_j,\tilde{\rho}_j;\boldsymbol{z}_j)\big],\tag{10}$$

where (i) follows from the feasibility of $\tilde{\boldsymbol{x}}(\boldsymbol{z})$ for GL-OPT, since it is the optimal solution, and (ii) follows from Eq. (7). Hence, combining Eqs. (8) and (10), we conclude that $(\tilde{\gamma}_j,\tilde{\rho}_j)_{j\in[M]}$ is feasible for CH-OPT($\mathcal{I}_B$).

Finally, we have CH-OPT($\mathcal{I}_B$) $\ge \sum_{j\in[M]}\mathbb{E}\big[V_j(\tilde{\gamma}_j,\tilde{\rho}_j;\boldsymbol{z}_j)\big] \ge \sum_{j\in[M]}\mathbb{E}\big[\boldsymbol{v}_j^\top\tilde{\boldsymbol{x}}_j(\boldsymbol{z}_j)\big] =$ GL-OPT, where the last inequality follows from Eq. (10), and the final equality holds because $\tilde{\boldsymbol{x}}(\boldsymbol{z})$ is the optimal solution to GL-OPT. $\square$
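To make the construction in this proof concrete, here is a small numerical sketch (single realization, illustrative data; scipy and the helper names are our own assumptions): solve GL-OPT as an LP, hand each channel the spend of the GL-OPT solution as its per-channel budget (with $\gamma_j = 0$), and check that channel-level re-optimization recovers at least the GL-OPT conversion. The ROI feasibility of the resulting levers is exactly the content of Eqs. (7)-(10), which the sketch does not re-verify.

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(0)
channels = [(rng.uniform(0.5, 2.0, 5), rng.uniform(0.2, 1.5, 5)) for _ in range(3)]
gamma, rho = 0.8, 3.0
v = np.concatenate([vj for vj, _ in channels])
d = np.concatenate([dj for _, dj in channels])

# GL-OPT for this realization.
glopt = linprog(-v, A_ub=np.vstack([gamma * d - v, d]), b_ub=[0.0, rho],
                bounds=[(0, 1)] * len(v))
x = glopt.x
total_glopt = v @ x

# Per-channel budgets = spend of the GL-OPT solution in each channel (gamma_j = 0).
start, total_ch = 0, 0.0
for vj, dj in channels:
    m = len(vj)
    rho_j = dj @ x[start:start + m]
    ch = linprog(-vj, A_ub=dj.reshape(1, -1), b_ub=[rho_j], bounds=[(0, 1)] * m)
    total_ch += vj @ ch.x
    start += m

print(f"GL-OPT conversion: {total_glopt:.4f}, budget-only channel conversion: {total_ch:.4f}")
```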
6.4 Proof of Corollary LABEL:cor:roiredund
In light of Lemma LABEL:lem:glbest, we only need to show CH-OPT($\mathcal{I}_G$) $\ge$ GL-OPT. Let $(\tilde{\boldsymbol{\gamma}},\tilde{\boldsymbol{\rho}})\in\mathcal{I}_B$ be the optimal solution to CH-OPT($\mathcal{I}_B$); by the definition of $\mathcal{I}_B$ in Eq. (2), we have $\tilde{\gamma}_j = 0$ for all $j\in[M]$. Since $(\tilde{\boldsymbol{\gamma}},\tilde{\boldsymbol{\rho}})$ is feasible for CH-OPT($\mathcal{I}_B$), it is also feasible for CH-OPT($\mathcal{I}_G$), since these two problems share the same ROI and budget constraints. Because they also share the same objective, we have

$$\textsf{CH-OPT}(\mathcal{I}_G) \ge \textsf{CH-OPT}(\mathcal{I}_B) = \textsf{GL-OPT},\tag{11}$$

where the final equality follows from Theorem LABEL:thm:budgonlysuff. $\square$
7 Proofs for Section LABEL:sec:online

7.1 Proof of Proposition LABEL:prop:ubbenchlag
Let $(\rho_j^*)_{j\in[M]}$ be the optimal per-channel budgets for CH-OPT($\mathcal{I}_B$), and define $\bar{\mu}_T := \frac{1}{\tau_A}\sum_{t\in[\tau_A]}\mu_t$ as well as $\bar{\lambda}_T := \frac{1}{\tau_A}\sum_{t\in[\tau_A]}\lambda_t$. Then

$$\begin{aligned}
T\cdot\textsf{GL-OPT} - \sum_{t\in[T]}\sum_{j\in[M]} V_j(\rho_{j,t})
&\overset{(i)}{\le} M\bar{V}(T-\tau_A) + \tau_A\,\textsf{CH-OPT}(\mathcal{I}_B) - \sum_{t\in[\tau_A]}\sum_{j\in[M]} V_j(\rho_{j,t})\\
&\le M\bar{V}(T-\tau_A) + \tau_A\Big(\sum_{j\in[M]}\mathcal{L}_j(\rho_j^*,\bar{\lambda}_T,\bar{\mu}_T) + \rho\bar{\mu}_T\Big) - \sum_{t\in[\tau_A]}\sum_{j\in[M]} V_j(\rho_{j,t})\\
&\overset{(ii)}{\le} M\bar{V}(T-\tau_A) + \rho\sum_{t\in[\tau_A]}\mu_t + \sum_{t\in[\tau_A]}\sum_{j\in[M]}\mathcal{L}_j(\rho_j^*,\lambda_t,\mu_t) - \sum_{t\in[\tau_A]}\sum_{j\in[M]}\Big(\mathcal{L}_j(\rho_{j,t},\boldsymbol{c}_t) - \lambda_t\big(V_j(\rho_{j,t})-\gamma\rho_{j,t}\big) + \mu_t\rho_{j,t}\Big)\\
&\overset{(iii)}{\le} M\bar{V}(T-\tau_A) + \sum_{j\in[M]}\sum_{t\in[\tau_A]}\Big(\mathcal{L}_j(\rho_j^*(t),\boldsymbol{c}_t) - \mathcal{L}_j(\rho_{j,t},\boldsymbol{c}_t)\Big) + \sum_{t\in[\tau_A]}\big(\lambda_t g_{1,t} + \mu_t g_{2,t}\big).
\end{aligned}\tag{12}$$

Here, (i) follows from Theorem LABEL:thm:budgonlysuff, which states GL-OPT $=$ CH-OPT($\mathcal{I}_B$), together with the fact that CH-OPT($\mathcal{I}_B$) is clearly upper bounded by $M\bar{V}$; (ii) follows from CH-OPT($\mathcal{I}_B$) $= \sum_{j\in[M]} V_j(\rho_j^*)$ and the definition of the Lagrangian in Eq. (LABEL:eq:Lagrangian); in (iii) we define $\rho_j^*(t) := \arg\max_{\rho_j\ge 0}\mathcal{L}_j(\rho_j,\boldsymbol{c}_t)$ to be the budget that maximizes the Lagrangian w.r.t. the dual variables $\boldsymbol{c}_t := (\lambda_t,\mu_t)$. $\square$
7.2 Proof of Lemma LABEL:lem:boundcs
Recall $g_{1,t} := \sum_{j\in[M]}\big(V_{j,t}(\rho_{j,t};\boldsymbol{z}_{j,t}) - \gamma\rho_{j,t}\big)$ and $g_{2,t} := \rho - \sum_{j\in[M]}\rho_{j,t}$, defined in Algorithm LABEL:alg:omducb. Also recall $\tau_A\in[T]$, defined in step 10 of Algorithm LABEL:alg:omducb. In the following, we will show

$$M\bar{V}(T-\tau_A) + \sum_{t\in[\tau_A]}\big(\lambda_t g_{1,t} + \mu_t g_{2,t}\big)
\ \le\ \bar{\Lambda}_F\max\{M\bar{V},\rho\} + M^2\bar{V}\rho\cdot\max\Big\{\frac{1}{\underline{\gamma}\,\underline{\rho}},\ \frac{1}{\rho-M\underline{\rho}}\Big\} + \frac{\gamma M^2\bar{V}^2 + \rho^2}{2}\cdot\eta T + \frac{1}{2\eta}\bar{\Lambda}_F^2
\ =\ \mathcal{O}\Big(\eta T + \frac{1}{\eta}\Big),\tag{13}$$

where we recall $\bar{\Lambda}_F := M\bar{V}\max\big\{\frac{1}{\underline{\gamma}\,\underline{\rho}},\ \frac{1}{\rho-M\underline{\rho}}\big\}$ defined in Eq. (LABEL:def:daulvarub), with $\underline{\gamma}$ and $\underline{\rho}$ the safe ROI and budget constants therein.
From Lemma 7.3, we have for any $t\in[T]$ and $\lambda,\mu\in[0,\bar{\Lambda}_F]$,

$$\sum_{\tau\in[t]}(\lambda_\tau - \lambda)\,g_{1,\tau} \le \frac{\eta M^2\bar{V}^2}{2}\,t + \frac{1}{2\eta}\lambda^2
\qquad\text{and}\qquad
\sum_{\tau\in[t]}(\mu_\tau - \mu)\,g_{2,\tau} \le \frac{\eta\rho^2}{2}\,t + \frac{1}{2\eta}\mu^2,\tag{14}$$

where we used the fact that $\lambda_1 = \mu_1 = 0$ in Algorithm LABEL:alg:omducb.
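For completeness, Eq. (14) is the standard regret bound for projected online (sub)gradient steps on the dual variables. Assuming the update $\lambda_{\tau+1} = \Pi_{[0,\bar{\Lambda}_F]}\big(\lambda_\tau - \eta g_{1,\tau}\big)$ (the exact update rule lives in Algorithm LABEL:alg:omducb, which is not reproduced here), non-expansiveness of the projection gives the one-step expansion

$$(\lambda_{\tau+1}-\lambda)^2 \le (\lambda_\tau - \eta g_{1,\tau} - \lambda)^2 = (\lambda_\tau-\lambda)^2 - 2\eta(\lambda_\tau-\lambda)\,g_{1,\tau} + \eta^2 g_{1,\tau}^2,$$

and summing over $\tau\in[t]$, dropping the non-positive final telescoping term, and using $\lambda_1 = 0$ together with $|g_{1,\tau}|\le M\bar{V}$ recovers the first bound in Eq. (14); the bound for $\mu$ is identical with $|g_{2,\tau}|\le\rho$.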
Suppose first that $\tau_A = T$, so that $M\bar{V}(T-\tau_A) = 0$. Then, taking $\lambda = \mu = 0$ in Eq. (14), we have

$$\sum_{t\in[\tau_A]}\lambda_t g_{1,t} \le \frac{\eta M^2\bar{V}^2}{2}\,T
\qquad\text{and}\qquad
\sum_{t\in[\tau_A]}\mu_t g_{2,t} \le \frac{\eta\rho^2}{2}\,T.\tag{15}$$

Thus, Eq. (13) holds.
If $\tau_A < T$, then according to Algorithm LABEL:alg:omducb we either have $S_{1,\tau_A} - \gamma M\rho + \underline{\gamma}\,\underline{\rho}(T-\tau_A) < 0$ or $S_{2,\tau_A} + M\rho + M\underline{\rho}(T-\tau_A) > \rho T$, where we recall $S_{1,\tau_A} := \sum_{t\in[\tau_A-1]} g_{1,t}$ and $S_{2,\tau_A} := \sum_{t\in[\tau_A-1]}\sum_{j\in[M]}\rho_{j,t} = \sum_{t\in[\tau_A-1]}(\rho - g_{2,t})$:
• If $S_{1,\tau_A} - \gamma M\rho + \underline{\gamma}\,\underline{\rho}(T-\tau_A) < 0$, then we have $\sum_{t\in[\tau_A-1]} g_{1,t} < \gamma M\rho - \underline{\gamma}\,\underline{\rho}(T-\tau_A)$. Hence, taking $\lambda = \frac{M\bar{V}}{\underline{\gamma}\,\underline{\rho}}\in[0,\bar{\Lambda}_F]$ in Eq. (14), we have

$$\begin{aligned}
M\bar{V}(T-\tau_A) + \sum_{t\in[\tau_A]}\lambda_t g_{1,t}
&\le M\bar{V}(T-\tau_A) + \lambda_{\tau_A} g_{1,\tau_A} + \sum_{t\in[\tau_A-1]}\lambda g_{1,t} + \frac{\eta M^2\bar{V}^2}{2}(\tau_A - 1) + \frac{1}{2\eta}\lambda^2\\
&< \lambda_{\tau_A} g_{1,\tau_A} + M\bar{V}(T-\tau_A) - M\bar{V}(T-\tau_A) + \frac{\gamma M^2\bar{V}\rho}{\underline{\gamma}\,\underline{\rho}} + \frac{\eta M^2\bar{V}^2}{2}(\tau_A - 1) + \frac{1}{2\eta}\lambda^2\\
&\le \bar{\Lambda}_F M\bar{V} + \frac{\gamma M^2\bar{V}\rho}{\underline{\gamma}\,\underline{\rho}} + \frac{\eta M^2\bar{V}^2}{2}\,T + \frac{1}{2\eta}\bar{\Lambda}_F^2,
\end{aligned}\tag{16}$$

where the final inequality uses the facts that $\tau_A\le T$, $\lambda\le\bar{\Lambda}_F$, and $g_{1,t}\le M\bar{V}$ for any $t\in[T]$. Hence, as in Eq. (15), further taking $\mu = 0$ in Eq. (14) shows that Eq. (13) holds.
• If $S_{2,\tau_A} + M\rho + M\underline{\rho}(T-\tau_A) > \rho T$, then we have $\sum_{t\in[\tau_A-1]}(\rho - g_{2,t}) > \rho T - M\rho - M\underline{\rho}(T-\tau_A)$, or equivalently $\sum_{t\in[\tau_A-1]} g_{2,t} < M\underline{\rho}(T-\tau_A) + M\rho - \rho(T-\tau_A) \le -(\rho - M\underline{\rho})(T-\tau_A) + M\rho$. Hence, taking $\mu = \frac{M\bar{V}}{\rho - M\underline{\rho}}\in[0,\bar{\Lambda}_F]$ in Eq. (14), we have

$$\begin{aligned}
M\bar{V}(T-\tau_A) + \sum_{t\in[\tau_A]}\mu_t g_{2,t}
&\le M\bar{V}(T-\tau_A) + \mu_{\tau_A} g_{2,\tau_A} + \sum_{t\in[\tau_A-1]}\mu g_{2,t} + \frac{\eta\rho^2}{2}\,\tau_A + \frac{1}{2\eta}\mu^2\\
&< \mu_{\tau_A} g_{2,\tau_A} + M\bar{V}(T-\tau_A) - M\bar{V}(T-\tau_A) + \frac{M^2\bar{V}\rho}{\rho - M\underline{\rho}} + \frac{\eta\rho^2}{2}\,\tau_A + \frac{1}{2\eta}\mu^2\\
&\le \bar{\Lambda}_F\rho + \frac{M^2\bar{V}\rho}{\rho - M\underline{\rho}} + \frac{\eta\rho^2}{2}\,T + \frac{1}{2\eta}\bar{\Lambda}_F^2,
\end{aligned}\tag{17}$$

where the final inequality uses the facts that $\tau_A\le T$, $\mu\le\bar{\Lambda}_F$, and $g_{2,t}\le\rho$ for any $t\in[T]$. Hence, as in Eq. (15), further taking $\lambda = 0$ in Eq. (14) shows that Eq. (13) holds.
∎
7.3 Proof of Lemma LABEL:lem:Vstruct
We first show that for any realization $\boldsymbol{z} := (\boldsymbol{z}_j)_{j\in[M]} = (\boldsymbol{v}_j,\boldsymbol{d}_j)_{j\in[M]}$, the conversion function $V_j(\rho_j;\boldsymbol{z}_j)$ is piecewise linear, strictly increasing, and concave for any $j\in[M]$.
Fix any channel $j$, which consists of $m_j$ parallel auctions, and recall that we assumed the ordering $\frac{v_{j,1}}{d_{j,1}} \ge \frac{v_{j,2}}{d_{j,2}} \ge \cdots \ge \frac{v_{j,m_j}}{d_{j,m_j}}$ for any realization $\boldsymbol{z}_j$. Then, under the option where the per-channel ROI is set to $0$ (i.e., omitted), $V_j(\rho_j;\boldsymbol{z}_j)$ is exactly the LP relaxation of a 0-1 knapsack, whose optimal solution $\boldsymbol{x}_j^*(\rho_j;\boldsymbol{z}_j)$ is well known to be unique and, for any auction index $n\in[m_j]$, takes the form

$$x_{j,n}^*(\rho_j;\boldsymbol{z}_j) = \begin{cases}
1 & \text{if } \sum_{n'\in[n]} d_{j,n'} \le \rho_j,\\[2pt]
\dfrac{\rho_j - \sum_{n'\in[n-1]} d_{j,n'}}{d_{j,n}} & \text{if } \sum_{n'\in[n-1]} d_{j,n'} \le \rho_j < \sum_{n'\in[n]} d_{j,n'},\\[2pt]
0 & \text{otherwise},
\end{cases}\tag{18}$$

where we denote $d_{j,0} := 0$. With this form, it is easy to see that

$$V_j(\rho_j;\boldsymbol{z}_j) = \boldsymbol{v}_j^\top\boldsymbol{x}_j^*(\rho_j;\boldsymbol{z}_j) = \sum_{n\in[m_j]}\Big(\frac{v_{j,n}}{d_{j,n}}\rho_j + b_{j,n}\Big)\,\mathbb{I}\{d_{j,0}+\cdots+d_{j,n-1} \le \rho_j \le d_{j,0}+\cdots+d_{j,n}\},\tag{19}$$
where we denote $v_{j,0} := 0$ and $b_{j,n} := \sum_{n'\in[n-1]} v_{j,n'} - \frac{v_{j,n}}{d_{j,n}}\cdot\big(\sum_{n'\in[n-1]} d_{j,n'}\big)$. It is easy to check that any two consecutive line segments, say over $[X_{n-1},X_n]$ and $[X_n,X_{n+1}]$ where we write $X_n := d_{j,0}+\cdots+d_{j,n}$, intersect at $\rho_j = X_n$, because $\frac{v_{j,n}}{d_{j,n}}\rho_j + b_{j,n} = \frac{v_{j,n+1}}{d_{j,n+1}}\rho_j + b_{j,n+1}$ at $\rho_j = X_n$. Hence, from Eq. (19) we can conclude that $V_j(\rho_j;\boldsymbol{z}_j)$ is continuous, which further implies it is piecewise linear and strictly increasing. Further, the ordering $\frac{v_{j,1}}{d_{j,1}} \ge \cdots \ge \frac{v_{j,m_j}}{d_{j,m_j}}$ implies that the slope on each segment $[X_n,X_{n+1}]$ decreases as $n$ increases, which implies that $V_j(\rho_j;\boldsymbol{z}_j)$ is concave.
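As a concrete illustration of Eq. (18), the following minimal sketch (illustrative data; the helper name is our own) evaluates $V_j(\rho_j;\boldsymbol{z}_j)$ by greedy fractional filling in decreasing value-to-cost order; evaluating it on a grid exhibits the piecewise-linear, concave shape established above.

```python
# Fractional-knapsack solution of Eq. (18): sort auctions by value-to-cost
# ratio and fill greedily until the budget rho runs out.
import numpy as np

def channel_conversion(v, d, rho):
    """V_j(rho; z_j) for one channel with values v, costs d (gamma_j = 0)."""
    order = np.argsort(-v / d)                 # decreasing value-to-cost ratio
    spend, total = 0.0, 0.0
    for n in order:
        if spend + d[n] <= rho:                # take the whole impression
            spend += d[n]; total += v[n]
        else:                                  # take the affordable fraction
            total += (rho - spend) / d[n] * v[n]
            break
    return total

v = np.array([3.0, 2.0, 1.0]); d = np.array([1.0, 1.0, 2.0])
# Grid evaluation shows the concave piecewise-linear shape of Eq. (19):
print([round(channel_conversion(v, d, r), 2) for r in (0.5, 1.0, 2.0, 3.0, 4.0)])
```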
Since $V_j(\rho_j) := \mathbb{E}\big[V_j(\rho_j;\boldsymbol{z}_j)\big]$, where the expectation is taken w.r.t. randomness in $\boldsymbol{z}_j$, and since $\boldsymbol{z}_j$ is sampled from a discrete distribution $\boldsymbol{p}_j$ on the finite support $F_j$, $V_j(\rho_j)$ is simply a weighted average of $(V_j(\rho_j;\boldsymbol{z}_j))_{\boldsymbol{z}_j\in F_j}$ with weights $\boldsymbol{p}_j$. Hence $V_j(\rho_j)$ is also continuous, piecewise linear, strictly increasing, and concave, and thus can be written as in Lemma LABEL:lem:Vstruct with parameters $\{(s_{j,n}, b_{j,n}, r_{j,n})\}_{n\in[S_j]}$ that depend only on the support $F_j$ and the distribution $\boldsymbol{p}_j$.
Finally, according to the definitions $\mathcal{L}_j(\rho_j,\boldsymbol{c}) := \mathbb{E}\big[\mathcal{L}_j(\rho_j,\boldsymbol{c};\boldsymbol{z}_j)\big]$ and $\mathcal{L}_j(\rho_j,\boldsymbol{c};\boldsymbol{z}_j) := (1+\lambda)V_j(\rho_j;\boldsymbol{z}_j) - (\lambda\gamma+\mu)\rho_j$ from Eq. (LABEL:eq:Lagrangian), we have

$$\mathcal{L}_j(\rho_j,\boldsymbol{c}) = (1+\lambda)V_j(\rho_j) - (\lambda\gamma+\mu)\rho_j,\tag{20}$$

which implies that $\mathcal{L}_j(\rho_j,\boldsymbol{c})$ is continuous, piecewise linear, and concave, because $V_j(\rho_j)$ is continuous, piecewise linear, and concave as shown above. Combining Eq. (20) and the representation of $V_j(\rho_j)$ in Lemma LABEL:lem:Vstruct, we have

$$\mathcal{L}_j(\rho_j,\boldsymbol{c}) = \sum_{n\in[S_j]}\big(\varphi_{j,n}(\boldsymbol{c})\,\rho_j + (1+\lambda)b_{j,n}\big)\,\mathbb{I}\{r_{j,n-1}\le\rho_j\le r_{j,n}\},\tag{21}$$

where the slope $\varphi_{j,n}(\boldsymbol{c}) := (1+\lambda)s_{j,n} - (\mu+\gamma\lambda)$ decreases in $n$. Thus, at the point $r_{j,n^*} := \max\{r_{j,n} : n = 0,1,\ldots,S_j,\ \varphi_{j,n}(\boldsymbol{c})\ge 0\}$, where the slope to the right turns negative for the first time, $\mathcal{L}_j(\rho_j,\boldsymbol{c})$ attains its maximum value $\max_{\rho_j\ge 0}\mathcal{L}_j(\rho_j,\boldsymbol{c})$: to the left of $r_{j,n^*}$, namely on the region $[0, r_{j,n^*}]$, $\mathcal{L}_j(\rho_j,\boldsymbol{c})$ strictly increases because the slopes are positive; and to the right of $r_{j,n^*}$, namely on the region $[r_{j,n^*},\rho]$, $\mathcal{L}_j(\rho_j,\boldsymbol{c})$ strictly decreases because the slopes are negative. ∎
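The maximizer characterization in this last step reads directly as a small procedure; this sketch (names and data are our own) returns $r_{j,n^*}$ given the breakpoints $r_{j,n}$ and slopes $s_{j,n}$ of Lemma LABEL:lem:Vstruct.

```python
# Direct reading of Eq. (21): with breakpoints r[0..S] and segment slopes
# phi[n] = (1 + lam) * s[n] - (mu + gamma * lam), the Lagrangian maximizer is
# the last breakpoint whose left segment still has nonnegative slope.
def lagrangian_maximizer(r, s, lam, mu, gamma):
    phi = [(1 + lam) * sn - (mu + gamma * lam) for sn in s]
    best = r[0]
    for n, slope in enumerate(phi):
        if slope >= 0:
            best = r[n + 1]   # segment n spans [r[n], r[n+1]]
        else:
            break             # slopes decrease in n, so we can stop early
    return best

# Example: V_j has slopes 3, 2, 0.5 on [0,1], [1,2], [2,4]; duals lam=0.2, mu=1.
print(lagrangian_maximizer([0, 1, 2, 4], [3, 2, 0.5], 0.2, 1.0, 1.0))  # -> 2
```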
7.4 Proof of Lemma LABEL:lem:dualvarub

Recall the definition of the Lagrangian function $\mathcal{L}_j(\rho_j,\boldsymbol{c};\boldsymbol{z}_j) := (1+\lambda)V_j(\rho_j;\boldsymbol{z}_j) - (\lambda\gamma+\mu)\rho_j$ in Eq. (LABEL:eq:Lagrangian). Then, since $V_j(\rho_j;\boldsymbol{z}_j)\le\bar{V}$, and $\lambda_t,\mu_t\in[0,\bar{\Lambda}_F]$ for any period $t\in[T]$ and per-channel budget $\rho_j\in[0,\rho]$, we can conclude that $-(1+\gamma)\rho\bar{\Lambda}_F \le \mathcal{L}_j(\rho_j,\lambda_t,\mu_t) \le (1+\bar{\Lambda}_F)\bar{V}$. ∎
7.5 Proof of Lemma LABEL:lem:ucbregret

In the following, instead of bounding $\sum_{t\in[\tau_A]}\mathcal{L}_j(\rho_j^*(t),\boldsymbol{c}_t) - \mathcal{L}_j(\rho_{j,t},\boldsymbol{c}_t)$, we bound $\sum_{t\in[T]}\mathcal{L}_j(\rho_j^*(t),\boldsymbol{c}_t) - \mathcal{L}_j(\rho_{j,t},\boldsymbol{c}_t)$, where we consider the hypothetical scenario in which we ignore the termination criterion of the while loop in Algorithm LABEL:alg:omducb and continue to set per-channel budgets according to steps 4-6 of the algorithm until the end of period $T$. This suffices because $\sum_{t\in[T]}\mathcal{L}_j(\rho_j^*(t),\boldsymbol{c}_t) - \mathcal{L}_j(\rho_{j,t},\boldsymbol{c}_t) \ge \sum_{t\in[\tau_A]}\mathcal{L}_j(\rho_j^*(t),\boldsymbol{c}_t) - \mathcal{L}_j(\rho_{j,t},\boldsymbol{c}_t)$.
We fix some channel $j\in[M]$ and omit the subscript $j$ when the context is clear. We first introduce some definitions that will be used throughout the proof. Fix some positive constant $\bar{\varphi} > 0$ whose value we choose later, and recall that $a_k$ denotes the $k$th arm in the discretized budget set $\mathcal{A}(\epsilon)$ defined in Eq. (LABEL:eq:budgetdisc). Then we define the following:

$$\begin{aligned}
\Delta_k(\boldsymbol{c}) &:= \max_{\rho_j\in[0,\rho]}\mathcal{L}_j(\rho_j,\boldsymbol{c}) - \mathcal{L}_j(a_k,\boldsymbol{c}),\\
\mathcal{C}_n &:= \Big\{\boldsymbol{c}\in\{\boldsymbol{c}_t\}_{t\in[T]} : r_{j,n} = \arg\max_{\rho_j\ge 0}\mathcal{L}_j(\rho_j,\boldsymbol{c})\Big\}\quad\text{for } n = 0,1,\ldots,S_j,\\
\mathcal{C}(\bar{\varphi}) &:= \Big\{\boldsymbol{c}\in\{\boldsymbol{c}_t\}_{t\in[T]} : \varphi_j^-(\boldsymbol{c}) > \bar{\varphi},\ |\varphi_j^+(\boldsymbol{c})| > \bar{\varphi}\Big\},\\
m_k(\boldsymbol{c}) &:= \frac{8\log(T)}{\Delta_k^2(\boldsymbol{c})}\quad\text{for all } (k,\boldsymbol{c}) \text{ s.t. } \Delta_k(\boldsymbol{c}) > 0.
\end{aligned}\tag{22}$$

Here, the "adjacent slopes" $\varphi_j^-(\boldsymbol{c})$ and $\varphi_j^+(\boldsymbol{c})$, which are defined in Eq. (LABEL:eq:adj), represent the slopes adjacent to the optimal budget $\arg\max_{\rho_j\in[0,\rho]}\mathcal{L}_j(\rho_j,\boldsymbol{c})$ for any context $\boldsymbol{c} := (\lambda,\mu)$. Further, $S_j$ and $\{r_{j,n}\}_{n\in[S_j]}$ are defined in Lemma LABEL:lem:Vstruct. Here we state in words the meanings of $\Delta_k(\boldsymbol{c})$, $\mathcal{C}_n$, and $\mathcal{C}(\bar{\varphi})$, respectively.

• $\Delta_k(\boldsymbol{c})$ denotes the loss in contextual-bandit reward incurred by pulling arm $a_k$ under context $\boldsymbol{c}$.

• $\mathcal{C}_n$ is the set of all contexts $\boldsymbol{c}_t$ under which the optimal per-channel budget $\arg\max_{\rho_j\ge 0}\mathcal{L}_j(\rho_j,\boldsymbol{c}_t)$ is attained at the $n$th "turning point" $r_{j,n}$ (see Lemma LABEL:lem:Vstruct).

• $\mathcal{C}(\bar{\varphi})$ is the set of all contexts for which the slopes adjacent to the optimal point w.r.t. the context $\boldsymbol{c}$, namely $\arg\max_{\rho_j\ge 0}\mathcal{L}_j(\rho_j,\boldsymbol{c})$, have magnitude greater than $\bar{\varphi}$; in other words, the adjacent slopes are steep.
On a related note, for any context $\boldsymbol{c}$, we define the following "adjacent regions" that sandwich the optimal budget w.r.t. $\boldsymbol{c}$:

$$A_j^-(\boldsymbol{c}) := [r_{j,n-1},\,r_{j,n}] \quad\text{and}\quad A_j^+(\boldsymbol{c}) := [r_{j,n},\,r_{j,n+1}] \quad\text{if } \boldsymbol{c}\in\mathcal{C}_n.\tag{23}$$

In other words, if $\boldsymbol{c}\in\mathcal{C}_n$, so that per the definition of $\mathcal{C}_n$ above $\arg\max_{\rho_j\in[0,\rho]}\mathcal{L}_j(\rho_j,\boldsymbol{c})$ is located at the $n$th "turning point" $r_{j,n}$, then $A_j^-(\boldsymbol{c})$ and $A_j^+(\boldsymbol{c})$ are respectively the left and right regions surrounding $r_{j,n}$.
With the above definitions, we demonstrate how to bound the UCB error. Define $N_{k,t} := \sum_{\tau\le t-1}\mathbb{I}\{\rho_{j,\tau} = a_k\}$ to be the number of times arm $k$ has been pulled before time $t$. Then we can decompose the UCB error as

$$\sum_{t>K}\mathcal{L}_j(\rho_j^*(t),\boldsymbol{c}_t) - \mathcal{L}_j(\rho_{j,t},\boldsymbol{c}_t) = X_1 + X_2 + X_3,\quad\text{where}$$

$$\begin{aligned}
X_1 &:= \sum_{t>K:\,\boldsymbol{c}_t\notin\mathcal{C}(\bar{\varphi})}\ \sum_{k\in[K]}\Delta_k(\boldsymbol{c}_t)\,\mathbb{I}\{\rho_{j,t} = a_k,\ N_{k,t}\le m_k(\boldsymbol{c}_t)\},\\
X_2 &:= \sum_{t>K:\,\boldsymbol{c}_t\in\mathcal{C}(\bar{\varphi})}\ \sum_{k\in[K]}\Delta_k(\boldsymbol{c}_t)\,\mathbb{I}\{\rho_{j,t} = a_k,\ N_{k,t}\le m_k(\boldsymbol{c}_t)\},\\
X_3 &:= \sum_{k\in[K]}\sum_{t>K}\Delta_k(\boldsymbol{c}_t)\,\mathbb{I}\{\rho_{j,t} = a_k,\ N_{k,t} > m_k(\boldsymbol{c}_t)\}.
\end{aligned}\tag{24}$$

In Section 7.5.1 we show that $X_1 \le \tilde{\mathcal{O}}\big(\epsilon T + \bar{\varphi}T + \frac{1}{\epsilon}\big)$; in Section 7.5.2 we show that $X_2 \le \tilde{\mathcal{O}}\big(\epsilon T + \frac{1}{\epsilon\bar{\varphi}}\big)$; in Section 7.5.3 we show that $X_3 \le \tilde{\mathcal{O}}\big(\frac{1}{\delta T}\big)$.
Remark. In Sections 7.5.1, 7.5.2 and 7.5.3 below, where we bound $X_1$, $X_2$, and $X_3$ respectively, we assume the optimal per-channel budget $\rho_j^*(t) := \arg\max_{\rho_j\in[0,\rho]}\mathcal{L}_j(\rho_j,\boldsymbol{c}_t)$ lies in the arm set $\mathcal{A}(\epsilon)$ for all $t$. This is because otherwise, we can consider the following decomposition of the UCB error in period $t$:

$$\mathcal{L}_j(\rho_j^*(t),\boldsymbol{c}_t) - \mathcal{L}_j(\rho_{j,t},\boldsymbol{c}_t) = \mathcal{L}_j(\rho_j^*(t),\boldsymbol{c}_t) - \mathcal{L}_j(a_t^*,\boldsymbol{c}_t) + \mathcal{L}_j(a_t^*,\boldsymbol{c}_t) - \mathcal{L}_j(\rho_{j,t},\boldsymbol{c}_t),\quad\text{where } a_t^* := \arg\max_{a_k\in\mathcal{A}(\epsilon)}\mathcal{L}_j(a_k,\boldsymbol{c}_t).$$

The first term yields an error of order $\mathcal{O}(\epsilon)$ because the Lagrangian function is unimodal and piecewise linear, which implies $|a_t^* - \rho_j^*(t)|\le\epsilon$, so that $\mathcal{L}_j(\rho_j^*(t),\boldsymbol{c}_t) - \mathcal{L}_j(a_t^*,\boldsymbol{c}_t) = \mathcal{O}(\epsilon)$. Hence, this "discretization error" accumulates to a magnitude of $\mathcal{O}(\epsilon T)$ over $T$ periods, which leads to an additional error that is already accounted for in the statement of the lemma.
7.5.1 Bounding $X_1$.

Our strategy for bounding $X_1$ consists of four steps, namely bounding the loss of arm $a_k$ at each context $\boldsymbol{c}\notin\mathcal{C}(\bar{\varphi})$ in the four cases: $a_k\in A_j^-(\boldsymbol{c})$ lies in the left adjacent region of the optimal budget; $a_k < \min A_j^-(\boldsymbol{c})$ lies to the left of the left adjacent region; $a_k\in A_j^+(\boldsymbol{c})$ lies in the right adjacent region of the optimal budget; and $a_k > \max A_j^+(\boldsymbol{c})$ lies to the right of the right adjacent region. Here we recall that the adjacent regions are defined in Eq. (23).
Step 1: $a_k\in A_j^-(\boldsymbol{c}_t)$. For arm $k$ such that $a_k\in A_j^-(\boldsymbol{c}_t)$, recall from Lemma LABEL:lem:Vstruct that $\mathcal{L}_j(a,\boldsymbol{c}_t)$ is linear in $a$ for $a\in A_j^-(\boldsymbol{c}_t)$, so $\Delta_k(\boldsymbol{c}_t) = \varphi_j^-(\boldsymbol{c}_t)\cdot(\rho_j^*(t) - a_k) \le \bar{\varphi}\rho$, where we used the condition $\boldsymbol{c}_t\notin\mathcal{C}(\bar{\varphi})$, under which the adjacent slopes have magnitude at most $\bar{\varphi}$, together with $\rho_j^*(t)\le\rho$. Thus, summing over all such $k$, we get

$$\sum_{t>K:\,\boldsymbol{c}_t\notin\mathcal{C}(\bar{\varphi})}\ \sum_{k\in[K]:\,a_k\in A_j^-(\boldsymbol{c}_t)}\Delta_k(\boldsymbol{c}_t)\,\mathbb{I}\{\rho_{j,t}=a_k,\ N_{k,t}\le m_k(\boldsymbol{c}_t)\}
\le \sum_{t>K:\,\boldsymbol{c}_t\notin\mathcal{C}(\bar{\varphi})}\ \sum_{k\in[K]:\,a_k\in A_j^-(\boldsymbol{c}_t)}\bar{\varphi}\rho\cdot\mathbb{I}\{\rho_{j,t}=a_k,\ N_{k,t}\le m_k(\boldsymbol{c}_t)\}
\le \bar{\varphi}\rho T = \mathcal{O}(\bar{\varphi}T).\tag{25}$$
Step 2: $a_k < \min A_j^-(\boldsymbol{c}_t)$. For arm $k$ such that $a_k < \min A_j^-(\boldsymbol{c}_t)$, we further split the contexts into groups $\mathcal{C}_n$ for $n = 0,\ldots,S_j$ (defined in Eq. (22)), according to the turning point at which the optimal budget w.r.t. the Lagrangian at that context is attained (see Figure LABEL:fig:lagr for an illustration). Then, for each context group $n$, defining $k' := \max\{k : a_k < r_{j,n-1}\}$ to be the arm closest to and less than $r_{j,n-1}$, we have

$$\begin{aligned}
&\sum_{t>K:\,\boldsymbol{c}_t\in\mathcal{C}_n\setminus\mathcal{C}(\bar{\varphi})}\ \sum_{k\in[K]:\,a_k<\min A_j^-(\boldsymbol{c}_t)}\Delta_k(\boldsymbol{c}_t)\,\mathbb{I}\{\rho_{j,t}=a_k,\ N_{k,t}\le m_k(\boldsymbol{c}_t)\}\\
&\overset{(i)}{=} \sum_{t>K:\,\boldsymbol{c}_t\in\mathcal{C}_n\setminus\mathcal{C}(\bar{\varphi})}\ \sum_{k\in[K]:\,a_k<r_{j,n-1}}\Delta_k(\boldsymbol{c}_t)\,\mathbb{I}\{\rho_{j,t}=a_k,\ N_{k,t}\le m_k(\boldsymbol{c}_t)\}\\
&= \sum_{t>K}\ \sum_{\boldsymbol{c}\in\mathcal{C}_n\setminus\mathcal{C}(\bar{\varphi})}\ \sum_{k\in[K]:\,a_k<r_{j,n-1}}\Delta_k(\boldsymbol{c})\,\mathbb{I}\{\boldsymbol{c}_t=\boldsymbol{c},\ \rho_{j,t}=a_k,\ N_{k,t}\le m_k(\boldsymbol{c})\}\\
&\overset{(ii)}{\le} \sum_{t>K}\ \sum_{\boldsymbol{c}\in\mathcal{C}_n\setminus\mathcal{C}(\bar{\varphi})}\Big(\Delta_{k'}(\boldsymbol{c})\,\mathbb{I}\{\boldsymbol{c}_t=\boldsymbol{c}\} + \sum_{k\in[K]:\,a_k<r_{j,n-1}-\epsilon}\Delta_k(\boldsymbol{c})\,\mathbb{I}\{\boldsymbol{c}_t=\boldsymbol{c},\ \rho_{j,t}=a_k,\ N_{k,t}\le m_k(\boldsymbol{c})\}\Big)\\
&\overset{(iii)}{\le} \big((1+\bar{\Lambda}_F)s_{j,n-1}\epsilon + \rho\bar{\varphi}\big)T + \sum_{k\in[K]:\,a_k<r_{j,n-1}-\epsilon}\ \sum_{\boldsymbol{c}\in\mathcal{C}_n\setminus\mathcal{C}(\bar{\varphi})}\Delta_k(\boldsymbol{c})\,Y_k(\boldsymbol{c}),
\end{aligned}\tag{26}$$

where in the final inequality we defined $Y_k(\boldsymbol{c}) := \sum_{t>K}\mathbb{I}\{\boldsymbol{c}_t=\boldsymbol{c},\ \rho_{j,t}=a_k,\ N_{k,t}\le m_k(\boldsymbol{c})\}$. In (i) we used the fact that the left end of the left adjacent region, i.e. $\min A_j^-(\boldsymbol{c}_t)$, is exactly $r_{j,n-1}$, because for context $\boldsymbol{c}_t\in\mathcal{C}_n$ the optimal budget $\arg\max_{\rho_j\in[0,\rho]}\mathcal{L}_j(\rho_j,\boldsymbol{c}_t)$ is at the $n$th turning point. In (ii) we used the definition $k' := \max\{k : a_k < r_{j,n-1}\}$, where we recall arms are indexed such that $a_1 < a_2 < \cdots < a_K$; note that in (ii) we separate out the arm $a_{k'}$ because its distance to the optimal per-channel budget may be less than $\epsilon$ since it is the closest arm, and we thereby ensure all other arms, indexed by $k\in[K]$ with $a_k < r_{j,n-1}-\epsilon$, are at least $\epsilon$ away from the optimal per-channel budget. Finally, (iii) follows from the fact that under a context $\boldsymbol{c}\in\mathcal{C}_n\setminus\mathcal{C}(\bar{\varphi})$ we have $\arg\max_{\rho_j\in[0,\rho]}\mathcal{L}_j(\rho_j,\boldsymbol{c}) = r_{j,n}$, so

$$\begin{aligned}
\Delta_{k'}(\boldsymbol{c})
&= \mathcal{L}_j(r_{j,n},\boldsymbol{c}) - \mathcal{L}_j(r_{j,n-1},\boldsymbol{c}) + \mathcal{L}_j(r_{j,n-1},\boldsymbol{c}) - \mathcal{L}_j(a_{k'},\boldsymbol{c})\\
&= \varphi_j^-(\boldsymbol{c})(r_{j,n} - r_{j,n-1}) + \varphi_{j,n-1}(\boldsymbol{c})(r_{j,n-1} - a_{k'})
\ \overset{(iv)}{\le}\ \bar{\varphi}\rho + \varphi_{j,n-1}(\boldsymbol{c})\,\epsilon
\ \overset{(v)}{\le}\ \bar{\varphi}\rho + (1+\bar{\Lambda}_F)s_{j,n-1}\epsilon,
\end{aligned}$$

where in (iv) we used that $\boldsymbol{c}\in\mathcal{C}_n\setminus\mathcal{C}(\bar{\varphi})$ implies $\varphi_j^-(\boldsymbol{c})\le\bar{\varphi}$, that $r_{j,n}\le\rho$ for any $n$, and the fact that $a_{k'}$ lies on the line segment between the points $r_{j,n-2}$ and $r_{j,n-1}$ since $\epsilon < \min_{n'\in[S_j]} r_{j,n'} - r_{j,n'-1}$; in (v) we recall $\varphi_{j,n-1}(\boldsymbol{c}) = (1+\lambda)s_{j,n-1} - (\mu+\gamma\lambda) \le (1+\bar{\Lambda}_F)s_{j,n-1}$, where $\bar{\Lambda}_F$ is defined in Lemma LABEL:lem:dualvarub.
We now bound $\sum_{\boldsymbol{c}\in\mathcal{C}_n\setminus\mathcal{C}(\bar{\varphi})}\Delta_k(\boldsymbol{c})Y_k(\boldsymbol{c})$ in Eq. (26). It is easy to see the following inequality for any sequence of contexts $\boldsymbol{c}^{(1)},\ldots,\boldsymbol{c}^{(\ell)}\in\{\boldsymbol{c}_t\}_{t\in[T]}$ (a slight generalization of an inequality shown in balseiro2019contextual):

$$Y_k(\boldsymbol{c}^{(1)}) + \cdots + Y_k(\boldsymbol{c}^{(\ell)}) \le \max_{\ell'=1,\ldots,\ell} m_k(\boldsymbol{c}^{(\ell')}).\tag{27}$$

This is because

$$\begin{aligned}
\sum_{\ell'\in[\ell]} Y_k(\boldsymbol{c}^{(\ell')})
&= \sum_{t>K}\sum_{\ell'\in[\ell]}\mathbb{I}\Big\{\boldsymbol{c}_t=\boldsymbol{c}^{(\ell')},\ \rho_{j,t}=a_k,\ N_{k,t}\le m_k(\boldsymbol{c}^{(\ell')})\Big\}\\
&\le \sum_{t>K}\sum_{\ell'\in[\ell]}\mathbb{I}\Big\{\boldsymbol{c}_t=\boldsymbol{c}^{(\ell')},\ \rho_{j,t}=a_k,\ N_{k,t}\le\max_{\ell''=1,\ldots,\ell} m_k(\boldsymbol{c}^{(\ell'')})\Big\}\\
&= \sum_{t>K}\mathbb{I}\Big\{\boldsymbol{c}_t\in\{\boldsymbol{c}^{(\ell')}\}_{\ell'\in[\ell]},\ \rho_{j,t}=a_k,\ N_{k,t}\le\max_{\ell'=1,\ldots,\ell} m_k(\boldsymbol{c}^{(\ell')})\Big\}
\ \le\ \max_{\ell'=1,\ldots,\ell} m_k(\boldsymbol{c}^{(\ell')}).
\end{aligned}$$
For simplicity, denote $L := |\mathcal{C}_n\setminus\mathcal{C}(\bar{\varphi})|$, and order the contexts in $\mathcal{C}_n\setminus\mathcal{C}(\bar{\varphi})$ as $\{\boldsymbol{c}^{(\ell)}\}_{\ell\in[L]}$ such that $\Delta_k(\boldsymbol{c}^{(1)}) > \Delta_k(\boldsymbol{c}^{(2)}) > \cdots > \Delta_k(\boldsymbol{c}^{(L)})$, or equivalently $m_k(\boldsymbol{c}^{(1)}) < m_k(\boldsymbol{c}^{(2)}) < \cdots < m_k(\boldsymbol{c}^{(L)})$ according to Eq. (22). Then, multiplying Eq. (27) by $\Delta_k(\boldsymbol{c}^{(\ell)}) - \Delta_k(\boldsymbol{c}^{(\ell+1)})$ (which is strictly positive due to the ordering of contexts) and summing over $\ell = 1,\ldots,L$, we get

$$\sum_{\boldsymbol{c}\in\mathcal{C}_n\setminus\mathcal{C}(\bar{\varphi})}\Delta_k(\boldsymbol{c})Y_k(\boldsymbol{c})
= \sum_{\ell\in[L]}\Delta_k(\boldsymbol{c}^{(\ell)})Y_k(\boldsymbol{c}^{(\ell)})
\le \sum_{\ell\in[L]} m_k(\boldsymbol{c}^{(\ell)})\big(\Delta_k(\boldsymbol{c}^{(\ell)}) - \Delta_k(\boldsymbol{c}^{(\ell+1)})\big)
\overset{(i)}{=} 8\log(T)\sum_{\ell\in[L-1]}\frac{\Delta_k(\boldsymbol{c}^{(\ell)}) - \Delta_k(\boldsymbol{c}^{(\ell+1)})}{\Delta_k^2(\boldsymbol{c}^{(\ell)})}
\overset{(ii)}{\le} 8\log(T)\int_{\Delta_k(\boldsymbol{c}^{(L)})}^{\infty}\frac{dz}{z^2}
= \frac{8\log(T)}{\Delta_k(\boldsymbol{c}^{(L)})}
\overset{(iii)}{=} \frac{8\log(T)}{\min_{\boldsymbol{c}\in\mathcal{C}_n\setminus\mathcal{C}(\bar{\varphi})}\Delta_k(\boldsymbol{c})}.\tag{28}$$

Here (i) follows from the definition of $m_k(\boldsymbol{c})$ in Eq. (22), namely $m_k(\boldsymbol{c}) = \frac{8\log(T)}{\Delta_k^2(\boldsymbol{c})}$; both (ii) and (iii) follow from the ordering of contexts, i.e. $\Delta_k(\boldsymbol{c}^{(1)}) > \Delta_k(\boldsymbol{c}^{(2)}) > \cdots > \Delta_k(\boldsymbol{c}^{(L)})$. Note that for any $\boldsymbol{c}\in\mathcal{C}_n\setminus\mathcal{C}(\bar{\varphi})$ and arm $k$ such that $a_k < r_{j,n-1}$, we have

$$\begin{aligned}
\Delta_k(\boldsymbol{c})
&= \mathcal{L}_j(r_{j,n},\boldsymbol{c}) - \mathcal{L}_j(r_{j,n-1},\boldsymbol{c}) + \mathcal{L}_j(r_{j,n-1},\boldsymbol{c}) - \mathcal{L}_j(a_k,\boldsymbol{c})
\ \ge\ \mathcal{L}_j(r_{j,n-1},\boldsymbol{c}) - \mathcal{L}_j(a_k,\boldsymbol{c})\\
&\overset{(i)}{\ge} \varphi_{j,n-1}(\boldsymbol{c})(r_{j,n-1} - a_k)
\overset{(ii)}{\ge} \big(\varphi_{j,n-1}(\boldsymbol{c}) - \varphi_{j,n}(\boldsymbol{c})\big)(r_{j,n-1} - a_k)
\overset{(iii)}{=} (1+\lambda)(s_{j,n-1} - s_{j,n})(r_{j,n-1} - a_k)
\ \ge\ (s_{j,n-1} - s_{j,n})(r_{j,n-1} - a_k),
\end{aligned}\tag{29}$$

where in (i) we recall the slope $\varphi_{j,n-1}(\boldsymbol{c})$ defined in Lemma LABEL:lem:Vstruct, and (i) follows from the concavity of $\mathcal{L}_j(\rho_j,\boldsymbol{c})$ in its first argument $\rho_j$; in (ii) we used the fact that $\varphi_{j,n}(\boldsymbol{c})\ge 0$, since the optimal budget $\arg\max_{\rho_j\in[0,\rho]}\mathcal{L}_j(\rho_j,\boldsymbol{c})$ is attained at the $n$th turning point, which is the largest turning point whose left slope is nonnegative by Lemma LABEL:lem:Vstruct; (iii) follows from the definition $\varphi_{j,n'}(\boldsymbol{c}) = (1+\lambda)s_{j,n'} - (\mu+\gamma\lambda)$ for any $n'$.
Finally, combining Eqs. (26), (28) and (29), and summing over $n = 1,\ldots,S_j$, we get

$$\begin{aligned}
&\sum_{t>K:\,\boldsymbol{c}_t\notin\mathcal{C}(\bar{\varphi})}\ \sum_{k\in[K]:\,a_k<\min A_j^-(\boldsymbol{c}_t)}\Delta_k(\boldsymbol{c}_t)\,\mathbb{I}\{\rho_{j,t}=a_k,\ N_{k,t}\le m_k(\boldsymbol{c}_t)\}\\
&= \sum_{n\in[S_j]}\ \sum_{t>K:\,\boldsymbol{c}_t\in\mathcal{C}_n\setminus\mathcal{C}(\bar{\varphi})}\ \sum_{k\in[K]:\,a_k<\min A_j^-(\boldsymbol{c}_t)}\Delta_k(\boldsymbol{c}_t)\,\mathbb{I}\{\rho_{j,t}=a_k,\ N_{k,t}\le m_k(\boldsymbol{c}_t)\}\\
&\le \sum_{n\in[S_j]}\big((1+\bar{\Lambda}_F)s_{j,n-1}\epsilon + \rho\bar{\varphi}\big)T + \sum_{n\in[S_j]}\ \sum_{k\in[K]:\,a_k<r_{j,n-1}-\epsilon}\frac{8\log(T)}{(s_{j,n-1}-s_{j,n})(r_{j,n-1}-a_k)}\\
&\overset{(i)}{\le} \sum_{n\in[S_j]}\big((1+\bar{\Lambda}_F)s_{j,n-1}\epsilon + \rho\bar{\varphi}\big)T + \sum_{n\in[S_j]}\sum_{\ell=1}^{K}\frac{8\log(T)}{(s_{j,n-1}-s_{j,n})\,\ell\,\epsilon}\\
&\le \sum_{n\in[S_j]}\big((1+\bar{\Lambda}_F)s_{j,n-1}\epsilon + \rho\bar{\varphi}\big)T + \frac{8\log(T)\log(K)}{\epsilon}\sum_{n\in[S_j]}\frac{1}{s_{j,n-1}-s_{j,n}}
= \tilde{\mathcal{O}}\Big(\epsilon T + \bar{\varphi}T + \frac{1}{\epsilon}\Big).
\end{aligned}\tag{30}$$

Note that (i) follows because for all $a_k < r_{j,n-1}-\epsilon$, the distances of the $a_k$ from $r_{j,n-1}$ are at least $\epsilon, 2\epsilon, 3\epsilon,\ldots$. In the last equality, we hide all logarithmic factors using the notation $\tilde{\mathcal{O}}$, and note that the constants $\bar{\Lambda}_F$, $(s_{j,n})_{n\in[S_j]}$, and $S_j$ are all absolute constants that depend only on the support $F_j$ and the corresponding sampling distribution $\boldsymbol{p}_j$ of value-cost pairs; see the definitions of these absolute constants in Lemmas LABEL:lem:Vstruct and LABEL:lem:dualvarub.
Steps 3 and 4: $a_k\in A_j^+(\boldsymbol{c}_t)$ or $a_k > \max A_j^+(\boldsymbol{c}_t)$. The cases where arm $a_k\in A_j^+(\boldsymbol{c}_t)$ and $a_k > \max A_j^+(\boldsymbol{c}_t)$ are symmetric to $a_k\in A_j^-(\boldsymbol{c}_t)$ and $a_k < \min A_j^-(\boldsymbol{c}_t)$, respectively, and we omit them.
Therefore, combining Eqs. (25) and (30), we can conclude that

$$X_1 \le \tilde{\mathcal{O}}\Big(\epsilon T + \bar{\varphi}T + \frac{1}{\epsilon}\Big).\tag{31}$$

7.5.2 Bounding $X_2$.
We first rewrite $X_2$ as follows:

$$\begin{aligned}
X_2 &= \sum_{t>K:\,\boldsymbol{c}_t\in\mathcal{C}(\bar{\varphi})}\ \sum_{k\in[K]}\Delta_k(\boldsymbol{c}_t)\,\mathbb{I}\{\rho_{j,t}=a_k,\ N_{k,t}\le m_k(\boldsymbol{c}_t)\}\\
&= \sum_{t>K}\ \sum_{n\in[S_j]}\ \sum_{k\in[K]}\ \sum_{\boldsymbol{c}\in\mathcal{C}_n\cap\mathcal{C}(\bar{\varphi})}\Delta_k(\boldsymbol{c})\,\mathbb{I}\{\boldsymbol{c}_t=\boldsymbol{c},\ \rho_{j,t}=a_k,\ N_{k,t}\le m_k(\boldsymbol{c})\}\\
&\overset{(i)}{=} \sum_{n\in[S_j]}\ \sum_{k\in[K]}\ \sum_{\boldsymbol{c}\in\mathcal{C}_n\cap\mathcal{C}(\bar{\varphi})}\Delta_k(\boldsymbol{c})\,Y_k(\boldsymbol{c})\\
&\overset{(ii)}{=} \sum_{n\in[S_j]}\ \sum_{\boldsymbol{c}\in\mathcal{C}_n\cap\mathcal{C}(\bar{\varphi})}\ \sum_{k\in\{k_n^-,k_n^+\}}\Delta_k(\boldsymbol{c})Y_k(\boldsymbol{c}) + \sum_{n\in[S_j]}\ \sum_{\boldsymbol{c}\in\mathcal{C}_n\cap\mathcal{C}(\bar{\varphi})}\ \sum_{k\in[K]\setminus\{k_n^-,k_n^+\}}\Delta_k(\boldsymbol{c})Y_k(\boldsymbol{c})\\
&\overset{(iii)}{\le} T\epsilon(1+\bar{\Lambda}_F)\sum_{n\in[S_j]}\big(s_{j,n}+s_{j,n+1}\big) + \sum_{n\in[S_j]}\ \sum_{\boldsymbol{c}\in\mathcal{C}_n\cap\mathcal{C}(\bar{\varphi})}\ \sum_{k\in[K]\setminus\{k_n^-,k_n^+\}}\Delta_k(\boldsymbol{c})Y_k(\boldsymbol{c}),
\end{aligned}\tag{32}$$

where in (i) we define $Y_k(\boldsymbol{c}) := \sum_{t>K}\mathbb{I}\{\boldsymbol{c}_t=\boldsymbol{c},\ \rho_{j,t}=a_k,\ N_{k,t}\le m_k(\boldsymbol{c})\}$; in (ii) we separate out two arms $k_n^-$ and $k_n^+$, defined as follows: recall that for a context $\boldsymbol{c}\in\mathcal{C}_n\cap\mathcal{C}(\bar{\varphi})$, the optimal budget $\arg\max_{\rho_j\in[0,\rho]}\mathcal{L}_j(\rho_j,\boldsymbol{c}) = r_{j,n}$ is attained at the $n$th turning point per the definition of $\mathcal{C}_n$ in Eq. (22), and we thereby define $k_n^- := \max\{k\in[K] : a_k < r_{j,n}\}$ to be the arm closest to and below $r_{j,n}$, and $k_n^+ := \min\{k\in[K] : a_k > r_{j,n}\}$ to be the arm closest to and above $r_{j,n}$; in (iii), for small enough $\epsilon < \min_{n'\in[S_j]} r_{j,n'} - r_{j,n'-1}$, we know that $a_{k_n^-}$ lies on the line segment between $r_{j,n-1}$ and $r_{j,n}$, so $\Delta_{k_n^-}(\boldsymbol{c}) = \varphi_j^-(\boldsymbol{c})(r_{j,n} - a_{k_n^-}) \le \varphi_j^-(\boldsymbol{c})\,\epsilon \le (1+\bar{\Lambda}_F)s_{j,n}\epsilon$, where the final inequality follows from $\varphi_j^-(\boldsymbol{c}) = \varphi_{j,n}(\boldsymbol{c}) = (1+\lambda)s_{j,n} - (\mu+\gamma\lambda) \le (1+\lambda)s_{j,n} \le (1+\bar{\Lambda}_F)s_{j,n}$, with $\bar{\Lambda}_F$ defined in Lemma LABEL:lem:dualvarub. A similar bound holds for $\Delta_{k_n^+}(\boldsymbol{c})$.
Then, following the same logic as Eqs. (27), (28), (29) in Section 7.5.1 where we bound 𝑋 1 , we can bound ∑ \cont ∈ \contset 𝑛 ∩ \contset ( \slope ¯ ) Δ 𝑘 ( \cont ) 𝑌 𝑘 ( \cont ) as followed for any arm 𝑘 ∈ [ 𝐾 ] / { 𝑘 𝑛 − , 𝑘 𝑛 + } , i.e. arms who are at least \discerr away from the optimal per-channel budget w.r.t. \cont :
\[
\sum_{\cont \in \contset_n \cap \contset(\bar{\slope})} \Delta_k(\cont)\, Y_k(\cont) \;\le\; \frac{8\log(T)}{\min_{\cont \in \contset_n \cap \contset(\bar{\slope})} \Delta_k(\cont)}. \tag{33}
\]
Now, the set $[K]\setminus\{k_n^-, k_n^+\}$ in Eq. (32) can be further split into two subsets, namely $\{k\in[K]: a_k < r_{j,n} - \discerr\}$ and $\{k\in[K]: a_k > r_{j,n} + \discerr\}$, due to the definitions $k_n^- := \max\{k\in[K]: a_k < r_{j,n}\}$ and $k_n^+ := \min\{k\in[K]: a_k > r_{j,n}\}$. Therefore, for any $k$ s.t. $a_k < r_{j,n} - \discerr$ and any $\cont \in \contset_n \cap \contset(\bar{\slope})$,
\[
\Delta_k(\cont) \;=\; \mathcal{L}_j(r_{j,n}, \cont) - \mathcal{L}_j(a_k, \cont) \;\ge\; \slope_j^-(\cont)\,(r_{j,n} - a_k) \;\ge\; \bar{\slope}\,(r_{j,n} - a_k),
\]
where the final inequality follows from the definition of $\contset(\bar{\slope})$ in Eq. (22), which gives $\slope_j^-(\cont) \ge \bar{\slope}$ for $\cont \in \contset(\bar{\slope})$. Hence, combining this with Eq. (33) we have
\[
\sum_{k\in[K]:\,a_k < r_{j,n}-\discerr}\;\; \sum_{\cont \in \contset_n \cap \contset(\bar{\slope})} \Delta_k(\cont)\, Y_k(\cont)
\;\le\; \sum_{k\in[K]:\,a_k < r_{j,n}-\discerr} \frac{8\log(T)}{\bar{\slope}\,(r_{j,n} - a_k)}
\;\overset{(i)}{\le}\; \sum_{\ell=1}^{K} \frac{8\log(T)}{\bar{\slope}\,\ell\,\discerr}
\;\le\; \frac{8\log(T)\log(K)}{\bar{\slope}\,\discerr}, \tag{34}
\]
where (i) follows because for all $a_k < r_{j,n} - \discerr$, the distances of the $a_k$'s from $r_{j,n}$ are at least $\discerr, 2\discerr, 3\discerr, \dots$. Symmetrically, we can show an identical bound for the set $\{k\in[K]: a_k > r_{j,n} + \discerr\}$. Hence, combining Eqs. (32) and (34) we can conclude
\[
X_2 \;\le\; \tilde{\mathcal{O}}\Big(\discerr\,T + \frac{1}{\discerr\,\bar{\slope}}\Big). \tag{35}
\]
Here, similar to our bound in Eq. (30) for $X_1$, we hide all logarithmic factors using the notation $\tilde{\mathcal{O}}$, and note that the constants $\dualub_F$, $(s_{j,n})_{n\in[S_j]}$, and $S_j$ are absolute constants that depend only on the support $F_j$ and the corresponding sampling distribution $\boldsymbol{p}_j$ for value-cost pairs; see the definitions of these constants in Lemmas LABEL:lem:Vstruct and LABEL:lem:dualvarub.
7.5.3 Bounding 𝑋 3 .
We first define
\[
\bar{\mathcal{L}} \;:=\; (1+\gamma)\,\rho\,\dualub_F + (1+\dualub_F)\,\bar{\totv}, \tag{36}
\]
where $\dualub_F$ is specified in Lemma LABEL:lem:dualvarub. Recalling the definition $\Delta_k(\cont) = \max_{\rho_j\in[0,\rho]} \mathcal{L}_j(\rho_j,\cont) - \mathcal{L}_j(a_k,\cont)$ in Eq. (22), and $-(1+\gamma)\,\rho\,\dualub_F \le \mathcal{L}_j(\rho_j,\cont) \le (1+\dualub_F)\,\bar{\totv}$ for any $\rho_j\in[0,\rho]$ and context $\cont$ (see Lemma LABEL:lem:dualvarub), it is easy to see that
\[
\Delta_k(\cont) \;\le\; \bar{\mathcal{L}} \qquad \forall\, k\in[K],\; \forall\, \cont. \tag{37}
\]
Then we bound $X_3$ as follows:
\[
\begin{aligned}
X_3 \;&=\; \sum_{k\in[K]} \sum_{t>K} \mathbb{E}\Big[\Delta_k(\cont_t)\, \mathbb{I}\{\rho_{j,t} = a_k,\; N_{k,t} > m_k(\cont_t)\}\Big] \\
&\overset{(i)}{\le}\; \bar{\mathcal{L}} \cdot \sum_{k\in[K]} \sum_{t>K} \mathbb{P}\big(\rho_{j,t} = a_k,\; N_{k,t} > m_k(\cont_t)\big) \\
&\overset{(ii)}{\le}\; \bar{\mathcal{L}} \cdot \sum_{k\in[K]} \sum_{t>K} \mathbb{P}\bigg( \hat{\totv}_{j,t}(a_k) - \frac{\lambda_t\gamma + \mu_t}{1+\lambda_t}\, a_k + \ucb_{j,t}(a_k) \;\ge\; \hat{\totv}_{j,t}(\rho_j^*(t)) - \frac{\lambda_t\gamma + \mu_t}{1+\lambda_t}\, \rho_j^*(t) + \ucb_{j,t}(\rho_j^*(t)),\; N_{k,t} > m_k(\cont_t) \bigg), \tag{38}
\end{aligned}
\]
where (i) follows from Eq. (37); in (ii), recall that we choose arm $\rho_{j,t} = a_k$ because the estimated UCB reward of arm $a_k$ is greater than that of any other arm, including $\rho_j^*(t)$, according to UCB-SGD (Algorithm LABEL:alg:omducb); mathematically, $\hat{\totv}_{j,t}(a_k) - \frac{\lambda_t\gamma + \mu_t}{1+\lambda_t}\, a_k + \ucb_{j,t}(a_k) \ge \hat{\totv}_{j,t}(\rho_j^*(t)) - \frac{\lambda_t\gamma + \mu_t}{1+\lambda_t}\, \rho_j^*(t) + \ucb_{j,t}(\rho_j^*(t))$. Here we also used the fact that $\rho_j^*(t)$ lies in the arm set $\budgset(\discerr)$ for all $t$ (see the Remark).
Now let $\hat{\ucbtot}_n(a_k)$ denote the average conversion of arm $k$ over its first $n$ pulls, i.e.,
\[
\hat{\ucbtot}_n(a_k) \;:=\; \hat{\totv}_{j,\tau}(a_k) \qquad \text{for } \tau = \min\{t\in[T]: N_{k,t} = n\}, \tag{39}
\]
where we recall that $\hat{\totv}_{j,\tau}(a_k)$ is the estimated conversion for arm $a_k$ in channel $j$ during period $\tau$, as defined in Algorithm LABEL:alg:omducb. In other words, $\tau$ is the period during which arm $a_k$ is pulled for the $n$th time, so $\hat{\ucbtot}_n(a_k) = \hat{\totv}_{j,\tau}(a_k)$.
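A tiny bookkeeping sketch (with a hypothetical pull sequence and estimates, not from the paper) of the re-indexing in Eq. (39): the quantity $\hat{\ucbtot}_n(a_k)$ is simply the estimate recorded at the round $\tau$ in which arm $a_k$ receives its $n$th pull:

```python
# Map a per-round estimate sequence to a per-pull-count index, as in Eq. (39).
pulls = ["a1", "a2", "a1", "a1", "a2"]   # hypothetical arm choices by round
est   = [0.3, 0.5, 0.35, 0.4, 0.55]      # hypothetical estimates recorded each round

def estimate_at_nth_pull(arm, n):
    count = 0
    for tau, (a, e) in enumerate(zip(pulls, est), start=1):
        count += (a == arm)
        if a == arm and count == n:
            return tau, e                # (round tau, estimate at n-th pull)
    raise ValueError("arm not pulled n times")

print(estimate_at_nth_pull("a1", 3))     # -> (4, 0.4): round 4 is a1's 3rd pull
```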
Hence, we continue from Eq. (38) as follows:
\[
\begin{aligned}
&\mathbb{P}\bigg( \hat{\totv}_{j,t}(a_k) - \frac{\lambda_t\gamma + \mu_t}{1+\lambda_t}\, a_k + \ucb_{j,t}(a_k) \ge \hat{\totv}_{j,t}(\rho_j^*(t)) - \frac{\lambda_t\gamma + \mu_t}{1+\lambda_t}\, \rho_j^*(t) + \ucb_{j,t}(\rho_j^*(t)),\; N_{k,t} > m_k(\cont_t) \bigg) \\
&\quad\le\; \mathbb{P}\bigg( \max_{n:\, m_k(\cont_t) < n \le t} \Big\{ \hat{\ucbtot}_n(a_k) + \ucb_n(a_k) - \frac{\lambda_t\gamma + \mu_t}{1+\lambda_t}\, a_k \Big\} \;\ge\; \min_{n':\, 1 \le n' \le t} \Big\{ \hat{\ucbtot}_{n'}(\rho_j^*(t)) + \ucb_{n'}(\rho_j^*(t)) - \frac{\lambda_t\gamma + \mu_t}{1+\lambda_t}\, \rho_j^*(t) \Big\} \bigg) \\
&\quad\le\; \sum_{n=\lceil m_k(\cont_t)\rceil+1}^{t}\;\; \sum_{n'=1}^{t} \mathbb{P}\bigg( \hat{\ucbtot}_n(a_k) + \ucb_n(a_k) - \frac{\lambda_t\gamma + \mu_t}{1+\lambda_t}\, a_k \;\ge\; \hat{\ucbtot}_{n'}(\rho_j^*(t)) + \ucb_{n'}(\rho_j^*(t)) - \frac{\lambda_t\gamma + \mu_t}{1+\lambda_t}\, \rho_j^*(t) \bigg). \tag{40}
\end{aligned}
\]
Now, when the event $\big\{ \hat{\ucbtot}_n(a_k) + \ucb_n(a_k) - \frac{\lambda_t\gamma + \mu_t}{1+\lambda_t}\, a_k \ge \hat{\ucbtot}_{n'}(\rho_j^*(t)) + \ucb_{n'}(\rho_j^*(t)) - \frac{\lambda_t\gamma + \mu_t}{1+\lambda_t}\, \rho_j^*(t) \big\}$ occurs, it is easy to see that one of the following events must also occur:
\[
\begin{aligned}
\mathcal{G}_{1,n} \;&:=\; \big\{ \hat{\ucbtot}_n(a_k) \ge \totv(a_k) + \ucb_n(a_k) \big\} \qquad \text{for } n \text{ s.t. } m_k(\cont_t) < n \le t, \\
\mathcal{G}_{2,n'} \;&:=\; \big\{ \hat{\ucbtot}_{n'}(\rho_j^*(t)) \le \totv(\rho_j^*(t)) - \ucb_{n'}(\rho_j^*(t)) \big\} \qquad \text{for } n' \text{ s.t. } 1 \le n' \le t, \\
\mathcal{G}_3 \;&:=\; \Big\{ \totv_j(\rho_j^*(t)) - \frac{\lambda_t\gamma + \mu_t}{1+\lambda_t}\, \rho_j^*(t) \;<\; \totv_j(a_k) - \frac{\lambda_t\gamma + \mu_t}{1+\lambda_t}\, a_k + 2\,\ucb_n(a_k) \Big\}. \tag{41}
\end{aligned}
\]
Note that for $n > m_k(\cont_t)$, we have
\[
\ucb_n(a_k) \;=\; \sqrt{\frac{2\log(T)}{n}} \;<\; \sqrt{\frac{2\log(T)}{m_k(\cont_t)}} \;=\; \frac{\Delta_k(\cont_t)}{2},
\]
since we defined $m_k(\cont) := \frac{8\log(T)}{\Delta_k^2(\cont)}$ in Eq. (22). Therefore,
\[
\underbrace{\totv_j(a_k) - \frac{\lambda_t\gamma + \mu_t}{1+\lambda_t}\, a_k}_{\mathcal{L}(a_k,\,\cont_t)} + 2\,\ucb_n(a_k)
\;<\; \mathcal{L}(a_k,\cont_t) + \Delta_k(\cont_t)
\;\overset{(i)}{=}\; \underbrace{\totv_j(\rho_j^*(t)) - \frac{\lambda_t\gamma + \mu_t}{1+\lambda_t}\, \rho_j^*(t)}_{\mathcal{L}(\rho_j^*(t),\,\cont_t)}
\;=\; \max_{a\in\budgset(\discerr)} \mathcal{L}(a,\cont_t),
\]
where (i) follows from the definition $\Delta_k(\cont) = \max_{a\in\budgset(\discerr)} \mathcal{L}(a,\cont) - \mathcal{L}(a_k,\cont)$ in Eq. (22) for any context $\cont$. This implies that event $\mathcal{G}_3$ in Eq. (41) cannot hold for $n > m_k(\cont_t)$.
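The following quick numeric check (hypothetical $T$ and gap $\Delta$, chosen only for illustration) confirms the step above: once $n > m_k = 8\log(T)/\Delta_k^2$, the confidence radius $\sqrt{2\log(T)/n}$ drops below $\Delta_k/2$, so $\mathcal{G}_3$ becomes impossible:

```python
# Once an arm has been pulled n > m_k = 8*log(T)/Delta^2 times, its radius
# ucb_n = sqrt(2*log(T)/n) falls below Delta/2, so adding 2*ucb_n to L(a_k)
# can no longer reach L(rho*), ruling out event G_3 in Eq. (41).
import math

T, Delta = 100_000, 0.1           # hypothetical horizon and suboptimality gap
m_k = 8 * math.log(T) / Delta**2  # pull threshold from Eq. (22)

n = math.ceil(m_k) + 1            # any n > m_k
ucb_n = math.sqrt(2 * math.log(T) / n)

print(f"m_k = {m_k:.1f}, ucb_n = {ucb_n:.4f}, Delta/2 = {Delta / 2}")
assert 2 * ucb_n < Delta          # hence G_3 cannot occur for n > m_k
```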
Therefore,
\[
\mathbb{P}\bigg( \hat{\ucbtot}_n(a_k) + \ucb_n(a_k) - \frac{\lambda_t\gamma + \mu_t}{1+\lambda_t}\, a_k \;\ge\; \hat{\ucbtot}_{n'}(\rho_j^*(t)) + \ucb_{n'}(\rho_j^*(t)) - \frac{\lambda_t\gamma + \mu_t}{1+\lambda_t}\, \rho_j^*(t) \bigg) \;\le\; \mathbb{P}\big( \mathcal{G}_{1,n} \cup \mathcal{G}_{2,n'} \big). \tag{42}
\]
From the standard UCB analysis and the Azuma–Hoeffding inequality, we have $\mathbb{P}(\mathcal{G}_{1,n}) \le \bar{\totv}\, T^{-4}$ and $\mathbb{P}(\mathcal{G}_{2,n'}) \le \bar{\totv}\, T^{-4}$. Hence, combining Eqs. (38), (40), and (42) we can conclude
\[
\begin{aligned}
X_3 \;&\le\; \sum_{k\in[K]} \sum_{t>K}\;\; \sum_{n=\lceil m_k(\cont_t)\rceil+1}^{t}\;\; \sum_{n'=1}^{t} \big( \mathbb{P}(\mathcal{G}_{1,n}) + \mathbb{P}(\mathcal{G}_{2,n'}) \big) \\
&\le\; \sum_{k\in[K]} \sum_{t>K}\;\; \sum_{n=\lceil m_k(\cont_t)\rceil+1}^{t}\;\; \sum_{n'=1}^{t} \frac{2\bar{\totv}}{T^4} \\
&\le\; \frac{2K\bar{\totv}}{T} \;=\; \mathcal{O}\Big(\frac{1}{\discerr\, T}\Big). \tag{43}
\end{aligned}
\]
∎
7.6 Proof of Theorem LABEL:thm:regret
Starting from Proposition LABEL:prop:ubbenchlag, we get
\[
\begin{aligned}
T\cdot\text{GL-OPT} - \mathbb{E}\Big[\sum_{t\in[T]}\sum_{j\in[M]} \totv_j(\rho_{j,t})\Big]
\;&\le\; M\bar{\totv}\,(T-\tau_A) + \sum_{j\in[M]} \mathbb{E}\Big[\sum_{t\in[\tau_A]} \mathcal{L}_j(\rho_j^*(t), \cont_t) - \mathcal{L}_j(\rho_{j,t}, \cont_t)\Big] + \mathbb{E}\Big[\sum_{t\in[\tau_A]} \big(\lambda_t g_{1,t} + \mu_t g_{2,t}\big)\Big] \\
&\overset{(i)}{\le}\; M\bar{\totv}\,(T-\tau_A) + \mathcal{O}\Big(\bar{\slope}\,T + \discerr\,T + \frac{1}{\bar{\slope}\,\discerr}\Big) + \mathcal{O}\Big(\eta T + \frac{1}{\eta}\Big), \tag{44}
\end{aligned}
\]
where in (i) we applied Lemmas LABEL:lem:ucbregret and LABEL:lem:boundcs. Taking $\eta = 1/\sqrt{T}$ and $\discerr = \bar{\slope} = T^{-1/3}$ (i.e., $K = \mathcal{O}(T^{1/3})$) yields $T\cdot\text{GL-OPT} - \mathbb{E}\big[\sum_{t\in[T]}\sum_{j\in[M]} \totv_j(\rho_{j,t})\big] \le \mathcal{O}(T^{2/3})$. According to Lemma LABEL:lem:Vstruct, $\totv_j(\rho_j)$ is concave for all $j\in[M]$, so
\[
\begin{aligned}
\mathcal{O}(T^{-1/3}) \;&\ge\; \text{GL-OPT} - \frac{1}{T}\sum_{t\in[T]} \mathbb{E}\Big[\sum_{j\in[M]} \totv_j(\rho_{j,t})\Big] \\
&\ge\; \text{GL-OPT} - \mathbb{E}\Big[\sum_{j\in[M]} \totv_j\Big(\frac{1}{T}\sum_{t\in[T]} \rho_{j,t}\Big)\Big] \\
&\ge\; \text{GL-OPT} - \mathbb{E}\Big[\sum_{j\in[M]} \totv_j(\bar{\rho}_{j,T})\Big], \tag{45}
\end{aligned}
\]
where the second inequality applies Jensen's inequality to the concave $\totv_j$, and in the final step we used the definition of $\bar{\boldsymbol{\rho}}_T$ as given in Algorithm LABEL:alg:omducb.
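Before turning to constraint satisfaction, a back-of-the-envelope sketch (hypothetical $T$; the exponents are the ones chosen in the proof) shows that with $\eta = 1/\sqrt{T}$ and $\discerr = \bar{\slope} = T^{-1/3}$, every discretization and dual-update term in Eq. (44) is $\mathcal{O}(T^{2/3})$ up to logarithmic factors:

```python
# Evaluate each term of the regret decomposition at the parameter choice in
# the proof; all of them stay below T^(2/3) for this hypothetical horizon.
T = 10**6
eta, eps, slope_bar = T**-0.5, T**(-1 / 3), T**(-1 / 3)

terms = {
    "eps * T": eps * T,
    "slope_bar * T": slope_bar * T,
    "1 / (slope_bar * eps)": 1 / (slope_bar * eps),
    "eta * T": eta * T,
    "1 / eta": 1 / eta,
}
for name, val in terms.items():
    print(f"{name:>22s} = {val:10.1f}  (T^(2/3) = {T**(2/3):.1f})")
```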
Regarding ROI constraint satisfaction, consider
\[
\begin{aligned}
0 \;&\overset{(i)}{\le}\; \frac{1}{T}\sum_{t\in[T]} \mathbb{E}[g_{1,t}] \\
&=\; \frac{1}{T}\sum_{t\in[T]}\sum_{j\in[M]} \mathbb{E}\big[\totv_j(\rho_{j,t}; \boldsymbol{z}_{j,t}) - \gamma\rho_{j,t}\big] \\
&=\; \frac{1}{T}\sum_{t\in[T]}\sum_{j\in[M]} \mathbb{E}\big[\totv_j(\rho_{j,t}) - \gamma\rho_{j,t}\big] \\
&\overset{(ii)}{\le}\; \sum_{j\in[M]} \mathbb{E}\Big[\totv_j\Big(\frac{1}{T}\sum_{t\in[T]}\rho_{j,t}\Big) - \gamma\cdot\frac{1}{T}\sum_{t\in[T]}\rho_{j,t}\Big] \\
&=\; \sum_{j\in[M]} \mathbb{E}\big[\totv_j(\bar{\rho}_{j,T}) - \gamma\bar{\rho}_{j,T}\big], \tag{46}
\end{aligned}
\]
where (i) follows from Lemma 7.2, and in (ii) we again applied concavity of $\totv_j(\rho_j)$ via Jensen's inequality. We omit the analysis for the budget constraint as it is similar to the above. ∎
7.7 Additional Results for Section LABEL:sec:online

Proposition 7.1
Assume Assumption LABEL:ass:feas holds, and recall that $\boldsymbol{z}_j = (\boldsymbol{v}_j, \boldsymbol{d}_j) \in F_j$ is any realization of values and costs for channel $j\in[M]$. Then, for any channel $j\in[M]$, we have $\min_{\boldsymbol{z}_j\in F_j} \frac{v_{j,1}}{d_{j,1}} > \gamma$, where we recall the ordering $\frac{v_{j,1}}{d_{j,1}} > \frac{v_{j,2}}{d_{j,2}} > \dots > \frac{v_{j,m_j}}{d_{j,m_j}}$ for any element $\boldsymbol{z}_j = (\boldsymbol{v}_j, \boldsymbol{d}_j) \in F_j$ (see Section LABEL:sec:online). Further, there exists some $\tilde{\rho} \in (0,\rho)$ s.t. for any per-channel budget $\rho_j \le \tilde{\rho}$, we have $\totv_j(\rho_j; \boldsymbol{z}_j) = \frac{v_{j,1}}{d_{j,1}}\rho_j > \gamma\rho_j$ for any $j\in[M]$.
Proof. Under Assumption LABEL:ass:feas, it is easy to see that for any realization of value-cost pairs $\boldsymbol{z}_j = (\boldsymbol{v}_j, \boldsymbol{d}_j)$ there always exists an auction $n\in[m_j]$ whose value-to-cost ratio is at least $\gamma$, i.e., $v_{j,n} \ge \gamma\, d_{j,n}$. Hence we know that $\frac{v_{j,1}}{d_{j,1}} \ge \frac{v_{j,n}}{d_{j,n}} \ge \gamma$. Now, in Eq. (19) within the proof of Lemma LABEL:lem:Vstruct, we showed
\[
\totv_j(\rho_j; \boldsymbol{z}_j) \;=\; \boldsymbol{v}_j^\top \boldsymbol{x}_j^*(\rho_j; \boldsymbol{z}_j) \;=\; \sum_{n\in[m_j]} \Big( \frac{v_{j,n}}{d_{j,n}}\,\rho_j + b_{j,n} \Big)\, \mathbb{I}\{ d_{j,0} + \dots + d_{j,n-1} \le \rho_j \le d_{j,0} + \dots + d_{j,n} \},
\]
where $d_{j,0} = v_{j,0} = b_{j,1} = 0$. This implies that for any $\rho_j < d_{j,1}$, we have $\totv_j(\rho_j; \boldsymbol{z}_j) = \frac{v_{j,1}}{d_{j,1}}\rho_j > \gamma\rho_j$. Therefore, we can take $\tilde{\rho} = \min_{j\in[M]} \min_{\boldsymbol{z}_j\in F_j} d_{j,1}$, which ensures that for any $\rho_j \le \tilde{\rho}$ and realization $\boldsymbol{z}_j \in F_j$ we have $\totv_j(\rho_j; \boldsymbol{z}_j) = \frac{v_{j,1}}{d_{j,1}}\rho_j > \gamma\rho_j$ for any channel $j\in[M]$. ∎
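To make the structure used in this proof concrete, the following sketch (our own notation and numbers, not code from the paper) evaluates the per-channel conversion function as the greedy solution of the LP relaxation of the 0-1 knapsack from Eq. (19); for $\rho_j < d_{j,1}$ it returns exactly $(v_{j,1}/d_{j,1})\,\rho_j$:

```python
# Greedy fractional-knapsack evaluation of the per-channel conversion V_j(rho; z):
# sort auctions by value-to-cost ratio and fill the budget in that order.
def conversion(rho, values, costs):
    pairs = sorted(zip(values, costs), key=lambda vc: vc[0] / vc[1], reverse=True)
    total, remaining = 0.0, rho
    for v, d in pairs:
        take = min(1.0, remaining / d)   # fraction of this auction procured
        total += v * take
        remaining -= d * take
        if remaining <= 0:
            break
    return total

values, costs = [5.0, 3.0, 1.0], [2.0, 2.0, 2.0]   # hypothetical (v, d) pairs
gamma, rho_small = 1.0, 1.0                        # rho below d_{j,1} = 2.0
print(conversion(rho_small, values, costs))        # (5/2) * 1.0 = 2.5 > gamma * rho
```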
Lemma 7.2 (Constraint satisfaction)
Assume Assumption LABEL:ass:feas holds, and consider $\safe = \safebudg = \frac{1}{\log(T)}$ in Algorithm LABEL:alg:omducb. Then, for large enough $T$ we have
\[
\frac{1}{T}\sum_{t\in[T]} g_{1,t} \;\ge\; 0 \qquad \text{and} \qquad \frac{1}{T}\sum_{t\in[T]}\sum_{j\in[M]} \rho_{j,t} \;\le\; \rho,
\]
where we recall $g_{1,t} = \sum_{j\in[M]} \big( \totv_j(\rho_{j,t}; \boldsymbol{z}_{j,t}) - \gamma\rho_{j,t} \big)$.
Proof. Recall $\tau_A \in [T]$ defined in step 10 of Algorithm LABEL:alg:omducb.
If $\tau_A = T$, then we know that Algorithm LABEL:alg:omducb never exits the while loop, and therefore $S_{1,t} - \gamma M\rho + \safe\,\safebudg\,(T-t) \ge 0$ for $t = T$, or equivalently $S_{1,T} \ge \gamma M\rho > 0$. Since we recall $S_{1,T} = \sum_{t\in[T-1]} g_{1,t}$, we can conclude that $\sum_{t\in[T]} g_{1,t} = S_{1,T} + g_{1,T} \ge \gamma M\rho + g_{1,T} \ge 0$, since $g_{1,T} \ge -\gamma M\rho$. Similarly, we also have $S_{2,t} + M\rho + M\safebudg\,(T-t) \le \rho T$ for $t = T$, or equivalently $S_{2,T} \le \rho T - M\rho$, where we used the fact that $\safebudg = 1/\log(T) < \rho$ for large enough $T$ and $M \ge 2$. Hence, recalling $S_{2,T} = \sum_{t\in[T-1]}\sum_{j\in[M]} \rho_{j,t}$, we can conclude that $\sum_{t\in[T]}\sum_{j\in[M]} \rho_{j,t} = S_{2,T} + \sum_{j\in[M]} \rho_{j,T} \le \rho T - M\rho + \sum_{j\in[M]} \rho_{j,T} \le \rho T$, since $\sum_{j\in[M]} \rho_{j,T} \le M\rho$.
If $\tau_A < T$, then we know that at the "stopping time" $\tau_A$, the while loop in Algorithm LABEL:alg:omducb has not yet been exited, so we have
\[
S_{1,\tau_A} - \gamma M\rho + \safe\,\safebudg\,(T-\tau_A) \;\ge\; 0 \qquad \text{and} \qquad S_{2,\tau_A} + M\rho + M\safebudg\,(T-\tau_A) \;\le\; \rho T. \tag{47}
\]
Hence,
\[
\begin{aligned}
\sum_{t\in[T]} g_{1,t} \;&=\; \sum_{t\in[\tau_A-1]} g_{1,t} + g_{1,\tau_A} + \sum_{t=\tau_A+1}^{T} g_{1,t} \\
&\overset{(i)}{\ge}\; \gamma M\rho - \safe\,\safebudg\,(T-\tau_A) + g_{1,\tau_A} + \sum_{t=\tau_A+1}^{T} g_{1,t} \\
&\ge\; \gamma M\rho - \safe\,\safebudg\,(T-\tau_A) - \gamma M\rho + \sum_{t=\tau_A+1}^{T} g_{1,t} \\
&\overset{(ii)}{=}\; -\safe\,\safebudg\,(T-\tau_A) + \sum_{t=\tau_A+1}^{T} \sum_{j\in[M]} \big( \totv_j(\safebudg;\, \boldsymbol{z}_{j,t}) - \gamma\,\safebudg \big) \\
&\overset{(iii)}{\ge}\; -\safe\,\safebudg\,(T-\tau_A) + \sum_{t=\tau_A+1}^{T} \sum_{j\in[M]} \Big( \safebudg \cdot \min_{\boldsymbol{z}_j\in F_j} \frac{v_{j,1}}{d_{j,1}} - \gamma\,\safebudg \Big) \\
&=\; -\safe\,\safebudg\,(T-\tau_A) + (T-\tau_A)\, M \Big( \safebudg \cdot \min_{\boldsymbol{z}_j\in F_j} \frac{v_{j,1}}{d_{j,1}} - \gamma\,\safebudg \Big) \\
&\overset{(iv)}{\ge}\; 0, \tag{48}
\end{aligned}
\]
where (i) follows from $S_{1,\tau_A} = \sum_{t\in[\tau_A-1]} g_{1,t}$ and Eq. (47); (ii) follows from Algorithm LABEL:alg:omducb, where we set $\rho_{j,t} = \safebudg$ for all $j\in[M]$ and $t = \tau_A+1, \dots, T$, and from the bound $g_{1,\tau_A} \ge -\gamma M\rho$ used in the preceding inequality; for (iii), assuming the $j$th channel's realized value-cost pair $\boldsymbol{z}_{j,t}$ is the element $\boldsymbol{z}_j \in F_j$, Proposition 7.1 says $\totv_j(\safebudg;\, \boldsymbol{z}_{j,t}) \ge \frac{v_{j,1}}{d_{j,1}}\safebudg$ since $\safebudg = \frac{1}{\log(T)} < \tilde{\rho}$ for large enough $T$, and hence $\totv_j(\safebudg;\, \boldsymbol{z}_{j,t}) \ge \min_{\boldsymbol{z}_j\in F_j} \frac{v_{j,1}}{d_{j,1}}\safebudg$; (iv) follows from the fact that $\min_{\boldsymbol{z}_j\in F_j} \frac{v_{j,1}}{d_{j,1}} > \gamma$ according to Proposition 7.1, so $M \min_{\boldsymbol{z}_j\in F_j} \frac{v_{j,1}}{d_{j,1}} \ge M\gamma + \safe$ since $\safe = \frac{1}{\log(T)} \le M \min_{\boldsymbol{z}_j\in F_j} \frac{v_{j,1}}{d_{j,1}} - M\gamma$ for large enough $T$.
Similarly, we have
\[
\begin{aligned}
\sum_{t\in[T]}\sum_{j\in[M]} \rho_{j,t} \;&=\; \sum_{t\in[\tau_A-1]}\sum_{j\in[M]} \rho_{j,t} + \sum_{j\in[M]} \rho_{j,\tau_A} + \sum_{t=\tau_A+1}^{T}\sum_{j\in[M]} \rho_{j,t} \\
&\overset{(i)}{\le}\; \rho T - M\rho - M\safebudg\,(T-\tau_A) + \sum_{j\in[M]} \rho_{j,\tau_A} + M(T-\tau_A)\,\safebudg \\
&\le\; \rho T - M\rho - M\safebudg\,(T-\tau_A) + M\rho + M(T-\tau_A)\,\safebudg \\
&=\; \rho T, \tag{49}
\end{aligned}
\]
where (i) follows from $S_{2,\tau_A} = \sum_{t\in[\tau_A-1]}\sum_{j\in[M]} \rho_{j,t}$ and Eq. (47), as well as the fact that in Algorithm LABEL:alg:omducb we set $\rho_{j,t} = \safebudg$ for all $j\in[M]$ and $t = \tau_A, \tau_A+1, \dots, T$. ∎
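The following minimal simulation sketch mirrors the safeguard analyzed in Lemma 7.2; the instance, the in-loop budgets, and the conversion response below are all hypothetical stand-ins, but the while-loop condition and the fallback to $\safebudg$ follow the proof:

```python
# Track the running ROI slack S1 and spend S2; once the while-loop condition
# would fail at some tau_A < T, fall back to the conservative budget safebudg
# for the remaining rounds, which preserves both aggregate constraints.
import math

T, M, rho, gamma = 1000, 3, 1.0, 0.5   # hypothetical horizon and instance
safe = safebudg = 1 / math.log(T)

S1, S2 = 0.0, 0.0
exited = False                         # flips to True at the stopping time tau_A
for t in range(1, T + 1):
    if not exited:
        exited = not (S1 - gamma * M * rho + safe * safebudg * (T - t) >= 0
                      and S2 + M * rho + M * safebudg * (T - t) <= rho * T)
    # inside the loop: play the learner's budgets; afterwards: the safe budget
    budgets = [safebudg] * M if exited else [0.4] * M
    conv = sum(0.8 * b for b in budgets)   # hypothetical response with V_j > gamma * rho_j
    S1 += conv - gamma * sum(budgets)
    S2 += sum(budgets)

print(f"avg ROI slack = {S1 / T:.4f} (>= 0), avg spend = {S2 / T:.4f} (<= rho = {rho})")
```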
Lemma 7.3
Let $(\lambda_t, \mu_t)_{t\in[T]}$ be the dual variables generated by Algorithm LABEL:alg:omducb. Then for any $\lambda, \mu \in [0, \dualub_F]$ and $t\in[T]$ we have
\[
\sum_{\tau\in[t]} (\lambda_\tau - \lambda)\, g_{1,\tau} \;\le\; \frac{\eta M^2 \bar{\totv}^2}{2}\cdot t + \frac{1}{2\eta}(\lambda - \lambda_1)^2, \qquad
\sum_{\tau\in[t]} (\mu_\tau - \mu)\, g_{2,\tau} \;\le\; \frac{\eta \rho^2}{2}\cdot t + \frac{1}{2\eta}(\mu - \mu_1)^2, \tag{50}
\]
where we recall $g_{1,\tau} = \sum_{j\in[M]} \big( \totv_{j,\tau}(\rho_{j,\tau}) - \gamma\rho_{j,\tau} \big)$ and $g_{2,\tau} = \rho - \sum_{j\in[M]} \rho_{j,\tau}$.
Proof. We will show Eq. (50). Starting with the first inequality w.r.t. the $\lambda_\tau$'s, we have
\[
(\lambda_\tau - \lambda)\, g_{1,\tau} \;=\; (\lambda_{\tau+1} - \lambda)\, g_{1,\tau} + (\lambda_\tau - \lambda_{\tau+1})\, g_{1,\tau}. \tag{51}
\]
Since $\lambda_{\tau+1} = \Pi_{[0, \dualub_F]}(\lambda_\tau - \eta g_{1,\tau}) = \arg\min_{\lambda' \in [0, \dualub_F]} \big(\lambda' - (\lambda_\tau - \eta g_{1,\tau})\big)^2$, we have
\[
\big( \lambda_{\tau+1} - (\lambda_\tau - \eta g_{1,\tau}) \big) \cdot (\lambda - \lambda_{\tau+1}) \;\ge\; 0 \qquad \forall\, \lambda \in [0, \dualub_F]. \tag{52}
\]
So we have
\[
(\lambda_{\tau+1} - \lambda)\, g_{1,\tau} \;\le\; \frac{1}{\eta} (\lambda_{\tau+1} - \lambda_\tau) \cdot (\lambda - \lambda_{\tau+1}) \;=\; \frac{1}{2\eta} \Big( (\lambda - \lambda_\tau)^2 - (\lambda - \lambda_{\tau+1})^2 - (\lambda_{\tau+1} - \lambda_\tau)^2 \Big). \tag{53}
\]
Plugging the above back into Eq. (51), we get
\[
\begin{aligned}
(\lambda_\tau - \lambda)\, g_{1,\tau} \;&\le\; (\lambda_\tau - \lambda_{\tau+1})\, g_{1,\tau} + \frac{1}{2\eta} \Big( (\lambda - \lambda_\tau)^2 - (\lambda - \lambda_{\tau+1})^2 - (\lambda_{\tau+1} - \lambda_\tau)^2 \Big) \\
&\le\; \frac{\eta}{2}\, g_{1,\tau}^2 + \frac{1}{2\eta} \Big( (\lambda - \lambda_\tau)^2 - (\lambda - \lambda_{\tau+1})^2 \Big) \\
&\le\; \frac{\eta M^2 \bar{\totv}^2}{2} + \frac{1}{2\eta} \Big( (\lambda - \lambda_\tau)^2 - (\lambda - \lambda_{\tau+1})^2 \Big), \tag{54}
\end{aligned}
\]
where the final inequality follows from the fact that $\totv_{j,\tau}(\rho_{j,\tau}) \le \bar{\totv}$ for any $j\in[M]$ and $\tau\in[t]$, so $g_{1,\tau} \le M\bar{\totv}$. Summing the above over $\tau = 1, \dots, t$ and telescoping, we get
\[
\sum_{\tau\in[t]} (\lambda_\tau - \lambda)\, g_{1,\tau} \;\le\; \frac{\eta M^2 \bar{\totv}^2}{2}\cdot t + \frac{1}{2\eta}(\lambda - \lambda_1)^2 \qquad \forall\, \lambda \in [0, \dualub_F].
\]
Following the same arguments as above, we can show
\[
\sum_{\tau\in[t]} (\mu_\tau - \mu)\, g_{2,\tau} \;\le\; \frac{\eta \rho^2}{2}\cdot t + \frac{1}{2\eta}(\mu - \mu_1)^2 \qquad \forall\, \mu \in [0, \dualub_F].
\]
∎
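Lemma 7.3 is the standard regret bound for projected online (sub)gradient updates on the dual variables. The sketch below (with hypothetical bounded slacks $g_{1,t}$; the constants are stand-ins) implements the update $\lambda_{\tau+1} = \Pi_{[0,\dualub_F]}(\lambda_\tau - \eta g_{1,\tau})$ and verifies the first inequality of Eq. (50) empirically for a few comparators:

```python
# Projected dual update and an empirical check of the telescoped bound (50).
import random

random.seed(0)
T, M, V_bar, dual_ub = 2000, 2, 1.0, 5.0   # hypothetical horizon / bounds
eta = T ** -0.5

lam, lam_hist, g_hist = 0.0, [], []
for t in range(T):
    g1 = random.uniform(-M * V_bar, M * V_bar)    # hypothetical slack, |g1| <= M * V_bar
    lam_hist.append(lam)
    g_hist.append(g1)
    lam = min(max(lam - eta * g1, 0.0), dual_ub)  # projection onto [0, dual_ub]

for lam_ref in (0.0, 1.0, dual_ub):               # any comparator in [0, dual_ub]
    lhs = sum((l - lam_ref) * g for l, g in zip(lam_hist, g_hist))
    rhs = eta * (M * V_bar) ** 2 / 2 * T + (lam_ref - lam_hist[0]) ** 2 / (2 * eta)
    assert lhs <= rhs + 1e-9
print("Eq. (50) bound verified for sample comparators")
```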
Proposition 7.4
Under Assumption LABEL:ass:feas, the advertiser's budget-only per-channel optimization problem, namely CH-OPT$(\mathcal{I}_{\budg})$, is a convex problem.
Proof. Recalling CH-OPT$(\mathcal{I}_{\budg})$ in Eq. (3) and the definition of $\mathcal{I}_{\budg}$ in Eq. (2), we can write CH-OPT$(\mathcal{I}_{\budg})$ as
\[
\begin{aligned}
\text{CH-OPT}(\mathcal{I}_{\budg}) \;=\; \max_{(\rho_j)_{j\in[M]}} \;& \sum_{j\in[M]} \totv_j(\rho_j) \\
\text{s.t.} \;& \sum_{j\in[M]} \totv_j(\rho_j) \;\ge\; \gamma \sum_{j\in[M]} \rho_j, \\
& \sum_{j\in[M]} \rho_j \;\le\; \rho. \tag{55}
\end{aligned}
\]
Here we used the definition $\totv_j(\rho_j) = \mathbb{E}[\totv_j(\rho_j; \boldsymbol{z}_j)]$ in Eq. (5), and the fact that $\totc_j(\rho_j; \boldsymbol{z}_j) = \rho_j$ for any $\boldsymbol{z}_j$ under Assumption LABEL:ass:feas. According to Lemma LABEL:lem:Vstruct, $\totv_j(\rho_j)$ is concave in $\rho_j$ for any $j$, so the objective of CH-OPT$(\mathcal{I}_{\budg})$ maximizes a concave function. For the feasible region, assume $(\rho_j)_{j\in[M]}$ and $(\rho_j')_{j\in[M]}$ are feasible. Then, defining $\rho_j'' = \theta\rho_j + (1-\theta)\rho_j'$ for any $\theta\in[0,1]$, we know that
\[
\begin{aligned}
\sum_{j\in[M]} \big( \totv_j(\rho_j'') - \gamma\rho_j'' \big) \;&\overset{(i)}{\ge}\; \sum_{j\in[M]} \big( \theta\,\totv_j(\rho_j) + (1-\theta)\,\totv_j(\rho_j') - \gamma\rho_j'' \big) \\
&=\; \theta \sum_{j\in[M]} \big( \totv_j(\rho_j) - \gamma\rho_j \big) + (1-\theta) \sum_{j\in[M]} \big( \totv_j(\rho_j') - \gamma\rho_j' \big) \\
&\overset{(ii)}{\ge}\; 0,
\end{aligned}
\]
where (i) follows from concavity of $\totv_j(\rho_j)$ and (ii) follows from feasibility of $(\rho_j)_{j\in[M]}$ and $(\rho_j')_{j\in[M]}$. On the other hand, it is apparent that $\sum_{j\in[M]} \rho_j'' \le \rho$. Hence we conclude that for any feasible $(\rho_j)_{j\in[M]}$ and $(\rho_j')_{j\in[M]}$, the combination $\rho_j'' = \theta\rho_j + (1-\theta)\rho_j'$ is also feasible, so the feasible region of CH-OPT$(\mathcal{I}_{\budg})$ is convex. This concludes the statement of the proposition. ∎
8 Proofs for Section LABEL:sec:multi

8.1 Proof of Lemma LABEL:lem:Vstructmulti
Before we show the lemma, we first establish the following claim:
Claim 1
Recall $v_{j,n}(1) > \dots > v_{j,n}(L_{j,n}) > 0$ and $d_{j,n}(1) > \dots > d_{j,n}(L_{j,n}) > 0$ for any channel $j\in[M]$ and auction $n\in[m_j]$. If auction $n$ in channel $j$ has increasing marginal values, i.e., for any realization $\boldsymbol{z}_j = (\boldsymbol{v}_j, \boldsymbol{d}_j)$ and any $n\in[m_j]$, the marginal ratio $\frac{v_{j,n}(\ell-1) - v_{j,n}(\ell)}{d_{j,n}(\ell-1) - d_{j,n}(\ell)}$ decreases in $\ell$, then $\frac{v_{j,n}(\ell)}{d_{j,n}(\ell)}$ also decreases in $\ell$.
Proof. We prove this claim by induction. The base case is $\ell = L_{j,n}$: it is easy to see that
\[
\frac{v_{j,n}(L_{j,n}-1) - v_{j,n}(L_{j,n})}{d_{j,n}(L_{j,n}-1) - d_{j,n}(L_{j,n})} \;>\; \frac{v_{j,n}(L_{j,n})}{d_{j,n}(L_{j,n})}
\;\implies\; \frac{v_{j,n}(L_{j,n}-1)}{d_{j,n}(L_{j,n}-1)} \;>\; \frac{v_{j,n}(L_{j,n})}{d_{j,n}(L_{j,n})}.
\]
Now assume the induction hypothesis $\frac{v_{j,n}(\ell)}{d_{j,n}(\ell)} > \frac{v_{j,n}(\ell+1)}{d_{j,n}(\ell+1)} > \dots > \frac{v_{j,n}(L_{j,n})}{d_{j,n}(L_{j,n})}$. Then, we have
\[
\begin{aligned}
\frac{v_{j,n}(\ell)}{d_{j,n}(\ell)} > \frac{v_{j,n}(\ell+1)}{d_{j,n}(\ell+1)}
\;&\implies\; \frac{d_{j,n}(\ell+1) - d_{j,n}(\ell)}{d_{j,n}(\ell)} \;>\; \frac{v_{j,n}(\ell+1) - v_{j,n}(\ell)}{v_{j,n}(\ell)} \\
&\implies\; \frac{d_{j,n}(\ell) - d_{j,n}(\ell+1)}{d_{j,n}(\ell)} \;<\; \frac{v_{j,n}(\ell) - v_{j,n}(\ell+1)}{v_{j,n}(\ell)} \\
&\implies\; \frac{v_{j,n}(\ell)}{d_{j,n}(\ell)} \;<\; \frac{v_{j,n}(\ell) - v_{j,n}(\ell+1)}{d_{j,n}(\ell) - d_{j,n}(\ell+1)}. \tag{56}
\end{aligned}
\]
Since $\frac{v_{j,n}(\ell-1) - v_{j,n}(\ell)}{d_{j,n}(\ell-1) - d_{j,n}(\ell)}$ decreases in $\ell$, we have
\[
\frac{v_{j,n}(\ell-1) - v_{j,n}(\ell)}{d_{j,n}(\ell-1) - d_{j,n}(\ell)} \;>\; \frac{v_{j,n}(\ell) - v_{j,n}(\ell+1)}{d_{j,n}(\ell) - d_{j,n}(\ell+1)} \;\overset{(i)}{>}\; \frac{v_{j,n}(\ell)}{d_{j,n}(\ell)}
\;\implies\; \frac{v_{j,n}(\ell-1)}{d_{j,n}(\ell-1)} \;\overset{(ii)}{>}\; \frac{v_{j,n}(\ell)}{d_{j,n}(\ell)},
\]
where (i) follows from Eq. (56), and (ii) follows from the fact that $\frac{A}{B} > \frac{C}{D}$ for $A, B, C, D > 0$ implies $\frac{A+C}{B+D} > \frac{C}{D}$, where we let $A = v_{j,n}(\ell-1) - v_{j,n}(\ell)$, $B = d_{j,n}(\ell-1) - d_{j,n}(\ell)$, $C = v_{j,n}(\ell)$, and $D = d_{j,n}(\ell)$. This concludes the proof. ∎
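A quick numeric check of Claim 1 with hypothetical slot values and costs (chosen only for illustration): decreasing marginal ratios indeed force the plain value-to-cost ratios to decrease as well:

```python
# Verify on a toy instance: if (v(l-1)-v(l))/(d(l-1)-d(l)) decreases in l,
# then v(l)/d(l) decreases in l, as Claim 1 asserts.
v = [10.0, 6.5, 4.0, 2.0]   # v(1) > ... > v(L) > 0
d = [5.0, 3.5, 2.3, 1.2]    # d(1) > ... > d(L) > 0

marginal = [(v[l - 1] - v[l]) / (d[l - 1] - d[l]) for l in range(1, len(v))]
ratio = [vl / dl for vl, dl in zip(v, d)]

assert all(a > b for a, b in zip(marginal, marginal[1:])), "marginal ratios decrease"
assert all(a > b for a, b in zip(ratio, ratio[1:])), "hence v(l)/d(l) decreases"
print("marginal:", marginal, "\nratio:   ", ratio)
```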
Now we prove Lemma LABEL:lem:Vstructmulti. Similar to the proof of Lemma LABEL:lem:Vstruct, we only need to show that for any realization $\boldsymbol{z}_j = (\boldsymbol{v}_j, \boldsymbol{d}_j)$, the conversion function $\totv_j^+(\rho_j; \boldsymbol{z}_j) = \boldsymbol{v}_j^\top \boldsymbol{x}_j^{*,+}(\rho_j; \boldsymbol{z}_j)$, where $\boldsymbol{x}_j^{*,+}(\rho_j; \boldsymbol{z}_j)$ is defined in Eq. (LABEL:eq:budperchannelsolmulti), is piecewise linear, continuous, strictly increasing, and concave.
For simplicity we use the shorthand notation $\boldsymbol{x}_j^* = \boldsymbol{x}_j^{*,+}(\rho_j; \boldsymbol{z}_j) \in [0,1]^{\sum_{n\in[m_j]} L_{j,n}}$ for the optimal solution defining $\totv_j^+(\rho_j; \boldsymbol{z}_j)$ in Eq. (LABEL:eq:budperchannelsolmulti). By re-labeling the auction indices in channel $j\in[M]$ such that $\frac{v_{j,1}(1)}{d_{j,1}(1)} \ge \frac{v_{j,2}(1)}{d_{j,2}(1)} \ge \dots \ge \frac{v_{j,m_j}(1)}{d_{j,m_j}(1)}$, we claim that $\boldsymbol{x}_j^*$ takes the following form:
\[
x_{j,n}^*(\ell) \;=\;
\begin{cases}
1 & \text{if } \ell = 1 \text{ and } \sum_{n'\in[n]} d_{j,n'}(1) \le \rho_j, \\[4pt]
\dfrac{\rho_j - \sum_{n'\in[n-1]} d_{j,n'}(1)}{d_{j,n}(1)} & \text{if } \ell = 1 \text{ and } \sum_{n'\in[n-1]} d_{j,n'}(1) \le \rho_j < \sum_{n'\in[n]} d_{j,n'}(1), \\[4pt]
0 & \text{otherwise,}
\end{cases} \tag{57}
\]
which is analogous to Eq. (18) in the proof of Lemma LABEL:lem:Vstruct. In other words, at the optimal solution, an advertiser only procures impressions in the first position of each auction, prioritizing those with high value-to-cost ratios. With the above representation of $\boldsymbol{x}_j^*$, the rest of the proof follows exactly from that of Lemma LABEL:lem:Vstruct.
It now remains to show that Eq. (57) holds. We first argue by contradiction that, in any auction, no impression other than the first is procured, i.e., $x_{j,n}^*(\ell) = 0$ for any $\ell \in \{2, \dots, L_{j,n}\}$. Assume there exists some auction $n\in[m_j]$ and impression slot $\ell' \in \{2, \dots, L_{j,n}\}$ such that $x_{j,n}^*(\ell') > 0$. Then, by the constraint that at most one impression can be procured per auction, i.e., $\sum_{\ell\in[L_{j,n}]} x_{j,n}^*(\ell) \le 1$ in Eq. (LABEL:eq:budperchannelsolmulti), we know that $x_{j,n}^*(1) < 1$. Also, note that $x_{j,n}^*(\ell')$ incurs a cost of $d_{j,n}(\ell') \cdot x_{j,n}^*(\ell')$ out of the total per-channel budget $\rho_j$. If we instead spend this cost on the first impression, we obtain a value increase of
\[
v_{j,n}(1) \cdot \frac{d_{j,n}(\ell') \cdot x_{j,n}^*(\ell')}{d_{j,n}(1)} - v_{j,n}(\ell') \cdot x_{j,n}^*(\ell')
\;=\; d_{j,n}(\ell') \cdot x_{j,n}^*(\ell') \cdot \Big( \frac{v_{j,n}(1)}{d_{j,n}(1)} - \frac{v_{j,n}(\ell')}{d_{j,n}(\ell')} \Big) \;>\; 0,
\]
where the final inequality follows from the assumption that $x_{j,n}^*(\ell') > 0$, together with the fact that the multi-item auction has increasing marginal values (see Definition LABEL:def:incrmarginal), so Claim 1 holds. This contradicts the optimality of $\boldsymbol{x}_j^*$; hence $x_{j,n}^*(\ell) = 0$ for any $\ell \in \{2, \dots, L_{j,n}\}$, or in other words, a channel only procures impressions ranked first. Hence, a channel's procurement problem in Eq. (LABEL:eq:budperchannelsolmulti) can be restricted to the first impression in each auction and, as in the proof of Lemma LABEL:lem:Vstruct, is an LP relaxation of the 0-1 knapsack problem with budget $\rho_j$ and $m_j$ items whose values are $v_{j,1}(1), \dots, v_{j,m_j}(1)$ with costs $d_{j,1}(1), \dots, d_{j,m_j}(1)$. ∎
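Finally, a numeric illustration (hypothetical numbers, ours rather than the paper's) of the exchange argument in this proof: re-spending the cost of a lower slot $\ell'$ on the first slot of the same auction yields a strictly positive value increase whenever $\frac{v_{j,n}(1)}{d_{j,n}(1)} > \frac{v_{j,n}(\ell')}{d_{j,n}(\ell')}$:

```python
# Exchange argument: moving the budget d(l')*x(l') from slot l' to slot 1
# changes value by d(l')*x(l') * (v(1)/d(1) - v(l')/d(l')) > 0.
v1, d1 = 8.0, 3.0        # first slot of a hypothetical auction
vl, dl = 3.0, 2.0        # lower slot l' with v(1)/d(1) > v(l')/d(l')
x_l = 0.5                # fraction of slot l' procured

gain = v1 * (dl * x_l) / d1 - vl * x_l
same = dl * x_l * (v1 / d1 - vl / dl)   # the factored form used in the proof
print(f"value increase = {gain:.4f} (= {same:.4f}) > 0")
assert gain > 0 and abs(gain - same) < 1e-12
```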