Optimization models in digital advertising

In optimizing their budgets across a portfolio of digital advertising channels, advertisers may adopt one of two strategies, depending on their principal financial constraint:
- With a Principal ROAS Constraint, all spend must adhere to some ROAS standard. In this sense, spend is limited by the ROAS that each channel in the advertiser’s portfolio can support at a given level of budget. Since the ROAS achieved on any given channel tends to decrease as its budget increases, an advertiser will spend as much as possible on each channel while still delivering the target ROAS. With this constraint, the advertiser’s total budget is simply the sum of spend across channels at the target level of ROAS.
- With a Principal Budget Constraint, the advertiser has some fixed budget that it will deploy and seeks to distribute it across its portfolio of channels for the highest possible aggregate level of ROAS. This strategy assumes that spend is also constrained on any given channel by a target ROAS, meaning that an advertiser won’t spend unprofitably just for the sake of deploying budget.
I call these principal constraints because both strategies involve the dual constraints of ROAS and total spend:
- Advertisers generally focus on ROAS as their primary constraint when they are not budget-constrained: they are eager to spend more on digital advertising than they currently do. There is naturally an upper limit to the amount of money that can be deployed on advertising, but for some advertisers that limit so far exceeds what they currently spend that it is irrelevant. For instance, an advertiser that faces short-term ROAS timelines (e.g., less than 30 days) and has access to a vast advertising credit facility may face no concrete budget constraint.
- Advertisers generally focus on budget as their primary constraint when they are deploying systemically significant sums of money on advertising in a steady state (meaning: performance is long-term stable, often for an established or legacy product with consistent revenues). But these advertisers similarly face a ROAS constraint; they wouldn’t allocate advertising spend to a channel at a loss simply for the sake of fully deploying the budget. But the ROAS constraint may be irrelevant if the ROAS they are generating is materially higher than their target.
I outline both of these strategies, conceptually, in “Building a traffic composition strategy on mobile,” published in 2019. In that piece, I term the ROAS Constraint strategy the Waterfall Budgeting Method and the Budget Constraint strategy the Distributed Budgeting Method. In considering both of these optimization strategies, I make a number of assumptions:
- An advertiser has visibility into historical spend-revenue curves (i.e., the amount of attributed revenue generated at each level of spend) for each channel \(b_i\) in its portfolio \(\mathbf{b}\), and those curves are reliable indicators of future performance. In practice, this often isn’t true.
- For every channel, ROAS and budget are inversely correlated: when one increases, the other decreases. Empirically, this holds as a general rule, but it isn’t true at all levels of spend for all channels (see The “Quality vs. Volume” fallacy in user acquisition for more).
- An advertiser may be constrained by either a channel-level ROAS target, \(\rho\), or an overall budget, \(\gamma\), but not both simultaneously. As I point out above, this oversimplifies reality, but these formalizations don’t accommodate both constraints; each assumes that only one of them is relevant for a given strategy (e.g., an advertiser with a principal ROAS constraint operates so far below any concrete budget constraint that it is not at risk of exceeding it).
- An advertiser will not allow any given channel to decline below its ROAS threshold. In practice, this is often true, but in certain instances, an advertiser might operate specific channels below its ROAS target if its aggregate ROAS exceeds the target.
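In this piece, I will present analytical formalizations of these optimization problems, along with Python implementations of channel portfolio optimization for both. The notation used here for these formalizations is:
- \(N\): the number of channels in the portfolio \(\mathbf{b}\), indexed by \(i \in \{1, \dots, N\}\)
- \(b_i\): channel \(i\)
- \(s_i\): spend on channel \(b_i\)
- \(\text{Revenue}_{b_i}(s_i)\): revenue attributed to channel \(b_i\) at spend \(s_i\)
- \(\text{ROAS}_{b_i}(s_i)\): ROAS on channel \(b_i\) at spend \(s_i\)
- \(\rho\): the ROAS target
- \(\gamma\): the advertiser’s total budget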
The code that accompanies this article is published on GitHub.
The Waterfall Budgeting Method (Primary ROAS Constraint)
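With the Waterfall Budgeting Method, the advertiser is constrained primarily by ROAS: it aims to spend as much on each channel as its ROAS target allows. This objective function can be formalized as:
\[ \max_{s_1, \dots, s_N} \sum_{i=1}^{N} s_i \quad \text{subject to} \quad \text{ROAS}_{b_i}(s_i) \geq \rho \quad \forall i \in \{1, \dots, N\} \]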
This maximizes spend across all channels \(i \in \{1, \dots, N\}\) subject to the constraint that ROAS for any given channel, denoted \(\text{ROAS}_{b_i}(s_i)\), is greater than or equal to the ROAS target \(\rho\). Again: an advertiser might not apply the ROAS constraint at the level of each individual channel if the aggregate ROAS adheres to \(\rho\), although that choice would need to be justified by some other business objective (e.g., crowding out a competitor on some specific channel).
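As previously shown, ROAS on a channel is the revenue attributed to it divided by the spend on it:
\[ \text{ROAS}_{b_i}(s_i) = \frac{\text{Revenue}_{b_i}(s_i)}{s_i} \]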
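Substituting this definition into the constraint and multiplying both sides by the (positive) spend \(s_i\) gives the inequality:
\[ \text{Revenue}_{b_i}(s_i) \geq \rho \, s_i \]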
Expressed as an inequality constraint function \(g_i(s_i)\) in the standard form \(g_i(s_i) \leq 0\), this becomes:
\[ g_i(s_i) = \rho \, s_i - \text{Revenue}_{b_i}(s_i) \leq 0 \]
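To solve this by introducing Lagrange multipliers, the problem can be restated in terms of \(g_i\):
\[ \max_{s_1, \dots, s_N} \sum_{i=1}^{N} s_i \quad \text{subject to} \quad g_i(s_i) \leq 0 \quad \forall i \in \{1, \dots, N\} \]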
This says that the objective of maximizing spend across all channels, \(i \in \{1, \dots, N\}\), is subject to the constraint that, for every channel \(i\), the spend \(s_i\) must not produce a ROAS less than \(\rho\), with the constraint for each channel represented by \(g_i(s_i)\).
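As a minimal sketch of this constraint in code, assuming a hypothetical log-shaped revenue curve \(\text{Revenue}(s) = a \ln(1 + s/k)\) (the `log_revenue` function and its parameters are illustrative, not taken from the original analysis), ROAS and the constraint function \(g_i(s_i) = \rho\, s_i - \text{Revenue}_{b_i}(s_i)\) can be evaluated directly. A spend level meets the channel’s ROAS target exactly when \(g_i(s_i) \leq 0\).

```python
import numpy as np

RHO = 1.2  # ROAS target (120%)


def log_revenue(spend, scale=400_000.0, k=50_000.0):
    """Hypothetical concave spend-revenue curve: scale * ln(1 + spend / k)."""
    return scale * np.log1p(np.asarray(spend, dtype=float) / k)


def roas(spend):
    """Average ROAS, Revenue(s) / s, at each (strictly positive) spend level."""
    return log_revenue(spend) / np.asarray(spend, dtype=float)


def g(spend, rho=RHO):
    """Constraint in standard form: g(s) = rho * s - Revenue(s); feasible when <= 0."""
    return rho * np.asarray(spend, dtype=float) - log_revenue(spend)


spend = np.array([100_000.0, 1_000_000.0, 3_000_000.0])
print("ROAS:", np.round(roas(spend), 3))
print("meets target:", g(spend) <= 0)  # True where ROAS >= RHO
```

Writing the constraint as \(g_i \leq 0\) rather than as a ratio keeps it defined at every spend level and puts it in the form that Lagrangian methods and numerical solvers expect.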
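Then, a set of Lagrange multipliers, \(\{\lambda_i\}_{i=1}^N\), can be introduced to solve the constrained objective function analytically with:
\[ \mathcal{L}(s_1, \dots, s_N, \lambda_1, \dots, \lambda_N) = \sum_{i=1}^{N} s_i - \sum_{i=1}^{N} \lambda_i \, g_i(s_i), \qquad \lambda_i \geq 0 \]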
Lagrange multipliers allow a constrained optimization problem to be solved as if it were unconstrained: candidate optima are found by setting the first derivatives of a single combined function to zero. That function, the Lagrangian, incorporates both the objective and its constraints, and its first-order conditions form a system of equations. In the Waterfall model, the objective is to maximize spend across all channels, and the constraints are the channel-level ROAS targets, which all share the same value \(\rho\).
Each Lagrange multiplier effectively activates or deactivates its corresponding constraint. If a constraint is inactive, meaning the ROAS for that channel is greater than the target \(\rho\), its Lagrange multiplier is \(\lambda_i = 0\), eliminating its influence on the optimization.
If the constraint is binding, meaning the ROAS for that channel is exactly equal to \(\rho\), then its corresponding Lagrange multiplier satisfies \(\lambda_i \gt 0\). The Lagrange multiplier can be interpreted as the rate at which the optimal value of the objective function (total spend) would increase if the constraint (the ROAS threshold) were relaxed: if \(\lambda_i \gt 0\), relaxing the constraint (lowering the ROAS requirement for that channel) would increase total spend, since the binding constraint is what prevents spend on that channel from rising further. Put the other way around, \(\lambda_i\) measures how strongly the constraint restricts the objective. This is captured by the partial derivatives of the Lagrangian with respect to both \(s_i\) and \(\lambda_i\).
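The shadow-price reading of \(\lambda_i\) can be checked numerically. The sketch below assumes a single channel with a hypothetical log-shaped curve \(\text{Revenue}(s) = a \ln(1 + s/k)\) (illustrative parameters) and the constraint written as \(g(s) = \rho s - \text{Revenue}(s) \leq 0\). It finds the spend level at which the constraint binds, recovers \(\lambda\) from the stationarity condition \(1 - \lambda\, g'(s) = 0\), and compares it with a finite-difference estimate of how much optimal spend rises when the constraint is relaxed to \(g(s) \leq \varepsilon\).

```python
import numpy as np
from scipy.optimize import brentq

RHO = 1.2                       # ROAS target
SCALE, K = 400_000.0, 50_000.0  # hypothetical log-curve parameters


def revenue(s):
    """Hypothetical concave revenue curve: SCALE * ln(1 + s / K)."""
    return SCALE * np.log1p(s / K)


def marginal_revenue(s):
    """Analytic derivative of revenue(s)."""
    return SCALE / (K + s)


def optimal_spend(eps=0.0, rho=RHO):
    """Maximum spend satisfying the relaxed constraint rho * s - revenue(s) <= eps.

    rho * s - revenue(s) is convex here, so it crosses eps exactly once
    between a small spend (feasible) and a very large one (infeasible).
    """
    return brentq(lambda s: rho * s - revenue(s) - eps, 1_000.0, 1e9)


s_star = optimal_spend()
# Stationarity of L = s - lam * g(s): 1 - lam * (rho - Revenue'(s)) = 0
lam = 1.0 / (RHO - marginal_revenue(s_star))

# Shadow price: relaxing the constraint by eps raises optimal spend by about lam * eps
h = 100.0
finite_diff = (optimal_spend(h) - optimal_spend(-h)) / (2 * h)
print(f"s* = {s_star:,.0f}, lambda = {lam:.4f}, d(s*)/d(eps) = {finite_diff:.4f}")
```

The two estimates agree closely, and \(\lambda\) is positive because the constraint binds at \(s^*\).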
In the Waterfall Method, no explicit budget limit is imposed, and it is assumed that each channel’s ROAS curve intersects the target \(\rho\). The goal of the optimization model is thus to allocate spend such that \(\text{ROAS}_{b_i}(s_i)=\rho\), implying \(\lambda_i \gt 0\) for all channels where the ROAS curve intersects \(\rho\).
To solve the equation as presented above, the partial derivative of the Lagrangian is taken with respect to both \(s_i\) and \(\lambda_i\):

Actually solving this requires knowledge of the revenue-spend curve by channel, captured in some functional form \(f_i(s_i)\) that yields the revenue generated by channel \(i\) at spend level \(s_i\). In practice, the Waterfall Method applies the ROAS target at the margin: the optimal level of spend is found where the next marginal dollar of spend yields ROAS of \(\rho\):
\[ f_i'(s_i) = \rho \]
It’s important to emphasize this point: the Waterfall Method seeks to optimize marginal, not average, ROAS by channel. Optimizing to average ROAS could involve wasted spend: when ROAS declines with spend, continuing to spend until the average falls to \(\rho\) means the last dollars spent on the channel return less than \(\rho\). The Waterfall Method allocates budget to a channel until the ROAS on the next dollar spent on it declines below the target \(\rho\), not for as long as the average ROAS on the channel remains at or above \(\rho\).
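To illustrate the difference between the two stopping rules, here is a minimal sketch on a hypothetical log-shaped curve \(\text{Revenue}(s) = a \ln(1 + s/k)\) (parameters are illustrative). For this curve the marginal rule has a closed form, \(s^* = a/\rho - k\), while the spend at which average ROAS falls to \(\rho\) is found by root-finding with SciPy’s `brentq`.

```python
import numpy as np
from scipy.optimize import brentq

RHO = 1.2
A, K = 400_000.0, 50_000.0  # hypothetical curve: Revenue(s) = A * ln(1 + s / K)


def revenue(s):
    return A * np.log1p(s / K)


def marginal_roas(s):
    """Return on the next dollar of spend: the derivative A / (K + s)."""
    return A / (K + s)


# Marginal rule: spend until the next dollar returns exactly RHO (closed form for this curve)
s_marginal = A / RHO - K

# Average rule: spend until Revenue(s) / s falls to RHO (found numerically)
s_average = brentq(lambda s: revenue(s) / s - RHO, 1_000.0, 1e9)

print(f"marginal rule: spend ${s_marginal:,.0f}, average ROAS {revenue(s_marginal) / s_marginal:.2f}")
print(f"average rule:  spend ${s_average:,.0f}, marginal ROAS {marginal_roas(s_average):.2f}")
```

On this curve the marginal rule stops at about $283,000, while the average rule keeps spending to a little over $1MM, and the last dollar it spends returns well under \(\rho\).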
Per the stated assumptions, we know the historical revenue values per channel at various levels of spend, \(\text{Revenue}_{b_i}(s_i)\), from which ROAS is computed. So there’s no need to impute a ROAS function onto each channel, as the historical values can be used directly (again: the assumption is that these are valid for future spend).
Consider three arbitrary, hypothetical historical spend-revenue curves:

The actual equations aren’t important; in practice, historical spend-revenue time series data would be used. What matters is that each curve can be differentiated to find the spend level at which its gradient equals \(\rho\).

Assuming \(\rho\) is set to 1.2 (120% ROAS), the Waterfall Method can be solved programmatically by using NumPy’s np.gradient function to estimate the gradient of each spend-revenue curve and selecting the highest spend value at which the gradient is closest to 1.2.
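A sketch of this procedure, assuming three hypothetical curves (linear, log, and logistic) sampled on a $1,000 spend grid up to $2MM; the functional forms and parameters are illustrative and are not the curves used in the original analysis. As a small variant on picking the gradient closest to 1.2, the sketch selects the highest spend level at which the estimated gradient is still at or above \(\rho\), which avoids landing on a point where marginal ROAS has already dropped below the target.

```python
import numpy as np

RHO = 1.2
spend = np.linspace(0, 2_000_000, 2_001)  # $1,000 steps up to $2MM

# Hypothetical historical spend-revenue curves (illustrative parameters only)
curves = {
    "linear": 1.5 * spend,
    "log": 400_000 * np.log1p(spend / 50_000),
    "logistic": 1_800_000 / (1 + np.exp(-(spend - 400_000) / 150_000)),
}


def waterfall_spend(revenue, spend, rho=RHO):
    """Highest spend level whose marginal ROAS (estimated gradient) is still >= rho."""
    marginal = np.gradient(revenue, spend)
    eligible = np.flatnonzero(marginal >= rho)
    return float(spend[eligible[-1]]) if eligible.size else 0.0


allocation = {name: waterfall_spend(rev, spend) for name, rev in curves.items()}
for name, s in allocation.items():
    print(f"{name:>8}: ${s:,.0f}")
print(f"   total: ${sum(allocation.values()):,.0f}")
```

On the logistic curve the gradient crosses 1.2 twice, once while rising and once while falling; taking the highest eligible spend level picks the second crossing. No budget enters the calculation: total spend is just the sum of the channel-level results.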

For the linear curve, the optimal spend value is simply the highest level of spend in the historical dataset ($2MM): the spend-revenue curve is linear, so its gradient is constant throughout. For the other curves, the gradient declines as spend increases past a certain point, so the optimal spend level is the point beyond which the gradient falls below \(\rho\).
There are a few caveats here:
- Each of these curves represents historical spend-revenue data for different channels. There is no guarantee that future spending will produce returns consistent with the historical data. This is especially important for the first curve, where a team might unrealistically expect ROAS to scale linearly beyond the $2MM threshold.
- The team need not impose a further budget constraint on spend; total spend is simply the sum of the channel-level optimal spend levels.
The Distributed Budgeting Method (Primary Budget Constraint)
To implement the Distributed Budgeting Method with budget as the primary constraint, the approach is similar, except that \(\gamma\) is introduced to represent the advertiser’s available budget. The goal of the Distributed Budgeting Method is to maximize ROAS within this budget constraint.
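The objective is then to maximize average portfolio ROAS (total revenue divided by total spend) subject to total spend being less than or equal to the budget, the ROAS of each individual channel being greater than or equal to the ROAS target, and spend on each channel being non-negative (it can be $0):
\[ \max_{s_1, \dots, s_N} \frac{\sum_{i=1}^{N} \text{Revenue}_{b_i}(s_i)}{\sum_{i=1}^{N} s_i} \quad \text{subject to} \quad \sum_{i=1}^{N} s_i \leq \gamma, \quad \text{ROAS}_{b_i}(s_i) \geq \rho, \quad s_i \geq 0 \quad \forall i \]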
Since the objective is a ratio (revenue over spend, or ROAS), this is a fractional optimization problem, which general-purpose numerical solvers often handle poorly. It can be converted into a more tractable form with the Charnes–Cooper transformation, which rescales the decision variables and removes the denominator from the objective. To do this, we can introduce two new variables, \(t\) and \(x\): \(t\) is the reciprocal of total spend, \(t = 1 / \sum_{j=1}^{N} s_j\), and for any channel-level spend \(s_i\), \(x_i = s_i \cdot t\). This shifts the denominator into the constraints and converts the fractional optimization into something more manageable with a standard programmatic solver:
\[ \max_{x_1, \dots, x_N,\, t} \sum_{i=1}^{N} t \, \text{Revenue}_{b_i}\!\left(\frac{x_i}{t}\right) \quad \text{subject to} \quad \sum_{i=1}^{N} x_i = 1, \quad t \geq \frac{1}{\gamma}, \quad \rho \, x_i - t \, \text{Revenue}_{b_i}\!\left(\frac{x_i}{t}\right) \leq 0, \quad x_i \geq 0 \quad \forall i \]
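A minimal sketch of this change of variables, assuming three hypothetical log-shaped channel curves (illustrative parameters). Because \(s_i = x_i / t\), the aggregate ROAS \(\sum_i \text{Revenue}_{b_i}(s_i) / \sum_i s_i\) equals \(t \sum_i \text{Revenue}_{b_i}(x_i / t)\), which has no denominator. The code checks that identity, that \(\sum_i x_i = 1\), that the budget constraint \(\sum_i s_i \leq \gamma\) becomes \(t \geq 1/\gamma\), and that each channel’s ROAS constraint gives the same verdict after being multiplied through by \(t\).

```python
import numpy as np

GAMMA = 2_000_000.0  # total budget gamma
RHO = 1.2            # channel-level ROAS target

# Hypothetical log-shaped revenue curves for three channels (illustrative parameters)
SCALE = np.array([600_000.0, 400_000.0, 300_000.0])
K = np.array([200_000.0, 50_000.0, 100_000.0])


def revenue(s):
    """Revenue for each channel at its spend level s_i."""
    return SCALE * np.log1p(s / K)


def to_charnes_cooper(s):
    """Map spend s to (x, t) with t = 1 / sum(s) and x_i = s_i * t."""
    t = 1.0 / s.sum()
    return s * t, t


def from_charnes_cooper(x, t):
    """Recover channel spend from the transformed variables: s_i = x_i / t."""
    return x / t


s = np.array([500_000.0, 300_000.0, 700_000.0])
x, t = to_charnes_cooper(s)

ratio_objective = revenue(s).sum() / s.sum()                          # aggregate ROAS, a ratio
transformed_objective = t * revenue(from_charnes_cooper(x, t)).sum()  # denominator removed

# Revenue_i(s_i) >= rho * s_i, and the same constraint multiplied through by t
roas_ok_original = revenue(s) >= RHO * s
roas_ok_transformed = t * revenue(from_charnes_cooper(x, t)) >= RHO * x

print(ratio_objective, transformed_objective)
print(x.sum(), t >= 1 / GAMMA, roas_ok_transformed)
```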

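Then, the Lagrangian can be constructed with a single Lagrange multiplier \(\lambda\) for the normalization constraint \(\sum_i x_i = 1\) and a multiplier \(u_i\) for each of the \(N\) channel-level ROAS constraints. These use the same inequality form, \(g_i \leq 0\), as before, so each channel’s revenue term is subtracted from its \(\rho\)-scaled spend. The Lagrangian is:
\[ \mathcal{L}(\mathbf{x}, t, \lambda, \mathbf{u}) = \sum_{i=1}^{N} t \, \text{Revenue}_{b_i}\!\left(\frac{x_i}{t}\right) - \lambda \left( \sum_{i=1}^{N} x_i - 1 \right) - \sum_{i=1}^{N} u_i \left[ \rho \, x_i - t \, \text{Revenue}_{b_i}\!\left(\frac{x_i}{t}\right) \right], \qquad u_i \geq 0 \]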
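To solve this, the partial derivatives of the Lagrangian with respect to \(x_i\) and \(t\) are taken using the chain rule and set to 0:
\[ \frac{\partial \mathcal{L}}{\partial x_i} = (1 + u_i)\, \text{Revenue}_{b_i}'\!\left(\frac{x_i}{t}\right) - \lambda - u_i \rho = 0 \]
\[ \frac{\partial \mathcal{L}}{\partial t} = \sum_{i=1}^{N} (1 + u_i) \left[ \text{Revenue}_{b_i}\!\left(\frac{x_i}{t}\right) - \frac{x_i}{t}\, \text{Revenue}_{b_i}'\!\left(\frac{x_i}{t}\right) \right] = 0 \]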
Solving this in Python involves a relatively straightforward application of SciPy’s minimize optimizer. I have published the Python code for implementations of both the Waterfall and Distributed Budgeting Methods on GitHub. A few notes on the Python solutions:

- I imposed a $50,000 minimum spend on the log and logistic revenue curves to avoid instances where very low levels of spend produce unrealistic amounts of revenue;
- In the Distributed Budgeting Method, a total budget of $2MM is applied.
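The notes above can be made concrete with a sketch using `scipy.optimize.minimize` (SLSQP). It assumes three hypothetical curves (one linear and two log-shaped; illustrative parameters, not the original curves), expressed in $MM so the solver is well scaled, with the $2MM budget and the $50,000 floor on the non-linear channels. Two further assumptions are specific to this sketch: it passes the ratio objective to the solver directly with analytic gradients rather than using the Charnes–Cooper form, and it treats the budget as fully deployed (\(\sum_i s_i = \gamma\)), in line with an advertiser that has a fixed budget it will deploy.

```python
import numpy as np
from scipy.optimize import minimize

RHO = 1.2     # channel-level ROAS target
BUDGET = 2.0  # total budget gamma, in $MM
# Hypothetical curves in $MM: one linear channel and two log-shaped channels (illustrative)
SLOPE = 1.3
SCALE = np.array([0.4, 0.6])
K = np.array([0.05, 0.15])
MIN_SPEND = np.array([0.0, 0.05, 0.05])  # $50,000 floor on the two non-linear channels


def revenue(u):
    """Revenue per channel ($MM) at spend u ($MM)."""
    return np.concatenate(([SLOPE * u[0]], SCALE * np.log1p(u[1:] / K)))


def marginal(u):
    """Derivative of each channel's revenue with respect to its own spend."""
    return np.concatenate(([SLOPE], SCALE / (K + u[1:])))


def neg_aggregate_roas(u):
    """Negative portfolio ROAS (total revenue / total spend), since minimize() minimizes."""
    return -revenue(u).sum() / u.sum()


def neg_aggregate_roas_grad(u):
    """Quotient-rule gradient of neg_aggregate_roas."""
    total_revenue, total_spend = revenue(u).sum(), u.sum()
    return -(marginal(u) * total_spend - total_revenue) / total_spend**2


constraints = [
    # Assumption of this sketch: the fixed budget is fully deployed
    {"type": "eq", "fun": lambda u: u.sum() - BUDGET, "jac": lambda u: np.ones_like(u)},
    # Revenue_i(u_i) - RHO * u_i >= 0 for every channel (SciPy's "ineq" means fun >= 0)
    {"type": "ineq", "fun": lambda u: revenue(u) - RHO * u,
     "jac": lambda u: np.diag(marginal(u) - RHO)},
]
bounds = [(lo, BUDGET) for lo in MIN_SPEND]
x0 = np.full(3, BUDGET / 3)  # start from an equal split

result = minimize(neg_aggregate_roas, x0, jac=neg_aggregate_roas_grad, method="SLSQP",
                  bounds=bounds, constraints=constraints, options={"ftol": 1e-10})
print("converged:", result.success)
print("spend ($MM):", np.round(result.x, 4))
print("aggregate ROAS:", round(-result.fun, 4))
```

Treating the budget as an equality is a design choice: with concave curves, average ROAS tends to fall as spend rises, so without it an optimizer can raise aggregate ROAS simply by pulling spend back toward the floors. With the budget fully deployed, the optimum equalizes marginal ROAS across channels (here at the linear channel’s 1.3).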
This post outlines the two optimization frameworks for digital ad budgeting that I first proposed in 2019: the Waterfall Method, which maximizes spend under a ROAS constraint, and the Distributed Budgeting Method, which maximizes ROAS under a budget constraint. Both are formalized using mathematical models and implemented in Python using hypothetical but not altogether unrealistic spend-revenue functions. Advertisers can adopt these approaches by modeling channel-level spend-revenue curves and applying the accompanying code to optimize their own budget allocations within their portfolio of channels. The linked GitHub repository includes all code and example data needed to customize these models.