DAU forecasting using cohort matrices

One challenge in forecasting DAU for a product is that different user groups can exhibit meaningfully different retention rates. An obvious example of this is users acquired from different channels, but it can also be true across geographies, across platforms (e.g., iOS vs. Android), and over time, with retention generally degrading with each subsequent cohort.
In order to accommodate this effect, group-specific retention rates should be applied to each group's DAU projection, with the projections then aggregated into a global forecast. This is the purpose of Theseus, my open source Python library for marketing cohort analysis. In this post, I'll unpack the analytical logic behind how Theseus works and provide an example of how to implement it in Python.
The atomic units of a DAU forecast are: (1) a group's cohort sizes (e.g., the number of people from that group that onboarded to the product on each day of some period) and (2) the historical retention curve for that group. Each of these atomic units is represented as a vector over some timeline. The cohort vector captures the daily number of users from the group onboarding onto the product; the retention curve vector captures the group's historical daily retention rates following onboarding. These timelines, the cohort timeline and the retention curve timeline, can be arbitrarily long, and they are independent of each other (the cohort timeline doesn't need to match the retention curve timeline). The notation used here for these atomic units is:

\[
\mathbf{c} = \begin{bmatrix} c_1 & c_2 & \cdots & c_{D_c} \end{bmatrix}, \qquad \mathbf{r} = \begin{bmatrix} r_1 & r_2 & \cdots & r_{D_r} \end{bmatrix}
\]

where \(c_i\) is the number of users from the group onboarding on day \(i\) of the cohort timeline of length \(D_c\), and \(r_j\) is the share of a cohort that is active on day \(j\) following onboarding, over the retention timeline of length \(D_r\).
Note here that the retention rate vector would likely be generated by fitting a retention model to historical retention data for the group. More on that idea in this post.
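As an illustration of that idea, below is a minimal sketch of fitting a power-law retention model with SciPy's curve_fit. The functional form, the observed data points, and the 30-day projection window are all assumptions chosen for demonstration, not a prescription.

import numpy as np
from scipy.optimize import curve_fit

## hypothetical observed retention for a group: days since onboarding and share retained
days = np.array( [ 1, 2, 3, 7, 14, 30 ] )
observed = np.array( [ 1.0, 0.62, 0.48, 0.31, 0.22, 0.15 ] )

## assumed power-law retention model: r( d ) = a * d^( -b )
def retention_model( d, a, b ):
    return a * np.power( d, -b )

params, _ = curve_fit( retention_model, days, observed )

## generate the retention rate vector over a 30-day retention timeline
## ( in practice, the Day One value would be pinned to 1 )
D_r = 30
r = retention_model( np.arange( 1, D_r + 1 ), *params )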
With these components, it's possible to construct a DAU matrix for the retention timeline \(D_r\) that captures each cohort's decay over that period. A helpful place to start is an upper-triangular Toeplitz matrix, \(\mathbf{Z}\), of size \(D_r \times D_r\), with the retention rate vector running along its diagonals:

\[
\mathbf{Z} =
\begin{bmatrix}
r_1 & r_2 & r_3 & \cdots & r_{D_r} \\
0 & r_1 & r_2 & \cdots & r_{D_r - 1} \\
0 & 0 & r_1 & \cdots & r_{D_r - 2} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
0 & 0 & 0 & \cdots & r_1
\end{bmatrix}
\]
\(\mathbf{Z}\) here just populates a matrix with copies of the retention curve, each padded with 0s on the left so that Day One retention (the first value of the retention curve) runs along the main diagonal. In practical terms, Day One retention is 1, or 100%, since, tautologically, 100% of the cohort is present on the day of the cohort's onboarding. In order to get to DAU, the retention rates must be broadcast against the cohort sizes. This can be done by constructing a diagonal matrix, \(\mathbf{diag}(\mathbf{c})\), from \(\mathbf{c}\):

\[
\mathbf{diag}(\mathbf{c}) =
\begin{bmatrix}
c_1 & 0 & \cdots & 0 \\
0 & c_2 & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & c_{D_r}
\end{bmatrix}
\]
It's important to note here that, in order to broadcast the cohort sizes against the retention rates, \(\mathbf{diag}(\mathbf{c})\) must be of size \(D_r \times D_r\). So if the cohort size vector is longer than the retention rate vector, it needs to be truncated; conversely, if it's shorter, it needs to be padded with zeroes. The example above assumes that \(D_c\) is equal to \(D_r\), but note that, as previously stated, this isn't a constraint.
Now, a third matrix of DAU values, \(\mathbf{DAU_{D_r}}\), can be created by left-multiplying \(\mathbf{Z}\) by \(\mathbf{diag}(\mathbf{c})\):

\[
\mathbf{DAU_{D_r}} = \mathbf{diag}(\mathbf{c}) \, \mathbf{Z} =
\begin{bmatrix}
c_1 r_1 & c_1 r_2 & \cdots & c_1 r_{D_r} \\
0 & c_2 r_1 & \cdots & c_2 r_{D_r - 1} \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & c_{D_r} r_1
\end{bmatrix}
\]
This produces a square matrix of size \(D_r \times D_r\) (again, assuming \(D_c = D_r\)) that scales each cohort size by its corresponding daily retention rate, with Day One retention being 100%. Here, each column in the matrix represents a calendar day, and each row captures the DAU values of a cohort, offset according to the date of its onboarding. Summing each column provides the total DAU on that calendar day, across all cohorts.
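To make this construction concrete, below is a minimal NumPy sketch of the square \(\mathbf{DAU_{D_r}}\) matrix. The values of \(\mathbf{r}\) and \(\mathbf{c}\) are arbitrary toy assumptions, and the cohort vector is padded to length \(D_r\) per the note above.

import numpy as np

r = np.array( [ 1, 0.75, 0.5, 0.3, 0.2, 0.15, 0.12 ] ) ## retention rates, length D_r
c = np.array( [ 500, 600, 1000, 400, 350 ] ) ## cohort sizes, length D_c
D_r = len( r )

## pad ( or truncate ) the cohort size vector to length D_r
c_adj = np.pad( c, ( 0, max( 0, D_r - len( c ) ) ) )[ :D_r ]

## upper-triangular Toeplitz matrix with the retention curve along its diagonals
Z = np.zeros( ( D_r, D_r ) )
for i in range( D_r ):
    Z[ i, i: ] = r[ :D_r - i ]

## DAU matrix: rows are cohorts, columns are calendar days
DAU = np.diag( c_adj ) @ Z
total_DAU = DAU.sum( axis=0 ) ## total DAU per calendar day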
While this DAU matrix is useful data, and is itself a projection, it only captures DAU over the length of the retention timeline, \(D_r\), starting from when the first cohort was onboarded. What would be more useful is a forecast across the retention timeline \(D_r\) for each cohort; in other words, each cohort's DAU projected for the same number of days, regardless of when that cohort was onboarded. This is a banded cohort matrix, which provides a calendar view of per-cohort DAU.
This matrix has a shape of \(D_c \times (D_r + D_c - 1)\), where each row is that cohort's full \(D_r\)-length DAU projection, offset by a zero for each cohort that preceded it. In order to arrive at this, the banded retention rate matrix, \(\mathbf{Z}_\text{banded}\), stacks the retention curve \(D_c\) times, but pads each row \(i\) with \(i - 1\) zeroes on the left and \(D_c - i\) zeroes on the right, such that each row is of length \(D_r + D_c - 1\). To do this, we can define a shift-and-pad operator \(S^{(i)}\):

\[
\left( S^{(i)} \mathbf{r} \right)_j =
\begin{cases}
r_{j - i + 1} & \text{if } i \le j \le i + D_r - 1 \\
0 & \text{otherwise}
\end{cases}
\qquad
\mathbf{Z}_\text{banded} =
\begin{bmatrix}
S^{(1)} \mathbf{r} \\
S^{(2)} \mathbf{r} \\
\vdots \\
S^{(D_c)} \mathbf{r}
\end{bmatrix}
\]
Again, this results in a matrix, \(\mathbf{Z}_\text{banded}\), of shape \(D_c \times (D_r + D_c - 1)\), where each row \(i\) has \(i - 1\) zeroes padded to the left and \(D_c - i\) zeroes padded to the right so that every cohort's full \(D_r\)-length retention curve is represented.
In order to derive the banded DAU matrix, \(\mathbf{DAU}_\text{banded}\), the banded retention matrix, \(\mathbf{Z}_\text{banded}\), is left-multiplied by \(\mathbf{diag}(\mathbf{c})\), the diagonal matrix of cohort sizes. This works because \(\mathbf{Z}_\text{banded}\) has \(D_c\) rows:

\[
\mathbf{DAU}_\text{banded} = \mathbf{diag}(\mathbf{c}) \, \mathbf{Z}_\text{banded}
\]

Summing each column of \(\mathbf{DAU}_\text{banded}\), or, equivalently, computing the row vector \(\mathbf{c}^{\mathsf{T}} \mathbf{Z}_\text{banded}\), gives the total DAU on each calendar day.
Implementing this in Python is straightforward. The crux of the implementation is below (full code can be found here).
import numpy as np

## create the retention curve and cohort size vectors
r = np.array( [ 1, 0.75, 0.5, 0.3, 0.2, 0.15, 0.12 ] ) ## retention rates
c = np.array( [ 500, 600, 1000, 400, 350 ] ) ## cohort sizes
D_r = len( r )
D_c = len( c )
calendar_days = D_c + D_r - 1

## create the banded retention matrix, Z_banded, of shape D_c x ( D_c + D_r - 1 )
Z_banded = np.zeros( ( D_c, calendar_days ) )
for i in range( D_c ):
    start_idx = i
    end_idx = min( i + D_r, calendar_days )
    Z_banded[ i, start_idx:end_idx ] = r[ :end_idx - start_idx ]

## create the DAU_banded matrix and get the total DAU per calendar day
DAU_banded = c[ :, np.newaxis ] * Z_banded
total_DAU = DAU_banded.sum( axis=0 )
The retention and cohort size values used are arbitrary. Graphing the stacked cohorts produces the following chart:

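For reference, a stacked chart like this can be produced with a few lines of matplotlib. This sketch continues from the code above; the axis labels and legend are my own choices rather than anything prescribed by the method.

import matplotlib.pyplot as plt

## stack each cohort's row of DAU_banded across the calendar timeline
plt.stackplot( range( 1, calendar_days + 1 ), DAU_banded,
               labels=[ f"Cohort {i + 1}" for i in range( D_c ) ] )
plt.xlabel( "Calendar day" )
plt.ylabel( "DAU" )
plt.legend()
plt.show()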
It's straightforward to imagine how this method can be used to combine DAU schedules for different groups: a matrix could be constructed for each group (e.g., per-geography cohorts, with their separate cohort sizes and retention rates), with all of the matrices stacked vertically to provide a picture of total, global DAU.
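As a rough sketch of that idea, the snippet below computes each group's DAU schedule independently from its own cohort sizes and retention curve, pads the schedules to a common calendar length, and sums them. The two groups and all of their values are invented for illustration.

import numpy as np

def total_dau( c, r ):
    ## banded DAU schedule for one group: total DAU per calendar day
    D_c, D_r = len( c ), len( r )
    Z_banded = np.zeros( ( D_c, D_c + D_r - 1 ) )
    for i in range( D_c ):
        Z_banded[ i, i:i + D_r ] = r
    return ( c[ :, np.newaxis ] * Z_banded ).sum( axis=0 )

## hypothetical per-geography groups with separate cohort sizes and retention curves
dau_us = total_dau( np.array( [ 500, 600, 1000 ] ), np.array( [ 1, 0.7, 0.45, 0.3 ] ) )
dau_eu = total_dau( np.array( [ 300, 400 ] ), np.array( [ 1, 0.8, 0.55 ] ) )

## pad each schedule to a common calendar length and sum for total, global DAU
n = max( len( dau_us ), len( dau_eu ) )
global_dau = ( np.pad( dau_us, ( 0, n - len( dau_us ) ) )
               + np.pad( dau_eu, ( 0, n - len( dau_eu ) ) ) )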