Notes

Short pieces on forecasting methods, production systems, and lessons from running models at scale. Some are original; others first appeared in Foresight and are reposted here with permission. For peer-reviewed work, see Research.

Three cases where compositional modeling beats bottom-up aggregation

If you forecast a total and its parts separately, the parts won't add up to the total. This is well known. Most production systems ignore it anyway. They forecast each component independently, reconcile after the fact, and move on.

That is often fine. Sometimes it is not. Here are three situations where I have found it worth building coherence into the model directly rather than patching it downstream.

1. When the total is more predictable than the parts

Revenue mix by currency is a clean example. Total revenue in USD has strong seasonal patterns and relatively stable trends. The share attributable to EUR, GBP, or AUD individually is noisier. Exchange rate movements and regional booking behavior are hard to forecast well. If you forecast each currency's revenue independently, you get reasonable point estimates that sum to something different from your total revenue forecast. Reconciliation fixes the math but introduces artifacts. The adjusted shares inherit noise from both the component and total forecasts.

A compositional model (Dirichlet, logistic-normal, or something in that family) forecasts the shares directly on the simplex, then multiplies by a separate total forecast. The shares sum to one by construction. No reconciliation needed. The gain is that downstream consumers (treasury, FP&A) get numbers that are internally consistent without a post-processing step that nobody fully understands.
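A minimal sketch of this two-piece construction, with synthetic numbers and an inverse additive log-ratio map standing in for a fitted logistic-normal model (the currency names, means, and total are all illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical posterior draws of log-ratio coordinates for three currency
# shares (EUR, GBP, AUD) against a residual baseline. In a real model these
# come from a fitted logistic-normal or Dirichlet time series model.
mu = np.array([-0.5, -1.2, -2.0])                       # mean additive log-ratios
draws = rng.multivariate_normal(mu, 0.05 * np.eye(3), size=4000)

# Inverse additive log-ratio: map each draw back onto the simplex.
expd = np.exp(np.column_stack([draws, np.zeros(len(draws))]))
shares = expd / expd.sum(axis=1, keepdims=True)         # rows sum to 1 by construction

# Multiply by a separately forecast total: components cohere automatically.
total_forecast = 1_250_000.0                            # total revenue forecast (USD)
component_forecasts = shares.mean(axis=0) * total_forecast
```

Because every draw lives on the simplex, the component forecasts sum to the total exactly, with no reconciliation pass.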

2. When substitution effects dominate

Consider a platform where customers choose among product categories and the total number of transactions is roughly fixed in the short run. A spike in Category A usually means a dip in Category B. Independent models do not capture this. They can both go up simultaneously, producing a total that overshoots reality.

Compositional models handle substitution naturally because the shares are jointly modeled. If one share increases, the others must decrease. This is not a side effect. It is the point. The constraint encodes the economic reality that these categories compete for the same pool of transactions.

Bottom-up aggregation can approximate this with a reconciliation step, but the reconciliation is doing the work that the compositional model does by construction. And reconciliation methods (MinT, OLS, etc.) optimize a statistical criterion, not the structural constraint. They get you close, but "close" means your shares sum to 1.003 or 0.997, and someone in finance will ask why.
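For contrast, here is what the reconciliation step is doing: a minimal OLS reconciliation over a one-level hierarchy (total = A + B) with made-up base forecasts. MinT generalizes this projection by weighting with an estimated error covariance.

```python
import numpy as np

# Base (incoherent) point forecasts: [total, category A, category B].
y_hat = np.array([1000.0, 620.0, 455.0])   # components overshoot the total by 75

# Summing matrix mapping bottom series [A, B] to [total, A, B].
S = np.array([[1.0, 1.0],
              [1.0, 0.0],
              [0.0, 1.0]])

# OLS reconciliation: least-squares projection onto the coherent subspace.
beta = np.linalg.lstsq(S, y_hat, rcond=None)[0]   # reconciled bottom forecasts
y_tilde = S @ beta                                # coherent forecasts
```

The reconciled forecasts cohere by construction of the projection, but the adjustment is purely statistical; nothing in it knows that A and B compete for the same transactions.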

3. When you need calibrated prediction intervals for shares

This is where the gap is widest. Bottom-up forecasting with reconciliation gives you point estimates that (roughly) cohere. Getting coherent prediction intervals is harder. You need the joint distribution of all components, including their correlations, to produce intervals for the shares that respect the simplex constraint.

A Bayesian compositional model gives you this directly. The posterior predictive distribution lives on the simplex, so any credible interval you compute for a share is automatically bounded between 0 and 1, and the intervals for all shares are jointly consistent. Try getting that from independent ARIMA models with ad hoc reconciliation.
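A toy illustration, with a static Dirichlet standing in for the posterior predictive of a fitted compositional model (the concentration values are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical posterior-predictive draws of shares for three components.
draws = rng.dirichlet(alpha=[40.0, 25.0, 10.0], size=8000)

# Marginal 90% credible intervals per share. Each bound is automatically
# inside (0, 1), and every joint draw still sums to one.
lo, hi = np.quantile(draws, [0.05, 0.95], axis=0)
```

No clipping, no renormalizing: the bounds fall out of working on the simplex in the first place.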

When to skip it

If your components are weakly correlated, you do not care about the shares (only the levels), and nobody downstream needs the parts to add up exactly, independent forecasting with reconciliation is simpler and probably sufficient. The setup cost of a compositional model is not zero. You need priors on the simplex, a sampler that handles the constraint, and stakeholders who understand what "Dirichlet" means or at least trust that it works.

The decision rule I use: if someone will divide your forecasts to compute a share and then make a decision based on that share, model the shares directly.

Two-Part Forecasting for Time-Shifted Metrics
arXiv preprint · Code & Stan models · Supplementary material

The problem

Many sectors face a version of the same forecasting challenge: the date something is recorded does not match the date it happens. A booking is made on January 5 for a trip starting February 10. A purchase order is placed on Monday for delivery on Friday. A trade is executed today for settlement in two days.

Traditional forecasting approaches operate on a single time axis and cannot describe how a metric recorded on one axis materializes on the other. Hierarchical methods reconcile forecasts across organizational levels but do not handle the two-axis structure. Temporal aggregation scales forecasts up or down in granularity but does not distribute one metric across another time dimension.

We introduce a two-part methodology that treats forecasting as a time-shift operator. Part 1 projects total demand on the recording axis. Part 2 translates those forecasts to the consumption axis using a compositional time series model.

Methodology

Part 1: Total bookings. A univariate time series model (we used Prophet, but any reasonable method works) forecasts total daily bookings on the booking-date axis. This gives you projected volume by day, ignoring when those bookings will actually be consumed.

Part 2: Lead-time allocation. B-DARMA models the proportions of bookings falling into each lead-time bucket (0 months out, 1 month out, ..., 12 months out) as a compositional time series. The model captures how the mix of last-minute versus long-term bookings evolves over time, with monthly seasonality via Fourier terms and a linear trend for shifting booking behavior.

Combining the parts. Multiply each month's total bookings by the corresponding lead-time proportions, shift forward by the appropriate offset, and sum across all booking months that align with a given trip month. The result is a forecast on the trip-date axis derived from the booking-date axis.
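The combination step above can be sketched with toy shapes and a flat lead-time mix (the bucket count, totals, and proportions are all illustrative, not the paper's fitted values):

```python
import numpy as np

n_months, n_leads = 6, 4                          # 4 lead-time buckets: 0..3 months out
totals = np.array([100, 120, 90, 110, 130, 105], dtype=float)  # Part 1: booking-axis totals
props = np.full((n_months, n_leads), 0.25)        # Part 2: lead-time mix; each row sums to 1

# Shift each booking month's volume forward by its lead time and sum by trip month.
trip = np.zeros(n_months + n_leads - 1)
for m in range(n_months):
    for k in range(n_leads):
        trip[m + k] += totals[m] * props[m, k]    # bookings in month m for trips in m + k
```

Because each row of proportions sums to one, total volume is conserved when moving from the booking axis to the trip axis.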

Full mathematical derivations, including the additive log-ratio transformations, are in the supplementary material. Technical details on the B-DARMA specification are in Katz, Brusch, and Weiss (2024).
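For reference, the additive log-ratio transform mentioned above maps a J-part composition y_t on the simplex to J-1 unconstrained coordinates (taking the last part as the reference, one common convention), with the inverse mapping back:

```latex
\operatorname{alr}(y_t) = \left( \log\frac{y_{t,1}}{y_{t,J}}, \ldots, \log\frac{y_{t,J-1}}{y_{t,J}} \right) = z_t,
\qquad
y_{t,j} = \frac{\exp(z_{t,j})}{1 + \sum_{k=1}^{J-1} \exp(z_{t,k})}
```

where the reference share is recovered as y_{t,J} = 1 / (1 + Σ exp(z_{t,k})). Modeling happens in z-space; forecasts are mapped back to the simplex.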

Data

We used two anonymized Airbnb datasets spanning January 2014 to December 2019 (pre-COVID). City A is a large metropolitan market with strong seasonal variability. City B is a midsized leisure destination with more moderate seasonality. Each dataset contains daily booking counts, trip dates, and lead times in months. We created 13 monthly lead-time buckets (0 to 12). Training period was 2014 through 2018; test period was all of 2019.

Results

We benchmarked against a bottom-up Prophet approach: separate univariate Prophet forecasts for each lead-time bucket, summed to get totals.

City  Method     Booking-Date MAE  Booking-Date MAPE  Lead-Time Mean L1
A     Two-Part   5,083             4.8%               0.0229
A     Bottom-Up  5,336             5.07%              0.0389
B     Two-Part   1,406             3.07%              0.0300
B     Bottom-Up  1,455             3.15%              0.0499

The two-part approach outperformed bottom-up Prophet on both axes in both markets. The compositional framework captures cross-bucket correlations that independent univariate forecasts miss. The improvement is most visible in the lead-time distributions: normalized L1 distance drops by roughly 40% in both cities.
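One way to compute a normalized L1 distance between an actual and a forecast lead-time distribution; the paper's exact normalization may differ, and this sketch uses the total-variation convention with made-up five-bucket vectors:

```python
import numpy as np

# Illustrative lead-time distributions (each sums to 1).
actual   = np.array([0.30, 0.25, 0.20, 0.15, 0.10])
forecast = np.array([0.28, 0.27, 0.19, 0.16, 0.10])

# Total-variation style normalization: 0 = identical, 1 = fully disjoint.
l1 = 0.5 * np.abs(actual - forecast).sum()
```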

Why this matters in practice

The modularity is the real advantage. Adjusting total forecasts in response to a macro shock or event does not require refitting the lead-time model. Scenario analysis becomes fast. And for short-horizon forecasting, where some future bookings are already known, existing reservations serve as a baseline while the two-part model projects additional bookings that might still materialize.

B-DARMA can also incorporate exogenous covariates on the trip-date side. A Super Bowl indicator or Easter flag can shift proportions in the relevant lead-time bucket, linking trip-date features to booking-date allocations under one framework.

Limitations

Splitting the process into two parts may miss interactions between total demand and lead-time behavior that a unified model could capture. B-DARMA assumes strictly positive proportions, so sparse or zero-valued lead-time buckets require care. And if lead times are static or not important to your decision, the added complexity is not justified.

References

Aitchison, J. (1986). The Statistical Analysis of Compositional Data. Chapman & Hall.
Armstrong, J.S. (2001). Combining Forecasts. In: Principles of Forecasting. Springer.
Hyndman, R.J., Ahmed, R.A., Athanasopoulos, G., & Shang, H.L. (2011). Optimal combination forecasts for hierarchical time series. Computational Statistics & Data Analysis, 55(9), 2579-2589.
Katz, H., Brusch, K.T., & Weiss, R.E. (2024). A Bayesian Dirichlet auto-regressive moving average model for forecasting lead times. International Journal of Forecasting, 40(4), 1556-1567.
Silvestrini, A. & Veredas, D. (2008). Temporal aggregation of univariate and multivariate time series models: A survey. Journal of Economic Surveys, 22(3), 458-497.
Taylor, S.J. & Letham, B. (2018). Forecasting at scale. The American Statistician, 72(1), 37-45.
Zheng, T. & Chen, R. (2017). Dirichlet ARMA models for compositional time series. Journal of Multivariate Analysis, 158, 31-46.