Time series problems are ubiquitous, from forecasting weather and traffic patterns to understanding economic trends. Bayesian approaches start with an assumption about the data's patterns (prior probability), collect evidence (e.g., new time series data), and continuously update that assumption to form a posterior probability distribution. Traditional Bayesian approaches like Gaussian processes (GPs) and Structural Time Series are extensively used for modeling time series data, e.g., the commonly used Mauna Loa CO2 dataset. However, they often rely on domain experts to painstakingly select appropriate model components and may be computationally expensive. Alternatives such as neural networks lack interpretability, making it difficult to understand how they generate forecasts, and don't produce reliable confidence intervals.
To that end, we introduce AutoBNN, a new open-source package written in JAX. AutoBNN automates the discovery of interpretable time series forecasting models, provides high-quality uncertainty estimates, and scales effectively for use on large datasets. We describe how AutoBNN combines the interpretability of traditional probabilistic approaches with the scalability and flexibility of neural networks.
AutoBNN
AutoBNN is based on a line of research that over the past decade has yielded improved predictive accuracy by modeling time series using GPs with learned kernel structures. The kernel function of a GP encodes assumptions about the function being modeled, such as the presence of trends, periodicity or noise. With learned GP kernels, the kernel function is defined compositionally: it is either a base kernel (such as Linear, Quadratic, Periodic, Matérn or ExponentiatedQuadratic) or a composite that combines two or more kernel functions using operators such as Addition, Multiplication, or ChangePoint. This compositional kernel structure serves two related purposes. First, it is simple enough that a user who is an expert about their data, but not necessarily about GPs, can construct a reasonable prior for their time series. Second, techniques like Sequential Monte Carlo can be used for discrete searches over small structures and can output interpretable results.
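To make the compositional idea concrete, here is a small illustrative sketch (plain JAX functions, not the AutoBNN API) of how base kernels and operators compose into richer kernels; the specific hyperparameter choices are ours:

```python
import jax.numpy as jnp

# Base kernels as plain functions of two inputs (hyperparameters fixed for brevity).
def linear(x, y):
    return x * y

def periodic(x, y, period=1.0, length=1.0):
    return jnp.exp(-2.0 * jnp.sin(jnp.pi * jnp.abs(x - y) / period) ** 2 / length ** 2)

# Operators build new kernels out of existing ones.
def add(k1, k2):
    return lambda x, y: k1(x, y) + k2(x, y)

def multiply(k1, k2):
    return lambda x, y: k1(x, y) * k2(x, y)

# A linear trend plus a periodic seasonality whose amplitude grows with the trend:
# Add(Linear, Multiply(Linear, Periodic)).
kernel = add(linear, multiply(linear, periodic))
print(kernel(0.5, 1.5))
```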
AutoBNN improves upon these ideas, replacing the GP with Bayesian neural networks (BNNs) while retaining the compositional kernel structure. A BNN is a neural network with a probability distribution over weights rather than a fixed set of weights. This induces a distribution over outputs, capturing uncertainty in the predictions. BNNs bring the following advantages over GPs: First, training large GPs is computationally expensive, and traditional training algorithms scale as the cube of the number of data points in the time series. In contrast, for a fixed width, training a BNN will often be approximately linear in the number of data points. Second, BNNs lend themselves better to GPU and TPU hardware acceleration than GP training operations. Third, compositional BNNs can be easily combined with traditional deep BNNs, which have the ability to do feature discovery. One could imagine "hybrid" architectures, in which users specify a top-level structure of Add(Linear, Periodic, Deep), and the deep BNN is left to learn the contributions from potentially high-dimensional covariate information.
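Such a hybrid might be expressed along the following lines (a sketch using the Add operator from the code example at the end of this post; the OneLayerBNN class standing in for the deep component is our assumption):

```python
import autobnn as ab

# Hypothetical hybrid: an interpretable Linear + Periodic structure, plus a
# one-hidden-layer BNN ("Deep" component) to pick up residual structure.
hybrid = ab.operators.Add(
    bnns=(ab.kernels.LinearBNN(width=50),
          ab.kernels.PeriodicBNN(width=50),
          ab.kernels.OneLayerBNN(width=50)))
```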
How might one translate a GP with compositional kernels into a BNN then? A single layer neural network will typically converge to a GP as the number of neurons (or "width") goes to infinity. More recently, researchers have discovered a correspondence in the other direction: many popular GP kernels (such as Matern, ExponentiatedQuadratic, Polynomial or Periodic) can be obtained as infinite-width BNNs with appropriately chosen activation functions and weight distributions. Furthermore, these BNNs remain close to the corresponding GP even when the width is very much less than infinite. For example, the figures below show the difference in the covariance between pairs of observations, and regression results of the true GPs and their corresponding width-10 neural network versions.
Comparison of Gram matrices between true GP kernels (top row) and their width 10 neural network approximations (bottom row).
Comparison of regression results between true GP kernels (top row) and their width 10 neural network approximations (bottom row).
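To check this behavior numerically, the following self-contained JAX sketch (independent of AutoBNN) samples single-hidden-layer ReLU networks from a standard NNGP-style prior at two widths and compares their empirical output covariances:

```python
import jax
import jax.numpy as jnp

def sample_relu_nets(key, xs, width, n_samples=4096):
    """Sample outputs of random single-hidden-layer ReLU networks.

    Weights are drawn from the scaled Gaussian prior under which the
    infinite-width limit is a GP (the NNGP correspondence)."""
    k1, k2 = jax.random.split(key)
    # Input-to-hidden and hidden-to-output weights, with 1/sqrt(fan_in) scaling.
    w1 = jax.random.normal(k1, (n_samples, 1, width))
    w2 = jax.random.normal(k2, (n_samples, width, 1)) / jnp.sqrt(width)
    hidden = jax.nn.relu(xs[None, :, None] * w1)   # (n_samples, len(xs), width)
    return (hidden @ w2)[..., 0]                   # (n_samples, len(xs))

key = jax.random.PRNGKey(0)
xs = jnp.linspace(-2.0, 2.0, 5)
for width in (10, 1000):
    outs = sample_relu_nets(key, xs, width)
    cov = jnp.cov(outs, rowvar=False)  # empirical covariance between input pairs
    print(width, jnp.round(cov, 2))    # width 10 is already close to width 1000
```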
Finally, the translation is completed with BNN analogues of the Addition and Multiplication operators over GPs, and input warping to produce periodic kernels. BNN addition is straightforwardly given by adding the outputs of the component BNNs. BNN multiplication is achieved by multiplying the activations of the hidden layers of the BNNs and then applying a shared dense layer. We are therefore restricted to only multiplying BNNs with the same hidden width.
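The multiplication construction can be sketched schematically in flax.linen (our illustration of the idea, not AutoBNN's actual implementation):

```python
import flax.linen as nn
import jax
import jax.numpy as jnp

class ProductBNN(nn.Module):
    """Sketch of BNN multiplication: multiply the hidden activations of
    two component networks, then apply a shared dense readout layer."""
    width: int = 10

    @nn.compact
    def __call__(self, x):
        # Hidden layers of the two component BNNs (equal widths required).
        h1 = nn.relu(nn.Dense(self.width)(x))
        h2 = jnp.sin(nn.Dense(self.width)(x))  # e.g., a periodic-style component
        # Elementwise product of activations, followed by a shared dense layer.
        return nn.Dense(1)(h1 * h2)

# In a BNN the parameters would carry a prior distribution; here we only
# initialize a single point value to show the architecture wiring.
params = ProductBNN().init(jax.random.PRNGKey(0), jnp.ones((1, 1)))
```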
Using AutoBNN
The AutoBNN package is available within TensorFlow Probability. It is implemented in JAX and uses the flax.linen neural network library. It implements all of the base kernels and operators discussed so far (Linear, Quadratic, Matern, ExponentiatedQuadratic, Periodic, Addition, Multiplication) plus one new kernel and three new operators:
- a OneLayer kernel, a single hidden layer ReLU BNN,
- a ChangePoint operator that allows smoothly switching between two kernels (sketched below),
- a LearnableChangePoint operator, which is the same as ChangePoint except that the position and slope are given prior distributions and can be learned from the data, and
- a WeightedSum operator.
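As a rough illustration of the smooth switching idea, here is a changepoint construction in kernel form, following the standard recipe from the GP structure discovery literature (our sketch, not AutoBNN's BNN implementation):

```python
import jax

def change_point(k1, k2, location=0.0, slope=1.0):
    """Smoothly switch from kernel k1 (left of `location`) to k2 (right).

    s(x) ramps from 1 down to 0 around `location`, so the combined kernel
    behaves like k1 where s is near 1 and like k2 where s is near 0."""
    s = lambda x: jax.nn.sigmoid(-slope * (x - location))
    return lambda x, y: (s(x) * s(y) * k1(x, y)
                         + (1.0 - s(x)) * (1.0 - s(y)) * k2(x, y))
```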
WeightedSum combines two or more BNNs with learnable mixing weights, where the learnable weights follow a Dirichlet prior. By default, a flat Dirichlet distribution with concentration 1.0 is used.
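For intuition, a draw of mixing weights from this flat Dirichlet prior can be sampled directly in JAX:

```python
import jax
import jax.numpy as jnp

# One draw of mixing weights over six component BNNs from a flat
# Dirichlet prior (concentration 1.0): non-negative and summing to one.
weights = jax.random.dirichlet(jax.random.PRNGKey(0), alpha=jnp.ones(6))
print(weights, weights.sum())
```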
WeightedSums allow a "soft" version of structure discovery, i.e., training a linear combination of many possible models at once. In contrast to structure discovery with discrete structures, such as in AutoGP, this allows us to use standard gradient methods to learn structures, rather than using expensive discrete optimization. Instead of evaluating potential combinatorial structures in series, WeightedSum allows us to evaluate them in parallel.
To easily enable exploration, AutoBNN defines a number of model structures that contain either top-level or internal WeightedSums. The names of these models can be used as the first parameter in any of the estimator constructors, and include things like sum_of_stumps (the WeightedSum over all the base kernels) and sum_of_shallow (which adds all possible combinations of base kernels with all operators).
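For example, assuming the same constructor signature as the code example at the end of this post, fitting the sum_of_stumps model by name might look like:

```python
import jax

import autobnn as ab

# Pass the model structure by name as the first estimator argument.
estimator = ab.estimators.AutoBnnMapEstimator(
    'sum_of_stumps', 'normal_likelihood_logistic_noise',
    jax.random.PRNGKey(0), periods=[12])
```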
Illustration of the sum_of_stumps model. The bars in the top row show the amount by which each base kernel contributes, and the bottom row shows the function represented by the base kernel. The resulting weighted sum is shown on the right.
The figure below demonstrates the technique of structure discovery on the N374 series (a time series of yearly financial data starting from 1949) from the M3 dataset. The six base structures were the ExponentiatedQuadratic (which is the same as the Radial Basis Function kernel, or RBF for short), Matern, Linear, Quadratic, OneLayer and Periodic kernels. The figure shows the MAP estimates of their weights over an ensemble of 32 particles. All of the high likelihood particles gave a large weight to the Periodic component, low weights to Linear, Quadratic and OneLayer, and a large weight to either RBF or Matern.
Parallel coordinates plot of the MAP estimates of the base kernel weights over 32 particles. The sum_of_stumps model was trained on the N374 series from the M3 dataset (inset in blue). Darker lines correspond to particles with higher likelihoods.
By using WeightedSums as the inputs to other operators, it is possible to express rich combinatorial structures while keeping models compact and the number of learnable weights small. For example, we include the sum_of_products model (illustrated in the figure below), which first creates a pairwise product of two WeightedSums, and then a sum of the two products. By setting some of the weights to zero, we can create many different discrete structures. The total number of possible structures in this model is 2^16 = 65,536, since there are 16 base kernels that can be turned on or off. All these structures are explored implicitly by training just this one model.
Illustration of the "sum_of_products" model. Each of the four WeightedSums has the same structure as the "sum_of_stumps" model.
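In code, such a structure could plausibly be assembled along these lines (a sketch only; we assume the WeightedSum and Multiply operators accept `bnns` tuples the way Add does in the example at the end of this post):

```python
import autobnn as ab

def stumps(width=50):
    # A WeightedSum over a few base kernels, as in sum_of_stumps.
    return ab.operators.WeightedSum(
        bnns=(ab.kernels.LinearBNN(width=width),
              ab.kernels.QuadraticBNN(width=width),
              ab.kernels.PeriodicBNN(width=width),
              ab.kernels.MaternBNN(width=width)))

# Sum of two pairwise products of WeightedSums, as in sum_of_products.
model = ab.operators.Add(
    bnns=(ab.operators.Multiply(bnns=(stumps(), stumps())),
          ab.operators.Multiply(bnns=(stumps(), stumps()))))
```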
We have found, however, that certain combinations of kernels (e.g., the product of Periodic and either the Matern or ExponentiatedQuadratic) lead to overfitting on many datasets. To prevent this, we have defined model classes like sum_of_safe_shallow that exclude such products when performing structure discovery with WeightedSums.
For training, AutoBNN provides AutoBnnMapEstimator and AutoBnnMCMCEstimator to perform MAP and MCMC inference, respectively. Either estimator can be combined with any of the six likelihood functions, including four based on normal distributions with different noise characteristics for continuous data and two based on the negative binomial distribution for count data.
Result from running AutoBNN on the Mauna Loa CO2 dataset in our example colab. The model captures the trend and seasonal component in the data. Extrapolating into the future, the mean prediction slightly underestimates the actual trend, while the 95% confidence interval gradually widens.
To fit a model like the one in the figure above, all it takes is roughly ten lines of code, using the scikit-learn-inspired estimator interface:
```python
import jax

import autobnn as ab

model = ab.operators.Add(
    bnns=(ab.kernels.PeriodicBNN(width=50),
          ab.kernels.LinearBNN(width=50),
          ab.kernels.MaternBNN(width=50)))

estimator = ab.estimators.AutoBnnMapEstimator(
    model, 'normal_likelihood_logistic_noise', jax.random.PRNGKey(42),
    periods=[12])

estimator.fit(my_training_data_xs, my_training_data_ys)
low, mid, high = estimator.predict_quantiles(my_training_data_xs)
```
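Swapping in full MCMC inference should only require changing the estimator class (assuming AutoBnnMCMCEstimator shares the same constructor signature):

```python
# Posterior sampling instead of a MAP point estimate; the predicted
# quantiles then reflect posterior uncertainty over the BNN weights.
estimator = ab.estimators.AutoBnnMCMCEstimator(
    model, 'normal_likelihood_logistic_noise', jax.random.PRNGKey(42),
    periods=[12])
estimator.fit(my_training_data_xs, my_training_data_ys)
low, mid, high = estimator.predict_quantiles(my_training_data_xs)
```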
Conclusion
AutoBNN provides a powerful and flexible framework for building sophisticated time series prediction models. By combining the strengths of BNNs and GPs with compositional kernels, AutoBNN opens a world of possibilities for understanding and forecasting complex data. We invite the community to try the colab, and leverage this library to innovate and solve real-world challenges.
Acknowledgements
AutoBNN was written by Colin Carroll, Thomas Colthurst, Urs Köster and Srinivas Vasudevan. We would like to thank Kevin Murphy, Brian Patton and Feras Saad for their advice and feedback.