AR-Net: A simple autoregressive neural network for time series

December 02, 2019

What the research is:

A new framework that combines the best of traditional statistical models and neural network models for time series modeling, a task central to many important applications, such as forecasting and anomaly detection. Classical models such as autoregression (AR) exploit the inherent characteristics of a time series, which leads to a more concise model. This is possible because the model makes strong assumptions about the data, such as the true order of the AR process. These models, however, do not scale well to large volumes of training data, particularly if there are long-range dependencies or complex interactions.

To overcome the scalability challenges, sequence-to-sequence models have become popular in natural language processing. RNN-based methods, in particular, allow for a more expressive model without requiring elaborate features. While these models scale well to applications with rich data, they can be overly complex for typical time series data, resulting in a lack of interpretability. We needed a scalable, extensible, and interpretable model to bridge the statistical and deep learning-based approaches. Our proposed framework models the proven, classic AR method with a feedforward neural network. The feedforward model is not only as interpretable as AR models but also scalable and easier to use.

How it works:

Our basic building block, termed AR-Net, has two distinct advantages over its traditional counterpart:

AR-Net scales well to large orders, making it possible to estimate long-range dependencies (important in high-resolution monitoring applications, such as those in the data center domain).
AR-Net automatically selects and estimates the important coefficients of a sparse AR process, eliminating the need to know the true order of the AR process.

Consider a time series y_1, ..., y_t, expressed as an AR process. To predict the value y_t at time step t, each of the p past values y_{t-1}, ..., y_{t-p} is multiplied by a learned weight w_i (called an AR coefficient), and the weighted values are summed.
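The weighted sum described above can be sketched in a few lines of plain Python. This is only an illustration: the function name `ar_predict`, the optional `intercept` argument, and the example coefficients are hypothetical and do not come from the AR-Net code.

```python
def ar_predict(history, weights, intercept=0.0):
    """Predict the next value from the p most recent values.

    weights[0] multiplies the most recent value (lag 1),
    weights[1] the value before it (lag 2), and so on.
    """
    p = len(weights)
    if len(history) < p:
        raise ValueError("need at least p past values")
    # Reverse the last p values so that index 0 is lag 1.
    lags = history[-1:-p - 1:-1]
    return intercept + sum(w * y for w, y in zip(weights, lags))


history = [1.0, 2.0, 3.0, 4.0]
weights = [0.5, 0.25]  # hypothetical AR(2) coefficients
prediction = ar_predict(history, weights)  # 0.5 * 4.0 + 0.25 * 3.0
print(prediction)
```

The order p is simply the number of weights, so a larger order means a longer window of past values feeding each prediction.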

For a large p (called the order), the traditional approach can become impractically slow to fit. However, a large order is required for monitoring high-resolution data sampled at the millisecond or second level. To overcome this scalability challenge, we train a neural network with stochastic gradient descent to learn the AR coefficients. If we know the true order of the process, AR-Net learns weights nearly identical to those of classic AR implementations and is equally good at predicting the next value of the time series.
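To make the idea of learning AR coefficients with stochastic gradient descent concrete, here is a minimal sketch in plain Python. It is a dependency-free stand-in for a single linear layer, not the authors' PyTorch implementation. The helper names, the learning rate, the number of epochs, and the simulated AR(2) coefficients are all assumptions chosen for illustration.

```python
import random


def simulate_ar(coeffs, n, noise_std=1.0, seed=0):
    """Simulate a zero-mean AR process; coeffs[0] is the lag-1 coefficient."""
    rng = random.Random(seed)
    p = len(coeffs)
    y = [0.0] * p  # zero warm-up values, dropped on return
    for _ in range(n):
        value = sum(w * y[-i - 1] for i, w in enumerate(coeffs))
        y.append(value + rng.gauss(0.0, noise_std))
    return y[p:]


def fit_ar_sgd(series, order, lr=0.001, epochs=20, seed=0):
    """Fit AR weights by per-sample SGD on the squared one-step error."""
    rng = random.Random(seed)
    weights = [0.0] * order
    targets = list(range(order, len(series)))
    for _ in range(epochs):
        rng.shuffle(targets)
        for t in targets:
            lags = [series[t - i] for i in range(1, order + 1)]
            error = sum(w * x for w, x in zip(weights, lags)) - series[t]
            # Gradient of 0.5 * error**2 with respect to w_i is error * lag_i.
            weights = [w - lr * error * x for w, x in zip(weights, lags)]
    return weights


true_coeffs = [0.5, -0.3]  # hypothetical AR(2) process
series = simulate_ar(true_coeffs, n=2000)
learned = fit_ar_sgd(series, order=2)
print("true:   ", true_coeffs)
print("learned:", [round(w, 3) for w in learned])
```

On simulated data like this, the learned weights should land close to the generating coefficients, up to sampling and SGD noise. The sketch omits a bias term because the simulated process has zero mean.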

Left: AR-equivalent neural network without hidden layers (simplest form of AR-Net). Right: AR-inspired neural network with n hidden layers (general AR-Net).
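The two architectures in the figure can be sketched as a forward pass in plain Python. With no hidden layers, the network is a single linear unit and reproduces the AR weighted sum exactly. The ReLU activation, the layer sizes, and the random weights used for the hidden-layer variant are assumptions for illustration; the post does not specify them.

```python
import random


def relu(values):
    return [max(0.0, v) for v in values]


def dense(inputs, weights, biases):
    """Fully connected layer; weights[j] holds the input weights of unit j."""
    return [b + sum(w * x for w, x in zip(row, inputs))
            for row, b in zip(weights, biases)]


def ar_net_forward(lags, layers):
    """Run lags through (weights, biases) layers; ReLU between, linear output."""
    activations = lags
    for weights, biases in layers[:-1]:
        activations = relu(dense(activations, weights, biases))
    out_weights, out_biases = layers[-1]
    return dense(activations, out_weights, out_biases)[0]


lags = [4.0, 3.0]  # lag 1, lag 2

# Simplest form: no hidden layers, so the output weights are the AR coefficients.
simple_layers = [([[0.5, 0.25]], [0.0])]
simple_prediction = ar_net_forward(lags, simple_layers)  # 0.5 * 4.0 + 0.25 * 3.0

# General form: one hidden layer with 3 units and untrained random weights.
rng = random.Random(0)
hidden = ([[rng.uniform(-1, 1) for _ in lags] for _ in range(3)], [0.0] * 3)
output = ([[rng.uniform(-1, 1) for _ in range(3)]], [0.0])
deep_prediction = ar_net_forward(lags, [hidden, output])
print(simple_prediction, deep_prediction)
```

In the simplest form each weight maps directly to one lag, which is what keeps the model as interpretable as classic AR. Once hidden layers are added, that one-to-one reading of the weights is lost.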

If the order is unknown, AR-Net automatically learns the relevant weights, even if the underlying data is generated by a noisy and extremely sparse AR process. We achieve this by applying a small regularization factor to the learned weights. In such a sparse setting, AR-Net clearly outperforms classic AR.

AR-Net effectively learns the sparse weights, setting the irrelevant weights to zero. Classic AR overestimates the irrelevant weights. Both models were fitted on data generated by a noisy, sparse AR-3 process (only the coefficients at lags 1, 3, and 10 are non-zero).
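The sparse setting can be imitated with a small experiment in plain Python. The post does not spell out the regularizer here, so this sketch swaps in an L1 penalty applied through a soft-thresholding step after each SGD update. It should be read as an assumption, not as AR-Net's actual regularization. The coefficient values, the fitted order of 20, and the hyperparameters are likewise hypothetical; only the non-zero lags (1, 3, and 10) follow the figure.

```python
import random


def simulate_sparse_ar(coeffs_by_lag, n, noise_std=1.0, seed=0):
    """Simulate an AR process from a {lag: coefficient} mapping."""
    rng = random.Random(seed)
    max_lag = max(coeffs_by_lag)
    y = [0.0] * max_lag  # zero warm-up values, dropped on return
    for _ in range(n):
        value = sum(w * y[-lag] for lag, w in coeffs_by_lag.items())
        y.append(value + rng.gauss(0.0, noise_std))
    return y[max_lag:]


def soft_threshold(w, amount):
    """Shrink w toward zero by amount, clipping at zero."""
    if w > amount:
        return w - amount
    if w < -amount:
        return w + amount
    return 0.0


def fit_sparse_ar(series, order, lam=0.02, lr=0.0005, epochs=15, seed=0):
    """Per-sample SGD on squared error plus an L1 penalty (assumed regularizer)."""
    rng = random.Random(seed)
    weights = [0.0] * order
    targets = list(range(order, len(series)))
    for _ in range(epochs):
        rng.shuffle(targets)
        for t in targets:
            lags = [series[t - i] for i in range(1, order + 1)]
            error = sum(w * x for w, x in zip(weights, lags)) - series[t]
            # Gradient step on the squared error, then the L1 shrinkage step.
            weights = [soft_threshold(w - lr * error * x, lr * lam)
                       for w, x in zip(weights, lags)]
    return weights


true_coeffs = {1: 0.3, 3: 0.3, 10: -0.3}  # hypothetical values at lags 1, 3, 10
series = simulate_sparse_ar(true_coeffs, n=4000)
learned = fit_sparse_ar(series, order=20)  # fitted order deliberately too large
for lag, w in enumerate(learned, start=1):
    print(f"lag {lag:2d}: {w:+.3f}")
```

The fitted order is larger than needed on purpose, mirroring the case where the true order is unknown. The weights at lags 1, 3, and 10 should stand out, while the others stay near zero. With plain per-sample SGD they will generally be small rather than exactly zero.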

Why it matters:

Our work demonstrates that modeling time series with neural networks can be just as interpretable as doing so using classical methods. Furthermore, we make it computationally tractable and simple for the practitioner to fit a sparse AR model of a high order. This makes it possible to model temporal data without having to determine the true order of the underlying AR process, allowing the model to automatically learn accurate long-range dependencies without overfitting.

Computational time to fit a classic AR implementation (statsmodels in Python) and AR-Net (using PyTorch in Python).

We call our model AR-Net because it can seamlessly be expanded to include any number of hidden layers. Adding layers can improve the predictive power of the model, but at the expense of interpretability. Our goal here is to show that even the simplest form of AR-Net is a strong alternative to classic AR implementations, particularly when dealing with sparse or high-order AR processes. This work paves the way for creating a deep learning model that semi-explicitly incorporates time series dynamics, such as autoregression, trend shifts, and seasonality. Building on existing open source tools such as Prophet and PyTorch will help make this feasible. We are excited about how our work may help empower time series practitioners in their daily work.