ARIMA vs SARIMA: Forecasting Time Series Data

published on 05 January 2024

Forecasting time series data is crucial yet challenging. We can all agree that having accurate forecasts enables better decision making.

This article explores two of the most popular statistical methods - ARIMA and SARIMA models - equipping you with the knowledge to model and forecast a wide range of time series effectively.

You will discover the key components and structural differences between ARIMA and SARIMA, learn practical strategies for parameter selection and model building, see a case study comparing their predictive accuracy, and find best practices to advance your own time series analysis.

Introduction to Forecasting Time Series Data

Time series analysis involves studying data collected over time to uncover meaningful statistics and patterns. It has applications across many fields, from economics and finance to operations and scientific modeling.

The sections below introduce the core concepts of time series forecasting and the two models at the heart of this comparison.

Exploring Time Series Analysis

Time series data is information collected over uniform time intervals, such as daily, weekly or monthly. Analysis of time series data can help identify trends, cycles, and patterns over time. Common applications include sales forecasting, budget planning, inventory management and predictive maintenance.

Fundamental Concepts in Time Series Forecasting

Several important ideas in time series analysis include:

  • Stationarity - The statistical properties of the series, such as the mean and variance, are constant over time. Non-stationary data typically needs to be differenced (or otherwise transformed) before modeling.
  • Seasonality - A seasonal pattern exists when the series is influenced by recurring calendar effects such as the time of year, the month, or the day of the week.
  • Autocorrelation - Values in the time series are correlated with values from previous time steps.
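
To make these ideas concrete, here is a minimal R sketch that inspects trend, seasonality, and autocorrelation and runs a stationarity test. The built-in AirPassengers dataset is used purely for illustration:

# Inspect trend, seasonality, and autocorrelation on a sample series
library(tseries)   # provides adf.test

plot(decompose(AirPassengers))   # split into trend, seasonal, and remainder
acf(AirPassengers)               # autocorrelation at successive lags
adf.test(AirPassengers)          # Augmented Dickey-Fuller stationarity test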

The ARIMA Model for Time Series Forecasting

An ARIMA model is a class of statistical models for analyzing and forecasting time series data.

It explicitly accounts for trend, cyclicality, and autocorrelation in the series. The model consists of three components:

  • AR (Auto-Regressive) - Regression on the lagged values of the time series
  • I (Integrated) - Usage of differencing to make the time series stationary
  • MA (Moving Average) - Regression on the lagged forecast errors
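
As a minimal sketch of how these components map onto an R model call (the built-in Nile river flow series and the orders below are chosen purely for illustration, not tuned):

# Fit an ARIMA(p, d, q) model: p = AR order, d = differencing, q = MA order
library(forecast)

fit <- Arima(Nile, order = c(1, 1, 1))   # AR(1), one difference, MA(1)
summary(fit)                             # coefficients and fit statistics
plot(forecast(fit, h = 10))              # forecast the next 10 observations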

SARIMA Time Series Forecasting Overview

SARIMA is an extension of ARIMA that explicitly supports seasonality. The model adds three additional components to ARIMA:

  • Seasonal AR term
  • Seasonal Difference term
  • Seasonal MA term

It provides greater flexibility and accuracy for time series with inherent seasonal patterns.

Is ARIMA good for time series forecasting?

The ARIMA (AutoRegressive Integrated Moving Average) model is a very popular statistical method for analyzing and forecasting time series data. Here are some key reasons why ARIMA can be an excellent choice:

  • Specifically designed for time series data: ARIMA directly models time series structure such as trends, cycles, and autocorrelated residuals to make forecasts. This makes it more suitable for time series than more general machine learning models.
  • Handles non-stationarity well: Time series data often needs to be differenced and made stationary before modeling. ARIMA inherently handles non-stationarity through its "integrated" component.
  • Flexible yet simple: ARIMA has only three parameters - p, d, and q. This makes model selection and tuning intuitive, and different combinations of these orders can still capture a wide range of time series dynamics.
  • Performs well on linear data: ARIMA shines when the time series demonstrates a predominantly linear behavior over time. The model is able to capture the correlations and patterns.
  • Widely used with excellent resources: Being applied for decades, ARIMA is a mature, trusted method. There exist ample tutorials and support resources for practical implementation.

So in summary, ARIMA should definitely be on the shortlist of models to try because of its specialized time series capabilities. It works well when the underlying behavior is broadly linear and shows clear trends or cycles over time; when strong seasonal patterns are present, its seasonal extension, SARIMA, is usually the better fit.

What are the limitations of SARIMA model?

The SARIMA model has a few key limitations to be aware of:

Complexity

  • SARIMA models can be more complex to specify and fit compared to simpler models like ARIMA. There are more parameters to select and optimize.
  • This complexity also makes SARIMA models harder to interpret. Understanding the effect of each parameter on the final forecasts is more difficult.

Data Requirements

  • SARIMA models need larger datasets to reliably estimate parameters and make accurate forecasts. The seasonal components require more historical data.
  • As a rule of thumb, you need at least 2-3 full seasonal cycles of data. For data with yearly seasonality, that means 2-3 years of history.

Assumptions

  • Like ARIMA, SARIMA still makes assumptions of stationarity. The time series needs to be stationary after accounting for seasonal differencing.
  • Unexpected changes in seasonality patterns can cause poor forecasts if the model doesn't adapt quickly enough.

So in summary, SARIMA is more complex and harder to apply than ARIMA. Make sure you have sufficient data to reliably fit a SARIMA, and monitor if the seasonal patterns change over time. Simpler models like ARIMA may be preferred if complexity or data availability is an issue.

Which is better, LSTM or ARIMA, for time series forecasting?

When comparing LSTM and ARIMA models for time series forecasting, the key factor is the length of the data window period used for training and testing.

Generally, ARIMA models perform better for longer data windows, while LSTM models are more effective for shorter windows. This is because:

  • ARIMA models explicitly model trends and seasonality in time series data. The longer the historical data, the better ARIMA can uncover patterns to make accurate future predictions.
  • LSTM models utilize deep learning to automatically extract features from time series. But with limited data, their generalization capability is constrained, leading to worse performance.

A study compared ARIMA and LSTM models on an air passengers dataset using different data window lengths. The key findings were:

  • 30-day window: ARIMA MAPE 3.4% vs LSTM MAPE 11.6%. ARIMA significantly outperformed LSTM.
  • 3-month average window: ARIMA MAPE 2.8% vs LSTM MAPE 5.1%. ARIMA was still noticeably better.

So for longer data windows and forecast horizons, ARIMA is generally the preferred model, while LSTM may be comparable or even better for very short-term predictions in some cases. Performance ultimately depends on the characteristics of the specific time series and the intended prediction period.

Why use SARIMA?

SARIMA models are useful for time series forecasting because they can capture seasonality and trends in the data. Here are some key reasons to use SARIMA over ARIMA:

  • SARIMA (Seasonal ARIMA) adds seasonal components to the ARIMA model, allowing it to handle seasonality in the time series. This makes SARIMA better suited for data with seasonal patterns like sales data, weather data, etc.
  • SARIMA models are flexible and can accommodate a seasonal period of almost any length - yearly, quarterly, monthly, weekly, or daily cycles - whereas ARIMA has no dedicated way to model seasonality.
  • SARIMA builds on the ARIMA framework so it inherits all the strengths of ARIMA models like handling trends, lags, autocorrelations etc. This makes it more versatile for real-world time series.
  • With SARIMA, there is no need for manual deseasonalization of the data before modeling. The seasonal components are built into the model itself, making the workflow simpler.
  • SARIMA models tend to produce better forecasts compared to ARIMA on seasonal data, with lower errors. The seasonal adjustments make the forecasts more accurate.

So in summary, SARIMA expands the ARIMA framework to handle seasonality out of the box. This makes it very useful for producing accurate forecasts on real-world seasonal data. The seasonal capabilities and flexibility of SARIMA make it preferable over regular ARIMA models whenever clear seasonal patterns are present.


ARIMA vs SARIMA: Structural Differences and Applications

Distinguishing Model Components and Parameters

ARIMA models aim to describe autocorrelations in the data by incorporating autoregressive (AR) and moving average (MA) components. In contrast, SARIMA expands upon ARIMA by adding additional terms to handle seasonality.

Specifically, SARIMA adds three seasonal hyperparameters - seasonal autoregressive (P), seasonal differencing (D), and seasonal moving average (Q) terms - along with the seasonal period m. So while ARIMA has parameters (p, d, q), SARIMA has (p, d, q) x (P, D, Q)m. The seasonal components allow SARIMA to model cyclical patterns in time series.

For example, e-commerce sales often demonstrate weekly or yearly seasonality. By tuning the seasonal parameters, SARIMA can capture these recurring spikes and dips. ARIMA lacks dedicated seasonal components, making SARIMA better suited for seasonal data.

Assessing the Degree of Differencing

Both ARIMA and SARIMA use differencing to make time series data stationary before fitting models. However, SARIMA applies differencing on both the regular and seasonal components.

The lag of seasonal differencing depends on the frequency of the seasonal pattern: for example, a seasonal difference at lag 7 for daily data with weekly seasonality, or at lag 12 for monthly data with yearly seasonality. Proper seasonal differencing removes these repetitive fluctuations, enabling better forecasts.

So SARIMA can difference both the base series and the seasonal component, providing more flexibility when handling trends and seasonality simultaneously. ARIMA is limited to regular differencing.

Strategies for Parameter Selection in ARIMA and SARIMA

While ARIMA relies on plotting ACF and PACF charts, SARIMA takes a more systematic grid search approach given the higher number of potential parameters.

For seasonal data, initial values for (P, D, Q) can be informed by the seasonal frequency, while (p, d, q) can be set using ACF/PACF inspection. Grid searches then fine-tune the parameters by minimizing AIC/BIC, and cross-validation helps prevent overfitting.
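
For example, here is a hedged sketch of a small grid search over the seasonal AR and MA orders, scoring candidates by AIC. The dataset, the fixed non-seasonal orders, and the search ranges are illustrative; the differencing orders are fixed in advance because AIC comparisons are only meaningful across models with the same differencing:

# Grid search over seasonal (P, Q) orders, keeping the model with lowest AIC
library(forecast)

best_aic <- Inf
best_fit <- NULL
for (P in 0:2) for (Q in 0:2) {
  fit <- try(Arima(AirPassengers, order = c(1, 1, 1),
                   seasonal = c(P, 1, Q)), silent = TRUE)
  if (!inherits(fit, "try-error") && fit$aic < best_aic) {
    best_aic <- fit$aic
    best_fit <- fit
  }
}
summary(best_fit)   # the lowest-AIC candidate found by the search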

Overall, SARIMA requires more methodical parameter tuning but can learn richer patterns. ARIMA order selection is faster but less robust for seasonal series.

Interpreting Coefficients in ARIMA and SARIMA Models

Coefficients in SARIMA models can be harder to interpret directly since they incorporate complex seasonal dynamics. Specialized statistical tests help assess coefficient significance.

But in ARIMA, coefficients have more straightforward correlations with lagged values. The AR terms relate the current value to previous values, while MA terms capture residual errors. This simpler structure makes ARIMA easier to analyze manually.

So SARIMA delivers predictive power through intricate seasonal modeling, while ARIMA enables clearer coefficient analysis via sparse autoregression. Their interpretability tradeoffs depend on use cases.

Evaluating Predictive Modeling with ARIMA and SARIMA

ARIMA and SARIMA are popular statistical models used for time series forecasting. When building these models, it's important to properly evaluate their performance to choose the best one for your data. Here are some best practices for model evaluation and comparison:

Benchmarking with In-Sample Testing

When first developing ARIMA and SARIMA models, start by benchmarking performance on the training dataset using in-sample testing. This gives a baseline metric for how well each model fits the data used to train it.

Compare performance metrics like AIC, RMSE, and MAPE on the training set. Lower values indicate better fit. This initial benchmark helps select the best model specifications to focus on.

The Importance of Out-of-Sample Testing

Though in-sample testing is useful for initial model comparisons, the true test is measuring performance on new unseen data. This is called out-of-sample or holdout testing.

After finalizing model specifications using the training dataset, evaluate them on a holdout test set. This simulates how your models would perform when making real forecasts.

Significantly worse test performance indicates overfitting. Comparing out-of-sample metrics is crucial for choosing the best forecasting model.
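
A minimal holdout-testing sketch in R (the split point, model orders, and dataset are illustrative):

# Hold out the final two years as a test set, fit on the rest,
# then compare forecasts against the held-out observations
library(forecast)

train <- window(AirPassengers, end = c(1958, 12))
test  <- window(AirPassengers, start = c(1959, 1))

fit <- Arima(train, order = c(1, 1, 1), seasonal = c(0, 1, 1))
fc  <- forecast(fit, h = length(test))

accuracy(fc, test)   # reports RMSE, MAPE, etc. on training and test sets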

Information Criteria for Model Comparison

Information criteria like AIC and BIC quantify model fit while penalizing complexity to prevent overfitting. Lower values indicate a better trade-off between goodness of fit and model complexity.

Information criteria combined with out-of-sample testing gives a complete picture of predictive accuracy on new data. Use both to select the best ARIMA or SARIMA model.

Measuring Prediction Accuracy

When evaluating ARIMA and SARIMA models on a test set, RMSE and MAPE are common accuracy metrics for time series data.

RMSE is the square root of the mean squared error, expressed in the same units as the series. MAPE expresses the average absolute error as a percentage of the actual values. Together they quantify predictive performance on new data.
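
These metrics can also be computed directly from vectors of actual and forecast values, as in this short sketch:

# RMSE: square root of the mean squared error (in the units of the series)
rmse <- function(actual, predicted) sqrt(mean((actual - predicted)^2))

# MAPE: mean absolute error as a percentage of the actual values
mape <- function(actual, predicted) mean(abs((actual - predicted) / actual)) * 100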

Choose the model with lowest RMSE and MAPE on test data. This results in the most accurate forecasts with least errors.

Following structured validation and comparison procedures leads to the most reliable ARIMA or SARIMA predictive model for time series forecasting.

Case Study: Forecasting with ARIMA and SARIMA

Defining the Forecasting Problem

This case study will forecast monthly airline passenger data using both ARIMA and SARIMA models. The business objective is to accurately predict future passenger traffic to inform staffing levels, inventory management, and marketing campaigns. The available dataset contains 144 observations of monthly totals of international airline passengers from 1949 to 1960.

Exploratory Data Analysis for Time Series

Initial analysis shows the time series exhibits:

  • An increasing trend over time as air travel became more popular
  • Seasonality patterns within each year as travel demand changes

Before modeling, we'll need to account for these components - through regular and seasonal differencing - to make the series stationary.

Developing ARIMA and SARIMA Models

We'll develop a SARIMA model to explicitly model seasonality, and an ARIMA model without seasonal components as a benchmark.

The SARIMA model selected after iterative testing was SARIMA(1,1,1)x(0,1,1,12), with non-zero parameter estimates, indicating:

  • Non-seasonal AR order of 1
  • Non-seasonal differencing of order 1
  • Non-seasonal MA order of 1
  • Seasonal period of 12 months
  • Seasonal differencing of order 1
  • Seasonal MA order of 1

The ARIMA model selected was ARIMA(1,1,1) with non-zero parameter estimates, containing:

  • AR order of 1
  • Differencing of order 1
  • MA order of 1

Diagnostic checks indicate both models are reasonably well fit with no significant issues.
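
As a hedged sketch, both models could be fit and diagnosed in R along the following lines (the case study's exact train/test split and error figures are not reproduced here):

# Fit the case-study models on the airline passengers series
library(forecast)

sarima_fit <- Arima(AirPassengers, order = c(1, 1, 1),
                    seasonal = c(0, 1, 1))               # SARIMA(1,1,1)x(0,1,1,12)
arima_fit  <- Arima(AirPassengers, order = c(1, 1, 1))   # ARIMA(1,1,1)

# Residual diagnostics: residual plots plus a Ljung-Box test
checkresiduals(sarima_fit)
checkresiduals(arima_fit)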

Comparative Evaluation of Forecasting Models

The SARIMA model produces significantly lower forecast errors overall. Comparing test set MAPE values:

  • SARIMA MAPE: 11.15%
  • ARIMA MAPE: 18.32%

The SARIMA model better accounts for the seasonal fluctuations, leading to greater accuracy.

Selecting the Optimal Forecasting Model

Given the large improvement in predictive accuracy from modeling seasonality, the SARIMA(1,1,1)x(0,1,1,12) model is selected as the optimal forecasting model for this time series.

The SARIMA model will provide more precise estimates of future passenger traffic, better informing operational decisions. Its explicit handling of seasonal patterns makes it well-suited for aviation demand forecasting.

Practical Implementation of SARIMA in R

Understanding the SARIMA Formula

The SARIMA formula builds upon the ARIMA model by adding terms to account for seasonality. The full SARIMA model is denoted SARIMA(p, d, q)(P, D, Q)m, where:

  • p - Order of the non-seasonal AR (AutoRegressive) term
  • d - Degree of first differencing involved
  • q - Order of the non-seasonal MA (Moving Average) term
  • P - Order of the seasonal AR term
  • D - Degree of seasonal differencing
  • Q - Order of the seasonal MA term
  • m - Number of time steps for a single seasonal period

So in a nutshell, SARIMA handles both non-seasonal and seasonal components in a time series. The non-seasonal (p, d, q) part works the same as in ARIMA. The seasonal (P, D, Q)m part specifically models the seasonal cycles.

Before selecting SARIMA orders, it helps to check how much differencing is needed to make the series stationary. This involves:

  • Removing trends - using regular differencing or data transforms
  • Removing seasonality - using seasonal differencing
  • Checking stationarity - with the Augmented Dickey-Fuller (ADF) test

Once the differenced series passes a stationarity check, the chosen degrees of differencing (d and D) can be carried into the SARIMA model, which applies them internally during fitting.

Here is sample R code to make a time series stationary:

# Load libraries
library(forecast)   # modeling and forecasting
library(tseries)    # provides adf.test

# Import data (built-in monthly international airline passengers series)
data <- AirPassengers

# First (non-seasonal) difference to remove the trend
differenced <- diff(data, differences = 1)

# Seasonal difference at lag 12 to remove the yearly pattern
final <- diff(differenced, lag = 12, differences = 1)

# Augmented Dickey-Fuller test for stationarity
adf.test(final)

Modeling with SARIMA in R

We can now fit a SARIMA model in R. Here is an example using the forecast package's Arima function, fitting directly on the original series since the differencing is handled inside the model:

# SARIMA model fit on the original series; the non-seasonal (d = 1) and
# seasonal (D = 1) differencing are applied internally by the model
model <- Arima(data, order = c(1, 1, 1),
               seasonal = list(order = c(0, 1, 1), period = 12))

# Summary of the fitted model (coefficients, AIC, error measures)
summary(model)

# Forecast the next 20 months
forecasts <- forecast(model, h = 20)

# Plot forecasts with prediction intervals
plot(forecasts)

We specify the non-seasonal and seasonal orders, and the model applies the required differencing internally. Then we can view the model summary, generate forecasts, and plot them with prediction intervals.

Interpreting Results Using StatsModels ADF Documentation

The output of the adf.test function (from the tseries package) indicates whether a time series is stationary. By checking the p-value and test statistic - the StatsModels ADF documentation is a useful reference for interpreting these against critical values - we can determine whether stationarity has been achieved. If not, additional differencing is required before proceeding with SARIMA modeling.
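
For instance, the decision can be scripted by reading the test's p-value directly. This sketch uses the final series from the earlier snippet and the conventional 5% significance level:

# Small p-value: reject the unit-root null, i.e. the series looks stationary
result <- adf.test(final)
if (result$p.value < 0.05) {
  message("Series looks stationary - proceed with SARIMA order selection")
} else {
  message("Series still non-stationary - consider additional differencing")
}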

Advanced Considerations: ARIMA vs SARIMA vs SARIMAX

Understanding the Extended SARIMAX Model

The SARIMAX model builds on SARIMA by incorporating exogenous variables, allowing it to account for external factors that may influence the time series. This makes SARIMAX well-suited for forecasting time series with clear seasonal patterns as well as outside variables that have a demonstrated effect.

Some key aspects of the SARIMAX model include:

  • Allows the inclusion of exogenous regressors to improve predictive accuracy
  • Can capture additional seasonal cycles (e.g. daily plus weekly patterns) when they are supplied as exogenous regressors, such as Fourier terms
  • Provides greater flexibility in modeling complex time series
  • Requires more data and expertise to implement effectively

Overall, SARIMAX expands the capabilities of SARIMA models for specialized cases involving additional regressors. It is best suited for experts looking to maximize predictive power.

Comparing ARIMA vs SARIMAX

The main differences between ARIMA and SARIMAX are:

  • ARIMA models the time series based only on its own past values and errors. SARIMAX includes additional explanatory variables.
  • ARIMA can only account for trends and seasonality inherently found in the time series data itself. SARIMAX can model external influences.
  • ARIMA is simpler to implement but may not capture all relevant relationships. SARIMAX is more complex but can improve accuracy.

In summary, SARIMAX builds on ARIMA by allowing the inclusion of exogenous regressors. This added flexibility comes at the cost of increased complexity.

Choosing Between SARIMA and SARIMAX

The choice depends primarily on whether there are measurable external factors with known effects on the time series:

  • If no additional regressors can be identified, use SARIMA. The simpler model will suffice.
  • If clear exogenous variables are available, use SARIMAX to improve predictive accuracy.

Additionally, SARIMAX requires more data and expertise than SARIMA. So the choice also depends on data availability and the analyst's capability with advanced models.

Case Scenarios for SARIMAX Application

SARIMAX is well-suited for forecasting time series with external influences like:

  • Product sales affected by market advertising spend
  • Call center volume impacted by media events
  • Energy usage related to weather patterns
  • Retail revenue correlated to macroeconomic indicators

In these cases, the exogenous factors are measurable over time. By incorporating them directly, SARIMAX can account for their demonstrated effects on the time series and improve predictive accuracy.

The key is identifying relevant exogenous variables and having enough historical data. With good expertise and data availability, SARIMAX delivers excellent forecasts.
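
As a hedged sketch of the first scenario, here is how a SARIMAX-style model can be fit in R using the forecast package's xreg argument, with simulated monthly sales and advertising-spend data (all numbers are made up purely for illustration):

# Seasonal ARIMA with an exogenous advertising-spend regressor (SARIMAX-style)
library(forecast)
set.seed(42)

n        <- 60                                    # five years of monthly data
ad_spend <- runif(n, 50, 150)                     # simulated ad spend
sales    <- ts(200 + 0.8 * ad_spend +             # sales driven by ad spend...
               10 * sin(2 * pi * (1:n) / 12) +    # ...plus a yearly cycle...
               rnorm(n, sd = 5), frequency = 12)  # ...plus noise

fit <- Arima(sales, order = c(1, 0, 1), seasonal = c(0, 1, 1),
             xreg = ad_spend)

# Forecasting requires future values of the regressor (e.g. a planned budget)
future_spend <- rep(100, 12)
plot(forecast(fit, xreg = future_spend))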

Conclusion: Best Practices in Statistical Modeling for Time Series

Summarizing Key Insights on ARIMA and SARIMA Forecasting

ARIMA and SARIMA models are useful statistical techniques for forecasting time series data that exhibits trends and seasonality. Key takeaways when deciding between the two include:

  • Use ARIMA models for data without seasonality, and SARIMA models when clear seasonal patterns exist. SARIMA builds on ARIMA by incorporating seasonal components.
  • Carefully check time series plots and conduct pre-testing like the ADF test before applying models, to determine if transformations like differencing are needed to make the data stationary.
  • Use auto.arima() or similar functions to automatically select model orders, saving time over manual tuning of the (p, d, q) and (P, D, Q) hyperparameters - see the sketch after this list.
  • Validate fitted models by checking residual plots and conducting out-of-sample forecasting. Select the most accurate model for the final production environment.
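
A minimal auto.arima() sketch (the dataset is illustrative; the function searches over both non-seasonal and seasonal orders and returns the best model by information criteria):

# Automatic order selection for a seasonal series
library(forecast)

fit <- auto.arima(AirPassengers, seasonal = TRUE)
summary(fit)                    # selected orders, coefficients, fit statistics
plot(forecast(fit, h = 24))     # two-year-ahead forecast with intervals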

Future Directions in Time Series Forecasting

While ARIMA and SARIMA provide a flexible and interpretable approach to modeling trends and seasonality, more advanced techniques can further improve accuracy:

  • SARIMAX extends SARIMA by including exogenous variables, allowing additional predictors like prices or promotions.
  • Deep learning models like LSTMs offer a non-linear modeling alternative, automatically learning complex data patterns.
  • Ensembles can combine multiple models to leverage their individual strengths.

Regardless of technique, the same best practices apply - careful data inspection, testing, validation and monitoring models over time as new data arrives.
