SARIMA for Time Series

SARIMA – Seasonal Autoregressive Integrated Moving Average: SARIMA extends ARIMA by incorporating a seasonal pattern that repeats at regular intervals. When the data contains recurring patterns, SARIMA considers how past seasonal behavior contributes to forecasting future values.

e.g., if sales increase every holiday season, SARIMA captures this seasonality.

  1. Autoregressive (AR): Looks at how a value relates to its past values. If today’s temperature is high, it’s likely tomorrow’s will be high too.
  2. Integrated (I): Deals with trends by differencing, i.e., subtracting the previous value from each value. If temperatures are generally rising over time, differencing isolates how much they’re going up.
  3. Moving Average (MA): Considers the errors from previous predictions to improve future predictions. If yesterday’s prediction was a bit off, adjust today’s prediction to be closer to the actual value.
  4. Seasonal (S): Addresses repeating patterns, like daily or yearly seasons. It acknowledges that it’s colder in winter and hotter in summer and adjusts predictions accordingly.

Understanding SARIMA in Simple Steps

  1. Exploratory Data Analysis (EDA): Understand the characteristics of the time series. Plot the data to identify trends, seasonality, and any other patterns.
  2. Stationarity Check: Ensure the data has constant statistical properties over time. If it does not, apply differencing (subtracting the previous value from each value) to make it stationary.
  3. Autocorrelation Analysis: Understand the correlation between the current value and past values. Plot the Autocorrelation Function (ACF) and Partial Autocorrelation Function (PACF) to identify suitable orders for the AR and MA components.
  4. Model Fitting: Use the identified parameters to fit the ARIMA or SARIMA model, training it on historical data with the autoregressive, moving average, and integration components.
  5. Prediction and Evaluation: Make predictions with the fitted model and compare them with actual values to evaluate how well the model captures the underlying patterns in the data.

ARIMA for Time series data

ARIMA (Autoregressive Integrated Moving Average):

In an autoregressive model, the forecast of a variable is a linear combination of past values of that same variable.

  1. Autoregressive (AR): ARIMA looks at the relationship between the current value of a time series and its past values. It considers how the past values of a variable contribute to predicting its future values.
    • If yesterday’s stock prices influenced today’s, ARIMA captures this influence.
  2. Integrated (I): It aims to make the time series data stationary, meaning its statistical properties like mean and variance remain constant over time. It checks if the data has a trend or changing statistical properties.
    • If the data is not stationary, ARIMA applies differencing to make it more predictable.
  3. Moving Average (MA): It involves considering the past forecast errors to predict future values. Instead of looking at past values of the variable, ARIMA looks at past errors in predicting the variable.
    • It considers how the errors from previous predictions influence the current prediction.

Understanding ARIMA in Simple Steps:

  1. Start with the Data:
    • The input is a series of numbers that changes over time, like daily stock prices or monthly website visits.
  2. Understand Trends:
    • ARIMA looks at the overall trend in the data and analyzes whether it is generally going up or down. This helps establish the baseline behavior.
  3. Deal with Trends:
    • If there’s a trend, ARIMA removes it by differencing.
  4. Autoregressive Part:
    • It checks how today’s data relates to the data from the past. If there’s a clear connection, it uses this to make predictions.
  5. Moving Average Part:
    • ARIMA considers how accurate past predictions were and adjusts for any mistakes made. It’s a way of learning from past experiences.
  6. Combine Everything for Predictions:
    • The algorithm combines the insights from trends, historical data, and past predictions to make an informed guess about what might happen next.

PCA algorithm

Principal Component Analysis is a dimensionality reduction technique commonly used in machine learning and data analysis. Its primary goal is to transform high-dimensional data into a lower-dimensional representation, capturing the most important information. This reduction in dimensionality can lead to improved computational efficiency, visualization, and often better model performance.

Steps to perform PCA:

  1. Standardize the Data: If the features in the dataset have different scales, it is important to standardize them by subtracting the mean and dividing by the standard deviation to give each feature equal importance.
  2. Compute the Covariance Matrix: Calculate the covariance matrix of the standardized data. The covariance matrix provides information about how variables change together.
  3. Compute Eigenvectors and Eigenvalues: Calculate the eigenvectors and eigenvalues of the covariance matrix. The eigenvectors represent the principal components, and the eigenvalues indicate the amount of variance captured by each principal component.
  4. Sort Eigenvectors by Eigenvalues: Sort the eigenvectors in descending order based on their corresponding eigenvalues. The higher the eigenvalue, the more variance is captured by the corresponding eigenvector.
  5. Select Principal Components: Choose the top k eigenvectors to form the new feature space. Typically, one selects enough principal components to capture a sufficiently high percentage of the total variance.
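The five steps above can be implemented directly with NumPy. A sketch on synthetic data in which the third feature nearly duplicates the first, so two components capture almost all the variance:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
X[:, 2] = X[:, 0] + 0.1 * rng.normal(size=100)  # third feature ~ first feature

# 1. Standardize: zero mean, unit variance per feature.
Xs = (X - X.mean(axis=0)) / X.std(axis=0)

# 2. Covariance matrix of the standardized data.
cov = np.cov(Xs, rowvar=False)

# 3. Eigenvectors and eigenvalues (eigh: the covariance matrix is symmetric).
eigvals, eigvecs = np.linalg.eigh(cov)

# 4. Sort in descending order of eigenvalue.
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# 5. Keep the top k components and project the data onto them.
k = 2
X_reduced = Xs @ eigvecs[:, :k]
explained = eigvals[:k].sum() / eigvals.sum()
```

In practice one would use a library implementation such as scikit-learn’s `PCA`, which wraps these same steps.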

Example:

Suppose we have data on people’s heights and weights. Most of the variation lies along a diagonal line representing a combination of height and weight. PCA helps us focus on this main direction and ignore less important details.
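This height/weight picture can be reproduced with scikit-learn’s `PCA` on simulated data (the distributions below are made up for illustration): because height and weight are strongly correlated, the first principal component, the diagonal direction, captures most of the variance.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Simulated heights (cm) and weights (kg) that rise and fall together.
rng = np.random.default_rng(3)
height = rng.normal(170, 10, 200)
weight = 0.9 * (height - 170) + 70 + rng.normal(0, 3, 200)
X = np.column_stack([height, weight])

# Standardize, then fit PCA with both components.
pca = PCA(n_components=2).fit(StandardScaler().fit_transform(X))

# The first ratio should dominate: most variation lies on the diagonal.
print(pca.explained_variance_ratio_)
```

The second component, the direction perpendicular to the diagonal, carries only the small residual spread and could be dropped with little information loss.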