## About Cointegration

In the stats workshop that supports loss forecasting, my analysts and I are frequently asked to quantify the “correlation” between time series. In the summary below, I briefly introduce cointegration, a statistical method other than “correlation”, for describing the relationship between time series.

In empirical finance, many practitioners use correlation to describe the relationship among multiple time series. However, this approach has been criticized because a relationship might be wrongly inferred when the co-movement is actually driven by other latent causal factors. In such cases, cointegration, proposed by Engle and Granger (1987), offers an alternative way to characterize how time series move together.

In layman’s terms, cointegration describes whether two or more time series move together along a common trend. Formally, suppose two time series X_t and Y_t are each integrated of order one, i.e. I(1). If some linear combination of them, e.g. Z_t = X_t + B * Y_t with B ≠ 0, is stationary, i.e. I(0), then X_t and Y_t are said to be cointegrated. Because cointegration is concerned with co-movement, or a long-term equilibrium, among multiple time series, it is also used in econometrics to test the Efficient Market Hypothesis (EMH).
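To make the definition concrete, here is a minimal simulation sketch, assuming NumPy is available. The function name `simulate_cointegrated_pair` and the choice of a slope of 2 are illustrative assumptions, not part of the method. X_t is a random walk (I(1)), Y_t is built as 2 * X_t plus white noise (also I(1)), and the combination Y_t - 2 * X_t, which is the definition above rescaled (B = -1/2, multiplied by -2), removes the shared stochastic trend and leaves a stationary series.

```python
import numpy as np

def simulate_cointegrated_pair(n=5000, b=2.0, seed=0):
    """Hypothetical helper: return (x, y) where x is I(1) and y = b*x + I(0) noise."""
    rng = np.random.default_rng(seed)
    x = np.cumsum(rng.standard_normal(n))  # random walk: accumulated shocks, I(1)
    u = rng.standard_normal(n)             # white noise, I(0)
    y = b * x + u                          # I(1), but shares x's stochastic trend
    return x, y

x, y = simulate_cointegrated_pair()
# The cointegrating combination (1, -b) on (y, x) cancels the common trend,
# so z has a stable variance (about 1 here) instead of one that grows with time.
z = y - 2.0 * x
print(np.var(z[:2500]), np.var(z[2500:]))
```

Any nonzero multiple of a cointegrating vector is also a cointegrating vector, which is why the scaling of B in the definition is arbitrary.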

In cointegration analysis, most methods fall into two major categories: the minimization of certain variances or the maximization of certain correlations. For instance, the single-equation approach, such as the one suggested by Engle and Granger (1987), looks for the linear combination of X_t and Y_t with minimum variance and therefore belongs to the first category. On the other hand, the reduced-rank, system-based approach, such as the one proposed by Johansen (1988), belongs to the second category, in that it looks for the linear combination of X_t and Y_t with maximum correlation.
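As a small illustration of the "minimum variance" category, the sketch below (assuming NumPy; the helper name `min_variance_slope` is hypothetical) checks that the B minimizing Var(Y_t - B * X_t) is Cov(X, Y) / Var(X), which is exactly the OLS slope, by comparing the closed form with a brute-force grid search on simulated data.

```python
import numpy as np

def min_variance_slope(x, y):
    """Hypothetical helper: b minimizing Var(y - b*x), i.e. Cov(x, y) / Var(x)."""
    c = np.cov(x, y)
    return c[0, 1] / c[0, 0]

rng = np.random.default_rng(1)
x = np.cumsum(rng.standard_normal(500))   # I(1) series
y = 2.0 * x + rng.standard_normal(500)    # cointegrated with x

b_star = min_variance_slope(x, y)

# Brute force: evaluate the variance of the combination on a fine grid of b.
grid = np.linspace(0.0, 4.0, 4001)
variances = [np.var(y - b * x) for b in grid]
b_grid = grid[int(np.argmin(variances))]

# The same number is the slope of an OLS fit of y on x with an intercept.
b_ols = np.polyfit(x, y, 1)[0]
print(b_star, b_grid, b_ols)
```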

Following the logic outlined by Engle and Granger, it is straightforward to formulate an augmented Dickey-Fuller (ADF) test for cointegration between two time series, although other tests, such as the Phillips–Perron unit-root test or the cointegrating regression Durbin–Watson (CRDW) statistic, should also suffice. Given X_t and Y_t integrated of order one, the first step is to estimate the simple linear regression Y_t = a + B * X_t + e_t by ordinary least squares, where e_t is the residual term. The ADF test is then applied to e_t to check for a unit root. Because e_t is estimated rather than observed, the test statistic must be compared with Engle–Granger (MacKinnon) critical values, which are more negative than the standard Dickey–Fuller ones. If the unit-root hypothesis for e_t is rejected, then e_t is I(0) and therefore stationary, implying that X_t and Y_t are cointegrated. Otherwise, the test provides no evidence of cointegration between X_t and Y_t. The approach is attractive in that it is extremely easy to implement and understand. However, it is only appropriate for a system of two time series with a single possible cointegrating relationship.
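The two-step procedure can be sketched in plain NumPy as follows. This is a minimal illustration under stated assumptions, not a production test: the helper names `ols`, `adf_tstat` and `engle_granger` are hypothetical, the residual ADF regression uses one lagged difference and no constant (the constant already sits in the first-step regression), and the cutoff `EG_CRIT_5PCT = -3.34` is the approximate asymptotic 5% Engle–Granger critical value tabulated by MacKinnon for two variables with a constant.

```python
import numpy as np

EG_CRIT_5PCT = -3.34  # approx. asymptotic 5% Engle-Granger value, 2 variables + constant

def ols(X, y):
    """Least-squares coefficients, standard errors and residuals."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    sigma2 = resid @ resid / (X.shape[0] - X.shape[1])
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))
    return coef, se, resid

def adf_tstat(e, lags=1):
    """t-statistic of rho in: de_t = rho*e_{t-1} + sum_j g_j*de_{t-j} + eps_t."""
    de = np.diff(e)                      # de[k] = e[k+1] - e[k]
    target = de[lags:]                   # Delta e_t
    cols = [e[lags:-1]]                  # e_{t-1}
    for j in range(1, lags + 1):
        cols.append(de[lags - j:-j])     # Delta e_{t-j}
    coef, se, _ = ols(np.column_stack(cols), target)
    return coef[0] / se[0]

def engle_granger(y, x, lags=1):
    """Step 1: OLS of y on (1, x). Step 2: ADF test on the residuals."""
    X = np.column_stack([np.ones_like(x), x])
    coef, _, resid = ols(X, y)
    stat = adf_tstat(resid, lags)
    return coef, stat, stat < EG_CRIT_5PCT

rng = np.random.default_rng(0)
x = np.cumsum(rng.standard_normal(1000))
y = 1.0 + 2.0 * x + rng.standard_normal(1000)
coef, stat, reject = engle_granger(y, x)
print(coef, stat, reject)  # reject=True means the unit root in e_t is rejected
```

In practice it is safer to rely on a maintained implementation: statsmodels' `coint(y0, y1)` in `statsmodels.tsa.stattools` runs the same residual-based test and returns the statistic together with a MacKinnon p-value and critical values.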

The Johansen test for cointegration is a maximum likelihood procedure based on the vector autoregressive (VAR) model. It allows for dynamic interactions among two or more series and is therefore more general than the previous approach. Consider a VAR model of order $p$,

$$Y_t = A_1 Y_{t-1} + \dots + A_p Y_{t-p} + e_t,$$

where $Y_t$ is an $n \times 1$ vector of variables integrated of order one and $e_t$ is a vector of innovations. Subtracting $Y_{t-1}$ from both sides and rearranging, the VAR can be rewritten in vector error-correction (VECM) form as

$$\Delta Y_t = \Pi Y_{t-1} + \sum_{i=1}^{p-1} \Gamma_i \Delta Y_{t-i} + e_t, \qquad \Pi = \sum_{i=1}^{p} A_i - I, \qquad \Gamma_i = -\sum_{j=i+1}^{p} A_j.$$

The idea of the Johansen test is that, when the series are cointegrated, $\Pi$ has reduced rank $r < n$ and can be decomposed into two $n \times r$ matrices $\alpha$ and $\beta$ such that $\Pi = \alpha \beta'$ and $\beta' Y_t$ is stationary. Here $r$ is the number of cointegrating relations (the cointegrating rank), and each column of $\beta$ is a cointegrating vector. A likelihood ratio (trace) test can be formulated for the null hypothesis of at most $r$ cointegrating vectors against the alternative hypothesis of $n$ cointegrating vectors. While Johansen’s method is generally regarded as a more powerful test for cointegration, its drawback is a more complicated implementation and interpretation.
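To show where $\alpha$, $\beta$ and the trace statistic come from, here is a from-scratch NumPy sketch of the Johansen reduced-rank regression. It assumes an unrestricted constant in the VECM and treats `k_ar_diff` (the number of lagged differences, $p - 1$) as a user choice; the function name `johansen_trace` is hypothetical. It partials the constant and lagged differences out of $\Delta Y_t$ and $Y_{t-1}$, solves the eigenvalue problem $|\lambda S_{11} - S_{10} S_{00}^{-1} S_{01}| = 0$, and forms the trace statistics $-T \sum_{i=r+1}^{n} \ln(1 - \lambda_i)$. Critical values are deliberately left out; they must come from published tables or a library.

```python
import numpy as np

def johansen_trace(Y, k_ar_diff=1):
    """Sketch of Johansen's trace test with an unrestricted constant.

    Y: (N, n) array of I(1) series in levels.
    Returns eigenvalues (descending), beta (columns = cointegrating vectors),
    and trace statistics for H0: rank <= r, r = 0..n-1.
    """
    Y = np.asarray(Y, dtype=float)
    dY = np.diff(Y, axis=0)
    Z0 = dY[k_ar_diff:]       # Delta Y_t
    Z1 = Y[k_ar_diff:-1]      # Y_{t-1}
    T = Z0.shape[0]
    # Short-run regressors: constant plus lagged differences Delta Y_{t-i}
    regs = [np.ones((T, 1))]
    for i in range(1, k_ar_diff + 1):
        regs.append(dY[k_ar_diff - i:-i])
    Z2 = np.hstack(regs)

    def partial_out(A):
        coef, *_ = np.linalg.lstsq(Z2, A, rcond=None)
        return A - Z2 @ coef

    R0, R1 = partial_out(Z0), partial_out(Z1)
    S00 = R0.T @ R0 / T
    S11 = R1.T @ R1 / T
    S01 = R0.T @ R1 / T       # S10 = S01.T

    # Turn the generalized eigenproblem into a symmetric one via Cholesky of S11.
    Linv = np.linalg.inv(np.linalg.cholesky(S11))
    A = Linv @ S01.T @ np.linalg.solve(S00, S01) @ Linv.T
    eigvals, eigvecs = np.linalg.eigh(A)
    order = np.argsort(eigvals)[::-1]
    eigvals = eigvals[order]
    beta = Linv.T @ eigvecs[:, order]

    n = Y.shape[1]
    trace = np.array([-T * np.sum(np.log(1.0 - eigvals[r:])) for r in range(n)])
    return eigvals, beta, trace

rng = np.random.default_rng(42)
x = np.cumsum(rng.standard_normal(1000))
y = 2.0 * x + rng.standard_normal(1000)
eigvals, beta, trace = johansen_trace(np.column_stack([y, x]), k_ar_diff=1)
print(eigvals, beta[:, 0] / beta[0, 0], trace)
```

Normalizing the first column of `beta` on its first entry should recover a vector close to (1, -2), i.e. the relation y - 2x used in the simulation. For applied work, statsmodels provides `coint_johansen` in `statsmodels.tsa.vector_ar.vecm`, which also reports the tabulated critical values.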