Co-integration is an important statistical concept behind the statistical arbitrage strategy known as “Pairs Trading”. While forecasting a single stock price with time series models is difficult, it is technically feasible to find a pair (or even a portfolio) of stocks that share a common trend, such that a linear combination of the series is stationary. Such series are said to be co-integrated. The underlying logic of Pairs Trading is to monitor the movements of co-integrated stocks and to look for trading opportunities when their prices diverge. Under the mean-reversion assumption, the prices tend to move back toward their long-term equilibrium relationship, so the spread between two co-integrated stock prices should eventually converge. Furthermore, because the spread between co-integrated stocks is stationary, it becomes possible to forecast that spread with time series models.
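To make the idea concrete, here is a minimal sketch in base R that uses simulated data rather than real prices. Everything in it is an assumption made for illustration: the shared random-walk trend, the noise levels, the regression-based hedge ratio, and the z-score threshold of 2 are not taken from the analysis in this post.

```r
# Illustrative simulation only: two series driven by one shared random walk.
set.seed(42)
n     <- 1000
trend <- cumsum(rnorm(n))               # common non-stationary trend
x     <- trend + rnorm(n, sd = 0.5)     # hypothetical "stock 1"
y     <- 2 * trend + rnorm(n, sd = 0.5) # hypothetical "stock 2"

# Estimate the hedge ratio by regressing y on x; the residual is the spread.
fit    <- lm(y ~ x)
beta   <- coef(fit)[["x"]]
spread <- residuals(fit)

# Lag-1 autocorrelation: close to 1 for the trending series,
# much lower for the spread, which reverts to its mean.
ac1 <- function(z) cor(z[-1], z[-length(z)])
print(c(beta = beta, ac1_price = ac1(y), ac1_spread = ac1(spread)))

# A simple divergence signal based on the standardized spread.
# The threshold of 2 is an arbitrary assumption.
z      <- (spread - mean(spread)) / sd(spread)
signal <- ifelse(z > 2, "short spread",
          ifelse(z < -2, "long spread", "hold"))
print(table(signal))
```

With real prices the stationarity of the spread would need a formal test, such as the Johansen test used in the function below, rather than a glance at autocorrelations.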

Below is an R utility function that identifies pairwise co-integration, based on the Johansen test, among an arbitrary number of stocks provided as a list of tickers.

For instance, with a start date of 2010-01-01 and a list of 13 tickers for major US banks, we are able to identify 23 pairs of co-integrated stock prices out of 78 pairwise combinations, each with a test statistic above the 10% critical value. It is interesting to see that the stock prices of two regional players, Fifth Third (FITB) and M&T (MTB), are highly co-integrated, as visualized in the chart below.

pkgs <- list("quantmod", "doParallel", "foreach", "urca")
lapply(pkgs, require, character.only = TRUE)
registerDoParallel(cores = 4)

# Johansen test for one pair of tickers.
# Note: relies on the global start date `sd` defined further down.
jtest <- function(t1, t2) {
  start <- sd
  getSymbols(t1, from = start)
  getSymbols(t2, from = start)
  # Column 6 of the downloaded series is the adjusted close.
  j <- summary(ca.jo(cbind(get(t1)[, 6], get(t2)[, 6])))
  # Second row of the test output corresponds to the null hypothesis r = 0.
  r <- data.frame(stock1 = t1, stock2 = t2, stat = j@teststat[2])
  r[, c("pct10", "pct5", "pct1")] <- j@cval[2, ]
  return(r)
}

# Run the test on every pairwise combination, in parallel.
pair <- function(lst) {
  d2 <- data.frame(t(combn(lst, 2)))
  stat <- foreach(i = 1:nrow(d2), .combine = rbind) %dopar%
    jtest(as.character(d2[i, 1]), as.character(d2[i, 2]))
  stat <- stat[order(-stat$stat), ]
  # The piece generating the "*" significance flags (the coint column in the
  # output) is omitted here because it can't be displayed properly in WordPress.
  rownames(stat) <- NULL
  return(stat)
}

sd <- "2010-01-01"
tickers <- c("FITB", "BBT", "MTB", "STI", "PNC", "HBAN", "CMA", "USB", "KEY", "JPM", "C", "BAC", "WFC")
pair(tickers)

   stock1 stock2      stat pct10 pct5  pct1 coint
1     STI    JPM 27.207462 12.91 14.9 19.19   ***
2    FITB    MTB 21.514142 12.91 14.9 19.19   ***
3     MTB    KEY 20.760885 12.91 14.9 19.19   ***
4    HBAN    KEY 19.247719 12.91 14.9 19.19   ***
5       C    BAC 18.573168 12.91 14.9 19.19    **
6    HBAN    JPM 18.019051 12.91 14.9 19.19    **
7    FITB    BAC 17.490536 12.91 14.9 19.19    **
8     PNC   HBAN 16.959451 12.91 14.9 19.19    **
9    FITB    BBT 16.727097 12.91 14.9 19.19    **
10    MTB   HBAN 15.852456 12.91 14.9 19.19    **
11    PNC    JPM 15.822610 12.91 14.9 19.19    **
12    CMA    BAC 15.685086 12.91 14.9 19.19    **
13   HBAN    BAC 15.446149 12.91 14.9 19.19    **
14    BBT    MTB 15.256334 12.91 14.9 19.19    **
15    MTB    JPM 15.178646 12.91 14.9 19.19    **
16    BBT   HBAN 14.808770 12.91 14.9 19.19     *
17    KEY    BAC 14.576440 12.91 14.9 19.19     *
18   FITB    JPM 14.272424 12.91 14.9 19.19     *
19    STI    BAC 14.253971 12.91 14.9 19.19     *
20   FITB    PNC 14.215647 12.91 14.9 19.19     *
21    MTB    BAC 13.891615 12.91 14.9 19.19     *
22    MTB    PNC 13.668863 12.91 14.9 19.19     *
23    KEY    JPM 12.952239 12.91 14.9 19.19     *
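The step that produces the coint column is not shown in the code above. The sketch below is one plausible way to derive it, and it is a guess rather than the original code. It assumes that one, two, or three stars mean the test statistic exceeds the 10%, 5%, or 1% critical value, which is consistent with the rows printed above. The helper name `coint_stars` is hypothetical, and the three sample rows are copied from that output.

```r
# Hypothetical helper: map a test statistic to significance stars by
# comparing it with the 10%, 5% and 1% critical values.
coint_stars <- function(stat, pct10, pct5, pct1) {
  ifelse(stat > pct1, "***",
  ifelse(stat > pct5, "**",
  ifelse(stat > pct10, "*", "")))
}

# Three rows copied from the output above, used only as a check.
res <- data.frame(
  stock1 = c("STI", "C", "KEY"),
  stock2 = c("JPM", "BAC", "JPM"),
  stat   = c(27.207462, 18.573168, 12.952239),
  pct10  = 12.91, pct5 = 14.9, pct1 = 19.19
)
res$coint <- with(res, coint_stars(stat, pct10, pct5, pct1))

# Keep only pairs significant at the 10% level or better.
res <- res[res$coint != "", ]
print(res)
```

Inside `pair()`, the same two steps could be applied to `stat` just before the row names are reset. Since only 23 of the 78 pairs are listed, the original code presumably dropped the non-significant rows as well.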