Phillips-Ouliaris Test For Cointegration

In a project of developing PPNR balance projection models, I tried to use the Phillips-Ouliaris (PO) test to investigate the cointegration between the historical balance and a set of macro-economic variables and noticed that implementation routines of PO test in various R packages, e.g. urca and tseries, would give different results. After reading through the original paper “Asymptotic Properties of Residual Based Tests for Co-Integration” by P. Phillips again, I started realizing that the po.test() function in the tseries package and the ca.po() function in the urca package are implementing different types of Phillips-Ouliaris cointegration tests. In other words, the so-called “Phillips-Ouliaris Cointegration test” is not A statistical test but a set of statistical tests with different assumptions, formulations, critical values, and implications.

Let’s start with simulating cointegrated series, as below.

set.seed(2019)
x <- cumsum(rnorm(200, sd = 0.5)) 
y <- cumsum(rnorm(200, sd = 0.5)) + 1
z <- x + y + rnorm(200, sd = 0.5)

First of all, the po.test() function from the tseries package is applied to simulated series with following observations:
1. As the position of each series is changed in the po.test() function, we will get different testing results.
2. Results are determined by which series on the most left-hand side.

The reason is that the po.test() function is testing the cointegration with Phillip’s Z_alpha test, which is the second residual-based test described in P171 of the paper. For this test, critical values in tables Ia – Ic in P189 are used to reject the Null of No Cointegration. Because the po.test() will use the series at the first position to derive the residual used in the test, results would be determined by the series on the most left-hand side.

tseries::po.test(cbind(x, y, z), demean = TRUE, lshort = TRUE)
# Phillips-Ouliaris demeaned = -186.03, Truncation lag parameter = 1, p-value = 0.01

tseries::po.test(cbind(z, x, y), demean = TRUE, lshort = TRUE)
# Phillips-Ouliaris demeaned = -204.7, Truncation lag parameter = 1, p-value = 0.01

tseries::po.test(cbind(z, y, x), demean = TRUE, lshort = TRUE)
# Phillips-Ouliaris demeaned = -204.7, Truncation lag parameter = 1, p-value = 0.01

The Phillips-Ouliaris test implemented in the ca.po() function from the urca package is different. In the ca.po() function, there are two cointegration tests implemented, namely “Pu” and “Pz” tests. Although both the ca.po() function and the po.test() function are supposed to do the Phillips-Ouliaris test，outcomes from both functions are completely different.

Below shows results of the Pu test, which is a Variance Ratio test and the fourth residual-based test described in P171 of the paper. For this test, critical values in tables IIIa – IIIc in P191 are used to reject the Null of No Cointegration. Similar to Phillip’s Z_alpha test, the Pu test also is not invariant to the position of each series and therefore would give different outcomes based upon the series on the most left-hand side.

urca::ca.po(cbind(x, y, z), demean = "constant", lag = "short", type = "Pu")
# The value of the test statistic is: 72.8124

urca::ca.po(cbind(z, x, y), demean = "constant", lag = "short", type = "Pu")
# The value of the test statistic is: 194.5645

urca::ca.po(cbind(z, y, x), demean = "constant", lag = "short", type = "Pu")
# The value of the test statistic is: 194.5645

At last, let’s look at the Pz test implemented in the ca.po() function. For this test, critical values in tables IVa – IVc in P192 are used to reject the Null of No Cointegration. As a multivariate trace statistic, the Pz test has its appeal that the outcome won’t change by the position of each series, as shown below.

urca::ca.po(cbind(x, y, z), demean = "constant", lag = "short", type = "Pz")
# The value of the test statistic is: 219.2746

urca::ca.po(cbind(z, x, y), demean = "constant", lag = "short", type = "Pz")
# The value of the test statistic is: 219.2746

Calculating ACF with Data Step Only

In SAS/ETS, it is trivial to calculate ACF of a time series with ARIMA procedure. However, the downside is that, in addition to ACF, you will get more outputs than necessary without knowing the underlying mechanism. The SAS macro below is a clean routine written with simple data steps showing each step how to calculate ACF and generating nothing but a table with ACF and the related lag without using SAS/ETS module at all. It is easy to write a wrapper around this macro for any further analysis.

%macro acf(data = , var = , out = acf);
***********************************************************;
* SAS MACRO CALCULATING AUTOCORRELATION FUNCTION WITH     *;
* DATA STEP ONLY                                          *;
* ======================================================= *;
* INPUT PAREMETERS:                                       *;
*  DATA : INPUT SAS DATA TABLE                            *;
*  VAR  : THE TIME SERIES TO TEST FOR INDEPENDENCE        *;
* ======================================================= *;
* OUTPUT:                                                 *;
*  OUT : A OUTPUT SAS DATA TABLE WITH ACF AND LAG         *;
* ======================================================= *;
* AUTHOR: WENSUI.LIU@53.COM                               *;
***********************************************************;

%local nobs;
data _1 (keep = &var);
  set &data end = eof;
  if eof then do;
    call execute('%let nobs = '||put(_n_, 8.)||';');
  end;
run;

proc sql noprint;
  select mean(&var) into :mean_x from _last_;
quit;

%do i = 1 %to %eval(&nobs - 1);

  data _2(keep = _:);
    set _1;
    _x = &var;
    _lag = lag&i.(_x);
  run;

  proc sql ;
  create table
    _3 as
  select
    (_x - &mean_x) ** 2               as _den,
    (_x - &mean_x) * (_lag - &mean_x) as _num
  from
    _last_;

  create table
    _4 as
  select
    &i                    as lag,
    sum(_num) / sum(_den) as acf
  from
    _last_;

  %if &i = 1 %then %do;
  create table 
    &out as
  select
    *
  from
    _4;
  %end;
  %else %do;
  insert into &out
  select
    *
  from
    _4;
  %end;

  drop table _2, _3, _4;
  quit;
%end;

%mend acf;

A More Flexible Ljung-Box Test in SAS

Ljung-Box test is an important diagnostic to check if residuals from the time series model are independently distributed. In SAS / ETS module, it is easy to perform Ljung-Box with ARIMA procedure. However, test outputs are only provided for Lag 6, 12, 18, and so on, which cannot be changed by any option.

data one;
  do i = 1 to 100;
    x = uniform(1);
	output;
  end;
run;

proc arima data = one;
  identify var = x whitenoise = ignoremiss;
run;
quit;
/*
                            Autocorrelation Check for White Noise

 To        Chi-             Pr >
Lag      Square     DF     ChiSq    --------------------Autocorrelations--------------------
  6        5.49      6    0.4832     0.051    -0.132     0.076    -0.024    -0.146     0.064
 12        6.78     12    0.8719     0.050     0.076    -0.046    -0.025    -0.016    -0.018
 18       10.43     18    0.9169     0.104    -0.053     0.063     0.038    -0.085    -0.065
 24       21.51     24    0.6083     0.007     0.178     0.113    -0.046     0.180     0.079
*/

The SAS macro below is a more flexible way to perform Ljung-Box test for any number of lags. As shown in the output, test results for Lag 6 and 12 are identical to the one directly from ARIMA procedure.

%macro LBtest(data = , var = , lags = 4);
***********************************************************;
* SAS MACRO PERFORMING LJUNG-BOX TEST FOR INDEPENDENCE    *;
* ======================================================= *;
* INPUT PAREMETERS:                                       *;
*  DATA : INPUT SAS DATA TABLE                            *;
*  VAR  : THE TIME SERIES TO TEST FOR INDEPENDENCE        *;
*  LAGS : THE NUMBER OF LAGS BEING TESTED                 *;
* ======================================================= *;
* AUTHOR: WENSUI.LIU@53.COM                               *;
***********************************************************;

%local nlag; 

data _1 (keep = &var);
  set &data end = eof;
  if eof then do;
    call execute('%let nlag = '||put(_n_ - 1, 8.)||';');
  end;
run;

proc arima data = _last_;
  identify var = &var nlag = &nlag outcov = _2 noprint;
run;
quit;

%do i = 1 %to &lags;
  data _3;
    set _2;
	where lag > 0 and lag <= &i;
  run;

  proc sql noprint;
    create table
	  _4 as
	select
      sum(corr * corr / n) * (&nlag + 1) * (&nlag + 3) as _chisq,
	  1 - probchi(calculated _chisq, &i.)              as _p_chisq,
	  &i                                               as _df
	from
	  _last_;
  quit;

  %if &i = 1 %then %do;
  data _5;
    set _4;
  run;
  %end;
  %else %do;
  data _5;
    set _5 _4;
  run;
  %end;
%end;

title;
proc report data = _5 spacing = 1 headline nowindows split = "*";
  column(" * LJUNG-BOX TEST FOR WHITE NOISE *
           * H0: RESIDUALS ARE INDEPENDENTLY DISTRIBUTED UPTO LAG &lags * "
          _chisq _df _p_chisq);
  define _chisq   / "CHI-SQUARE" width = 20 format = 15.10;
  define _df      / "DF"         width = 10 order;
  define _p_chisq / "P-VALUE"    width = 20 format = 15.10;
run;

%mend LBtest;

%LBtest(data = one, var = x, lags = 12);

/*
             LJUNG-BOX TEST FOR WHITE NOISE
 H0: RESIDUALS ARE INDEPENDENTLY DISTRIBUTED UPTO LAG 12

           CHI-SQUARE         DF              P-VALUE
 ------------------------------------------------------
         0.2644425904          1         0.6070843322
         2.0812769288          2         0.3532290858
         2.6839655476          3         0.4429590625
         2.7428168168          4         0.6017432831
         5.0425834917          5         0.4107053939
         5.4851972398          6         0.4832476224
         5.7586229652          7         0.5681994829
         6.4067856029          8         0.6017645131
         6.6410385135          9         0.6744356312
         6.7142471241         10         0.7521182318
         6.7427585395         11         0.8195164211
         6.7783018413         12         0.8719097622
*/

SAS Macro for Engle-Granger Co-integration Test

In the coursework of time series analysis, we’ve been taught that a time series regression of Y on X could be valid only when both X and Y are stationary due to the so-call “spurious regression problem”. However, one exception holds that if X and Y, albeit non-stationary, share a common trend such that their trends can be cancelled each other out, then X and Y are co-integrated and the regression of Y on X is valid. As a result, it is important to test the co-integration between X and Y.

Following the definition of co-integration, it is straightforward to formulate a procedure of the co-integration test. First of all, construct a linear combination between Y and X such that e = Y – (a + b * X). Secondly, test if e is stationary with ADF test. If e is stationary, then X and Y are co-integrated. This two-stage procedure is also called Engle-Granger co-integration test.

Below is a SAS macro implementing Engle-Granger co-integration test to show the long-term relationship between GDP and other macro-economic variables, e.g. Personal Consumption and Personal Disposable Income.

SAS Macro

%macro eg_coint(data = , y = , xs = );
*********************************************************************;
* THIS SAS MACRO IMPLEMENTATION ENGLE-GRANGER COINTEGRATION TEST IN *;
* A BATCH MODE TO PROCESS MANY TIME SERIES                          *;
*********************************************************************;
* INPUT PARAMETERS:                                                 *;
*   DATA: A INPUT SAS DATASET                                       *;
*   Y   : A DEPENDENT VARIABLE IN THE COINTEGRATION REGRESSION      *;
*   X   : A LIST OF INDEPENDENT VARIABLE IN THE COINTEGRATION       *;
*         REGRESSION                                                *;
*********************************************************************;
* AUTHOR: WENSUI.LIU@53.COM                                         *;
*********************************************************************;

options nocenter nonumber nodate mprint mlogic symbolgen
        orientation = landscape ls = 150 formchar = "|----|+|---+=|-/\<>*";

%local sig loop;

%let sig = 0.1;

%let loop = 1;

%do %while (%scan(&xs, &loop) ne %str());

  %let x = %scan(&xs, &loop);

  ods listing close;
  ods output FitStatistics = _fit;
  proc reg data = &data;
    model &y = &x;
    output out = _1 residual = r;
  run;
  quit;

  proc sql noprint;
    select cvalue2 into :r2 from _fit where upcase(label2) = "R-SQUARE";
  quit;

  proc arima data = _1;
    ods output stationaritytests = _adf1 (where = (upcase(type) = "ZERO MEAN" and lags = 1) drop = rho probrho fvalue probf);
    identify var = r stationarity = (adf = 1);
  run;
  quit;
  ods listing;

  %if &loop = 1 %then %do;
    data _adf;
      format vars $32. lterm_r2 best12. flg_coint $3.;
      set _adf1 (drop = type lags);
      vars = upcase("&x");
      lterm_r2 = &r2;
      if probtau < &sig then flg_coint = "YES";
      else flg_coint = "NO";
    run;
  %end;
  %else %do;
    data _adf;
      set _adf _adf1 (in = new drop = type lags);
      if new then do;
        vars = upcase("&x");
        lterm_r2 = &r2;
        if probtau < &sig then flg_coint = "YES";
          else flg_coint = "NO";
      end;
    run;
  %end;
    
  %let loop = %eval(&loop + 1);
%end;

proc sort data = _last_;
  by descending flg_coint probtau;
run;

proc report data = _last_ box spacing = 1 split = "/" nowd;
  COLUMN("ENGLE-GRANGER COINTEGRATION TEST BETWEEN %UPCASE(&Y) AND EACH VARIABLE BELOW/ "
         vars lterm_r2 flg_coint tau probtau);
  define vars      / "VARIABLES"                      width = 35;
  define lterm_r2  / "LONG-RUN/R-SQUARED"             width = 15 format =  9.4 center;
  define flg_coint / "COINTEGRATION/FLAG"             width = 15 center;
  define tau       / "TAU STATISTIC/FOR ADF TEST"     width = 20 format = 15.4;
  define probtau   / "P-VALUE FOR/ADF TEST"           width = 15 format =  9.4 center;
run;

%mend eg_coint;

%eg_coint(data = sashelp.citiqtr, y = gdp, xs = gyd gc);

SAS Output

----------------------------------------------------------------------------------------------------------
|                  ENGLE-GRANGER COINTEGRATION TEST BETWEEN GDP AND EACH VARIABLE BELOW                  |
|                                                                                                        |
|                                       LONG-RUN      COINTEGRATION         TAU STATISTIC   P-VALUE FOR  |
|VARIABLES                              R-SQUARED         FLAG               FOR ADF TEST    ADF TEST    |
|--------------------------------------------------------------------------------------------------------|
|GC                                 |      0.9985   |      YES      |             -2.8651|      0.0051   |
|-----------------------------------+---------------+---------------+--------------------+---------------|
|GYD                                |      0.9976   |      YES      |             -1.7793|      0.0715   |
----------------------------------------------------------------------------------------------------------

From the output, it is interesting to see that GDP in U.S. is driven more by Personal Consumption than by Personal Disposable Income.

SAS Macro to Test Stationarity in Batch

To determine if a time series is stationary or has the unit root, three methods can be used:

A. The most intuitive way, which is also sufficient in most cases, is to eyeball the ACF (Autocorrelation Function) plot of the time series. The ACF pattern with a fast decay might imply a stationary series.
B. Statistical tests for Unit Roots, e.g. ADF (Augmented Dickey–Fuller) or PP (Phillips–Perron) test, could be employed as well. With the Null Hypothesis of Unit Root, a statistically significant outcome might suggest a stationary series.
C. In addition to the aforementioned tests for Unit Roots, statistical tests for stationarity, e.g. KPSS (Kwiatkowski–Phillips–Schmidt–Shin) test, might be an useful complement as well. With the Null Hypothesis of stationarity, a statistically insignificant outcome might suggest a stationary series.

By testing both the unit root and stationarity, the analyst should be able to have a better understanding about the data nature of a specific time series.

The SAS macro below is a convenient wrapper of stationarity tests for many time series in the production environment. (Please note that this macro only works for SAS 9.2 or above.)

%macro stationary(data = , vars =);
***********************************************************;
* THIS SAS MACRO IS TO DO STATIONARITY TESTS FOR MANY     *;
* TIME SERIES                                             *;
* ------------------------------------------------------- *;
* INPUT PARAMETERS:                                       *;
*   DATA: A INPUT SAS DATASET                             *;
*   VARS: A LIST OF TIME SERIES                           *;
* ------------------------------------------------------- *;
* AUTHOR: WENSUI.LIU@53.COM                               *;
***********************************************************;

options nocenter nonumber nodate mprint mlogic symbolgen
        orientation = landscape ls = 150 formchar = "|----|+|---+=|-/\<>*";

%local sig loop;

%let sig = 0.1;

%let loop = 1;

%do %while (%scan(&vars, &loop) ne %str());

  %let x = %scan(&vars, &loop);

  proc sql noprint;
    select int(12 * ((count(&x) / 100) ** 0.25)) into :nlag1 from &data;

    select int(max(1, (count(&x) ** 0.5) / 5)) into :nlag2 from &data;
  quit;

  ods listing close;
  ods output kpss = _kpss (drop = model lags rename = (prob = probeta))
             adf = _adf  (drop = model lags rho probrho fstat probf rename = (tau = adf_tau probtau = adf_probtau))
             philperron = _pp  (drop = model lags rho probrho rename = (tau = pp_tau probtau = pp_probtau));
  proc autoreg data = &data;
    model &x = / noint stationarity = (adf = &nlag1, phillips = &nlag2, kpss = (kernel = nw lag = &nlag1));
  run;
  quit;
  ods listing;

  proc sql noprint;
    create table
      _1 as 
    select
      upcase("&x")           as vars length = 32,
      upcase(_adf.type)      as type,
      _adf.adf_tau,
      _adf.adf_probtau,
      _pp.pp_tau,
      _pp.pp_probtau,
      _kpss.eta,
      _kpss.probeta,
      case 
        when _adf.adf_probtau < &sig or _pp.pp_probtau < &sig or _kpss.probeta > &sig then "*"
        else " "
      end                    as _flg,
      &loop                  as _i,
      monotonic()            as _j
    from 
      _adf inner join _pp on _adf.type = _pp.type inner join _kpss on _adf.type = _kpss.type;
  quit;

  %if &loop = 1 %then %do;
    data _result;
      set _1;
    run;
  %end;
  %else %do;
    proc append base = _result data = _1;
    run;
  %end;

  proc datasets library = work nolist;
    delete _1 _adf _pp _kpss / memtype = data;
  quit;

  %let loop = %eval(&loop + 1);
%end;

proc sort data = _result;
  by _i _j;
run;

proc report data = _result box spacing = 1 split = "/" nowd;
  column("STATISTICAL TESTS FOR STATIONARITY/ "
         vars type adf_tau adf_probtau pp_tau pp_probtau eta probeta _flg);
  define vars        / "VARIABLES/ "                  width = 20 group order order = data;
  define type        / "TYPE/ "                       width = 15 order order = data;
  define adf_tau     / "ADF TEST/FOR/UNIT ROOT"       width = 10 format = 8.2;
  define adf_probtau / "P-VALUE/FOR/ADF TEST"         width = 10 format = 8.4 center;
  define pp_tau      / "PP TEST/FOR/UNIT ROOT"        width = 10 format = 8.2;
  define pp_probtau  / "P-VALUE/FOR/PP TEST"          width = 10 format = 8.4 center;
  define eta         / "KPSS TEST/FOR/STATIONARY"     width = 10 format = 8.2;
  define probeta     / "P-VALUE/FOR/KPSS TEST"        width = 10 format = 8.4 center;
  define _flg        / "STATIONARY/FLAG"              width = 10 center;
run;

%mend stationary;

Granger Causality Test

# READ QUARTERLY DATA FROM CSV
library(zoo)
ts1 <- read.zoo('Documents/data/macros.csv', header = T, sep = ",", FUN = as.yearqtr)

# CONVERT THE DATA TO STATIONARY TIME SERIES
ts1$hpi_rate <- log(ts1$hpi / lag(ts1$hpi))
ts1$unemp_rate <- log(ts1$unemp / lag(ts1$unemp))
ts2 <- ts1[1:nrow(ts1) - 1, c(3, 4)]

# METHOD 1: LMTEST PACKAGE
library(lmtest)
grangertest(unemp_rate ~ hpi_rate, order = 1, data = ts2)
# Granger causality test
#
# Model 1: unemp_rate ~ Lags(unemp_rate, 1:1) + Lags(hpi_rate, 1:1)
# Model 2: unemp_rate ~ Lags(unemp_rate, 1:1)
#   Res.Df Df      F  Pr(>F)
# 1     55
# 2     56 -1 4.5419 0.03756 *
# ---
# Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

# METHOD 2: VARS PACKAGE
library(vars)
var <- VAR(ts2, p = 1, type = "const")
causality(var, cause = "hpi_rate")$Granger
#         Granger causality H0: hpi_rate do not Granger-cause unemp_rate
#
# data:  VAR object var
# F-Test = 4.5419, df1 = 1, df2 = 110, p-value = 0.0353

# AUTOMATICALLY SEARCH FOR THE MOST SIGNIFICANT RESULT
for (i in 1:4)
  {
  cat("LAG =", i)
  print(causality(VAR(ts2, p = i, type = "const"), cause = "hpi_rate")$Granger)
  }

Yet Another Blog in Statistical Computing

Time Series