## Archive for **May 2015**

## Granger Causality Test

# READ QUARTERLY DATA FROM CSV library(zoo) ts1 <- read.zoo('Documents/data/macros.csv', header = T, sep = ",", FUN = as.yearqtr) # CONVERT THE DATA TO STATIONARY TIME SERIES ts1$hpi_rate <- log(ts1$hpi / lag(ts1$hpi)) ts1$unemp_rate <- log(ts1$unemp / lag(ts1$unemp)) ts2 <- ts1[1:nrow(ts1) - 1, c(3, 4)] # METHOD 1: LMTEST PACKAGE library(lmtest) grangertest(unemp_rate ~ hpi_rate, order = 1, data = ts2) # Granger causality test # # Model 1: unemp_rate ~ Lags(unemp_rate, 1:1) + Lags(hpi_rate, 1:1) # Model 2: unemp_rate ~ Lags(unemp_rate, 1:1) # Res.Df Df F Pr(>F) # 1 55 # 2 56 -1 4.5419 0.03756 * # --- # Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 # METHOD 2: VARS PACKAGE library(vars) var <- VAR(ts2, p = 1, type = "const") causality(var, cause = "hpi_rate")$Granger # Granger causality H0: hpi_rate do not Granger-cause unemp_rate # # data: VAR object var # F-Test = 4.5419, df1 = 1, df2 = 110, p-value = 0.0353 # AUTOMATICALLY SEARCH FOR THE MOST SIGNIFICANT RESULT for (i in 1:4) { cat("LAG =", i) print(causality(VAR(ts2, p = i, type = "const"), cause = "hpi_rate")$Granger) }

## SAS Macro for Jarque-Bera Normality Test

%macro jarque_bera(data = , var = ); **********************************************; * SAS macro to do Jarque-Bera Normality Test *; * ------------------------------------------ *; * wensui.liu@53.com *; **********************************************; options mprint mlogic symbolgen nodate; ods listing close; ods output moments = m1; proc univariate data = &data normal; var &var.; run; proc sql noprint; select nvalue1 into :n from m1 where upcase(compress(label1, ' ')) = 'N'; select put(nvalue1, best32.) into :s from m1 where upcase(compress(label1, ' ')) = 'SKEWNESS'; select put(nvalue2, best32.) into :k from m1 where upcase(compress(label2, ' ')) = 'KURTOSIS'; quit; data _temp_; jb = ((&s) ** 2 + (&k) ** 2 / 4) / 6 * &n; pvalue = 1 - probchi(jb, 2); put jb pvalue; run; ods listing; proc print data = _last_ noobs; run; %mend jarque_bera;

## Read A Block of Spreadsheet with R

In R, there are two ways to read a block of the spreadsheet, e.g. xlsx file, as the one shown below.

The xlsx package provides the most intuitive interface with readColumns() function by explicitly defining the starting and the ending columns and rows.

library(xlsx) file <- loadWorkbook("C:\\Documents and Settings\\Administrator\\Desktop\\test.xlsx") df1 <- readColumns(getSheets(file)[[1]], startColumn = 3, endColumn = 5, startRow = 5, endRow = 8, header = T) df1 # X Y Z # 1 1 A 2015-01-01 # 2 2 B 2015-02-01 # 3 3 C 2015-03-01

However, if we can define a named range for the block in the excel, the XLConnect package might be more convenient. In the example below, we first defined a range named as “block” within the spreadsheet and then called this named range with readNamedRegionFromFile() function without the necessity of specifying rows and columns.

library(XLConnect) df2 <- readNamedRegionFromFile("C:\\Documents and Settings\\Administrator\\Desktop\\test.xlsx", "block") df2 # X Y Z # 1 1 A 2015-01-01 # 2 2 B 2015-02-01 # 3 3 C 2015-03-01

## To Difference or Not To Difference?

In the textbook of time series analysis, we’ve been taught to difference the time series in order to have a stationary series, which can be justified by various plots and statistical tests.

In the real-world time series analysis, things are not always as clear as shown in the textbook. For instance, although the ACF plot shows a not-so-slow decay pattern, ADF test however can’t reject the null hypothesis of a unit root. In such cases, many analysts might tend to difference the time series to be on the safe side in their view.

However, is it really a safe practice to difference a time series anyway to have a stationary series to model? In the example below, I will show that inappropriately differencing a time series would lead the model development to an undesirable direction.

First of all, let’s simulate an univariate series under the Gaussian distributional assumption. By theory, this series has to be stationary.

> library(urca) > library(forecast) > library(normwhn.test) > x <- rnorm(100) > par(mfrow = c(2, 1)) > acf(x) > pacf(x) > whitenoise.test(x) [1] "no. of observations" [1] 100 [1] "T" [1] 50 [1] "CVM stat MN" [1] 0.8687478 [1] "tMN" [1] -0.9280931 [1] "test value" [1] 0.6426144 > x.adf <- ur.df(x, type = c("none"), selectlags = "BIC") > summary(x.adf) ############################################### # Augmented Dickey-Fuller Test Unit Root Test # ############################################### Test regression none Call: lm(formula = z.diff ~ z.lag.1 - 1 + z.diff.lag) Residuals: Min 1Q Median 3Q Max -1.75385 -0.60585 -0.03467 0.61702 3.10100 Coefficients: Estimate Std. Error t value Pr(>|t|) z.lag.1 -1.008829 0.143635 -7.024 3.1e-10 *** z.diff.lag 0.002833 0.101412 0.028 0.978 --- Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 Residual standard error: 0.9501 on 96 degrees of freedom Multiple R-squared: 0.5064, Adjusted R-squared: 0.4961 F-statistic: 49.25 on 2 and 96 DF, p-value: 1.909e-15 Value of test-statistic is: -7.0235 Critical values for test statistics: 1pct 5pct 10pct tau1 -2.6 -1.95 -1.61 > x.pkss <- ur.kpss(x, type = "mu", lags = "short") > summary(x.pkss) ####################### # KPSS Unit Root Test # ####################### Test is of type: mu with 4 lags. Value of test-statistic is: 0.4136 Critical value for a significance level of: 10pct 5pct 2.5pct 1pct critical values 0.347 0.463 0.574 0.739 > auto.arima(x, ic = 'bic') Series: x ARIMA(0,0,0) with zero mean sigma^2 estimated as 0.8829: log likelihood=-135.67 AIC=273.34 AICc=273.38 BIC=275.94

As shown in the above output:

1) Since x is simulated with the normal assumption, the series should be a white noise by definition.

2) ACF plot shows no auto-correlation at all, as it should.

3) In ADF test, the null hypothesis of unit root is rejected.

4) In PKSS test, the null hypothesis of stationarity is not rejected.

5) The output from auto.arima() suggests an ARIMA(0, 0, 0) model, which is completely in line with the assumption.

However, what would happen if we take the difference of x anyway?

> difx <- diff(x) > par(mfrow = c(2, 1)) > acf(difx) > pacf(difx) > whitenoise.test(difx) [1] "no. of observations" [1] 99 [1] "T" [1] 49 [1] "CVM stat MN" [1] 1.669876 [1] "tMN" [1] 4.689132 [1] "test value" [1] 0.01904923 > auto.arima(difx, ic = 'bic') Series: difx ARIMA(0,0,1) with zero mean Coefficients: ma1 -0.9639 s.e. 0.0327 sigma^2 estimated as 0.901: log likelihood=-136.64 AIC=277.27 AICc=277.4 BIC=282.46

The above output is quite interesting in a way that we just artificially “created” a model by over-differencing the white noise series.

1) After over-differenced, the series is not a white noise anymore with the null hypothesis rejected, e.g. p-value = 0.02.

2) In addition, the auto.arima() suggests that an ARIMA(0, 0, 1) model might fit the data.