Yet Another Blog in Statistical Computing

I can calculate the motion of heavenly bodies but not the madness of people. - Isaac Newton

Archive for May 2013

Rmagic, A Handy Interface Bridging Python and R

Rmagic (http://ipython.org/ipython-doc/dev/config/extensions/rmagic.html) is an IPython extension that uses rpy2 under the hood and provides a convenient interface for accessing R from IPython. Compared with using rpy2 directly, the rmagic extension allows users to exchange objects between IPython and R more flexibly and to run a single R function or a whole block of R code conveniently.
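For instance, a whole block of R code can be run inside one notebook cell with the %%R cell magic. The snippet below is only a minimal sketch, assuming a pandas DataFrame named pydf already exists in the IPython session:

%%R -i pydf -o rdf
# run a block of R code in a single cell:
# -i pushes pydf from ipython into R; -o pulls rdf back into ipython
rdf <- data.frame(pydf)
print(summary(rdf))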

Below is an example demonstrating a simple use case: pushing a pandas DataFrame object into R, converting it to an R data.frame, and then transferring it back into a new pandas DataFrame object.

In [1]: import pandas as pd

In [2]: # READ DATA INTO PANDAS DATAFRAME

In [3]: pydf1 = pd.read_table('../data/csdata.txt', header = 0)

In [4]: print pydf1.describe()
           LEV_LT3     TAX_NDEB      COLLAT1        SIZE1        PROF2  \
count  4421.000000  4421.000000  4421.000000  4421.000000  4421.000000   
mean      0.090832     0.824537     0.317354    13.510870     0.144593   
std       0.193872     2.884129     0.227150     1.692520     0.110908   
min       0.000000     0.000000     0.000000     7.738052     0.000016   
25%       0.000000     0.349381     0.124094    12.316970     0.072123   
50%       0.000000     0.566577     0.287613    13.539574     0.120344   
75%       0.011689     0.789128     0.472355    14.751119     0.187515   
max       0.998372   102.149483     0.995346    18.586632     1.590201   

           GROWTH2          AGE          LIQ        IND2A        IND3A  \
count  4421.000000  4421.000000  4421.000000  4421.000000  4421.000000   
mean     13.619633    20.366433     0.202813     0.611626     0.190228   
std      36.517739    14.538997     0.233256     0.487435     0.392526   
min     -81.247627     6.000000     0.000000     0.000000     0.000000   
25%      -3.563235    11.000000     0.034834     0.000000     0.000000   
50%       6.164303    17.000000     0.108544     1.000000     0.000000   
75%      21.951632    25.000000     0.291366     1.000000     0.000000   
max     681.354187   210.000000     1.000182     1.000000     1.000000   

             IND4A        IND5A  
count  4421.000000  4421.000000  
mean      0.026917     0.099073  
std       0.161859     0.298793  
min       0.000000     0.000000  
25%       0.000000     0.000000  
50%       0.000000     0.000000  
75%       0.000000     0.000000  
max       1.000000     1.000000  

In [5]: # CONVERT PANDAS DATAFRAME TO R DATA.FRAME

In [6]: %load_ext rmagic

In [7]: col = pydf1.columns

In [8]: %R -i pydf1,col colnames(pydf1) <- unlist(col); print(is.matrix(pydf1))
[1] TRUE

In [9]: %R rdf <- data.frame(pydf1); print(is.data.frame(rdf))
[1] TRUE

In [10]: %R print(summary(rdf))
    LEV_LT3           TAX_NDEB           COLLAT1           SIZE1       
 Min.   :0.00000   Min.   :  0.0000   Min.   :0.0000   Min.   : 7.738  
 1st Qu.:0.00000   1st Qu.:  0.3494   1st Qu.:0.1241   1st Qu.:12.317  
 Median :0.00000   Median :  0.5666   Median :0.2876   Median :13.540  
 Mean   :0.09083   Mean   :  0.8245   Mean   :0.3174   Mean   :13.511  
 3rd Qu.:0.01169   3rd Qu.:  0.7891   3rd Qu.:0.4724   3rd Qu.:14.751  
 Max.   :0.99837   Max.   :102.1495   Max.   :0.9953   Max.   :18.587  
     PROF2              GROWTH2             AGE              LIQ         
 Min.   :0.0000158   Min.   :-81.248   Min.   :  6.00   Min.   :0.00000  
 1st Qu.:0.0721233   1st Qu.: -3.563   1st Qu.: 11.00   1st Qu.:0.03483  
 Median :0.1203435   Median :  6.164   Median : 17.00   Median :0.10854  
 Mean   :0.1445929   Mean   : 13.620   Mean   : 20.37   Mean   :0.20281  
 3rd Qu.:0.1875148   3rd Qu.: 21.952   3rd Qu.: 25.00   3rd Qu.:0.29137  
 Max.   :1.5902009   Max.   :681.354   Max.   :210.00   Max.   :1.00018  
     IND2A            IND3A            IND4A             IND5A        
 Min.   :0.0000   Min.   :0.0000   Min.   :0.00000   Min.   :0.00000  
 1st Qu.:0.0000   1st Qu.:0.0000   1st Qu.:0.00000   1st Qu.:0.00000  
 Median :1.0000   Median :0.0000   Median :0.00000   Median :0.00000  
 Mean   :0.6116   Mean   :0.1902   Mean   :0.02692   Mean   :0.09907  
 3rd Qu.:1.0000   3rd Qu.:0.0000   3rd Qu.:0.00000   3rd Qu.:0.00000  
 Max.   :1.0000   Max.   :1.0000   Max.   :1.00000   Max.   :1.00000  

In [11]: # CONVERT R DATA.FRAME BACK TO PANDAS DATAFRAME

In [12]: %R -d rdf

In [13]: pydf2 = pd.DataFrame(rdf)

In [14]: print pydf2.describe()
           LEV_LT3     TAX_NDEB      COLLAT1        SIZE1        PROF2  \
count  4421.000000  4421.000000  4421.000000  4421.000000  4421.000000   
mean      0.090832     0.824537     0.317354    13.510870     0.144593   
std       0.193872     2.884129     0.227150     1.692520     0.110908   
min       0.000000     0.000000     0.000000     7.738052     0.000016   
25%       0.000000     0.349381     0.124094    12.316970     0.072123   
50%       0.000000     0.566577     0.287613    13.539574     0.120344   
75%       0.011689     0.789128     0.472355    14.751119     0.187515   
max       0.998372   102.149483     0.995346    18.586632     1.590201   

           GROWTH2          AGE          LIQ        IND2A        IND3A  \
count  4421.000000  4421.000000  4421.000000  4421.000000  4421.000000   
mean     13.619633    20.366433     0.202813     0.611626     0.190228   
std      36.517739    14.538997     0.233256     0.487435     0.392526   
min     -81.247627     6.000000     0.000000     0.000000     0.000000   
25%      -3.563235    11.000000     0.034834     0.000000     0.000000   
50%       6.164303    17.000000     0.108544     1.000000     0.000000   
75%      21.951632    25.000000     0.291366     1.000000     0.000000   
max     681.354187   210.000000     1.000182     1.000000     1.000000   

             IND4A        IND5A  
count  4421.000000  4421.000000  
mean      0.026917     0.099073  
std       0.161859     0.298793  
min       0.000000     0.000000  
25%       0.000000     0.000000  
50%       0.000000     0.000000  
75%       0.000000     0.000000  
max       1.000000     1.000000

Written by statcompute

May 31, 2013 at 10:25 pm

Posted in PYTHON, S+/R

Import All Text Files in A Folder with Parallel Execution

Sometimes we might need to import all files, e.g. *.txt, with the same data layout from a folder without knowing each file name, and then combine all of the pieces together. The traditional approach uses the lapply() and do.call() functions to accomplish the task. However, when there are a large number of such files and each file is large, this can be computationally expensive.

With the foreach package, we can split the data import task into pieces and distribute them across multiple CPU cores for parallel execution, as shown in the code below.

setwd('../data/csdata')
files <- list.files(pattern = "[.]txt$")

library(rbenchmark)
benchmark(replications = 10, order = "user.self",
  # baseline: read all files sequentially and stack them with do.call(rbind, ...)
  LAPPLY = {
    read.all <- lapply(files, read.table, header = TRUE)
    data1 <- do.call(rbind, read.all)
  },
  # parallel: read the files concurrently across 4 cores with %dopar%
  FOREACH = {
    library(doParallel)
    registerDoParallel(cores = 4)
    library(foreach)
    data2 <- foreach(i = files, .combine = rbind) %dopar% read.table(i, header = TRUE)
  }
)

library(compare)
all.equal(data1, data2)

From the output below, both methods produce identical data.frames. However, the foreach() approach is considerably faster than the lapply() approach, e.g. 0.689 vs. 1.078 seconds elapsed over 10 replications, thanks to the parallel execution.

     test replications elapsed relative user.self sys.self user.child sys.child
2 FOREACH           10   0.689    1.000     0.156    0.076      1.088     0.308
1  LAPPLY           10   1.078    1.565     1.076    0.004      0.000     0.000

Attaching package: ‘compare’

The following object is masked from ‘package:base’:

    isTRUE

[1] TRUE
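Outside of the benchmark harness, the parallel import itself boils down to only a few lines. The sketch below assumes the same files vector and explicitly releases the workers when done:

library(doParallel)

# register a 4-worker cluster for %dopar%
cl <- makeCluster(4)
registerDoParallel(cl)

# read each file in parallel and stack the pieces by row
data2 <- foreach(i = files, .combine = rbind) %dopar% read.table(i, header = TRUE)

# shut down the workers
stopCluster(cl)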

Written by statcompute

May 26, 2013 at 11:55 pm

Posted in S+/R

Test Drive of Parallel Computing with R

Today I did a test run of parallel computing with the snow and multicore packages in R and compared both with the single-threaded lapply() function.

In the test code below, a data.frame with 20 million rows is simulated in an Ubuntu VM with an 8-core CPU and 10 GB of memory. As the baseline, the lapply() function is employed to calculate the aggregation by groups. For comparison, the parLapply() function in the snow package and the mclapply() function in the multicore package are also used to generate the identical aggregated data.

n <- 20000000
set.seed(2013)
df <- data.frame(id = sample(20, n, replace = TRUE), x = rnorm(n), y = runif(n), z = rpois(n, 1))

library(rbenchmark)
benchmark(replications = 5, order = "user.self",
  # baseline: single-threaded group-wise aggregation
  LAPPLY = {
    cat('LAPPLY...\n')
    df1 <- data.frame(lapply(split(df[-1], df[1]), colMeans))
  },
  # socket-based parallelism with an 8-worker cluster from the snow package
  SNOW = {
    library(snow)
    cat('SNOW...\n')
    cl <- makeCluster(8, type = "SOCK")
    df2 <- data.frame(parLapply(cl, split(df[-1], df[1]), colMeans))
    stopCluster(cl)
  },
  # fork-based parallelism with 8 cores from the multicore package
  MULTICORE = {
    cat('MULTICORE...\n')
    library(multicore)
    df3 <- data.frame(mclapply(split(df[-1], df[1]), colMeans, mc.cores = 8))
  }
)

library(compare)
all.equal(df1, df2)
all.equal(df1, df3)

Below is the benchmark output. As shown, the parallel solutions, SNOW and MULTICORE, are more than 3 times as efficient as the LAPPLY baseline in terms of user time (user.self), although the savings in elapsed time are more modest.

       test replications elapsed relative user.self sys.self user.child sys.child
3 MULTICORE            5 101.075    1.000    48.587    6.620    310.771     7.764
2      SNOW            5 127.715    1.264    53.192   13.685      0.012     0.740
1    LAPPLY            5 184.738    1.828   179.855    4.880      0.000     0.000

Attaching package: ‘compare’

The following object is masked from ‘package:base’:

    isTRUE

[1] TRUE
[1] TRUE

To illustrate the CPU usage, several screenshots were also taken to show the difference between the parallel and single-threaded runs.

The first screenshot shows that only 1 out of 8 CPU cores runs at 100% with the lapply() function, while the other 7 stay idle.
[Screenshot from 2013-05-25 22:14:18]

The second screenshot shows that all 8 cores run at 100% with the parLapply() function in the snow package.
[Screenshot from 2013-05-25 22:16:47]

The third screenshot shows that all 8 cores also run at 100% with the mclapply() function in the multicore package.
[Screenshot from 2013-05-25 22:18:40]
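As a side note, the functionality of both snow and multicore has been folded into the parallel package shipped with base R (since R 2.14.0). A minimal sketch of the same aggregation with parallel, assuming the df simulated above, would look like:

library(parallel)

# fork-based version of the same group-wise aggregation (unix-alike systems only)
df4 <- data.frame(mclapply(split(df[-1], df[1]), colMeans, mc.cores = 8))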

Written by statcompute

May 25, 2013 at 11:40 pm

Posted in S+/R

Conversion between Factor and Dummies in R

data(iris)
str(iris)
# OUTPUT: 
# 'data.frame': 150 obs. of  5 variables:
#  $ Sepal.Length: num  5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
#  $ Sepal.Width : num  3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
#  $ Petal.Length: num  1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
#  $ Petal.Width : num  0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
#  $ Species     : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...

### CONVERT THE FACTOR TO DUMMIES ### 
library(caret)
dummies <- predict(dummyVars(~ Species, data = iris), newdata = iris)
head(dummies, n = 3)
# OUTPUT:
#   Species.setosa Species.versicolor Species.virginica
# 1              1                  0                 0
# 2              1                  0                 0
# 3              1                  0                 0

### CONVERT DUMMIES BACK TO THE FACTOR ###
# EXTRACT LEVEL NAMES FROM THE DUMMY COLUMN NAMES, E.G. "Species.setosa" -> "setosa"
header <- unlist(strsplit(colnames(dummies), '[.]'))[2 * (1:ncol(dummies))]
# MAP EACH ROW TO ITS LEVEL INDEX BY MATRIX MULTIPLICATION AND RELABEL AS A FACTOR
species <- factor(dummies %*% 1:ncol(dummies), labels = header)
str(species)
# OUTPUT:
#  Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...

### COMPARE THE ORIGINAL AND THE CALCULATED FACTORS ###
library(compare)
all.equal(species, iris$Species)
# OUTPUT:
# [1] TRUE

Written by statcompute

May 18, 2013 at 10:25 pm

Posted in S+/R

A Prototype of Monotonic Binning Algorithm with R

I’ve been asked many times if I have a piece of R code implementing the monotonic binning algorithm, similar to the ones that I developed with SAS (https://statcompute.wordpress.com/2012/06/10/a-sas-macro-implementing-monotonic-woe-transformation-in-scorecard-development) and with Python (https://statcompute.wordpress.com/2012/12/08/monotonic-binning-with-python). Today, I finally had time to draft a quick prototype with 20 lines of R code, which is, however, barely usable without further polish. The idea is simple: start with up to 50 quantile-based bins from cut2() in the Hmisc package and keep reducing the number of bins until the bad rate is strictly monotonic across bins, i.e. until the Spearman correlation between the bin-level means of x and y reaches +/- 1. It is still a little surprising to me how efficient R can be for algorithm prototyping, much sleeker than a SAS macro.

library(sas7bdat)
library(Hmisc)

bin <- function(x, y){
  # start with at most 50 candidate bins
  n <- min(50, length(unique(x)))
  repeat {
    # keep reducing the number of bins until the bad rate is monotonic
    n   <- n - 1
    d1  <- data.frame(x, y, bin = cut2(x, g = n))
    d2  <- aggregate(d1[-3], d1[3], mean)
    cor <- cor(d2[-1], method = "spearman")
    # stop when the spearman correlation between bin-level means is +/- 1
    if(abs(cor[1, 2]) == 1) break
  }
  d2[2] <- NULL
  colnames(d2) <- c('LEVEL', 'RATE')
  head <- paste(toupper(substitute(y)), " RATE by ", toupper(substitute(x)), sep = '')
  cat("+-", rep("-", nchar(head)), "-+\n", sep = '')
  cat("| ", head, ' |\n', sep = '')
  cat("+-", rep("-", nchar(head)), "-+\n", sep = '')
  print(d2)
  cat("\n")
}

data <- read.sas7bdat("C:\\Users\\liuwensui\\Downloads\\accepts.sas7bdat")
attach(data)

bin(bureau_score, bad)
bin(age_oldest_tr, bad)
bin(tot_income, bad)
bin(tot_tr, bad)

R output:

+--------------------------+
| BAD RATE by BUREAU_SCORE |
+--------------------------+
       LEVEL       RATE
1  [443,618) 0.44639376
2  [618,643) 0.38446602
3  [643,658) 0.31835938
4  [658,673) 0.23819302
5  [673,686) 0.19838057
6  [686,702) 0.17850288
7  [702,715) 0.14168378
8  [715,731) 0.09815951
9  [731,752) 0.07212476
10 [752,776) 0.05487805
11 [776,848] 0.02605210

+---------------------------+
| BAD RATE by AGE_OLDEST_TR |
+---------------------------+
       LEVEL       RATE
1  [  1, 34) 0.33333333
2  [ 34, 62) 0.30560928
3  [ 62, 87) 0.25145068
4  [ 87,113) 0.23346304
5  [113,130) 0.21616162
6  [130,149) 0.20036101
7  [149,168) 0.19361702
8  [168,198) 0.15530303
9  [198,245) 0.11111111
10 [245,308) 0.10700389
11 [308,588] 0.08730159

+------------------------+
| BAD RATE by TOT_INCOME |
+------------------------+
           LEVEL      RATE
1 [   0,   2570) 0.2498715
2 [2570,   4510) 0.2034068
3 [4510,8147167] 0.1602327

+--------------------+
| BAD RATE by TOT_TR |
+--------------------+
    LEVEL      RATE
1 [ 0,12) 0.2672370
2 [12,22) 0.1827676
3 [22,77] 0.1422764

Written by statcompute

May 4, 2013 at 4:33 pm

A SAS Macro for Scorecard Evaluation with Weights

On 09/28/2012, I posted a SAS macro evaluating scorecard performance, e.g. KS / AUC statistics (https://statcompute.wordpress.com/2012/09/28/a-sas-macro-for-scorecard-performance-evaluation). However, that macro is not generic enough to handle cases with a weighting variable. In a recent project that I am working on, there is a weight variable attached to each credit applicant due to reject inference. Therefore, there is no excuse for me to keep neglecting the need for another SAS macro that can handle a weight variable with any positive values in the scorecard evaluation. Below is a quick draft of the macro. You might have to tweak it a little to suit your needs in production.

%macro wt_auc(data = , score = , y = , weight = );
***********************************************************;
* THE MACRO IS TO EVALUATE THE SEPARATION POWER OF A      *;
* SCORECARD WITH WEIGHTS                                  *;
* ------------------------------------------------------- *;
* PARAMETERS:                                             *;
*  DATA  : INPUT DATASET                                  *;
*  SCORE : SCORECARD VARIABLE                             *;
*  Y     : RESPONSE VARIABLE IN (0, 1)                    *;
*  WEIGHT: WEIGHT VARIABLE WITH POSITIVE VALUES           *; 
* ------------------------------------------------------- *;
* OUTPUTS:                                                *;
*  A SUMMARY TABLE WITH KS AND AUC STATISTICS             *;
* ------------------------------------------------------- *;
* CONTACT:                                                *;
*  WENSUI.LIU@53.COM                                      *;
***********************************************************;
options nocenter nonumber nodate mprint mlogic symbolgen
        orientation = landscape ls = 125 formchar = "|----|+|---+=|-/\<>*";

data _tmp1 (keep = &score &y &weight);
  set &data;
  where &score ~= . and &y in (0, 1) and &weight >= 0;
run;

*** CAPTURE THE DIRECTION OF SCORE ***;
ods listing close;
ods output spearmancorr = _cor;
proc corr data = _tmp1 spearman;
  var &y;
  with &score;
run;
ods listing;

data _null_;
  set _cor;
  if &y >= 0 then do;
    call symput('desc', 'descending');
  end;
  else do;
    call symput('desc', ' ');
  end;
run;

*** SUMMARIZE WEIGHTED GOODS AND BADS BY SCORE ***;
proc sql noprint;
create table
  _tmp2 as    
select
  &score                                         as _scr,
  sum(&y)                                        as _bcnt,
  count(*)                                       as _cnt,
  sum(case when &y = 1 then &weight else 0 end)  as _bwt,
  sum(case when &y = 0 then &weight else 0 end)  as _gwt
from
  _tmp1
group by
  &score;

select
  sum(_bwt) into :bsum
from
  _tmp2;
  
select
  sum(_gwt) into :gsum
from
  _tmp2;

select
  sum(_cnt) into :cnt
from
  _tmp2;    
quit;
%put &cnt;

proc sort data = _tmp2;
  by &desc _scr;
run;

*** ACCUMULATE THE WEIGHTED DISTRIBUTIONS AND CALCULATE KS ***;
data _tmp3;
  set _tmp2;
  by &desc _scr;
  retain _gcum _bcum _cntcum;
  _gcum + _gwt;
  _bcum + _bwt;
  _cntcum + _cnt;
  _gpct = _gcum / &gsum;
  _bpct = _bcum / &bsum;
  _ks   = abs(_gpct - _bpct) * 100;
  _rank = int(_cntcum / ceil(&cnt / 10)) + 1;
run;

proc sort data = _tmp3 sortsize = max;
  by _gpct _bpct;
run;

data _tmp4;
  set _tmp3;
  by _gpct _bpct;
  if last._gpct then do;
    _idx + 1;
    output;
  end;
run;

*** CALCULATE AUC WITH THE TRAPEZOID RULE, EXTRACT KS, AND SUMMARIZE BY DECILE ***;
proc sql noprint;
create table
  _tmp5 as
select
  a._gcum as gcum,
  (b._gpct - a._gpct) * (b._bpct + a._bpct) / 2 as dx
from
  _tmp4 as a, _tmp4 as b
where
  a._idx + 1 = b._idx;

select
  sum(dx) format = 15.8 into :AUC
from
  _tmp5;

select
  max(_ks) format = 15.8 into :KS_STAT
from
  _tmp3;

select
  _scr format = 6.2 into :KS_SCORE
from
  _tmp3
where
  _ks = (select max(_ks) from _tmp3);

create table
  _tmp6 as
select
  _rank                       as rank,
  min(_scr)                   as min_scr,
  max(_scr)                   as max_scr,
  sum(_cnt)                   as cnt,
  sum(_gwt + _bwt)            as wt,
  sum(_gwt)                   as gwt,
  sum(_bwt)                   as bwt,
  sum(_bwt) / calculated wt   as bad_rate
from
  _tmp3
group by
  _rank;    
quit;  

proc report data = _last_ spacing = 1 split = "/" headline nowd;
  column("GOOD BAD SEPARATION REPORT FOR %upcase(%trim(&score)) IN DATA %upcase(%trim(&data))/
          MAXIMUM KS = %trim(&ks_stat) AT SCORE POINT %trim(&ks_score) and AUC STATISTICS = %trim(&auc)/ /"
          rank min_scr max_scr cnt wt gwt bwt bad_rate);
  define rank       / noprint order order = data;
  define min_scr    / "MIN/SCORE"             width = 10 format = 9.2        analysis min center;
  define max_scr    / "MAX/SCORE"             width = 10 format = 9.2        analysis max center;
  define cnt        / "RAW/COUNT"             width = 10 format = comma9.    analysis sum;
  define wt         / "WEIGHTED/SUM"          width = 15 format = comma14.2  analysis sum;
  define gwt        / "WEIGHTED/GOODS"        width = 15 format = comma14.2  analysis sum;
  define bwt        / "WEIGHTED/BADS"         width = 15 format = comma14.2  analysis sum;
  define bad_rate   / "BAD/RATE"              width = 10 format = percent9.2 order center;
  rbreak after / summarize dol skip;
run;

proc datasets library = work nolist;
  delete _tmp: / memtype = data;
run;
quit;

***********************************************************;
*                     END OF THE MACRO                    *;
***********************************************************;
%mend wt_auc;

In the first demo below, a weight variable with fractional values is tested.

*** TEST CASE OF FRACTIONAL WEIGHTS ***;
data one;
  set data.accepts;
  weight = ranuni(1);
run;

%wt_auc(data = one, score = bureau_score, y = bad, weight = weight);
/*
                   GOOD BAD SEPARATION REPORT FOR BUREAU_SCORE IN DATA ONE
       MAXIMUM KS = 34.89711721 AT SCORE POINT 678.00 and AUC STATISTICS = 0.73521009

    MIN        MAX            RAW        WEIGHTED        WEIGHTED        WEIGHTED    BAD
   SCORE      SCORE         COUNT             SUM           GOODS            BADS    RATE
 -------------------------------------------------------------------------------------------
    443.00     619.00         539          276.29          153.16          123.13   44.56%
    620.00     644.00         551          273.89          175.00           98.89   36.11%
    645.00     660.00         544          263.06          176.88           86.18   32.76%
    661.00     676.00         555          277.26          219.88           57.38   20.70%
    677.00     692.00         572          287.45          230.41           57.04   19.84%
    693.00     707.00         510          251.51          208.25           43.26   17.20%
    708.00     724.00         576          276.31          243.89           32.42   11.73%
    725.00     746.00         566          285.53          262.73           22.80    7.98%
    747.00     772.00         563          285.58          268.95           16.62    5.82%
    773.00     848.00         546          272.40          264.34            8.06    2.96%
 ========== ========== ========== =============== =============== ===============
    443.00     848.00       5,522        2,749.28        2,203.49          545.79
*/

In the second demo, a weight variable with positive integers is also tested.

*** TEST CASE OF INTEGER WEIGHTS ***;
data two;
  set data.accepts;
  weight = rand("poisson", 20);
run;

%wt_auc(data = two, score = bureau_score, y = bad, weight = weight);
/*
                   GOOD BAD SEPARATION REPORT FOR BUREAU_SCORE IN DATA TWO
       MAXIMUM KS = 35.58884479 AT SCORE POINT 679.00 and AUC STATISTICS = 0.73725030

    MIN        MAX            RAW        WEIGHTED        WEIGHTED        WEIGHTED    BAD
   SCORE      SCORE         COUNT             SUM           GOODS            BADS    RATE
 -------------------------------------------------------------------------------------------
    443.00     619.00         539       10,753.00        6,023.00        4,730.00   43.99%
    620.00     644.00         551       11,019.00        6,897.00        4,122.00   37.41%
    645.00     660.00         544       10,917.00        7,479.00        3,438.00   31.49%
    661.00     676.00         555       11,168.00        8,664.00        2,504.00   22.42%
    677.00     692.00         572       11,525.00        9,283.00        2,242.00   19.45%
    693.00     707.00         510       10,226.00        8,594.00        1,632.00   15.96%
    708.00     724.00         576       11,497.00       10,117.00        1,380.00   12.00%
    725.00     746.00         566       11,331.00       10,453.00          878.00    7.75%
    747.00     772.00         563       11,282.00       10,636.00          646.00    5.73%
    773.00     848.00         546       10,893.00       10,598.00          295.00    2.71%
 ========== ========== ========== =============== =============== ===============
    443.00     848.00       5,522      110,611.00       88,744.00       21,867.00
*/

As shown, the macro works well in both cases. Please feel free to let me know if it turns out to be helpful in your work.
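For readers who prefer R, the core calculation in the macro can be cross-checked with a short sketch along the lines below. It is not part of the macro itself; it assumes a data set with a score, a 0/1 bad flag, and a positive weight, with higher scores indicating lower risk, and it anchors the trapezoid rule at (0, 0), so the AUC can differ marginally from the macro's at the boundary.

wt_ks_auc <- function(score, bad, weight) {
  # weighted totals of goods and bads at each distinct score, in ascending score order
  g <- tapply(weight * (bad == 0), score, sum)
  b <- tapply(weight * (bad == 1), score, sum)
  # weighted cumulative distributions of goods and bads
  gcum <- cumsum(g) / sum(g)
  bcum <- cumsum(b) / sum(b)
  # KS: maximum distance between the two cumulative distributions
  ks <- max(abs(gcum - bcum)) * 100
  # AUC: trapezoid rule integrating the bad distribution over the good distribution
  auc <- sum(diff(c(0, gcum)) * (c(0, head(bcum, -1)) + bcum) / 2)
  c(KS = ks, AUC = auc)
}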

Written by statcompute

May 4, 2013 at 12:22 pm