Poisson and Negative Binomial regressions are two popular approaches to modeling frequency measures in operational loss data and can be implemented in Python with the statsmodels package, as shown below:
In [1]: import pandas as pd

In [2]: import statsmodels.api as sm

In [3]: import statsmodels.formula.api as smf

In [4]: df = pd.read_csv("AutoCollision.csv")

In [5]: # FITTING A POISSON REGRESSION

In [6]: poisson = smf.glm(formula = "Claim_Count ~ Age + Vehicle_Use", data = df, family = sm.families.Poisson(sm.families.links.log))

In [7]: poisson.fit().summary()
Out[7]:
<class 'statsmodels.iolib.summary.Summary'>
"""
                 Generalized Linear Model Regression Results
==============================================================================
Dep. Variable:            Claim_Count   No. Observations:                   32
Model:                            GLM   Df Residuals:                       21
Model Family:                 Poisson   Df Model:                           10
Link Function:                    log   Scale:                             1.0
Method:                          IRLS   Log-Likelihood:                -204.40
Date:                Tue, 08 Dec 2015   Deviance:                       184.72
Time:                        20:31:27   Pearson chi2:                     184.
No. Iterations:                     9
=============================================================================================
                               coef    std err          z      P>|z|    [95.0% Conf. Int.]
---------------------------------------------------------------------------------------------
Intercept                    2.3702      0.110     21.588      0.000       2.155     2.585
Age[T.21-24]                 1.4249      0.118     12.069      0.000       1.193     1.656
Age[T.25-29]                 2.3465      0.111     21.148      0.000       2.129     2.564
Age[T.30-34]                 2.5153      0.110     22.825      0.000       2.299     2.731
Age[T.35-39]                 2.5821      0.110     23.488      0.000       2.367     2.798
Age[T.40-49]                 3.2247      0.108     29.834      0.000       3.013     3.437
Age[T.50-59]                 3.0019      0.109     27.641      0.000       2.789     3.215
Age[T.60+]                   2.6391      0.110     24.053      0.000       2.424     2.854
Vehicle_Use[T.DriveLong]     0.9246      0.036     25.652      0.000       0.854     0.995
Vehicle_Use[T.DriveShort]    1.2856      0.034     37.307      0.000       1.218     1.353
Vehicle_Use[T.Pleasure]      0.1659      0.041      4.002      0.000       0.085     0.247
=============================================================================================
"""

In [8]: # FITTING A NEGATIVE BINOMIAL REGRESSION

In [9]: nbinom = smf.glm(formula = "Claim_Count ~ Age + Vehicle_Use", data = df, family = sm.families.NegativeBinomial(sm.families.links.log))

In [10]: nbinom.fit().summary()
Out[10]:
<class 'statsmodels.iolib.summary.Summary'>
"""
                 Generalized Linear Model Regression Results
==============================================================================
Dep. Variable:            Claim_Count   No. Observations:                   32
Model:                            GLM   Df Residuals:                       21
Model Family:        NegativeBinomial   Df Model:                           10
Link Function:                    log   Scale:                 0.0646089484752
Method:                          IRLS   Log-Likelihood:                -198.15
Date:                Tue, 08 Dec 2015   Deviance:                       1.4436
Time:                        20:31:27   Pearson chi2:                     1.36
No. Iterations:                    11
=============================================================================================
                               coef    std err          z      P>|z|    [95.0% Conf. Int.]
---------------------------------------------------------------------------------------------
Intercept                    2.2939      0.153     14.988      0.000       1.994     2.594
Age[T.21-24]                 1.4546      0.183      7.950      0.000       1.096     1.813
Age[T.25-29]                 2.4133      0.183     13.216      0.000       2.055     2.771
Age[T.30-34]                 2.5636      0.183     14.042      0.000       2.206     2.921
Age[T.35-39]                 2.6259      0.183     14.384      0.000       2.268     2.984
Age[T.40-49]                 3.2408      0.182     17.760      0.000       2.883     3.598
Age[T.50-59]                 2.9717      0.183     16.283      0.000       2.614     3.329
Age[T.60+]                   2.6404      0.183     14.463      0.000       2.283     2.998
Vehicle_Use[T.DriveLong]     0.9480      0.128      7.408      0.000       0.697     1.199
Vehicle_Use[T.DriveShort]    1.3402      0.128     10.480      0.000       1.090     1.591
Vehicle_Use[T.Pleasure]      0.3265      0.128      2.548      0.011       0.075     0.578
=============================================================================================
"""
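The Poisson fit above reports a Pearson chi-square of roughly 184 on only 21 residual degrees of freedom, a ratio of about 8.8, which signals noticeable overdispersion and is what motivates the Negative Binomial (and Quasi-Poisson) alternatives. The snippet below is a minimal sketch, not part of the original session, of how that dispersion ratio could be computed directly from the fitted statsmodels results object:

# A minimal sketch (assumes the same AutoCollision.csv data as above):
# check for overdispersion by comparing the Pearson chi-square statistic
# with the residual degrees of freedom; a ratio well above 1 suggests
# the Poisson assumption of equal mean and variance is violated.
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

df = pd.read_csv("AutoCollision.csv")
poisson_fit = smf.glm(formula="Claim_Count ~ Age + Vehicle_Use", data=df,
                      family=sm.families.Poisson()).fit()

# Pearson chi-square divided by residual degrees of freedom
dispersion = poisson_fit.pearson_chi2 / poisson_fit.df_resid
print(dispersion)

With the output shown earlier, this ratio works out to roughly 184 / 21, close to 8.8, which is consistent with the dispersion parameter reported by the Quasi-Poisson fit further down.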
Although Quasi-Poisson regression is not currently supported by the statsmodels package, we can still estimate the model with the rpy2 package by running R in the back-end. As shown in the output below, the parameter estimates in the Quasi-Poisson model are identical to the ones in the standard Poisson model. If we want a flexible modeling approach for frequency measures in operational loss forecasting without pursuing the more complex Negative Binomial model, Quasi-Poisson regression can be considered a serious contender.
In [11]: # FITTING A QUASI-POISSON REGRESSION

In [12]: import rpy2.robjects as ro

In [13]: from rpy2.robjects import pandas2ri

In [14]: pandas2ri.activate()

In [15]: rdf = pandas2ri.py2ri_pandasdataframe(df)

In [16]: qpoisson = ro.r.glm('Claim_Count ~ Age + Vehicle_Use', data = rdf, family = ro.r('quasipoisson(link = "log")'))

In [17]: print ro.r.summary(qpoisson)

Coefficients:
                      Estimate Std. Error t value Pr(>|t|)
(Intercept)             2.3702     0.3252   7.288 3.55e-07 ***
Age21-24                1.4249     0.3497   4.074 0.000544 ***
Age25-29                2.3465     0.3287   7.140 4.85e-07 ***
Age30-34                2.5153     0.3264   7.705 1.49e-07 ***
Age35-39                2.5821     0.3256   7.929 9.49e-08 ***
Age40-49                3.2247     0.3202  10.072 1.71e-09 ***
Age50-59                3.0019     0.3217   9.331 6.42e-09 ***
Age60+                  2.6391     0.3250   8.120 6.48e-08 ***
Vehicle_UseDriveLong    0.9246     0.1068   8.660 2.26e-08 ***
Vehicle_UseDriveShort   1.2856     0.1021  12.595 2.97e-11 ***
Vehicle_UsePleasure     0.1659     0.1228   1.351 0.191016
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for quasipoisson family taken to be 8.774501)

    Null deviance: 6064.97  on 31  degrees of freedom
Residual deviance:  184.72  on 21  degrees of freedom
AIC: NA

Number of Fisher Scoring iterations: 4
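If pulling in an R dependency through rpy2 is not desirable, a close approximation can be obtained within statsmodels itself. Since the Quasi-Poisson point estimates are identical to the Poisson ones and only the standard errors are rescaled by the Pearson dispersion, passing scale = "X2" to the fit() method of the Poisson GLM produces essentially the same inference. Below is a minimal sketch under that assumption, again using the same AutoCollision.csv data:

# A minimal sketch of a Quasi-Poisson-style fit in pure Python: refit the
# Poisson GLM but estimate the scale as Pearson chi-square / df_resid
# (scale = "X2"), which inflates the standard errors by sqrt(dispersion),
# mirroring what the R quasipoisson family does.
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

df = pd.read_csv("AutoCollision.csv")
qpoisson_like = smf.glm(formula="Claim_Count ~ Age + Vehicle_Use", data=df,
                        family=sm.families.Poisson()).fit(scale="X2")

# Coefficients match the plain Poisson fit; only the standard errors change.
print(qpoisson_like.summary())
print(qpoisson_like.scale)  # estimated dispersion, expected to be close to the 8.77 reported by R

One small difference worth noting: R's quasipoisson summary reports t statistics, whereas statsmodels keeps z statistics, so the reported p-values will not match the R output exactly even though the coefficients and scaled standard errors line up.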