Yet Another Blog in Statistical Computing

I can calculate the motion of heavenly bodies but not the madness of people. -Isaac Newton

Modeling Severity in Operational Losses with Python

When modeling severity measurements in the operational loss with Generalized Linear Models, we might have a couple choices based on different distributional assumptions, including Gamma, Inverse Gaussian, and Lognormal. However, based on my observations from the empirical work, the differences in parameter estimates among these three popular candidates are rather immaterial from the practical standpoint.

Below is a demonstration showing how to model the severity with the insurance data under aforementioned three distributions. As shown, albeit with inferential differences, three models show similar coefficients.

In [1]: # LOAD PACKAGES

In [2]: import pandas as pd

In [3]: import numpy as np

In [4]: import statsmodels.api as sm

In [5]: import statsmodels.formula.api as smf

In [6]: df = pd.read_csv("AutoCollision.csv")

In [7]: df.head()
Out[7]:
     Age Vehicle_Use  Severity  Claim_Count
0  17-20    Pleasure    250.48           21
1  17-20  DriveShort    274.78           40
2  17-20   DriveLong    244.52           23
3  17-20    Business    797.80            5
4  21-24    Pleasure    213.71           63

In [8]: # FIT A GAMMA REGRESSION

In [9]: gamma = smf.glm(formula = "Severity ~ Age + Vehicle_Use", data = df, family = sm.families.Gamma(sm.families.links.log))

In [10]: gamma.fit().summary()
Out[10]:
<class 'statsmodels.iolib.summary.Summary'>
"""
                 Generalized Linear Model Regression Results
==============================================================================
Dep. Variable:               Severity   No. Observations:                   32
Model:                            GLM   Df Residuals:                       21
Model Family:                   Gamma   Df Model:                           10
Link Function:                    log   Scale:                 0.0299607547345
Method:                          IRLS   Log-Likelihood:                -161.35
Date:                Sun, 06 Dec 2015   Deviance:                      0.58114
Time:                        12:59:17   Pearson chi2:                    0.629
No. Iterations:                     8
=============================================================================================
                                coef    std err          z      P>|z|      [95.0% Conf. Int.]
---------------------------------------------------------------------------------------------
Intercept                     6.2413      0.101     61.500      0.000         6.042     6.440
Age[T.21-24]                 -0.2080      0.122     -1.699      0.089        -0.448     0.032
Age[T.25-29]                 -0.2303      0.122     -1.881      0.060        -0.470     0.010
Age[T.30-34]                 -0.2630      0.122     -2.149      0.032        -0.503    -0.023
Age[T.35-39]                 -0.5311      0.122     -4.339      0.000        -0.771    -0.291
Age[T.40-49]                 -0.3820      0.122     -3.121      0.002        -0.622    -0.142
Age[T.50-59]                 -0.3741      0.122     -3.057      0.002        -0.614    -0.134
Age[T.60+]                   -0.3939      0.122     -3.218      0.001        -0.634    -0.154
Vehicle_Use[T.DriveLong]     -0.3573      0.087     -4.128      0.000        -0.527    -0.188
Vehicle_Use[T.DriveShort]    -0.5053      0.087     -5.839      0.000        -0.675    -0.336
Vehicle_Use[T.Pleasure]      -0.5886      0.087     -6.801      0.000        -0.758    -0.419
=============================================================================================
"""

In [11]: # FIT A INVERSE GAUSSIAN REGRESSION

In [12]: igauss = smf.glm(formula = "Severity ~ Age + Vehicle_Use", data = df, family = sm.families.InverseGaussian(sm.families.links.log))

In [13]: igauss.fit().summary()
Out[13]:
<class 'statsmodels.iolib.summary.Summary'>
"""
                 Generalized Linear Model Regression Results
==============================================================================
Dep. Variable:               Severity   No. Observations:                   32
Model:                            GLM   Df Residuals:                       21
Model Family:         InverseGaussian   Df Model:                           10
Link Function:                    log   Scale:               8.73581523073e-05
Method:                          IRLS   Log-Likelihood:                -156.44
Date:                Sun, 06 Dec 2015   Deviance:                    0.0015945
Time:                        13:01:14   Pearson chi2:                  0.00183
No. Iterations:                     7
=============================================================================================
                                coef    std err          z      P>|z|      [95.0% Conf. Int.]
---------------------------------------------------------------------------------------------
Intercept                     6.1776      0.103     59.957      0.000         5.976     6.379
Age[T.21-24]                 -0.1475      0.116     -1.269      0.204        -0.375     0.080
Age[T.25-29]                 -0.1632      0.116     -1.409      0.159        -0.390     0.064
Age[T.30-34]                 -0.2079      0.115     -1.814      0.070        -0.433     0.017
Age[T.35-39]                 -0.4732      0.108     -4.361      0.000        -0.686    -0.261
Age[T.40-49]                 -0.3299      0.112     -2.954      0.003        -0.549    -0.111
Age[T.50-59]                 -0.3206      0.112     -2.866      0.004        -0.540    -0.101
Age[T.60+]                   -0.3465      0.111     -3.115      0.002        -0.565    -0.128
Vehicle_Use[T.DriveLong]     -0.3334      0.084     -3.992      0.000        -0.497    -0.170
Vehicle_Use[T.DriveShort]    -0.4902      0.081     -6.055      0.000        -0.649    -0.332
Vehicle_Use[T.Pleasure]      -0.5743      0.080     -7.206      0.000        -0.731    -0.418
=============================================================================================
"""

In [14]: # FIT A LOGNORMAL REGRESSION

In [15]: df['Log_Severity'] = np.log(df.Severity)

In [16]: lognormal = smf.glm(formula = "Log_Severity ~ Age + Vehicle_Use", data = df, family = sm.families.Gaussian())

In [17]: lognormal.fit().summary()
Out[17]:
<class 'statsmodels.iolib.summary.Summary'>
"""
                 Generalized Linear Model Regression Results
==============================================================================
Dep. Variable:           Log_Severity   No. Observations:                   32
Model:                            GLM   Df Residuals:                       21
Model Family:                Gaussian   Df Model:                           10
Link Function:               identity   Scale:                 0.0265610360381
Method:                          IRLS   Log-Likelihood:                 19.386
Date:                Sun, 06 Dec 2015   Deviance:                      0.55778
Time:                        13:02:12   Pearson chi2:                    0.558
No. Iterations:                     4
=============================================================================================
                                coef    std err          z      P>|z|      [95.0% Conf. Int.]
---------------------------------------------------------------------------------------------
Intercept                     6.1829      0.096     64.706      0.000         5.996     6.370
Age[T.21-24]                 -0.1667      0.115     -1.447      0.148        -0.393     0.059
Age[T.25-29]                 -0.1872      0.115     -1.624      0.104        -0.413     0.039
Age[T.30-34]                 -0.2163      0.115     -1.877      0.061        -0.442     0.010
Age[T.35-39]                 -0.4901      0.115     -4.252      0.000        -0.716    -0.264
Age[T.40-49]                 -0.3347      0.115     -2.904      0.004        -0.561    -0.109
Age[T.50-59]                 -0.3267      0.115     -2.835      0.005        -0.553    -0.101
Age[T.60+]                   -0.3467      0.115     -3.009      0.003        -0.573    -0.121
Vehicle_Use[T.DriveLong]     -0.3481      0.081     -4.272      0.000        -0.508    -0.188
Vehicle_Use[T.DriveShort]    -0.4903      0.081     -6.016      0.000        -0.650    -0.331
Vehicle_Use[T.Pleasure]      -0.5726      0.081     -7.027      0.000        -0.732    -0.413
=============================================================================================
"""
 
Advertisements

Written by statcompute

December 6, 2015 at 2:50 pm

%d bloggers like this: