Identification of Key Drivers of Net Promoter Score Using a Statistical Classification Model



Introduction
Net Promoter Score (NPS) is a popular metric used across many industries to measure customer advocacy. Introduced by Reichheld (2003), NPS measures the likelihood that an existing customer will recommend a company to a prospective customer. NPS is derived from a single question that may be included as part of a larger customer survey. The question asks the customer to use a scale of 0 to 10 to rate their willingness and intention to recommend the company to another person. Ratings of 9 and 10 characterize so-called 'promoters,' ratings of 0 through 6 characterize 'detractors,' and ratings of 7 and 8 characterize 'passives.' The NPS is calculated as the percentage of respondents who are promoters minus the percentage who are detractors. The idea behind these labels is as follows. Promoters are thought to be extremely satisfied customers who see little to no room for improvement and consequently would offer persuasive recommendations that could lead to new revenue. Passive ratings, on the other hand, begin to hint at room for improvement, so the effectiveness of a recommendation from a passive may be muted by explicit or implied caveats. Ratings at the low end are thought to be associated with negative experiences that might cloud a recommendation and likely scare off prospective new customers. Additional discussion of the long history of NPS can be found in Hayes (2008). Some implementations of the NPS methodology use reduced 5-point or 7-point scales that align with traditional Likert scales. However it is implemented, the hope is that movements in NPS are positively correlated with revenue growth for the company. While Reichheld's research presented some evidence of that, other findings are not as corroborative (Keiningham et al., 2007). Regardless of whether there is a predictive relationship between NPS and revenue growth, implementing policies and programs within a company that improve NPS is an
intuitively sensible thing to do [see, for example, Vavra (1997)]. A difficult and important question, however, is how to identify the key drivers of NPS. Calculating NPS alone does not do this. This chapter is an illustrative tutorial that demonstrates how a statistical classification model can be used to identify key drivers of NPS. Our premise is that the classification model, the data it operates on, and the analyses it provides could usefully form components of a Decision Support System that can not only provide both snapshot and longitudinal analyses of NPS performance, but also enable analyses that help suggest company initiatives aimed at lifting the NPS. We assume that the NPS question was asked as part of a larger survey that also probed customer satisfaction levels with respect to various dimensions of the company's services. We develop a predictive classification model for customer advocacy (promoter, passive, or detractor) as a function of these service dimensions. A novelty of our classification model is the optional use of constraints on the parameter estimates to enforce a monotonicity property. We provide a detailed explanation of how to fit the model using the SAS software package and show how the fitted model can be used to develop company policies that have promise for improving the NPS. Our primary objective is to teach an interested practitioner how to use customer survey data together with a statistical classifier to identify key drivers of NPS. We present a case study, based on a real-life data collection and analysis project, to illustrate the step-by-step process of building the linkage between customer satisfaction data and NPS.
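The standard 0-to-10 scoring rule described above is easy to make concrete. The sketch below (ours, not from the chapter, which works in SAS) computes NPS from a list of raw ratings:

```python
def nps(ratings):
    """Classic NPS: % promoters (ratings 9-10) minus % detractors (ratings 0-6),
    on the 0-to-10 scale; passives (7-8) count only in the denominator."""
    if not ratings:
        raise ValueError("need at least one rating")
    promoters = sum(1 for r in ratings if r >= 9)
    detractors = sum(1 for r in ratings if r <= 6)
    return 100.0 * (promoters - detractors) / len(ratings)
```

For example, a sample with 70% promoters and 20% detractors yields an NPS of 50.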

Logistic regression
In this section we provide a brief review of logistic and multinomial regression. Allen and Rao (2000) is a good reference that contains more detail than we provide and additionally has example applications pertaining to customer satisfaction modeling.

Binomial logistic regression
The binomial logistic regression model assumes that the response variable is binary (0/1). This could be the case, for example, if a customer is simply asked the question "Would you recommend us to a friend?" Let {Y_i, i = 1, …, n} denote the responses from n customers, assigning a "1" for Yes and a "0" for No. Suppose a number of other data items (covariates) are polled from the customer on the same survey instrument. These items might measure the satisfaction of the customer across a wide variety of service dimensions and might be measured on a traditional Likert scale. We let x_i denote the vector of covariates for the i-th sampled customer and note that it reflects the use of dummy variable coding for covariates that are categorical. For example, if the first covariate is measured on a 5-point Likert scale, its value is encoded into x_i using five dummy variables. The model links the response probability to the covariates through

   P(Y_i = 1 | x_i) = exp(α + β′x_i) / (1 + exp(α + β′x_i)),

and the likelihood function is

   L(α, β) = ∏_{i=1}^{n} P(Y_i = 1 | x_i)^{Y_i} [1 − P(Y_i = 1 | x_i)]^{1−Y_i},   (1)

with the maximum likelihood estimate (MLE) being the value of (α, β) that maximizes this function. Once the MLE is available, the influence of the covariates can be assessed by the magnitudes (relative to their standard errors) of the individual slopes. In particular, it can be ascertained which attributes of customer service have a substantial effect on making the probability of a 'Yes' response high.
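Maximizing the likelihood in (1) is usually done numerically. The chapter does this in SAS; as an illustrative numpy analogue (function and variable names are ours), a few Newton-Raphson steps on the log-likelihood recover the MLE:

```python
import numpy as np

def fit_binary_logit(X, y, iters=25):
    """MLE of (alpha, beta) for P(Y=1|x) = exp(a + b'x) / (1 + exp(a + b'x)),
    via Newton-Raphson (equivalently, iteratively reweighted least squares)."""
    Xd = np.column_stack([np.ones(len(y)), X])    # prepend intercept column
    theta = np.zeros(Xd.shape[1])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-(Xd @ theta)))   # fitted P(Y=1|x_i)
        grad = Xd.T @ (y - p)                     # score vector
        H = Xd.T @ (Xd * (p * (1 - p))[:, None])  # observed information
        theta = theta + np.linalg.solve(H, grad)  # Newton update
    return theta                                  # (alpha, beta_1, ..., beta_p)
```

On simulated data with known coefficients, the estimates land close to the truth, and the ratio of each slope to its standard error gives the Wald-type assessment described above.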

Multinomial logistic regression
Now suppose the response Y_i is ordinal with k levels, as with the 0-to-10 NPS question. A multinomial (cumulative logit) regression model links the covariates to the full response distribution through the link equations

   P(Y_i ≤ j | x_i) = exp(α_j + β′x_i) / (1 + exp(α_j + β′x_i)),   j = 1, …, k−1,   (2)

with intercepts satisfying α_1 ≤ α_2 ≤ … ≤ α_{k−1}. The corresponding likelihood function is

   L(α, β) = ∏_{i=1}^{n} P(Y_i = y_i | x_i),   (3)

and the MLE is the value of (α, β) that maximizes this function. Once the MLE is available, the magnitudes of the slope estimates (relative to their standard errors) can be used to identify the covariates that push the distribution of the response towards 9s and 10s. We note that the ordering constraint on the intercepts is a standard constraint. Later in the chapter, we will discuss an additional and novel constraint that can optionally be imposed on the slope parameters.
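Evaluating the link equations in (2) turns a fitted (α, β) into a full probability distribution over the response levels. A small sketch (ours; it assumes the cumulative-logit form and uses k−1 ordered intercepts):

```python
import numpy as np

def ordinal_probs(alphas, beta, x):
    """Category probabilities under a cumulative-logit model:
    P(Y <= j | x) = exp(alpha_j + beta'x) / (1 + exp(alpha_j + beta'x)).
    `alphas` are the k-1 nondecreasing intercepts; returns P(Y=1), ..., P(Y=k)."""
    eta = np.asarray(alphas, float) + float(np.dot(beta, x))
    cum = 1.0 / (1.0 + np.exp(-eta))          # cumulative probabilities
    cum = np.concatenate([[0.0], cum, [1.0]])
    return np.diff(cum)                       # successive differences give P(Y=j)
```

Note the direction of the link: increasing β′x inflates the cumulative probabilities and hence shifts mass toward the low response levels, which is why the case study later interprets large slopes as 'demerits.'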

Case study
Carestream Health, Inc. (CSH) was formed in 2007 when Onex Corporation of Toronto, Canada purchased Eastman Kodak Company's Health Group and renamed the business Carestream Health. CSH is a $2.5B-per-year company and a world leader in medical imaging (digital and film), healthcare information systems, dental imaging and dental practice management software, molecular imaging, and non-destructive testing. Its customers include medical and dental doctors and staff and healthcare IT professionals, in settings ranging from small offices and clinics to large hospitals and regional and national healthcare programs. A major company initiative is to create a sustainable competitive advantage by delivering the absolute best customer experience in the industry. Customer recommendations are key to growth in the digital medical space, and no one has been able to earn them consistently well. The foundation for taking advantage of this opportunity is to understand what is important to customers, measure their satisfaction and likelihood to recommend based on their experiences, and drive improvement. While descriptive statistics such as trend charts, bar charts, averages and listings of customer verbatim comments are helpful in identifying opportunities to improve the Net Promoter Score (NPS), they are limited in their power. First, they lack quantitative measurements of the correlation between elements of event satisfaction and NPS. As a consequence, it is not clear what impact a given process improvement will have on a customer's likelihood to recommend. Second, they cannot reveal multidimensional relationships; they are limited to single-factor inferences, which may not sufficiently describe the complex relationships between the elements of a customer's experience and their likelihood to recommend. This section summarizes multinomial logistic regression analyses that were applied to 5056 independent customer experience surveys from January 2009 to January 2010. Each survey included a question that
measured (on a 5-point Likert scale) how likely the customer would be to recommend that colleagues purchase imaging solutions from CSH. Five other questions measured the customer's satisfaction (on a 7-point Likert scale) with CSH services obtained in response to an equipment or software problem. Key NPS drivers are revealed through the multinomial logistic regression analyses, and improvement scenarios for specific geographic and business combinations are mapped out. The ability to develop a quantitative model that measures the impact of potential process improvements on NPS significantly enhances the value of the survey data.

CSH Customer survey data
The 5-point Likert responses to the question about willingness to recommend are summarized in Table 1 below. CSH calculates a unique net promoter score from the responses on this variable using the formula

   NPS = Σ_{i=1}^{5} w_i p̂_i,

where w = (−1.25, −0.875, −0.25, 0.75, 1.0) is a vector of weights and p̂_i is the estimated proportion of customers whose recommendation score is i. Two interesting characteristics of the weight vector are, first, that the penalty for a 1 (or 2) exceeds the benefit of a 5 (or 4), and second, that the negative weight for a neutral score is meant to drive policies toward delighting customers.
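The weighted score is a one-line computation; a minimal sketch (function name ours) using the weight vector given in the text:

```python
def csh_nps(p_hat):
    """CSH weighted NPS: sum of w_i * p_hat_i over the 5 recommendation levels,
    with w = (-1.25, -0.875, -0.25, 0.75, 1.0) as stated in the text."""
    w = (-1.25, -0.875, -0.25, 0.75, 1.0)
    if len(p_hat) != 5 or abs(sum(p_hat) - 1.0) > 1e-9:
        raise ValueError("p_hat must be a length-5 probability distribution")
    return sum(wi * pi for wi, pi in zip(w, p_hat))
```

The asymmetry is easy to see here: a population of all 1s scores −1.25, while a population of all 5s scores only +1.0, and even a uniform spread of responses comes out negative.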

Recommendation  Interpretation
1   Without being asked, I will advise others NOT to purchase from you
2   Only if asked, I will advise others NOT to purchase from you
3   I am neutral
4   Only if asked, I will recommend others TO purchase from you
5   Without being asked, I will recommend others TO purchase from you

The customer satisfaction covariates are also coded using the dummy variable scheme. The data on these covariates are the responses to the survey questions identified as q79, q82a, q82b, q82d and q82f. These questions measure customer satisfaction on 'Overall satisfaction with the service event,' 'Satisfaction with CSH knowledge of customer business and operations,' 'Satisfaction with meeting customer service response time requirements,' 'Satisfaction with overall service communications,' and 'Satisfaction with skills of CSH employees,' respectively. Survey questions q82c and q82e, which measure satisfaction with 'Time it took to resolve the problem once work was started' and 'Attitude of CSH employees,' were also considered as covariates, but they did not prove statistically significant in the model.
Their absence from the model does not necessarily imply they are unimportant drivers of overall satisfaction with CSH, but more likely that their influence is correlated with the other dimensions of satisfaction that are in the model. Each customer satisfaction covariate is scored by customers on a 7-point Likert scale (where '1' indicates the customer is "extremely dissatisfied" and '7' indicates "extremely satisfied"), and thus each utilizes 7 dummy variables in the coding scheme. We denote these dummy variable groups as {q79_i}, {q82a_i}, {q82b_i}, {q82d_i} and {q82f_i} for i = 1, …, 7, defined, for example, by q79_i = 1 if the customer response to q79 is i, and q79_i = 0 otherwise. Assembling all of the covariates together, we then have a total of 77 covariates in x. Together with the intercepts {α_i}, the model we have developed has a total of 81 parameters. We note it is conceivable that interactions between the defined covariates could be important contributors to the model. However, interaction effects were difficult to assess with the current data set because of confounding issues. As the data set grows over time, it is conceivable that the confounding issues could be resolved and interaction effects could be tested for statistical significance.
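The non-full-rank dummy coding used here is mechanical: a k-level Likert response becomes k indicator variables, exactly one of which is 1. A short sketch (ours):

```python
def dummy_code(response, levels=7):
    """Non-full-rank dummy coding: a k-level Likert response becomes k 0/1
    indicators, exactly one of which equals 1 (the scheme used for q79, q82a,
    q82b, q82d and q82f, each contributing 7 indicators to x)."""
    if not 1 <= response <= levels:
        raise ValueError("response must be between 1 and levels")
    return [1 if i == response else 0 for i in range(1, levels + 1)]
```

For instance, `dummy_code(3)` yields `[0, 0, 1, 0, 0, 0, 0]`; the five satisfaction questions thus contribute 35 of the 77 covariates, with the demographic covariates supplying the rest.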

Model fitting and interpretation
The SAS code for obtaining maximum likelihood estimates (MLEs) of the model parameters is listed in the appendix. Lines 1-4 read in the data, which is stored as a space-delimited text file 'indata.txt' located in the indicated directory. All of the input variables on the file are coded as integer values. The PROC LOGISTIC section of the code (lines 5-10) directs the fitting of the multinomial logistic regression model. The class statement specifies that all of the covariates are categorical in nature, and the param=glm option selects the dummy variable coding scheme defined in the previous section. The section of the PROC LOGISTIC output entitled 'Type-3 Analysis of Effects' characterizes the statistical significance of the covariates through p-values obtained by referencing a Wald chi-square test statistic to a corresponding null chi-square distribution. Table 3 shows the chi-square tests and the corresponding p-values; all covariate groups are highly significant contributors to the model. One way to assess model adequacy for multinomial logistic regression is to use the model to predict Y and then examine how well the predicted values match the true values of Y. Since the output of the model for each customer is an estimated probability distribution for Y, a natural predictor of Y is the mode of this distribution. We note that this predictor assigns equal cost to all forms of prediction error. More elaborate predictors could be derived by assuming a more complex cost model where, for example, the cost of predicting 5 when the actual value is 1 is higher than the cost of predicting 5 when the actual value is 4. Table 4, the so-called confusion matrix of the predictions, displays the cross-classification of all 5056 customers based on their actual value of Y and the model-predicted value of Y.
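The equal-cost mode predictor and the confusion matrix can be sketched directly from the model's per-customer probability distributions (a sketch, with names of our choosing):

```python
import numpy as np

def mode_predict(prob_rows):
    """Equal-cost predictor: each customer's predicted Y is the mode (argmax)
    of the model's estimated distribution over response levels 1..k."""
    return np.argmax(prob_rows, axis=1) + 1   # +1 because levels start at 1

def confusion_matrix(actual, predicted, k=5):
    """Cross-classification with rows = actual Y and columns = predicted Y."""
    cm = np.zeros((k, k), dtype=int)
    for a, p in zip(actual, predicted):
        cm[a - 1, p - 1] += 1
    return cm
```

A cost-sensitive variant would replace the argmax with the minimizer of expected cost under a specified loss matrix, along the lines discussed in the text.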
Figure 1 is a graphical display of the slopes for each of the customer satisfaction covariates. The larger the coefficient value, the more detrimental the response level is to NPS; the y-axis is therefore labeled 'demerits.' In view of the ordinal nature of the customer satisfaction covariates, the slopes, which represent the effects of the Likert scale levels, should decrease monotonically. That is, the penalty for a 'satisfied' covariate value should be less than or equal to that of a 'dissatisfied' covariate value. It would therefore be logical for the estimated slopes to display a monotone decreasing trend as the response level of the covariate ascends.
Figure 1 shows that the unconstrained MLEs for the slopes associated with the customer satisfaction covariates nearly satisfy the desired monotone property, but not exactly. The aberrations are due to data deficiencies or minor model inadequacies, and they can be resolved by using the constrained logistic regression model introduced in the next section.

Constrained logistic regression
Consider the situation where the i-th covariate is ordinal in nature, perhaps because it is measured on a k-point Likert scale. The CSH data is a good illustration of this situation, since all the customer satisfaction covariates are ordinal variables measured on a 7-point Likert scale. Let the corresponding group of k slopes for this covariate be denoted by β_{i1}, β_{i2}, …, β_{ik}. In order to reflect the information that the covariate levels are ordered, it is quite natural to impose the monotone constraint

   β_{i1} ≥ β_{i2} ≥ … ≥ β_{ik}

onto the parameter space. Adding these constraints complicates the required maximization of the likelihoods in (1) and (3). In this section, however, we will show how this can be done in SAS with PROC NLP. To simplify our use of PROC NLP, it is convenient to work with a full-rank parameterization of the logistic regression model. Because countries are nested within regions, a linear dependency exists between the dummy variables corresponding to regions and countries within regions. We can eliminate the linear dependency by removing region from the model and specifying country as a non-nested factor. The result of this reparameterization is that instead of 6 degrees of freedom for regions and 16 degrees of freedom for countries nested within regions, we equivalently have 22 degrees of freedom for countries. For the same purpose, we also redefine the dummy variable coding used for the other categorical and ordinal covariates by using a full-rank parameterization scheme. In particular, we use k−1 dummy variables (rather than k) to represent a k-level categorical or ordinal variable. With the full-rank parameterization, the highest level of customer satisfaction has a slope parameter that is fixed at 0.
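To build intuition for what the monotone constraint does to a slope group, the pool-adjacent-violators algorithm gives the least-squares projection of a nearly monotone estimate onto the nonincreasing cone. This is an illustrative analogue of our own, not the constrained likelihood maximization that PROC NLP performs below:

```python
def monotone_project(slopes):
    """Least-squares projection of a slope group onto the nonincreasing cone
    beta_1 >= beta_2 >= ... >= beta_k (pool-adjacent-violators algorithm).
    NOTE: illustrative analogue only, not PROC NLP's constrained MLE."""
    blocks = []  # each block holds [block mean, block size]
    for v in slopes:
        blocks.append([float(v), 1])
        # a violation is a block mean LARGER than its predecessor's; pool them
        while len(blocks) > 1 and blocks[-2][0] < blocks[-1][0]:
            m2, w2 = blocks.pop()
            m1, w1 = blocks.pop()
            blocks.append([(m1 * w1 + m2 * w2) / (w1 + w2), w1 + w2])
    out = []
    for mean, w in blocks:
        out.extend([mean] * w)
    return out
```

For example, a slope group estimated as (5, 3, 4, 1), with one small aberration of the kind seen in Figure 1, projects to the monotone sequence (5, 3.5, 3.5, 1).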
Lines 3-10 of the SAS code shown in Appendix B set up the full-rank parameterization of the logistic regression model. Beginning with line 12, PROC NLP is used to derive the MLEs of the parameters over the constrained parameter space. The 'max' statement (line 13) indicates that the objective function is the log-likelihood of the model and that it is to be maximized. The maximization is carried out using a Newton-Raphson algorithm, and the 'parms' statement (line 14) specifies initial values for the intercept and slope parameters. The SAS variables bqj, baj, bbj, bdj and bfj symbolize the slope parameters corresponding to the j-th response level of the customer satisfaction covariates q79, q82a, q82b, q82d and q82f. Similarly, bccj, bbcj and bjj denote the slopes associated with the different countries, business codes and job titles. The 'bounds' and 'lincon' statements (lines 15-21) jointly specify the monotone constraints associated with the intercept parameters and the slopes of the customer satisfaction covariates. Lines 22-29 define the log-likelihood contribution of each customer which, for the i-th customer, is given by

   loglik_i(α, β) = Σ_{y=1}^{5} I(Y_i = y) log P(Y_i = y | x_i).

Figure 2 shows the monotone behavior of the constrained estimates. There is very little difference between the unconstrained and constrained MLEs for the demographic covariates. Recall that for the unconstrained MLEs, the zero for the slope of the last level of each covariate is a structural zero resulting from the non-full-rank dummy variable coding used when fitting the model.
In the case of the constrained MLEs, the slopes of the last levels of the covariates are implied zeros resulting from the full-rank dummy variable coding used when fitting the model. Table 5 shows that incorporating the constraints does not lead to a substantial change in the estimated slopes. In an indirect way, this provides a sanity check of the proposed model. We will use the constrained estimates for the remainder of the case study.

Targeted pathways for improving NPS
Consider now filling out the covariate vector x with the sample frequencies for the observed demographic covariates and with the observed sample distributions for the sub-element covariates. Using this x with the model yields a predicted NPS of 65.7%. The close agreement between the data-based and model-based NPS scores is additional evidence that the model fits the data well, and it also instills confidence in using the model to explore "What If?" scenarios as outlined in Figure 3. Figure 3 defines sixteen "What If?" scenarios, labels them with brief descriptions, and then shows the expected NPS score if the scenario were implemented. Table 6 contains a longer description of how each scenario was implemented. Each scenario can be evaluated on the basis of how much lift it gives to the expected NPS as well as the feasibility of establishing a company program that could make the hypothetical scenario real. We illustrated potential pathways to improve the overall NPS score, but this can also be done with specific sub-populations in mind. For example, if the first region were under study, then one could simply adjust the demographic covariates as illustrated in section 3.4.2 before implementing the scenario adjustments.
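The mechanics of a "What If?" scenario can be sketched as follows: transform the observed covariate distribution according to the hypothetical improvement, push it through the fitted model, and re-score the resulting distribution of Y. The helper names below are ours, and the model is passed in as a stand-in for the fitted link equations:

```python
def lift_low_scores(dist, floor):
    """Scenario transform: move all response mass below `floor` up to `floor`
    (e.g. 'suppose every customer rated q79 at least 6')."""
    out = {lvl: 0.0 for lvl in dist}
    for lvl, p in dist.items():
        out[max(lvl, floor)] += p
    return out

def scenario_nps(dist, floor, model):
    """Expected NPS after the scenario: `model` maps a covariate distribution
    to a distribution over the 5 recommendation levels, which is then scored
    with the CSH weight vector from the text."""
    w = {1: -1.25, 2: -0.875, 3: -0.25, 4: 0.75, 5: 1.0}
    y_dist = model(lift_low_scores(dist, floor))
    return sum(w[lvl] * p for lvl, p in y_dist.items())
```

Comparing `scenario_nps` against the baseline score quantifies the lift attributable to the scenario, which is exactly the comparison summarized in Figure 3.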

Fig. 2.
Constrained MLEs of Slopes for 7-Point Likert Scale Customer Satisfaction Covariates

A purely empirical way to compute NPS is to use the observed distribution (based on all 5,056 survey responses) of Y for p̂.

Table 1.
Meaning of Each Level of Recommendation Score

Combined with the estimated intercept and slope values, the end result is a model that links NPS to what customers perceive to be important. This linkage can then be exploited to determine targeted pathways to improve NPS via improvement plans that are customer-driven. The demographic covariates include the (global) region code, country code, business code and the customer job title. The demographic covariates are coded using the standard dummy variable technique. For example, the region code utilizes 7 binary variables.

Table 4.
Confusion Matrix of the Multinomial Logistic Regression Model

A perfect model would have a diagonal confusion matrix, indicating that the predicted value for each customer coincided identically with the true value. Consider the rows of Table 4 corresponding to Y=4 and Y=5. These two rows account for almost 80% of the customers in the sample. It can be seen that in both cases the predicted value coincides with the actual value about 60% of the time. Neither of these two cases predicts Y=1 or Y=2, and Y=3 is predicted only 4% of the time. The mean values of the predicted Y when Y=4 and Y=5 are 4.28 and 4.59, respectively. The 7% positive bias for the case Y=4 is roughly offset by the 11.8% negative bias for the case Y=5. Looking at the row of Table 4 corresponding to Y=3, we see that 86% of the time the predicted Y is within 1 of the actual Y. The mean value of the predicted Y is 3.77, indicating a 26% positive bias. Considering the rows corresponding to Y=1 and Y=2, where only about 2% of the customers reside, we see that the model struggles to make accurate predictions, often over-estimating the actual value of Y. A hint as to the explanation for the noticeable overestimation associated with the Y=1, Y=2 and Y=3 customers is revealed by examining their responses to the covariate questions. As just one example, the respective mean scores on question q79 ("Overall satisfaction with the service event") are 3.8, 4.1 and 5.2. It seems that a relatively large number of customers who give a low response to Y are inclined to simultaneously give favorable responses to the covariate questions on the survey. Although this might be unexpected, it can possibly be explained by the fact that the covariate questions pertain to the most recent service event, whereas Y is based on a customer's cumulative experience. Overall, Table 4 reflects significant lift afforded by the multinomial logistic regression model for predicting Y. For example, a model that utilized no covariate information would have a confusion matrix whose rows were constant, summing to the row total. In sum, we feel the accuracy of the model is sufficient to learn something about what drives customers to give high responses to Y, though perhaps not sufficient to learn as much about what drives customers to give low responses to Y.

Table 5.
Unconstrained and Constrained Slope MLEs of the Customer Satisfaction Covariates

Table 5 provides a side-by-side comparison of the constrained and unconstrained MLEs for the slopes of the customer satisfaction covariates.

Table 2.
Evaluating the equations in (2) gives the probability distribution for Y for this population profile.