Unlocking Insights: Constructing a Scorecard with Logistic Regression | by Vassily Morozov | Feb, 2024


After a credit card? An insurance policy? Ever wondered about the three-digit number that shapes these decisions?


Scores are used by many industries to make decisions. Financial institutions and insurance providers use scores to determine whether someone is right for credit or a policy. Some countries even use social scoring to determine an individual's trustworthiness and judge their behaviour.

Before scores were used to make automated decisions, a customer would go into a bank and speak to a person about how much they wanted to borrow and why they needed a loan. The bank employee could impose their own thoughts and biases on the decision-making process. Where is this person from? What are they wearing? Even, how do I feel today?

A score levels the playing field and allows everyone to be assessed on the same basis.

Generated by the DeepAI image generator

Recently, I have been taking part in several Kaggle competitions and analyses of featured datasets. The first playground competition of 2024 aimed to determine the likelihood of a customer leaving a bank. This is a common task that is useful for marketing departments. For this competition, I thought I would put aside the tree-based and ensemble modelling techniques usually required to be competitive in these tasks, and go back to basics: a logistic regression.

Here, I will guide you through the development of the logistic regression model, its conversion into a score, and its presentation as a scorecard. The aim is to show how this can reveal insights about your data and its relationship to a binary target. The advantage of this type of model is that it is simpler and easier to explain, even to non-technical audiences.

My Kaggle notebook with all my code and maths can be found here. This article will focus on the highlights.

What is a Score?

The score we are describing here is based on a logistic regression model. The model assigns weights to our input features and outputs a probability that we can convert, via a calibration step, into a score. Once we have this, we can represent it with a scorecard: showing how an individual scores based on their available data.

Let's go through a simple example.

Mr X walks into a bank looking for a loan for a new business. The bank uses a simple score based on income and age to determine whether the individual should be approved.

Mr X is a young individual with a relatively low income. He is penalised for his age, but scores well (second best) in the income band. In total, he scores 24 points on this scorecard, which is a mid-range score (the maximum number of points being 52).

A score cut-off would typically then be applied by the bank, based on internal policy, to say how many points are needed to be accepted. The score itself is based on a logistic regression built on some binary definition, using a set of features to predict the log odds.

In the case of a bank, the logistic regression may be trying to predict those who have missed payments. For an insurance provider, those who have made a claim before. For a social score, those who have ever attended an anarchist gathering (not really sure what these scores would be predicting, but I would be fascinated to know!).

We won't go through everything required for a full model development, but some of the key steps that will be explored are:

  • Weights of Evidence Transformation: Making our continuous features discrete by banding them up, as in the Mr X example.
  • Calibrating our Logistic Regression Outputs to Generate a Score: Turning our probability into a more user-friendly number by converting it into a score.
  • Representing Our Score as a Scorecard: Showing how each feature contributes to the final score.

Weights of Evidence Transformation

In the Mr X example, we saw that the model had two features based on numeric values: the age and income of Mr X. These variables were banded into groups to make it easier to understand the model and what drives an individual's score. Using these continuous variables directly (as opposed to within a group) could mean significantly different scores for small differences in values. In the context of credit or insurance risk, this makes decisions harder to justify and explain.

There are several ways to approach the banding, but typically an initial automated approach is taken before fine-tuning the groupings manually so they make qualitative sense. Here, I fed each continuous feature individually into a decision tree to get an initial set of groupings.
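As an illustration of that automated step, here is a minimal sketch using scikit-learn (the `age` and `churn` arrays are synthetic, made up purely for the example): a shallow decision tree is fit on a single feature, and its split thresholds become the initial band edges.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def initial_bands(x, y, max_bands=6, min_frac=0.05):
    """Fit a shallow decision tree on one feature and return its split
    thresholds as candidate band edges."""
    tree = DecisionTreeClassifier(
        max_leaf_nodes=max_bands,                 # at most this many bands
        min_samples_leaf=int(min_frac * len(x)),  # keep every band reasonably sized
        random_state=0,
    )
    tree.fit(x.reshape(-1, 1), y)
    # Internal nodes carry a real threshold; leaves are marked with feature -2
    edges = sorted(t for t, f in zip(tree.tree_.threshold, tree.tree_.feature) if f != -2)
    return [-np.inf] + edges + [np.inf]

# Synthetic example: older customers are more likely to churn
rng = np.random.default_rng(42)
age = rng.uniform(18, 80, 5000)
churn = (rng.uniform(size=5000) < (age - 18) / 80).astype(int)

edges = initial_bands(age, churn)
bands = np.digitize(age, edges[1:-1])  # band index for each customer
```

These automatic edges would then be adjusted by hand until the groupings make qualitative sense.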

Once the groupings were available, I calculated the weights of evidence for each band. The formula for this is shown below:

Formula for Weights of Evidence (WoE). The distributions can be flipped to reverse the relationship in your features.

This is a commonly used transformation technique in scorecard modelling where a logistic regression is used, because the WoE has a linear relationship to the log odds, the very thing the logistic regression is trying to predict. I will not go into the maths here, as it is covered in full detail in my Kaggle notebook.
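As a minimal sketch of that calculation (the band labels and data below are hypothetical), the WoE of each band compares the band's share of all non-events with its share of all events:

```python
import numpy as np
import pandas as pd

def weight_of_evidence(band, event):
    """Per-band WoE = ln(% of all non-events in the band / % of all events
    in the band). Positive WoE marks a safer-than-average band."""
    df = pd.DataFrame({"band": band, "event": event})
    grp = df.groupby("band")["event"].agg(events="sum", total="count")
    grp["non_events"] = grp["total"] - grp["events"]
    dist_events = grp["events"] / grp["events"].sum()
    dist_non_events = grp["non_events"] / grp["non_events"].sum()
    grp["woe"] = np.log(dist_non_events / dist_events)
    return grp

# Toy data: churn rate rises with age, so WoE should fall with age
bands = ["18-24"] * 100 + ["25-44"] * 100 + ["45+"] * 100
left_bank = [0] * 90 + [1] * 10 + [0] * 80 + [1] * 20 + [0] * 60 + [1] * 40
table = weight_of_evidence(bands, left_bank)
```

On this toy data the WoE falls monotonically with age, which is the kind of trend the plots below are checking for.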

Once we have the weights of evidence for each banded feature, we can visualise the trend. From the Kaggle data used for bank churn prediction, I have included a couple of features to illustrate the transformations.

Image by author

The red bars surrounding each weight of evidence show a 95% confidence interval, implying we are 95% sure that the true weight of evidence falls within this range. Narrow intervals are associated with robust groups that have sufficient volume for us to be confident in the weight of evidence.

For example, categories 16 and 22 of the grouped balance feature have low volumes of customers leaving the bank (19 and 53 cases respectively) and have the widest confidence intervals.

The patterns reveal insights about each feature's relationship with the chance of a customer leaving the bank. The age feature is slightly simpler to understand, so we will tackle that first.

As a customer gets older, they are more likely to leave the bank.

The trend is fairly clear and mostly monotonic, except for some groups; for example, 25–34 year old individuals are less likely to leave than 18–24 year old cases. Unless there is a strong argument to support why this is the case (domain knowledge comes into play!), we would consider grouping these two categories to ensure a monotonic trend.

A monotonic trend is important when making decisions to grant credit or an insurance policy, as this is often a regulatory requirement to make the models interpretable and not just accurate.

This brings us on to the balance feature. The pattern is not clear, and we don't have a real argument to make here. It does seem that customers with lower balances are less likely to leave the bank, but you would need to band several of the groups to make this trend make any sense.

By grouping categories 2–9 and 13–21, and leaving 22 on its own (into bins 1, 2 and 3 respectively), we can start to see the trend. However, the downside of this is losing granularity in our features, which will likely impact downstream model performance.

Image by author

For the Kaggle competition, my model didn't need to be explainable, so I didn't regroup any of the features and just focused on producing the most predictive score based on the automated groupings I applied. In an industry setting, I would think twice about doing this.

It is worth noting that our insights are limited to the features we have available, and there may be other underlying causes for the observed behaviour. For example, the age trend may have been driven by policy changes over time, such as the move to online banking, but there is no feasible way to capture this in the model without more data being available.

If you want to apply automatic groupings to numeric features, perform this transformation, and make these graphs for yourselves, they can be created for any binary classification task using the Python repository I put together here.

Once these features are available, we can fit a logistic regression. The fitted logistic regression will have an intercept, and each feature in the model will have a coefficient assigned to it. From this, we can output the probability that someone is going to leave the bank. I won't spend time here discussing how I fit the regression, but as before, all the details are available in my Kaggle notebook.
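A sketch of this fitting step (the WoE feature values below are simulated stand-ins, not the competition data):

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

# Simulated WoE-transformed features: by construction, WoE values already
# live on the log-odds scale, which is why a linear model suits them
rng = np.random.default_rng(0)
X_woe = pd.DataFrame({
    "age_woe": rng.normal(0.0, 0.5, 2000),
    "balance_woe": rng.normal(0.0, 0.5, 2000),
})
true_logit = -1.0 * X_woe["age_woe"] - 0.5 * X_woe["balance_woe"] - 1.2
y = (rng.uniform(size=2000) < 1 / (1 + np.exp(-true_logit))).astype(int)

model = LogisticRegression()
model.fit(X_woe, y)

# Probability that each customer leaves the bank
prob_leave = model.predict_proba(X_woe)[:, 1]
```

With WoE defined as above (higher WoE = safer band), a well-behaved feature tends to receive a negative coefficient when predicting the event.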

Calibrating our Logistic Regression Outputs to Generate a Score

The fitted logistic regression outputs a probability; however, this is not particularly useful for non-technical users of the score. As such, we need to calibrate these probabilities and transform them into something neater and more interpretable.

Remember that the logistic regression is aimed at predicting the log odds. We can create the score by performing a linear transformation on these odds in the following way:

In credit risk, the "points to double the odds" (PDO) and the score at 1:1 odds are often set to 20 and 500 respectively; however, this is not always the case and the values may differ. For the purposes of my analysis, I stuck to these values.
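Under those values (a PDO of 20 and a score of 500 at 1:1 odds), a sketch of the calibration looks like this; `prob_bad` stands for the model's predicted probability of the "bad" event, here leaving the bank:

```python
import math

PDO = 20          # points to double the odds
BASE_SCORE = 500  # score assigned at 1:1 odds
FACTOR = PDO / math.log(2)

def probability_to_score(prob_bad):
    """Linear transform of the log odds: higher scores mean lower risk."""
    odds_good = (1 - prob_bad) / prob_bad  # odds of the 'good' outcome
    return BASE_SCORE + FACTOR * math.log(odds_good)

print(probability_to_score(0.5))        # 1:1 odds -> 500.0
print(probability_to_score(1 / 3))      # 2:1 good odds -> ~520 (one doubling)
print(probability_to_score(2 / 3))      # 1:2 good odds -> ~480 (one halving)
```

Each doubling of the good:bad odds adds exactly PDO points, which is what makes the scale easy to explain.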

We can visualise the calibrated score by plotting its distribution.

Image by author

I split the distribution by the target variable (whether a customer leaves the bank); this gives a useful validation that all the previous steps were carried out correctly. Those more likely to leave the bank score lower, and those who stay score higher. There is overlap, but a score is rarely perfect!

Based on this score, a marketing department could set a score cut-off to determine which customers should be targeted with a particular marketing campaign. This cut-off can be set using this distribution and by converting a score back to a probability.

Translating a score of 500 gives a probability of 50% (remember that 1:1 odds map to a score of 500 in the calibration step). This implies that customers scoring below 500 are more likely than not to leave the bank. If we wanted to target more of these customers, we would simply raise the score cut-off.

Representing Our Score as a Scorecard

We already know that the logistic regression is made up of an intercept and a set of weights for each of the features used. We also know that the weights of evidence have a direct linear relationship with the log odds. Knowing this, we can convert the weights of evidence for each feature to understand its contribution to the overall score.
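As a sketch of that conversion (the coefficient, intercept, and WoE values below are illustrative, not taken from the fitted competition model), one common convention spreads the intercept and the base score evenly across the n features:

```python
import math

PDO, BASE_SCORE = 20, 500
FACTOR = PDO / math.log(2)

def band_points(coef, woe, intercept, n_features):
    """Points one band of one feature contributes to the total score.
    The sign is flipped because the regression predicts the 'bad' event
    (leaving the bank), while higher scores should mean lower risk."""
    return round(-(coef * woe + intercept / n_features) * FACTOR
                 + BASE_SCORE / n_features)

# Hypothetical two-feature model: intercept -1.2, age coefficient -0.9,
# with illustrative WoE values for each age band
age_woe = {"18-24": 1.0, "25-44": 0.2, "45+": -0.8}
age_points = {band: band_points(-0.9, w, -1.2, 2) for band, w in age_woe.items()}
# Older bands earn fewer points here, mirroring the falling WoE trend
```

Summing one band's points per feature reproduces the calibrated score (up to rounding), which is what lets the scorecard table stand in for the full model.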

I have displayed this for all features in the model in my Kaggle notebook, but below are examples we have already seen when transforming the variables into their weights of evidence form.



The advantage of this representation, as opposed to the weights of evidence form, is that it should make sense to anyone without needing to understand the underlying maths. I can tell a marketing colleague that customers aged 48 to 63 years old score lower than other customers. A customer with no balance in their account is more likely to leave than someone with a high balance.

You may have noticed that in the scorecard, the balance trend is the opposite of what was observed at the weights of evidence stage. Now, low balances score lower. This is due to the coefficient attached to this feature in the model: it is negative, and so it flips the initial trend. This can happen, as there are many interactions between the features during the fitting of the model. A decision must be made as to whether these sorts of interactions are acceptable, or whether the feature should be dropped if the trend becomes unintuitive.

Supporting documentation can explain the full detail of any score and how it was developed (or at least it should!), but with just the scorecard, anyone should be able to get immediate insights!


We have explored some of the key steps in developing a score based on a logistic regression and the insights it can bring. The simplicity of the final output is why this type of score is still used to this day, in the face of more advanced classification techniques.

The score I developed for this competition had an area under the ROC curve of 87.4%, while the top solutions based on ensemble methods were around 90%. This shows that the simple model is still competitive, although not optimal if you are purely chasing accuracy. However, if for your next classification task you are looking for something simple and easily explainable, why not consider a scorecard to gain insights into your data?


[1] Walter Reade, Ashley Chow, Binary Classification with a Bank Churn Dataset (2024), Kaggle.

