A Practical Approach to Evaluating Positive-Unlabeled (PU) Classifiers in Real-World Business Analytics | by Volodymyr Holomb | Mar, 2023

As businesses increasingly apply machine learning models to collected data, one challenge that arises is the presence of positive-unlabeled (PU) datasets. These datasets contain only a small portion of labelled data, with the remaining samples being unlabeled. While unlabeled samples are often treated as negative, some of them may in fact be positive. PU datasets appear in various business contexts, such as predicting customer churn or upsell opportunities, sales forecasting, and fraud detection.
Evaluating machine learning algorithms on PU datasets can be tricky because traditional metrics may not accurately reflect the model's performance. For example, simply holding out the positive samples for testing and adding unlabeled entries as the negative class can result in a highly skewed confusion matrix inflated by false positives. This happens when the model correctly detects positive samples in the testing set whose labels are, misleadingly, negative.
To address this issue, our team adopted a practical approach that estimates standard binary classification metrics on PU datasets by using information about the expected frequency of positive samples. Our approach uses the prior probability of the positive class (estimated during the fitting of the self-learning classifier) to adjust the false positives and true positives observed on the test set. This enables a more accurate evaluation of the model's performance on PU datasets, even when the positive class is significantly underrepresented.
To demonstrate the efficacy of our approach and run an experiment in a controlled setting, we first created a synthetic binary classification dataset using scikit-learn's make_classification function. The positive samples represent the minority class in the data, and a PU learning scenario is simulated by randomly selecting a subset of the positive samples and removing their labels.
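The setup above can be sketched as follows. This is a minimal illustration, not the article's exact code: the sample count, feature count, class weights, and the 50% label-hiding fraction are all assumed values.

```python
import numpy as np
from sklearn.datasets import make_classification

# Build an imbalanced binary dataset; the positive class is the minority.
X, y_true = make_classification(
    n_samples=5000,
    n_features=10,
    weights=[0.8, 0.2],  # ~20% positives
    random_state=42,
)

# Simulate a PU scenario: hide the labels of a random half of the positives,
# so they look like negatives (i.e., "unlabeled") to the classifier.
rng = np.random.default_rng(42)
positive_idx = np.flatnonzero(y_true == 1)
hidden_idx = rng.choice(positive_idx, size=positive_idx.size // 2, replace=False)

y_pu = y_true.copy()
y_pu[hidden_idx] = 0  # hidden positives now carry the "negative" label
```

The key property of `y_pu` is that every remaining positive label is reliable, while the negative labels are a mixture of true negatives and hidden positives.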
A real-world business dataset may well contain a similar ratio of labelled to unlabelled entries. For example, a dataset used to predict customer churn for the coming year may contain labelled customers from the previous year who did not sign a new yearly contract, as well as current customers who share characteristics with the churned customers but have not yet churned. In this case, the dataset may contain up to 40% churned customers, but only half of them will be labelled as such (reflecting an annual churn rate of 20%).
We then split the data into training and testing sets using the train_test_split function. The features X and a pseudo-labelled version of the target variable y_pu are passed to the classifier for training. To evaluate the classifier's performance, we compute standard machine learning metrics such as accuracy, precision, and recall on the unlabeled version of the testing set, and compare them to the corresponding metrics computed on the original labelled version.
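As a hedged end-to-end sketch of this comparison (the classifier, split size, and seeds here are illustrative stand-ins, not the article's exact setup):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import precision_score
from sklearn.model_selection import train_test_split

# Synthetic PU data: hide half of the positive labels.
X, y_true = make_classification(n_samples=4000, weights=[0.8, 0.2], random_state=42)
rng = np.random.default_rng(42)
y_pu = y_true.copy()
hidden = rng.choice(np.flatnonzero(y_true == 1), size=(y_true == 1).sum() // 2, replace=False)
y_pu[hidden] = 0

# Split features together with both label versions so they stay aligned.
X_tr, X_te, y_pu_tr, y_pu_te, y_true_tr, y_true_te = train_test_split(
    X, y_pu, y_true, test_size=0.3, random_state=42, stratify=y_true
)

# Train on the PU labels only.
clf = RandomForestClassifier(random_state=42).fit(X_tr, y_pu_tr)
pred = clf.predict(X_te)

# Naive evaluation against the PU labels vs. evaluation against the true labels.
naive_precision = precision_score(y_pu_te, pred)
true_precision = precision_score(y_true_te, pred)
```

Evaluating against the PU labels understates precision, since every hidden positive the model finds is counted as a false positive.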
Below we provide a code snippet that demonstrates the implementation of our proposed approach for evaluating classifier performance on PU datasets.
Our compute_confusion_matrix function determines the size of the testing data and identifies the indices of positive samples in the training set. The model's probability estimates for the positive samples in the training set are then obtained, and their mean is computed, representing the probability that a positive sample is labelled.
Next, the function applies the fitted ImPULSE model to predict the probabilities of the positive class for the testing data and creates a confusion matrix using scikit-learn's confusion_matrix function. If the model's prior probability of the positive class is greater than zero, the function adjusts the confusion matrix to account for the potential presence of unlabeled positive samples in the testing data: it estimates the expected number of false positives and true positives due to unlabeled entries and adjusts the matrix accordingly.
To ensure that the resulting confusion matrix matches the size of the testing data, the function rounds and rescales it, adjusting the number of true negatives if needed.
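The procedure described above can be sketched roughly as follows. This is a hedged reconstruction, not the original ImPULSE code: the function name, the use of the Elkan-Noto label frequency c = P(labeled | positive), and the assumption that hidden positives are detected at the same rate as labeled ones are all choices made here for illustration.

```python
import numpy as np
from sklearn.metrics import confusion_matrix

def compute_confusion_matrix_adjusted(model, X_train, y_train_pu, X_test, y_test_pu):
    """Sketch of a PU-adjusted confusion matrix; `model` is assumed to follow
    the scikit-learn API (predict / predict_proba) and to be fitted on PU labels."""
    n_test = len(y_test_pu)

    # Label frequency c = P(labeled | positive), estimated as the mean predicted
    # probability over the labeled positives of the training set (Elkan & Noto, 2008).
    pos_idx = np.flatnonzero(y_train_pu == 1)
    c = model.predict_proba(X_train[pos_idx])[:, 1].mean()

    pred = model.predict(X_test)
    cm = confusion_matrix(y_test_pu, pred).astype(float)
    tn, fp, fn, tp = cm.ravel()  # rows: true label, cols: prediction

    if c > 0:
        # Scaling the labeled positives by 1/c gives the expected total number
        # of positives; the difference is the expected count of hidden positives.
        n_labeled_pos = float(np.sum(y_test_pu == 1))
        n_hidden_pos = n_labeled_pos * (1.0 / c - 1.0)
        # Assume hidden positives are detected at the same rate as labeled ones,
        # and move that many counts from false positives to true positives.
        recall_labeled = tp / max(tp + fn, 1.0)
        moved = min(fp, n_hidden_pos * recall_labeled)
        tp += moved
        fp -= moved

    adjusted = np.array([[tn, fp], [fn, tp]])
    # Round and rescale so the totals still match the test-set size, absorbing
    # any rounding drift into the true-negative cell.
    adjusted = np.round(adjusted * n_test / adjusted.sum()).astype(int)
    adjusted[0, 0] += n_test - adjusted.sum()
    return adjusted

# --- usage sketch on synthetic PU data ---
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, weights=[0.8, 0.2], random_state=0)
rng = np.random.default_rng(0)
y_pu = y.copy()
hidden = rng.choice(np.flatnonzero(y == 1), size=(y == 1).sum() // 2, replace=False)
y_pu[hidden] = 0

X_tr, X_te, y_tr, y_te = train_test_split(X, y_pu, test_size=0.3, random_state=0)
clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
cm_adj = compute_confusion_matrix_adjusted(clf, X_tr, y_tr, X_te, y_te)
```

The rescaling step keeps the cell counts summing to the test-set size even after rounding, matching the behaviour described above.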
After obtaining the adjusted confusion matrix, we can use it to calculate standard machine learning metrics, giving as accurate a picture of the model's performance as possible.
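For completeness, here is a small helper for deriving the standard metrics from any 2x2 confusion matrix (the function name and example numbers are illustrative, not from the article):

```python
import numpy as np

def metrics_from_cm(cm):
    """Standard binary metrics from a 2x2 confusion matrix laid out as
    [[tn, fp], [fn, tp]] (scikit-learn's convention)."""
    cm = np.asarray(cm, dtype=float)
    tn, fp, fn, tp = cm.ravel()
    accuracy = (tp + tn) / cm.sum()
    precision = tp / (tp + fp) if (tp + fp) > 0 else 0.0
    recall = tp / (tp + fn) if (tp + fn) > 0 else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall}

# e.g. an already-adjusted matrix over 100 test samples:
scores = metrics_from_cm([[50, 10], [5, 35]])
```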
You can find the corresponding demo notebook on Jovian and the full code in the GitHub repo.
We have proposed a practical approach for evaluating machine learning models on positive-unlabeled (PU) datasets commonly found in business scenarios, where traditional evaluation metrics may not accurately reflect the model's performance. The approach estimates standard binary classification metrics using the prior probability of the positive class, enabling a more accurate evaluation of the model's performance.
- Jain, Shantanu, et al. "Recovering True Classifier Performance in Positive-Unlabeled Learning." 2017
- Bekker, Jessa, and Jesse Davis. "Learning from Positive and Unlabeled Data: a Survey." 2018
- Agmon, Alon. "Semi-Supervised Classification of Unlabeled Data (PU Learning)." 2022
- Saunders, Jack, and Alex Freitas. "Evaluating the Predictive Performance of Positive-Unlabelled Classifiers: a Brief Critical Review and Practical Recommendations for Improvement." 2022