Metrics for Classification models — Part 1

Aparnashibu
4 min read · Jul 27, 2022

Metrics used for evaluating a CNN-based classification model

Evaluating a classification model is a crucial step in building a machine learning model. Determining these metric values is necessary before deploying the model in real-life scenarios.

In classification, a model can produce its output in two ways: it can predict class labels directly, or it can output class probabilities.
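As a minimal sketch of the difference (assuming a typical scikit-learn setup, not taken from this article), a classifier can return either hard labels or per-class probabilities:

```python
# Sketch: the same classifier can output hard labels or class probabilities.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=200, random_state=0)
clf = LogisticRegression().fit(X, y)

labels = clf.predict(X)        # hard class labels (0 or 1)
probs = clf.predict_proba(X)   # per-class probabilities, shape (n_samples, 2)
print(labels[:5], probs[:5, 1])
```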

Consider a binary classification model with two categories labelled ‘YES’ and ‘NO’. Suppose the given data set has 50 YES samples and 50 NO samples, i.e., the class distribution ratio is 50–50%. Similarly, if the ratio is 60–40% or 70–30%, we have a balanced data set, i.e., the model’s predictions will not be biased towards the majority class. For evaluating a model trained on a balanced data set, accuracy is used as the metric. Suppose the distribution ratio is 80–20%; then there is a chance that the model will be biased towards the 80% class, so the data set is imbalanced. In this case, metrics such as the F Beta score, precision and recall are used for model evaluation. Thus the evaluation metric is chosen depending on the nature of the data set.
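A quick way to check whether a data set is balanced before picking a metric is to inspect its class distribution; a minimal sketch (the labels below are invented for illustration):

```python
from collections import Counter

# Hypothetical labels with an 80-20 split, so accuracy alone would mislead.
y = ['YES'] * 80 + ['NO'] * 20

counts = Counter(y)
total = sum(counts.values())
for label, count in counts.items():
    print(f"{label}: {count} ({100 * count / total:.0f}%)")
# YES: 80 (80%)
# NO: 20 (20%)
```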

There are various metrics used for model evaluation: confusion matrix, FPR (Type I error), FNR (Type II error), Recall (TPR, True Positive Rate, or Sensitivity), Precision, Accuracy, F Beta score, Cohen’s Kappa, ROC curve, AUC score, and PR curve.

Confusion Matrix

For a binary classification model, the confusion matrix is a 2x2 matrix with the actual values along one axis and the predicted values along the other. The diagonal elements are the true positive (TP) and true negative (TN) counts, i.e., cases where the actual value and the predicted value are the same. The off-diagonal elements are the false positive (FP) and false negative (FN) counts. TP and TN represent the correct predictions.
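A minimal sketch using scikit-learn’s confusion_matrix (the toy labels below are invented for illustration):

```python
from sklearn.metrics import confusion_matrix

# Toy actual vs. predicted labels for a binary (YES/NO) problem.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]   # 1 = YES, 0 = NO
y_pred = [1, 0, 1, 0, 1, 0, 1, 0]

# With labels=[1, 0] the matrix is laid out as:
# [[TP, FN],
#  [FP, TN]]
cm = confusion_matrix(y_true, y_pred, labels=[1, 0])
tp, fn, fp, tn = cm.ravel()
print(cm)
print("TP:", tp, "FN:", fn, "FP:", fp, "TN:", tn)
```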

FPR (False Positive Rate) = FP/(FP+TN) corresponds to the Type I error, and FNR (False Negative Rate) = FN/(FN+TP) corresponds to the Type II error. In all classification problems, the aim is to reduce both the Type I and Type II errors. For a balanced data set, accuracy can be calculated using the formula: Accuracy = (TP+TN)/(TP+FP+TN+FN). A small worked sketch follows the confusion matrix figures below.

Binary classification confusion matrix
Ternary classification confusion matrix
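Continuing the toy confusion matrix sketch from above, the rates and accuracy work out as follows (the counts are illustrative, not from the article):

```python
# Reusing the TP, FN, FP, TN counts from the confusion matrix sketch above.
tp, fn, fp, tn = 3, 1, 1, 3

fpr = fp / (fp + tn)                        # False Positive Rate (Type I error)
fnr = fn / (fn + tp)                        # False Negative Rate (Type II error)
accuracy = (tp + tn) / (tp + tn + fp + fn)  # valid for a balanced data set

print(f"FPR: {fpr:.2f}, FNR: {fnr:.2f}, Accuracy: {accuracy:.2f}")
# FPR: 0.25, FNR: 0.25, Accuracy: 0.75
```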

For an imbalanced data set, this formula cannot be used to judge the model. Suppose there are 900 images belonging to class A and 100 images belonging to class B; the model will then be biased towards class A, and almost every sample tested will be predicted as class A. If accuracy is calculated for this case using the previous formula, then Accuracy = (TP+TN)/(TP+FP+TN+FN) = (900+0)/1000 = 90%!! This value is misleading: the model never correctly identifies class B.
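To see the trap numerically, here is a minimal sketch reproducing the 900/100 example with a degenerate “model” that always predicts class A:

```python
from sklearn.metrics import accuracy_score, recall_score

# 900 samples of class A (label 0) and 100 samples of class B (label 1).
y_true = [0] * 900 + [1] * 100
# A degenerate model that predicts class A for every sample.
y_pred = [0] * 1000

print(accuracy_score(y_true, y_pred))             # 0.9 -> looks good, but...
print(recall_score(y_true, y_pred, pos_label=1))  # 0.0 -> class B is never found
```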

So for such models with imbalanced data sets, precision and recall are used as the metrics for model evaluation.

Recall = TP/(TP+FN), also called the True Positive Rate (TPR) or Sensitivity: out of all the actual positive values, how many were predicted correctly as positive.

Precision, or Positive Predictive Value, = TP/(TP+FP): out of all the positive predictions, how many were actually positive. In medical diagnosis, a false negative result can be disastrous, as it can mislead patients into believing they do not have a particular disease when in reality they do. Hence calculating recall is essential for a disease prediction model.
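A minimal sketch computing both metrics with scikit-learn (the labels are again invented for illustration; 1 = disease present):

```python
from sklearn.metrics import precision_score, recall_score

# 1 = disease present, 0 = disease absent (toy labels for illustration).
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 1, 0, 0, 0, 0, 0]

precision = precision_score(y_true, y_pred)  # TP / (TP + FP) = 2 / 3
recall = recall_score(y_true, y_pred)        # TP / (TP + FN) = 2 / 4
print(f"Precision: {precision:.2f}, Recall: {recall:.2f}")
```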

Now, for imbalanced data sets where both precision and recall are equally important, the F Beta score is considered.

F Beta score formula

When Beta = 1, it is called the F1 score. When both FP and FN are equally important, select the F1 score. In disease diagnosis both these errors matter, hence the F1 score can be used as a suitable metric for model evaluation. If FP is more important, reduce the Beta value (usually between 0 and 1), and if FN is more important, increase the Beta value (usually in the range 1 to 10).
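For reference, the standard F Beta formula is F Beta = (1 + Beta²) × (Precision × Recall) / (Beta² × Precision + Recall); with Beta = 1 it reduces to the F1 score. A minimal sketch with scikit-learn, reusing the toy labels from the precision/recall example:

```python
from sklearn.metrics import fbeta_score, f1_score

y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 1, 0, 0, 0, 0, 0]

# Beta = 1: precision and recall weighted equally (same as F1).
print(fbeta_score(y_true, y_pred, beta=1.0))
print(f1_score(y_true, y_pred))

# Beta = 0.5: weights precision more (when false positives cost more).
print(fbeta_score(y_true, y_pred, beta=0.5))

# Beta = 2: weights recall more (when false negatives cost more, e.g. disease diagnosis).
print(fbeta_score(y_true, y_pred, beta=2.0))
```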

Formula Sheet
