3.5. Validation curves: plotting scores to evaluate models

Every estimator has its advantages and drawbacks. Its generalization error can be decomposed in terms of bias, variance and noise. The bias of an estimator is its average error for different training sets. The variance of an estimator indicates how sensitive it is to varying training sets. Noise is a property of the data.

In the following plot, we see a function \(f(x) = \cos (\frac{3}{2} \pi x)\) and some noisy samples from that function. We use three different estimators to fit the function: linear regression with polynomial features of degree 1, 4 and 15. We see that the first estimator can at best provide only a poor fit to the samples and the true function because it is too simple (high bias), the second estimator approximates it almost perfectly and the last estimator approximates the training data perfectly but does not fit the true function very well, i.e. it is very sensitive to varying training data (high variance).
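A minimal sketch of that experiment, with an illustrative sample size and noise scale (these are assumptions, not values taken from the plot):

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.RandomState(0)
n_samples = 30
X = np.sort(rng.rand(n_samples))[:, np.newaxis]
y = np.cos(1.5 * np.pi * X.ravel()) + rng.randn(n_samples) * 0.1  # noisy samples of f

for degree in (1, 4, 15):
    # Polynomial features of the given degree followed by ordinary least squares.
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(X, y)
    print(degree, model.score(X, y))  # R^2 on the training samples

The degree-15 model scores best on its own training samples, which is exactly the sensitivity to the training data (high variance) described above.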

Bias and variance are inherent properties of estimators and we usually have to select learning algorithms and hyperparameters so that both bias and variance are as low as possible (see Bias-variance dilemma). Another way to reduce the variance of a model is to use more training data. However, you should only collect more training data if the true function is too complex to be approximated by an estimator with a lower variance.

In the simple one-dimensional problem that we have seen in the example it is easy to see whether the estimator suffers from bias or variance. However, in high-dimensional spaces, models can become very difficult to visualize. For this reason, it is often helpful to use the tools described below.


3.5.1. Validation curve

To validate a model we need a scoring function (see Metrics and scoring: quantifying the quality of predictions), for example accuracy for classifiers. The proper way of choosing multiple hyperparameters of an estimator is of course grid search or similar methods (see Tuning the hyper-parameters of an estimator) that select the hyperparameter with the maximum score on a validation set or multiple validation sets. Note that if we optimize the hyperparameters based on a validation score, the validation score is biased and no longer a good estimate of the generalization error. To get a proper estimate of the generalization error we have to compute the score on another test set.
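A hedged sketch of that procedure (the iris data and the C grid are illustrative assumptions): tune the hyperparameter by cross-validated grid search on a training split, then compute the final score on a held-out test set that played no role in the tuning.

import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
# Hold out a test set that is never touched during hyperparameter selection.
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

search = GridSearchCV(SVC(kernel="linear"), {"C": np.logspace(-7, 3, 3)}, cv=5)
search.fit(X_train, y_train)
print(search.best_params_)           # hyperparameter with the maximum validation score
print(search.score(X_test, y_test))  # proper estimate of the generalization error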

However, it is sometimes helpful to plot the influence of a single hyperparameter on the training score and the validation score to find out whether the estimator is overfitting or underfitting for some hyperparameter values.

The function validation_curve can help in this case:

>>> import numpy as np
>>> from sklearn.model_selection import validation_curve
>>> from sklearn.datasets import load_iris
>>> from sklearn.svm import SVC
>>> np.random.seed(0)
>>> X, y = load_iris(return_X_y=True)
>>> indices = np.arange(y.shape[0])
>>> np.random.shuffle(indices)
>>> X, y = X[indices], y[indices]
>>> train_scores, valid_scores = validation_curve(
...     SVC(kernel="linear"), X, y, param_name="C", param_range=np.logspace(-7, 3, 3),
... )
>>> train_scores
array([[0.90..., 0.94..., 0.91..., 0.89..., 0.92...],
       [0.9... , 0.92..., 0.93..., 0.92..., 0.93...],
       [0.97..., 1...   , 0.98..., 0.97..., 0.99...]])
>>> valid_scores
array([[0.9..., 0.9... , 0.9... , 0.96..., 0.9... ],
       [0.9..., 0.83..., 0.96..., 0.96..., 0.93...],
       [1.... , 0.93..., 1.... , 1.... , 0.9... ]])
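For reference, these scores can be plotted by hand; a minimal sketch continuing the snippet above (train_scores and valid_scores have one row per value of C and one column per cross-validation fold):

import matplotlib.pyplot as plt

param_range = np.logspace(-7, 3, 3)  # same grid as in the call above
plt.semilogx(param_range, train_scores.mean(axis=1), label="Training score")
plt.semilogx(param_range, valid_scores.mean(axis=1), label="Validation score")
plt.xlabel("C")
plt.ylabel("Score")
plt.legend()
plt.show()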

If you intend to plot the validation curves only, the class ValidationCurveDisplay is more direct than using matplotlib manually on the results of a call to validation_curve. You can use the method from_estimator similarly to validation_curve to generate and plot the validation curve:

import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import ValidationCurveDisplay
from sklearn.svm import SVC
from sklearn.utils import shuffle

X, y = load_iris(return_X_y=True)
X, y = shuffle(X, y, random_state=0)
ValidationCurveDisplay.from_estimator(
    SVC(kernel="linear"), X, y, param_name="C", param_range=np.logspace(-7, 3, 10)
)

If the training score and the validation score are both low, the estimator will be underfitting. If the training score is high and the validation score is low, the estimator is overfitting; otherwise it is working very well. A low training score and a high validation score is usually not possible. Underfitting, overfitting, and a working model are shown in the plot below, where we vary the parameter gamma of an SVM with an RBF kernel on the digits dataset.
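A sketch of that experiment (the gamma range is an assumption, chosen only to span the underfitting-to-overfitting regimes):

import numpy as np
from sklearn.datasets import load_digits
from sklearn.model_selection import ValidationCurveDisplay
from sklearn.svm import SVC

X, y = load_digits(return_X_y=True)
# SVC uses an RBF kernel by default; vary gamma on a logarithmic scale.
ValidationCurveDisplay.from_estimator(
    SVC(), X, y, param_name="gamma", param_range=np.logspace(-6, -1, 5)
)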

3.5.2. Learning curve

A learning curve shows the validation and training score of an estimator for varying numbers of training samples. It is a tool to find out how much we benefit from adding more training data and whether the estimator suffers more from a variance error or a bias error. Consider the following example where we plot the learning curve of a naive Bayes classifier and an SVM.

For the naive Bayes, both the validation score and the training score converge to a value that is quite low with increasing size of the training set. Thus, we will probably not benefit much from more training data.

In contrast, for small amounts of data, the training score of the SVM is much greater than the validation score. Adding more training samples will most likely increase generalization.

We can use the function learning_curve to generate the values that are required to plot such a learning curve (number of samples that have been used, the average scores on the training sets and the average scores on the validation sets):

>>> from sklearn.model_selection import learning_curve
>>> from sklearn.svm import SVC
>>> train_sizes, train_scores, valid_scores = learning_curve(
...     SVC(kernel='linear'), X, y, train_sizes=[50, 80, 110], cv=5)
>>> train_sizes
array([ 50,  80, 110])
>>> train_scores
array([[0.98..., 0.98 , 0.98..., 0.98..., 0.98...],
       [0.98..., 1.   , 0.98..., 0.98..., 0.98...],
       [0.98..., 1.   , 0.98..., 0.98..., 0.99...]])
>>> valid_scores
array([[1. , 0.93..., 1. , 1. , 0.96...],
       [1. , 0.96..., 1. , 1. , 0.96...],
       [1. , 0.96..., 1. , 1. , 0.96...]])
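The rows of train_scores and valid_scores correspond to the entries of train_sizes, with one column per cross-validation fold; a short sketch continuing the snippet above averages over the folds to obtain the points usually drawn on a learning curve:

train_mean = train_scores.mean(axis=1)   # average training score per training size
valid_mean = valid_scores.mean(axis=1)   # average validation score per training size
for size, t, v in zip(train_sizes, train_mean, valid_mean):
    print(f"{size} samples: train={t:.3f}, validation={v:.3f}")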

If you intend to plot the learning curves only, the class LearningCurveDisplay will be easier to use. You can use the method from_estimator similarly to learning_curve to generate and plot the learning curve:

from sklearn.datasets import load_iris
from sklearn.model_selection import LearningCurveDisplay
from sklearn.svm import SVC
from sklearn.utils import shuffle

X, y = load_iris(return_X_y=True)
X, y = shuffle(X, y, random_state=0)
LearningCurveDisplay.from_estimator(
    SVC(kernel="linear"), X, y, train_sizes=[50, 80, 110], cv=5
)
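from_estimator returns the display object, so the figure can be customized further through its matplotlib attributes; a small sketch under the same setup (ax_ is the Axes the curves were drawn on):

import matplotlib.pyplot as plt

display = LearningCurveDisplay.from_estimator(
    SVC(kernel="linear"), X, y, train_sizes=[50, 80, 110], cv=5
)
display.ax_.set_title("Learning curve for a linear SVM")
plt.show()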