Is there always a trade-off between accuracy and variance? A short simulation

date

Sep 3, 2025

summary

A widespread claim in the statistics and machine learning literature is that increased model flexibility comes at the cost of less robust performance on the test set. But is this always true?

Simulation

To construct a case where this statement fails, we simulate a random dataset in which the response is y <- 3 * x1 + 2 * x2 + x3 + rnorm(1), where x1, x2, and x3 are three distinct, normally distributed random variables:
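
A minimal sketch of such a simulation in R might look like the following. The seed, the sample size n = 1000, and the choice of standard normal predictors are assumptions made for illustration. The noise is drawn with rnorm(n) here so that every observation gets its own error term; rnorm(1) applied to whole vectors would be recycled as a single constant.

```r
# Illustrative simulation; the seed and n are assumptions, not values from the post.
set.seed(42)
n <- 1000

# Three independent standard normal predictors.
x1 <- rnorm(n)
x2 <- rnorm(n)
x3 <- rnorm(n)

# Linear response with one independent noise draw per observation.
y <- 3 * x1 + 2 * x2 + x3 + rnorm(n)

sim <- data.frame(y, x1, x2, x3)
str(sim)
```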

Next, we split the data into training and test sets at a 4:1 ratio and fit four models of varying flexibility on the training set:
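
One way to sketch this step is shown below. The post does not say which four models were used, so the polynomial regressions of degree 1 to 4 are an assumption, as are the seed and sample size. The block regenerates the data so that it runs on its own.

```r
# Illustrative data (assumed seed and sample size).
set.seed(42)
n <- 1000
sim <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
sim$y <- 3 * sim$x1 + 2 * sim$x2 + sim$x3 + rnorm(n)

# 4:1 split: 80% of the rows for training, the remaining 20% for testing.
train_idx <- sample(n, size = 4 * (n %/% 5))
train <- sim[train_idx, ]
test <- sim[-train_idx, ]

# Assumed model family: polynomial regressions of increasing degree.
degrees <- 1:4
models <- lapply(degrees, function(d) {
  f <- as.formula(
    sprintf("y ~ poly(x1, %d) + poly(x2, %d) + poly(x3, %d)", d, d, d)
  )
  lm(f, data = train)
})
names(models) <- paste0("degree_", degrees)

# Number of fitted coefficients per model, as a rough measure of flexibility.
sapply(models, function(m) length(coef(m)))
```

Higher polynomial degrees add coefficients and therefore flexibility; any other nested family of models could be substituted in the same way.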

Surprisingly, if we calculate the mean squared error (MSE) on the test set, we find that the training MSE becomes more and more consistent with the test MSE as flexibility increases:
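
The comparison can be sketched as follows, again under assumed settings (seed, sample size, and polynomial models of degree 1 to 4, none of which come from the post). The helper mse() is a hypothetical name introduced here. The sketch only shows how the training MSE, test MSE, and their gap can be tabulated; the exact numbers depend on the seed and on the models chosen.

```r
# Illustrative data, split, and models (all settings are assumptions).
set.seed(42)
n <- 1000
sim <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
sim$y <- 3 * sim$x1 + 2 * sim$x2 + sim$x3 + rnorm(n)

train_idx <- sample(n, size = 4 * (n %/% 5))
train <- sim[train_idx, ]
test <- sim[-train_idx, ]

degrees <- 1:4
models <- lapply(degrees, function(d) {
  f <- as.formula(
    sprintf("y ~ poly(x1, %d) + poly(x2, %d) + poly(x3, %d)", d, d, d)
  )
  lm(f, data = train)
})

# Hypothetical helper: mean squared prediction error of a model on a dataset.
mse <- function(model, data) {
  mean((data$y - predict(model, newdata = data))^2)
}

results <- data.frame(
  degree = degrees,
  train_mse = sapply(models, mse, data = train),
  test_mse = sapply(models, mse, data = test)
)

# Absolute gap between test and training error for each model.
results$gap <- abs(results$test_mse - results$train_mse)
print(results)
```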

Contrary to the conventional wisdom about the accuracy-variance trade-off, when the training and test sets share a common distribution, increased model flexibility is associated with consistent training and test MSE.

In what setting is the statement wrong?

In this setting, the training and test sets are quite similar. To illustrate, we can compare their first-, second-, and third-order moments:
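
A rough sketch of such a comparison is given below. It assumes that "moments" means the mean plus the second and third central moments; raw moments would work just as well. The helper central_moments() is a hypothetical name, and the seed and sample size are illustrative.

```r
# Illustrative data and 4:1 split (assumed seed and sample size).
set.seed(42)
n <- 1000
sim <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
sim$y <- 3 * sim$x1 + 2 * sim$x2 + sim$x3 + rnorm(n)

train_idx <- sample(n, size = 4 * (n %/% 5))
train <- sim[train_idx, ]
test <- sim[-train_idx, ]

# Hypothetical helper: mean, second central moment, and third central moment.
central_moments <- function(v) {
  m <- mean(v)
  c(mean = m, second = mean((v - m)^2), third = mean((v - m)^3))
}

# One column per variable, one row per moment.
train_m <- sapply(train, central_moments)
test_m <- sapply(test, central_moments)

# Differences close to zero indicate that the two sets have similar moments.
round(train_m - test_m, 3)
```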

Implications

Be cautious about the assumption behind a statement: here, that the training and test sets differ

From this perspective of shared distributions, what cross-validation really does is resample the data so that the moments of the training set better approximate the population moments. Consequently, estimation strategies that leverage higher-order moments, such as the Generalized Method of Moments in econometrics, could potentially improve efficiency in various statistical learning problems.