Is there always a trade-off between accuracy and variance? A short simulation

Date: Sep 3, 2025
Tags: Academic, Data Analysis, AI, ML, Economics, Methodology, Math, Sklearn

A widespread claim in the statistics and machine learning literature (e.g., ISLR) is that increased model flexibility comes at the cost of decreased robustness on the test set. But is this always true?
This question struck me today while I was listening to a lecture in DSS5120, so I decided to run a small simulation. This post is a record of that attempt.

Simulation

To construct a case where this statement fails, we start by simulating a random dataset in which the response is y <- 3 * x1 + 2 * x2 + x3 + rnorm(1), and x1, x2, x3 are three distinct, normally distributed random variables.
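A minimal Python sketch of this data-generating process might look like the following. The sample size, the seed, the use of standard-normal predictors, and the choice to draw one independent noise term per observation are all assumptions made for illustration; the original simulation may have used different settings.

```python
import numpy as np

# Assumed settings (not taken from the original simulation).
rng = np.random.default_rng(42)
n = 1000

# Three independent, normally distributed predictors, one per column.
X = rng.normal(size=(n, 3))
x1, x2, x3 = X.T

# Response: y = 3*x1 + 2*x2 + x3 + noise, with one N(0, 1) draw per row.
y = 3 * x1 + 2 * x2 + x3 + rng.normal(size=n)

print(X.shape, y.shape)
```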
Next, we split the data into training and test sets at a 4:1 ratio and fit four models of varying flexibility on the training set.
Surprisingly, if we calculate the mean squared error (MSE) on the test set, we find that the training MSE becomes increasingly consistent with the test MSE as flexibility increases.
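The split-and-fit step can be sketched roughly as below. The post does not say which four models were used, so this sketch substitutes ordinary least squares on polynomial expansions of degree 1 to 4 as a stand-in for "varying flexibility". The helper names `design` and `mse`, the seed, and the sample size are likewise hypothetical.

```python
import numpy as np

# Assumed data-generating settings (seed and sample size are illustrative).
rng = np.random.default_rng(0)
n = 1000
X = rng.normal(size=(n, 3))
y = X @ np.array([3.0, 2.0, 1.0]) + rng.normal(size=n)

# 4:1 train/test split on shuffled row indices.
idx = rng.permutation(n)
n_train = n * 4 // 5
train, test = idx[:n_train], idx[n_train:]


def design(X, degree):
    """Intercept plus element-wise powers x, x^2, ..., x^degree of each column."""
    cols = [np.ones(len(X))] + [X ** d for d in range(1, degree + 1)]
    return np.column_stack(cols)


def mse(y_true, y_pred):
    """Mean squared error between observed and predicted values."""
    return float(np.mean((y_true - y_pred) ** 2))


# Assumption: polynomial degree stands in for model flexibility.
results = {}
for degree in (1, 2, 3, 4):
    A_train = design(X[train], degree)
    A_test = design(X[test], degree)
    beta, *_ = np.linalg.lstsq(A_train, y[train], rcond=None)
    results[degree] = (mse(y[train], A_train @ beta), mse(y[test], A_test @ beta))

for degree, (train_mse, test_mse) in results.items():
    print(f"degree={degree}  train MSE={train_mse:.3f}  test MSE={test_mse:.3f}")
```

Printing both errors side by side for each degree makes it easy to see how the gap between training and test MSE behaves as the model becomes more flexible.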
Contrary to the conventional wisdom about the accuracy-variance trade-off, when the training and test sets are drawn from a common distribution, increased model flexibility is associated with consistent training and test MSE.

In what setting does the statement fail?

In this setting, the training and test sets are quite similar. To illustrate, we can calculate their first-, second-, and third-order moments.
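One way to run this comparison is sketched below. It assumes raw (uncentered) sample moments, since the post does not say whether raw or central moments were used, and it regenerates the data with an assumed seed and sample size. The helper `raw_moments` is a hypothetical name.

```python
import numpy as np

# Assumed data-generating settings (seed and sample size are illustrative).
rng = np.random.default_rng(0)
n = 1000
X = rng.normal(size=(n, 3))
y = X @ np.array([3.0, 2.0, 1.0]) + rng.normal(size=n)

# Stack predictors and response so that columns are x1, x2, x3, y.
data = np.column_stack([X, y])

# Same 4:1 split as in the simulation.
idx = rng.permutation(n)
n_train = n * 4 // 5
train, test = data[idx[:n_train]], data[idx[n_train:]]


def raw_moments(a, order):
    """Column-wise raw sample moment: the mean of a**order."""
    return np.mean(a ** order, axis=0)


for k in (1, 2, 3):
    print(f"order {k}")
    print("  train:", np.round(raw_moments(train, k), 3))
    print("  test: ", np.round(raw_moments(test, k), 3))
```

If the two sets really come from a common distribution, the train and test rows for each order should be close to each other, up to sampling noise.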

Implications

  • Be cautious about the assumptions behind a statement. Here, the trade-off implicitly assumes that the training and test sets differ.
  • From this common-distribution perspective, what cross-validation really does is resample the data so that the moments of the training set better approximate the population moments. Consequently, estimation strategies that leverage higher-order moments, such as the Generalized Method of Moments (GMM) in econometrics, could potentially improve efficiency in various statistical learning problems.

© Rongxin 2021 - 2025