arxiv:2503.03602

When Should we Expect Non-Decreasing Returns from Data in Prediction Tasks?

Published on Mar 5, 2025
Abstract

This article studies how the prediction accuracy of a response variable changes as the number of predictors increases, when all variables follow a multivariate normal distribution. Assuming that the correlations between variables are independently drawn, I show that adding variables leads to globally increasing returns to scale when the mean of the correlation distribution is zero. The speed of learning depends positively on the variance of the correlation distribution. I use simulations to study the more complex case of correlation distributions with a non-zero mean and find a pattern of decreasing returns followed by increasing returns to scale, provided the variance of correlations is not degenerate; when it is degenerate, globally decreasing returns emerge. I train a collaborative filtering algorithm on the MovieLens 1M dataset to analyze returns from adding variables in a more realistic setting and find globally increasing returns to scale across 2,000 variables. The results suggest significant scale advantages from additional variables in prediction tasks.
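The question posed in the abstract can be explored numerically with a small sketch. For jointly normal variables, the population R² of the best linear predictor of the response from the first k predictors is the squared multiple correlation, r_k' S_k^{-1} r_k, where S_k is the predictors' correlation matrix and r_k their correlations with the response. The code below traces that quantity as k grows. It is not the paper's method: the function name `r2_curve`, the parameter values, and the way the correlation matrix is built (normalizing a random Gram matrix, which gives approximately zero-mean correlations while guaranteeing a valid matrix) are all assumptions made for illustration, and R² is only one possible measure of prediction accuracy.

```python
import numpy as np


def r2_curve(n_predictors=50, n_factors=120, seed=0):
    """Population R^2 of the best linear predictor using the first k predictors.

    Hypothetical setup (not taken from the paper): a valid correlation matrix
    is obtained by normalizing a random Gram matrix, so off-diagonal
    correlations are roughly centered on zero. Row 0 is the response.
    """
    rng = np.random.default_rng(seed)
    loadings = rng.standard_normal((n_predictors + 1, n_factors))
    cov = loadings @ loadings.T              # positive definite when n_factors > n_predictors + 1
    scale = np.sqrt(np.diag(cov))
    corr = cov / np.outer(scale, scale)      # rescale to unit diagonal

    r2 = []
    for k in range(1, n_predictors + 1):
        s_xx = corr[1:k + 1, 1:k + 1]        # correlations among the first k predictors
        s_xy = corr[1:k + 1, 0]              # their correlations with the response
        # Squared multiple correlation: s_xy' s_xx^{-1} s_xy
        r2.append(float(s_xy @ np.linalg.solve(s_xx, s_xy)))
    return np.array(r2)


if __name__ == "__main__":
    curve = r2_curve()
    gains = np.diff(curve)                   # marginal gain from each added predictor
    print("R^2 with 1, 25, 50 predictors:", curve[0], curve[24], curve[-1])
    print("mean gain, first half vs second half:", gains[:24].mean(), gains[24:].mean())
```

Because the predictor sets are nested, the curve is non-decreasing by construction; whether the marginal gains themselves rise or fall with k is the returns-to-scale question the paper addresses, and comparing the two halves of `gains` is one rough way to inspect it under these assumptions.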
