Fool’s Gold: Data Mining Digs Up Explosive Errors

Recent studies reveal the prevalence of poor-quality data, exacerbated by increased use of machine learning that allows users to dredge far bigger datasets and identify spurious correlations.


A review of 100 major psychology studies, for instance, found that only 36 percent of the results remained statistically significant when the studies were replicated. Over half the alien planets identified by Nasa’s Kepler telescope turned out to be stars. And in preclinical cancer research, a mere six out of 53 breakthrough studies were found to be reproducible. Quantitative finance does not fare much better.

“It’s a gigantic problem—spurious results are the norm,” says Zak David, co-founder of analytics firm Mile 59, and former engineer of high
