Fool’s Gold: Data Mining Digs Up Explosive Errors
Recent studies reveal the prevalence of poor-quality data, exacerbated by increased use of machine learning that allows users to dredge far bigger datasets and identify spurious correlations.

A review of 100 major psychology studies, for instance, found that only 36 percent of the replication attempts produced statistically significant results. Over half the alien planets identified by Nasa’s Kepler telescope turned out to be stars. And in preclinical cancer research, a mere six out of 53 breakthrough studies were found to be reproducible. Quantitative finance does not fare much better.
“It’s a gigantic problem—spurious results are the norm,” says Zak David, co-founder of analytics firm Mile 59, and former engineer of high
Copyright Infopro Digital Limited. All rights reserved.