Fool’s Gold: Data Mining Digs Up Explosive Errors
Recent studies reveal the prevalence of poor-quality data, exacerbated by increased use of machine learning that allows users to dredge far bigger datasets and identify spurious correlations.
A review of 100 major psychology studies, for instance, found that only 36 percent of the replication attempts produced statistically significant results. Over half the alien planets identified by Nasa’s Kepler telescope turned out to be stars. And in preclinical cancer research, a mere six out of 53 breakthrough studies were found to be reproducible. Quantitative finance does not fare much better.
“It’s a gigantic problem—spurious results are the norm,” says Zak David, co-founder of analytics firm Mile 59, and former engineer of high
Copyright Infopro Digital Limited. All rights reserved.