Michael Shashoua: Shadow Play

Michael Shashoua looks at dark data and suggests its value is based on 'meaning' residing within large datasets.

Michael Shashoua explains that according to Thomson Reuters' Tim Lind, dark data is all about making connections between existing pieces of data.

The term "dark data" is becoming a buzzword in data management, in the same manner as big data rose in consciousness five years ago. Like big data before it, the term dark data raises more questions than answers, at least so far, about what it means.

In a feature in the February issue of Inside Reference Data, Thomson Reuters' resident data management expert Tim Lind said that making connections between existing pieces of data is the definition of dark data. Accessibility of the data is only the first part of the problem, and solving that still leaves firms having to figure out what predictive value the data may have once put into a fuller context.

That is similar to the struggle that firms have experienced with big data. Often when big data has been used in financial services operations discussions—particularly in data management operations discussions—it pertains to what is being done with data, such as how firms mine larger data volumes.

Just as the industry shouldn't get caught up in big data hysteria, it should also be cautious about overreacting to dark data. The most important thing is to know exactly what data you have, how frequently you are getting it, and what you are trying to achieve with it.

"Dark data could be a more appropriate term than big data," says Norbert Boon, executive director of Flytxt, a consultancy in Amsterdam that works on big data analytics issues. "Big data is the structured and unstructured data one stores, for which there is no immediate purpose," he says. "Dark data is much more appropriate—it's the value hidden inside. Finding that is the challenge."

Customer Data Grows
Another mark against the relevance of big data is the possibility that the data generated in our industry is not as large in volume as in other industries, such as telecoms. Increased activity around know-your-customer (KYC) data in the financial sector makes this assertion a question rather than a certainty, however. Swift, the global data and messaging services provider, launched its KYC Registry in December 2014, and it is still growing. The registry is already being upgraded with a new profile feature to collect correspondent firm activity data, making its customer information more complete to better support risk and exposure management.

It could be said that this meets the definition of dark data that Lind proposes: connecting pieces of data that already exist, but haven't been appropriately linked to show a more complete picture of what's occurring in the market or in a firm.

Identifiers' Impact
Also on the subject of knowing what's going on within one's firm, the global legal entity identifier (LEI) initiative, which reached 340,000 registrations at the start of February this year, is likely to eventually reach three times that number, according to Bill Hodash, managing director of business development at the Depository Trust & Clearing Corporation. (The DTCC and Swift operate the Global Markets Entity Identifier utility, which has handled about half of those registrations.)

There may, however, be another spur to accelerate the growth of LEIs: more regulatory action. In Europe, Mifid II and the Central Securities Depositories Regulation already began fueling more LEI issuance in 2014, and the US Securities and Exchange Commission's new amendments to Regulation SBSR in early 2015, which include LEI registration requirements for security-based swaps, are likely to raise registrations further.

This suggests that the dark data or big data stories are far from finished. If the number of LEIs to be managed goes far beyond the current expectation of about 1.5 million, that could create higher data volumes and more pieces of information that end up in different silos within firms.
