Follow the Search Engines
Earlier this week, I had my first encounter with Apache Hadoop, the open-source framework for distributed storage and processing of unstructured data. Yahoo currently uses the system to store much of its search engine's data, and the technology is modeled on similar technology developed and used by Google.
An industry contact of mine estimates that Yahoo has about 14 TB worth of data stored on its various servers.
The beauty of Hadoop is that it is, theoretically at least, infinitely scalable: it's just a matter of adding more servers to hold the additional data. The only catch is that it's for unstructured data, such as HTML text, which shouldn't come as a surprise considering who contributed to its design.
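To make the scaling idea a little more concrete, here is a rough sketch in plain Python of the map-and-reduce pattern that Hadoop is generally associated with. It is not Hadoop's API; the function names, the sample documents and the use of in-memory lists as stand-ins for servers are all assumptions made for illustration. Each "shard" counts words in its own slice of HTML independently, and the partial counts are merged at the end, so adding shards changes how the work is divided but not the answer.

```python
import re
from collections import Counter
from functools import reduce

TAG_RE = re.compile(r"<[^>]+>")   # crude HTML tag stripper, enough for a sketch
WORD_RE = re.compile(r"[a-z]+")


def map_shard(documents):
    """Map step: count words in one shard's documents, independently of other shards."""
    counts = Counter()
    for html in documents:
        text = TAG_RE.sub(" ", html).lower()
        counts.update(WORD_RE.findall(text))
    return counts


def reduce_counts(partials):
    """Reduce step: merge the per-shard counts into a single total."""
    return reduce(lambda a, b: a + b, partials, Counter())


def word_count(documents, num_shards):
    """Split documents round-robin across hypothetical shards, then map and reduce."""
    shards = [documents[i::num_shards] for i in range(num_shards)]
    return reduce_counts(map_shard(shard) for shard in shards)


# Hypothetical sample data standing in for crawled HTML pages.
DOCS = [
    "<p>Rates rise as markets rally</p>",
    "<h1>Markets</h1><p>Rates hold steady</p>",
    "<div>Court rules on rates</div>",
]

if __name__ == "__main__":
    print(word_count(DOCS, num_shards=2).most_common(3))
```

In a real cluster the shards would live on separate machines and the framework would handle distribution and failures; the sketch only shows why the result does not depend on how many machines share the work.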
There are a number of startups like Hadapt looking for ways to take Hadoop's scalability and mix it with various relational databases. Then there are some new firms, such as MapR, creating a proprietary version of Hadoop, according to industry gossip.
Although much of the data in the financial services industry is structured data, the rising importance of unstructured data when it comes to trading cannot be overstated.
We've seen the rise of commercially available machine-readable news over the past five years, where the news providers tag their stories to make it easier for automated analytics to consume them.
Now traders are looking to add data from unstructured content, such as government reports and court decisions, to their automated decision-making.
The one spot that I'm not hearing about, though, is search engine results. I can't envision anything that would be higher up in the decision-making process than sitting down to Google, Yahoo or Bing, and literally seeing what the world is thinking about in real time. I'm sure that Google has perfected this to a science and will continue to manage its own treasury head and shoulders above the competition. The question is whether or not the search giant will commercialize that offering or keep it to itself.
I'm not sure whether financial firms would go as far as to create their own mini-Googles internally to analyze what is happening on the web, but I have a feeling that there are at least five firms going down this road, if not more, at the moment.
To handle this amount of unstructured data, Hadoop would seem to be the proper technology to adopt.
Copyright Infopro Digital Limited. All rights reserved.