Follow the Search Engines - WatersTechnology.com

Rob Daly, Sell-Side Technology

- Rob Daly
- 25 Mar 2011

Earlier this week, I had my first encounter with Apache Hadoop, the open-source framework for distributed storage and parallel processing of large data sets, unstructured data in particular. Yahoo currently uses the system to store much of its search engine's data, and the technology was modeled on similar systems developed and used by Google.

An industry contact of mine estimates that Yahoo has about 14 TB worth of data stored on its various servers.

The beauty of Hadoop is that it is, theoretically at least, infinitely scalable: it's just a matter of adding more servers to hold the additional data. The only catch is that it's for unstructured data, such as HTML text, which shouldn't come as a surprise considering who contributed to its design.

There are a number of startups, such as Hadapt, looking for ways to combine Hadoop's scalability with various relational databases. Then there are some new firms, such as MapR, that are creating proprietary versions of Hadoop, according to industry gossip.

Although much of the data in the financial services industry is structured, the rising importance of unstructured data in trading should not be underestimated.

We've seen the rise of commercially available machine-readable news over the past five years, where the news providers tag their stories to make it easier for automated analytics to consume them.

Now traders are looking to add data from unstructured sources, such as government reports and court decisions, to their automated decision-making.

The one spot that I'm not hearing about, though, is search engine results. I can't envision anything that would be higher up in the decision-making process than sitting down to Google, Yahoo or Bing, and literally seeing what the world is thinking about in real time. I'm sure that Google has perfected this to a science and will continue to manage its own treasury head and shoulders above the competition. The question is whether or not the search giant will commercialize that offering or keep it to itself.

I'm not sure whether financial firms would go as far as to create their own mini-Googles internally to analyze what is happening on the web, but I have a feeling that there are at least five firms going down this road, if not more, at the moment.

To handle this amount of unstructured data, Hadoop would seem to be the proper technology to adopt.

