Cutting Big Data Down to Size

- Michael Shashoua
- 28 Oct 2011

Last week, I wrote about a choice that may have to be made between cloud resources and Hadoop for managing and working with big data. A better way to frame the debate may be to ask how best to use cloud computing for data management, especially for "big data."

Tim Vogel, a veteran of data management projects for several major investment firms and service providers on Wall Street, has focused views on this subject. He advises that the cloud is best used for the most immediate, real-time data and analytics. As an example, Vogel says the cloud would be an appropriate resource for working with the most recent five minutes of trade data used to calculate a volume-weighted average price (VWAP).
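To make the five-minute VWAP example concrete: VWAP over a window is the sum of price times quantity for each trade in the window, divided by the total quantity traded in that window. The Python below is a minimal sketch, assuming trades arrive in timestamp order with times in seconds; the `Trade` and `RollingVWAP` names are hypothetical and are not drawn from Vogel's work or any vendor's API.

```python
from __future__ import annotations

from collections import deque
from dataclasses import dataclass
from typing import Optional


@dataclass
class Trade:
    """Hypothetical trade record; timestamp is in seconds (assumed)."""
    timestamp: float
    price: float
    quantity: float


class RollingVWAP:
    """VWAP over trades in the window (now - window_seconds, now]."""

    def __init__(self, window_seconds: float = 300.0) -> None:
        self.window_seconds = window_seconds  # 300 s = five minutes
        self._trades: deque = deque()
        self._notional = 0.0  # running sum of price * quantity
        self._volume = 0.0    # running sum of quantity

    def add(self, trade: Trade) -> None:
        self._trades.append(trade)
        self._notional += trade.price * trade.quantity
        self._volume += trade.quantity
        self._evict(trade.timestamp)

    def _evict(self, now: float) -> None:
        # Drop trades at or before the cutoff; assumes time-ordered arrival.
        cutoff = now - self.window_seconds
        while self._trades and self._trades[0].timestamp <= cutoff:
            old = self._trades.popleft()
            self._notional -= old.price * old.quantity
            self._volume -= old.quantity

    def value(self) -> Optional[float]:
        # No volume in the window means VWAP is undefined.
        if self._volume <= 0:
            return None
        return self._notional / self._volume


vwap = RollingVWAP()
vwap.add(Trade(timestamp=0, price=10.0, quantity=100))
vwap.add(Trade(timestamp=60, price=11.0, quantity=100))
print(vwap.value())  # 10.5
```

Keeping running sums means each new trade updates the figure without rescanning the whole window, which suits a number that has to be refreshed continuously through the trading day.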

"The cloud isn't cheap," he says. "Its best use is not for data on a security that hasn't traded in two weeks. Unless the objective is to cover the complete global universe, like [agency broker and trading technology provider] ITG does." Vogel points to intra-day pricing and intra-day analytics as tasks that cloud computing resources could accelerate or otherwise improve. Data managers should think of securities data in two layers, a descriptive or identification layer and a pricing layer, both of which have to be processed and filtered to produce usable data before it is sent to cloud resources.

The active universe of securities as a whole, including fundamental data and analytics on those securities, is really a superset of the data firms are trying to handle on a daily basis, observes Vogel. With that in mind, the task in applying cloud computing to big data could actually be making big data smaller: breaking it down into parts, or cutting it down to size. That would certainly reduce the bandwidth needed to send data to and retrieve it from the cloud, and to keep local and cloud-stored data consistently reconciled.
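One way to picture "cutting big data down to size" is to join the identification layer to the pricing layer and keep only securities that have traded recently, using Vogel's two-week example as the cutoff. The Python below is a hypothetical sketch: the record types, field names, and `active_subset` function are illustrative assumptions, not a description of any firm's actual process.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class ReferenceRecord:
    """Descriptive/identification layer (hypothetical fields)."""
    security_id: str
    description: str


@dataclass(frozen=True)
class PriceRecord:
    """Pricing layer (hypothetical fields)."""
    security_id: str
    last_trade_date: date
    last_price: float


def active_subset(
    reference: list[ReferenceRecord],
    prices: list[PriceRecord],
    as_of: date,
    max_idle: timedelta = timedelta(days=14),  # "hasn't traded in two weeks"
) -> list[tuple[ReferenceRecord, PriceRecord]]:
    """Pair reference and price records for securities traded within max_idle.

    Securities with no price record, or whose last trade is older than the
    cutoff, are dropped, leaving a smaller set to send to the cloud.
    """
    latest: dict[str, PriceRecord] = {}
    for p in prices:
        # Keep only the most recent price record per security.
        current = latest.get(p.security_id)
        if current is None or p.last_trade_date > current.last_trade_date:
            latest[p.security_id] = p

    cutoff = as_of - max_idle
    result = []
    for ref in reference:
        price = latest.get(ref.security_id)
        if price is not None and price.last_trade_date >= cutoff:
            result.append((ref, price))
    return result


refs = [
    ReferenceRecord("AAA", "Recently traded"),
    ReferenceRecord("BBB", "Stale"),
    ReferenceRecord("CCC", "No pricing record"),
]
prices = [
    PriceRecord("AAA", date(2011, 10, 27), 25.0),
    PriceRecord("BBB", date(2011, 10, 1), 8.0),
]
print([ref.security_id for ref, _ in active_subset(refs, prices, date(2011, 10, 28))])
```

Filtering before upload, rather than after, is what saves bandwidth: only the active pairs would cross the wire, and only those would need reconciling against local copies.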

If nothing else, this is certainly a different way of looking at handling big data. It is worth considering whether going against the conventional or prevailing wisdom could lead data managers to a better approach. Inside Reference Data would like to know what you think about this. We've reactivated our LinkedIn discussion group, where you can keep up with new stories posted online and live tweets covering conference discussions, and give feedback on questions and opinion pieces from IRD.


Editor's View