Secret Formula: Sequencing the Genome of Data Scientists

Many firms are creating data science teams to gain insights from new sources of unstructured and alternative data. But finding people with the right skillsets within the financial markets ecosystem is proving to be a barrier that slows firms’ ability to leverage new data. Joanne Faulkner investigates how, as a result, firms are looking to other industries with established data science practices.

The establishment of data science teams is on the rise in capital markets, particularly on the buy side, as firms believe data science is key to unlocking value from alternative and unstructured data sources such as consumer transactions, geospatial data, or text documents analyzed via natural-language processing, and that the insights gleaned may lead to better investment decisions.

But while these techniques are relatively new to financial services, other industries, such as telecoms, have been leveraging them for some time already. Thus, many firms are now looking beyond the traditional financial-focused talent pools to bolster their data science teams. 

Market participants say the most desired candidates are those who can code and design algorithms—not just analyze data. And while data science requires an extra level of technical ability, a crucial factor is that candidates can “tell stories” with data. 

Mark Ainsworth, head of data insights and analytics at Schroders, leads a team of 15 data scientists within the firm’s Data Insights unit, which was established two and a half years ago. “We’re focused on being a source for alternative data and new techniques—contributing new things to the investment process: unusual, alternative datasets that are too noisy, too messy, and too big to be part of the traditional source of information,” he says. Team members’ backgrounds are varied, with most having analyzed data in other industries, including bioinformatics (understanding biological data, such as DNA profiles) and online retail. One recruit is the former chief data scientist of a travel and events booking site.

Prior to joining Schroders, Ainsworth held varied data analytics roles, including at UK supermarket Tesco, where he mined its loyalty card data for thousands of factors that could be analyzed to gain insight into customer behavior. “It was really important to know the specific questions that mattered to my colleagues in marketing. What were the particular projects that we’re working on? What are the strategic opportunities for Tesco? What generally matters to a retailer? If I knew those things, then I would focus on the specific bit of analysis and the specific way of communicating that analysis that would really make a difference,” he says. Similarly, knowing what questions to ask of which data is pivotal to the success of the data science team at Schroders.

Ainsworth also held roles at British Airways and at Formula One racing team McLaren, where he was a race strategy analyst. The emphasis at McLaren was on visualization and predictive modeling techniques that could be used to build tools to aid decisions “before the race about how much fuel to put in the car, and during the race about when to do the pit stops—basically looking at visuals of data and complex analysis of data.” He says the biggest impact he had at McLaren came from simple data visualization. “There was a particular way of viewing where all the cars were on the track. When it comes to race strategy, it doesn’t really matter what corner somebody’s at; all that really matters is that one car is 10 seconds ahead of the other. I devised a very simple circle view where each car is placed on a circle based on how many seconds around a lap they’ve gone. It’s a live view of where all the cars are.” By knowing which information to take away, the data team was able to have the biggest impact. “In the heat of the moment… you can look at that view and just make sense of the complexity of it much easier.”
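The circle view Ainsworth describes can be illustrated with a minimal Python sketch. It assumes each car’s position is already known as seconds elapsed into its current lap, and it uses a hypothetical 90-second reference lap; the function names are illustrative only, not McLaren’s tool. Time into the lap becomes an angle, so the gap between two cars reads directly in seconds rather than in track corners.

```python
import math

# Hypothetical reference lap time in seconds; the article does not give one.
REFERENCE_LAP_S = 90.0


def circle_view(seconds_into_lap: dict[str, float],
                lap_s: float = REFERENCE_LAP_S) -> dict[str, tuple[float, float, float]]:
    """Place each car on a unit circle by how many seconds around the lap it has gone.

    Returns car -> (angle_degrees, x, y). Angle 0 is the start/finish line,
    increasing in the direction of travel.
    """
    view = {}
    for car, secs in seconds_into_lap.items():
        fraction = (secs % lap_s) / lap_s  # share of the lap completed, 0.0 up to 1.0
        angle = 2 * math.pi * fraction
        view[car] = (math.degrees(angle), math.cos(angle), math.sin(angle))
    return view


def gap_seconds(ahead: float, behind: float, lap_s: float = REFERENCE_LAP_S) -> float:
    """Seconds by which one car leads another around the circle (ignores whole laps)."""
    return (ahead - behind) % lap_s


if __name__ == "__main__":
    print(circle_view({"A": 0.0, "B": 22.5}))
    print(gap_seconds(5.0, 85.0))  # car just past the line vs. car about to cross it
```

Because only elapsed time drives the layout, a car 10 seconds down the road sits the same arc length behind regardless of which corner it is in, which is the simplification the quote emphasizes.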

It’s Not All Rocket Science

Barney Rowe, senior data scientist at Fidelity International, was also new to financial services. Before joining the firm in 2014, he spent 10 years as an astrophysicist and worked at a NASA lab in California on the design of a camera for a space telescope. “My work in general was trying to characterize very weak signals in very noisy data, which was very useful training for the financial world. Having that background in sifting through data to try and find a very weak statistical signal hasn’t hurt,” he says. 

At Fidelity, Rowe says his team is designing tools that make use of advances in behavioral psychology to provide support in scenarios where known blind spots exist. 

As part of a new wider data strategy, Aberdeen Asset Management hired Ruben Lara as its first chief data officer seven months ago, responsible for “the whole value chain of data” within Aberdeen, which includes building out a data science team of four staff by year-end. “Data science is part of a refresh of our data strategy to make sure we deliver an information advantage to Aberdeen,” Lara says. 

The unit is focused on two areas: first, enabling the analytical teams “to do more, faster and better by leveraging more modern technologies to break down some of the silos which have built up over time;” and second, building analytics that will focus on deeper performance analysis within Aberdeen’s equities business. 

Questions such as “What are the reasons for certain performances?” or “What insights into the investment process emerge from the data?” mirror the approaches Lara applied at his previous employers, most recently mobile phone network operator Vodafone, where he was head of Big Data and analytics. Before that, he set up a central analytics team at Spanish telecoms operator Telefónica. “I was focused on the central core businesses, so looking at areas like marketing—how do we improve client profiling, for example. We built a suite of products—Smart Step—which uses telco data captured from the mobile network and some client proprietary data, analyzing it and aggregating it into insights around footfall, catchment for retailers, transportation demands for transportation operators, information about planning for cities, for example. We took a lot of the analysis we had been doing for internal purposes and packaged it into products for those sectors—other companies… with different motivations for what they wanted to know from the data and analytics we could deliver.”

This presented many data challenges. “You’re dealing with a few billion records a day. Those volumes need processing in different ways,” Lara says, adding that it was paramount for the team to have a grasp of distributed computing and to understand how to scale storage up and down using the cloud. “The signals are not precise like a GPS; they are approximate locations, so you have to build models that can fill in the blanks.”

Lara says the data used by both Aberdeen and telcos is complex. “And some of the datasets you can leverage are common. For example, geospatial data can be leveraged by both industries, and is being used by asset managers and investment banks.” But some telcos have been using technologies such as public cloud, distributed computing, and open-source tools like Hadoop, and mining very large datasets, for more than five years. Candidates from outside the financial industry are therefore desirable because of their long experience with these skills. “We are looking for people who are comfortable with at least some of these technologies…. We say in our job ads that experience in finance is desirable, but not a must, which means we get candidates apply from different domains. A finance background is a ‘nice to have,’ but we’re looking for a combination of people who can give us an interesting richness of approaches and creativity to solving problems,” he says.

At the North American Financial Information Summit event held in New York in May, panelists also wrestled with finding people who could tick all the data science boxes—although speakers placed more emphasis on domain knowledge. Kayoshi Wiesner, executive director of global capital markets at Morgan Stanley, said that to him, data science is divided into three components: front-end, models and back-end. “The back-end component is data architecture, and data garbage in [means] garbage out. If you don’t have good data, it doesn’t matter how good your model is—the output doesn’t mean anything. You really have to have a good back-end component to have the sound data science practice.” 

To Wiesner, the front-end component is the output of that architecture: how the data is presented to those who have to use it. “Data visualization is the front-end component… how you make the data models’ output easy to understand and interpretable to the users.” Wiesner said that while many people think of data science as only the model component, the front- and back-end components are equally important.

Cross-Functional Experience

It can be a struggle to find candidates who are both math-oriented and able to communicate in “layman business-speak” to explain the impact that a model will have on the business. With the “three Vs” of Big Data (velocity, volume and variety) set only to increase, data professionals need to learn “something new” to adapt to this new business reality, Wiesner said. “At the same time, a lot of modelers and quants don’t run the models for themselves—they don’t consume the output by themselves; they typically work with someone else within the organization. They need to understand how to communicate with people without a data science background,” he added.

One reason why it’s so challenging to find the right candidates is that they are now expected to have experience across very different fields, Wiesner said—though knowing how to look at data “the right way and how to ask the right questions” is more important than having a double PhD in computer science. 

Fidelity’s Rowe—who joined the firm on a scheme that recruits PhDs from non-finance backgrounds—says that while the ability to write code is a desired skill, the most important thing is an ability to learn quickly. “We are a research function…. We’re not only delivering insights to our investment team; we’re trying to work out the most effective way to deploy new datasets, visualization and analysis to most effectively do that. There is no best practice; there is no rulebook that we can follow. We’re aiming to set those standards ourselves. We’re looking for people who are flexible, intrepid, and are used to there being no established rulebook—which is essentially people who are used to a research environment,” he says.

Rowe adds that the financial industry has historically focused too narrowly on those with certain finance-related backgrounds, but this is now changing. “There’s a wealth of skillsets out there that are very transferable. I’d love to hire a biologist—specifically in terms of understanding networks of organisms and ecosystem dynamics. There’s a lot we in finance could learn from that area.” 

For Aberdeen’s Lara, the high demand for data scientists across industries can make filling roles a long process. “There is obviously a lot of competition between industries: finance, telco, and from technology companies like Google and Netflix, who are trying to recruit the same candidates,” he says, though he remains optimistic that Aberdeen can offer candidates an environment where they can see the impact of their work more visibly. 

“I’m not going to argue about the depth and technical mastery that Amazon has—that’s a business that has grown based on good data. However, when you work in some of these companies, you are focusing on a very specific problem. In terms of technical application of your knowledge you have maybe more depth, but less breadth of your skills. Whereas in asset management, there’s an exciting variety of data and problems that you can solve for, and where data scientists are very much needed,” Lara says. “The data science team at Aberdeen has a good opportunity to engage directly with those who are making the decisions based on data, across different datasets, pulling in different techniques, and there is a much shorter connection between what you do, what you analyze, and how it translates into an outcome.”

Schroders’ Ainsworth says that while the firm has been able to attract talent, “there are some people who think they don’t want to work in this industry because everyone wears a suit and it doesn’t sound quite as exciting as a startup in Shoreditch. But in practice, we are a startup: This function didn’t previously exist. I wasn’t aware of any asset manager who had a function like this. It was a startup in the sense that we are working out what problems to solve and how to solve them, and how to create value completely from scratch—and that attracted lots of good people who liked the idea of that without the genuine peril that the thing might go bust at any moment,” he says. 

Data in the DNA?

While finding candidates with the right combination of skills is tricky now, Sarah Biller, former managing director and head of innovation ventures at State Street, says that “within the next three to five years, these will be skills that most of your colleagues possess. They will have some understanding of quantitative techniques—it’s the way the world is going.” 

This is reflected in the emergence of data science skills within finance-oriented college courses, including programs at MIT and Harvard, says Biller, who teaches in Brandeis University’s FinTech Master of Science program, whose syllabus covers how to assess and handle Big Data, how machine learning and AI can be applied to new and proprietary datasets, and how to code in Python. Programs like these will be a “powerful draw to larger firms… not too dissimilar to what we saw with the Masters of Science programs which came out of the US in the mid- to late 1990s to 2000s,” she says.

Thus, data scientists are not a special species, but rather represent a set of skills and capabilities that will spread across organizations, she says. “The data science continuum is the entire process of abstracting information from a growing variety of information and putting it into a format that we can use to make better decisions. It’s not just the use of advanced techniques like deep learning…. It starts well before that: it’s the handling of the data.”
