Artificial intelligence and machine learning are changing the ways that firms understand and analyze data. While natural language processing is a decades-old technology, it’s gaining prominence today. So too is its sibling, natural language generation. Anthony Malakian looks at how Wall Street firms and technology providers are deploying these tools to improve research and analysis functions.
On Tuesday, December 20, 2016, Linde, a German supplier of industrial-gas products, announced that it was entering into a "merger of equals" with US competitor Praxair in a deal valued at $35 billion. The pairing had been talked about for months. The market reacted by dropping Linde's stock price from 162.15 to a low of 156.00, and Praxair's from 122.32 to 116.47.
But a subscriber to Bloomberg's Professional service was able to dig deeper. Using the terminal's "tren" function, short(er) for "trend," users could get a series of different views of real-time trends on these two companies, specifically relating to sentiment analysis. First, there's news sentiment: upwards of 1.2 million stories stream through the system each day, and the platform scores these stories for positive or negative sentiment and tracks the volume of stories about each company relative to its normal level.
In terms of news volume, Linde was trending way up, at more than 100 percent above normal. While the market was down on the news (at this particular time, a little after 2 pm on the day of the announcement, Linde's stock was down 4.18 percent), the news sentiment was slightly positive: +0.09 on a scale from -1.00 to +1.00.
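The two readings above, story volume relative to normal and an average sentiment score on a -1.00 to +1.00 scale, can be illustrated with a minimal Python sketch. Bloomberg has not published how it computes either figure, so the trailing-mean baseline, the simple averaging of per-story scores and the function names `volume_vs_normal` and `mean_sentiment` are assumptions made for illustration.

```python
from statistics import mean


def volume_vs_normal(today_count: int, daily_history: list) -> float:
    """Percent by which today's story count exceeds the trailing daily average.

    Assumption: "normal" is a plain mean of past daily story counts.
    """
    baseline = mean(daily_history)
    return (today_count - baseline) * 100.0 / baseline


def mean_sentiment(story_scores: list) -> float:
    """Average per-story sentiment, clipped to the -1.0 to +1.0 scale."""
    if not story_scores:
        return 0.0  # no stories: report neutral
    return max(-1.0, min(1.0, mean(story_scores)))


# Made-up inputs, not Bloomberg data.
print(volume_vs_normal(110, [40, 45, 50, 55, 60]))  # 120.0
print(round(mean_sentiment([0.5, -0.3, 0.2, 0.0]), 2))  # 0.1
```

A production system would presumably weight stories by relevance and use a rolling baseline; the sketch only shows the arithmetic behind a reading such as "100 percent greater than normal."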
Next is the recently added Twitter sentiment tab, which ingests a curated feed of about 500,000 Twitter accounts. The Twitterverse, including StockTwits, was neutral on the announcement: a nice, round 0.0.
Underpinning all of this information is a form of artificial intelligence (AI) known as natural language processing, or NLP.
"What we found was that you couldn't apply the news sentiment algorithm to Twitter; they really are very distinct types of content that people use very different words to express themselves," says Adela Quinones, product manager of news and social media functionality at Bloomberg LP. "As a result, we chose to develop two different algorithms. For this, we use natural language processing and we also use support-vector machines to identify what the sentiment is. The news one, in particular, gets trained in over hundreds of thousands of story segments."
Historically, Bloomberg indexed every piece of news using a rules-based system: which people were mentioned in the story, which companies, and what were the relevant topics? So, if a story flows through and Apple CEO Tim Cook is mentioned in it, or the iPad or iPhone, then "Apple" the company would automatically get tagged as well. The more of these terms a story mentioned, the stronger its relevance to Apple.
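A rules-based tagger of the kind described above can be sketched in a few lines of Python. The `RULES` table and the mention-count relevance score are hypothetical stand-ins, not Bloomberg's actual rules.

```python
import re

# Hypothetical rule table mapping a company tag to trigger terms.
# Illustrative only; not Bloomberg's actual rules.
RULES = {
    "Apple": ["apple", "tim cook", "ipad", "iphone"],
    "Linde": ["linde"],
    "Praxair": ["praxair"],
}


def tag_story(text: str) -> dict:
    """Return {company: relevance}, where relevance is the number of
    trigger-term mentions in the story (whole-word, case-insensitive)."""
    lowered = text.lower()
    tags = {}
    for company, terms in RULES.items():
        hits = sum(
            len(re.findall(r"\b" + re.escape(term) + r"\b", lowered))
            for term in terms
        )
        if hits:
            tags[company] = hits
    return tags


print(tag_story("Tim Cook said iPhone sales rose and the new iPad ships soon."))
# {'Apple': 3}
print(tag_story("Linde and Praxair agree to merge; Linde shares fall."))
# {'Linde': 2, 'Praxair': 1}
```

Every new product, executive or nickname needs a hand-written entry, which is the maintenance burden that pushes firms toward learned, NLP-based classification.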
"All of this has been done historically using rules. Now, increasingly what we're doing is using natural language processing to do this classification," Quinones says. "We started building the NLP-based system out about three years ago and are really starting to see the benefits in a big scale in the last year."
Ubiquitous, If Quiet
Natural language processing serves as a bridge between raw, unstructured data and analytics platforms; basically, it's the melding of human and computer languages using AI and computational linguistics. As analytics platforms take on more importance, so too will natural language processing.
While it's not always so well known, NLP-underpinned technologies have seeped into many parts of everyday life. It's only logical, then, that it would creep into the capital markets: research, regulatory reporting, security, communication, portfolio management.
NLP, along with its sibling, natural language generation (NLG), is everywhere. But this decades-old AI is taking on greater importance because firms can now suck in huge amounts of data, store it more cheaply, and distribute that information more easily. Additionally, cloud-based platforms and open-source projects have expanded NLP's scope of capabilities.
Marc Alvarez, chief data officer for Mizuho Securities, says that while a whole universe of NLP technologies already envelops financial services, more potential will arise as more data becomes available. "We're getting our toes in," he says. "We use it to pick up sentiment from corporate press releases and earnings releases. That generates some interesting data."
Pluribus Labs was founded two years ago and uses AI and machine-learning techniques to extract signals from unstructured data. Everything text-based that Pluribus delivers is built using NLP tools.
The Berkeley, Calif.-based vendor uses AI to judge sentiment in order to predict outcomes, and to analyze longer-form communications such as SEC 10-K, 10-Q and 13F filings and earnings call transcripts.
"We use that sentiment ─ and aspects of the discussion that are not necessarily positive/negative sentiment, but more dispersion of opinions/volume of opinions ─ to predict outcomes of the market systematically," says Frank Freitas, founder and CEO of Pluribus. "We try to focus on taking the outcomes of that discussion to predict both volume and volatility in markets. What we want to capture is people saying things about either the security itself, executives that are associated with that entity, products associated with that entity, and then ─ based on capturing those right conversation points ─ arrive at a view of sentiment that's based on what real people are saying. That's where NLP comes in."
In the US, Pluribus runs sentiment analysis on social platforms like Twitter and StockTwits. In China, it curates Sina Weibo, a platform of 125 million users, many of whom are discussing market outcomes and individual securities. It is also working with a social media company based in Japan, and Australia will be next, says Freitas, who spent seven years as Instinet's COO. Pluribus' Deep Earnings platform offers a suite of metrics for every earnings call that happens for every company in the Russell 3000 Index.
To build out the platform, Pluribus had to create a proper finance dictionary.
"You need to be able to look at words in the context of parts of speech ─ where's the subject, where's the verb, where's the direct object, where's the indirect object?" he says. "Then you need, frankly, a dictionary that is machine-learned, or at least appropriate for the application that you're looking at. So for finance, we need to make sure that our dictionary loads on terms that have value for finance and that the terms themselves are scored right given their use in financial services."
Take, for example, the word "rip." Ninety-nine percent of the time it carries negative sentiment: You can rip your pants or rip a piece of paper. You can rip someone off. A rip-and-run is a crime. A fire can rip through a village. A bullet can rip a hole in your heart. I'm gonna rip your face off. But in the context of finance, rip describes a positive outcome: The stock ripped up, meaning that it gained value rapidly.
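The "rip" example amounts to a domain-specific lexicon that overrides general-purpose word scores. The Python sketch below uses made-up scores and a deliberately naive average-of-hits scorer; the lexicons and the `score` function are hypothetical.

```python
# Toy lexicons with made-up scores on a -1 to +1 scale.
GENERAL_LEXICON = {"rip": -1.0, "ripped": -1.0, "gain": 1.0, "loss": -1.0, "crash": -1.0}

# Finance overrides: the same surface words, re-scored for market usage.
FINANCE_OVERRIDES = {"rip": 1.0, "ripped": 1.0}

FINANCE_LEXICON = {**GENERAL_LEXICON, **FINANCE_OVERRIDES}


def score(text: str, lexicon: dict) -> float:
    """Average score of the lexicon words found in text; 0.0 if none are found."""
    tokens = [tok.strip(".,!?;:\"'").lower() for tok in text.split()]
    hits = [lexicon[tok] for tok in tokens if tok in lexicon]
    return sum(hits) / len(hits) if hits else 0.0


headline = "The stock ripped higher after earnings."
print(score(headline, GENERAL_LEXICON))  # -1.0
print(score(headline, FINANCE_LEXICON))  # 1.0
```

The approach Freitas describes also looks at parts of speech and context, which this toy ignores: it would, for instance, score "rip someone off" as positive under the finance lexicon.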
"What we see is that by picking up the right words and the right valence of those words, we're accurately capturing the sentiment of the market," Freitas says.
Big for Small
Sitting just east of Lake Washington in the Pacific Northwest, Opal Advisors is a small fund managing just a few million dollars in assets. James Hua, Opal's founder and portfolio manager, left Freestone Capital Management, one of the largest registered investment advisors (RIAs) in the region, to try his hand at running his own shop. Knowing that he wouldn't have access to all the tools he had at Freestone, he was referred by a colleague to AlphaSense.
AlphaSense offers a financial search engine that helps analysts and portfolio managers find info that's buried in dense text documents, "whether the information is in page 200 of an SEC filing footnote, or a broker research report, a piece of news or conference transcript," says Jack Kokko, CEO of the vendor.
Hua says that he uses the platform to generate investment ideas. "Usually, I have some companies that I'm interested in and then I'll do some searches around the company," he says. "If I read something, if I talk to another portfolio manager, if a company gets passed to me, the first thing I do is go to AlphaSense and read all the information they have: Qs, Ks, annual reports, presentation transcripts."
He says that the platform's NLP technology is very effective at highlighting and labeling relevant information.
"If I search for a certain word and they have a lot of synonyms attached to it, it gets underlined; there's a thesaurus database behind it," Hua says. "So, if I search ‘revenue guidance,' guidance is underlined and it takes everything similar to guidance, such as ‘given the range of' or ‘projected.' I don't have to be super precise about the words that I'm searching for. ... Sometimes I might miss something. I might term it as ‘risk,' but they term it as ‘uncertainties' within the balance sheet. So they recommend terms that I otherwise might not have looked for."
Hua estimates that the platform cuts the time he spends researching an individual company from four or five hours to 30 or 45 minutes. Kokko says this is evidence of a larger industry trend away from rules-based engines and toward NLP-driven processes, much like Bloomberg's own shift. That trend has also been helped by advancements in another form of AI, known as deep learning.
"When we started, we were still seeing a lot people applying rules-based systems. They were trying to cover human language with rules that would apply to every possible situation. We saw that running out of steam," Kokko says. "What's made this orders of magnitude more scalable is how deep learning has come in to give the machine the task of categorizing language and finding relationships nearly automatically. Deep learning takes in all that content and lets the algos learn and come up with patterns and rules that people wouldn't have thought of."
After Processing, Generating
The next evolution in this sector has seen the technology move beyond processing to generation, says Mizuho's Alvarez.
"More interesting for us is natural language generation. Where we're heading, we're trying to use our machine-learning capabilities to generate signals," he says. "The automated intelligence is in what signals should be put out, because this puts out a lot of signals and sometimes you don't know what those signals mean because of the way that you've constructed your learning model and the way you're teaching your machine. As we get more and more sophisticated and we put better input into the acceptance criteria into those signals, it's then about how do we deliver and express that for someone who needs to know about this."
As an example of how this is starting to look, every day Bloomberg produces a swath of short news stories that run under the byline "Bloomberg Automation": today, XYZ Company saw trading volumes slightly below average; in the past week the stock is up 5 percent; here's how it compares with other companies in the sector; and so on.
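Automated stories like these are commonly produced by filling fixed sentence templates from structured data. A minimal Python sketch, assuming a hypothetical `DailyStats` record and an arbitrary 10 percent band for "average" volume, looks like this; it is not Bloomberg's system.

```python
from dataclasses import dataclass


@dataclass
class DailyStats:  # hypothetical input record
    company: str
    volume: int
    avg_volume: int
    week_change_pct: float
    sector_week_change_pct: float


def volume_phrase(volume: int, avg_volume: int, tolerance: float = 0.10) -> str:
    """Map today's volume to a fixed phrase; the 10% band is an arbitrary choice."""
    ratio = volume / avg_volume
    if ratio < 1 - tolerance:
        return "below-average"
    if ratio > 1 + tolerance:
        return "above-average"
    return "average"


def write_story(s: DailyStats) -> str:
    """Fill a fixed sentence template from structured market data."""
    direction = "up" if s.week_change_pct >= 0 else "down"
    relative = "outperforming" if s.week_change_pct > s.sector_week_change_pct else "lagging"
    return (
        f"Today, {s.company} saw {volume_phrase(s.volume, s.avg_volume)} trading volume. "
        f"In the past week the stock is {direction} {abs(s.week_change_pct):.1f} percent, "
        f"{relative} its sector ({s.sector_week_change_pct:+.1f} percent)."
    )


print(write_story(DailyStats("XYZ Company", 800_000, 1_000_000, 5.0, 2.0)))
# Today, XYZ Company saw below-average trading volume. In the past week the stock is up 5.0 percent, outperforming its sector (+2.0 percent).
```

Because every story comes from the same template, the output is consistent and easy to scan, at the cost of variety.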
While a behemoth with Bloomberg's scale and resources can invest in both NLP and NLG, most in the vendor space are picking sides. Narrative Science, for example, is firmly in the NLG market.
The Chicago-based company, through its Quill suite of solutions, focuses on three areas for content generation: institutional portfolio commentary, regulatory compliance and improving customer engagement.
Franklin Templeton Investments, for instance, is using Quill Portfolio Commentary to streamline the firm's fund commentary process. The platform allows Franklin Templeton's global marketing team to scale standard fund reporting coverage and frequency. This allows them to reallocate some of the group's resources to "focus on higher-value tasks, like producing white papers," says Kim Neuwirth, director of product management for Narrative Science.
Nick Beil, Narrative Science's COO, says that while the vendor is focused on NLG, that doesn't mean that partnerships with NLP companies are out of the question. After all, the future will see a blending of the techniques to make platforms more interactive and all-encompassing. Beil points to consumer products like Amazon Echo and Google Home, which use voice recognition to play music or answer questions, as a sign of what's to come.
"Look at the world of voice as a user interface," Beil says. "We're not 10 years away from someone saying, ‘Amazon, how's my portfolio doing today?' Voice recognition and processing what is being asked, combined with Narrative Science service of language generation, drives conversation through that. Those technologies available through API services didn't exist five years ago the way they exist today."
Speaking the Lingo
In the late 1990s, while earning his PhD in computer science, with a focus on computational neuroscience, from Dublin City University, Tom Doris began working with natural-language interfaces, particularly generation. He has worked at Intel, Bear Stearns and Marshall Wace, and is currently the CEO of OTAS Technologies, which provides market analytics and trader support tools.
One of its products is Lingo, a natural-language reporting platform. Doris says the vendor really started to turn to NLG once it realized it didn't have a good way to alert users to what had happened overnight; users wanted a report or overview of the main metrics and highlights of the things most important to them in an easy-to-use format.
The key for OTAS wasn't just to convert the content to text; it was also about taking hundreds of individual indicators and boiling them down to the key ones. He says the most challenging part of creating Lingo was the salience modeling: finding the metrics that are most interesting to humans, rather than oddities that aren't that odd. For example, how unusual is it for a stock's three-month options implied volatility to be two times the industry average? The key is to use salience modeling to cut out most of the noise.
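One simple way to think about salience is to ask how far today's value sits from the indicator's own history, so that a reading which is routine for a given stock is not flagged. The Python sketch below scores indicators by absolute z-score and keeps the most unusual ones above a threshold. OTAS has not disclosed its salience model, so the z-score approach, the 2.0 threshold and the indicator names are assumptions.

```python
from statistics import mean, pstdev


def salience(current: float, history: list) -> float:
    """How unusual today's value is for this indicator: |z-score| vs. its own history."""
    mu, sigma = mean(history), pstdev(history)
    if sigma == 0:
        return 0.0 if current == mu else float("inf")
    return abs(current - mu) / sigma


def top_alerts(indicators: dict, k: int = 3, threshold: float = 2.0) -> list:
    """indicators maps name -> (current_value, history). Return up to k names
    whose salience clears the threshold, most unusual first."""
    scored = [(salience(cur, hist), name) for name, (cur, hist) in indicators.items()]
    flagged = sorted((pair for pair in scored if pair[0] >= threshold), reverse=True)
    return [name for _, name in flagged[:k]]


# Invented numbers. This stock's 3-month implied vol routinely runs about 2x its
# industry average, so today's 2x reading is not odd; the volume spike is.
indicators = {
    "iv3m_vs_industry": (2.0, [1.9, 2.1, 2.0, 1.95, 2.05]),
    "volume_vs_20d_avg": (3.5, [1.0, 0.9, 1.1, 1.05, 0.95]),
    "spread_bps": (12.0, [10.0, 11.0, 12.0, 9.0, 13.0]),
}
print(top_alerts(indicators))  # ['volume_vs_20d_avg']
```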
"You're basically creating narratives around conditions that have multiple factors feeding into them and it allows you to go that additional step to marry up with the hypotheses that the humans use to figure out when something is interesting or risky," Doris says.
At the start of each day, Lingo gives traders something of a 50,000-foot view of the market: an overview of the major things to watch for over the coming day or two, which also serves as a safety net so they are not blindsided by an event.
On January 11, 2017, OTAS announced the launch of Intraday Lingo, which pairs automated natural language with microstructure charts to provide a textual report identifying moves and alerts, both in real time and historically. As the day unfolds, instead of just looking at charts for volume, liquidity and spreads, it provides a narrative that describes "blow-by-blow" what's happening in the market, Doris says.
He says that for NLG, it's important not to get too clever when developing reports. Trying to build artificial flair into the writing only creates strain.
"We find that when text is machine generated, you don't want it to be completely dry, but you want it not to look like an attempt to mimic human text generation. When I use Excel to chart a price series, I expect the price series to look really good and generated by a computer and not like a drawing by a human being," he says. "A lot of people come into this thinking that they have to add in that arbitrary randomization to avoid appearing repetitive and therefore giving the game away; what we find is that users are very comfortable with that similar structure and begin to insist on it. They want the information to be where they expect it to be when they open that report."
Blending of Worlds
Like Bloomberg, Intel is one of those giants that walks both sides of the natural language street. Its Saffron Technology Group subsidiary offers a cognitive computing platform that "remembers" transitive relationships between entities over time, says Bruce Horn, CTO of Saffron Technology. "NLP allows us to take unstructured text and turn it into semantic entities that we can work on inside of Saffron," he says.
Among other things, Saffron's platform is used to handle reports of bugs and defects in software. If you have 1,000 developers trying to create or maintain a computer chip or piece of software, which involves debugging and logging the defects they find in a "bug database," it's more than likely that there will be overlap.
The developer writes up the problem (the routines that were called, the version of the software, clock rates and so on), and the Saffron system compares that text with other reports in the database, looking for similarities. The system then either alerts the developer that this bug has already been solved and shows what was done, or tells the developer that while this exact bug is not recognized, it looks similar to one fixed by another group on another project.
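Matching a new bug report against a database of old ones can be approximated with TF-IDF vectors and cosine similarity. Saffron describes its own platform as a cognitive computing system, so this Python sketch swaps in a standard, simpler technique; the tokenizer, the smoothed idf formula, the 0.2 cutoff and the sample reports are assumptions.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list:
    # Lowercase words, identifiers and dotted numbers such as "2.4" or "3.1".
    return re.findall(r"[a-z0-9_]+(?:\.[a-z0-9_]+)*", text.lower())


def tfidf_vectors(docs: list) -> list:
    """Sparse TF-IDF vectors using smoothed idf = ln((1 + n) / (1 + df)) + 1."""
    counts = [Counter(tokenize(d)) for d in docs]
    n = len(docs)
    df = Counter(tok for c in counts for tok in c)
    idf = {tok: math.log((1 + n) / (1 + d)) + 1.0 for tok, d in df.items()}
    return [{tok: tf * idf[tok] for tok, tf in c.items()} for c in counts]


def cosine(a: dict, b: dict) -> float:
    num = sum(v * b.get(k, 0.0) for k, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return num / (na * nb) if na and nb else 0.0


def similar_reports(new_report: str, database: list, min_score: float = 0.2) -> list:
    """Rank existing reports by cosine similarity to a new one."""
    vecs = tfidf_vectors(database + [new_report])
    query, existing = vecs[-1], vecs[:-1]
    ranked = sorted(((cosine(query, v), i) for i, v in enumerate(existing)), reverse=True)
    return [(database[i], round(s, 3)) for s, i in ranked if s >= min_score]


bug_db = [
    "Crash in init_cache when clock rate set to 2.4 GHz on build 3.1",
    "Settings page button misaligned in dark mode",
    "Memory leak in parser after long run",
]
new_bug = "init_cache crashes at 2.4 GHz clock rate, build 3.1"
print(similar_reports(new_bug, bug_db))
```

"Crash" and "crashes" do not match here because the sketch does no stemming or lemmatization, which is the kind of generalization NLP is meant to add.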
Andrew Hickl, chief product officer for Saffron, says NLP helps to generalize, which is important because there are lots of different ways to correctly or incorrectly refer to different parts. It also helps to decode the complexity of human language, he says.
"Human language allows us shortcuts," Hickl says. "NLP and co-reference resolution and time normalization and geolocation normalization helps the system be more explicit and draw connections where they're supposed to be."
Finally, NLP allows users to more effectively look at relations between concepts and events. "To build these resources, NLP is the glue that allows these resources to provide value," he says.
Natural language processing dates back to the 1950s and Alan Turing, he of the famed Turing Test, which asks whether a machine can exhibit intelligence and conversation indistinguishable from a human's. Even though it's decades old, NLP, and by extension NLG, still has room to improve.
"NLP is still kind of a surface technology; there's not very much meaning storage or meaning understanding or memory process," adds Horn. "If I say, ‘Michelle Obama's speech,' an NLP system will say, ‘OK, I know Michelle Obama and I know what a speech is,' but does it know anything about the fact that Michelle gave an important speech two weeks ago? There's all this world knowledge that allows us in language to basically do a very low-bandwidth communication ─ Michelle Obama's speech ─ and have it activate a huge amount of context in your own mind."
But Horn also thinks that we're getting closer to a leap in evolution. "NLP is about to blast off with this new modeling of meaning, context and world knowledge that is starting to come together," he says.
* While natural language processing has been around since the 1950s, more sources of unstructured text data, the ability to ingest and store data more cheaply, and the growth of analytics platforms have led more firms to deploy NLP tools.
* These tools are used for everything from research to regulatory requirements to anti-money laundering and know-your-customer requirements.
* The next stage is natural language generation, where the platform takes in information, processes it, and then spits out reports and other documents to alert users to pertinent information.