Dan takes a look at the news that broke this weekend after a massive data breach at a Panamanian law firm.
Even for those quantitative analysis geeks who enjoy combing through massive datasets to find a tiny bit of information, the Panama Papers had to look like a daunting data dump. For the uninformed, the Panama Papers are a leak of 11.5 million files from Mossack Fonseca, a Panamanian law firm that is the world's fourth-largest offshore law firm.
If you're looking for a good overview of the story, I urge you to head over to the Guardian, which was one of the partner media organizations that helped the International Consortium of Investigative Journalists (ICIJ) make sense of the documents, which were originally obtained by German newspaper Süddeutsche Zeitung from an anonymous source.
However, to boil it down to the simplest of terms, some of the world's wealthiest and most powerful people have allegedly been avoiding paying taxes by laundering money through shell companies set up by Mossack Fonseca.
While the work done by the journalists breaking the story is extremely impressive, and should be applauded, the actual content of the documents isn't that interesting to me. I don't need to read an investigative report to know that Vladimir Putin has probably done some illegal things. Also, everything I've ever needed to know about money laundering and tax evasion I learned from Saul Goodman in AMC's Breaking Bad, so I'm all set on that front.
No, what I've found most interesting about the Panama Papers is the breach itself and how the data was analyzed.
I have to say, I was impressed when I first heard the leak came from a single, internal source, and wasn't the work of some group of hackers, like Anonymous, breaking in through a gap in the security infrastructure. It just goes to show that a firm can have all the firewall protection in the world, but one determined employee can still pose the greatest threat to information security.
It also reminded me of my colleague Anthony Malakian's April feature on the CFTC's Regulation Automated Trading. Anthony's feature focuses on one particularly controversial aspect of the potential regulation: Firms would be required to keep a source-code repository of their algorithms.
To put it simply, that would mean every firm would have a lockbox that essentially contains the most important proprietary data at the firm. As nearly every comment letter submitted to the CFTC pointed out, if that information were to fall into the wrong hands, it would be catastrophic for a firm.
Which brings me back to my point about the Panama Papers. If a breach this size could happen at a firm whose entire business is built on secrecy and security, it's not completely unfathomable that the same could happen to a financial firm's hypothetical source-code repository.
Sorting Through the Data
The other fascinating piece of the Panama Papers, in my eyes, was the way the data was catalogued. Roughly 2.6 terabytes of data was released to the journalists by the source over time. To put that in perspective, the data from the 2010 Cablegate/WikiLeaks release (1.7 GB), the 2013 Offshore Leaks (260 GB), the 2014 Luxembourg Leaks (4 GB) and the 2015 Swiss Leaks (3.3 GB) combined would all fit comfortably inside the Panama Papers trove.
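For a rough sense of scale, here is a quick back-of-the-envelope comparison using the figures cited above (treating 1 TB as 1,024 GB; the exact ratio depends on the convention used):

```python
# Back-of-the-envelope comparison of leak sizes, using the figures cited above.
leaks_gb = {
    "Cablegate (2010)": 1.7,
    "Offshore Leaks (2013)": 260,
    "Luxembourg Leaks (2014)": 4,
    "Swiss Leaks (2015)": 3.3,
}
panama_papers_gb = 2.6 * 1024  # 2.6 TB expressed in gigabytes

combined = sum(leaks_gb.values())
print(f"Earlier leaks combined: {combined:.1f} GB")
print(f"Panama Papers: {panama_papers_gb:.1f} GB, "
      f"roughly {panama_papers_gb / combined:.0f}x larger")
```

In other words, all four earlier leaks together amount to less than 270 GB, around a tenth of the Panama Papers.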
Naturally, sorting through that amount of data is no easy task, especially when it hasn't been organized in a clean, sensible fashion. It's a problem all too common amongst financial firms.
Süddeutsche Zeitung does a great job of walking through the painstakingly long process, which included applying optical character recognition (OCR) to make the documents searchable. It's an interesting process that some financial firms also grapple with as they try to rid themselves of those last remaining paper-based processes.
This week on the Waters Wavelength podcast, Episode 10: Markit-IHS Merger, FIA Boca
Food for Thought
- My feature looking at the use of open-source software in financial services is live. Click here to read it.
- As I mentioned earlier, WatersTechnology US editor Anthony Malakian wrote a fantastic feature on the CFTC's Regulation AT. You can read it here. European staff writer John Brazier also profiled Aberdeen Asset Management chief technology officer Iain Plunkett. Check it out here. Also, Victor Anderson, our editor-in-chief, wrote a great story on Agile software development. You can find it here.
- We are under a month away from the North American Trading Architecture Summit 2016, which is held in New York. For more info on the event, click here.