Originally published on TabbFORUM
Capital markets have a massive bad-data problem: an estimated $50 billion a year is spent on manually cleaning broken data. Even worse, the problem is growing as the amount of data generated continues to expand rapidly, and that data is typically shared in batch processes, often limited to once-daily reports. Here, we’ll review the root causes of broken data in financial markets and how, coupled with end-of-day reporting, they can lead to an ever-growing backlog of reconciliations.
The Many-Hops Problem
Perhaps the most important thing to understand about post-trade processes in capital markets is the sheer number of systems involved, each managed by a different party to complete its part of a complex workflow.
It wouldn’t be atypical for information about a single trade to end up in a dozen distinct databases. Exhibit 1 shows the various systems involved in an over-the-counter derivative (OTC derivative) trade processing workflow. OTC derivatives represent the largest notional value of any financial market, so this is far from a fringe case.
Exhibit 1: OTC Derivative Trade Workflow
As shown, trade data has to move between systems within each entity and between different parties involved in the trade processing workflow, often changing state or being appended at various points. Imagine a giant game of telephone, except with billions of highly complex messages, and it’s easy to understand how errors can occur as information moves between parties.
Now imagine that game of telephone, but the players don’t speak the same native language.
The Translation Problem
To add another layer to the issue, these systems are not natively designed to speak to each other. Each is built first and foremost to do its piece of the workflow, and connectivity to other systems is inherently secondary.
As shown in Exhibit 2, data exists at the original source (often an external data producer’s system) and has to periodically get to its destination (the data consumer’s system). With all forms of traditional bulk data transfer, not only does each party persist the data in whatever structure it has independently designed for its own database, but there is also a third, intermediary representation in the form of CSV or PDF files, API messages, and other formats.
Exhibit 2: Traditional Data Transfer
That’s three representations of data, with two translations, just to move information from the producer’s system to the consumer’s system. It’s easy to see the compounding impact this would have when scaled across thousands of systems market-wide.
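To make the three representations and two translations concrete, here is a minimal Python sketch. Every name in it (`ProducerTrade`, `ConsumerTrade`, the field names, the date format, and the basis-point-to-percent conversion) is a hypothetical assumption for illustration, not taken from any real system. It shows how each translation step embeds conventions that both sides have to agree on.

```python
import csv
import io
from dataclasses import dataclass
from datetime import date
from decimal import Decimal


@dataclass
class ProducerTrade:
    """Hypothetical producer-side record (representation 1)."""
    trade_id: str
    notional: Decimal
    spread_bps: Decimal
    effective_date: date


@dataclass
class ConsumerTrade:
    """Hypothetical consumer-side record (representation 3)."""
    trade_ref: str
    notional: float
    spread_pct: float
    effective_date: date


def producer_to_csv(trade: ProducerTrade) -> str:
    """Translation 1: producer record -> intermediary CSV text (representation 2)."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["trade_id", "notional", "spread_bps", "effective_date"])
    writer.writerow([
        trade.trade_id,
        trade.notional,
        trade.spread_bps,
        trade.effective_date.strftime("%m/%d/%Y"),  # assumed producer date convention
    ])
    return buf.getvalue()


def csv_to_consumer(text: str) -> ConsumerTrade:
    """Translation 2: intermediary CSV text -> consumer record."""
    row = next(csv.DictReader(io.StringIO(text)))
    # The consumer must know the producer wrote month/day/year.
    month, day, year = (int(part) for part in row["effective_date"].split("/"))
    return ConsumerTrade(
        trade_ref=row["trade_id"],
        notional=float(row["notional"]),
        spread_pct=float(row["spread_bps"]) / 100.0,  # basis points -> percent
        effective_date=date(year, month, day),
    )


source = ProducerTrade("T-1", Decimal("1000000.00"), Decimal("35"), date(2024, 3, 4))
received = csv_to_consumer(producer_to_csv(source))
```

In this sketch the round trip only works because the consumer happens to know the producer’s date format and units. A consumer that read the same `03/04/2024` string as day/month/year would silently store a different date, which is the kind of break the text describes.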
The Constant Change Problem
As the data evolves, both the data producer and the data consumer (the sell-side and buy-side participants) need to ensure their systems stay in sync. The implications of misaligned data are significant. For example, in equity swaps, one of the fields that changes most often is the ‘spread.’ When that field changes, the sell-side participant, such as a prime broker, updates its system before payment is due, and the buy-side participant has to reflect the same change in its own records. While data breaks can occur in a number of areas, this field is particularly prone to issues because it may change numerous times over the life of a deal, resulting in misaligned accrual calculations and settlement breaks.
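A small worked example shows how a missed spread update turns into a settlement break. The figures below (notional, spreads, accrual period) and the simplified Actual/360 accrual on the spread alone are illustrative assumptions, not data from this article. A real financing leg would also include a benchmark rate.

```python
from decimal import Decimal


def spread_accrual(notional: Decimal, spread_bps: Decimal, days: int,
                   day_count: int = 360) -> Decimal:
    """Simplified accrual on the spread only: notional * spread * days / day_count."""
    return notional * (spread_bps / Decimal(10000)) * Decimal(days) / Decimal(day_count)


notional = Decimal("10000000")  # hypothetical notional

# The broker has applied the new 65 bps spread; the client still holds 50 bps.
broker_accrual = spread_accrual(notional, Decimal("65"), days=30)
client_accrual = spread_accrual(notional, Decimal("50"), days=30)

settlement_break = (broker_accrual - client_accrual).quantize(Decimal("0.01"))
print(settlement_break)  # 1250.00
```

Under these assumptions, a 15 basis point difference on a single trade over 30 days produces a 1,250.00 discrepancy in the expected payment, which then has to be investigated and reconciled.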
Additionally, a sell-side firm may make updates to a field, such as adding margin to a trade, creating an extra column of data in the source database. This sort of schema change is typically not picked up by the end user, even with more modern data-sharing methods such as APIs. A mismatched field can result in a different payment amount, requiring manual reconciliation.
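As a rough illustration of how such a schema change can go unnoticed, the sketch below compares the columns a consumer expects with the columns that actually arrive in a report. The column names, the sample report, and the `detect_schema_drift` helper are all hypothetical. A consumer that reads only its expected columns would ignore the new `margin` column without raising any error.

```python
import csv
import io

# Hypothetical schema the consumer's loader was built against.
EXPECTED_COLUMNS = ["trade_id", "notional", "spread_bps"]


def detect_schema_drift(csv_text: str, expected: list[str]) -> dict[str, list[str]]:
    """Report columns that were added to or dropped from an incoming CSV report."""
    reader = csv.DictReader(io.StringIO(csv_text))
    received = reader.fieldnames or []
    return {
        "added": [col for col in received if col not in expected],
        "missing": [col for col in expected if col not in received],
    }


# The producer has started sending an extra "margin" column.
report = "trade_id,notional,spread_bps,margin\nT-1,1000000,35,25000\n"
drift = detect_schema_drift(report, EXPECTED_COLUMNS)
print(drift)  # {'added': ['margin'], 'missing': []}
```

A check like this only flags that the shape of the data changed. Deciding what the new field means and how it affects payments still requires coordination between the two parties.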
In addition to the field-level changes, bulk position data is most commonly shared via end-of-day (or sometimes end-of-month) reports that are “point in time,” which further exacerbates the constant change problem and the reconciliation process that follows.
The Complexity Problem
To offer a sense of the complexity, Exhibit 3 shows approximately one third of a sample payload related to an OTC equity swap. In this example, an average of more than 100 data points change throughout the deal, and each data field may have numerous acceptable values, further complicating the situation. Using traditional means of data management, each database in the chain of operations outlined above will attempt to store this data in whatever structure the developer of that database defined.
Exhibit 3: OTC Equity Swap Payload Sample
The above payload is a total return swap on a stock, a sample from the equity swaps market, which averages more than 20 million trades per year, each with multiple updates throughout a lifecycle that can last a few years. Each of those updates can trigger complex calculations or changes to the state of the trade, with those updates having to flow between all the systems described in the previous section.
You can imagine how often this breaks, but you don’t have to: we have the data in Exhibit 4.
Exhibit 4: Equity Swap Breaks
Sources
1 DTCC
2 Axoni client data and client feedback
3 Axoni client estimation
4 Cost to fix break multiplied by the number of breaks / year
The Bottom Line
The cost of fixing upstream changes in the lifecycle of a trade is a heavy burden on any organization. Neither side wants to take on these operational costs, let alone be left to deal with the aftermath of systemic or regulatory risk. As stewards of trust, all financial firms have a responsibility to ensure their systems are accurate and complete when measured against the source of truth.
For post-trade data reconciliation, most companies receive end-of-day snapshot reports, often sent over SFTP. Firms reconcile their positions against the source of truth after the trading day is complete, often utilizing offshore teams, resulting in an overnight reconciliation cycle. By this time, the underlying positions may have already changed. This hampers the ability to proactively manage risk and undermines certainty in the accuracy of internal records used for trading and other decision-making or insights. Additionally, the legacy infrastructure supporting these daily reporting systems is often unreliable and error-prone. As markets continue to shorten settlement cycles, there is even less time to identify and resolve discrepancies, highlighting the critical need to rethink the data-sharing tools and processes used today.
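For readers less familiar with what an end-of-day reconciliation involves, the following is a minimal sketch of the comparison step. It assumes both the internal records and the received snapshot have already been parsed into dictionaries keyed by trade identifier. The `reconcile` function, field names, and sample trades are hypothetical.

```python
def reconcile(internal: dict[str, dict], snapshot: dict[str, dict]) -> list[tuple]:
    """Compare internal records with a counterparty snapshot and list every break.

    Each break is (trade_id, field_or_reason, internal_value, snapshot_value).
    """
    breaks = []
    for trade_id in sorted(internal.keys() | snapshot.keys()):
        ours = internal.get(trade_id)
        theirs = snapshot.get(trade_id)
        if ours is None:
            breaks.append((trade_id, "missing_internal", None, theirs))
            continue
        if theirs is None:
            breaks.append((trade_id, "missing_in_snapshot", ours, None))
            continue
        # Both sides know the trade: compare field by field.
        for field in sorted(ours.keys() | theirs.keys()):
            if ours.get(field) != theirs.get(field):
                breaks.append((trade_id, field, ours.get(field), theirs.get(field)))
    return breaks


internal = {
    "T-1": {"quantity": 1000, "spread_bps": 50},
    "T-2": {"quantity": 500, "spread_bps": 40},
}
snapshot = {
    "T-1": {"quantity": 1000, "spread_bps": 65},
    "T-3": {"quantity": 200, "spread_bps": 30},
}

breaks = reconcile(internal, snapshot)
for item in breaks:
    print(item)
```

Because the snapshot is a point-in-time view, every break this comparison finds describes the state of the world as of the previous close, which is why positions may already have moved again by the time the overnight cycle finishes.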