The short answer is: You need both.
Since the advent of the first high-frequency trading (HFT) firm, the quest for low-latency trading has been paramount. Many strategies that were profitable before HFT are now marginal or obsolete.
Among those strategies with questionable profitability today are:
- Arbitrage: Markets move too quickly to allow time for arbitrage.
- Market making: HFT imposes excessive risks on those traders.
- Event trading: Competing against HFT in terms of speed of response to scheduled economic reports and conventional news is impossible, since HFT systems can process and react to the information quicker.
Faster execution is necessary to take advantage of short-term opportunities. Profitability correlates directly with volume and volatility: tens of millions of shares need to trade daily for a strategy to be profitable. The financial impact of low-latency trading is profound, but the window of profitability is rapidly closing as more firms engage in low-latency execution.
Visionary proprietary trading firms (PTFs) need to build upon their low-latency competitive advantage by looking for a more sustainable approach. They are seeking to build smarter models that look well beyond market data alone. Today, firms are using social media feeds, real-time data feeds and data-generating devices such as smart meters (part of the Internet of Things) to gain a unique perspective into the market. Due to the variety, volume and velocity of these data sources, most of which are unstructured, firms are finding that their current compute and storage infrastructure is simply not up to the task.
Developing smarter models and an algorithmic trading architecture requires big data analytics techniques to uncover the patterns or signals upon which a strategy can be built. But does the new strategy really achieve alpha across rapidly changing market conditions? A strategy that builds growth over months but loses it all in seconds due to an untested market condition is dangerous. Back-testing is the critical step to accurately measure risk and forecast profitability, yet it is this step that is often short-circuited due to traditional technology limitations.
An immense amount of data is used during back-testing, and it pushes the limits of traditional storage architectures. Just finding, accessing and loading data could stretch from hours to days. Traditional network storage devices also lock files to maintain data integrity, permitting only sequential file access, which exacerbates the issue.
Data input/output is the crux of the problem. One year of market data is 150 TB, and external data may run into the petabytes. A firm with multiple years’ worth of valuable data could have in excess of 10 PB. Just managing, accessing and protecting this vast data lake is a challenge. Extracting information from the data is the objective, and doing it quickly is the ultimate goal.
Back-testing uses a subset (sample) of the relevant population. Even a simple back-testing dataset can be 5 TB to 50 TB in size and can take anywhere from 5 to 60 hours to load. Too often, applications spend more time moving data than computing on it. Let’s focus on the obvious I/O issue at hand.
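To put those load times in perspective, the short Python sketch below computes the average throughput they imply. Pairing the smallest dataset with the shortest load time and the largest with the longest is an assumption made for illustration, as is the use of decimal terabytes. The 10 GB/s aggregate figure for a parallel file system is hypothetical, not a measured value.

```python
TB = 10**12  # decimal terabyte in bytes (assumption: the article does not say TB vs. TiB)
MB = 10**6
GB = 10**9


def implied_throughput_mb_s(dataset_tb: float, load_hours: float) -> float:
    """Average sustained throughput (MB/s) implied by loading dataset_tb in load_hours."""
    return (dataset_tb * TB) / (load_hours * 3600) / MB


def load_time_hours(dataset_tb: float, throughput_gb_s: float) -> float:
    """Hours needed to load dataset_tb at a sustained throughput in GB/s."""
    return (dataset_tb * TB) / (throughput_gb_s * GB) / 3600


# Assumed pairings of the ranges quoted in the text.
small_case = implied_throughput_mb_s(dataset_tb=5, load_hours=5)
large_case = implied_throughput_mb_s(dataset_tb=50, load_hours=60)

# Hypothetical aggregate throughput of a parallel file system.
parallel_case = load_time_hours(dataset_tb=50, throughput_gb_s=10)

print(f"5 TB in 5 h   -> {small_case:.0f} MB/s")
print(f"50 TB in 60 h -> {large_case:.0f} MB/s")
print(f"50 TB at 10 GB/s -> {parallel_case:.2f} h")
```

Under these assumptions both ends of the range work out to roughly 230 to 280 MB/s of sustained throughput, which suggests the bottleneck is the storage path rather than the size of the compute cluster.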
Faced with a time crunch and needing to meet profitability objectives, quant researchers may short-circuit the back-testing cycle. How do they typically do this? They may use a smaller dataset with less granular market data over a shorter time period, but is 300 GB to 500 GB of data really representative of the population? Back-testing is critical because it exposes the potential trading strategy to historical data in various market conditions, and a smaller sample could miss some of those market scenarios. Due to time constraints, they may also run fewer models.
Both shortcuts could be producing models that are invalid and putting your business at risk. Statistically speaking, there are only two ways to improve the degree of confidence in modeling or simulations:
- Use larger sample sizes — more representative of the population.
- Run more models.
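The effect of sample size on confidence can be illustrated with the standard error of a mean, which shrinks with the square root of the number of observations. The Python sketch below is only an illustration: the return distribution, its parameters and the sample sizes are hypothetical, and it assumes independent observations, which real market data does not strictly satisfy.

```python
import math
import random
import statistics


def standard_error(sample):
    """Standard error of the sample mean: sample stdev / sqrt(n)."""
    return statistics.stdev(sample) / math.sqrt(len(sample))


random.seed(42)  # fixed seed so the sketch is repeatable

# Hypothetical per-trade returns: mean 0.01%, standard deviation 1%.
returns = [random.gauss(0.0001, 0.01) for _ in range(200_000)]

small_sample = returns[:2_000]  # stands in for the reduced back-test
large_sample = returns          # 100x more observations

se_small = standard_error(small_sample)
se_large = standard_error(large_sample)

print(f"standard error, small sample: {se_small:.6f}")
print(f"standard error, large sample: {se_large:.6f}")
print(f"ratio: {se_small / se_large:.1f}")
```

With 100 times more observations the standard error is about 10 times smaller, so the estimate of a strategy's average return is correspondingly tighter. The same square-root relationship applies to the number of independent simulation runs.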
For example, consider two firms. Firm A uses a traditional architecture while Firm B uses an HPC environment with a parallel file system.
Statistically speaking, Firm B would yield a higher degree of confidence in nearly every case, because it can load larger samples and run more models in the time available.
How does this translate to your business?
- Improve the ratio of profitable to losing strategies.
- Accurately reflect the risk/reward of each strategy.
- Reduce the new strategy development cycle time.
- Exit aging strategies that may no longer attain alpha — “faster to fail.”
- Continuously re-evaluate strategies with new data.
The ROI of these high-performance data analytics (HPDA) platforms is very high. Why are they not implemented more often? The answer is a lack of insight and experience. Many of the staff in today’s trading firms lack big data/HPC experience and rely on more traditional approaches that were never optimized for big data.
A common error many customers make when faced with latency issues is to add more compute. Unfortunately, this does little to address the issue, because their workloads are I/O bound.
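A simple model shows why extra compute does little for an I/O-bound job. The Python sketch below assumes that data loading is a fixed serial cost and that the compute phase scales perfectly across workers, which is the same reasoning as Amdahl's law. The 20-hour load and 4-hour compute figures are hypothetical.

```python
def run_time_hours(load_hours: float, compute_hours: float, workers: int) -> float:
    """Wall-clock time when loading is serial and compute divides evenly across workers."""
    return load_hours + compute_hours / workers


def speedup(load_hours: float, compute_hours: float, workers: int) -> float:
    """Speedup relative to a single worker with the same load time."""
    baseline = run_time_hours(load_hours, compute_hours, 1)
    return baseline / run_time_hours(load_hours, compute_hours, workers)


LOAD_HOURS = 20.0     # hypothetical time spent moving data
COMPUTE_HOURS = 4.0   # hypothetical single-worker compute time

baseline = run_time_hours(LOAD_HOURS, COMPUTE_HOURS, workers=1)

# Option 1: add compute, leave storage untouched.
more_compute = run_time_hours(LOAD_HOURS, COMPUTE_HOURS, workers=8)

# Option 2: leave compute untouched, halve the load time with faster storage.
faster_io = run_time_hours(LOAD_HOURS / 2, COMPUTE_HOURS, workers=1)

print(f"baseline:        {baseline:.1f} h")
print(f"8x compute:      {more_compute:.1f} h ({baseline / more_compute:.2f}x)")
print(f"2x faster I/O:   {faster_io:.1f} h ({baseline / faster_io:.2f}x)")
```

In this hypothetical case, eight times the compute shortens the run from 24 hours to 20.5 hours, while halving the load time brings it down to 14 hours. No amount of additional compute can push the run below the 20 hours spent loading data.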
When developing an HPDA environment, you need to build a balanced platform that scales tri-directionally (compute, storage, networking) as requirements evolve. A high performance storage platform using a parallel file system or software-defined architecture is your best bet to break the I/O bottleneck.
Finally, HPDA needs to be enterprise-centric, supporting all your analytic platforms, from Hadoop to SAS to your custom-coded HPC applications. Each platform has unique functionality, and all of them require concurrent access to the same data repository.
There is immense potential for big data. Firms need to look beyond traditional approaches to uncover its value.