Big data is beginning to have a major impact on the financial services sector, and many firms in the industry are struggling to figure out how to deploy analytics technologies that meet their big data needs. High-performance computing (HPC) systems have incredible potential in this area. To help you get acclimated to how HPC systems can support big data, we sat down with Austin Trippensee, Functional Solutions Architect at YarcData.
What industry and economic factors are pushing financial services firms to engage in big data strategies?
New regulations, expanded product portfolios, and increasing variety in forms of electronic payments are driving financial services organizations to embrace big data analytics.
The first factor – new regulations around risk aggregation and reporting – is driving implementation of robust aggregation and reporting capabilities across the enterprise. Second, the sheer volume of data has increased as financial services organizations have expanded and diversified their product portfolios. To meet regulatory requirements and better tailor their products, banks are creating a 360-degree view of the customer – and an overwhelming big data problem in the process. Finally, as the use of credit cards, debit cards, pre-paid cards and other forms of electronic payments increases, so does the amount of associated data. With these additional forms of electronic data, the risk of fraud has grown, driving financial services organizations to focus on their ability to manage and analyze data faster.
What technologies should a financial services firm leverage to derive value from big data?
The analysis of big data can be broadly broken down into two distinct categories: search and discovery, with patterns found during discovery becoming the patterns used to search.
When search techniques are applied to big data, a major consideration is how the data will be partitioned across a grid or cluster environment. To partition the data effectively, the user needs to know in advance the types of queries that must be supported. The challenge is that the partitioning methodology limits the types of queries that can be supported while maintaining performance.
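A rough illustration of that trade-off (all names and data here are invented, not any specific product's API): records hash-partitioned by account ID answer account lookups from a single partition, but any query on another attribute must scan every partition.

```python
# Sketch: hash-partitioning transaction records by account ID.
NUM_PARTITIONS = 4

def partition_for(account_id: str) -> int:
    """Route a record to a partition based on its account ID."""
    return hash(account_id) % NUM_PARTITIONS

partitions = [[] for _ in range(NUM_PARTITIONS)]

transactions = [
    {"account": "A-100", "amount": 250.0, "merchant": "grocer"},
    {"account": "B-200", "amount": 75.5,  "merchant": "airline"},
    {"account": "A-100", "amount": 12.0,  "merchant": "cafe"},
]

for txn in transactions:
    partitions[partition_for(txn["account"])].append(txn)

def by_account(account_id):
    """Aligned with the partition key: touches exactly ONE partition."""
    return [t for t in partitions[partition_for(account_id)]
            if t["account"] == account_id]

def by_merchant(merchant):
    """Not aligned with the partition key: must scan EVERY partition."""
    return [t for p in partitions for t in p if t["merchant"] == merchant]
```

The partitioning choice bakes in which queries are cheap: `by_account` scales with one partition's size, while `by_merchant` always costs a full scatter-gather across the cluster.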
Therefore, partitioned data is not a silver bullet for all business problems or queries, particularly large discovery problems where graph analytics is a more appropriate technology. When performing discovery – because discovery is an attempt to find new patterns – queries may traverse unpredictable paths, and may not be well served by a partitioned architecture. During discovery, typical partitioning techniques introduce performance delays due to I/O bottlenecks. From a practical perspective, discovery is best performed across a dataset that exists within a single, in-memory process rather than partitioned across a grid or cluster architecture.
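To make the traversal point concrete, here is a minimal sketch (the graph and relationship are hypothetical): discovery amounts to following edges wherever they lead, and in an in-memory graph each hop is a pointer lookup, whereas on a partitioned architecture any hop crossing a partition boundary becomes a network round trip.

```python
from collections import deque

# Hypothetical "transferred funds to" relationships, held in memory.
graph = {
    "acct1": ["acct2", "acct3"],
    "acct2": ["acct4"],
    "acct3": [],
    "acct4": ["acct1"],  # a cycle back — a path no partitioning scheme anticipated
}

def reachable(start):
    """Every account reachable from `start`, following paths of any length."""
    seen, queue = {start}, deque([start])
    while queue:
        node = queue.popleft()
        for nbr in graph.get(node, []):
            if nbr not in seen:
                seen.add(nbr)
                queue.append(nbr)
    return seen
```

Because the query cannot predict which nodes it will visit, no partition key keeps this traversal local; keeping the whole graph in one memory space sidesteps the I/O bottleneck entirely.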
What technological solutions, particularly infrastructure, are available for financial services organizations dealing with big data?
Today, most financial services organizations try to solve all of their big data challenges using either grid or cluster technologies. One popular approach involves the use of Hadoop running on commodity x86-based clusters. The market is saturated with many software providers, each attempting to differentiate through their proprietary software solutions and all facing the same technological limitations. As a result, competing solutions differ very little from one another in terms of performance.
At Cray we’re taking a different approach, leveraging our supercomputing technologies to improve I/O performance, disk utilization and efficiency. Additionally, we provide the largest globally shared-memory environment for graph analytics, and an innovative cluster appliance that can leverage a parallel file system to improve performance, reduce data center footprint, and lower on-going maintenance and support costs for Hadoop-based implementations.
How do HPC systems stand out when dealing with the types of data sets that financial services companies need to deal with?
HPC systems have been used to manage big data for quite some time. As a result, HPC systems include technologies like superfast interconnects and parallel file systems to improve I/O performance. In addition, some HPC technologies include CPUs capable of mitigating memory latency through the parallelization of processing and the use of a large number of CPU threads. Financial services companies should consider leveraging all of the benefits born of HPC development.
Here at Cray we’re leveraging our long history in HPC to bring unique hardware and software solutions for both grid and cluster environments as well as large graph analytics applications. We are the only vendor that can deliver solutions across the entire spectrum of big data analytic requirements.
Why does graph analytics stand out as a big data solution for financial services?
One of the more common promises of big data has been its ability to yield new results or provide access to previously unavailable information. However, an organization using search capabilities on grids and clusters will only be able to find “known” results. Organizations will use these approaches for the purposes of aggregating or decomposing large datasets across the entire enterprise.
But, how much of this will be new information?
Graph analytics on big data offers the opportunity for financial services organizations to find new patterns in their data. This capability delivers on the promise of big data in a way that is not possible using traditional analytic techniques. Graphs enable the ability to find the “unknown unknowns.” In financial services, these capabilities are being pursued in the traditional areas of cybersecurity and information security. However, financial services organizations also have some unique business challenges around fraud, and the recent enhancements to the regulatory requirements around customer due diligence.
In addition, certain areas around customer analytics and identifying customer and product characteristics may yield the most profitable business opportunities for financial services organizations. With a better understanding of one’s customers and products, financial services companies can create new, tailored products or improve the identification of customer segments to offer cross-sell or up-sell opportunities.
Financial services organizations have been using traditional data mining techniques for more than 10 years. All of these techniques start by requiring the user to sample or partition the data into small chunks that can be processed. Initially, the ability to find small patterns in subsets of data led to significant improvements in scorecarding, fraud detection and customer analytics. However, most organizations have maximized the benefits of working with samples of their data.
Graph analytics on big data provides the ability to radically change the data mining paradigms that limit analysts to samples. Now, analysis can start by looking at all the data and therefore all of the connections in the data. These capabilities will reveal new patterns and enable organizations to see significant improvements to their analytic programs. Much as traditional data mining did 10 years ago, graph analytics has the opportunity to deliver the next leap in analytical improvement.
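The sampling limitation can be illustrated with a toy example (the accounts and shared attributes below are invented): two accounts that share nothing directly may still be linked through an intermediary, and that link only surfaces when every edge is analyzed together. A simple union-find over the full edge set makes the point.

```python
# Sketch: grouping accounts into connected components via shared attributes
# (e.g. a common phone number or address), using union-find with path compression.

def find(parent, x):
    """Return the representative of x's component."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]  # path compression
        x = parent[x]
    return x

def union(parent, a, b):
    """Merge the components containing a and b."""
    ra, rb = find(parent, a), find(parent, b)
    if ra != rb:
        parent[rb] = ra

# Edges: (account, shared attribute). acct_a and acct_c share nothing directly.
edges = [
    ("acct_a", "phone_1"), ("acct_b", "phone_1"),
    ("acct_b", "addr_9"),  ("acct_c", "addr_9"),
]

nodes = {n for edge in edges for n in edge}
parent = {n: n for n in nodes}
for a, b in edges:
    union(parent, a, b)

# Across the FULL dataset, acct_a and acct_c land in the same component
# through acct_b — a connection a sample excluding acct_b could never show.
same_ring = find(parent, "acct_a") == find(parent, "acct_c")
```

A sample that happened to omit `acct_b` would place the other two accounts in separate components; only the whole-data view exposes the chain.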
Austin Trippensee, Functional Solutions Architect