Big Data and analytics systems are fast emerging as some of the most critical systems in an organization’s IT environment. But with such huge amounts of data come performance challenges. If a Big Data system cannot help make or forecast critical business decisions, or surface the business insights hidden in huge amounts of data, at the right time, it loses its relevance. This paper discusses some of the most critical performance considerations in a technology-agnostic way. It describes techniques and guidelines that can be applied during the different phases of a Big Data system (i.e., data extraction, data cleansing, processing, storage, and presentation). These should serve as generic guidelines that any Big Data professional can use to ensure that the final system meets its performance requirements.
A Big Data system comprises a number of functional blocks that give it the capability to acquire data from diverse sources, pre-process this data (e.g., cleansing and validation), store it, process and analyze the stored data (e.g., performing predictive analytics or generating recommendations for online users), and finally present and visualize the summarized and aggregated results.
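As a rough illustration of how these functional blocks fit together, the following minimal Python sketch chains five stages into one pipeline. It is not taken from any particular product: the function names (`acquire`, `preprocess`, `store`, `analyze`, `present`, `run_pipeline`), the record fields (`user`, `amount`), and the in-memory list used as a "store" are all assumptions made for this example. A real system would replace each stage with distributed components.

```python
from __future__ import annotations

from collections import defaultdict
from typing import Iterable, Iterator

# A record is a plain dict; the "user" and "amount" fields are hypothetical.
Record = dict


def acquire(sources: Iterable[Iterable[Record]]) -> Iterator[Record]:
    """Data acquisition: merge records from several (hypothetical) sources."""
    for source in sources:
        yield from source


def preprocess(records: Iterable[Record]) -> Iterator[Record]:
    """Pre-processing: drop invalid records and normalize the valid ones."""
    for rec in records:
        user = rec.get("user")
        amount = rec.get("amount")
        if isinstance(user, str) and user.strip() and isinstance(amount, (int, float)):
            yield {"user": user.strip().lower(), "amount": float(amount)}


def store(records: Iterable[Record]) -> list[Record]:
    """Storage: an in-memory list stands in for a real data store."""
    return list(records)


def analyze(stored: list[Record]) -> dict[str, float]:
    """Processing and analytics: aggregate the total amount per user."""
    totals: dict[str, float] = defaultdict(float)
    for rec in stored:
        totals[rec["user"]] += rec["amount"]
    return dict(totals)


def present(summary: dict[str, float]) -> str:
    """Presentation: render the aggregated results as sorted text lines."""
    return "\n".join(f"{user}: {total:.2f}" for user, total in sorted(summary.items()))


def run_pipeline(sources: Iterable[Iterable[Record]]) -> str:
    """Chain the five functional blocks end to end."""
    return present(analyze(store(preprocess(acquire(sources)))))
```

Because acquisition and pre-processing are written as generators here, records stream through those two stages one at a time and are only materialized at the storage step. This is one possible design choice among many; it mainly makes visible where in the pipeline data volume starts to matter.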
This paper presented various performance considerations that can act as guidelines for building high-performance Big Data and analytics systems. Such systems can be very complex for multiple reasons. To meet their performance requirements, a system must be designed and built from the ground up with those requirements in mind. The guidelines presented here should be followed during the different stages of a Big Data system, and they include how security requirements can affect the system's performance.