Big Data Ecosystem

Ecosystem of Big Data

The rapid development of digital technologies, IoT products and connectivity platforms, social networking applications, and video, audio and geolocation services has created opportunities for collecting and accumulating large amounts of data. While in the past corporations used to deal with static, centrally stored data collected from various sources, with the birth of the web and cloud services, cloud computing is rapidly overtaking the traditional in-house system as a reliable, scalable and cost-effective IT solution. The high volumes of structured and unstructured data, stored in a distributed manner, and the wide variety of data sources pose problems related to data/knowledge representation and integration, data querying, business analysis and knowledge discovery.

In 2001, in an attempt to characterize and visualize the changes that were likely to emerge in the future, Douglas Laney of META Group (now Gartner) proposed three dimensions that characterize the challenges and opportunities of increasingly large data: Volume, Velocity, and Variety, known as the 3 Vs of big data. Thus, according to Gartner:

"Big data" is high-volume, velocity, and variety information assets that demand cost-effective, innovative forms of information processing for enhanced insights and decision making.

According to Manyika et al., this definition is intentionally subjective and incorporates a moving definition of how big a dataset needs to be in order to be considered big data. Along these lines, big data to Amazon or Google is quite different from big data to a medium-sized insurance or telecommunications organization. Hence, many different definitions have emerged over time, but in general, the term refers to "datasets whose size is beyond the ability of typical database software tools to capture, store, manage, and analyze" and to technologies that address "data management challenges" and that process and analyze data to uncover valuable information that can benefit businesses and organizations. Additional "Vs" of data have been added over the years, but Volume, Velocity, and Variety are the three main dimensions that characterize the data.

The volume dimension refers to the sheer size of the data. The data size in a big data ecosystem can range from dozens of terabytes to a few zettabytes and is still growing. In 2010, the McKinsey Global Institute estimated that enterprises globally stored more than 7 exabytes of new data on disk drives, while consumers stored more than 6 exabytes of new data on devices such as PCs and notebooks.
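To get a feel for these orders of magnitude, the short sketch below converts between the units mentioned here. It assumes decimal (SI) units, where 1 terabyte is 10^12 bytes; the helper names to_bytes and convert are made up for this illustration, and the 7 and 6 exabyte figures are simply the lower bounds quoted from the estimate.

```python
# Decimal (SI) byte units are assumed; binary units (TiB, EiB, ...) differ slightly.
UNITS = {"TB": 10**12, "PB": 10**15, "EB": 10**18, "ZB": 10**21}


def to_bytes(value, unit):
    """Convert a size expressed in a decimal unit to bytes."""
    return value * UNITS[unit]


def convert(value, from_unit, to_unit):
    """Convert a size between two decimal units (illustrative helper)."""
    return to_bytes(value, from_unit) / UNITS[to_unit]


# Lower bounds quoted in the text for new data stored in 2010.
enterprise_eb = 7
consumer_eb = 6
total_eb = enterprise_eb + consumer_eb

print(f"Combined new data: more than {total_eb} EB")
print(f"Expressed in terabytes: {convert(total_eb, 'EB', 'TB'):,.0f} TB")
print(f"Expressed in zettabytes: {convert(total_eb, 'EB', 'ZB')} ZB")
```

Even the combined figure is only a small fraction of a single zettabyte, which gives some sense of how wide the range "from dozens of terabytes to a few zettabytes" really is.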

The velocity dimension refers to the increasing speed at which big data is created and the increasing speed at which the data need to be stored and analysed, while the variety dimension refers to increased diversity of data types.

Variety introduces additional complexity to data processing as more kinds of data need to be processed, combined and stored. While the 3 Vs have been continuously used to describe big data, the additional dimensions of veracity and value have been added to describe data integrity and quality, in what is called the 5 Vs of big data. More Vs have been introduced, including validity, vulnerability, volatility, and visualization, leading to what is called the 10 Vs of big data. Regardless of how many descriptors are isolated when describing the nature of big data, it is abundantly clear that big data is highly complex and, as such, requires special technical solutions for every step in the data workflow.

Big Data Ecosystem

The term Ecosystem is defined in scientific literature as a complex network or interconnected system. As noted earlier, corporations used to deal with static, centrally stored data, whereas cloud computing is now rapidly overtaking the traditional in-house system. Thus, large datasets (log files, social media sentiments, click-streams) are no longer expected to reside within a central server or within a fixed place in the cloud.

Architecture Framework and Components for the Big Data Ecosystem

Big Data are becoming a new technology focus both in science and in industry, and they motivate a technology shift to data-centric architecture and operational models. There is a vital need to define the basic information/semantic models, architecture components and operational models for Big Data, which may originate from different scientific, industry and social activity domains. This essay proposes an improved Big Data definition that includes the following parts: Big Data properties (also called the Big Data 5V: Volume, Velocity, Variety, Value and Veracity), data models and structures, data analytics, infrastructure and security. The essay discusses the paradigm change from traditional host-based or service-based models to data-centric architecture and operational models in Big Data. The Big Data Architecture Framework (BDAF) is proposed to address all aspects of the Big Data Ecosystem and includes the following components: Big Data Infrastructure, Big Data Analytics, Data structures and models, Big Data Lifecycle Management, and Big Data Security. The essay analyses the requirements for these components and suggests how they can address the main Big Data challenges. The presented work intends to provide a consolidated view of the Big Data phenomenon and the related challenges to modern technologies, and to initiate wide discussion.

Big Data are becoming related to almost all aspects of human activity, from simply recording events to research, design, production and the delivery of digital services or products to the final consumer. Current technologies such as Cloud Computing and ubiquitous network connectivity provide a platform for automating all processes in data collection, storage, processing and visualization.

The goal of our research at the current stage is to understand the nature of Big Data, their main features, and the trends and new possibilities in the development of Big Data technologies; to identify the security issues and problems related to the specific Big Data properties; and, based on this, to review architecture models and propose a consistent approach to defining Big Data architectures and solutions that resolve existing challenges and known issues.

Big Data Landscape

For the uninitiated, the Big Data landscape can be daunting. The vast proliferation of technologies in this competitive market means there's no single go-to solution when you begin to build your Big Data architecture. In this series of articles, we will examine the Big Data ecosystem and the multifarious technologies that exist to help enterprises harness their data. This first article aims to serve as a basic map, a brief overview of the main options available for those taking the first steps into the vastly profitable realm of Big Data and Analytics.

Ultimately, a Big Data environment should allow you to store, process, analyse and visualise data. It starts with the infrastructure, and with selecting the right tools for storing, processing and often analysing data. There are then specialised analytics tools to help you find the insights within the data. Further on from this, there are also applications which run off the processed, analysed data. All of these are valuable components of the Big Data ecosystem.

Infrastructure

Infrastructural technologies are the core of the Big Data ecosystem. They process, store and often also analyse data. For decades, enterprises relied on relational databases - typically collections of rows and tables - for processing structured data. However, the volume, velocity and variety of data mean that relational databases often cannot deliver the performance and latency required to handle large, complex data. The rise of unstructured data in particular meant that data capture had to move beyond merely rows and tables. Thus new infrastructural technologies emerged, capable of wrangling a vast variety of data, and making it possible to run applications on systems with thousands of nodes, potentially involving thousands of terabytes of data.
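The contrast between rows and tables and more varied data can be made concrete with a small sketch. The example below uses only Python's standard library: an in-memory SQLite table stands in for a relational database, and a list of JSON records stands in for semi-structured data. The table name, the column names and the sample events are all invented for illustration, and real Big Data infrastructure operates at a very different scale.

```python
import json
import sqlite3

# Structured data: every row must fit the table's fixed set of columns.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
conn.executemany(
    "INSERT INTO customers (id, name, city) VALUES (?, ?, ?)",
    [(1, "Ana", "Lisbon"), (2, "Ben", "Leeds")],
)
rows = conn.execute("SELECT name, city FROM customers ORDER BY id").fetchall()
conn.close()

# Semi-structured data: each record carries its own shape (hypothetical events).
events = [
    {"type": "click", "user": 1, "page": "/home"},
    {"type": "log", "level": "ERROR", "message": "timeout", "tags": ["db", "retry"]},
    {"type": "post", "user": 2, "text": "Great service!", "geo": {"lat": 53.8, "lon": -1.5}},
]

# Serialise each record as one JSON line; no shared column layout is required.
json_lines = [json.dumps(event) for event in events]

# Collect the fields used by each record and across all records.
field_sets = [sorted(json.loads(line).keys()) for line in json_lines]
all_fields = sorted({field for fields in field_sets for field in fields})

print("Relational rows:", rows)
for fields in field_sets:
    print("Record fields:", fields)
print("Distinct fields across all records:", len(all_fields))
```

Fitting the three events into one fixed table would need many mostly empty columns or several extra tables, plus some way of flattening the nested values, which is a small-scale version of the problem described above.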
