A brief introduction to Big Data

Big Data: Concepts, Applications, & Challenges

Any organization, public or private, relies on accurate data analytics to make decisions. Big data is pivotal in helping organizations extract value from very large amounts of data. Big data refers to the methods and techniques used to retrieve, collect, manage, and analyze huge volumes of both structured and unstructured data that are difficult to process with traditional databases, and that therefore require new technologies and techniques to analyze. This post covers big data with the intention of exploring the concept, its applications, and its challenges through an analysis of the literature, as well as discussing and reviewing interpretations of these findings along with possible recommendations.

An Introduction

Big data is a term for data sets so large and complex that traditional methods of processing data are insufficient. This essay covers big data as a whole (i.e., the big picture), with the intention of exploring the origins of the concept, its applications, and its challenges based on findings from cited sources and publications, as well as discussing and reviewing interpretations of these findings along with possible recommendations. Big data must be analyzed to extract its value, whether in trends, patterns, or behavior related to people or customers. The rapid growth of data defines its volume, while the speed at which that data can be processed rapidly and efficiently refers to its velocity. Big data analytics leads to more precise analysis and thus to more accurate decision-making and better performance. Big data is collected from both structured and unstructured data sources, online and offline. Unstructured data can come from social media (Facebook, Instagram, or Twitter posts), while structured data typically comes from an organization's internal databases. In business, both sources are used to understand customer patterns. Indeed, organizations nowadays rely on the fact that almost any data can be analyzed and used to reveal patterns about their customers. In other words, big data helps an organization understand the behavior of its customers and use that understanding to win in a competitive market.

Business organizations are still at an early stage of perceiving big data as an asset, public agencies are still struggling with the issue of open data, and science and technology are still exploring the potential of big data and the innovation it enables; meanwhile, the general public keeps producing huge amounts of data on a daily basis, which poses challenges for all organizations. Every organization faces the reality that big data can affect its competitiveness. The aim of this study is to examine the fundamental concepts, applications, and challenges closely related to big data in organizations.

Data Structures

We must first understand the new types of data structures. Traditionally, we have focused on structured and unstructured data. Structured data is data contained in relational databases and spreadsheets; it conforms to a database model with a fixed format for capturing data. Database tools, along with additional reporting and analysis tools, have long been used to analyze this data and create meaningful reports.
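To make the fixed-format idea concrete, here is a minimal sketch using Python's built-in sqlite3 module; the table and column names are illustrative, not taken from any particular system:

```python
import sqlite3

# In-memory database; a fixed schema is the defining trait of structured data.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Every row must conform to this fixed structure of columns and types.
cur.execute("""
    CREATE TABLE invoices (
        invoice_id INTEGER PRIMARY KEY,
        customer   TEXT NOT NULL,
        amount     REAL NOT NULL,
        issued_on  TEXT NOT NULL
    )
""")
cur.execute(
    "INSERT INTO invoices (customer, amount, issued_on) VALUES (?, ?, ?)",
    ("Acme Corp", 1250.00, "2023-04-01"),
)
conn.commit()

# Reporting tools rely on this predictability, e.g. a simple aggregate query.
for row in cur.execute("SELECT customer, SUM(amount) FROM invoices GROUP BY customer"):
    print(row)  # ('Acme Corp', 1250.0)
```

It is precisely this predictability that lets standard reporting tools produce meaningful results with so little effort.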

Unstructured data doesn't have a predefined data model, nor is it organized in a predefined manner. It is typically text heavy and may contain dates, numbers, and facts, as well as untagged data representing photos and graphic images. Word processing documents, presentations, and PDF files are prime examples of unstructured data.

New data structures that have come up are semi-structured data and quasi-structured data.

Semi-structured data is not raw data, and it is not stored in a conventional database system. It has structure, but it is not organized in a relational model like a table or an object-based graph. Semi-structured data contains tags or markers that separate semantic elements and enforce hierarchies of records and fields within the data. Entities belonging to the same class may have different attributes even though they are grouped together, irrespective of the attributes' order. Markup languages such as XML, along with email and EDI, are forms of semi-structured data. These formats support nested or hierarchical data, simplifying the data models that represent complex relationships between entities. They also support lists of objects, which simplifies data models by avoiding messy translations of lists into a relational model.
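As a small illustration, the sketch below parses a hypothetical XML fragment with Python's standard xml.etree.ElementTree module; note how two records of the same class carry different fields:

```python
import xml.etree.ElementTree as ET

# Hypothetical semi-structured records: same class ("customer"),
# different attributes per entity, hierarchy expressed through tags.
doc = """
<customers>
    <customer id="1"><name>Alice</name><email>alice@example.com</email></customer>
    <customer id="2"><name>Bob</name><phone>555-0100</phone></customer>
</customers>
"""

root = ET.fromstring(doc)
for customer in root.findall("customer"):
    # Tags act as markers separating semantic elements; fields vary per record.
    fields = {child.tag: child.text for child in customer}
    print(customer.get("id"), fields)
```

The tags give the data enough structure to navigate programmatically, even though no fixed relational schema exists.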

Quasi-structured data is textual data with erratic formats. It can be formatted, but only with effort, tools, and time. This data type includes web clickstream data such as Google searches. Another example is pasted text that yields a network map based on the similarity of language within the text and the proximity of words to each other. However, it is not "tagged" the way YouTube and Flickr track content in images. Generally, to work with untagged or image-based textual data, you apply an algorithm to analyze it and refine it based on the results you get.
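For instance, a clickstream log line is just erratic text until an extraction step imposes structure on it. Below is a minimal sketch; the log format is hypothetical:

```python
import re
from urllib.parse import parse_qs, urlparse

# Hypothetical clickstream-style log lines: erratic text, not a database.
lines = [
    '203.0.113.7 - [12/Mar/2023:10:01:44] "GET /search?q=big+data HTTP/1.1" 200',
    '198.51.100.2 - [12/Mar/2023:10:02:10] "GET /search?q=data+quality HTTP/1.1" 200',
]

pattern = re.compile(r'"GET (?P<path>\S+) HTTP')
for line in lines:
    match = pattern.search(line)
    if match:
        query = parse_qs(urlparse(match.group("path")).query)
        # The erratic text becomes structured only after this extraction step.
        print(query.get("q", []))  # ['big data'], then ['data quality']
```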
It is believed that truly structured data makes up only about 5% of all data, so better ways are needed to analyze the remaining 95%. Traditional database analysis and standard text search cannot complete the overall analysis on their own.
Some people continue to classify data into structured and unstructured only, treating semi-structured and quasi-structured as sub-types of unstructured data to avoid confusion. More important is that organizations are realizing the benefits of analyzing data beyond databases and are therefore moving beyond MIS reports.

Big Data is not just Volume

From the IT perspective, the first thing everyone wants to discuss about data is its size. How large is it? How much physical storage do you need? When it comes to renaming this data as Big Data, the name itself implies that we are talking about really large volumes of data.

Data needs to be meaningful and should create meaningful results for the enterprise. Therefore, it is important to understand the characteristics of this Big Data so that ways and means of analyzing this data are better developed. It would also help in defining what result an enterprise should expect from this data.

Researchers defined the first characteristic model for understanding this using the following three V's:

1. Volume: The volume defines the size of the data that gets collected and stored. The concern is not only the storage required to hold this data but also the resources required to process this huge amount of data, irrespective of its source, and to generate real-time results from it.

2. Velocity: Data generation has changed from traditional applications like invoicing or production, where data was generated only during production hours and was limited to so many invoices or so much output per day. Today, data streams in continuously from sensors, mobile devices, and web applications, and often must be processed as it arrives.

3. Variety: The volume is no longer coming from structured data or database-based applications alone. Datasets arrive in many new formats, and social media is one of the most important new varieties to have emerged (see the sketch after this list).
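As a rough illustration of velocity and variety together, the sketch below normalizes a continuous stream of mixed-format events as they arrive; the event shapes are invented for illustration:

```python
import json

def event_stream():
    # Stand-in for a continuous feed (invoices, social posts, sensor readings).
    yield '{"type": "invoice", "amount": 120.5}'
    yield '{"type": "tweet", "text": "loving this product!"}'
    yield '{"type": "sensor", "reading": 21.7, "unit": "C"}'

def normalize(raw):
    event = json.loads(raw)
    # Variety: each source needs its own handling before analysis.
    if event["type"] == "invoice":
        return ("revenue", event["amount"])
    if event["type"] == "tweet":
        return ("mention", len(event["text"]))
    return ("telemetry", event["reading"])

# Velocity: process each event as it arrives instead of batching overnight.
for raw in event_stream():
    print(normalize(raw))
```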

Data Quality Dimensions

Big Data often starts the discussion about the new dimensions defined for data. These dimensions need to be handled in their own right, not just as part of handling Big Data's volume. The new challenges are:

1. Real-time data: This data is different from the traditional form of data that we store on our servers, and it doesn't matter whether it falls under the structured or unstructured type. The key aspect is that it concerns "current data," not old data, and it enables situational awareness of what is happening right now. Real-time data raises the issue of perishable and orphaned data, which no longer has valid use cases but continues to be used nonetheless.

2. Shared data: This deals with the information that is shared across the organization. This includes sharing information between various applications and data sources. To share information efficiently, enterprises need to ensure that the data is consistent, usable, and extensible. 

3. Linked data: This comes from various data sources that have relationships with each other and that maintain this context so as to be useful to both humans and computers. Once a user links the data, the relationships in that data persist from that point onwards.

4. High-fidelity data: This data preserves the context, detail, relationships, and identities of important business information, largely through embedded metadata. High-fidelity data allows new meaning to be added without destroying the previous meaning of the data.
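As a loose sketch of what embedded metadata can look like in practice, the example below wraps a business figure with its context; the field names are hypothetical:

```python
from datetime import datetime, timezone

# Hypothetical high-fidelity record: the value travels with metadata
# that preserves its context, source, and identity.
record = {
    "value": 42_000.00,
    "metadata": {
        "entity": "Q1 revenue",
        "source": "erp.finance.ledger",  # where the figure came from
        "captured_at": datetime(2023, 4, 1, tzinfo=timezone.utc).isoformat(),
        "currency": "USD",
    },
}

# New meaning is layered on without destroying the old:
# annotations accumulate rather than overwrite.
record["metadata"].setdefault("annotations", []).append(
    {"note": "restated after audit", "added_at": "2023-06-15"}
)
print(record["value"], record["metadata"]["annotations"])
```

Because annotations accumulate alongside the original metadata rather than replacing it, new interpretations can be added while everything previously recorded stays intact.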
