Big Data: Analysis Phase
Analyzing the Data
Since the advent of IT, data has always needed methods to analyze it and turn it into meaningful information. Generating large volumes of data and building reports on top of it has been the traditional way for businesses to operate and expand.
Database Analytics
The most basic type is database analysis, wherein data is stored in the fixed row-and-column format of the tables in a database. Programmers run queries against these tables to get the desired results and use other tools to present the data in a more digestible form; reporting tools help with graphs and so forth.
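As a minimal sketch of this style of analysis, the snippet below builds a small, made-up sales table with Python's built-in sqlite3 module and runs the kind of aggregate query a reporting tool would then turn into a chart. The table and column names are purely illustrative.

import sqlite3

# In-memory database with a small, invented sales table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, product TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [("North", "Widget", 120.0), ("North", "Gadget", 75.5),
     ("South", "Widget", 200.0), ("South", "Gadget", 50.0)],
)

# A typical reporting query: total revenue per region.
for region, total in conn.execute(
    "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
):
    print(f"{region}: {total:.2f}")

conn.close()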
Database analytics continues to be relevant. Although new analytic dimensions are emerging, it is unlikely to lose its place, because it is aimed at analyzing static data and producing reports from it. Common database types include:
1. Relational: These databases store data in rows and columns. Parent-child relationships can be joined remotely on the server, providing speed over scale. This is the type of database with which organizations usually start. Relational databases are a good fit when your data is highly structured and you know in advance what you will be storing.
2. Document: These databases store data as documents, keeping parent and child records in the same document. The server is aware of the fields stored within a document, can query on them, and can return their properties selectively. They are a good fit when your concept of a record has relatively bounded growth and all related properties can be stored in the same document. MongoDB, CouchDB, and BigCouch are examples.
3. BigTable-inspired: These are column-oriented stores modeled on Google's BigTable paper. They have tunable CAP parameters that can be adjusted to favor either consistency or availability, though both adjustments are operationally intensive. They are a good fit when you need consistency and write performance that scales beyond the capabilities of a single machine. HBase is one such database, and it can help analyze data across 100 nodes in production.
4. Graph: These databases use graph structures with nodes, edges, and properties to represent and store data. They offer index-free adjacency: every element contains a direct pointer to its adjacent elements, so no index lookups are necessary (see the sketch after this list).
5. NewSQL: These databases are much like relational databases, except that they offer high performance and scalability while preserving the traditional relational model. Capable of meeting high-throughput online transaction processing requirements, they are in effect a scalable version of the relational database that handles queries more efficiently. VoltDB and SQLFire are examples of this type.
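To make the index-free adjacency idea from the graph entry concrete, here is a small, hypothetical sketch in Python: each node keeps direct references to its neighbours, so traversal follows pointers rather than consulting an index. The classes and data are invented for illustration; real graph databases apply the same principle while also persisting the data and providing query languages.

class Node:
    """A graph node that keeps direct references to its neighbours."""
    def __init__(self, name):
        self.name = name
        self.neighbours = []   # direct pointers; no index lookup needed

    def connect(self, other):
        self.neighbours.append(other)

# A tiny, made-up social graph.
alice, bob, carol = Node("Alice"), Node("Bob"), Node("Carol")
alice.connect(bob)
bob.connect(carol)

def reachable(start):
    """Traverse the graph by simply following pointers from node to node."""
    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        if node.name in seen:
            continue
        seen.add(node.name)
        stack.extend(node.neighbours)
    return seen

print(reachable(alice))   # {'Alice', 'Bob', 'Carol'}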
Real-time Analytics
Times have changed. Businesses cannot rely only on past performance to plan their future; they need to capture current trends and the needs of today's consumer. Analytics has shifted from analyzing past data to performing analytics on real-time data, which comes not only from structured in-house databases but also from unstructured sources such as social media and consumer behavior.
Tools are available today to collect data from these sources and bring it together, through a process called "data conditioning," into databases where it can be analyzed. All of this data is processed in real time to produce near-instant results that help businesses serve their consumers better.
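As a rough illustration of that kind of processing, the sketch below keeps a running per-product count of events over a sliding one-minute window, the way a simple real-time pipeline might maintain a live metric. The event stream is simulated here; a real deployment would read from a message queue or a stream-processing platform.

import time
from collections import Counter, deque

WINDOW_SECONDS = 60          # keep only the last minute of events
events = deque()             # (timestamp, product) pairs inside the window
counts = Counter()           # live per-product counts for the window

def ingest(product, now=None):
    """Add one event and return the up-to-the-moment counts for the window."""
    now = time.time() if now is None else now
    events.append((now, product))
    counts[product] += 1
    # Expire events that have fallen out of the window.
    while events and now - events[0][0] > WINDOW_SECONDS:
        _, old_product = events.popleft()
        counts[old_product] -= 1
    return dict(counts)

# Simulated stream: each call reflects the current window instantly.
print(ingest("coffee"))
print(ingest("coffee"))
print(ingest("tea"))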
Many people ask whether there is actually a need for real-time analysis: how would it help their business, and is it worth the kind of investment it requires?
You can analyze your own business to see how real-time analysis could support its development. A few small examples from different industries show the benefits organizations have seen. Start with healthcare: a small device worn on a waist belt injects the desired quantity of insulin into a diabetic patient while continuously monitoring the patient's blood sugar level.
In retail, someone walking near a store gets a message about additional offers on their favorite product, valid for the next 30 minutes. If you put a product back on the shelf, you get a message with further offers on that product. A win-win for the store and the consumer.
Predictive Analytics
Organizations no longer ask for just analysis of their data; traditional Business Intelligence (BI) tools have been doing that for a long time. What they are now after is deeper insight, using visualization tools and predictive analysis to explore data in new ways and discover new patterns.
Predictive Analysis is the practice of extracting information from existing data to determine patterns and predict future outcomes and trends. It helps forecast what might happen in the future with an acceptable level of reliability and includes what-if scenarios and risk assessment.
Applied to business, predictive analysis models are used to analyze current data and historical facts to better understand customers, products, and partners and to identify potential risks and opportunities for a company. It uses a number of techniques, including data mining, statistical modeling, and machine learning to help analysts make business forecasts.
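As a minimal sketch of the machine-learning side of this, the example below fits a logistic-regression model to a tiny, entirely synthetic customer dataset and uses it to score the chance of churn for new customers. The features, numbers, and the choice of scikit-learn are assumptions made purely for illustration; real models are built on far richer historical data.

# Assumes scikit-learn is installed; all data below is synthetic.
from sklearn.linear_model import LogisticRegression

# Hypothetical historical features: [months_as_customer, support_tickets]
X = [[24, 0], [3, 5], [36, 1], [2, 4], [48, 0], [6, 3], [12, 2], [1, 6]]
y = [0, 1, 0, 1, 0, 1, 0, 1]   # 1 = the customer eventually churned

model = LogisticRegression().fit(X, y)

# Score new customers by their predicted probability of churning.
new_customers = [[30, 1], [4, 5]]
for features, p in zip(new_customers, model.predict_proba(new_customers)[:, 1]):
    print(features, f"churn risk: {p:.2f}")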
All businesses run on risk: the risk inherent in how the business is managed. Every decision an organization takes affects the risks the enterprise must withstand, for example the risk of customer defection, of a customer not responding to an expensive glossy mailer, or of offering a huge retention discount to a customer who was never going to leave while missing a critical customer who does.
Predictive analysis is the data-driven means of computing the risk of any type of negative outcome. Insurance companies have used this very well, augmenting their practices with predictive analysis in order to improve pricing and selection decisions.
The actuarial methods that enable an insurance company to conduct its core business perform the very same function as predictive models, rating customers by chance of positive or negative outcomes. Predictive modeling improves on standard actuarial methods by incorporating additional analytical automation and by generalizing to a broader set of customer variables.
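A simple, hypothetical way to see how such ratings feed decisions is to combine each customer's predicted probability of a negative outcome with the cost of that outcome to get an expected loss, then rank by it. The figures below are invented purely to show the arithmetic.

# Made-up customers: (name, predicted probability of a claim, cost if it occurs)
customers = [
    ("A", 0.02, 50_000),
    ("B", 0.10, 8_000),
    ("C", 0.05, 30_000),
]

# Expected loss = probability of the negative outcome * its cost.
ranked = sorted(
    ((name, p * cost) for name, p, cost in customers),
    key=lambda item: item[1],
    reverse=True,
)
for name, expected_loss in ranked:
    print(f"customer {name}: expected loss {expected_loss:,.0f}")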
Misconceptions about Big Data
Big Data has been around for some time now. However, not everyone knows everything about it. Several IT leaders have their concerns and doubts about Big Data technology.
Let us try and understand some of those and see if they really are of any concern:
1. 80% of all data is unstructured: There would be no patterns to discover in data that truly had no structure. Non-relational databases such as NoSQL document stores and graph databases help capture the structure and patterns present in most data types.
2. Advanced analytics is just an advanced version of normal analytics: Normal analytics produces static reports from static databases; advanced analytics goes further, using statistical and predictive techniques to uncover patterns rather than simply reporting what has already happened.
3. Embedded analytics solves all problems: Embedded analytics is a standard set of tools or reports embedded on top of the data and datasets; it answers the questions it was built to answer, not every question a business will have.
4. Improved tools will replace the Data Scientist: Regardless of how advanced the tools become, you still need data scientists and analysts to apply them, perform the analysis, and produce dynamic reports.
5. Data Scientists need high-level education: We believe education alone does not help if mind and logic are not properly applied. It is more important to think logically and to understand the business needs.
6. We can predict everything with Big Data: While we can use big data to find patterns and predict many things, it cannot predict everything. Hospitals can analyze which kinds of people are at higher risk of heart ailments so that precautions can be taken, but many things in more complex domains, such as law and politics, cannot be predicted.
7. Big Data isn't biased: Data is always biased, regardless of its volume or source. Data is the result of particular measurements and was collected for some purpose, and that purpose shapes what it can tell us.