Big Data Architecture

When you need to ingest, process, and analyze data sets that are too large or too complex for conventional relational databases, the solution is a set of technologies organized into a structure called a Big Data architecture. Use cases include:

1. Storage and processing of data in very large volumes: generally, anything over 100 GB in size.

2. Aggregation and transformation of large sets of unstructured data for analysis and reporting.

3. The capture, processing, and analysis of streaming data in real-time or near-real-time.

Components of Big Data Architecture


Big Data architectures have a number of layers or components. These are the most common:

1. Data sources

Data is sourced from multiple inputs in a variety of formats, including both structured and unstructured. Sources include relational databases allied with applications such as ERP or CRM, data warehouses, mobile devices, social media, email, and real-time streaming data inputs such as IoT devices. Data can be ingested in batch mode or in real-time.

2. Data storage

This is the data receiving layer, which ingests data, stores it, and converts unstructured data into a format that analytic tools can work with. Structured data is often stored in a relational database, while unstructured data can be housed in a NoSQL database such as MongoDB Atlas. A specialized distributed file system such as the Hadoop Distributed File System (HDFS) is a good option for high-volume, batch-processed data in various formats.


3. Batch processing: with very large data sets, long-running batch jobs are required to filter, combine, and generally render the data usable for analysis. Source files are typically read and processed, with the output written to new files. Hadoop is a common solution for this.
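
As a rough illustration of the read, filter, combine, and write-to-new-files pattern, here is a minimal single-machine sketch in Python. The column names (region, amount), the file names, and the run_batch_job and demo_batch helpers are all assumptions made up for this example. A real batch layer would run the same kind of logic on a distributed framework such as Hadoop.

```python
import csv
import tempfile
from collections import defaultdict
from pathlib import Path


def run_batch_job(source_dir: Path, output_file: Path) -> dict[str, float]:
    """Read every CSV in source_dir, drop bad rows, and total amounts per region."""
    totals: dict[str, float] = defaultdict(float)
    for source in sorted(source_dir.glob("*.csv")):
        with source.open(newline="") as handle:
            for row in csv.DictReader(handle):
                # Filter step: skip rows with a missing field or non-numeric amount.
                try:
                    region = row["region"]
                    amount = float(row["amount"])
                except (KeyError, TypeError, ValueError):
                    continue
                # Combine step: accumulate one total per region across all files.
                totals[region] += amount

    # Output goes to a new file; the source files are left untouched.
    with output_file.open("w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["region", "total_amount"])
        for region in sorted(totals):
            writer.writerow([region, totals[region]])
    return dict(totals)


def demo_batch() -> dict[str, float]:
    """Create two small hypothetical source files and run the job over them."""
    with tempfile.TemporaryDirectory() as tmp:
        source_dir = Path(tmp) / "in"
        source_dir.mkdir()
        (source_dir / "day1.csv").write_text("region,amount\nnorth,10\nsouth,5\n")
        (source_dir / "day2.csv").write_text("region,amount\nnorth,2.5\nsouth,oops\n")
        return run_batch_job(source_dir, Path(tmp) / "totals.csv")
```

Calling demo_batch() totals the valid rows from both files and silently skips the row whose amount is not a number.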

4. Real-time message ingestion: this component focuses on categorizing the data for a smooth transition into the deeper layers of the environment. An architecture designed for real-time sources needs a mechanism to ingest and store real-time messages for stream processing. Messages can sometimes just be dropped into a folder, but in other cases, a message capture store is necessary for buffering and to enable scale-out processing, reliable delivery, and other queuing requirements.
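
To make the buffering and scale-out idea concrete, the sketch below uses a bounded in-memory queue from the Python standard library as a stand-in for a message capture store. It is only an assumption-laden toy: the function names, the buffer size, and the upper-casing "processing" step are invented for illustration, and an in-memory queue offers none of the durability a real message store would.

```python
import queue
import threading

STOP = object()  # sentinel telling a consumer to shut down


def consume(buffer: queue.Queue, processed: list, lock: threading.Lock) -> None:
    """Pull messages from the buffer until the STOP sentinel arrives."""
    while True:
        message = buffer.get()
        if message is STOP:
            break
        with lock:
            # Placeholder for real processing work.
            processed.append(message.upper())


def run_ingestion(messages: list[str], workers: int = 2) -> list[str]:
    # A small bound means the producer blocks when consumers fall behind,
    # which is the buffering behaviour a message capture store provides.
    buffer: queue.Queue = queue.Queue(maxsize=4)
    processed: list[str] = []
    lock = threading.Lock()

    # Scale-out: several consumers read from the same buffer.
    threads = [
        threading.Thread(target=consume, args=(buffer, processed, lock))
        for _ in range(workers)
    ]
    for thread in threads:
        thread.start()

    for message in messages:
        buffer.put(message)
    for _ in threads:
        buffer.put(STOP)  # one sentinel per consumer
    for thread in threads:
        thread.join()

    # Arrival order across consumers is not guaranteed, so sort for a stable result.
    return sorted(processed)
```

Raising the workers argument adds consumers without touching the producer, which is the property that makes a shared buffer useful for scale-out processing.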

5. Stream processing: once captured, the real-time messages have to be filtered, aggregated, and otherwise prepared for analysis, after which they are written to an output sink. Options for this phase include Azure Stream Analytics, Apache Storm, and Apache Spark Streaming.
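
One common way to aggregate a stream is to group messages into fixed, non-overlapping time windows. The following pure-Python sketch counts events per key in such windows. It assumes events arrive in timestamp order as (timestamp, key) pairs, and the function name and 10-second default are illustrative choices, not the API of any of the products named above.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def tumbling_window_counts(
    events: Iterable[tuple[float, str]],
    window_seconds: float = 10.0,
) -> Iterator[tuple[float, dict[str, int]]]:
    """Yield (window_start, counts_per_key) for timestamp-ordered events."""
    current_start = None
    counts: dict[str, int] = defaultdict(int)
    for timestamp, key in events:
        # Each event belongs to the window that starts at the nearest
        # lower multiple of window_seconds.
        window_start = (timestamp // window_seconds) * window_seconds
        if current_start is not None and window_start != current_start:
            # The previous window is complete: emit it to the output sink.
            yield current_start, dict(counts)
            counts = defaultdict(int)
        current_start = window_start
        counts[key] += 1
    if current_start is not None:
        # Flush the final, still-open window when the stream ends.
        yield current_start, dict(counts)
```

Because the function is a generator, each finished window can be written to an output sink as soon as it closes instead of waiting for the whole stream. Windows that receive no events are simply not emitted.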

6. Analytical data store: the processed data can now be presented in a structured format - such as a relational data warehouse - for querying by analytical tools, as is the case with traditional business intelligence (BI) platforms. Other alternatives for serving the data are low-latency NoSQL technologies or an interactive Hive database.
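
As a small-scale illustration of serving processed data in a structured, queryable form, the sketch below loads rows into an in-memory SQLite database and runs the kind of aggregate query a BI tool might issue. SQLite is only a stand-in for a relational data warehouse here, and the sales table, its columns, and both helper functions are hypothetical.

```python
import sqlite3


def build_store(rows: list[tuple[str, str, float]]) -> sqlite3.Connection:
    """Load processed rows into a structured table ready for querying."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (region TEXT, day TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    conn.commit()
    return conn


def revenue_by_region(conn: sqlite3.Connection) -> list[tuple[str, float]]:
    """Run a typical analytical query: total amount per region."""
    cursor = conn.execute(
        "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
    )
    return cursor.fetchall()
```

The point of the structured store is that analysts can ask new questions with a query alone, without rerunning the upstream batch or stream jobs.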

7. Analysis and reporting: most Big Data platforms are geared to extracting business insights from the stored data via analysis and reporting. This requires multiple tools. Structured data is relatively easy to handle, while more advanced and specialized techniques are required for unstructured data. Data scientists may undertake interactive data exploration using various notebooks and toolsets. A data modeling layer might also be included in the architecture, which may also enable self-service BI using popular visualization and modeling techniques.
Analytics results are sent to the reporting component, which replicates them to various output systems for human viewers, business processes, and applications. After visualization in reports or dashboards, the analytic results are used for data-driven business decision making.

8. Orchestration: the cadence of Big Data analysis involves multiple data processing operations followed by data transformation, movement among sources and sinks, and loading of the prepared data into an analytical data store. These workflows can be automated with Apache tools such as Oozie and Sqoop, or with Azure Data Factory.
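
At its core, an orchestrator runs tasks in an order that respects their dependencies. The sketch below shows that idea with graphlib from the Python standard library (Python 3.9 or later). The task names and the run_workflow helper are invented for illustration. This is not how Oozie or Azure Data Factory are configured, and real orchestrators add scheduling, retries, and monitoring on top.

```python
from graphlib import TopologicalSorter
from typing import Callable


def run_workflow(
    tasks: dict[str, Callable[[], None]],
    dependencies: dict[str, set[str]],
) -> list[str]:
    """Run each task after all of its prerequisites; return the execution order."""
    # Map every task to the set of tasks that must finish before it starts.
    graph = {name: dependencies.get(name, set()) for name in tasks}
    # static_order() raises graphlib.CycleError if the dependencies form a loop.
    order = list(TopologicalSorter(graph).static_order())
    for name in order:
        tasks[name]()
    return order


def demo_workflow() -> list[str]:
    log: list[str] = []
    names = ["report", "load", "transform", "extract"]  # deliberately out of order
    tasks = {name: (lambda n=name: log.append(n)) for name in names}
    dependencies = {
        "transform": {"extract"},
        "load": {"transform"},
        "report": {"load"},
    }
    run_workflow(tasks, dependencies)
    return log
```

Even though the tasks are registered out of order, demo_workflow() runs them as extract, transform, load, report, because the order comes from the dependency graph rather than from the registration order.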
