Spark Data Hub

The Data Hub for Apache Spark is a purpose-built data cluster solution for analyzing large amounts of data with a modern data processing engine. It uses Apache Spark, a unified framework for processing large volumes of batch and streaming data with implicit data parallelism and fault tolerance.

Overview

The Spark Data Hub is intended to work as a compute cluster for data processing. It supports a rich set of higher-level tools, including SQL and the DataFrame APIs, for structured data processing. It also supports a variety of data caching options that enhance performance, protect data sources from costly queries, and allow complex data combinations and transformations to be reused for a faster end result.
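As a rough illustration of the SQL and DataFrame tooling described above, the following PySpark sketch runs the same aggregation both ways over a cached DataFrame. The sample rows, the `orders` view name, the `expected_totals` and `summarize_orders` helpers, and the `local[*]` master are assumptions made for the example, not part of the Data Hub product. Running `summarize_orders` needs PySpark and a Java runtime.

```python
# Hypothetical sample data: (region, amount) rows invented for this sketch.
SAMPLE_ORDERS = [("east", 10), ("east", 20), ("west", 5)]

# The same aggregation expressed in SQL against an assumed "orders" view.
QUERY = "SELECT region, SUM(amount) AS total FROM orders GROUP BY region"


def expected_totals(rows):
    """Plain-Python reference result, used to sanity-check the Spark output."""
    totals = {}
    for region, amount in rows:
        totals[region] = totals.get(region, 0) + amount
    return totals


def summarize_orders():
    """Aggregate the sample rows with the DataFrame API and with SQL."""
    # Imported here so the module loads even where PySpark is not installed.
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = (
        SparkSession.builder.master("local[*]")  # assumed local master
        .appName("data-hub-sketch")
        .getOrCreate()
    )
    try:
        orders = spark.createDataFrame(SAMPLE_ORDERS, ["region", "amount"])
        orders.cache()  # mark the rows for in-memory reuse by both queries

        # DataFrame API version of the aggregation.
        by_df = orders.groupBy("region").agg(F.sum("amount").alias("total"))

        # SQL version over a temporary view of the same cached DataFrame.
        orders.createOrReplaceTempView("orders")
        by_sql = spark.sql(QUERY)

        return (
            {row["region"]: row["total"] for row in by_df.collect()},
            {row["region"]: row["total"] for row in by_sql.collect()},
        )
    finally:
        spark.stop()
```

Calling `summarize_orders()` should return two dictionaries that agree with each other and with `expected_totals(SAMPLE_ORDERS)`. The `cache()` call is what lets the second query reuse the rows instead of rebuilding them from the source.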

Features

Speed: Achieves high performance for both batch and streaming data, using a state-of-the-art DAG scheduler, a query optimizer, and a physical execution engine.
Ease of Use: Offers over 80 high-level operators that make it easy to build parallel apps. You can also use it interactively from the Scala, Python, R, and SQL shells.
Generality: Allows SQL to be combined with complex analytics, and provides large-scale data caching for client applications.

Architecture

Invariant Data Hub runs as independent sets of processes within a cluster, coordinated by the SparkContext object in the driver program. It works with a variety of popular cluster managers, including a local standalone cluster for development and YARN or Kubernetes for production.
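In Spark, the cluster manager is typically selected through the master URL handed to the SparkContext. A minimal sketch, assuming PySpark is the client: the `master_url` and `build_context` helpers and any host names passed to them are hypothetical, while the URL shapes follow Spark's documented master URL formats.

```python
def master_url(manager, host=None, port=None):
    """Return a Spark master URL for the named cluster manager."""
    if manager == "local":
        return "local[*]"  # run in-process, one worker thread per core
    if manager == "yarn":
        return "yarn"  # cluster location is read from the Hadoop config
    if manager in ("standalone", "kubernetes"):
        if host is None:
            raise ValueError(f"{manager} requires a host")
        if manager == "standalone":
            # 7077 is the default port of a standalone Spark master.
            return f"spark://{host}:{port or 7077}"
        # Kubernetes masters point at the cluster's API server.
        return f"k8s://https://{host}:{port or 443}"
    raise ValueError(f"unsupported cluster manager: {manager}")


def build_context(manager, app_name="data-hub-sketch", host=None, port=None):
    """Create (or reuse) a SparkContext bound to the chosen cluster manager."""
    # Imported here so the module loads even where PySpark is not installed.
    from pyspark import SparkConf, SparkContext

    conf = (
        SparkConf()
        .setMaster(master_url(manager, host, port))
        .setAppName(app_name)
    )
    return SparkContext.getOrCreate(conf)
```

For development, `build_context("local")` would start everything in one process. A production deployment would more likely pass the master through `spark-submit --master` than hard-code it in the application.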

Data Hub runs on Linux servers (CentOS 7.x, Red Hat 7.x). The web-based console can be used to monitor task execution and can be accessed with the Google Chrome and Microsoft Edge web browsers.

Benefits

Data Hub provides data processing and analysis support for diverse workloads with a variety of data formats. It does the heavy lifting of organizing and transforming data for delivery to applications.

Key benefits

Use the computational power of the cluster for fast analytics
Handle a variety of data storage formats and store the data in a format suitable for aggregations
