The Data Hub for Apache Spark is a purpose-built data cluster solution for analyzing large amounts of data using a modern data processing engine. It utilizes Apache Spark, a modern unified framework for processing large amounts of batch and streaming data with implicit data parallelism and fault tolerance.
The Spark Data Hub is intended to work as a compute cluster for data processing. It supports a rich set of higher-level tools, including SQL and DataFrame APIs that can be used for structured data processing. It also supports a variety of data caching options that enhance performance, protect data sources from costly queries, and allow complex data combinations and transformations to be reused for a faster end result.
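As an illustration of the SQL, DataFrame, and caching support described above, here is a minimal PySpark sketch. It assumes PySpark is installed and uses a local session. The sample rows, the `orders` view name, and the function names are hypothetical and are not part of the Data Hub product.

```python
# Hypothetical sample data, used only for illustration.
SAMPLE_ORDERS = [("alice", 120.0), ("bob", 75.5), ("alice", 30.0)]

# SQL run against a temporary view registered from a DataFrame.
TOTALS_QUERY = """
    SELECT customer, SUM(amount) AS total
    FROM orders
    GROUP BY customer
    ORDER BY customer
"""


def totals_by_customer(spark, rows=SAMPLE_ORDERS):
    """Return (customer, total) pairs using SQL over a cached DataFrame."""
    orders = spark.createDataFrame(rows, ["customer", "amount"])
    orders.cache()  # mark the data for in-memory reuse by later queries
    orders.createOrReplaceTempView("orders")
    result = spark.sql(TOTALS_QUERY).collect()
    orders.unpersist()  # release the cached data once it is no longer needed
    return [(row["customer"], row["total"]) for row in result]


def run_demo():
    """Start a local session, run the query, and stop the session."""
    # Imported here so this module still loads where PySpark is not installed.
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder.master("local[*]")
        .appName("datahub-sketch")  # hypothetical application name
        .getOrCreate()
    )
    try:
        return totals_by_customer(spark)
    finally:
        spark.stop()
```

Calling `run_demo()` on a machine with PySpark available runs the aggregation once. In a longer job, the cached DataFrame could serve several queries before `unpersist()` is called, which is where caching helps avoid repeated reads from the source.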
Features
Invariant Data Hub runs as independent sets of processes within a cluster, coordinated by the SparkContext. It works with a variety of popular cluster managers, including a local standalone cluster for development as well as YARN and Kubernetes for production.
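The choice of cluster manager is usually expressed to Spark as a master URL. The sketch below maps each manager named above to the master URL form that Apache Spark accepts. The helper name `master_url` and the example host names are hypothetical, and the page does not say how Data Hub itself selects a manager.

```python
def master_url(manager, host=None, port=None):
    """Build a Spark master URL for the given cluster manager.

    `manager` is one of "local", "standalone", "yarn", or "kubernetes".
    """
    manager = manager.lower()
    if manager == "local":
        # Run Spark in-process, using all available cores.
        return "local[*]"
    if manager == "yarn":
        # The cluster location comes from the Hadoop configuration.
        return "yarn"
    if manager == "standalone":
        if not host:
            raise ValueError("standalone mode needs a master host")
        # 7077 is the default port of a Spark standalone master.
        return f"spark://{host}:{port or 7077}"
    if manager == "kubernetes":
        if not host or not port:
            raise ValueError("kubernetes mode needs an API server host and port")
        return f"k8s://https://{host}:{port}"
    raise ValueError(f"unknown cluster manager: {manager}")
```

For example, `master_url("standalone", "spark-master.example.internal")` produces a `spark://` URL suitable for a development cluster, while `master_url("yarn")` relies on the Hadoop configuration already present on the node.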
Data Hub runs on Linux servers: CentOS 7.x and Red Hat 7.x. The web-based console can be used to monitor task execution and can be accessed using the Google Chrome and Microsoft Edge web browsers.
Data Hub provides data processing and analysis support for diverse workloads with a variety of data formats. It does the heavy lifting of organizing and transforming data for delivery to applications.
Key benefits
Copyright © 2021 Invariant LLC - All Rights Reserved.