Invariant
Spark Data Hub

The Data Hub for Apache Spark is a purpose-built data cluster solution for analyzing large amounts of data with a modern data processing engine. It uses Apache Spark, a modern unified framework for processing large amounts of batch and streaming data with implicit data parallelism and fault tolerance.


Overview

The Spark Data Hub is intended to work as a compute cluster for data processing. It supports a rich set of higher-level tools, including SQL and the DataFrame API, for structured data processing. It also supports a variety of data caching options that improve performance, protect data sources from costly queries, and let complex data combinations and transformations be reused for faster results.


Features

  • Speed: Achieves high performance for both batch and streaming data, using a state-of-the-art DAG scheduler, a query optimizer, and a physical execution engine.
  • Ease of Use: Offers over 80 high-level operators that make it easy to build parallel apps, and can be used interactively from the Scala, Python, R, and SQL shells.
  • Generality: Allows SQL to be combined with complex analytics, and provides large-scale data caching to client applications.


Architecture

Applications on the Invariant Data Hub run as independent sets of processes within a cluster, coordinated by the SparkContext. The Data Hub works with a variety of popular cluster managers, including a local standalone cluster for development as well as YARN and Kubernetes for production.
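In Spark, the cluster manager is chosen by the master URL passed when the SparkContext or SparkSession is created (for example through `SparkSession.builder.master(...)`). Spark itself distinguishes local mode, where everything runs in one process, from its standalone cluster manager. The sketch below maps each option to Spark's documented master-URL format. The helper `master_url` and the host names are hypothetical, and how the Data Hub actually configures its cluster manager is not described on this page.

```python
def master_url(manager: str, host: str = "", port: int = 0) -> str:
    """Return a Spark master URL for a cluster manager (hypothetical helper).

    The URL formats follow Spark's documentation; host and port values
    are deployment-specific assumptions supplied by the caller.
    """
    if manager == "local":
        # Local mode: driver and executors in one JVM, one thread per core.
        return "local[*]"
    if manager == "standalone":
        # Spark's built-in standalone cluster manager (default port 7077).
        return f"spark://{host}:{port or 7077}"
    if manager == "yarn":
        # YARN finds the ResourceManager via HADOOP_CONF_DIR / YARN_CONF_DIR.
        return "yarn"
    if manager == "kubernetes":
        # Kubernetes API server address with the k8s:// prefix; the port
        # is always written out explicitly, defaulting to 443 here.
        return f"k8s://https://{host}:{port or 443}"
    raise ValueError(f"unknown cluster manager: {manager}")


# Usage: development versus production targets (hypothetical hosts).
print(master_url("local"))
print(master_url("standalone", "spark-master.example.com"))
print(master_url("yarn"))
print(master_url("kubernetes", "k8s-api.example.com", 6443))
```

The application code stays the same across environments, and only the master URL changes. This is what lets a job developed locally move to YARN or Kubernetes for production.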


Data Hub runs on Linux servers (CentOS 7.x and Red Hat 7.x). Its web-based console can be used to monitor task execution and can be accessed from the Google Chrome and Microsoft Edge web browsers.


Benefits

Data Hub provides data processing and analysis support for diverse workloads and a variety of data formats. It does the heavy lifting of organizing and transforming data for delivery to applications.

Key benefits

  • Use the computational power of the cluster for fast analytics
  • Handle a variety of data storage formats and store the data in a format suitable for aggregations

Copyright © 2021 Invariant LLC - All Rights Reserved.
