Data Platform

Invariant Platform and Discovery provide tools and utilities for fast data ingestion and schema management, bringing data into the Invariant platform for ad-hoc analysis and report development. The data platform is built on Apache Hadoop, with HDFS as the distributed store. Hive can then be used to map and query this data. Data can be ingested through periodic batch pulls or through a continuous streaming channel.

Discovery automates many of the laborious tasks of data mapping and movement. This frees data analysts to focus on downstream ad-hoc discovery tasks, including shaping and transforming data to fit their business needs.

Product Overview

Discovery leverages the Invariant data platform and uses source database schema metadata and HCatalog to manage the schema. The Discovery pipeline maps source data streams from Kafka and JMS queues and loads the data into target locations in HDFS. Data from multiple source systems can be ingested and managed, with user-defined functions and rules applied, to build the warehouse for discovery.
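
As a rough illustration of the source-to-target mapping described above, the sketch below resolves the HDFS directory for a record from a given source stream. It is not Discovery's actual configuration format: the `StreamMapping` class, the `target_path` function, and the date-partitioned `dt=` directory layout are all assumptions made for the example.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict


@dataclass(frozen=True)
class StreamMapping:
    """Hypothetical mapping entry: one source stream to one HDFS directory."""
    source: str      # e.g. a Kafka topic or JMS queue name
    target_dir: str  # base directory in HDFS for this source


def target_path(mappings: Dict[str, StreamMapping], source: str, event_date: date) -> str:
    """Return the HDFS directory a record from `source` would be written to.

    Assumes a date-partitioned layout (dt=YYYY-MM-DD), which is a common
    convention for Hive tables but is not confirmed by the product text.
    """
    mapping = mappings[source]  # raises KeyError for an unmapped source
    return f"{mapping.target_dir.rstrip('/')}/dt={event_date.isoformat()}"
```

Keeping the mapping as plain data makes it straightforward to generate or review it alongside the schema metadata, rather than hard-coding paths in the ingestion code.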

Discovery inventory management components are designed to help you keep your data in sync. Utilities that generate the required configuration, workflows and reports allow data engineers to stay on top of data pipeline management tasks.

Architecture

The data platform uses Hadoop, which is built on a master-worker architecture. The resource manager allocates resources, and the worker nodes perform the actual data crunching tasks.

The Discovery pipeline and inventory management services run on the edge node and keep track of the source and target metadata and the data mapping. The command line utilities can be used to generate pipeline mapping configuration as well as target DDLs for Hive. Once configured, the services run in the background and connect to Kafka or other queue-based data sources to collect the data streamed in. For sources that do not support streaming, Discovery can periodically pull the data and merge it into the target stores. Discovery also supports Apache Oozie workflows, allowing it to participate in broader enterprise-level data pipelines.
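
To make the idea of generating target DDLs from source metadata concrete, here is a minimal sketch that builds a Hive `CREATE EXTERNAL TABLE` statement from a list of source columns. It does not reproduce Discovery's command line utilities: the `hive_ddl` function, the `SOURCE_TO_HIVE` type mapping, the fallback to `STRING`, and the choice of ORC storage are all assumptions for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical mapping from source database types to Hive types.
# A real mapping would also need to handle precision, scale and lengths.
SOURCE_TO_HIVE: Dict[str, str] = {
    "VARCHAR": "STRING",
    "CHAR": "STRING",
    "INTEGER": "INT",
    "BIGINT": "BIGINT",
    "DOUBLE": "DOUBLE",
    "DATE": "DATE",
    "TIMESTAMP": "TIMESTAMP",
}


def hive_ddl(database: str, table: str,
             columns: List[Tuple[str, str]], location: str) -> str:
    """Build a Hive external-table DDL from (column name, source type) pairs.

    Unknown source types fall back to STRING (an assumption of this sketch).
    """
    cols = ",\n".join(
        f"  `{name}` {SOURCE_TO_HIVE.get(src_type.upper(), 'STRING')}"
        for name, src_type in columns
    )
    return (
        f"CREATE EXTERNAL TABLE IF NOT EXISTS `{database}`.`{table}` (\n"
        f"{cols}\n"
        f")\n"
        f"STORED AS ORC\n"
        f"LOCATION '{location}'"
    )
```

For example, `hive_ddl("discovery", "orders", [("id", "bigint"), ("note", "varchar")], "/data/discovery/orders")` would yield a statement that can be reviewed and then run in Hive. An external table is used here so that dropping the table leaves the underlying HDFS files in place.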

Benefits

The platform meets the data storage and compute needs of diverse workloads and data formats. It does the heavy lifting of organizing and transforming data for delivery to applications and data marts.

Key benefits

Store petabytes of data in low-cost storage
Use computational power of the cluster to process data in bulk
Handle a variety of data formats and store the data in a format suitable for aggregations

The Invariant platform can help you manage your big data and fast data. Speak to our associates to find out more.
