Today’s data-driven companies require timely access to critical data. For companies operating in dynamic data environments, there is a tension between ensuring the operational performance of data analytics systems and satisfying the needs of data consumers for timely reports. Data warehouses are one solution, but they require ETL (extract, transform, load) workloads to ingest data, introducing additional operational costs.
This problem prompted product engineers of Intertrust Platform, the edge-to-cloud solution for zero trust architectures, to develop materialized datasets. With this new feature, system administrators and data stewards can take advantage of the flexibility and operational cost savings associated with data virtualization, while reducing the query load on production databases that are used to generate data-driven analytics and reports. The end result is that reports needed to support critical business transactions can be generated in a timely manner without overloading systems with computationally expensive analytical queries generated by data scientists and business analysts.
Data virtualization technologies can make it simpler for system administrators to run analytical queries on their production databases. However, these queries can increase the load on those databases, making system administrators reluctant to adopt data virtualization. This creates data friction for analysts, who must wait for data engineers to implement workloads or ETL pipelines that populate a data warehouse before they can get the latest data. Materialized datasets not only reduce this data friction but also give data stewards the ability to optimize queries on their own without having to rely on system administrators, further streamlining the process for generating reports and analytics.
With Intertrust Platform’s materialized dataset feature, data from connected data sources can be physically cached within the Platform. Leveraging the Platform’s SQL-driven dataset definitions, data managers can compose datasets tailored to the needs of their end users, including joining multiple data sources located in different physical infrastructures. The new self-service UI included with the feature allows anyone who understands SQL to perform these tasks. All queries against materialized datasets are processed within the Platform, eliminating analytical overhead – and resulting performance hits – on the underlying databases. Datasets may be cached once, or set to refresh on a recurring basis.
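The caching pattern described above can be illustrated with a minimal sketch. This is not Intertrust Platform's API: it uses Python's built-in sqlite3 module as a stand-in, with two attached in-memory databases playing the role of separate production sources. The table names, the `refresh_materialized` helper, and the sample rows are all hypothetical. The idea it shows is that the SQL dataset definition runs against the sources only at refresh time, while report queries read the cached copy.

```python
import sqlite3

# Assumption: the main database stands in for the Platform-side cache, and the
# attached databases "hr" and "ops" stand in for separate production sources.
conn = sqlite3.connect(":memory:")
conn.execute("ATTACH DATABASE ':memory:' AS hr")
conn.execute("ATTACH DATABASE ':memory:' AS ops")

conn.execute("CREATE TABLE hr.employees (id INTEGER PRIMARY KEY, name TEXT, unit TEXT)")
conn.execute("CREATE TABLE ops.shifts (employee_id INTEGER, hours REAL)")
conn.executemany("INSERT INTO hr.employees VALUES (?, ?, ?)",
                 [(1, "Aiko", "Grid"), (2, "Ben", "Retail")])
conn.executemany("INSERT INTO ops.shifts VALUES (?, ?)",
                 [(1, 8.0), (1, 4.0), (2, 6.0)])
conn.commit()

# Hypothetical SQL dataset definition joining the two sources.
DATASET_SQL = """
    SELECT e.unit AS unit, e.name AS name, SUM(s.hours) AS total_hours
    FROM hr.employees AS e
    JOIN ops.shifts AS s ON s.employee_id = e.id
    GROUP BY e.unit, e.name
"""

def refresh_materialized(conn, name, definition_sql):
    """Rebuild the cached table by running the definition against the sources once."""
    with conn:
        conn.execute(f"DROP TABLE IF EXISTS main.{name}")
        conn.execute(f"CREATE TABLE main.{name} AS {definition_sql}")

def report(conn):
    """Read only the cached table; the source tables are not touched."""
    return conn.execute(
        "SELECT unit, name, total_hours FROM main.personnel_report ORDER BY name"
    ).fetchall()

refresh_materialized(conn, "personnel_report", DATASET_SQL)
print(report(conn))
```

In this sketch a recurring refresh would amount to calling `refresh_materialized` on a schedule, for example nightly. Between refreshes, reports reflect the data as of the last refresh, which is the trade-off accepted in exchange for keeping analytical load off the source databases.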
“Data virtualization technology offers an agile approach to data operations. However, market adoption of this technology has been stymied because of the performance impacts caused by the increased load on connected databases,” said Chris Kalima, Vice President of Product Management at Intertrust. “The materialized datasets feature addresses these concerns and eliminates the time and cost associated with developing ETL pipelines, reducing friction in the process of delivering valuable data.”
Intertrust Platform and the materialized datasets feature are currently used by a major Asian energy company to generate daily personnel reports from multiple business units and multiple separate databases. The company used Intertrust Platform to create a materialized dataset by virtually joining the necessary data and refreshing it daily outside business hours. As a result, the time to generate a report fell from 30 minutes to 3 seconds, a 600-fold reduction.
The materialized datasets feature is now available.