Data governance is the framework that guides an organization’s approach to collecting, storing, processing, and securing its data. Data governance protocols allow companies to better adhere to business and regulatory rules, protect their data, and enable agile data operations that deliver greater business benefits. However, approaches to data governance can vary considerably.
In this blog post, we’ll look at the two most commonly applied approaches: passive (or traditional) data governance and active data governance. In short, here’s what they mean:
- Passive data governance: With this approach, users enter data first, and business and governance rules are applied afterwards. This includes cleaning operations, identifying and removing duplicates, and creating exceptions.
- Active data governance: Here the goal is to assess and verify data quality before data is entered, reducing the strain on resources later in the lifecycle. This approach often uses AI or machine learning to improve data cataloging, and it takes a more proactive stance, assessing data quality at the point of collection rather than simply seeking to ingest as much data as possible.
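To make the contrast concrete, here is a minimal Python sketch of the two ingestion styles. The rule names, record fields, and functions (`RULES`, `passive_ingest`, `active_ingest`) are hypothetical and chosen only for illustration; real platforms use far richer rule engines. The passive path stores every record and filters afterwards, while the active path checks each record as it arrives and returns rejected records with reasons so the problem can be fixed at the source.

```python
# Hypothetical quality rules: rule name -> check that returns True if the record passes.
RULES = {
    "email_present": lambda r: bool(r.get("email")),
    "age_in_range": lambda r: isinstance(r.get("age"), int) and 0 <= r["age"] <= 130,
}


def violations(record):
    """Return the names of the rules a record fails."""
    return [name for name, check in RULES.items() if not check(record)]


def passive_ingest(records):
    """Passive: store everything first, then clean and deduplicate afterwards."""
    stored = list(records)  # everything lands in the system as-is
    cleaned, seen = [], set()
    for r in stored:  # cleanup runs later, over data that is already stored
        if violations(r) or r.get("email") in seen:
            continue
        seen.add(r["email"])
        cleaned.append(r)
    return stored, cleaned


def active_ingest(records):
    """Active: check each record at the point of entry and reject it with reasons."""
    accepted, rejected, seen = [], [], set()
    for r in records:
        problems = violations(r)
        if r.get("email") in seen:
            problems.append("duplicate")
        if problems:
            rejected.append((r, problems))  # reasons can be sent back to the data's source
        else:
            seen.add(r["email"])
            accepted.append(r)
    return accepted, rejected


batch = [
    {"email": "a@example.com", "age": 34},
    {"email": "", "age": 20},
    {"email": "a@example.com", "age": 34},
    {"email": "b@example.com", "age": 200},
]
stored, cleaned = passive_ingest(batch)
accepted, rejected = active_ingest(batch)
print(len(stored), len(cleaned))  # 4 1
print([problems for _, problems in rejected])
# [['email_present'], ['duplicate'], ['age_in_range']]
```

Both paths end up with the same single clean record here. The difference is that the passive path has already stored all four records and must clean them later, whereas the active path never admits the bad ones and produces feedback that can be routed back to whoever supplied the data.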
The approach an organization chooses can significantly affect its success in achieving its data governance goals, because the differences between the two determine how governance is enforced at critical stages of the data lifecycle.
Differences between passive and active data governance
The differences between passive and active data governance frameworks center on whether governance is applied retroactively to existing data or proactively throughout the data lifecycle. Neither approach is necessarily right or wrong, but active data governance is geared towards greater agility in DataOps. Below are the major differences between the two approaches.
Data collection
With traditional data governance, organizations often take a quantity-over-quality approach, presuming that issues with the data will be addressed later. Active data governance seeks to assess data before it enters the system through a variety of measures. These include working more closely with users to define proper data collection practices and deploying automated systems that identify data quality issues recurring at the point of collection.
Data rules and dictionaries
Data governance relies on predefined rules being applied consistently across all data operations. With passive data governance, this takes the form of manually updated data dictionaries, terminology glossaries, and data catalogs. Active data governance applies the same kinds of rules but integrates technologies that automatically build and optimize rule repositories, which helps ensure the rules are applied consistently across all of the organization’s data.
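As an illustration of a shared rule repository, the sketch below infers simple column rules from a trusted sample and then applies the same rules to rows from any source. The class and method names are hypothetical, and the inference is a naive profiling heuristic (required columns, observed numeric ranges, small allowed-value sets) rather than the AI- or ML-based tooling mentioned earlier.

```python
from collections import defaultdict


class RuleRepository:
    """Hypothetical shared store of column rules, used by every pipeline."""

    def __init__(self):
        # column name -> {rule name: parameter}
        self.rules = defaultdict(dict)

    def learn_from(self, rows, max_categories=5):
        """Naively infer rules from a trusted sample of rows (dicts)."""
        # Preserve first-seen column order so results are deterministic.
        columns = list(dict.fromkeys(col for row in rows for col in row))
        for col in columns:
            values = [row.get(col) for row in rows]
            present = [v for v in values if v is not None]
            if len(present) == len(values):
                self.rules[col]["required"] = True
            if present and all(isinstance(v, (int, float)) for v in present):
                self.rules[col]["range"] = (min(present), max(present))
            elif 0 < len(set(present)) <= max_categories:
                self.rules[col]["allowed"] = set(present)

    def check(self, row):
        """Apply every stored rule to one row and return a list of violations."""
        problems = []
        for col, rules in self.rules.items():
            value = row.get(col)
            if value is None:
                if rules.get("required"):
                    problems.append(f"{col}: missing")
                continue
            if "range" in rules:
                low, high = rules["range"]
                if not isinstance(value, (int, float)) or not low <= value <= high:
                    problems.append(f"{col}: {value!r} outside [{low}, {high}]")
            if "allowed" in rules and value not in rules["allowed"]:
                problems.append(f"{col}: unexpected value {value!r}")
        return problems


repo = RuleRepository()
repo.learn_from([
    {"country": "DE", "amount": 10},
    {"country": "FR", "amount": 250},
])

# The same repository validates rows coming from two different source systems.
crm_issues = repo.check({"country": "DE", "amount": 40})
erp_issues = repo.check({"country": "XX", "amount": 900})
print(crm_issues)  # []
print(erp_issues)
# ["country: unexpected value 'XX'", 'amount: 900 outside [10, 250]']
```

Because every pipeline calls the same `check` method, a rule changed in the repository takes effect everywhere at once, which is the kind of consistency active governance aims for.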
Data quality
Passive data governance seeks to identify and correct data quality problems already in the system. While this is worthwhile, it is reactive rather than proactive. An active data governance framework seeks to understand how a data quality issue arose in the first place, tracing it back from where it was found to where it originated so that the issue is not replicated again and again. This takes more time than simply fixing the immediate errors, but it delivers consistent gains over time.
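Tracing an issue back to its origin depends on knowing each dataset’s lineage. The following sketch assumes a hypothetical lineage map (each dataset listing the datasets it was derived from), hypothetical sample data, and a hypothetical check that detects the issue, here a missing currency value. It walks upstream from where the issue was found and reports the datasets where the issue first appears: those that show the problem while none of their parents do.

```python
# Hypothetical lineage map: dataset -> datasets it was derived from.
LINEAGE = {
    "sales_dashboard": ["sales_clean"],
    "sales_clean": ["sales_raw_eu", "sales_raw_us"],
    "sales_raw_eu": [],
    "sales_raw_us": [],
}

# Hypothetical sample rows per dataset, used to re-run the failing check.
DATA = {
    "sales_raw_eu": [{"amount": 10, "currency": "EUR"}],
    "sales_raw_us": [{"amount": 12, "currency": None}],
    "sales_clean": [{"amount": 10, "currency": "EUR"}, {"amount": 12, "currency": None}],
    "sales_dashboard": [{"amount": 22, "currency": None}],
}


def missing_currency(dataset):
    """The data quality check that flagged the issue."""
    return any(row.get("currency") is None for row in DATA[dataset])


def find_origins(dataset, has_issue, lineage):
    """Walk upstream from where an issue was found; return the datasets where it
    first appears, i.e. datasets that have the issue while none of their parents do."""
    origins, stack, seen = [], [dataset], set()
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        bad_parents = [p for p in lineage.get(node, []) if has_issue(p)]
        if bad_parents:
            stack.extend(bad_parents)  # the problem was inherited from upstream
        else:
            origins.append(node)  # introduced here, or this is a source system
    return origins


print(find_origins("sales_dashboard", missing_currency, LINEAGE))  # ['sales_raw_us']
```

If `sales_clean` showed the problem but neither raw feed did, the function would return `sales_clean`, pointing at the transformation step rather than at a source system.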
Data storage
Modern data storage often consists of hybrid (cloud and on-premises) or multi-cloud setups. Passive data governance applies the relevant governance rules within each of these silos, so data quality may still be high, but at the cost of extra work and data duplication. Active data governance instead aims to give administrators a comprehensive view across their entire fragmented ecosystem. Only with complete end-to-end visibility can governance protocols be applied uniformly.
Data auditing and tracking
Silos and a lack of uniform, effective data cataloging prevent the formation of complete data lineages. This can have a major impact on operations such as composing datasets for analysis and data auditing, an essential part of keeping data safe and ensuring regulatory compliance.
Under a passive data governance model, the lack of earlier efforts to prevent duplication or poor cataloging means that auditing costs more time and resources. Active data governance attempts to create a holistic plan for the entire data lifecycle by incorporating automated data tracking and lineage tools, along with more effective cataloging processes. As a result, locating and auditing data is already supported by the planning and ongoing work built into the system.
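Automated lineage capture can be as simple as recording an entry every time a pipeline step runs. The sketch below is a minimal, in-memory illustration under stated assumptions: the `tracked` decorator, `AUDIT_LOG`, the step functions, and the dataset names are all hypothetical, and input and output names are declared by hand rather than discovered automatically. An auditor can then ask which upstream datasets fed a given output.

```python
import functools
from datetime import datetime, timezone

# Hypothetical in-memory audit log; a real system would persist these entries.
AUDIT_LOG = []


def tracked(output_name, *input_names):
    """Decorator that appends a lineage/audit entry every time a step runs."""
    def decorate(step):
        @functools.wraps(step)
        def wrapper(*args, **kwargs):
            result = step(*args, **kwargs)
            AUDIT_LOG.append({
                "step": step.__name__,
                "inputs": list(input_names),
                "output": output_name,
                "rows_out": len(result),
                "at": datetime.now(timezone.utc).isoformat(),
            })
            return result
        return wrapper
    return decorate


@tracked("sales_clean", "sales_raw_eu", "sales_raw_us")
def merge_sales(eu_rows, us_rows):
    return eu_rows + us_rows


@tracked("sales_dashboard", "sales_clean")
def summarize(rows):
    return [{"total": sum(row["amount"] for row in rows)}]


def upstream_of(dataset):
    """Audit query: every recorded dataset that fed `dataset`, directly or indirectly."""
    sources, frontier = set(), [dataset]
    while frontier:
        current = frontier.pop()
        for entry in AUDIT_LOG:
            if entry["output"] == current:
                for name in entry["inputs"]:
                    if name not in sources:
                        sources.add(name)
                        frontier.append(name)
    return sources


report = summarize(merge_sales([{"amount": 10}], [{"amount": 12}]))
print(sorted(upstream_of("sales_dashboard")))
# ['sales_clean', 'sales_raw_eu', 'sales_raw_us']
```

Because lineage is recorded as a side effect of running each step, the audit trail stays current without a separate manual cataloging pass.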
Active data governance is agile
Data governance frameworks are critical for enabling enterprises to maintain consistency in data operations, ensuring regulatory compliance and business returns. There are, however, two different approaches to data governance: passive and active. Passive data governance takes a retroactive, reactive approach, preferring to ingest all possible data and then apply business rules once it is held. Active data governance, on the other hand, seeks to ensure data quality and effective cataloging from the very beginning of the data lifecycle.
Active data governance aims to create agile data operations and give administrators end-to-end data visibility. One tool that can enable this is a virtualized data platform. By creating a virtualized data layer over all of an organization’s data assets, such a platform allows governance protocols to be applied proactively to all data, wherever it resides. Integrated automated tools also allow for uniform cataloging and application of business rules while reducing the strain on DataOps resources.
About Prateek Panda
Prateek Panda is Director of Marketing at Intertrust Technologies and leads global marketing for Intertrust’s device identity solutions. His expertise in product marketing and product management stems from his experience as the founder of a cybersecurity company with products in the mobile application security space.