Halfway to Data Access Governance

If you’re a Collibra customer, chances are you’ve already made significant progress in establishing an expansive data access governance program, even if that wasn’t your original intent.

The journey has been marked by milestones that include gaining a landscape view of where your data assets reside and ensuring that data stewards properly manage them.

The three components of a scalable data access governance foundation

The foundation required for scalable data access governance consists of three components: data classification, data catalog, and data governance. 

Data classification is an integral part of the data discovery process. It helps you uncover and then categorize the sensitive data in your ecosystem. 

Data catalog, on the other hand, creates a unified view of the sensitive data, cutting across data silos. This ensures that you are protecting data on an organization-wide scale. 

And data governance helps teams understand the context of how data should be used, and therefore, protected. It also helps you understand the stakeholders that should be engaged when creating policies.

What are the challenges of building this foundation?

So you’ve decided that it’s time to broaden access to data while keeping it protected. Before you can even get to policy creation and maintenance, have you worked through the preparation needed to deploy a scalable data access governance program? Here are three key challenges in the foundational phase that organizations need to think about:

  • Uncertainty on where to start: organizations typically want to apply protection to everything. There is a lack of prioritization on what data needs to be protected and likely a misalignment between security and business objectives in handling data. When a blanket policy is applied to shield data from all potential harm and exposure, we also limit its usability and value to the business.
  • Unsure of the scope of data: data leaders will likely have a sense of where some of the data resides. But before kicking off a large-scale project to govern access to the organization’s data, we need to ask ourselves if we know the volume and scope of all the data under our purview. 
  • Unclear who should be responsible: there is no one formula to ensure that the best policies are put into place. Policy decisions should be a collaborative process between policy administrators who understand the data standards to uphold and the domain owners who best understand their data. Policies determined without this collaborative process are likely to be rigid and prevent the business from realizing the full value of their data. 

Data classification for data access governance

What is automatic data classification?

Automatic data classification is an ML-driven process that analyzes and labels data based on a sample of the data itself, helping users understand what kinds of data they have and the risks associated with that data. 

Data classification automatically assigns “data classes” or labels to individual columns of data to identify what kind of data is contained in that column. These labels can be “name”, “address”, and “email” for example.
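To make the idea concrete, here is a minimal sketch of sample-based column classification. It uses hand-written regular-expression rules purely for illustration; a real classification engine would rely on trained ML models, and the rule set, labels, and threshold below are our own assumptions, not any particular product’s behavior.

```python
import re

# Illustrative rules only: a production classifier would use ML models
# trained over many data classes, not hand-written regular expressions.
RULES = {
    "email": re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$"),
    "phone": re.compile(r"^\+?[\d\s\-()]{7,15}$"),
}

def classify_column(sample_values, threshold=0.8):
    """Assign a data class to a column based on a sample of its values."""
    for label, pattern in RULES.items():
        matches = sum(1 for v in sample_values if pattern.match(str(v)))
        if sample_values and matches / len(sample_values) >= threshold:
            return label
    return "unclassified"

print(classify_column(["ana@example.com", "bo@example.org", "cy@example.net"]))
# prints "email"
```

Because classification works from a sample of the values rather than the column name, both an “email_address” column and a cryptically named “C_email” column would receive the same label.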

What classification outputs are useful for data access governance implementation?

The data classification engine can yield different types of outputs. To assist in policy creation, the following outputs are the most useful to generate and utilize.

Classification labels

The label identifies the contents of the column and assigns it to a pre-defined class. Labels are sometimes referred to as categories or simply classes.

  • Semantic label: a generic descriptor for the column header, using an assigned value chosen from a library of approved data classes. In this case, both “email_address” and “C_email” column headers will be assigned the label of “email.” 
  • Sensitivity label: a label that indicates the level of sensitivity of contents in the data column. Examples of sensitivity labels include “private”, “public”, and “restricted”. 
  • Custom labels: you can choose to create custom labels to deliver more granular control over data. For example, regulatory labels can be assigned to data columns, indicating whether the contents fall under the scope of “GDPR” or should be considered “PI”. 
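One way to picture how these three label types attach to a single column is a small record that combines them. The field names below are illustrative choices of our own, not a schema from any product:

```python
from dataclasses import dataclass, field

@dataclass
class ColumnLabels:
    """Illustrative record of the label types a classifier can emit."""
    column: str
    semantic: str        # e.g. "email" for both "email_address" and "C_email"
    sensitivity: str     # e.g. "private", "public", or "restricted"
    custom: list = field(default_factory=list)  # e.g. ["GDPR", "PI"]

labels = ColumnLabels(
    column="C_email",
    semantic="email",
    sensitivity="restricted",
    custom=["GDPR", "PI"],
)
print(labels.semantic, labels.sensitivity)  # prints "email restricted"
```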

Classification context 

Context provides additional information about the columns that may not be captured under the label. 

  • Line of business context: this information indicates which department owns the data. Examples include finance, sales, and retail operations.
  • Entity context: this information describes the data subject category that the data is about such as customer, employee, or third-party vendor. 

Classification hierarchy 

Hierarchy describes the structure of the data and how different columns of data are linked under that structure. To keep this idea simple, we’ve introduced two layers in this hierarchy: parent and child. 

  • Parent layer: this classifies a column into the top layer, using an assigned value that is shared with other, related columns. Examples include biometric data, financial data, and payment card information.
  • Child layer: this classifies a column into its most granular layer, using an assigned value that reflects the individual contents of that column. “Credit card number,” “cardholder name,” and “security code” all roll into the top layer of “payment card information.”
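The parent/child relationship can be sketched as a simple lookup from child-layer labels to their parent layer. The mapping below is a toy example built from the classes named above:

```python
# Illustrative hierarchy: child-layer labels roll up to a parent layer.
PARENT_OF = {
    "credit card number": "payment card information",
    "cardholder name": "payment card information",
    "security code": "payment card information",
    "fingerprint template": "biometric data",
}

def parent_class(child_label):
    """Return the parent layer for a child-layer label, if one is defined."""
    return PARENT_OF.get(child_label, child_label)

print(parent_class("security code"))  # prints "payment card information"
```

A policy written once against the parent layer (“payment card information”) then automatically covers every child column that rolls up into it.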

Why is automatic data classification important?

  • Understand content of your data columns: you understand what data you have and where it’s located, which provides you with guidance on where to direct your policies.
  • Understand the risks of the data you possess: the risk associated with the data informs how it should be treated; for example, you may decide to simply gate access to the data or apply masking to the data column.
  • Automate and scale protection of data: in addition to protecting existing data, all incoming data should be automatically classified. The classification will apply to the new data, ensuring that it is governed under an existing policy without extra work on your end to create and update policies for incoming data. 
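The “classify once, protect automatically” idea can be sketched as a mapping from sensitivity labels to protective actions, so that any newly classified column inherits an existing policy. The label names and actions below are assumptions for illustration; in practice these policies would live in a governance platform:

```python
# Illustrative mapping from sensitivity label to protective action.
POLICY_ACTIONS = {
    "restricted": "mask",      # e.g. show only the last four characters
    "private": "gate_access",  # require an approved access request
    "public": "allow",
}

def action_for(column_labels):
    """Pick the action for a newly classified column; default to the safest."""
    return POLICY_ACTIONS.get(column_labels.get("sensitivity"), "gate_access")

# A new column arrives, is auto-classified, and inherits an existing policy
# without anyone writing a new rule for it.
new_column = {"name": "cust_email", "semantic": "email", "sensitivity": "restricted"}
print(action_for(new_column))  # prints "mask"
```

Note the default: a column the classifier cannot place falls back to the most restrictive treatment rather than being left open.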

Data catalog for data access governance

What is a data catalog?

A data catalog inventories and organizes all of a company’s data assets. It uses metadata to help data teams discover, understand, and manage their data across their ecosystem.

Why is a data catalog important?

A data catalog helps data leaders easily find, understand and trust their data. It provides a comprehensive view across the entire data landscape, as well as accelerates time to implement data access governance. 

  • Comprehensive view across the entire data landscape: a data catalog allows data leaders to view data across their entire data landscape. This enables data leaders to know where their sensitive data is stored so that they can focus on protecting that data. You can only protect the data that you’re aware of. 
  • Accelerate time to implement data access governance: a comprehensive view of the entire data landscape enables data leaders to seamlessly identify sensitive data and quickly deploy policies to protect this data. 
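A minimal sketch of why this cross-silo view matters: with catalog metadata in one place, finding every sensitive column across all sources becomes a single query. The catalog entries below are invented for illustration:

```python
# Illustrative catalog entries: column metadata gathered across data silos.
CATALOG = [
    {"source": "warehouse", "table": "customers", "column": "email",
     "sensitivity": "restricted"},
    {"source": "crm", "table": "leads", "column": "region",
     "sensitivity": "public"},
    {"source": "hr_db", "table": "employees", "column": "ssn",
     "sensitivity": "restricted"},
]

def sensitive_columns(catalog, level="restricted"):
    """Find every column at a given sensitivity level, across all sources."""
    return [entry for entry in catalog if entry["sensitivity"] == level]

for entry in sensitive_columns(CATALOG):
    print(entry["source"], entry["table"], entry["column"])
```

Without the unified catalog, the same question would require querying each silo separately, and any column you missed would go unprotected.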

Data governance for data access governance

What is data governance?

Data governance is the process of managing data so that data can be used as a consistent, trusted and secure asset that meets organizational policies. Data teams should have a single platform to govern data, bridging departmental silos to provide consistent understanding and proper handling of data. 

Why is data governance important?

Data governance is important because it ensures trust across all data assets. It provides context to policies, helps data leaders identify the owners and relevant stakeholders of data, and defines data definitions. 

  • Provides context to policies: document and inform others on how data should be used, what groups or third parties should get access to it, and what retention policies apply to what data.
  • Identifies owners and relevant stakeholders of data: designate the owner, data steward, subject matter expert, or other stakeholders of data assets. With a federated governance model, you’ll have policy administrators who are in charge of managing the overarching policies. You’ll also have domain leaders in different departments such as marketing, sales, and finance who are responsible for and know how best to extract value from their data. 
  • Defines data definitions: what is sensitive data? What is financial data? Being able to define and agree on what constitutes certain classes of data helps ensure that organizations can apply appropriate and consistent controls to data across the organization. Even as new classes of data are created or new methods of interpreting data are adopted, the protection and utility of data can be maintained.
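These agreed definitions and ownership assignments can be pictured as a small shared registry that tells policy administrators who to engage for each class of data. The entries below are hypothetical:

```python
# Illustrative registry of agreed data-class definitions and their stewards.
DATA_CLASS_DEFINITIONS = {
    "financial data": {
        "definition": "Records of monetary transactions and account balances.",
        "owner": "finance",
    },
    "sensitive data": {
        "definition": "Data whose exposure could harm individuals or the business.",
        "owner": "security",
    },
}

def steward_for(data_class):
    """Look up who to engage when a policy touches this class of data."""
    entry = DATA_CLASS_DEFINITIONS.get(data_class)
    return entry["owner"] if entry else None

print(steward_for("financial data"))  # prints "finance"
```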

Conclusion

In order to ensure scalable data access governance, you must lay the groundwork by implementing data classification, data catalog and data governance across your entire organization. 

Data classification, with data discovery, helps us uncover, then categorize, the sensitive data in the data ecosystem. It supplies the context to help us create more nuanced policies that enable quick and compliant access to data.

In addition to data classification, a data catalog creates a unified view of the sensitive data, cutting across data silos, so we’re protecting data on an organization-wide scale. It also helps us know what to prioritize: we know how many tables there are and where they are located, which means we can deploy policies to address them. 

And data governance helps us understand who are the stewards and domain owners of data. It tells us who to engage when creating policies that impact data of specific teams. 

Classification, catalog, and governance combined become the groundwork for data access governance implementation. Many of our customers have already laid that groundwork and now have the knowledge to build smarter, more effective policies. 
