Speakers

  • Aaren Stubberfield

    Manager of Data Governance and AI at Ingredion Inc


How Data Governance Enables Scalable Data Science

November 2021


In data-literate organizations, everyone has the access and the skills they need to work with data and make better decisions. No one department hoards the data; data is democratized. With data democratization, all parts of a business, from the CEO to analysts, are equipped to make data-driven decisions. To achieve data democratization, data quality and governance are crucial. Data governance can give users clarity about the data and help minimize the business risk posed by data privacy laws. Data users can be confident that the data they use to make decisions is what they expect. In this webinar, Aaren Stubberfield will outline the challenges and opportunities of data governance, along with the best practices organizations should adopt when starting their data governance journeys.

Key Takeaways

  • Why a data governance culture enables data democratization and strengthens organization-wide trust in data

  • The challenges and benefits of scaling data governance throughout the organization

  • Best practices when operationalizing data governance programs

Webinar Transcripts

The rise of data democratization motivates the need for data governance

So, let's start with the motivation for this presentation and why data governance is essential. As many are aware, data-driven organizations have been a staple of the technology industry. Think of the Amazons, Netflixes, Ubers, and Googles of the world. These organizations were born digital, with data as a top-of-mind priority for their leadership and people alike. For these companies, data was critical to decision-making. For example, Netflix has a whole research division focused on providing decision-makers with useful analytics and metrics. Netflix decides, in part, which movies to move forward with and which projects to pursue based on user data.

Now, as digital services and products become the de facto way we interact with most services, we're moving toward a time when every organization needs to become data-driven. This transition is necessary for companies to stay competitive and achieve their goals. This applies to financial, health care, and government institutions, and to almost every other industry. There is a wide-ranging effort across all industries to become more data-driven, so we've seen many organizations invest in data teams and data science talent to meet this need. However, the real challenge in democratizing data science is democratizing the data itself.

So let's break down data democratization and what it means. In general, members of an organization make hundreds of little and big decisions for that organization every day, and it is the culmination of these decisions that drives the organization forward. When data science is democratized, all of these decisions are data-driven, and organizations are able to move faster and be more agile as a result. Ultimately, data democratization means empowering people with the proper tools and culture to make decisions based on data. The key is that the data needs to be accessible and of high quality. In summary, data democratization means everyone knows how to access and work with the data they need for their job. Information is not hoarded or owned by one department or team.

Let's back this up with some numbers and discuss the benefits of data democratization. A 2019 Deloitte survey looked at what it takes to become a data-driven organization. Organizations reporting the strongest cultural orientation toward data-driven insights and decision-making were twice as likely to report exceeding their business goals in the past 12 months. Of these businesses, 48% say they outperform their targets, versus just 22% of those with a more diluted analytics culture. These organizations have invested heavily in making high-quality data accessible to all of their employees and in giving them the skills to make data-driven decisions. A McKinsey white paper tells a similar story in its analysis of organizations with mature data capabilities. One of its findings is that these high-performing organizations are 65% more likely to have provided their frontline employees with access to high-quality data.

Ultimately, when it comes to data democratization, the north star for any organization could be something like Airbnb. To support its data-driven culture, Airbnb has focused on three areas:

  • Data education, through a Data University.

  • Data tools for handling data, such as Airflow.

  • Data access, by creating a single source of truth within its data ecosystem and making it easier for employees to reach that data.

Airbnb has the data tools, organizational culture, and processes in place to democratize data throughout the organization. That scales the opportunity to use data science across the organization rather than limiting it to one department or team.

So what are the key components of data democratization? They are tools, organization, and process, and these are supported by infrastructure and people. You need to ensure that data is collected, discoverable, reliable, actionable, and compliant with all current laws and regulations. Coupled with that, organizations need to empower people to build a data culture, where everyone understands the value of data, and ensure that people have the training to work with that data and do their best work.

In our talk today, I want to focus on the intersection of people, infrastructure, and data governance, and how they contribute to democratizing data in an organization. Data governance plays a role in scaling data science because data governance and data quality enable organization-wide trust in data, and that trust is crucial for democratizing data science. It's pretty simple: business users will be hesitant to use data for decision-making if they lack trust that the data will be consistent and of high quality.

Data governance requires planning, monitoring, and enforcement

Let's now focus a little more on data governance. So, what is data governance? DAMA, the Data Management Association, defines data governance as the exercise of authority and control, including planning, monitoring, and enforcement, over the management of data assets. I'll pause briefly to say that throughout this presentation, I draw on DAMA's Data Management Body of Knowledge (DAMA-DMBOK). It's a really useful book with a lot of practical information. Back to the definition: looking more closely at each of its parts, we find:

  • Planning

  • Monitoring

  • Enforcement

Planning refers to defining the rules for a dataset. For example, what is the format of each data field? How many characters can a field hold? What encoding will you use for your string characters? In summary, planning establishes the key business rules that you want your datasets to conform to.

Monitoring is about measuring compliance with the business rules set during the planning phase. For example, if one rule states that a field should not have any blank values, monitoring includes measuring what percentage of records have blank values. It also includes ensuring that the organization follows the rules set during the planning stage.

Then we come to enforcement, which focuses on what to do when the rules are not followed. Levels of remediation range from a record that needs to be corrected right away to a record that can be corrected later. You might even decide, “I'm going to make a suggestion that we change this record at some point in the future.”

Planning

Let's walk through one more example. Imagine we are a B2B company whose customers are other companies, and we want to develop a dataset of our customers. In the planning stage, we decide on the table metadata that is important for the business, including which fields can be null and which cannot. For the field CompanyIndustry, since its values are predefined categories, the planning stage also decides which categories are allowed. In this example, I'm showing SQL code that you might use to create a table, but the same concepts apply if you already have a dataset.
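Here is a sketch of how such planning-stage rules might look as SQL table constraints. It uses Python's built-in `sqlite3` module with an in-memory database:

  • `NOT NULL` marks required fields.

  • A `CHECK ... IN (...)` list restricts CompanyIndustry to the agreed categories.

  • A length check caps the company name.

The table name, the `CustomerID` and `CompanyName` columns, the category list, and the 100-character limit are hypothetical. Only CompanyIndustry and DateFounded come from the talk.

```python
import sqlite3

# Hypothetical category list agreed during planning. "Unknown" is allowed so
# that gaps stay visible to monitoring instead of being blocked at entry.
ALLOWED_INDUSTRIES = ("Food", "Pharma", "Retail", "Unknown")


def create_customer_table(conn: sqlite3.Connection) -> None:
    """Create a Customer table whose constraints encode the planning-stage rules."""
    categories = ", ".join(f"'{c}'" for c in ALLOWED_INDUSTRIES)
    conn.execute(f"""
        CREATE TABLE Customer (
            CustomerID      INTEGER PRIMARY KEY,
            CompanyName     TEXT NOT NULL CHECK (length(CompanyName) <= 100),
            CompanyIndustry TEXT NOT NULL CHECK (CompanyIndustry IN ({categories})),
            DateFounded     TEXT  -- nullable: not treated as mission-critical
        )
    """)


conn = sqlite3.connect(":memory:")
create_customer_table(conn)
conn.execute("INSERT INTO Customer VALUES (1, 'Acme Corp', 'Retail', '1990-05-01')")
conn.execute("INSERT INTO Customer VALUES (2, 'Initech', 'Food', NULL)")  # DateFounded may be null

try:
    # Rejected by the CHECK constraint: 'Aerospace' is not an allowed category.
    conn.execute("INSERT INTO Customer VALUES (3, 'Globex', 'Aerospace', NULL)")
except sqlite3.IntegrityError as err:
    print("Rejected:", err)
```

Putting the rules in the schema means the database rejects non-conforming rows at write time. The "Unknown" category is a deliberate escape hatch that monitoring can then measure.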

Monitoring

Monitoring, again, is about measuring against the rules we set for our dataset. In our example, maybe we decide that it is crucial not to have any companies with an unknown value in the CompanyIndustry field. Monitoring would then include measuring the percentage of records with an unknown CompanyIndustry.
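Here is a minimal monitoring sketch. It assumes the customer records are available as a list of Python dictionaries and computes the percentage of records whose CompanyIndustry is missing, blank, or "Unknown". The sample records and the set of values treated as non-conforming are hypothetical.

```python
def percent_nonconforming(records, field, bad_values=(None, "", "Unknown")):
    """Percentage of records whose `field` holds a missing or non-conforming value."""
    if not records:
        return 0.0
    bad = sum(1 for record in records if record.get(field) in bad_values)
    return 100.0 * bad / len(records)


# Hypothetical customer records; record 3 has an unknown industry.
customers = [
    {"CustomerID": 1, "CompanyIndustry": "Retail"},
    {"CustomerID": 2, "CompanyIndustry": "Food"},
    {"CustomerID": 3, "CompanyIndustry": "Unknown"},
    {"CustomerID": 4, "CompanyIndustry": "Pharma"},
]

print(percent_nonconforming(customers, "CompanyIndustry"))  # 25.0
```

Tracked over time against a target (0% in this example), this is the number the enforcement step acts on.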

Enforcement

Finally, for enforcement, the question is: what do we do to remediate a non-conforming record? In the CompanyIndustry example, we may have deemed it mission-critical that no customer has an unknown value. In that case, we might task someone in the business with investigating and updating that record within a day. In the DateFounded field, however, you can see that record two is missing a value. Perhaps this also falls under the rules we set during the planning phase. Here, maybe we agree that the field is not mission-critical and decide that someone will work on it later. Perhaps once a quarter, we work with a team to update the values.
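The two remediation tiers described above can be sketched as rules that carry a priority. In this hypothetical example:

  • A mission-critical field gets a one-day remediation window.

  • A non-critical field gets roughly a quarter, approximated as 90 days.

  • Each non-conforming value becomes a task with a due date.

The `Rule` class, the windows, and the sample records are assumptions for illustration. They are not part of any particular governance tool.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class Rule:
    """A hypothetical enforcement rule for one field."""
    field: str
    mission_critical: bool

    @property
    def remediation_window(self) -> timedelta:
        # Assumed SLAs: fix mission-critical gaps within a day, others within ~a quarter.
        return timedelta(days=1) if self.mission_critical else timedelta(days=90)


RULES = [
    Rule("CompanyIndustry", mission_critical=True),
    Rule("DateFounded", mission_critical=False),
]


def is_nonconforming(value) -> bool:
    """Treat missing, blank, or 'Unknown' values as non-conforming."""
    return value in (None, "", "Unknown")


def remediation_tasks(records, rules, today: date):
    """Return (CustomerID, field, due date) for every non-conforming value."""
    return [
        (record["CustomerID"], rule.field, today + rule.remediation_window)
        for record in records
        for rule in rules
        if is_nonconforming(record.get(rule.field))
    ]


customers = [
    {"CustomerID": 1, "CompanyIndustry": "Retail", "DateFounded": "1990-05-01"},
    {"CustomerID": 2, "CompanyIndustry": "Food", "DateFounded": None},
    {"CustomerID": 3, "CompanyIndustry": "Unknown", "DateFounded": "2001-09-15"},
]

for task in remediation_tasks(customers, RULES, today=date(2021, 11, 1)):
    print(task)
```

Keeping the priority on the rule, rather than on each record, means that changing a field's criticality during planning automatically changes how quickly its gaps must be fixed.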

Data stewards and data technology are essential to data governance

There are many different types of data governance frameworks, but they usually have the following features in common. Data stewards and other important stakeholders develop policies and procedures for a dataset. These policies and procedures influence the business process that generates the data. Often, technology such as a master data management (MDM) tool is used to measure the data's compliance with those policies and procedures.

Data Governance - Data stewards and stakeholders

Data stewards and stakeholders monitor the performance of the overall system and make adjustments as needed. So now let's talk a little more about the data steward and technology aspects of this framework. Data stewards are a very important part of the governance framework. But you may be asking yourself: what qualities make someone a good data steward? I would argue that a good data steward works with the data regularly and is accountable for the data and its quality. You want someone who knows the business in general and knows the business process that generates the data. They also need the skills to communicate about that data clearly. That way, if a change to the policies is suggested, they can investigate its impact and communicate it back to the other data stewards.

A great data steward can influence data quality throughout the organization by using their knowledge of the data, communication skills, and political savvy to influence others. Data governance often focuses on areas where you need to work with others to move the organization forward and persuade them to change their business processes. Having data stewards with that skill set and ability to influence is really key.

Data governance - Technology

Now, let's focus a little more on the technology of data governance: the systems that help data stewards implement policies and procedures and measure performance. This is a non-exhaustive list, but it can include data quality measurement tools, data discovery tools, and an overall data governance platform. At most companies, the information technology group, or IT, typically leads decisions on how this technology is implemented. This highlights an important point: IT must be part of the data governance team. Much of the time, IT may not be aware of, or aligned on, what the data governance team needs. IT can participate on the team as a data steward, helping to lead discussions about technology deployment and, of course, the policies and procedures.

Benefits of data governance

Reducing risk

The different elements of the governance framework (the data stewards, policies, and technology) work together to provide the benefits of data governance. So, what are those benefits? One is reducing risk. Data governance provides the ability to reduce data risks, data security risks, and risks related to adhering to privacy rules and regulations. Governing a dataset requires you to understand the dataset's sensitivity, and from there you can limit access as needed.

Additionally, there are privacy benefits: an organization can develop a process for restricting access to personally identifiable information (PII) as needed. With current regulations such as GDPR and others, organizations need to understand their privacy risks and plan to address them. A data governance program can be used as a tool to help do that.
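To make "limit access as needed" concrete, here is a minimal sketch of sensitivity-based access control. It assumes a hypothetical four-level classification recorded in a data catalog:

  • Access is granted only when a user's clearance meets the dataset's classification.

  • Datasets that haven't been classified yet are denied by default.

The level names and catalog entries are invented for illustration. Real deployments would typically enforce this in the database or data platform rather than in application code.

```python
from enum import IntEnum
from typing import Optional


class Sensitivity(IntEnum):
    """Hypothetical classification levels, ordered from least to most sensitive."""
    PUBLIC = 0
    INTERNAL = 1
    CONFIDENTIAL = 2
    RESTRICTED = 3  # e.g., datasets containing PII


# Hypothetical catalog: dataset name -> classification assigned by governance.
CATALOG = {
    "product_catalog": Sensitivity.PUBLIC,
    "sales_summary": Sensitivity.INTERNAL,
    "customer_contacts": Sensitivity.RESTRICTED,
}


def can_access(user_clearance: Sensitivity, dataset: str) -> bool:
    """Allow access only if the user's clearance meets the dataset's classification.

    Datasets missing from the catalog are denied, because their
    sensitivity has not been assessed yet.
    """
    level: Optional[Sensitivity] = CATALOG.get(dataset)
    if level is None:
        return False
    return user_clearance >= level


print(can_access(Sensitivity.INTERNAL, "sales_summary"))      # True
print(can_access(Sensitivity.INTERNAL, "customer_contacts"))  # False
```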

Scalable data science

Another benefit is that data governance improves the scalability of data science. Data is no longer in silos, and you have essentially created a common data glossary and vocabulary. There's more confidence in data quality because someone is actually looking for non-conforming records. All of this has a positive impact on how the business runs. Users become more focused on data-driven decisions, which improves overall business performance.

Data governance provides clear rules for processing data and a centralized management system for it. Since the data is of high quality and trusted, other teams don't keep their own versions of it, and the same datasets can be reused across different projects. This leads to IT and the business being more aligned and agile, and to lower data storage costs.

Challenges in scaling data governance

Now that we've talked a little about what data governance is and its benefits, we'll discuss some of the challenges and best practices. First, let's remind ourselves of the general framework. The key here is that its different elements work together to support the overall data governance program. Without any one of them, it becomes a challenge to achieve the program's objectives. Here are some challenges that might arise when implementing a data governance framework.

Challenges in data governance: Not an organizational priority

Often, data governance is not a priority. This can be a significant risk to the program's overall success if there isn't buy-in from senior leadership. Here are a couple of stats. Gartner highlights that 42% of senior data analytics leaders did not even monitor data governance. Additionally, 60% of organizations do not plan to publish data provenance or lineage. These stats speak to how difficult it can be to make data governance a priority.

Furthermore, let's just be honest. This work is not sexy or cool; it is often about driving the organization to a consensus on the rules and policy, changes in business process to support the agreed-upon policy, and measuring performance. Also, each of these steps can be a thankless task in itself. 

While I plan to get to best practices shortly, I do want to leave you with some high-level thoughts on how to combat some of these challenges. For this one, show how data governance improves data quality and how that directly affects business processes. Then communicate those impacts to senior leadership consistently and continuously, so they understand and make it a priority. Also, celebrate anyone who greatly helps advance the work of the data governance program. Essentially, you want to shine a light on their efforts to make them feel sexy and cool. You want to make this an enjoyable thing for people to participate in.

Challenges in data governance: Data infrastructure

Another challenge can arise during acquisitions, when two organizations come together and you want to combine their related datasets, for example, a customer dataset or a supplier dataset. These are usually key datasets within each organization. My caution is that an acquisition is a time of change. Without governance over the data, the merger may introduce data quality issues. The result is what I call a spaghetti mess, where the information in a field for one organization does not mean the same thing in the other organization.

I'll give you an example. Imagine we're working with customer data. In this dataset, there's a date field for when a customer would like delivery of a particular item. How one organization defines the delivery date may differ from how the acquired organization defines it. Now, as you bring the two datasets together, you start to have data quality issues. My advice to combat this challenge is to bring the two organizations' data governance teams together as early as possible in the merging process.
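One lightweight way to surface the delivery-date problem early is to compare the two organizations' data dictionaries before merging. The sketch below flags fields that share a name but have different documented definitions. The field names and definitions are hypothetical. A literal text comparison is crude, so treat the output as a list of items for the data stewards to review, not a verdict.

```python
def conflicting_definitions(glossary_a: dict, glossary_b: dict) -> list:
    """Return field names both organizations use but define differently."""
    shared = glossary_a.keys() & glossary_b.keys()

    def normalize(text: str) -> str:
        # Ignore case and surrounding whitespace when comparing definitions.
        return text.strip().lower()

    return sorted(f for f in shared if normalize(glossary_a[f]) != normalize(glossary_b[f]))


# Hypothetical data dictionaries for the acquiring and acquired organizations.
acquirer = {
    "DeliveryDate": "Date the customer requested delivery",
    "CustomerName": "Legal name of the customer company",
}
acquired = {
    "DeliveryDate": "Date the order shipped from the warehouse",
    "CustomerName": "Legal name of the customer company",
    "Region": "Sales region",
}

print(conflicting_definitions(acquirer, acquired))  # ['DeliveryDate']
```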

If one organization does not have a data governance team, assign appropriate people who can operate as data stewards. A merger involves many important activities, such as defining a new organizational structure, so data governance might be an afterthought. The goal here is to make it as much of a priority as possible, especially for key datasets. This ties back to my earlier point about making data governance a priority. I have the same advice for third-party datasets: work with the data supplier and the governance team to ensure that the data meets your organizational standards.

Challenges in data governance: Inadequate data stewardship

Another challenge you might face is inadequate data steward representation. One obstacle is low organizational data literacy. People who fail to see the importance of the role are hard to convince to serve as data stewards. Additional obstacles arise if data stewards are not aligned on how to work together, or if the governance program lacks data stewards from the appropriate business units. Either makes it harder to build a strong data steward team.

Finally, data stewards may struggle to devote sufficient time to data governance activities if there's a lack of executive sponsorship. That sponsorship supports the governance program overall and ensures that data stewards have adequate time to participate. Possible responses to this challenge include providing training or having stronger data stewards work with other data stewards. I would also look to gain executive sponsorship as soon as possible.

Challenges in data governance - data governance ownership

Another challenge that you might face is data governance ownership. Often, people within the business mistakenly believe that IT owns the data and is therefore responsible for its governance. Without some level of ownership from the business, data governance is unlikely to succeed. We need the support of both IT and business process owners to develop policies and procedures and work as data stewards. Let's remember that it is the business process that generates the data, and the business process owners have influence over it. Additionally, someone within the business is likely to be the primary user of the data. Therefore, the business itself has a significant stake in data governance, and the business and IT need to work together in partnership to be successful.

Best practices for data governance

Start with the Right People

Now that we've covered some of the challenges of data governance, I want to say a little about some best practices. Hopefully, you heard me stress throughout this presentation that data governance is a combination of people, processes, and technology.

Start by finding the right people. People set the agenda for scalable data governance. There's a quote I love from Tableau, a data governance and visualization software company, giving their advice on data governance: “to begin building the big picture, start with people, then build your processes and finally incorporate your technology. Without the right people, it's difficult to build a successful process needed for the technical implementation of data governance.” In my personal experience, you need the right people in place. They should have either direct influence over the process that generates the data or some other way of influencing it. This, again, needs senior leadership buy-in.

Treat data governance as an ongoing process

Like a marathon, data governance is a journey. Once started, it is an ongoing process as the business develops and adds new datasets. Industries like finance and healthcare generate massive amounts of data, and they will need a data governance framework to help ensure that new data is of high quality.

Be iterative

Another suggestion is to be iterative. Start with one domain and then expand to others. For example, you might begin with customer data, then expand to financial data, and then move to a different domain within the organization. Constantly iterate.

Track metrics in areas of value, effectiveness, and sustainability

You also want to develop metrics to measure the value of data governance as a business process. DAMA suggests developing metrics in three areas:

  • Value metrics measure business process improvements. Examples might include a shorter month-end close because you have cleaner data without missing values, or faster customer onboarding because you have better data governance.

  • Effectiveness metrics might include the number of business users who have been trained or certified on a new governance policy, or the time it takes to bring a new data product to market.

  • Sustainability metrics look at the long term.

For sustainability, measure the extent to which data stewards are using the relevant tools, and track the number of non-conforming records. Another metric is the number or percentage of an organization's datasets covered by the governance program. I'd like to see that percentage increase over time. Finally, once you have these metrics, use them to tell a story about whether the governance program is succeeding or being challenged.
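The coverage metric above is simple arithmetic. This sketch computes it from hypothetical quarterly snapshots (datasets governed versus datasets in the inventory) and checks whether coverage is trending upward. The snapshot numbers are invented for illustration.

```python
def coverage_pct(governed: int, total: int) -> float:
    """Percentage of inventoried datasets covered by the governance program."""
    return 100.0 * governed / total if total else 0.0


def is_non_decreasing(values) -> bool:
    """True if no period's value is lower than the previous period's."""
    return all(later >= earlier for earlier, later in zip(values, values[1:]))


# Hypothetical quarterly snapshots: (datasets governed, datasets in inventory).
snapshots = [(5, 120), (9, 125), (14, 130), (20, 132)]
trend = [round(coverage_pct(governed, total), 1) for governed, total in snapshots]
print(trend)                     # [4.2, 7.2, 10.8, 15.2]
print(is_non_decreasing(trend))  # True
```

Using a percentage rather than a raw count keeps the metric meaningful as the data inventory itself grows.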

Communicate early and often

Building on that last point, communicate the success of the program early and often, using a story. You're trying to build momentum for the work that needs to be done. Success stories from teams and departments that have adopted the governance model make it easier to spread the model to other business units. Unless you're starting with a brand-new organization or business, some change is likely to be needed.

Remember, change can be challenging for people. I have a slide here, and I'm not sure if you have seen this model of the different emotional states people go through during change. Essentially, there can be resistance early on. Therefore, communicating small wins helps reduce future hesitancy from other teams and departments as we continue to extend the data governance framework to other datasets.

Invest in tools

Finally, invest in data governance tools. Chosen in partnership with IT, these tools help support the data governance process. At this point, hopefully I've gotten you excited about data governance, and you're asking yourself, “How does my organization get started?”

Getting started with data governance 

Start by building a business case

One way to get started is by building a business case for a data governance program. Start by identifying pain points and business processes that might improve with better data quality. This will help identify potential datasets or domains you might want to start governing.

Additionally, try to evaluate the cost of poor data quality in the business process. You want to identify any hard-dollar savings, those that directly impact the business. It's also useful to capture what I call soft savings: things that are hard to put a dollar figure on but that we know improve processes overall. Finally, garner executive support for your effort.

Readiness assessment

Another suggestion is to perform a readiness assessment, asking questions like:

  • Do I have the right people who can be data stewards?

  • What is the organization's level of data literacy?

  • What is the quality of the data today?

  • Do I have any rules in place that I can use as a base from which to build my policies and procedures?

  • What tools and technologies do I have in place to support the governance model?

A readiness assessment can help answer all of these questions and give you a sense of your organization's strengths and opportunity areas.

Avoid starting from scratch

Finally, avoid starting from scratch. There are organizations and frameworks that can help when working through problems. Organizations such as DAMA and the EDM Council provide training and thought leadership on data governance, and they often have local chapters in many cities for sharing and learning best practices. Recent advances in technology have made it possible to collect data on almost everything, but the field of data governance is not new. It has a long history that you can draw from.

Wrap-up

As we get close to wrapping up, I'd like to leave you with a few central points. I want to highlight again that data governance is crucial to the scalability of data science because it enables organization-wide trust in the data. Again, business users will be hesitant to use data in their decision-making if they lack trust that the data will be consistent and of high quality. High-quality data enables data-driven decision-making across the organization at scale. It ensures that such decision-making is not limited to one department and that the opportunity of data science is available throughout the organization. Data governance is a framework and a process for scaling that data quality. It leverages technology and ensures that datasets conform with agreed-upon policies and procedures. Therefore, data governance is inextricably linked with a data-driven culture.

My last point, the idea I would like to leave you with, is that data governance is everyone's responsibility. It is a team effort that requires the support of senior leadership, business process owners, and IT, and each of these groups has an important role to play. By partnering on data governance programs, you help to democratize data science, which should improve overall business performance. With that, I'll wrap up my presentation, and I thank you for your time.

Questions and Answers

  • Question: You mentioned this as well during the discussions on metrics, Aaren. How do you justify the cost involved in a data transformation project that will support data governance? Are there KPIs or established frameworks that you can use to be able to determine the ROI there?

  • Answer: That's a good question. It's a difficult one at times, because there are what I call hidden costs. For example, what is the cost of not being able to make a decision, or not being able to answer a question, because your data doesn't support it? I will again point you to DAMA and the EDM Council, which have frameworks and information on assessing those costs. But essentially, it comes down to looking for places where improving data quality lets you improve a business process. For example, I mentioned closing a company's books a little faster because you have higher data quality. This is an example I had a chance to work through. The organization was closing its books, but its dataset had missing values. People had to call around the organization to find that data, and it took a long time. We reduced the time spent tracking down those missing values, and the value of closing the books 3 or 4 days faster was significant. We can attach dollar signs to that, and we'll continue to work in that direction.

  • Question: You mentioned ownership as a major component. Who, or which department, is responsible for data governance? Who do you think it should be?

  • Answer: It's a good question also. Depending on the organization, the role (if you want to call it Chief Data Steward) can sit in different parts of the business. It can be in IT, or it can be within the business itself. I think the real key is to realize and expect that it will be a partnership between the different parts of the business. You definitely need IT working in conjunction with business users to develop and maintain a strong data governance process. If you place it only in IT but business users are disengaged, I don't think it will be successful. If you put it in the business and you're not able to get IT engaged, you can have the same problem. I believe it can sit wherever it needs to, with whichever team holds the budget and responsibility for it. But you're going to need a partnership across departments to make it successful.

  • Question: Startups often lack the people and resources to implement a data governance policy. Could you share any public data, data governance framework, or template that they can rely on to get started quickly?

  • Answer: That’s a good question. I would again point to DAMA and the EDM Council, which have an extensive amount of information on policies and procedures. My suggestion is to remember that data governance can be very iterative. Even in a startup, you can begin with a single rule. It might be “we just want to ensure that this column only has values from these particular categories and nothing else.” Or it might be “we've deemed this column crucial to the business process, and we need to make sure that when data comes in, it only has a certain type of value.” You can start there and continue to build policies and procedures as you need them. As the organization grows, you can keep adding to them and go from there.

  • Question: With trial and error, how do you know which particular data governance projects are worth investing in?

  • Answer: The assessments I've seen in the past go through and identify where there may be opportunities. Once you have that set of potential projects or areas you could govern, you can prioritize them based on the value assigned to each. There's usually a cutoff: you weigh the resources and time you have available against the potential benefit. It's a tough balancing act, but that's how you decide.

  • Question: Are there any KPIs you recommend for measuring the ROI of data governance?

  • Answer: I would definitely look at metrics around business impact. Those metrics are really useful for garnering support from senior leadership. They are the main ones, and they're usually defined based on the data sets you're looking at. I gave the example of closing the month-end books. We can measure that over time and assign a dollar value to the project, because we're now able to close the month-end books five, seven, or eight days faster thanks to improved data quality. So I would start there. Additionally, don't forget about what I'd call sustainability metrics: you want to see data governance succeeding as a program overall. If you find it challenging to get others to use the tools, and you have a lot of non-conformance that you aren't able to get corrected, that should be a red flag that the program itself is being challenged, and you need to find ways to improve the process.

  • Question: What considerations should an organization keep in mind when balancing data democratization against data security? Personal identifiers often need to be locked down, or their distribution kept limited.

  • Answer: Yes, that's a good question. Essentially, I would err on the side of caution: you don't want to make data available to the broader organization unnecessarily. For example, if you have names and other personally identifiable information, can you obscure that information, for instance by using internally assigned IDs instead of actual names, so that you can still make the data available to the organization? That hopefully makes the process easier. The organization then has access to the data and the ability to use it, but you're not exposing the organization to someone unintentionally misusing sensitive data such as credit card information. These are really important security considerations. Where possible, I would also make sure that legal teams and others are engaged in data governance for those highly sensitive areas around data and data security.

  • Question: What do you think are the main organizational-culture challenges in reaching a sweet spot for data democratization, given that different interests may conflict? How do you reconcile those interests to find a solution where everyone gets flexibility, security, and speed of access, while privacy risks are still respected?

  • Answer: When it comes to organizational culture, I think the hard part is especially for older organizations. Unless you're starting with a culture where data democratization and data governance are already part of it, it can be a challenge to get leadership and others aware of their importance and understanding it. It's about working with people and bringing them along to see where there are opportunities to improve different processes, especially around data quality. If there's a spot where people often complain, “if I had X, Y, Z, I could do this,” or “if this data were cleaner, I could do this process better,” those are potential test cases that can help the organization better understand how data governance improves business processes. That's how the culture starts, and hopefully it develops over time as you bring people along and let them see its importance. It then becomes a balancing act between the needs of a data governance team (resources, time, money, effort, and so on) and other priorities. But businesses already have processes to identify and decide what they want to pursue. For example, a company deciding whether to invest in certain areas of technology has an internal process for making that decision. I would simply make data governance part of that process too.

  • Question: One could argue that better decision-making and the aggregate value obtained through data governance justify its cost. However, to the organization it can seem like a leap-of-faith project. Although there are many benchmarks and metrics to compare, how do you convince or convert people with low data literacy?

  • Answer: These are really good questions, and they're at the heart of how you change an organization. I think it starts with showing people information and opportunities through an assessment, and starting very small, as a small project in a particular area where there are pain points around data quality. Once you have a small success, you can communicate it to others and show its benefits. In addition, as I said earlier, most organizations are starting to see that data governance is becoming important for them. They want to use technologies such as AI and machine learning, and they realize early on that they need high-quality data to use those tools. That leads back to the questions: how do we improve our data quality, and how do we get it to a level where we can make use of those investments? So data governance quickly comes up, and there are opportunities there. Internally, you look for those pain points and develop examples of where a data governance program and a larger investment make sense for the organization. Externally, I think many industries are starting to realize that if they want to use certain advanced toolsets, they have to learn quickly and adjust their data quickly, cleaning it up and improving its quality, to get the benefits they expect. Hopefully, I answered that question.

  • Question: How do DAMA and the EDM Council help in establishing data governance frameworks? Can you provide some insight into these frameworks?

  • Answer: Yes, both of those organizations focus not just on data governance but on data management in general, and data governance comes with that. They have brought together professionals from many different organizations, going back to the early database era, to identify best practices for working with data and for running businesses overall. From that, they've put together their own versions of the best framework to use and of how data stewards and other individuals should work together. They've defined what they call data stewards and the other players, and they have a ton of information they've built up over time. You can look to them and their frameworks, and I'm sure there are other frameworks out there you could try as well. Again, I would suggest finding one that works and starting with it. Especially if one of those organizations has a local presence that can help you, I would start there and then continue to grow your governance model. Start slowly, keep it moving, and grow it throughout the organization.
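The startup answer above describes a first, narrow policy: a business-critical column may only contain values from an agreed set of categories. As a minimal sketch of how such a rule could be automated, assuming records arrive as plain Python dictionaries rather than from any particular data platform, the check below flags rows whose value is missing or outside the allowed set. The column name `order_status`, the category list, and the helper `find_nonconforming_rows` are hypothetical illustrations, not something prescribed in the webinar or by DAMA or the EDM Council.

```python
from collections.abc import Iterable, Mapping

# Hypothetical policy: the business-critical "order_status" column
# may only contain one of these agreed categories.
ALLOWED_ORDER_STATUS = {"open", "shipped", "cancelled"}


def find_nonconforming_rows(
    rows: Iterable[Mapping[str, object]],
    column: str,
    allowed: set,
) -> list[tuple[int, object]]:
    """Return (row index, value) pairs for rows that break the category rule."""
    violations = []
    for index, row in enumerate(rows):
        value = row.get(column)  # None when the column is missing entirely
        if value not in allowed:
            violations.append((index, value))
    return violations


incoming = [
    {"order_id": 1, "order_status": "open"},
    {"order_id": 2, "order_status": "Shipped"},  # unexpected casing
    {"order_id": 3},                             # column missing
    {"order_id": 4, "order_status": "cancelled"},
]

for index, value in find_nonconforming_rows(incoming, "order_status", ALLOWED_ORDER_STATUS):
    print(f"row {index}: value {value!r} is not an allowed category")
```

Starting with one rule like this and adding more as needs arise mirrors the iterative approach described in the answer. The share of flagged rows could also be tracked over time as one possible non-conformance measure; treating it as a KPI is an assumption here, not a metric Aaren specifies.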
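The answer on choosing which projects to invest in describes ranking candidates by their value and stopping at a cutoff set by the resources and time available. One simple way to sketch that, assuming each candidate has already been given a rough benefit score and effort estimate during the assessment, is a greedy selection by benefit per unit of effort. The `Candidate` fields, the units, and the project names and numbers below are made-up illustrations, not data from the webinar.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    benefit: float  # rough value score from the assessment (hypothetical units)
    effort: float   # rough cost, e.g. person-weeks; assumed to be > 0


def prioritize(candidates: list[Candidate], capacity: float) -> list[Candidate]:
    """Pick projects in order of benefit per unit effort until capacity is used up."""
    ranked = sorted(candidates, key=lambda c: c.benefit / c.effort, reverse=True)
    selected: list[Candidate] = []
    used = 0.0
    for candidate in ranked:
        # Skip anything that no longer fits; a smaller project later in the list may still fit.
        if used + candidate.effort <= capacity:
            selected.append(candidate)
            used += candidate.effort
    return selected


# Made-up figures purely for illustration.
candidates = [
    Candidate("Finance master data cleanup", benefit=200.0, effort=8.0),
    Candidate("Customer address standardization", benefit=90.0, effort=6.0),
    Candidate("Product category validation", benefit=60.0, effort=2.0),
    Candidate("Supplier deduplication", benefit=120.0, effort=10.0),
]

for project in prioritize(candidates, capacity=12.0):
    print(f"{project.name}: benefit/effort = {project.benefit / project.effort:.1f}")
```

A greedy ratio ranking is easy to explain to stakeholders, but it is not guaranteed to find the best combination within the capacity; an exact 0/1 knapsack formulation would, at some cost in transparency.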
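The answer on balancing democratization and security suggests obscuring personally identifiable information by using internally assigned IDs instead of actual names. A minimal sketch of that idea, assuming records arrive as Python dictionaries, is shown below: names are swapped for stable internal IDs, and fields classed as sensitive (here a credit card number) are dropped before the data is shared more widely. `Pseudonymizer`, `share_record`, `SENSITIVE_FIELDS`, and the ID format are hypothetical names chosen for illustration.

```python
import itertools
from dataclasses import dataclass, field

# Hypothetical classification of fields that must not be shared broadly.
SENSITIVE_FIELDS = {"name", "credit_card"}


@dataclass
class Pseudonymizer:
    """Assigns stable internal IDs to personal names.

    The name-to-ID lookup table is the sensitive part; init=False and
    repr=False keep it out of the constructor and out of printed output.
    """

    prefix: str = "CUST"
    _lookup: dict = field(default_factory=dict, init=False, repr=False)
    _counter: itertools.count = field(
        default_factory=lambda: itertools.count(1), init=False, repr=False
    )

    def id_for(self, name: str) -> str:
        # The same name always maps to the same ID, so joins and counts still work.
        if name not in self._lookup:
            self._lookup[name] = f"{self.prefix}-{next(self._counter):06d}"
        return self._lookup[name]


def share_record(record: dict, pseudonymizer: Pseudonymizer) -> dict:
    """Return a copy that is safer to share: name replaced by an ID, sensitive fields dropped."""
    shared = {key: value for key, value in record.items() if key not in SENSITIVE_FIELDS}
    shared["customer_id"] = pseudonymizer.id_for(record["name"])
    return shared


pseudonymizer = Pseudonymizer()
records = [
    {"name": "Jane Doe", "credit_card": "0000-0000-0000-0001", "region": "EMEA", "spend": 120.0},
    {"name": "John Roe", "credit_card": "0000-0000-0000-0002", "region": "APAC", "spend": 80.5},
    {"name": "Jane Doe", "credit_card": "0000-0000-0000-0001", "region": "EMEA", "spend": 42.0},
]
for record in records:
    print(share_record(record, pseudonymizer))
```

The lookup table is what makes re-identification possible, so under this sketch's assumptions it would live in a restricted store that only approved teams can read. Assigning IDs from a table, rather than publishing a plain hash of each name, avoids the risk of someone re-identifying people by hashing candidate names themselves.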


Summary

Data governance is essential for enabling scalable data science. It ensures organizational trust in data, which is key for developing a data-driven culture. Organizations like Netflix and Airbnb exemplify the benefits of data democratization, where accessible, high-quality data leads to improved decision-making and competitive advantages. Aaren Stubberfield emphasizes that data governance involves the planning, monitoring, and enforcement of data management. It requires a combination of people, processes, and technology to reduce risks, improve efficiency, and align business and IT teams. Challenges include gaining executive support and finding qualified data stewards, while best practices highlight the importance of communication, iterative processes, and using existing frameworks. Ultimately, data governance is a team effort that requires commitment from senior leadership, IT, and business process owners.

Key Takeaways:

  • Data governance is key for scalable data science and organizational trust in data.
  • High-quality data leads to better decision-making and business performance.
  • Data democratization enables all employees to make data-driven decisions.
  • Challenges include gaining executive support and finding effective data stewards.
  • Best practices involve clear communication, iterative processes, and using existing frameworks.

Deep Dives

The Role of Data Governance

Data governance plays a key role in ensuring that data is reliable, compliant, and of high quality. It involves the exercise of authority and control over data management, which includes planning, monitoring, and enforcement. According to Aaren Stubberfield, without data governance, business users may hesitate to use data for decision-making if they lack trust in its quality. This trust is foundational for a data-driven culture, enabling organizations to make informed decisions. As Aaren notes, "High-quality data enables data-driven decision-making in an organization at scale and ensures decision-making is not limited to one department."

Advantages of Data Democratization

Data democratization means making data accessible and usable for all employees, not just data scientists or specific teams. This approach allows organizations to move faster and be more agile. Companies like Airbnb have successfully implemented data democratization by providing data education, tools, and access to create a single source of truth. A Deloitte survey indicated that organizations with a strong data-driven culture are twice as likely to exceed business goals. Empowering employees with the right tools and a culture of data literacy is key to realizing the benefits of data democratization.

Obstacles in Implementing Data Governance

Despite its benefits, implementing data governance can be challenging. One major hurdle is gaining executive support, as highlighted by Gartner's findings that many senior leaders do not prioritize data governance. Finding qualified data stewards who understand both the business and technical aspects of data management is another challenge. Aaren advises that "a good data steward works with the data regularly, is accountable for that data, and its quality." Organizations must also manage the complexities of integrating data governance during mergers or acquisitions, where disparate data systems can create quality issues.

Effective Strategies for Data Governance

Successful data governance requires a combination of the right people, processes, and technology. Aaren emphasizes starting with people, as they set the agenda for scalable data governance. This includes gaining senior leadership support and involving IT and business process owners. Effective strategies also involve iterative processes, starting with a specific domain and expanding as needed. Developing metrics to measure the value of data governance and communicating successes to build momentum are key. Using existing frameworks, such as those from DAMA or the EDM Council, can provide valuable guidance and resources.

