Data Management

According to the Data Management Body of Knowledge, data management is “the development, execution and supervision of plans, policies, programs and practices that control, protect, deliver and enhance the value of data and information assets.” In our opinion this is a very good definition. Unfortunately, the implementation of data management strategies tends to be challenged in practice by a traditional, documentation-heavy mindset. This mindset tends to result in onerous, bureaucratic strategies that more often than not struggle to support the goals of your organization.

Having said that, data management is still very important to the success of your organization. The Disciplined Agile (DA) toolkit’s Data Management process blade promotes a pragmatic, streamlined approach to data management that fits into the rest of your IT processes – we need to optimize the entire workflow, not sub-optimize our data management strategy. We need to support the overall needs of our organization, producing real value for our stakeholders. Disciplined agile data management does this in an evolutionary and collaborative manner, via concrete data management strategies that provide the right data at the right time to the right people.


Why Data Management?

There are several reasons why a disciplined agile approach to data management is important:

  1. Data is the lifeblood of your organization. Without data, or more accurately information, you quickly find that you cannot run your business. Having said that, data is only one part of the overall picture. Yes, blood is important but so is your skeleton, your muscles, your organs, and many other body parts. We need to optimize the whole organizational body, not just the “data blood.”
  2. Data is a corporate asset and needs to be treated as such. Unfortunately the traditional approach to data management has resulted in data of sketchy quality: data that is inconsistent, incomplete, and often not available in a timely manner. Traditional strategies are too slow moving and heavy-weight to address the needs of modern, lean enterprises. To treat data like a real asset we must adopt concrete agile data quality techniques such as database regression testing to discover quality problems and database refactoring to fix them. We also need to support delivery teams with lightweight agile data models and agile/lean data governance.
  3. People deserve to have appropriate access to data in a timely manner. People need access to the right data at the right time to make effective decisions. The implication is that your organization must be able to provide the data in a streamlined and timely manner.
  4. Data management must be an enabler of DevOps. As you can see in the following diagram, Data Management is an important part of our overall Disciplined DevOps strategy. A successful DevOps approach requires you to streamline the entire flow between delivery and operations, and part of that effort is to evolve existing production data sources to support new functionality.

Figure 1. The workflow of Disciplined DevOps.



Disciplined Agile Values for Data Management

There are several values that are key to your success when transforming to a leaner, more agile approach to Data Management. Taking a cue from the Disciplined Agile Manifesto, we’ve captured these values in the form of X over Y. While both X and Y are important, X proves to be far more important than Y in practice. These values are:

  1. Evolution over definition. The ability to safely and quickly evolve an existing data source, either to extend it to support new functionality or to fix quality issues with it, is absolutely paramount in today’s hyper-competitive environment. Yes, having defined data models and metadata describing them is also important, but nowhere near as important as being able to react to new business opportunities. Luckily agile database techniques, long proven in practice, exist that enable the safe evolution of production data stores.
  2. Holistic organization over Data Management. Earlier we said that data is the lifeblood of your organization. Yes, blood is important but so is your skeleton, your muscles, your organs, and many other body parts. We need to optimize the whole organizational body, not just the “data blood.” Traditional Data Management approaches often run aground because they locally optimize for data concerns, whereas a DA approach to Data Management recognizes that we must optimize the overall whole. This implies that sometimes we may need to sub-optimize our strategy from a data point of view, for the sake of organizational level optimization.
  3. Sufficiency over perfection. Data sources, like many other IT assets, need to be good enough for the task at hand. The old saw “perfect is the enemy of good” clearly applies in the data realm – too much time has been lost, and opportunities squandered, while development teams were forced to wait on Data Management teams to create (near) perfect models before being allowed to move forward. Traditional data professionals mistakenly assume that production databases are difficult to evolve and as a result strive to get their designs right the first time so as to avoid very painful database changes in the future. The Agile Data method has shown this assumption to be wrong: it is both straightforward and desirable to evolve production databases. A side effect of this revelation is that we no longer need to strive for perfect, detailed models up front. Instead we can do just enough up-front thinking to get going in the right direction and then evolve our implementation (including data sources) over time as our understanding of our stakeholders’ needs evolves.
  4. Collaboration over documentation. We’ve known for decades that the most effective way to communicate information is face-to-face around a shared sketching environment, and that the least effective is to provide detailed documentation to people. The implication is that we need to refocus our efforts to be more collaborative in nature. As data professionals we need to get actively involved with solution delivery teams: to share our knowledge and skills with those teams, and to enable them to become more effective in working with data. Yes, we will still need to develop and sustain data-related artifacts, but those artifacts should be lightweight and better yet executable in nature (see below).
  5. Cross-functional people over specialized staff. Agilists have come to realize that people are more effective when they are cross-functional (also known as T-skilled or generalizing specialists). Although specialists are very skilled in a narrow aspect of the overall process, the problem is that you need a lot of specialists to perform anything of value and as a result the overall workflow tends to be error prone, slow, and expensive. The other extreme would be to be a generalist, someone who knows a little bit about all aspects of the overall process. The challenge with generalists is that although they’re good at collaborating with others, they don’t actually have the skills to produce concrete value. We need the best of both worlds – a generalizing specialist with one or more specialties so that they can add value AND a general knowledge so that they can collaborate effectively with others and streamline the overall effort.
  6. Automation over manual processes. The only way that we can respond quickly to marketplace opportunities is to automate as much of the bureaucracy as we possibly can. Instead of creating detailed models and documents and then reviewing potential changes against them we capture our detailed specifications in the form of executable tests. This is quickly becoming the norm for specifying both the requirements and designs of code, and the same test-driven techniques are now being applied to data sources. Continuous integration (CI) and continuous deployment (CD) are also being applied to data sources, contributing to improving overall data quality and decreasing the time to safely deploy database updates into production.
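
To make the idea of specifications captured as executable tests more concrete, here is a minimal sketch of a database regression check written in Python with the standard sqlite3 module. The customer table, its columns, and the function names are hypothetical and exist only for illustration; a real suite would run against your own schema and would likely use a test framework.

```python
import sqlite3

INSERT_CUSTOMER = "INSERT INTO customer (email, country_code) VALUES (?, ?)"


def create_schema(conn):
    """Create a hypothetical customer table whose rules we want to protect."""
    conn.executescript("""
        CREATE TABLE customer (
            customer_id  INTEGER PRIMARY KEY,
            email        TEXT NOT NULL UNIQUE,
            country_code TEXT NOT NULL CHECK (length(country_code) = 2)
        );
    """)


def _rejects(conn, params):
    """Return True if the database refuses to store the given row."""
    try:
        conn.execute(INSERT_CUSTOMER, params)
    except sqlite3.IntegrityError:
        return True
    return False


def run_regression_checks(conn):
    """Return the names of the data rules that the schema fails to enforce."""
    failures = []
    try:
        # Seed one valid row so that the duplicate check has something to hit.
        conn.execute(INSERT_CUSTOMER, ("a@example.com", "CA"))
        if not _rejects(conn, ("a@example.com", "US")):
            failures.append("duplicate_email_rejected")
        if not _rejects(conn, ("b@example.com", "CAN")):
            failures.append("country_code_is_two_chars")
    finally:
        # Leave no test rows behind in the database under test.
        conn.rollback()
    return failures
```

Run as part of a continuous integration build, a check like this fails as soon as a schema change drops a rule that other systems rely on. An empty list from run_regression_checks means every rule is still enforced.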

As you can see, we’re not talking about your grandfather’s approach to Data Management. As Figure 2 summarizes, organizations are now shifting from the slow and documentation-heavy bureaucratic strategies of traditional Data Management towards the collaborative, streamlined, and quality-driven agile/lean strategies that focus on enabling others rather than controlling them.

Figure 2. Shifting from bureaucracy to enablement in Data Management.


The Process

The following process goal diagram overviews the potential activities associated with disciplined agile data management. These activities are often performed by, or at least supported by, a data management team.

Figure 3. The process goal diagram for Data Management.


The decision points that you need to address with your data management strategy are:

  1. Improve data quality. There is a range of strategies that you can adopt to ensure data quality. The agile community has developed concrete quality techniques – in particular database testing, continuous database integration, and database refactoring – that prove more effective than traditional strategies. Metadata management (MDM) proves to be fragile in practice as the overhead of collecting and maintaining the metadata proves to be far greater than the benefit of doing so. Extract, transform, and load (ETL) strategies are commonplace for data warehouse (DW) efforts, but they are in effect band-aids that do nothing to fix data quality problems at the source.
  2. Evolve data assets. There are several categories of data that prove to be true assets over the long term: test data that is used to support your testing efforts; reference data, also called lookup data, that describes relatively static entities such as states/provinces, product categories, or lines of business; master data that is critical to your business, such as customer or supplier data; and metadata, which is data about data. Traditional data management tends to be reasonably good at this, although it can be heavy-handed at times and may not have the configuration management discipline that is common within the agile community.
  3. Ensure data security. This is a very important aspect of security in general. The fundamental issue is to ensure that people get access to only the information that they should and that information is not available to people who shouldn’t have it. Data security must be addressed at both the virtual and physical levels.
  4. Specify data structures. At the enterprise level your models should be high level – lean thinking is that the more complex something is, the less detailed your models should be to describe it. This is why it is better to have a high-level conceptual model than a detailed enterprise data model (EDM) in most cases. Detailed models, such as physical data models (PDMs), are often needed for specific legacy data sources by delivery teams.
  5. Refactor legacy data sources. Database refactoring is a key technique for safely improving the quality of your production databases. While delivery teams perform the short-term work of implementing the refactoring, there is organizational work to be done to communicate the refactoring, monitor usage of deprecated schema, and eventually remove deprecated schema and any scaffolding required to implement the refactoring.
  6. Govern data. Data, and the activities surrounding it, should be governed within your organization. Data governance is part of your overall IT governance efforts.
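
The database refactoring decision point above can be illustrated with a small sketch of a column rename that uses a transition period. It is written in Python against SQLite; the customer table, the fname and first_name columns, and the function names are all assumptions made for this example. The triggers play the role of the scaffolding: they keep the deprecated and the new column in step until the deprecated one is retired.

```python
import sqlite3


def start_rename(conn):
    """Phase 1: add first_name next to the deprecated fname column."""
    conn.executescript("""
        ALTER TABLE customer ADD COLUMN first_name TEXT;
        UPDATE customer SET first_name = fname;

        -- Scaffolding: keep both columns in step during the transition.
        CREATE TRIGGER customer_sync_insert AFTER INSERT ON customer
        BEGIN
            UPDATE customer
               SET first_name = COALESCE(NEW.first_name, NEW.fname),
                   fname      = COALESCE(NEW.fname, NEW.first_name)
             WHERE customer_id = NEW.customer_id;
        END;

        CREATE TRIGGER customer_sync_old_to_new
        AFTER UPDATE OF fname ON customer
        WHEN NEW.fname IS NOT NEW.first_name
        BEGIN
            UPDATE customer SET first_name = NEW.fname
             WHERE customer_id = NEW.customer_id;
        END;

        CREATE TRIGGER customer_sync_new_to_old
        AFTER UPDATE OF first_name ON customer
        WHEN NEW.first_name IS NOT NEW.fname
        BEGIN
            UPDATE customer SET fname = NEW.first_name
             WHERE customer_id = NEW.customer_id;
        END;
    """)


def finish_rename(conn):
    """Phase 2, after the deprecation period: retire fname and the triggers."""
    conn.executescript("""
        CREATE TABLE customer_new (
            customer_id INTEGER PRIMARY KEY,
            first_name  TEXT
        );
        INSERT INTO customer_new (customer_id, first_name)
            SELECT customer_id, first_name FROM customer;
        DROP TABLE customer;  -- dropping the table also drops its triggers
        ALTER TABLE customer_new RENAME TO customer;
    """)
```

In this sketch a delivery team would run start_rename as part of a normal release, while finish_rename is the longer-term organizational step that happens only once monitoring shows that nothing still reads or writes fname. The table is rebuilt rather than altered in place so that the example does not depend on a particular SQLite version.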

Looking at the diagram above, traditional data management professionals may believe that some activities are missing. These activities may include:

  • Enterprise data architecture. This is addressed by the Enterprise Architecture process blade. The DA philosophy is to optimize the whole. When data architecture (or security architecture, or network architecture, or…) is split out from EA it often tends to be locally optimized and as a result does not fit well with the rest of the architectural vision.
  • Operational database administration. This is addressed by the Operations process blade, once again to optimize the operational whole over locally optimizing the “data part.”


External Workflow With Other IT Teams

Key tenets of agile and lean are to work collaboratively and to streamline your workflow respectively. In Figure 4 we see that Data Management is a collaborative effort that has interdependencies with other DA process blades and the solution delivery teams that Data Management is meant to support. This can be very different from current traditional strategies. For example, with a DA approach, the Data Management team works collaboratively with the delivery teams, Operations, and Release Management to evolve data sources. The delivery teams do the majority of the work to develop and evolve the data sources, with support and guidance coming from Data Management. The delivery teams follow guidance from Release Management to add the database changes into their automated deployment scripts, getting help from Operations if needed to resolve any operational challenges. Evolution of data sources is a key aspect of Disciplined DevOps. This is very different from the typical traditional strategy that requires delivery teams to first document potential database updates, have the updates reviewed by Data Management, then do the work to implement the updates, then have this work reviewed and accepted, then work through your organization’s Release Management process to deploy into production.

Figure 4. Successful Data Management is collaborative.

Internal Workflow of Data Management

Now let’s drill down and see what the workflow for a DA approach to Data Management looks like. First, notice how all of the activities depicted in Figure 5 are collaborative in nature. This is shown via the additional roles beside the activities or interacting with them. Second, how you address these activities will vary depending on the situation that you face. Our goal here is to explore a baseline from which you can potentially start, but you’ll need to tailor it to address your actual situation.

Figure 5. Internal workflow for Data Management.

Let’s work through each activity one at a time:

  1. Evolve organizational data artifacts. Organizational Data Management artifacts may include data models, including but not limited to a high-level conceptual model for your enterprise (typically a view within your enterprise architecture); metadata describing common concepts, entity types, and data elements within your organization; master data for critical entity types; and master test data to support database testing across multiple delivery teams. Data Managers will work closely with Product Managers to understand their overall vision for their products and the organization as a whole to ensure that their Data Management strategy aligns with your business roadmap. Data Managers will also work closely with Enterprise Architects to ensure that data concerns are addressed appropriately in your organization’s architecture and that your Data Management strategy aligns with your technology roadmap. These collaborations are often accomplished through regular working sessions that are often called in an as-needed, impromptu manner.
  2. Enable delivery teams. Data Managers work closely with delivery teams to train, educate, and coach them in data skills. The overall strategy is to enable delivery teams to be as self-sustaining as possible when it comes to data-related activities, to offload as much of the grunt work as possible to enable the teams to become more reactive and to allow Data Managers to focus on value-added activities such as evolving organizational data artifacts and guidance. The implication is that Data Managers will need to develop and maintain a training program around fundamental data skills (computer-based training often proves sufficient for this) such as data modeling, database design, and data security. They will also need coaching skills so that they can work side by side with delivery teams to help them to learn these critical skills.
  3. Support delivery teams. Delivery teams will need help from time to time to address hard database design problems, to gain access to and to understand legacy data sources, and to obtain and/or generate test data. The DA strategy is for Data Managers to work collaboratively with the delivery teams to do so, to get directly involved with the teams to do the actual work (and to transfer skills while doing so). In a pragmatic take on the sage advice around teaching a man to fish, the goal should be to teach the delivery team how to fish but while doing so provide enough fish to sustain them until they become self-sufficient.
  4. Evolve and support data guidance. Delivery teams should follow your organizational conventions around data (and around security, and user experience, and so on). The Data Management team is the source of this data guidance, which should address fundamental issues such as data naming conventions, data security conventions, and your data architecture and design patterns. This guidance should be developed and evolved collaboratively with the delivery teams themselves to ensure that the guidance is understandable, pragmatic, and accepted by the teams.
  5. Support and monitor operations. Data Managers will work closely with Operations Managers to monitor your existing production data sources (Operations Managers monitor far more than just data sources of course). Ideally this monitoring is fully automated with dashboard technology used to render critical operational intelligence in real time. Note that operational database administration activities are addressed by the Operations process blade.
  6. Improve data quality. Data Managers will guide and collaborate in the data quality improvement efforts of your Database Administrators (DBAs) and Operations Engineers as well as your delivery teams. This is depicted below in Figure 6. They will monitor your automated database regression testing efforts (ideally a continuous effort) and your ongoing data source evolution efforts (implemented as database refactorings) that occur on a daily basis. Your Data Managers will oversee the long-term aspects of database refactoring, in particular the retirement of deprecated database schema and the scaffolding required during the deprecation periods for the appropriate refactorings.

Figure 6. A collaborative approach to data quality.

Let’s examine Figure 6 in a bit more detail. Common data quality activities are indicated towards the top (the blue bubbles). Immediately below each activity is the primary role(s) responsible for it – notice how in an agile environment data quality is so important that it isn’t left to just people in data roles. Below the primary roles, in some cases, we indicate secondary roles that may be involved in assisting with, or supporting, the activity.
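
As one possible illustration of the continuous, automated side of these data quality activities, the sketch below scans existing data for common quality problems. It is Python with the standard sqlite3 module; the customer and customer_order tables, the rule names, and the queries are hypothetical, and your own rules would come from your organization's data guidance.

```python
import sqlite3

# Hypothetical rules: each query counts the rows that violate one rule.
QUALITY_RULES = {
    "order_without_customer": """
        SELECT COUNT(*)
          FROM customer_order o
          LEFT JOIN customer c ON c.customer_id = o.customer_id
         WHERE c.customer_id IS NULL
    """,
    "customer_missing_email": """
        SELECT COUNT(*)
          FROM customer
         WHERE email IS NULL OR TRIM(email) = ''
    """,
    "duplicate_customer_email": """
        SELECT COUNT(*)
          FROM (SELECT LOWER(email)
                  FROM customer
                 WHERE email IS NOT NULL
                 GROUP BY LOWER(email)
                HAVING COUNT(*) > 1)
    """,
}


def find_quality_problems(conn):
    """Return {rule name: number of offending rows or groups} for broken rules."""
    problems = {}
    for name, query in QUALITY_RULES.items():
        (count,) = conn.execute(query).fetchone()
        if count:
            problems[name] = count
    return problems
```

A report like this could feed an operational dashboard or fail a scheduled build, so that the people in both the primary and the secondary roles see quality problems soon after they appear. Keeping the rules as plain data makes it easy for delivery teams to contribute new ones.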


How to Adopt This

We’ve found that the following strategies are critical to your success when adopting a Disciplined Agile approach to Data Management:

  1. Surface your challenges. You need to have an honest conversation about the effectiveness of your current approach to Data Management. This conversation must be driven from an organizational viewpoint so as to take into account stakeholder needs. We’ve found that a very effective way to do this is to hold value stream mapping (VSM) sessions that reveal the true efficiency and quality levels of your critical Data Management processes. This will not be pleasant for anyone who still believes in traditional Data Management practices, but it is vital to your success that everyone recognizes that you need to improve.
  2. Expect better. One of the reasons why the Data Management field has languished for as long as it has is that the rest of the IT community allowed it to. For the most part we accepted the claim of data professionals that they needed to work in a slow and onerous manner, that eventually they’d address our organization’s data quality challenges through some form of bureaucratic magic that only they understood. Enough is enough.
  3. Invest in your staff. Traditional data professionals tend to be overly specialized, often focusing on one aspect of Data Management such as logical data modeling, metadata management, data traceability, and so on. Not only does this result in bureaucratic, drawn-out processes, but many of these specialties are no longer required when you’ve adopted pragmatic, quality-focused agile strategies. To be effective you need T-skilled generalizing specialists, and that requires you to invest in training and long-term coaching to help people to modernize their skillset.
  4. Hire agile coaches with deep experience in both Agile and Data Management. Someone with agile coaching experience alone will struggle to gain the trust of experienced Data Management people, and a Data Management coach without deep agile experience will struggle to help people to overcome their deep-rooted traditional belief system. The bad news is that agile Data Management coaches are very hard to find right now due to high demand and low supply.

Improving your Data Management processes and organization structure to support a Disciplined Agile way of working is a daunting task. Along with evolving your governance strategy it is likely the hardest part of any organizational transformation, but it is one that you cannot ignore. Effective Data Management is critical to your success as an organization.