Playbook for Project Management in Data Science and Artificial Intelligence Projects
Table of Contents
Chapter 1 Business Context and Value Proposition
Chapter 2 Key Challenges in DS/AI Projects
Chapter 3 Framework Based on DS/AI Project Best Practices
I. Three Critical Gaps addressed by our best-practices-based toolkit
II. Best-Practices-Based Toolkit to address the Identified Gaps
III. Resources for Practitioners and Organizations
IV. Limitations of the Framework
Chapter 4 How PMI and NASSCOM CoE could support Organizations
AI is creating possibly the biggest technological wave of our lifetime. The geopolitical implications of AI are being seen beyond economic prosperity. In the race to establish AI supremacy, nations are competing to formulate their own AI strategies. Fully distributed innovation driven by decentralized development, coupled with low barriers to entry, is fueling the pursuit of “AI for all”. Thus, AI clearly is no longer a “secret sauce” but one that continues to provide significant advantages – with the right focus on people, processes and technology.
As more balanced views emerge on the utopian vs dystopian future of AI, the focus is now shifting to “how” to develop AI solutions. AI projects are significantly cost and time-intensive, and traditional project management frameworks may not be capable of negotiating the complexities of the workflow. Additionally, we need to explore if a uniform framework can guide the development of DS and AI solutions across organizations (service companies, startups and GCCs) and use cases.
Two established thought leaders, the Project Management Institute and NASSCOM's Center of Excellence for Data Science and AI, have come together to bring you a unique perspective on emerging project management practices in the DS and AI space. The recommendations are the outcome of expert consultations, across a range of organizations with established DS and AI practices, on specific stages of managing such projects.
We hope that you enjoy reading this publication and find some of the best practices worth implementing. Also, as organizations that value learning through community, we will immensely benefit from your feedback.
CEO—Center of Excellence, Data Science & AI, National Association of Software and Services Companies (NASSCOM)
Data Science (DS) and Artificial Intelligence (AI) are top of mind for most organizations and professionals these days. This is probably because “AI is likely to be either the best or worst thing to happen to humanity”, as the celebrated theoretical physicist Stephen Hawking once observed. The potential benefits of DS and AI are seen by many as key drivers of economic growth and prosperity, especially in these troubled times. Famed computer scientist Andrew Ng put it more bluntly: “AI is the new electricity.”
Yet, as more and more organizations adopt DS and AI in their ways of working, these initiatives, and those who work in them, have acquired almost a mythical status. This is due to a perception that these technologies call for high levels of innovation, creativity, speed and agility, among others. AI projects are seen as unique and a cut above “regular” business projects.
This study is an attempt to peer behind the veil of romantic mysticism often associated with AI projects to see how they really work, and what makes them tick. Of particular interest to us is a definition of how to effectively and efficiently manage such projects in order to help realize the promised business benefits. We believe this definition will help project managers in DS/AI environments embrace a “fit for purpose” framework to achieve value. At the same time, such a framework will be a fascinating practical guide to countless others who aspire to work in, and manage, AI initiatives.
We are indebted to NASSCOM's Center of Excellence for DS/AI for imbuing this study with their deep subject matter expertise and practical wisdom. We, and our readers, are richer for it.
Managing Director, PMI South Asia
The ability of DS and AI to solve problems and offer answers that go beyond the limitations of the human brain has spurred business interest and investments in these technologies. They guide decisions on an astounding range of problems across industries, such as automating customer service, quick appraisal for loans, precise diagnosis of pathology images, image recognition for better security, autonomous driving and smart irrigation.
However, there is often a shortfall between the projected benefits from a DS/AI led solution and what organizations realize on the ground. Numerous studies have pointed to a high failure rate of these projects and low or minimal impact that does not justify the investments being made.
Going by preliminary data, PMI postulated the lack of tailored project management practices for DS/AI projects as a major factor behind the high failure rate. This playbook aims to fill the gap by building a “fit for purpose” project management framework that will help organizations and project practitioners improve the outcomes of their DS/AI projects.
The playbook is a result of collaboration between PMI, a global leader in project management, and NASSCOM CoE, an eminent thought leader on DS/AI. It brings together best practices gleaned from interviews and surveys with DS/AI leaders from 25 organizations cutting across industries, geographies and types of organizations¹. The playbook offers both leaders’ perspectives of managing DS/AI projects and an appreciation of challenges and workaround solutions by practitioners on the ground, captured through case studies.
1 The interviews were further supplemented by secondary research to provide depth to the playbook
Spectrum of Leaders Interviewed
10 INDUSTRIES COVERED
ITeS, Semiconductor, CPG (Consumer Packaged Goods), Computer Hardware, Agritech, Financial Services, Chemicals, Management Consulting, Telecom and Electrical equipment
3 KINDS OF ORGANIZATIONS
GCCs (Global Capability Centers), Start-ups and Service Companies
88 percent of organizations studied reported gaps in their practices for AI projects
Approximately 21 percent of the total wastage in AI projects in 2023 can be recovered with effective project management practices
76 percent of organizations use their own customized methodologies for DS/AI projects
Three key challenges in DS/AI projects emerged:
There is limited effectiveness of traditional project management practices when applied directly to DS/AI projects
The need for experimentation is extremely high in DS/AI projects. This makes process adherence very difficult
Defining and measuring success is difficult, as setting KPIs and pegging them to a business value depend on the availability of data, model behavior and other factors
THE PLAYBOOK PRESENTS A PROJECT MANAGEMENT FRAMEWORK THAT COVERS:
- Resources for the capability-building of individuals and organizations to realize transformative project benefits, and
- A best practices-based toolkit for each stage of a DS/AI project derived from our study of leading organizations
The framework strengthens teams’ capabilities by:
•Enabling experimentation via recommended “unstructured practices”
•Supporting alignment on “what success means” by recommending workflow training, framework for success metrics and communication checklists
•Sharing best practices on managing data and model maintenance through the adoption of an organization-wide data strategy, appropriate technologies, and the reusability of models and databases
The role for thought leaders such as PMI and NASSCOM CoE in supporting organizations in their AI adoption journey cannot be overemphasized. In this playbook, we have identified potential next steps to enhance the DS/AI project management maturity of organizations.
Business Context and Value Proposition
Businesses globally are making AI a strategic priority and committing larger investments to AI projects. By 2023, global spending on AI systems will be nearly $98 billion, growing at a CAGR of 27.1 percent from 2019¹. Organizations expect these projects to bring transformative benefits, with the expected business value running to multiples of the investments being made. For instance, AI augmentation, which is human-machine collaboration aimed at enhanced cognitive performance, is projected to create $2.9 trillion in business value and 6.2 billion hours of productivity globally by 2021².
1 Worldwide spending on artificial intelligence systems will be nearly $98 billion in 2023, according to new IDC spending guide. IDC. Retrieved November 10, 2020, from https://bit.ly/35kVMvP
2 Gartner says AI augmentation will create $2.9 trillion of business value in 2021. (2019, August 5). Gartner. Retrieved November 10, 2020, from https://gtnr.it/3eMbN0J
However, as these technologies are relatively new, many businesses are struggling to find the optimal practices that will improve the success rate in DS/AI projects. As many as 88 percent of the organizations covered in this study reported gaps in their current project management practices for AI projects. The low success rates reported in these projects corroborate our findings. A survey of 300 organizations (2015) revealed that the failure rate of big data projects was a whopping 55 percent³. A more recent study, conducted by Massachusetts Institute of Technology and Boston Consulting Group in 2019, showed that the problems persist. The study reported that 7 out of 10 companies derived minimal or no impact from AI projects⁴. These statistics present a conundrum for organizations: how can they realize the transformative benefits projected from AI-augmented projects? We believe the remedy for sub-optimal outcomes lies in more efficiently and effectively managed projects.
3 Kelly, J., & Kaskade, J. (2013). CIOS & BIG DATA What Your IT Team Wants You to Know.
4 Winning with AI. (2019, October 15). MIT Sloan Management Review. Retrieved November 10, 2020, from https://bit.ly/2JPcNWA
Going by the estimate that 55 percent of DS projects either do not get completed or fall short of their objectives, organizations will collectively waste $54 billion in 2023. We estimate that at least $11 billion of this amount can be directly attributed to poor project management practices in AI projects⁵. Hence, we infer that effective project management practices can save up to approximately 21 percent of the total wastage in AI projects in 2023⁶.
5 The $11 billion wastage (11.4% of the $98 billion invested in AI in 2023) is based on PMI's Pulse of the Profession study (2020), which found that ~11.4% of every dollar is wasted due to poor project performance globally. We assume that ~11.4% remains a reasonable estimate for wastage due to poor project performance in 2023, and that it is representative of AI/DS projects.
6 Approximating ($11 Bn/$54 Bn) where $11 Bn is wastage because of poor project management practices and $54 Bn is the total wastage out of the $98 Bn investments in AI/DS projects in 2023
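The arithmetic behind these estimates can be reproduced in a few lines. The sketch below uses only the figures quoted in the text and in footnotes 5 and 6; the variable names are illustrative, and the calculation assumes, as the footnotes do, that the 55 percent failure rate and the ~11.4 percent wastage rate both apply to the full $98 billion.

```python
# Figures quoted in the text and in footnotes 5 and 6
ai_spend_2023_bn = 98.0       # projected global AI spending in 2023, $ billion
project_failure_rate = 0.55   # share of projects that fail or fall short
poor_pm_waste_rate = 0.114    # wastage due to poor project performance (PMI, 2020)

# Total wastage and the part attributed to poor project management
total_waste_bn = ai_spend_2023_bn * project_failure_rate
pm_waste_bn = ai_spend_2023_bn * poor_pm_waste_rate

# Share of total wastage that better project management could recover
recoverable_share = pm_waste_bn / total_waste_bn

print(f"Total wastage: ${total_waste_bn:.1f} bn")
print(f"Wastage from poor project management: ${pm_waste_bn:.1f} bn")
print(f"Recoverable share: {recoverable_share:.1%}")
```

Without intermediate rounding, the recoverable share works out to about 20.7 percent, which is in line with the "approximately 21 percent" quoted above.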
As our study indicates, there is a lack of “fit for purpose” project management practices for this nascent field. A majority of organizations are applying traditional software development methodologies, including agile, in their existing forms to DS/AI projects. However, these projects are fundamentally different from software development projects, thus presenting organizations with some insurmountable challenges. Our research highlights three principal challenges in current DS/AI project management practices:
FROM OUR FILES ::
The project team at a DS/AI product startup used agile for a year but reported their inability to stick to the cadence of the iteration cycles. They failed to adhere to the duration and frequency of sprints because they needed to experiment with multiple sets of data and models, and deal with the uncertainty of outcomes. The ‘undone’ sprints started demotivating the team, which led them to move to a hybrid methodology.
1. Business teams over-estimate the potential of DS/AI solutions and also their own preparedness to implement them. Current practices that have been borrowed from other fields like software development are generally too restrictive for DS/AI projects. There is an inherent need for extensive experimentation in these projects, which makes planning and preparation only partially successful in influencing the final outcome. However, what does help mitigate the challenge to some extent is better training of business stakeholders.
2. Organizations struggle with defining and measuring project success. This is partially driven by a lack of maturity in adopting the right metrics. Even when an organization adopts appropriate metrics, it often lacks the ability to apply them effectively. The situation is further compounded by the fact that DS/AI solutions require continuous model maintenance. This makes it difficult to report immediate results.
3. Data preparation typically takes the largest proportion of time in a DS/AI project. It could take up 70-80 percent of the total project time, a fact that organizations are often not prepared to accept or to support their teams through. We found that challenges include the existence of data silos, driven by poor collaboration between teams, and the unavailability of quality data. The latter could be due to the lack of a comprehensive data management strategy in the organization.
FROM OUR FILES ::
An Indian multinational conglomerate launched a DS/AI project, expecting it to generate high returns. But 3-4 months after deployment, the development team realized that the ROI expectations were unrealistic. This is because ROI was dependent on the usage rate, which was not accurately estimated in the ROI expectation calculation. Questions around end-user behavior and factors influencing usage rate were difficult to determine at the beginning of the project.
FROM OUR FILES ::
A large semiconductor company had undertaken a project to predict the failure rate of a circuit simulator application. During the data preparation phase, they realized that the team only had data regarding the successful runs, whereas data on the failure runs had been deleted to save space. The team worked around this data discrepancy by creating estimates of the failure rate, which greatly improved the accuracy of the predictions.
The lack of maturity in project management practices has led to overdependence on high performing talent. This is reminiscent of the early stages of the evolution of software development projects. Gartner's observation regarding this “special talent” is illuminating: “80 percent of AI projects are alchemy, run by wizards whose talents will not scale up in the organization.”
With increased project management maturity in DS/AI projects, the dependence on the extraordinary performance of a few “wizards” in the organization will come down. This will allow DS/AI solutions to scale up and enable bigger teams to work collaboratively toward more impactful solutions. Hence, the need of the hour is project management practices that will help better harness the potential and promise of DS/AI technologies.
Professionals who are working in this field will attest to this requirement. In a survey conducted at a major data science conference in 2018, as many as 85 percent of data scientists said they believed adopting a better process would improve their results⁷. Our research backs up this contention, with a majority of study participants expressing a strong need for practices that are tailored to DS/AI projects.
This playbook caters to both DS and AI projects because we view them as very similar and interconnected: they need similar workflows and foundational knowledge. This is also evident from our research, in which most organizations use the same methods to manage their DS and AI projects. That said, we acknowledge that they differ in complexity. For example, AI projects deal with more unstructured data in the form of images, videos, etc., as compared to DS projects.
7 Big Data Science Conference Survey Results (2018), Data Science Process Alliance
However, we find that they still need application of the same core principles for project management. So, from a project management perspective, we largely recommend the same practices for both types of projects.
“The level of ambiguity in analytics or DS projects is way higher than what you see in a software project. I have been talking about the need to change the way you look at project management in these projects. I am so glad that someone is addressing this issue.”
DS Leader at a top global management consulting company
Value Proposition of This Study:
Our central hypothesis is that DS/AI projects can realize their potential business value if they utilize a “fit for purpose” project management framework. Given the huge potential of DS/AI solutions, and the significant opportunity to unlock that potential by the application of tailored project management practices, this playbook offers a best practices-based framework to deliver better results. Through our study, we have developed a practitioner-centric framework that utilizes the experience of DS/AI leaders in eminent organizations globally. We believe organizations can derive transformative and accelerated outcomes from their DS/AI projects with the application of these practices.
Key Challenges in DS/AI Projects
Teams managing DS/AI projects grapple with challenges that are unique to these projects. As summarized in Figure 2.1, our research has uncovered characteristics that distinguish these projects from others, including software development projects, and that give rise to distinct challenges. Hence, traditional waterfall and agile practices that have worked for other projects typically fail to produce a satisfactory outcome in DS/AI projects. The key challenges from Figure 2.1 are presented below in greater detail, as gleaned from our interactions with DS/AI leaders from 25 organizations and from additional research.
As a science, DS/AI projects are akin to R&D projects that require extensive experimentation, thereby making process adherence difficult:
•As many as 85 percent of organizations we spoke with have reported that their DS/AI projects required extensive experimentation across most stages. This need to experiment makes these projects somewhat similar to R&D projects and puts them in a tight spot in today's business environment with resource constraints. Unlike in R&D projects, teams in these projects are expected to manage stakeholders and provide them with regular updates, while managing tasks that have relatively unknown scope and time requirements. This attribute is most visible in the initial phases of business understanding to modelling.
•A similar view is echoed in the methods used for DS/AI projects at organizations like Google Brain and Demand Jump. These organizations advocate an unstructured method in the research phase and a more structured method in the deployment phase.
•Practitioners find it difficult to adhere to established processes, including agile methods, in these projects. Established project practices like agile require some level of estimation of time and scope, which may not be easy to undertake, especially in the early stages of a DS/AI project.
•We found that poor understanding of these unique challenges also gives rise to communication issues with business teams, and leads to unanticipated extension of the phases from discovery to modelling.
“Because there is so much experimentation, there is a possibility that the problem statement gets invalidated based on the outcomes derived.”
AI leader from the semiconductor industry
Difficulty in defining and measuring success:
•Nearly half of the organizations in our study have observed that though broad success criteria or key result areas are generally established in the business understanding phase of a project, there are issues in determining the underlying key performance indicators (KPIs). Setting KPIs and pegging them to a business value in order to define success becomes difficult at the outset of a project. Factors such as the availability and quality of data, and the model's training and behavior, influence the value of the success metrics.
•Project success often cannot be established immediately after the completion of a project. The models require maintenance and refinement after deployment to improve the accuracy of the outcomes. Moreover, in such projects it is difficult to determine when the team has achieved a “good enough” solution, and whether or not further effort would lead to valuable improvements.
•Some organizations have also mentioned that it is difficult to attribute business success to an AI solution as it may be a sub-set of a larger product or solution which yields business benefits.
“[It is] difficult to define success metrics of a DS/AI project. A model being successful in terms of accuracy may not mean that there will be business benefits from the outputs.”
Forbes technology council member and a leading LinkedIn influencer
Heavy dependence on the availability and quality of data, and on the behavior of the data model, increases uncertainty of outcomes:
•Data science projects require data cleaning and exploration to uncover insights from data. This is unique as compared to software projects, even though both kinds of projects need extensive programming. This insight from Domino Data Lab, a company that develops DS/AI solutions, lays out the problem well: “Unlike software, which implements a specification, models prescribe action based on a probabilistic assessment.”¹
•An IDC-Alteryx data preparation survey in 2019 revealed that 54 million data workers worldwide spend 44 percent of their workday on unsuccessful data activities². Lack of collaboration, clarity in requirements and knowledge of data sources are some challenges they face.
•In software projects, uncertainty is largely due to changing customer requirements. There is little unpredictability regarding project feasibility. However, as several studies have pointed out, DS/AI projects are marked by uncertainty. “In data science, if a customer has a wish, even an experienced data scientist may not know whether it's possible,” says Brian Godsey, data science thought leader and principal data scientist at probabilisti.co³. Any estimate carries a large element of uncertainty, as highlighted by Russell Jurney, principal consultant at boutique DS consulting firm, Data Syndrome. “Even relatively new fields such as software engineering, where estimates are often off by 100 percent or more, are more certain than the scientific (data science) process,” says Jurney⁴.
“Unlike software, which implements a specification, [DS/AI] models prescribe action based on a probabilistic assessment.”
Analytics leader at a major MNC in the IT industry
1 A data scientist's relationship with building predictive models. (2019, October 16). Dataconomy. Retrieved November 10, 2020, from https://bit.ly/38wyIfB
2 State of Data Science and Analytics report by IDC (April 2019)
3 Godsey, B. (2017, January 25). How to talk to customers to get better results. TowardsDataScience. Retrieved November 10, 2020, from https://bit.ly/2UeGm67
4 Managing data science as software engineering. Data Science Project Management. Retrieved November 11, 2020, from https://bit.ly/2GSxrnK
Most of the common project management practices have limited effectiveness when applied to DS/AI projects directly:
•Data science projects may require some horizontal slicing of work, for instance, focusing on data cleaning before modelling, which is more akin to the phases in waterfall methods. This approach is completely different from agile, which encourages frequent shipment of end-to-end vertical pieces of work.
•However, the delivery of each of the phases requires high iteration and continuous stakeholder management.
•Seventy-six percent of the organizations in our study have mentioned that they use a customized methodology for managing AI/DS projects, which combines the CRISP-DM lifecycle with the stage-gate approach of waterfall, as well as the iterative nature of agile techniques.
“There is a lack of a tailored framework for DS/AI projects. Most of these frameworks are related to software development.”
Retail analytics head at one of India's largest conglomerates
Post-deployment planning for model maintenance is an important element of tackling unexpected model performance, unlike software applications that tend to behave more predictably:
•DS/AI models need to be retrained to maintain and improve the accuracy of outcomes. In other words, data science models “change as the world changes around them.” (Domino Data Lab, 2018)
•This increases the importance of model maintenance and governance in post-implementation phases, as models can generate self-reinforcing feedback loops or become self-canceling, if left unchecked. Consistent review, monitoring and quality control are necessary even after a project appears “complete.”
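As an illustration of what post-deployment monitoring could look like in its simplest form, the sketch below flags a model for retraining when its recent accuracy falls noticeably below the accuracy measured at deployment. The function name, the 0.05 tolerance and the sample accuracy values are hypothetical and not drawn from the study; real monitoring would typically track several metrics as well as the input data itself.

```python
def needs_retraining(baseline_accuracy: float,
                     recent_accuracies: list[float],
                     tolerance: float = 0.05) -> bool:
    """Return True when mean recent accuracy is below baseline minus tolerance."""
    if not recent_accuracies:
        # No post-deployment measurements yet, so nothing to act on
        return False
    recent_mean = sum(recent_accuracies) / len(recent_accuracies)
    return recent_mean < baseline_accuracy - tolerance


# Hypothetical weekly accuracy readings after deployment
weekly_accuracy = [0.90, 0.87, 0.84, 0.81]
print(needs_retraining(baseline_accuracy=0.91, recent_accuracies=weekly_accuracy))
```

A check of this kind is only a trigger for review; deciding whether to retrain, recalibrate or retire a model remains a governance decision.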
“Even with a standard framework, the nature of the space and experimentation mean that people will always approach the project differently.”
Data science leader in the computer hardware industry
Framework Based on DS/AI Project Best Practices
Our study validates our initial premise of the need for a project management framework that fits the unique requirements of DS/AI projects. We have selected best practices after examining the evidence presented to us by practitioners, as well as our research on the experience of companies in this area. The framework that we are presenting in this chapter consists of two key parts:
1. Resources for individuals and organizations to build the capabilities needed to drive DS/AI projects for transformative business outcomes, and
2. A best-practices-based toolkit for each stage of a DS/AI project. Depending upon the context, the practitioner may consider practices to apply from this toolkit.
The resources need to be developed or acquired before a project begins, while most practices in the toolkit can be used during a project.
The framework strengthens a team's ability to meet the unique needs of DS/AI projects in the following ways:
•Enables experimentation: The framework follows “unstructured practices” for the stages from business understanding to modelling. This includes multiple non-uniform iteration cycles (a flexible schedule as per scope of work, but committed prior to starting), that take place within a fixed, overarching timebox for the entire stage. It enables practitioners to fulfill their need for experimentation. As the models get closer to implementation, we recommend structured practices like agile with regular sprints or product increments.
•Supports alignment on “what success means”: It encourages project workflow training for business teams, use of a framework for success metrics, and the development of high-level communication checklists. These facilitate the alignment of stakeholders to expectations and broad success metrics, which can be defined in the beginning and refined over time.
•Shares best practices on managing data and model maintenance: As data preparation and model behavior greatly determine the success of an AI project, our framework lays the foundation for the adoption of an organization-wide data strategy, technologies like data lakes and AutoML, and the reusability of models and databases. These best practices may directly improve turnaround time or save DS practitioners from repetitive activities like data preparation. However, the extent of adoption depends on an organization's appetite for technology investments, and costs for training and hiring staff.
•Combines aspects of agile, waterfall and Machine Learning Operations (MLOps): The framework breaks down the DS/AI project lifecycle into five broad stages similar to those in CRISP-DM and waterfall. The later stages of implementation and closing are relatively more structured and are hence suitable for iterative practices like in agile. Collaboration practices like MLOps are fit for the implementation phase. In the earlier stages that need more flexibility, the framework proposes a relatively more unstructured practice of multiple, non-uniform iteration cycles within an overarching timebox.
•Acknowledges the impact of the framework on team members, not just team leaders; and in parallel, recognizes the need for dynamic organizational knowledge and capabilities, through experimentation and failure, to be built around this framework.
THREE CRITICAL GAPS ADDRESSED BY OUR BEST-PRACTICES-BASED TOOLKIT:
The table to the right highlights the top three gaps in the lifecycle of a DS/AI project and the proposed toolkit to address those gaps (shown as per their sequence in the project lifecycle).
BEST-PRACTICES-BASED TOOLKIT TO ADDRESS THE IDENTIFIED GAPS:
In this section, we recommend best practices to follow for different project stages.
A) “UNSTRUCTURED” BEST PRACTICES FOR BUSINESS UNDERSTANDING, DATA PREPARATION AND MODELLING:
I. Iterations with suitable timebox and scope defined upfront as per requirement:
These provide freedom for experimentation without prescribing a uniform cadence, but within a timebox as per the estimated scope of a task. While there would be an overarching objective, there is no need to create a detailed backlog for each individual iteration cycle. We do not recommend regularly timed agile sprints in these stages as they are difficult to follow. However, practitioners must be aware that giving up these practices may break the routine and commitment of teams who have been following agile sprints. With non-uniform iterations, teams can avoid burnout caused by the need to balance routine practices with extensive experimentation, something that we observed during our research. Communication with business stakeholders could happen continuously across these iterations, and their duration could be dependent on the complexity of tasks and resources at hand.
II. Timeboxing the overall stage and limiting the scope definition to avoid unexpected extension of the stage:
•A stage may have any number of iteration cycles, which would have timeboxing appropriate to the requirement. But an overarching, realistic timebox and ‘scope-box’ of the entire stage can prevent unexpected extensions. Limiting the scope definition would be important to avoid unplanned extensions in the schedule.
•A limitation of this practice is that the estimation of the timebox will be dependent on the maturity of the team and the complexity of the problem. Teams may not be able to come up with an optimal scope in the time period allotted.
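The relationship between non-uniform iterations and the overarching stage timebox can be sketched as a simple check. The class, the function and the durations below are hypothetical, chosen only to illustrate the idea: each iteration has its own committed duration, and the total is compared against the stage-level limit.

```python
from dataclasses import dataclass


@dataclass
class Iteration:
    name: str
    days: int  # duration committed before the iteration starts


def remaining_days(stage_timebox_days: int, iterations: list[Iteration]) -> int:
    """Days left in the stage timebox after all committed iterations."""
    return stage_timebox_days - sum(it.days for it in iterations)


# Hypothetical modelling stage: iterations of unequal length, 30-day stage timebox
modelling_iterations = [
    Iteration("baseline model", 5),
    Iteration("feature experiments", 12),
    Iteration("model comparison", 8),
]

slack = remaining_days(30, modelling_iterations)
if slack < 0:
    print(f"Stage timebox exceeded by {-slack} days; trim scope or iterations")
else:
    print(f"{slack} days of the stage timebox remain")
```

Running the check before each new iteration is committed gives an early signal that the stage is heading for an extension, which is the situation the overarching timebox is meant to prevent.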
III. The ‘spike story’ technique in agile may be used at this stage as a workaround to adapt agile practices¹ to fit the need for research in DS/AI projects.
•What is a spike?
A spike is an experiment to which a certain portion of the team's capacity is dedicated ahead of the delivery of a ‘user story.’ With this, developers can gain enough information about the unknown elements of the story and estimate it effectively. Spikes are timeboxed and generally do not produce any deliverable code. Teams can refer to PMI's Disciplined Agile toolkit for details of the technique.
1 What's a spike, who should enter it, and how to word it? (2016, September 13). LeadingAgile. Retrieved November 10, 2020, from https://bit.ly/2It5W4K
•When is a Spike used?
•Spikes can be used in various situations:
-A story is too large or overly complex
-The implementation or a third-party tool or library is poorly understood by the team
-The team is unable to estimate the story
-Uncertainty about completion of a story due to potential restrictions
•There are broadly two types of spikes:
•Work breakdown for complex stories
•Identifying and mitigating risks and complexities
•Making build-vs-buy decisions
•Understanding the impact of a new user story
•Evaluating solution approaches
•Can it be used for DS/AI projects?
To allow for experimentation, teams can create spikes and add them in the backlog along with other product increment ideas. Unlike a usual story, there is no expectation of progress on the deliverable itself, but rather an expectation of progress in the information gained from research to reduce uncertainty and improve planning and decision-making. For example, a use-case can be a spike story to conduct interviews with stakeholders to identify key business-based dependencies and drivers of the DS/AI project for modelling purposes.
But it may still not offer enough freedom for experimentation. This is because spike stories are considered ‘slippages’ that still need to fit into uniformly timed sprints. Achieving the expected research outcome in these uniformly timed sprints may be a challenge for DS/AI projects. So, we see that using spike stories may largely be useful only where an organization's leadership has mandated the application of agile practices.
B) BEST PRACTICES FOR THE BUSINESS UNDERSTANDING STAGE:
I. The use of checklists and SOPs: SOPs enable better process adherence and efficiency. However, organizations have so far shied away from using them extensively for DS/AI projects, as these are seen as straitjacketed practices that might hinder experimentation. In the past, SOPs were used primarily in the deployment stage where the practices are largely structured. However, almost 45 percent of the organizations in our research have reported using SOPs for their business understanding phase. These SOPs could provide guidelines on information needed at this stage to allow better expectation setting and chalk out realistic success metrics.
II. Competency mapping for understanding training and resource requirement: Sixty-three percent of executives cite the lack of skills as a prime barrier to adopting AI technology2. This could be due to the fact that the growth of AI in the past few years has outpaced the development of competency models for AI practitioners. A competency matrix is a structure that can help assess the required and existing skills for a project, and determine what additional resources and training are required. This mapping can help improve data science process maturity and avoid over-dependence on a few expert data practitioners. Quite a few competency frameworks and matrices for DS/AI are available, for example, IBM's Data Science Skills Competency Model3. However, very few organizations actually use them. Only one organization in our study reported conducting competency mapping. Refer to the competency map on the next page4. It highlights the major skills required across the stages of a DS/AI project.
2 IBM Institute for Business Value survey on AI/cognitive computing in collaboration with Oxford Economics, 2018
3 The Data Science Skills Competency Model, IBM Analytics, 2020
4 Data Science Competency Framework, Data to Decisions CRC, 2017
Role-based Competency Mapping at an AI Service Startup
An AI service startup that builds image recognition solutions has a team of over 15 data scientists and over 50 employees in other roles. The organization is structured into development teams that build solutions, training teams for productization of the solution, verification teams for testing and project management teams for interfacing with clients.
Though the organization started with minimal project management practices and skill maps, formal processes came into place as the organization began to grow. In order to streamline its growing team, it adopted a competency matrix to identify skills required for DS/AI roles and other project management processes. For example, the skills required for a DS/AI project manager include process knowledge (organizational on-boarding process, system architecture, infrastructure requirements) and hands-on management skills (MS Project, project planning and scheduling, costing, risk management, service agreement and statement of work formulation), among others. A role-map for various project types has also been created to map the deliverables with various roles, along with the skills required.
The competency map has enabled the organization to identify the additional resources needed for various projects, thus making the budgeting process more efficient. Employees are rated on their level of available skills as ‘experienced and can train others’, ‘experienced’, ‘on-boarded’ and ‘trainee.’ It has helped identify skill gaps and training requirements for goal setting during employees’ performance reviews. The adoption of this practice has allowed the startup to smoothly scale up a project for a client by three times in just six months.
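A competency matrix of this kind can be represented as a simple lookup of required versus available skill levels. The sketch below is illustrative only: the four rating levels come from the case study, while the function name, the team members and the required levels are hypothetical assumptions.

```python
# Rating scale taken from the case study, ordered from lowest to highest.
LEVELS = ["trainee", "on-boarded", "experienced", "experienced and can train others"]
RANK = {level: i for i, level in enumerate(LEVELS)}


def skill_gaps(required: dict, team: dict) -> dict:
    """Return the skills for which nobody on the team meets the required level.

    required: skill -> minimum level needed for the project
    team:     member -> {skill -> current level}
    """
    gaps = {}
    for skill, needed in required.items():
        levels_held = [skills[skill] for skills in team.values() if skill in skills]
        # Highest level anyone holds for this skill, or None if nobody has it.
        best = max(levels_held, key=RANK.__getitem__, default=None)
        if best is None or RANK[best] < RANK[needed]:
            gaps[skill] = {"required": needed, "best_available": best}
    return gaps


# Hypothetical project requirements and team ratings.
required = {
    "project planning and scheduling": "experienced",
    "risk management": "experienced",
    "system architecture": "on-boarded",
}
team = {
    "PM A": {
        "project planning and scheduling": "experienced and can train others",
        "risk management": "on-boarded",
    },
    "PM B": {"risk management": "trainee"},
}

gaps = skill_gaps(required, team)
for skill, detail in gaps.items():
    print(skill, detail)
```

The resulting gaps indicate where additional hiring or training would need to be budgeted for the project.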
C) BEST PRACTICES FOR THE DATA PREPARATION STAGE
I. Organizations recognize the need for collaboration for data sourcing among business, client teams, data science and engineering teams, but not many are practicing it. A whopping 80 percent of CIOs cited collaboration challenges and data silos as reasons for failure of AI projects5. Though our research indicates that as many as 68 percent of project teams in organizations collaborate with business teams and clients to identify data sources, the degree of collaboration leaves scope for improvement. We infer from our data that either the degree of collaboration is below the desired level, or organizations are simply unaware of how much collaboration is needed. Some organizations show progress on this front. A survey of 183 data science practitioners from IBM showed that data science teams are extremely collaborative, and work with a variety of stakeholders and tools during a DS project6.
Refer to the next page for the Model (Systems View) for Interdisciplinary Team Collaboration by PMI, Massachusetts Institute of Technology (MIT) and International Council on Systems Engineering (INCOSE), which is a handy resource for organizations to enable collaboration among teams.
•Organizations need to develop capabilities related to Dimensions I through IV outlined in the model. For example, in Dimension I, organizations need to develop combined standards and practices for different teams, define roles and responsibilities, and incorporate boundary-spanning systems and tools like data management platforms to facilitate data sharing.
•The ‘effective integration’ of constituents of Dimensions I to IV will likely lead to outcomes like ‘effective information sharing’ and ‘effective collaborative work,’ as highlighted in Dimension V. Organizations can revisit and re-tune aspects of Dimensions I-IV if they do not achieve the outcomes of Dimension V.
•Achieving outcomes of Dimension V in terms of increased collaboration will eventually lead to achieving outcomes of Dimension VI (Program Performance).
5 2018 Trend Report: Enterprise AI Adoption, Databricks, 2018
6 “How do Data Science Workers Collaborate? Roles, Workflows, and Tools”, Zhang, Muller, Wang, April 2020
II. Communicating shifts in timelines and planned outcomes to business teams. This depends on the availability of the business team, the ability of the data science team to handle changes in real-time and the importance of the shift with respect to the overall objective of the project. This can be done in two ways, as given below, but a large number of organizations (44 percent) in our research mentioned that they address service change requests almost immediately. Another 20 percent indicated that they understand the request but defer it to subsequent iterations.
•Option 1: Apprise business teams of impacts regularly and address their feedback immediately. But the regularity of the feedback cycle may be constrained by the availability of the business teams. Also, it may not always be possible for the data team to implement the changes immediately, given their prior commitments.
•Option 2: Complete the current iteration cycle and work on ad hoc feedback later. This is a way out when business teams are unavailable for regular apprising or if the data science team so prefers. This may help maintain rhythm in the work, but important updates may need to be deferred to the next cycle of iterations.
III. As covered in the capability section of the framework, organizations could invest in data management tools like data lakes and AutoML that help increase the efficiency of data collection, preparation and modelling. However, this investment may not be justifiable for just one DS/AI project.
D) BEST PRACTICES FOR THE MODELLING STAGE
•The ‘Champion-Challenger’ way of modelling to meet timelines and scope better: This involves testing multiple ‘challenger’ models at the same time before freezing on one based on the business requirements. The models’ accuracy could vary due to their inherent nature and the data available. The trade-off, for the additional time required for developing multiple models, is meeting customer requirements more effectively.
‘Champion-Challenger’ Modelling at Multibillion Dollar Technology Services Major
A global consulting and technology services organization has around 250,000 employees, with 300 employees working in its DS/AI Center of Excellence (CoE). The CoE works on projects for clients, internal organizational process improvements and futuristic solutions.
The team primarily uses CRISP-DM interspersed with agile for these projects. A key challenge is in meeting timelines due to the experimentative nature of project phases.
They started using the ‘champion-challenger’ way of modelling wherein a ‘champion’ model is developed in tandem with multiple ‘challenger’ models once the data set is available. If the ‘champion’ model fails to deliver the required level of accuracy due to its inherent characteristics or improper ‘learning’, the team shifts its attention to the ‘challenger’ models to deliver the expected outcomes. They account for the extra resources required in this approach during the resource requirement/budgeting segment of the business understanding stage.
Following this approach has helped the company in identifying the most appropriate model that works with the available data in significantly less time. If they discover new data sources in the later stages of the project, they use the tried-and-tested ‘challenger’ models to re-train and achieve better accuracy. This has not only helped in reducing the turnaround time but also in improving the outcomes of the models.
“For AI applications where accuracy is critical, like diagnosis of medical images or credit scoring in banking, the application of the champion-challenger technique has the potential to raise accuracy from 85 percent to 95 percent.”
DS LEADER AT THE ORGANIZATION
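The fallback logic described in the case study above can be sketched as follows. This is a minimal illustration, not the organization's implementation: the candidate 'models' are toy threshold rules, and the data, names and required accuracy are hypothetical. In a real project each candidate would be a trained model evaluated on the same hold-out set.

```python
def accuracy(model, features, labels) -> float:
    """Share of hold-out examples the model classifies correctly."""
    correct = sum(1 for x, y in zip(features, labels) if model(x) == y)
    return correct / len(labels)


def select_model(candidates, champion, features, labels, required_accuracy):
    """Keep the champion if it meets the required accuracy.

    Otherwise fall back to the best challenger that does. Returns
    (selected_name_or_None, scores_by_name).
    """
    scores = {name: accuracy(m, features, labels) for name, m in candidates.items()}
    if scores[champion] >= required_accuracy:
        return champion, scores
    challengers = {n: s for n, s in scores.items() if n != champion}
    best = max(challengers, key=challengers.get, default=None)
    if best is not None and challengers[best] >= required_accuracy:
        return best, scores
    return None, scores  # no candidate is good enough: revisit data or scope


# Toy hold-out data: one numeric feature per example, binary label.
features = [[0.1], [0.4], [0.35], [0.8], [0.9], [0.6], [0.2], [0.7]]
labels = [0, 0, 0, 1, 1, 1, 0, 1]

# Hypothetical candidates, each a simple threshold rule.
candidates = {
    "champion": lambda x: int(x[0] > 0.75),
    "challenger_a": lambda x: int(x[0] > 0.5),
    "challenger_b": lambda x: int(x[0] > 0.3),
}

selected, scores = select_model(candidates, "champion", features, labels, 0.9)
print(selected, scores)
```

Because all candidates are scored on the same data at the same time, switching to a challenger does not require restarting the modelling stage, which is where the time saving reported in the case study would come from.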
E) BEST PRACTICES FOR THE IMPLEMENTATION STAGE
I. Consistent involvement of the IT team through the project: The IT team must be engaged in the project from the beginning, and not brought in abruptly at the deployment or implementation stage. This helps ensure that risks regarding scalability can be identified sooner. However, factors like time and cost associated with involving them earlier in the process need to be considered.
II. Adoption of MLOps could help increase the likelihood of operationalizing the model and give more time to data scientists for the development of new models:
a. There is a chasm between modelling and production stages, as up to 75 percent of machine learning projects do not go beyond the experimental (modelling) phase7. It can also be a time-consuming phase for data scientists who spend at least 25 percent of their time on this stage8. This is where MLOps is beneficial, as it helps free up the bandwidth of the data scientist to focus on building new models and reduce the time-to-market for the model, as it allows for continuous training (automation of training and retraining).
b. The ability to operationalize models in AI is recognized by Gartner (2019) as a challenge for organizations. “A litmus test of organizations’ maturity is how quickly and repeatedly they can get these AI systems into production. Our surveys are showing that organizations are not managing to do this as quickly as they had hoped. The result is an organizational schism, given the high expectations executive boards have regarding the transformative power of AI,” says Gartner9.
7 Integrating Data Science and IT Operations with MLOps Capabilities, Gigaspaces, 2020
8 State of Enterprise Machine Learning, Algorithmia, 2020
9 Predicts 2020: Artificial Intelligence — the Road to Production, Mullen, Alaybeyi, Baker, Chandrasekaran, Linden, Revang, Sicular - Gartner Information Technology Research, 2019
III. An example of an MLOps workflow is given in Figure 3.6. MLOps builds on the continuous integration/continuous delivery (CI/CD) framework, as promoted by DevOps. It expands CI/CD with model validation on the integration side and with the additional complexities of ML deployments on the delivery side, and introduces a component unique to ML (i.e., continuous training). This workflow may change with ML complexity, organization size, project type or business tasks.
IV. Pure agile practices for deployment: By this stage, the need for experimentation would have reduced. So, methodologies like agile in its conventional form can be used for deploying the solution efficiently. We saw overwhelming evidence in our study in support for pure agile practices at this stage.
Use of MLOps for Model Implementation at Multibillion Dollar Technology Services Major
A global consulting and technology services organization has around 250,000 employees, with 300 employees working in its DS/AI CoE. The CoE works on projects for clients, internal organizational process improvements and futuristic projects.
Many of these DS/AI projects are a part of larger software solutions. A key challenge is in maintaining the accuracy of the models after deployment as the models begin to ‘learn’ in different customer environments.
The development teams use MLOps to collaborate with IT for deployment, as it enables automation of the steps within the stage. It also incorporates continuous training, which is facilitated by a system that triggers alerts for the scenarios below:
•On-demand: If the model produces an accuracy less than the predefined threshold, then an on-demand maintenance gets scheduled. An automatic retraining (recalibration) is chosen if just retraining on new data will improve accuracy, without edits to the model. If accuracy still does not improve, the modelling team may intervene to manually add new variables (manual retraining) or algorithms (rebuilding). The team refers to these steps by the terms mentioned in the brackets.
•On-schedule: The retraining alert is generated after fixed intervals of time or when a significant amount of new data is added, even if the output accuracy does not fall below the set threshold.
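The two alert scenarios above can be expressed as a small decision rule. The sketch below is illustrative only: the policy fields, the threshold values and the row count used to define a "significant amount of new data" are assumptions, and the follow-up choice between recalibration, manual retraining and rebuilding is not modelled.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class RetrainPolicy:
    accuracy_threshold: float  # below this, on-demand maintenance is triggered
    max_interval: timedelta    # fixed interval for on-schedule retraining
    new_rows_trigger: int      # assumed definition of "significant" new data


def retraining_alert(policy: RetrainPolicy, accuracy: float, last_trained: datetime,
                     new_rows: int, now: datetime) -> Optional[str]:
    """Return 'on-demand', 'on-schedule' or None for the current model state."""
    if accuracy < policy.accuracy_threshold:
        return "on-demand"
    interval_elapsed = now - last_trained >= policy.max_interval
    enough_new_data = new_rows >= policy.new_rows_trigger
    if interval_elapsed or enough_new_data:
        return "on-schedule"
    return None


# Hypothetical policy and monitoring readings.
policy = RetrainPolicy(accuracy_threshold=0.85,
                       max_interval=timedelta(days=30),
                       new_rows_trigger=10_000)
now = datetime(2020, 11, 10)

print(retraining_alert(policy, 0.80, datetime(2020, 11, 1), 100, now))
print(retraining_alert(policy, 0.90, datetime(2020, 10, 1), 100, now))
print(retraining_alert(policy, 0.90, datetime(2020, 11, 1), 100, now))
```

The accuracy check is evaluated first so that a degraded model is never left waiting for the next scheduled retraining.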
MLOps has not only helped them collaborate better with the customers’ IT team, but has also improved model maintenance efficiency through the alert generation system.
“Until five years ago, a majority of our customers demanded support only to develop a standalone machine learning model. However, now 70 percent of our customers want support right from modelling through deployment, and hence, MLOps has become essential.”
DS LEADER FROM THE ORGANIZATION
F) BEST PRACTICES FOR THE CLOSING STAGE
I. The 3Es for measuring success – Efficiency, Effectiveness and Experience:
•We recommend categorizing success metrics under the categories of efficiency, effectiveness and customer experience, as per the framework in Figure 3.7. This framework represents the key focus areas in measuring outcomes for customers and emphasizes that DS teams focus on one or more as per the context (i.e., as per the business value their customer measures).
“Identification and measurement of business impact can be vastly improved in most data science teams. The 3E framework is comprehensive and will help guide toward measurement of business impact. But during application it needs to be tailored to what the customer considers important to measure.”
Data science thought leader in the financial services sector
•A few examples of metrics under each category:
Efficiency: Resource usage (human hours saved, reduction in human error costs)
Effectiveness: Accuracy of prediction and reduction of risk during decision-making
Experience: Adoption rate, customer experience in moments of truth (NPS)
II. We recommend the use of surveys or other ways of collecting feedback from the business team, clients and users (if applicable) to understand how project practices can be improved.
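The example metrics listed under the 3Es can be collected into a simple scorecard. The sketch below is a hedged illustration: the function names and all input numbers are hypothetical, and the NPS calculation uses the standard definition on a 0-10 scale (share of promoters rating 9-10 minus share of detractors rating 0-6), which the playbook itself does not spell out.

```python
def net_promoter_score(ratings) -> float:
    """NPS on a 0-10 scale: % promoters (9-10) minus % detractors (0-6)."""
    if not ratings:
        raise ValueError("at least one rating is required")
    promoters = sum(1 for r in ratings if r >= 9)
    detractors = sum(1 for r in ratings if r <= 6)
    return 100.0 * (promoters - detractors) / len(ratings)


def build_scorecard(baseline_hours, current_hours, accuracy, ratings) -> dict:
    """Group one example metric under each of the three Es."""
    return {
        "efficiency": {"human_hours_saved": baseline_hours - current_hours},
        "effectiveness": {"prediction_accuracy": accuracy},
        "experience": {"nps": net_promoter_score(ratings)},
    }


# Hypothetical inputs, for illustration only.
survey_ratings = [10, 9, 8, 7, 6, 10, 3, 9, 9, 5]
scorecard = build_scorecard(baseline_hours=120, current_hours=45,
                            accuracy=0.91, ratings=survey_ratings)
print(scorecard)
```

Which of the three categories carries the most weight would depend on what the customer considers important to measure.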
G) OVERARCHING BEST PRACTICES FOR ALL STAGES OF THE FRAMEWORK:
• Risk management framework:
• The presence of a risk management framework allows for:
• Timely communication that helps align with clients regarding developments on the time and cost risks associated with any external change requests or changes identified internally by the team
• Enhanced focus and productivity of the project team as they know that the risks and the related triggers have been identified, and a process exists to deal with risks swiftly
• By mitigating and managing risks, the framework helps reduce the failure rate of the project, thus boosting team morale
• Risk mitigation could include failure management by outlining the accountability and ownership of risks/possible outcomes through an escalation matrix that specifies who is to be notified and which steps are to be taken if a risk seems highly probable. For example, in the modelling phase, make business teams aware of the risks associated with model failure and/or inaccurate predictions, and set up additional business processes to mitigate those risks.
• Quality Assurance framework:
•Overarching Quality Assurance (QA) practices help in maintaining quality standards throughout the project lifecycle
•Automating QA in every iteration and bringing it in earlier in a lifecycle is a best practice, though the feasibility to do it in all the stages in a DS/AI lifecycle needs to be validated. Some organizations in our study undertook QA only while testing the models before implementation. We recommend considering QA at every stage.
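The escalation matrix mentioned under risk mitigation above can be sketched as a lookup from risk probability to notification and action. This is an assumption-laden illustration: the probability bands, the roles notified and the actions are hypothetical and would be defined by each organization.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Risk:
    name: str
    stage: str
    probability: float  # estimated likelihood between 0 and 1
    owner: str          # person accountable for the risk


# Hypothetical matrix: (minimum probability, additional roles notified, step to take),
# ordered from the highest band to the lowest.
ESCALATION_MATRIX = [
    (0.7, ["project sponsor", "client"], "activate mitigation plan and reset expectations"),
    (0.4, ["project manager"], "review mitigation options at the next checkpoint"),
    (0.0, [], "monitor"),
]


def escalate(risk: Risk) -> dict:
    """Return who is notified and which step is taken for the given risk."""
    if not 0.0 <= risk.probability <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    for min_probability, notify, action in ESCALATION_MATRIX:
        if risk.probability >= min_probability:
            # The risk owner is always informed, then the roles for this band.
            return {"notify": [risk.owner] + notify, "action": action}
    raise AssertionError("unreachable: the lowest band starts at 0.0")


model_failure = Risk("model failure", "modelling", 0.8, "lead data scientist")
data_delay = Risk("delayed data access", "data preparation", 0.1, "data engineer")
print(escalate(model_failure))
print(escalate(data_delay))
```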
Risk Management Practices at Australian Analytics Company
A pure-play DS/AI-focused consulting company has 800 employees, with over 500 data scientists primarily based in Australia and India, and clients in over 20 countries. The company offers premium services to clients across different sectors.
It faced the challenge of providing time-bound solutions, while exceeding client expectations on quality.
The organization uses a stage-gate approach for its projects, overlaid with an end-to-end quality assurance framework that has evolved over the past 17 years. It uses Scrum practices for daily work. The quality assurance framework helps the teams to plan for quality right at the beginning of a project. The framework consists of a checklist of items to complete at every review (e.g., data quality, data security and model accuracy). The stage gates have ensured that quality reviews take place continuously, with timely iterations being done if quality requirements are not met. The company has also constituted a Global Analytics Review Committee of internal experts that conducts regular reviews, which have enabled project teams to meet quality standards.
The organization also uses a risk management framework, and project managers are responsible for applying it continuously throughout the project in every Scrum sprint. This has enabled the organization to keep projects on track and ensure communication with clients. Key stakeholders may be involved earlier to identify downstream risks. For example, the IT team comes on board early on in a project to identify potential risks in deployment, which has helped save precious time fighting fires post deployment. These practices reinforced its overall intent to encourage the project team to identify risks upfront and get executives actively involved in mitigating risks. If a risk remains active, the client is informed in advance to set expectations and work collaboratively on mitigation or other risk response strategies.
The application of these frameworks has enabled the company to retain clients over several years and ensure compliance with the high data ethics standards of its clients.
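The checklist-driven stage gate described in this case study can be sketched in a few lines. The three checklist items are the examples named in the case study; the function name, the result structure and the pass/fail values are hypothetical.

```python
def gate_review(stage: str, checklist: dict) -> dict:
    """Evaluate a stage gate: it passes only if every checklist item passed.

    checklist: item name -> True if the quality requirement was met
    """
    failed = [item for item, passed in checklist.items() if not passed]
    return {"stage": stage, "passed": not failed, "rework": failed}


# Hypothetical review outcome for the modelling stage.
review = gate_review("modelling", {
    "data quality": True,
    "data security": True,
    "model accuracy": False,
})
print(review)
```

Items listed under `rework` would be addressed in a further iteration before the project moves through the gate.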
RESOURCES FOR PRACTITIONERS AND ORGANIZATIONS
A) CAPABILITY BUILDING FOR ORGANIZATIONS IN THE DS/AI SPACE:
I. Adopt re-usability through low-code/no-code platforms, and employ citizen data scientists and knowledge management systems
Reusability/repeatability to meet customer needs faster and more effectively is an enterprise-level capability that leading organizations in the industry are focusing on. This could necessitate a shift in how work is carried out in such projects. This is happening through:
• Use of low-code platforms and citizen data scientists:
• Citizen data scientists are individuals who work in analytics and data science, and are knowledgeable in this line of work. But they do not operate at the depth of data scientists, who are skilled at advanced data analytics and complex modelling.
• Citizen data scientists can help overcome the shortage of talent, enable speedy development of moderately complex models and facilitate communication between the core data science and business teams.
• Citizen data scientists can be empowered through low-code platforms that enable users to perform complex machine learning tasks with just a few lines of code or a drag-and-drop visual environment.
• Enabling extensive knowledge management systems and staffing teams that maintain a reusable database of algorithms and models: Most of the organizations in our study relied on Microsoft SharePoint and GitHub to manage code and databases. Many organizations have developed their custom knowledge management systems as well.
II. Support an enterprise-wide data strategy and invest in relevant tools like AutoML and data lakes
a) Establishing an organization-wide data strategy is critical to the success and efficiency of a DS/AI project. As many as 52 percent of organizations claim that they maintain high quality data to increase efficiency and cost savings10. A data management strategy, according to NASSCOM, covers11:
• How is the data collected?
• How is the data stored?
• How is the data documented?
• How is the data prepared?
• Are there sound data governance structures with formal procedures?
10 Embracing the data challenge in a digitized world, Experian 2018
11 Uncovering the True value of AI, NASSCOM, 2018
b) Depending upon their appetite for investment, organizations need to invest in data architecture and tools for unlocking higher efficiencies in their DS/AI solutions. For example, data lakes or integrated data warehouses can ensure better availability and collaboration, while AutoML tools enable automated data cleaning and preparation and modelling.
i. AutoML: In 2013, it was claimed that as much as 90 percent of the world's data was generated over the past two years12. Hence, the capability to manage such volumes of data must be robust. This is where AutoML comes in, with its ability to automate data pre-processing and transformation. AutoML acts as a productivity toolkit for practitioners, as it frees up their time for more value additive tasks.
ii. Data lakes: They have the ability to harness more data from more sources, and empower users to collaborate and analyze data in different ways, thus leading to better and faster decision-making. Faster big data analytics is reported to be one of the drivers for the adoption of data lakes. Almost 64 percent of respondents in a survey said that their data lake supports “large numbers of concurrent users,” while 56 percent of data lakes provide “consistent, fast performance for all types of queries13.”
c) Investing in data infrastructure shows an organization's AI vision and the maturity of its current data management practices. Becoming a data-driven organization is the key vision of most fast-growing companies in India14. Strong data management correlates with an organization's long-term AI strategy.
d) The model on the next page highlights the enablers, outcomes and components of an organization's data strategy15. Understanding the need for a data strategy through gap-analysis is an additional enabler of the data strategy. Strong data governance practices are also part of the outcomes of the data strategy, besides those outlined in the model.
12 Big Data, for better or worse, SINTEF / Science Daily, 2013
13 Data Lakes for Business Users, Eckerson Group – Arcadia, 2018
14 Uncovering the True value of AI, NASSCOM, 2018
15 Uncovering the True value of AI, NASSCOM, 2018
B) CAPABILITY BUILDING FOR EXISTING PRACTITIONERS IN THE DS/AI SPACE:
I. Specialized training for roles dedicated to manage DS/AI projects:
Training will not only empower individuals in the dedicated role to manage projects but will also enhance organizational capability for managing projects better.
Besides the basic workflow of DS/AI projects, these trainings include:
• Global best practices in project execution
• Developing DS/AI centric business case
• KPI setting and performance evaluation
• Scoping of DS/AI projects
II. Training on basic workflow of DS/AI projects:
This training is useful for practitioners or clients looking to become familiar with the management of DS/AI projects, such as development team members like data scientists or clients looking to deploy DS/AI solutions. Some of the topics include:
• The value proposition (including use-cases) and limitations of DS/AI for problem-solving
• Roles and responsibilities across the stages of a DS/AI project lifecycle
• Objectives and key deliverables of each stage, and the need for iteration and experimentation
• Basic understanding of timelines and resources needed for projects of different complexities
C) CAPABILITY BUILDING FOR NEW PRACTITIONERS:
Practitioners from other fields require awareness of the following areas:
• Understanding the world/landscape of DS/AI - Business applications of DS/AI projects, successes, failures and complexities by type of projects
• Understanding oneself - Common motivations to join the field and challenges faced, current knowledge level in terms of domain, programming and statistical techniques
• Equipping oneself - A clear understanding of roles, technical upskilling, exposure to projects and building experience. For project managers or other related roles, competencies highlighted in Figure 3.9 will be necessary16.
• Value delivery and getting better: Understanding the maturity of an organization to take up DS/AI projects, getting a mentor/role model and staying up-to-date with best practices in the field.
16 Project Management Competency Development Framework, PMI India, 2019
LIMITATIONS OF THE FRAMEWORK
This is a preliminary study that emphasizes the critical need for tailored project management practices to deliver more transformational outcomes for DS/AI solutions. We have not tried to provide a prescriptive solution. Instead, we have laid down a best practices-based framework that needs to be tailored as per the context of its application. The following are some of the limitations of the framework:
•We have not covered role-specific practices and role-specific capability enhancements
•The practices are based on learnings from some of the leading DS/AI organizations and supported by secondary research. They may still need to be tailored to an organizational context. We found significant variation in the maturity of practices among startups, service organizations and global capability centers. Depending on the investments an organization is prepared to make and the business constraints it faces, some of the proposed practices in the framework may be out of reach for them
•The framework mostly covers management related practices and may need to be supported by additional technical details for better usability
•The toolkit does not address the key issue of ethics and how the use of technology affects it. That could be treated in subsequent iterations of this study or in a separate study
•As an exploratory work, this playbook doesn't delve deeper into other organizational capabilities and knowledge that may be necessary to be effective at AI/DS project management
How PMI and NASSCOM CoE could support Organizations
Given that organizations are facing a similar set of challenges in their DS/AI projects, there is considerable scope to improve outcomes by learning from the experience of one another. But in a business environment where organizations want to protect their competitiveness, exchange of ideas can only take place under the aegis of professional and industry associations.
PMI, the foremost name in project management in the world, and NASSCOM CoE for Data Science and Artificial Intelligence, a champion of AI adoption, have taken the lead to support organizations in their DS/AI adoption journey.
This section highlights the possible next steps for PMI and NASSCOM CoE to proliferate the adoption of best practices for DS/AI projects. The following steps could further enhance capability building of individuals and organizations, thus increasing the effectiveness of the proposed framework.
•Explore the possibility of creating a DS/AI Project Management Book of Knowledge, a repository of best practices, benchmarks, methods and tools that will be updated on a periodic basis
•Look at developing content/courseware and certifications on mission critical skills required for effective project management
•PMI and NASSCOM CoE could engage with organizations to create customized reskilling programs for their existing project managers to transition to a DS/AI project environment. There is an opportunity for the two premier organizations to co-develop upskilling courses to better equip practitioners with project management skills needed for roles such as data engineers and business analysts to help them collaborate better on projects and grow their careers.
•Based on the DS/AI Project Management Book of Knowledge, PMI could consider developing an Organizational Maturity Model (OMM) similar to the Capability Maturity Model (CMM) to assess organizations and/or teams on their current state of readiness to support DS/AI projects, while providing benchmarks against best practices. The OMM may be utilized to develop a toolkit to prepare a gap analysis and recommendations for closing the gaps
•Understand the possibility of preparing an implementation toolkit to help organizations derive tangible value
•Evaluate developing a practical guide for setting up transformation offices for DS/AI, including governance models and methodologies to bring about strategic alignment between business objectives and DS/AI project execution
•The guidelines could help organizations and consultants establish centralized functions like transformation offices or enterprise project management offices, or upgrade their existing ones.
Organization Background and Existing Practices
How is your organization structured for work on projects?
Roles present in your DS/AI project team
What kind of projects has your DS team carried out in the last 12-24 months?
What stages distinguish a DS project from an AI project in terms of types of activities being carried out?
Does the organization/team follow any standard methodology for DS/AI projects?
Under which of the following metrics are there gap(s) in the current project management framework?
Functioning of Current Methodologies
Business understanding and data understanding phase
Which of the following factors are considered to decide the resource requirement for the project?
How are data sources identified?
Who are the primary entities involved in the data check process?
Team members who are not part of data or business team who are consulted during the data check process
Which stage under data quality check is the most iterative?
Are tollgate reviews with stakeholders scheduled in advance at the planning stage?
What are the common types of errors/aberrations in the data, that are corrected to ensure data quality?
How are service change requests taken care of while building/testing the model?
Do you have a model to engage with IT during the implementation phase?
What aspects are considered while creating a deployment plan?
Does your team have a way to measure project benefits?
Do you roll out a survey to capture business feedback on the project?
Do you have a knowledge management system that captures project details for reproducibility?
Role of Project Management
How are the functions of a “project manager” distributed in a project?
Which of the following stages of the AI/DS project lifecycle are iterative?
Which of the following stages of the DS/AI projects have Standard Operating Procedures (SOPs) charted out?
Which teams are expected to follow Standard Operating Procedures (SOPs) strictly?
Sanjeev Kumar Singh
Dr Kiran Marri
Nitika Singh Gaba
Project Management Institute
©2020 Project Management Institute. All rights reserved. PMI, the PMI logo and PMP are marks of Project Management Institute, Inc. (9/20)