Big data

new tools for mitigating project complexity

Marco Guerini, Trento RISE, Researcher


If we analyse the Standish Group statistics of the last decade (2000-2009) we are faced with a contradiction: a significant downward trend of failed projects (from 23% in 2000 to 15% in 2004), followed by a rise, to a level worse than the starting one (24% in 2009).

We believe that the reason for this trend reversal lies in the increasing uncertainty and complexity of the operational contexts that project, program, and portfolio managers must deal with.

While it is true that factors like uncertainty and variability of the project context cannot be easily managed, it is also true that ignorance, i.e., inadequate knowledge of the available information and of the relationships among the elements involved, is a key complexity amplifier.

We believe that the digital material of a project, which is generally considered irrelevant or useless, can provide valuable information for exploring the stakeholders’ world and the strengths/weaknesses of the communication network of a project.

Project as a Complex Adaptive System (CAS)

During the last decade many studies (e.g., Curlee & Gordon, 2011) have shown that projects have many characteristics in common with living systems and, in wider terms, with the so-called Complex Adaptive Systems (CAS).

Some of the typical characteristics that we can find within both are:

  • Nonlinear relationships: The presence of many autonomous elements interacting with each other in various ways that cannot be reduced to simple cause-effect relationships.
  • Systemic view: The difficulty of comprehending and representing their global structure.
  • Self-organization: The capability to autonomously generate collective behaviours as the result of reciprocal interactions, constant feedback, and the application of simple rules of behaviour.
  • Edge of chaos: The capability to modify their internal structure and co-evolve with the external context thanks to a dynamic balance between order and disorder.
  • Deduction versus abduction: The difficulty of forecasting the future state and, as a consequence, the limits of approaches based upon historical data.

Being aware that a project follows the same dynamics and behaviours as a CAS, the PMI Northern Italy Chapter has been conducting a multi-year research project (Varanini & Ginevri, 2012) aimed at answering the following questions:

  • What are the main elements and factors necessary for understanding and handling project complexity? In other words, following the metaphor used by Michael Cavanagh (2011) in his presentation during a recent PMI Congress, what are the typical “complexity amplifiers” in a project?
  • What are the approaches and tools that can enhance the comprehension of these factors? In other words, can we identify some “complexity absorbers” that can mitigate the effects of the above-mentioned “complexity amplifiers”?

With regard to the first question, the list can be very long depending on the project type. Typical “complexity amplifiers” are requirements stability, the contractual agreement, the impacts of new technologies, the severity of time/cost/quality constraints and many others. For this reason, we decided to concentrate our attention on two interrelated elements that are critical for the success of a project: the network of the involved stakeholders and the characteristics of the communication system.

As far as the second question is concerned, we concentrated on the so-called “big data,” that is, the huge amount of digital information where most of the “tacit knowledge” of a project is hidden (and therefore not directly available for decision making).

How to Explore the Stakeholders Networks

Previous considerations have shown that, in order to understand the characteristics of a complex project, we need tools to explore the so-called “stakeholders’ world.” This is a system of relationships that connects the nodes representing the stakeholders themselves. Within this context, the network structure is certainly the most intuitive and powerful form of representation. First, it allows us to model different types of ties, such as family, collaboration, trust, share ownership, etc., without necessarily knowing their behaviour. Second, on the basis of the theory of social networks, we can extract a set of measures to characterize the network itself, estimate its evolution over time, and compare the expected behaviour of different networks.

Let's imagine representing the network of the stakeholders in a project using a graph made of nodes (indicating the stakeholders) and ties connecting them. We can limit our analysis to the exchange of information through e-mails, and we can adopt either a “directed” or an “undirected” graph. The first uses one-directional links; the second uses simple links without specifying whether they are reciprocal or not.
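As a minimal sketch of this representation, the fragment below builds a one-directional link list from e-mail records and then merges it into its direction-free counterpart. The record layout `(sender, [receivers])`, the stakeholder names, and the function names are assumptions made for illustration only; the count stored on each link is the number of messages exchanged.

```python
from collections import Counter

# Hypothetical e-mail records: (sender, [receivers]). All names are invented.
emails = [
    ("anna", ["bruno", "carla"]),
    ("bruno", ["anna"]),
    ("anna", ["bruno"]),
    ("carla", ["dario"]),
]

def directed_edges(records):
    """Count messages for each ordered (sender, receiver) pair."""
    weights = Counter()
    for sender, receivers in records:
        for receiver in receivers:
            if receiver != sender:  # ignore self-addressed mail
                weights[(sender, receiver)] += 1
    return weights

def undirected_edges(directed):
    """Merge both directions of each pair into one unordered link."""
    weights = Counter()
    for (a, b), count in directed.items():
        weights[frozenset((a, b))] += count
    return weights

directed = directed_edges(emails)
undirected = undirected_edges(directed)
```

In the one-directional form, `directed[("anna", "bruno")]` and `directed[("bruno", "anna")]` are kept apart, which is what later allows incoming and outgoing traffic to be distinguished; the merged form only records that the two stakeholders are in contact.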

Regardless of the number of entities or the type of graph used, social network analysis gives us the following useful measures for the governance of the project:

  1. Node's degree: This is the number of links that a node has with others and quantifies its importance in the exchange of information. If the directed graph is adopted, this measure is expressed through in-degree and out-degree. This is a very useful distinction because we can easily identify not only the so-called “hubs” but also the so-called “bottlenecks” that share only a little of the information they have.
  2. Minimal path: This is related to a pair of nodes and corresponds to the shortest path that links them. Generally, this measure is expressed as an average, by calculating the minimal path between every pair of nodes and then averaging over the whole graph.
  3. Clustering coefficient: This is related to a pair of nodes A and B, both connected to a third node C, and expresses the probability that A and B are connected to each other. This measure is particularly useful in a project context too. In fact, for a node with a very low coefficient, it is probable that the network structure will not change if it disappears. On the contrary, for a node with a very high coefficient, it is probable that the network will change significantly, even splitting into smaller networks, if the node disappears. If this measure is applied to the whole network, a high coefficient can show the presence of cohesive teams within the network.
  4. Number of components: Considering that a component corresponds to a group of interconnected nodes, a network is usually characterized by two types of elements: a major component, containing 90% of the network, and one or more minor components, which can be absorbed by the major one when they are connected to it.
  5. Betweenness centrality: This is related to a node or a link and counts the shortest paths between other pairs of nodes that pass through it. This information is useful for a project as well, because links with high betweenness correspond to the main information channels in the stakeholders’ network.
  6. Weight: This corresponds to the number of times that a given relationship appears in the input information, such as the e-mails exchanged in a project.
  7. Diameter: This is the longest of the shortest paths connecting any two nodes. This measure became famous thanks to the studies of Stanley Milgram, who showed that six links are enough to connect any two individuals in the world (Wikipedia, 2008). If we consider the stakeholders’ network, this measure can help us to discover other important actors, such as “peripheral people” and the so-called “boundary spanners” who connect the centre of the network with the external context.
  8. Average distance: This measure is strongly related to the previous one and corresponds to the average of the distances between all pairs of nodes.
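Several of the measures listed above can be computed with a plain breadth-first search. The following is only an illustrative sketch on a small, invented undirected network stored as an adjacency map; the node names and function names are assumptions, and a dedicated tool would normally be used on real project data.

```python
from collections import deque
from itertools import combinations

# Hypothetical undirected stakeholder network (adjacency map, invented names).
graph = {
    "anna": {"bruno", "carla"},
    "bruno": {"anna", "carla"},
    "carla": {"anna", "bruno", "dario"},
    "dario": {"carla"},
    "elena": {"fabio"},
    "fabio": {"elena"},
}

def bfs_distances(graph, source):
    """Shortest-path length (in links) from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbour in graph[node]:
            if neighbour not in dist:
                dist[neighbour] = dist[node] + 1
                queue.append(neighbour)
    return dist

def components(graph):
    """Groups of mutually reachable nodes, largest first."""
    seen, groups = set(), []
    for node in graph:
        if node not in seen:
            group = set(bfs_distances(graph, node))
            seen |= group
            groups.append(group)
    return sorted(groups, key=len, reverse=True)

def clustering(graph, node):
    """Fraction of pairs of neighbours of node that are linked to each other."""
    neighbours = graph[node]
    if len(neighbours) < 2:
        return 0.0
    pairs = list(combinations(neighbours, 2))
    linked = sum(1 for a, b in pairs if b in graph[a])
    return linked / len(pairs)

def diameter_and_average_distance(graph, nodes):
    """Longest and mean shortest path over all pairs inside one component."""
    lengths = [
        d
        for source in nodes
        for target, d in bfs_distances(graph, source).items()
        if source < target  # count each unordered pair once
    ]
    return max(lengths), sum(lengths) / len(lengths)

degree = {node: len(neighbours) for node, neighbours in graph.items()}
major = components(graph)[0]
diameter, avg_distance = diameter_and_average_distance(graph, major)
```

Diameter and average distance are computed on the major component only, since no path exists between nodes that belong to different components.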

Having highlighted the wide set of possibilities for analysing the stakeholders’ network of a project and identifying its strengths/weaknesses, the questions are: What is the role of big data, and how can we facilitate the exploration?

We know that there is a wide range of tools for analysing big data in order to understand market demand and its trends. However, our objective was simply to adopt an open source tool to import the basic information of the network, analyse it, and generate a set of graphical and statistical reports.

We then decided to exploit the functionalities of Cytoscape, one of the numerous free tools available on the Internet, to analyse some projects developed in recent years. For each project, we used the database containing all the e-mails exchanged and extracted the names of senders and receivers. Even if the size of our experiment is not large enough to provide statistical evidence, we can say that the results obtained have been very encouraging for at least two reasons:

  • First, much of the information about the size and weight of relationships between stakeholders did not match at all the information contained in the official documentation (plans, meeting minutes, stakeholder register, etc.).
  • Second, we found that the lowest performing projects in terms of correspondence between objectives/results and customer/team satisfaction were those in which the measures extracted from the network analysis showed some critical issues: the presence of many bottlenecks, the excessive length of the path between nodes, the lack of uniformity in the distribution of the weights associated with each node, and a very low clustering coefficient compared to those of other projects.
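Two of these warning signs, bottlenecks and uneven weights, can be screened for directly on the weighted e-mail links. The sketch below is not the procedure used in the experiment: the link data are invented, the 0.25 cut-off for calling a node a bottleneck is an arbitrary assumption, and the coefficient of variation is just one possible way to express lack of uniformity.

```python
from collections import Counter
from statistics import mean, pstdev

# Hypothetical weighted links: (sender, receiver) -> number of e-mails.
links = {
    ("anna", "bruno"): 12,
    ("carla", "bruno"): 9,
    ("dario", "bruno"): 7,
    ("bruno", "anna"): 2,
    ("anna", "carla"): 5,
    ("carla", "anna"): 4,
    ("carla", "dario"): 3,
    ("dario", "carla"): 3,
}

def traffic(links):
    """Total messages received and sent by each stakeholder."""
    received, sent = Counter(), Counter()
    for (sender, receiver), weight in links.items():
        sent[sender] += weight
        received[receiver] += weight
    return received, sent

def bottlenecks(links, max_ratio=0.25):
    """Nodes that send at most max_ratio of what they receive (assumed cut-off)."""
    received, sent = traffic(links)
    return sorted(
        node for node in received
        if sent[node] <= max_ratio * received[node]
    )

def weight_dispersion(links):
    """Coefficient of variation of per-node traffic; 0 means perfectly uniform."""
    received, sent = traffic(links)
    totals = [received[n] + sent[n] for n in set(received) | set(sent)]
    return pstdev(totals) / mean(totals)

flagged = bottlenecks(links)
dispersion = weight_dispersion(links)
```

Any threshold of this kind would need to be calibrated against projects whose outcome is already known before it could support a governance decision.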

In any case, we intend to continue our analysis in order to strengthen our hypotheses and provide the right keywords to interpret the stakeholders’ network and its metrics.

How to Explore the Digital Communication

Information is vital: it creates emotions, moves ideas, and brings people to act; this holds for project environments as well. By analysing social and linguistic dynamics, we want to understand how to build consensus and promote the spreading of information within a given context. Usually the big picture includes the analysis of who delivers the content, what the content is about, how the content is presented, when it was delivered, and where it was delivered. In this paper, by means of a series of exemplary scenarios, we will focus on some of the characteristics that linguistic communication needs in order to be effective. In particular, we will rely on linguistic analysis, using approaches from the field of Natural Language Processing (NLP), for the automatic recognition of the persuasive impact of communication. The focus will be on the analysis of big corpora specifically developed for the task. Such approaches, which come from political communication and social network diffusion, can be successfully imported into project management scenarios, as the previous sections suggested.

Corpora and Persuasion Indicators

A corpus is a digital collection of texts from a specific author, on a given topic, or of a given type. For our purpose, linguistic data should ideally be augmented with annotations of various audience reactions and with metadata.

  • To analyze consensus we will focus on political speeches tagged with audience reactions.
  • To analyze spreading we will focus on posts from social networks annotated with the number of “I like” clicks, the number of comments, etc.

Political Speeches. A corpus of 2,700 political speech transcriptions annotated with audience reactions has been developed (Guerini, Strapparava, & Stock, 2008), relying on the hypothesis that these tags, such as APPLAUSE, are indicators of hot-spots where persuasion attempts succeeded or, at least, a persuasive attempt was recognized by the audience; on this point see Bull and Noordhuizen (2000) on mistimed applause in political speeches. We can then perform specific analyses, and extractions, of the persuasive linguistic material that caused the audience reaction. Given that the corpus is composed of transcriptions of speeches mostly given at public mass gatherings, the audience is in general favorable to the speakers and the context is one of support. Therefore the audience, so to speak, resonates with a fragment of speech that is meant to be persuasive and is mostly concerned with a concept or a conceptual framework the audience is already persuaded of. To be successful, the speaker's expression that immediately leads to the audience reaction must have been coherently composed. As for audience reactions, we identify three main groups of tags:

  • Positive-Focus: This group indicates a persuasive attempt that sets a positive focus in the audience. Tags considered: {APPLAUSE}, {STANDING-OVATION}, {CHEERING}, etc.
  • Negative-Focus: This group indicates a persuasive attempt that sets a negative focus in the audience. Note that the negative focus is set towards the object of the speech and not on the speaker herself (e.g., “Do we want more taxes?”). Tags considered: {BOOING}, {AUDIENCE} No! {/AUDIENCE}.
  • Ironical: This group indicates the use of ironical devices in persuasion. Tags considered: {LAUGHTER}.
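To make the three tag groups concrete, the sketch below counts them in a transcript and normalises the counts by the number of spoken words. The sample transcript is invented (only the “Do we want more taxes?” line echoes the example above), the tag sets cover only the tags named in the list, and the "reactions per 1,000 words" normalisation is an assumption rather than the definition used for the corpus.

```python
import re

# Tag groups named in the text; the source list ends with "etc.", so they are partial.
GROUPS = {
    "positive_focus": {"APPLAUSE", "STANDING-OVATION", "CHEERING"},
    "negative_focus": {"BOOING"},
    "ironical": {"LAUGHTER"},
}
SIMPLE_TAG = re.compile(r"\{([A-Z-]+)\}")
AUDIENCE_BLOCK = re.compile(r"\{AUDIENCE\}(.*?)\{/AUDIENCE\}", re.DOTALL)

def count_reactions(transcript):
    """Count audience reactions per tag group in one transcript."""
    counts = {group: 0 for group in GROUPS}
    # Assumption: an audience interjection is negative focus only if it says "No".
    for shouted in AUDIENCE_BLOCK.findall(transcript):
        if re.match(r"\s*no\b", shouted, re.IGNORECASE):
            counts["negative_focus"] += 1
    for tag in SIMPLE_TAG.findall(AUDIENCE_BLOCK.sub(" ", transcript)):
        for group, tags in GROUPS.items():
            if tag in tags:
                counts[group] += 1
    return counts

def spoken_words(transcript):
    """Number of words uttered by the speaker, with all annotations removed."""
    text = SIMPLE_TAG.sub(" ", AUDIENCE_BLOCK.sub(" ", transcript))
    return len(text.split())

def density(transcript, group, per_words=1000):
    """Reactions of one group per `per_words` spoken words (assumed normalisation)."""
    return count_reactions(transcript)[group] * per_words / spoken_words(transcript)

# Invented sample transcript, for illustration only.
sample = (
    "Do we want more taxes? {AUDIENCE} No! {/AUDIENCE} "
    "We want jobs. {APPLAUSE} "
    "My opponent promised the moon. {LAUGHTER} "
    "Together we will win. {CHEERING}"
)
counts = count_reactions(sample)
```

Comparing such densities across speakers or periods is the kind of analysis reported in the scenarios that follow.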

Social Networks posts. Virality, also known as information spreading, is a phenomenon strictly connected to the nature of the content being spread, rather than to the influencers who spread it. Furthermore, virality is a phenomenon with many facets; i.e., under this generic term several different effects of persuasive communication are comprised and they only partially overlap. Some of these effects include:

  • Appreciation: How much people like a given content, for example by clicking an “I like” button.
  • Simple buzz: How much people tend to comment on a given content.
  • White buzz: How much people tend to comment in a positive mood (e.g., “The best product I have ever bought”).
  • Black buzz: How much people tend to comment in a negative mood (e.g., “Do not buy this product, it is a rip-off”).
  • Controversiality: The ability to split the audience into different parties (usually for and against the given content).
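The five effects can be read as separate indicators computed on the same post. The fragment below is a hedged sketch: it assumes that each comment has already been labelled as positive, negative, or neutral (by hand or by a classifier), and the balance ratio used for controversiality is an illustrative choice, not the definition adopted in the cited studies.

```python
def virality_profile(likes, positive_comments, negative_comments, neutral_comments=0):
    """Return one indicator per virality effect for a single post."""
    total_comments = positive_comments + negative_comments + neutral_comments
    smaller, larger = sorted((positive_comments, negative_comments))
    return {
        "appreciation": likes,
        "simple_buzz": total_comments,
        "white_buzz": positive_comments,
        "black_buzz": negative_comments,
        # Assumed score in [0, 1]: 1 when positive and negative comments balance.
        "controversiality": smaller / larger if larger else 0.0,
    }

# Hypothetical post: 120 likes, 30 positive, 10 negative, and 5 neutral comments.
profile = virality_profile(120, 30, 10, 5)
```

Keeping the indicators separate reflects the point made above that these effects only partially overlap: a post can score high on appreciation and still be low on buzz or controversiality.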

In the following we provide some examples along the dimensions of the big picture (namely who, when, what, and how).


Who. While opinion leaders and similar figures can be easily represented as nodes in a graph with a high “centrality” coefficient, they also have a particular language style that characterizes them. This suggests that we can identify those who can potentially draw a crowd, within a group, by analyzing their language (Quercia, Ellis, Capra, & Crowcroft, 2011). Gender can also have an impact on communication: a study showed that women's rhetoric is far less aggressive than men's, with the density of negative-focus tags 60 times higher in men's rhetoric (Guerini, Giampiccolo, Moretti, Sprugnoli, & Strapparava, in press). Project managers should carefully choose who should deliver the communication according to the context.

When. Some works (e.g., TrackSocial, 2012) have already shown that there are preferable times for conveying information (e.g., posting on Twitter), depending on the desired effect (reading in mid-morning and early afternoon, resharing in late afternoon). This means that it is important to deliver content, like an e-mail, when users are highly receptive, taking into account the desired effect (a quick read or an in-depth analysis). Time can also play a major role in persuasive language in connection with events that split the timeline into a before and an after. For example, the word “war” was used five times more by G. W. Bush after 9/11. But while before 9/11 it was widely used to get applause (in different contexts like “war on drugs”), afterwards, when war was a real option, it never got applause (Guerini et al., 2008). This suggests that specific events can lead a good communicator/leader to change not so much his/her words as their rhetorical/persuasive use.

What. It is widely agreed that in environments with a highly updated information flow, adding images to the content can foster its spreading. This holds true for a post, but also for an e-mail, a presentation, etc. In fact, graphical and pictorial information grabs users’ attention and is consistent with the interpretation given by “rapid cognition” models (Kenny, 1994). In these models the user exploits cues, not directly related to the content, to decide, in a limited amount of time, what action to take.

How. Emotions can play a major role in information diffusion. Positive language is more viral than negative language, although some negative emotions are viral too (anger and fear, but not sadness). In detail, what matters most is affective arousal: joy, anger, and fear have high arousal, while sadness has low arousal (Berger & Milkman, 2009). These findings can help in critical decision making; e.g., to convey negative news without getting others down it is better to use language with high arousal. Another example concerns the readability of a text (Guerini et al., 2012). A study on the diffusion of scientific abstracts showed that a text that is easy to read brings about an immediate action, while a text that is hard to read induces people to procrastinate. Let us now consider the use of vulgar language: counterintuitively, it does not necessarily bring about negative reactions. Coarse language is found in posts with lots of comments or likes (coverage 1.2), but not in controversial posts (coverage 0.9). A good leader can actually use coarse language to obtain positive reactions (Strapparava, Guerini, & Özbal, 2011).
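The readability point above can be checked before a message is sent. The sketch below uses the Flesch Reading Ease formula as a stand-in: the text does not say which readability index the cited study relied on, the syllable counter is a rough vowel-group heuristic, and both sample messages are invented.

```python
import re

def count_syllables(word):
    """Rough heuristic: one syllable per run of vowels, at least one per word."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def flesch_reading_ease(text):
    """Flesch Reading Ease: higher scores indicate text that is easier to read."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    return (
        206.835
        - 1.015 * len(words) / len(sentences)
        - 84.6 * syllables / len(words)
    )

# Invented messages with the same intent but very different styles.
easy = "We met the team. The plan is clear. We start on Monday."
hard = (
    "Notwithstanding considerable organisational heterogeneity, "
    "interdepartmental communication inefficiencies proliferated."
)
easy_score = flesch_reading_ease(easy)
hard_score = flesch_reading_ease(hard)
```

On these two samples the short, plain message scores far higher than the dense one, which is the direction one would want for a communication meant to trigger immediate action.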

Finally, irony and simple language can bring about consensus: Ronald Reagan, also known as the “great communicator,” used irony a lot and the classical “positive focus” rhetoric far less (his laughter density was three times higher than that of other speakers). With regard to Reagan's overall style, his criterion was “Would you talk that way to your barber?” as reported in Collier (2006). He wanted his style to appear “simple and conversational.” To verify this statement, the hypothesis that a simple and conversational style is more polysemic than a “cultured” style was checked. His persuasive words (and only those) had twice the degree of polysemy of those of other speakers (Guerini et al., 2008).


We believe that digital material is the main asset to exploit for the governance of a project. This is particularly true for unstructured information that is not referenced as input or output of a project management process. We are convinced of the importance and potentiality of “big data” and that the research initiatives will be intensified in the near future in order to provide new approaches for managing project complexity and uncertainty. The PMI Northern Italy Chapter will continue to promote this effort.

Berger, J. A., & Milkman, K. L. (2009). Social transmission, emotion, and the virality of online content. Social Science Research Network Working Paper Series.

Bull, P., & Noordhuizen, M. (2000). The mistiming of applause in political speeches. Journal of Language and Social Psychology, 19, 275-294.

Cavanagh, M. (2011, October). 2nd order project management. PMI Global Congress 2011, Dallas, TX, USA.

Collier, K. (2006). Writing for the great communicators: Speechwriting for Roosevelt and Reagan. In Proceedings of the Southwest Political Science Association Meetings. San Antonio, TX.

Curlee, W., & Gordon, R.L. (2011). Complexity theory and project management. Hoboken, NJ: John Wiley & Sons.

Guerini, M., Giampiccolo, D., Moretti, G., Sprugnoli, R., & Strapparava, C. (in press). The new release of CORPS: A corpus of political speeches annotated with audience reaction.

Guerini, M., Pepe, A., & Lepri, B. (2012). Do linguistic style and readability of scientific abstracts affect their virality? In Proceedings of ICWSM-12.

Guerini, M., Strapparava, C., & Stock, O. (2008). CORPS: A corpus of tagged political speeches for persuasive communication processing. Journal of Information Technology & Politics, 5(1), 19-32.

Kenny, D. (1994). Interpersonal perception: A social relations analysis. The Guilford Press.

Quercia, D., Ellis, J., Capra, L., & Crowcroft, J. (2011). In the mood for being influential on twitter. In Proceedings of IEEE SocialCom-11.

Small-world experiment. (2008). In Wikipedia, the free encyclopedia. Retrieved March 10, 2013, from

Strapparava, C., Guerini, M., & Özbal, G. (2011). Persuasive language and virality in social networks. In Proceedings of ACII-11.

TrackSocial. (2012). Optimizing Facebook Engagement, whitepaper.

Varanini, F., & Ginevri, W. (2012). Projects and complexity. Boca Raton, FL: CRC Press.

© 2013, Walter Ginevri & Marco Guerini
Originally published as a part of 2013 PMI Global Congress EMEA Proceedings – Istanbul, Turkey


