Scaling massive infrastructures: the project management effect
“Do not repeat the tactics which have gained you one victory, but let your methods be regulated by the infinite variety of circumstances”
(Sun Tzu, 490 BC, ¶6)
“The demands and the amount of work that it takes to put something like [Facebook] into place, it’s just so much that if you weren’t completely into what you were doing and you didn’t think it was an important thing, then it would be irrational to spend that much time on it.”
(Zuckerberg, 2010, ¶1)
To truly understand what it’s like to be a project manager at Facebook, you have to understand how it all began. In February 2004, Harvard University sophomore Mark Zuckerberg founded “The Facebook.” By the end of the month, over half of the undergraduate population at Harvard was registered on the service. By December of the same year, the service exceeded one million users.
Now, just six years later, Facebook has grown from a college hobby into one of the preeminent Internet companies in the world and a model of innovation.
How did this happen? The answer to that question will be the subject of debate for many years in news articles, scholarly papers, and even Hollywood movies. The subject of this paper is more modest: How did program management contribute to this growth, and how can program management skills be applied in similar situations of massive infrastructure scaling?
Overcoming Cultural Challenges
“The most important thing is to keep the most important thing the most important thing.”
The Hacker and the Program Manager
At Facebook, the term “hacker” is used to describe our engineers. If you look up the definition of “hacker,” you will find a variety of answers, including many negative ones, but at Facebook the term is used with endearment and respect. Fundamentally, a hacker is someone who can be counted on to get things done, and to get them done quickly and creatively. It’s interesting to compare the characteristics and values associated with hackers with those traditionally associated with program managers.
Characteristics of a hacker:
- High intelligence/IQ
- Emphasis on action
- Pride in ability to accomplish goals independently and with minimal overhead
- Aversion to oversight and bureaucracy
Characteristics of a program manager:
- Balance of IQ and EQ
- Attention to detail
- Organizational and logistical skills
- Ability to plan ahead
- Excellent communicator
A quick review of these characteristics suggests that the hacker and the project manager do not naturally live together in harmony. Our first challenge, then, will be to overcome these potential areas of conflict.
Most importantly, the program manager needs to avoid micromanagement. He or she needs to give clear guidance about what needs to be done, and then let the hackers hack. It is important not only to avoid bureaucracy but also to avoid the perception of bureaucracy. Find the right level of oversight for each person on your project team and manage him or her accordingly.
Respect the hacker’s time. Large status meetings are usually a waste of time; keep them short and action-oriented or avoid them entirely.
Demonstrate the value of program management. If you take care of the “business stuff,” then the hackers are free to hack! Not only is this what they are happiest doing, it is their value to the company! Always keep this in mind, because a hacker who is constantly spending time keeping you up to date on his or her progress is a hacker who isn’t delivering his or her principal value.
Lastly, hackers respect knowledge and experience. Hire project managers with technical backgrounds. Spend time learning the subject matter and don’t expect hackers to explain everything to you. After all, the chief motto of the hacker community is “Read the Freaking Manual (RTFM)!” Take this to heart.
One of the key cultural values at Facebook is moving fast. In the Internet services space, new features need to be rolled out quickly and iterated on constantly to keep up with the competition and stay ahead of the evolving demands of your user base. Facebook has truly taken this lesson to heart, and the implications are far reaching. The company is constantly moving forward at an incredible pace in all areas, not just those related to developing the facebook.com product.
The Facebook “Hackathon” is an example of how quickly the company moves. A Hackathon is an intentionally unstructured, hacker-friendly event in which engineers are encouraged to set aside their daily work and spend 24 hours (literally overnight) on something completely different. Hackathons are the incubators in which many of Facebook’s most profound products are born; some products conceived during Hackathons have been pushed out to the production environment within days or weeks.
As project and program managers, you will likely have a number of concerns about that last point. There are tangible risks associated with moving fast, and Facebook management accepts these risks. How, then, can we as program managers adapt?
Be flexible and emphasize fast context switching. To keep pace, it is just as critical for the project management team to have streamlined processes and to be able to pull together last-minute critical projects in a realistic, time-efficient, and cost-effective manner. Avoid unnecessary meetings; if a meeting is necessary, make 30 minutes or less the default length and always have a well-structured agenda.
Communicate efficiently. Don’t overload your customers with more information and questions than they want or need. Avoid “send and forget” e-mails; be more active in your communication. At Facebook, we have the advantage of working in the same building. This is obviously not the case in all companies, but the lesson is the same: face-to-face contact is better than telephone conversations, and telephone conversations are better than e-mails. On a related note, understand the response expectations of various communication media. E-mail is not appropriate for “real-time communication.”
Clearly communicate risks without slowing down the effort.
Be “the rock.” The rock is the source of consistency and truth that can be relied on by others throughout the turbulence of fast-moving projects. Status meetings should not be used to bring you up to speed; they should be used to solve problems.
Be the face of the solution. For example, product teams trying to launch a new product want to focus on that product and not have to worry about the logistical complexities involved. Understand and execute their needs without forcing them to jump through hoops or answer unnecessary questions. This allows the project management team to construct processes on the back end that keep the deployment of services congruent with the philosophy of the architecture (e.g., redundancy, space for organic growth, and forethought toward future feature changes). If you are doing your job correctly, it looks easy.
Know your key processes and invest the time and effort needed to optimize them. For example, a key project management tool at Facebook is our ticketing system. By integrating our project plans with our ticketing system, we were able to greatly reduce the amount of manual analysis and reconciliation involved in status reporting. Another example is project documentation. If a tree falls in the forest and nobody hears it, does it make a sound? Documentation is only useful to the extent that it is read, so only spend time documenting what will ultimately provide value. Keep in mind that documentation often becomes outdated almost immediately, so the smaller the scope of the documentation, the easier it will be to keep up to date.
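To illustrate how linking a project plan to a ticketing system can replace manual status reconciliation, here is a minimal sketch. It is not Facebook’s ticketing system: the `Ticket` and `Milestone` records, their fields, and the status values are hypothetical, and “percent complete” is assumed to mean the share of a milestone’s linked tickets that are closed.

```python
from dataclasses import dataclass, field


@dataclass
class Ticket:
    # Hypothetical ticket record; a real ticketing system exposes many more fields.
    ticket_id: str
    milestone: str  # name of the project-plan milestone this ticket rolls up to
    status: str     # assumed values: "open", "in_progress", "closed"


@dataclass
class Milestone:
    name: str
    tickets: list = field(default_factory=list)

    def percent_complete(self) -> float:
        # Share of linked tickets that are closed; a milestone with no tickets reports 0%.
        if not self.tickets:
            return 0.0
        closed = sum(1 for t in self.tickets if t.status == "closed")
        return 100.0 * closed / len(self.tickets)


def build_status_report(milestone_names, tickets):
    """Attach each ticket to its milestone and return {milestone: percent complete}."""
    milestones = {name: Milestone(name) for name in milestone_names}
    for ticket in tickets:
        # Tickets that point at milestones not in the plan are ignored in this sketch.
        if ticket.milestone in milestones:
            milestones[ticket.milestone].tickets.append(ticket)
    return {name: round(m.percent_complete(), 1) for name, m in milestones.items()}


if __name__ == "__main__":
    tickets = [
        Ticket("T-1", "rack install", "closed"),
        Ticket("T-2", "rack install", "closed"),
        Ticket("T-3", "rack install", "in_progress"),
        Ticket("T-4", "network cutover", "open"),
    ]
    report = build_status_report(["rack install", "network cutover", "burn-in"], tickets)
    for name, pct in report.items():
        print(f"{name}: {pct}% complete")
```

Because the report is computed from ticket state, the weekly status rollup can be regenerated at any time instead of being reconciled by hand.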
Lastly, don’t insist on process for the sake of process. Sometimes it’s vital, but other times it’s just annoying. Know the difference.
Overcoming Technical Challenges
“Technological progress is like an axe in the hands of a pathological criminal.”
(Einstein, 2010)
Project managers at Facebook are held to a high standard of technical competency, and demonstrating technical competency is key to gaining trust and communicating effectively. But the technical challenges facing project managers do not end at communication; Facebook is a company of “doers.” Project managers are given a large degree of leeway in how they solve the problems assigned to them, with limited resources and time. It is rare to have a full set of requirements at the outset of a project. As a result, we are expected to wear many hats and often serve in roles outside of traditional or “pure” project management, including architectural design, process building, strategic planning, and business analysis. Following are a few examples of how the project management team helped overcome technical challenges.
Infrastructure Architecture Design at 90 MPH
Although simplicity is certainly the ideal, building and operating a complex Internet service, individualized for 500 million users, inherently leads to unique architecture and scaling challenges, and a degree of complexity is inevitable.
We have one of the largest open-source software installations in the world, which includes:
- Largest MySQL database implementation
- Largest memcache (distributed caching system) implementation
- This technology allows for a reduction in direct database calls through caching
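The caching point above can be made concrete with the common look-aside (cache-aside) read pattern. The sketch below is a simplified illustration, not Facebook’s memcache deployment: a plain dictionary stands in for a memcache cluster, `FakeDatabase` stands in for MySQL, and all class and method names are hypothetical. It shows why repeated reads of the same data stop reaching the database.

```python
class FakeDatabase:
    """Stand-in for a MySQL query layer that counts how many queries reach it."""

    def __init__(self, rows):
        self.rows = dict(rows)
        self.query_count = 0

    def get_user(self, user_id):
        self.query_count += 1
        return self.rows.get(user_id)

    def set_user(self, user_id, value):
        self.rows[user_id] = value


class LookAsideCache:
    """Cache-aside reads: try the cache first, fall back to the database on a miss."""

    def __init__(self, db):
        self.db = db
        self.cache = {}  # a plain dict standing in for a memcache cluster

    def get_user(self, user_id):
        key = f"user:{user_id}"
        if key in self.cache:
            return self.cache[key]         # hit: no database call
        value = self.db.get_user(user_id)  # miss: one database call
        if value is not None:
            self.cache[key] = value        # populate so later reads are hits
        return value

    def update_user(self, user_id, value):
        # Write to the database, then delete the cached copy so the next read refetches it.
        self.db.set_user(user_id, value)
        self.cache.pop(f"user:{user_id}", None)


if __name__ == "__main__":
    db = FakeDatabase({1: "alice", 2: "bob"})
    users = LookAsideCache(db)
    for _ in range(1000):
        users.get_user(1)
    print("database queries for 1,000 reads:", db.query_count)
```

Deleting rather than updating the cached entry on writes keeps the sketch simple; production systems also have to handle concurrent writers and cache server failures, which this example ignores.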
One of the major value-adds that project managers provide at Facebook is acting as information aggregators and disseminators across functional groups. For this reason, my project management team is responsible for coordinating architecture reviews. As new technologies and tools are proposed, the project management team makes sure that the right people are informed and present when decisions are made.
If micro-level architecture decisions are made too hastily, they can have long-term ramifications for the macro-level architecture. At the same time, we need to be wary of “paralysis by analysis” and move fast in accordance with Facebook’s ideals. We do this by calling short, intensive meetings with the key decision makers and owning the action items that come out of those meetings.
Beating the Shell Game
The incredible rate of improvement in computing hardware (e.g., Moore’s law) presents companies that run large computing infrastructures with great challenges as well as opportunities. More computing power for the same price is a good thing, but it also means that investments depreciate quickly. The phrase “hardware refresh cycle” (Exhibit 1) describes the process of periodically updating computing resources.
Exhibit 1: Hardware Refresh Cycle
For large companies, the hardware refresh cycle can arrive quickly and with force, especially if you are purchasing tens of thousands of servers. It’s critical that migrations become the norm, with hardware refresh cycles of three years or less. These are large-scale programs that need constant maintenance.
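Planning a refresh program starts with simple arithmetic: when does each purchase batch age out? The sketch below assumes a fixed three-year cycle (the upper bound stated above) and uses hypothetical purchase dates and batch sizes; it is an illustration of the calculation, not Facebook fleet data.

```python
from collections import Counter
from datetime import date

REFRESH_CYCLE_YEARS = 3  # assumption: the "three years or less" cycle from the text


def refresh_due(purchased: date, cycle_years: int = REFRESH_CYCLE_YEARS) -> date:
    """Date on which a server batch reaches the end of its refresh cycle."""
    try:
        return purchased.replace(year=purchased.year + cycle_years)
    except ValueError:
        # Feb 29 purchase with no leap day in the target year: fall back to Feb 28.
        return purchased.replace(year=purchased.year + cycle_years, day=28)


def servers_due_by_year(batches):
    """batches: iterable of (purchase_date, server_count). Returns {year: servers due}."""
    due = Counter()
    for purchased, count in batches:
        due[refresh_due(purchased).year] += count
    return dict(sorted(due.items()))


if __name__ == "__main__":
    # Hypothetical purchase history used only to exercise the calculation.
    batches = [
        (date(2007, 6, 1), 5000),
        (date(2008, 2, 29), 12000),
        (date(2008, 9, 15), 8000),
        (date(2009, 4, 1), 20000),
    ]
    for year, count in servers_due_by_year(batches).items():
        print(f"{year}: {count} servers due for refresh")
```

Even this toy view shows why migrations have to be routine: every purchase wave becomes a migration wave a fixed number of years later.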
An efficient and scalable way to tackle hardware refresh cycles is to make sure that your architecture is carved with these future changes in mind. One way to do this is to look holistically at the most critical components within the service and divide them into logical chunks. As seen below (Exhibit 2), some tiers of the architecture are larger and more predominant than others. These tiers should be broken down into separate logical chunks to allow for quicker, more effective, and more scalable changes.
Exhibit 2: Architecture Tiers
Note the “other” server group: “other” is the consolidated group of smaller services outside the major three. “Other” is still vital to the overall architecture, but it is not part of the 80th percentile (the major tiers that make up the bulk of the fleet). What I have found in scaling massive architectures is that “other” components need to be combined with like “other” components to form their own logical chunk. This allows for two things:
- You are able to make sure that the “other” components do not pollute the 80th percentile components
- It’s easier to maintain the “other” chunk from a macro-level perspective
In all the large-scale Internet service companies I have worked for, this has held true and has been a critical principle for scaling the environment, because what’s new today will be old in three years.
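The chunking rule can be expressed as a small planning helper. This is a sketch under stated assumptions, not the tooling used at Facebook: tier names and server counts are hypothetical, and the 80 percent threshold is our reading of the “80th percentile” described above. Tiers are ranked by size; the largest ones each become their own chunk until together they cover the threshold, and everything smaller is folded into a single “other” chunk.

```python
def plan_logical_chunks(tier_sizes, major_share=0.8):
    """Split tiers into refresh/migration chunks.

    tier_sizes: dict mapping tier name -> server count (hypothetical inputs).
    The largest tiers each get their own chunk until they cover `major_share`
    of the fleet; all remaining tiers are consolidated into one "other" chunk.
    """
    total = sum(tier_sizes.values())
    chunks = {}
    other = []
    covered = 0
    for tier, size in sorted(tier_sizes.items(), key=lambda kv: kv[1], reverse=True):
        if covered < major_share * total:
            chunks[tier] = [tier]  # major tier: its own logical chunk
            covered += size
        else:
            other.append(tier)     # small tier: folded into "other"
    if other:
        chunks["other"] = sorted(other)
    return chunks


if __name__ == "__main__":
    # Hypothetical fleet used only to exercise the helper.
    fleet = {
        "web": 40000,
        "cache": 25000,
        "database": 20000,
        "search": 3000,
        "ads": 2500,
        "logging": 1500,
    }
    for chunk, tiers in plan_logical_chunks(fleet).items():
        print(f"{chunk}: {', '.join(tiers)}")
```

Keeping the small tiers together means a change to one of them touches only the “other” chunk, which matches the two benefits listed above: the major tiers stay clean, and “other” can be managed as one unit.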
Feeding the Monster
Another key technical challenge that cannot be forgotten is scaling server provisioning, especially when it comes to “other” one-off requests. In a large, fast-moving company such as Facebook, not all server needs can be forecast in advance.
Many large-scale Internet service companies have project managers dedicated to provisioning and deployments, which is not a bad thing. That being said, automating the steps it takes to provision a server will reduce the number of project managers needed to manually handle a large volume of provisioning requests. To decrease the need for manual intervention, you first need to understand the processes. Once the processes are completely understood, you need a separate program to tackle the provisioning (time-to-deliver) goals you have set.
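A provisioning program needs a way to measure progress against its time-to-deliver goal. The sketch below assumes a hypothetical request record with request and delivery timestamps and a flag marking whether fulfillment was automated; it reports, separately for automated and manual requests, the median hours to deliver and the share delivered within the goal. The record fields and the 24-hour goal in the usage example are assumptions for illustration only.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import median


@dataclass
class ProvisioningRequest:
    # Hypothetical request record; field names are illustrative.
    request_id: str
    requested_at: datetime
    delivered_at: datetime
    automated: bool  # True if fulfilled without manual intervention


def hours_to_deliver(req: ProvisioningRequest) -> float:
    return (req.delivered_at - req.requested_at).total_seconds() / 3600


def time_to_deliver_report(requests, goal_hours):
    """Summarize time-to-deliver against a goal, split by automated vs. manual fulfillment."""
    report = {}
    for label, automated in (("automated", True), ("manual", False)):
        durations = [hours_to_deliver(r) for r in requests if r.automated == automated]
        if not durations:
            continue  # no requests of this kind in the period
        within_goal = sum(1 for d in durations if d <= goal_hours)
        report[label] = {
            "count": len(durations),
            "median_hours": median(durations),
            "pct_within_goal": round(100 * within_goal / len(durations), 1),
        }
    return report


if __name__ == "__main__":
    start = datetime(2010, 7, 1, 9, 0)
    # Hypothetical requests used only to exercise the report.
    sample = [
        ProvisioningRequest("R-1", start, start + timedelta(hours=2), True),
        ProvisioningRequest("R-2", start, start + timedelta(hours=4), True),
        ProvisioningRequest("R-3", start, start + timedelta(hours=30), False),
        ProvisioningRequest("R-4", start, start + timedelta(hours=50), False),
        ProvisioningRequest("R-5", start, start + timedelta(hours=20), False),
    ]
    for label, stats in time_to_deliver_report(sample, goal_hours=24).items():
        print(label, stats)
```

Splitting the numbers by fulfillment path makes the case for further automation visible: if manual requests consistently miss the goal, those are the steps to understand and automate next.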
Overcoming Organizational Challenges
“It must be considered that there is nothing more difficult to carry out nor more doubtful of success, nor more dangerous to handle than to initiate a new order of things.”
(Machiavelli, nd, ¶1)
There are many ways to organize a project management team and, in many cases, there’s a lot of passion behind the debates over the various methodologies. Let’s look at these methodologies in a simplified way.
Centralized project management organization – also referred to as a project management office (PMO), this model is usually managed by a project manager and reports to a central program/project sponsor (Exhibit 3).
Exhibit 3: Centralized project management organization
Decentralized project management organization – In this model (Exhibit 4), the project management team is distributed across various business units and can report either directly to engineering managers or to a project manager who reports to an engineering director.
Exhibit 4: Decentralized project management organization
Throughout my career, I have worked in both models and have seen the pros and cons of each first-hand. For example, the centralized project management organization lets you scale more easily across other organizations (in a matrixed way) and use project management resources more efficiently. The decentralized model, by contrast, usually allows a project manager to go deeper into a specific area of knowledge, but it can also be costly from a resourcing perspective and can cut off critical cross-functional communication.
Today, my team is formed in a centralized model (Exhibit 5), but it has been virtually embedded into other teams.
Exhibit 5: Centralized Model
This allows me to achieve the best of both worlds. Project managers are assigned logically to programs and/or projects that match their backgrounds and strengths. The key win is as follows:
- Functionally distribute project managers in order to provide the most horizontal impact, while still supporting the company’s business goals
The model that we are currently operating in is constantly revisited as we progressively grow as a company. That being said, we have found that scaling the organization and providing the most value with the least amount of resources tend to keep pointing us back to this model.
A key component to our success with leveraging project managers in a matrixed fashion and embedding them is ensuring that they have technical DNA, which, to me, means that they are knowledgeable in the areas they are program managing. Some of the key benefits of following this practice are listed below:
- This gives you the ability to leverage the technical expertise to gain credibility and drive programs to completion
- The more expert knowledge the project manager has, the more likely he or she is to quickly understand and identify risks, issues, or slips.
- Overall, better integration and trust are demonstrated when working with functional organizations
Determining organizational priorities is the most critical “magic” the project manager can provide for fast-growing Internet services. The key principles for effectively prioritizing and communicating programs and projects are listed below:
- Roll up programs and/or projects into a portfolio and make sure that status is communicated weekly
- It’s critical to make sure that your communication e-mails are crisp, clear, and consistent
- Make sure you have a simple priority scale
- We currently use four priority types: Critical, high, medium, and low
- Make sure that the criticality scale is strategically aligned with the company’s growth goals
- Weekly program and/or project priority updates ensure consensus across the organization, holistically
- Priorities can and should be changed in an agile way; this allows for continuous alignment and flexibility
- Resources should be applied based on program priority
- This is critical when resources are scarce and programs and/or projects are numerous
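The principles above (a four-level priority scale, a weekly portfolio rollup, and resources applied by priority) can be combined in a few lines. This is a minimal sketch, not the process used at Facebook: program names and staffing numbers are hypothetical, and “applying resources by priority” is modeled as a simple greedy fill in which higher-priority programs are fully staffed before lower-priority ones receive anyone. Programs with the same priority keep their input order.

```python
from dataclasses import dataclass

# Four-level scale from the text; lower number = higher priority.
PRIORITY_ORDER = {"critical": 0, "high": 1, "medium": 2, "low": 3}


@dataclass
class Program:
    name: str
    priority: str       # one of PRIORITY_ORDER's keys
    people_needed: int  # hypothetical staffing estimate


def allocate(programs, people_available):
    """Fill staffing requests in priority order until the pool runs out.

    Returns {program name: people assigned}; when resources are scarce,
    lower-priority programs get partial or zero staffing.
    """
    assignments = {}
    remaining = people_available
    for program in sorted(programs, key=lambda p: PRIORITY_ORDER[p.priority]):
        assigned = min(program.people_needed, remaining)
        assignments[program.name] = assigned
        remaining -= assigned
    return assignments


def weekly_rollup(programs, assignments):
    """One line per program, highest priority first, for the weekly status e-mail."""
    lines = []
    for p in sorted(programs, key=lambda p: PRIORITY_ORDER[p.priority]):
        lines.append(f"[{p.priority.upper()}] {p.name}: {assignments[p.name]}/{p.people_needed} staffed")
    return lines


if __name__ == "__main__":
    # Hypothetical portfolio used only to exercise the helpers.
    portfolio = [
        Program("Data center build-out", "critical", 4),
        Program("Hardware refresh", "high", 3),
        Program("Provisioning automation", "medium", 2),
        Program("Wiki cleanup", "low", 2),
    ]
    for line in weekly_rollup(portfolio, allocate(portfolio, people_available=8)):
        print(line)
```

Because allocation is recomputed from the current priority labels, re-prioritizing a program in the weekly update and re-running `allocate` immediately shows which programs gain or lose staffing.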
“In the end, it’s not the years in your life that count. It’s the life in your years.”
(Lincoln, nd, ¶3)
We covered a number of recommendations throughout the course of this paper. The key takeaway is that in order to effectively scale massive infrastructures, as a project manager, you need to be cognizant of the following four key components (Exhibit 6):
- Technical competency
- Be a knowledge expert and you will build trust with project teams
- Change agility
- There’s always room for change, and in order to stay competitive, dynamic change is a must.
- “Light touch” management
- Keep engagements light, but impactful
- Willingness to get your hands dirty
- Don’t just be a thought leader, lead by example when you have the opportunity to.
Exhibit 6: Four Key Components
The experience of working in environments with massive infrastructures is unique, and it stretches you both personally and professionally. Just-in-time requirements and the need for sustainable organization and planning can, at times, collide. That being said, it’s critical to always keep things light, have fun, and stay flexible. In my opinion, having fun is the key to anything you do in your career. By maintaining an upbeat attitude and an approachable demeanor, you set yourself up to succeed, especially in the project management space.
This approach is near and dear to my heart. I preach this attitude within my team and practice it in my management style, both in people management and program/project management. By keeping this as a priority, my teams have been able to deliver remarkable accomplishments, while at the same time keeping our work–health index extremely high.
In closing, whether you are managing 1 project or 100 projects, these principles will guide you to succeed and enable you to build healthy sustainable project teams.
Coduto, D. P. (2001). Foundation design. Upper Saddle River, NJ: Prentice Hall.
Lincoln, A. (n.d.). In Brainy Quote. Retrieved July 17, 2010, from http://wwwv.brainyquote.com/quotes/keywords/end.html
Machiavelli, N. (n.d.). In Project Management Quotes. Retrieved July 17, 2010, from http://www.12manage.com/quotes_pp.html
Sun Tzu. (490 BC). In Project Management Quotes. Retrieved July 1, 2010, from http://www.12manage.com/quotes_pp.html
Zuckerberg, M. (2010). In Dealmakers quote of the week – Mark Zuckerberg, Facebook’s CEO. Retrieved July 7, 2010, from http://www.dealmakersblog.com/dmark-zuckerberg-quotes-facebook-ceo/
Originally published as part of Proceedings PMI Global Congress 2010 – Washington D.C.