Facebook, Palo Alto, California, USA
Many companies look at innovation as an endeavor that's both expensive and risky.
Not Facebook. For the global social networking site, it's standard operating procedure.
“Being innovative is just a natural part of how we do business,” says David Gardner, PMP, group technical program manager at the company. “Innovation is so second nature to how we do things, we almost take it for granted.”
Working on the bleeding edge carries a certain risk, of course. But in the constantly evolving world of social media, Mr. Gardner and his team have little choice but to find ways to deliver projects faster and more efficiently.
“Our customer base constantly wants more, and we are always fending off competition,” he says. “Speed is critical.”
This need for speed was paramount on a multimillion-dollar project to move 1 terabyte—that's 1 trillion bytes—of data from the company's near-capacity data center to a new warehouse.
And it all had to happen without even the slightest disruption of the site experience for Facebook users and the revenue-generating internal work groups that rely on the network.
“Moving such a massive data store to a new physical location was risky,” says Doug Tai, the program manager in charge of the project. “The potential for hardware failure, transition failures or network outages was all a concern.”
The project was conceived in September 2009 and had to be completed by the end of December. Phase one involved building and outfitting a structure to better manage massive amounts of the user base's data, store more bytes in a smaller space and cut energy costs. Once the infrastructure was in place, phase two called for transferring the data safely and quickly.
[Photo by Robert Houser]
The social networking giant moves a terabyte of data without missing a beat.
The new data center's next-generation equipment required a significant up-front investment, but it wasn't a hard sell to the finance team. Clearly, there was a payoff to be had.
“There is a misconception in business today that being innovative means you have to spend a lot of money,” Mr. Gardner says. “But to me, innovation and saving money go hand in hand.”
In this case, the servers could hold four times as much data as the previous ones. The project team also chose more powerful processors and upgraded the software.
When complete, the new center would have the capacity to hold eight times more data than the previous location and could move and manage data more efficiently. The move would not only diminish the data center's ecological footprint, it would also cut millions of dollars from its energy bill.
That is, if the team could get it built.
“Obviously, getting all of the hardware installed and tested was critical,” Mr. Tai says.
Team members had to choose and test-drive the technology, make final selections, and get everything shipped, installed and stacked with the necessary wiring, cooling and pipes.
“It involved a lot of coordination with our hardware team, the vendors and my team to ensure all of our dates were met so that we could transfer the data on schedule,” Mr. Tai says.
To reduce the project risks, Mr. Gardner set clear expectations with both internal and external stakeholders. “If everyone understands what you are trying to accomplish and how their target goals fit into the macro goal to migrate the data, it gives people a mutual goal and focuses them on what needs to be done,” he says.
Vendors had to scramble to deliver the required equipment in such a short time. In addition, the project team had to conduct around-the-clock tests to determine whether the new technology would work with the company's systems and deliver the speed and accuracy it sought.
“When you move to a new platform, you have to be sure it's all compatible,” Mr. Gardner says.
DOWN TO THE WIRE
It was time to make the big move.
And that's when team members decided to take a major gamble—a carefully planned major gamble, of course.
Rather than loading the data onto thousands of servers and then physically transporting them, the company ramped up its private network and flowed the data directly to the new site.
“We weighed the risks and this made the most sense, considering how much data needed to be moved,” Mr. Tai says.
The possibility of lost or damaged equipment was minimized, but new dangers arose: How could the project team effectively flow so much information in a small amount of time without disrupting the entire website?
“A terabyte is a huge amount of data to move over our own network in a way that didn't create downtime,” Mr. Tai says. To put this in context, that equals 250 billion “likes” on Facebook.
He sat down with his network team and calculated how long it would take to move the data based on the existing capacity and the network's gigabyte-per-second flow rate. They determined that, barring network failures or power outages, the shift could be accomplished in three weeks.
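As a rough sanity check on that estimate (the article gives only the total volume and the gigabyte-per-second rate, so the bandwidth share reserved for the migration below is an illustrative assumption, not Facebook's actual figure): at full line rate, a terabyte would move in well under an hour, so a multi-week window implies the transfer was confined to a small slice of network capacity to protect live traffic.

```python
# Illustrative back-of-the-envelope transfer-time estimate.
# TOTAL_BYTES and LINK_RATE come from the article; USABLE_SHARE is an
# assumption made here for illustration only.

TOTAL_BYTES = 10**12      # 1 terabyte = 1 trillion bytes
LINK_RATE = 10**9         # 1 gigabyte per second, per the article
USABLE_SHARE = 0.001      # assumed fraction of capacity spared for
                          # the migration so the live site is untouched

seconds = TOTAL_BYTES / (LINK_RATE * USABLE_SHARE)
days = seconds / 86400
print(f"Estimated transfer time: {days:.1f} days")  # → roughly 11.6 days
```

Shrinking the assumed share (or adding margin for retries and failures) stretches the window toward the three weeks the team planned for.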
There was still the chance that the data flow would consume the network's entire capacity. To avoid the risk of downtime or, worse, a site crash, the development team built a custom application to throttle the data, limiting and monitoring the bandwidth throughout the transfer. The application also performed constant error-checking and data-level corrections to keep the flow in sync and alert the project team if problems came up.
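Facebook's actual tool is proprietary, but the two ideas it combined—capping bandwidth and verifying data integrity—can be sketched in a few lines. Everything below (function name, rate limit, checksum choice) is an assumption for illustration, not the company's implementation:

```python
# Minimal sketch of a bandwidth-throttled, checksummed copy loop.
# The rate cap keeps the migration from starving live traffic; the
# running SHA-256 digest lets the receiving side verify the data.
import hashlib
import time

def throttled_copy(src, dst, rate_bytes_per_sec, chunk_size=64 * 1024):
    """Copy src -> dst without exceeding rate_bytes_per_sec on average.
    Returns a SHA-256 hex digest for end-to-end integrity checking."""
    digest = hashlib.sha256()
    start = time.monotonic()
    sent = 0
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        dst.write(chunk)
        digest.update(chunk)
        sent += len(chunk)
        # Sleep just long enough to stay under the rate cap.
        expected_elapsed = sent / rate_bytes_per_sec
        actual_elapsed = time.monotonic() - start
        if expected_elapsed > actual_elapsed:
            time.sleep(expected_elapsed - actual_elapsed)
    return digest.hexdigest()
```

A real migration tool would also retry failed chunks and compare digests on the receiving end before declaring a segment transferred, which is the "data-level correction" role the article describes.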
It paid off, and the project was delivered on time, with no delays or downtime.
“There are always risks on a project like this,” Mr. Tai says, “but the constant monitoring and careful planning helped minimize them.” —Sarah Fister Gale
PM NETWORK SEPTEMBER 2010 WWW.PMI.ORG