For Making Voice-Activated Computer Interfaces a Part of Everyday Life (Most Influential Projects: #13)
Alexa, what's the weather? Alexa, set a timer for 3:30. Alexa, what year did Brazil host the World Cup? Alexa, play “Purple Rain” by Prince.
In homes around the world, the voice-activated assistant Alexa has become ubiquitous, integrated into a million everyday moments. Amazon has sold over 100 million Alexa-enabled devices—and sparked a frenzy of efforts to develop skills for the voice assistant to perform. There are now some 70,000 skills available through the device, each one making the technology more valuable—and more mainstream.
Alexa's runaway success and influence began under a shroud of secrecy. In 2011, when Amazon bought voice-recognition startup Yap, Jeff Adams, then Yap's vice president of research, and his team of engineers were flown to Amazon's headquarters with no advance details. Only once on-site was the team briefed on Amazon's top-secret project idea: create a voice-recognition system that would let users interact with a smart speaker from anywhere in a room with no screen involved.
“My first response was, ‘I'm sorry, it can't be done,’” says Adams. Amazon wasn't deterred. “They knew it was a moonshot,” Adams goes on. “I was told to spend whatever it took to make it happen.”
A project that sponsors originally hoped would be completed in nine months ultimately took three years and a budget that's estimated to exceed US$200 million for the voice technology alone. The high price tag didn't dissuade Amazon CEO Jeff Bezos. “To Amazon's credit, once we proved it could be done with a lot of work, they committed the resources,” Adams says. “They understood that it was a problem worth solving.” The journey proved to be a memorable one.
The Noise and the Signal
Within Amazon, the Alexa effort was referred to as Project D. “No one knew what we were up to,” Adams says. The company had previously greenlighted three top-secret initiatives, with results ranging from the hit Kindle e-reader to the phenomenal flop of the Fire Phone smartphone. A public misstep might have prompted some companies to rethink their innovation labs, but Amazon took the phone failure in stride, says Ahmed Bouzid, former head of product on the Alexa project team.
“It was the culture at Amazon that failures are learning opportunities,” explains Bouzid, now CEO of Alexa skills development company Witlingo. “The people on that team felt like they did what they could, and they were ready to tackle the next project.” Many of those team members were moved over to Alexa.
Anyone who has set a voice-activated timer while stirring a simmering pan and listening to music might be hard-pressed to remember that, in the early days of speech recognition, the technology couldn't interpret human language from more than a short distance away. “So many echoes are created that the mic gets eight or nine copies of a sound at different delays,” Adams says. The overlapping copies muddled the speech and made it impossible to interpret.
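To make the echo problem concrete, here is a minimal Python sketch of what a far-field microphone might record: the direct sound plus delayed, quieter copies of it. The function name `add_echoes`, the delays and the gains are all illustrative assumptions, not measurements or code from the Alexa project.

```python
def add_echoes(signal, echoes):
    """Return the direct signal plus delayed, attenuated copies of it.

    signal: list of audio samples (floats).
    echoes: list of (delay_in_samples, gain) pairs; values here are made up.
    """
    max_delay = max((delay for delay, _ in echoes), default=0)
    recorded = [0.0] * (len(signal) + max_delay)

    # Direct path: the sound arrives unchanged.
    for i, sample in enumerate(signal):
        recorded[i] += sample

    # Each reflection arrives later and quieter, and overlaps the rest.
    for delay, gain in echoes:
        for i, sample in enumerate(signal):
            recorded[i + delay] += gain * sample

    return recorded


# A single click, followed by two hypothetical reflections.
clean = [1.0, 0.0, 0.0, 0.0]
recorded = add_echoes(clean, [(1, 0.5), (3, 0.25)])
print(recorded)
```

With eight or nine such copies and real speech instead of a single click, the copies smear into one another, which is the muddle Adams describes.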
To solve that challenge, Adams assembled a team of 60 of the best speech and language scientists and engineers. For nearly a year, his team fine-tuned machine-learning technologies capable of parsing out human speech from background din and ambient echoes. Then, to train the technology to actually interpret that language, the team fed it massive amounts of voice data. They rented spaces across the country and spent a year hiring temporary workers—with different accents, speech patterns and vocal cadences—to read scripts at various distances from the microphone. (Adams left Amazon just before Alexa launched to found voice-tech firm Cobalt Speech and Language.)
By the end of its third year, the Alexa team had created speech technology capable of recognizing and interpreting speech from more than 5 feet (1.5 meters) away. At the same time, parallel teams were developing the hardware for the Echo speaker, creating Alexa's voice and developing applications, like reporting the weather and time, turning on music, and setting reminders and alarms. The final Echo device includes a speech-recognition tool that only listens for its name and, once heard, triggers a second, cloud-based speech recognition tool that interprets what the user says, gets the answer and responds.
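The two-stage design described above, a small on-device listener that hands off to a larger cloud recognizer, can be sketched in a few lines of Python. This is a toy under stated assumptions: it works on text rather than audio, and the names `heard_wake_word`, `cloud_interpret` and `handle`, along with the canned replies, are hypothetical stand-ins rather than Amazon's implementation.

```python
WAKE_WORD = "alexa"


def _words(utterance):
    """Lowercase the utterance and split it into words, ignoring commas."""
    return utterance.lower().replace(",", " ").split()


def heard_wake_word(utterance):
    """Stage 1 (on the device): listen only for the wake word."""
    words = _words(utterance)
    return bool(words) and words[0] == WAKE_WORD


def cloud_interpret(request):
    """Stage 2 (stand-in for the cloud recognizer): map a request to a reply."""
    if "weather" in request:
        return "Here is the weather."
    if "timer" in request:
        return "Timer set."
    return "Sorry, I don't know that one."


def handle(utterance):
    """Forward the request to stage 2 only after stage 1 hears the wake word."""
    if not heard_wake_word(utterance):
        return None  # Nothing is sent to the cloud.
    request = " ".join(_words(utterance)[1:])
    return cloud_interpret(request)


print(handle("Alexa, what's the weather?"))
print(handle("What's the weather?"))
```

The point of the split is that the always-on part stays small and simple, while the heavier interpretation runs only when it is asked for.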
Allowing third parties to build applications for the platform was a vital part of Amazon's strategy to grow Alexa's reach and boost the overall voice assistant marketplace, says David Attwater, a user experience expert at Enterprise Integration Group. While most people still use Alexa for only a handful of applications, they have an enormous range to choose from. “From a project perspective,” Attwater says, “the team thoroughly considered the aspects of what would make it useful and delivered a well-thought-through solution.”
When the project began, Amazon estimated an Echo Dot—a smart speaker module with Alexa technology—would retail for US$50. By launch, project costs had driven the consumer price close to US$200. Amazon rolled out the product as an invitation-only pilot in November 2014. “By June, it was clear the product was going to be a success,” Bouzid says.
Since then, Alexa has not only moved into the mainstream; it has been built into everything from home stereo systems and electrical outlets to cars, vacuum cleaners, tabletop lamps, bathroom mirrors, televisions, smoke detectors and thermostats.
“We are just beginning to see how voice-first technology can help people in their daily lives,” says David Hakanson, vice president, chief information officer and chief innovation officer at Saint Louis University in St. Louis, Missouri, USA, which hosts a competition for students to develop Alexa skills. Says Bouzid: “We didn't just create a platform. We helped create a whole new voice-first industry.”