An engineer tests a "Xiaoyi" robot, a Siri-like voice assistant, which links the user to Lanchuang's intelligent elderly care system, at the headquarters of Lanchuang Network Technology Corp in Weifang, Shandong province, China, July 25, 2019.

REUTERS/Jason Lee

An engineer tests a voice assistant called "Xiaoyi". Voice assistants require massive amounts of data for training. Now Stanford researchers have created a tool that automatically synthesizes sample conversations.

Amazon Alexa and Google Assistant may seem like entertaining, if occasionally spooky, smart speakers, but these and other voice-based virtual assistants are becoming a primary gateway between the Internet and our private lives.

Alexa can order groceries from Whole Foods or products from Amazon, check your bank balance, pay bills, and connect to your insurance company. It can turn on house lights, start up your car, and tell you how much gas is in the tank. Google Assistant can track your appointments, screen calls, order a ride-sharing service, and book restaurant reservations. It can also control smart devices in the home, from thermostats and doors to entertainment systems. All told, Amazon Alexa now has 100,000 “skills” and Google Assistant has more than 1 million capabilities, the companies say.

Because voice-based platforms are so difficult to build and train, however, Google and Amazon have what amounts to a duopoly in smart speakers. Companies wanting to connect with customers through virtual assistants will probably have to go through one of those two platforms. If one of the major platforms wants to favor its own products over those of its rivals, it could easily do so. And as consumers share more private information with these two companies, that data reinforces Google’s and Amazon’s dominance.

Now, a team of researchers at Stanford has developed an open-source virtual assistant that they hope will dramatically widen the competition and protect privacy. Stanford’s Open Virtual Assistant Lab, or OVAL, uses a novel and comparatively inexpensive approach for training virtual assistants. And because it’s an open-source platform, it can continuously improve as its users contribute additional code or skill sets to a shared repository.

“We see ourselves as the new Firefox, an open platform that’s not dependent on a giant company with its own competitive interests,” says Monica Lam, the faculty director of OVAL, who is also a Stanford professor of computer science and a faculty member of the Stanford Institute for Human-Centered Artificial Intelligence.

The key PhD students working on the new virtual assistant, named Almond, are Giovanni Campagna, Silei Xu, Sina Semnani, and Mehrad Moradshahi.

Early results for Almond have been so promising that the Alfred P. Sloan Foundation recently awarded a $1 million grant to OVAL for developing practical prototypes.

OVAL’s Genie

At the moment, training virtual assistants is prohibitively expensive for most companies because the task requires vast datasets of natural language samples that have been laboriously annotated by human experts. Those samples come mainly from paid workers who engage in dialogues or from eavesdropping on what customers say.

To get around that roadblock, the Stanford team created a “Genie” tool that automatically synthesizes sample conversations. It does so based in part on generic principles about transactional dialogues and in part on knowledge about a particular domain. Because the system doesn’t rely on real dialogue collected from large numbers of people, the training data can be acquired at a tiny fraction of the normal cost.

“It sounds like a lot of work, but we can automate the process because there’s a general model for transactional conversations,” Lam says. “That model works whether you’re trying to book a restaurant or find and play a movie. The key is to generate enough variety at both the sentence level and the dialogue level to capture what you would encounter in real life.”

The generic template might start with a question like “What is the X of Y?” The letters are placeholders for words that are likely to come up in specific subject areas, such as “address of the restaurant.”
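The placeholder idea above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Genie's actual implementation or API: the template strings, the `DOMAIN` vocabulary, and the `synthesize` function are all hypothetical names invented for this example. The sketch shows why synthesized data needs no human annotation: each sentence is generated together with the structured meaning that produced it.

```python
from itertools import product

# Hypothetical generic templates; {prop} and {entity} play the roles of X and Y.
TEMPLATES = [
    "What is the {prop} of {entity}?",
    "Tell me the {prop} of {entity}.",
]

# Hypothetical phrases for one domain (restaurants); not taken from Genie.
DOMAIN = {
    "prop": ["address", "phone number", "rating"],
    "entity": ["the restaurant", "this place"],
}


def synthesize(templates, domain):
    """Fill every template with every combination of domain phrases.

    Returns (sentence, annotation) pairs, where the annotation records
    which phrase filled each placeholder.
    """
    keys = sorted(domain)  # fixed order so the output is deterministic
    samples = []
    for template in templates:
        for values in product(*(domain[k] for k in keys)):
            annotation = dict(zip(keys, values))
            samples.append((template.format(**annotation), annotation))
    return samples


samples = synthesize(TEMPLATES, DOMAIN)
for sentence, annotation in samples[:3]:
    print(sentence, annotation)
```

Two templates, three properties, and two entity phrases already yield 12 annotated sentences; swapping in a different `DOMAIN` dictionary reuses the same generic templates for another subject area.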

Beyond making the training process much less expensive, the automated conversation generator offers a big benefit to privacy: It eliminates the need for eavesdropping on customers, which Amazon and Google both do to improve the skills of their virtual assistants.

Early Results

In recent tests, the upcoming Almond 2.0 assistant, trained on the restaurant domain, correctly answered around 70 percent of complex, crowdsourced questions about its knowledge base. It actually outperformed leading commercial virtual assistants on certain complex restaurant questions, such as “Can you find me a restaurant that serves seafood and has an average rating of at least 4 stars?” The researchers also found that the neural network was able to transfer some of its restaurant-based skills to the hotel domain.
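What makes the seafood question "complex" is that it combines two conditions that must hold at once. As a rough sketch, assuming a toy in-memory knowledge base, the structured query an assistant needs to recover from that sentence might look like the following. The `Restaurant` class, the sample entries, and `find_restaurants` are hypothetical and are not part of Almond.

```python
from dataclasses import dataclass


@dataclass
class Restaurant:
    name: str
    cuisines: tuple
    rating: float  # average star rating


# Made-up entries purely for illustration.
KNOWLEDGE_BASE = [
    Restaurant("Harbor Grill", ("seafood", "american"), 4.5),
    Restaurant("Pasta Corner", ("italian",), 4.7),
    Restaurant("Dockside", ("seafood",), 3.6),
]


def find_restaurants(kb, cuisine, min_rating):
    """Return entries that satisfy both filters: cuisine AND minimum rating."""
    return [r for r in kb if cuisine in r.cuisines and r.rating >= min_rating]


# "a restaurant that serves seafood and has an average rating of at least 4 stars"
matches = find_restaurants(KNOWLEDGE_BASE, "seafood", 4.0)
print([r.name for r in matches])
```

An assistant that captures only one of the two conditions would return a wrong entry here (a low-rated seafood place or a highly rated Italian one), which is why such compound questions make a demanding test.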

Those are remarkable results, given that the Almond assistant was created by only a handful of people. For perspective, Amazon’s Alexa business employs more than 10,000 people.

Equally important, says Lam, is that the open-source platform is designed to spur collaboration between developers, which should steadily improve its capabilities over time.

“Everything we do is open-source,” says Lam. “Skill builders can contribute to the system in many ways. They can expand and improve the generic dialogue model. They can contribute information to the knowledge base, because many companies in the same industry share common properties. And all of this data can be used to transfer learning from one domain to another.”

Stanford HAI's mission is to advance AI research, education, policy and practice to improve the human condition.