Over the last two decades, the role of the web browser has evolved in a profound way. What was once a piece of software all its own, not unlike a word processor or photo editor, is now the backdrop to much of our online lives, whether we’re banking, collaborating with coworkers, or playing video games. It’s the web itself that changed the world—the browser is merely our gateway to it.
Likewise, it’s tempting to treat today’s virtual assistants as isolated gadgets, novelties that let us order shoes or skip to the next song without getting up from the couch. Like web browsers, however, their true value comes from the world they connect us to: a linguistic web, where thousands of capabilities are accessible through natural language. It’s a fundamentally new way to interact with our technology, and we’ve only just begun to understand the role it can play in our lives.
Unfortunately, the spirit of openness that characterized the early web is absent in today’s virtual assistants, a concerning thought given the rapidly expanding reach and power of these devices. That’s why we’re founding the Stanford Open Virtual Assistant Lab, or OVAL. It’s a worldwide open-source initiative intended to confront what we believe are the three major challenges facing the future of this technology: avoiding fragmentation of the linguistic web, democratizing the power of natural language interfaces, and putting privacy back in the hands of consumers.
Imagine if every web browser connected to its own, proprietary version of the internet, complete with its own formats and protocols. Consumers would face inconsistent access to online content, and the task of creating a website—let alone maintaining it—would be orders of magnitude more complex.
This is the reality of today’s virtual assistants. Platforms like Amazon’s Alexa and Google Assistant may be open to third parties, but their proprietary nature means nothing created on one can be accessed by the others. As a result, they connect their users to a linguistic web, not the linguistic web. And the landscape grows more fractured by the day.
At OVAL, we’re building an alternative. It’s called Thingpedia, and it uses open-world collaboration to collect every task, feature and data source a virtual assistant could want in a non-proprietary format. Its rapidly growing capabilities already include access to content from outlets like the New York Times and apps like Spotify, interfaces for online accounts like Dropbox and Twitter, and integration with devices ranging from your Fitbit to your Nest thermostat.
Thingpedia means virtual assistants of all kinds can connect their users to the same shared world. It encourages competition by sparing upstart virtual assistant developers the burden of reinventing the wheel (or rather, tens of thousands of wheels) simply to catch up with incumbents. It lets consumers comparison shop without worrying about whether a particular function will be accessible to the assistant that suits them best.
Best of all, because Thingpedia’s skill representation includes all information expected by Alexa and Google Assistant’s skill platforms, Thingpedia skills can be automatically added to both without additional work. This dramatically eases development for third parties relying on voice assistants to reach their users in new ways—including thousands of startups and small businesses. Rather than juggle multiple platforms, they can focus their development on Thingpedia while maintaining the largest possible audience.
Organizing all these features in one place is a great start, but it’s only a first step. What about the underlying technology that allows users to trigger them?
Today’s virtual assistants are based on neural networks capable of transcribing the human voice and intelligently interpreting the results. Making such networks accurate requires a significant amount of training data, typically acquired by having a large workforce manually annotate real data.
However, while the tech giants have made some truly incredible progress, we believe the linguistic user interfaces, or LUIs, of tomorrow will simply be too complex—and too fundamental to the future of computing—to leave their destiny in private hands. That’s why we’re building LUInet, an open-source neural network that provides an alternative to the capabilities at the heart of today’s commercial assistants. Additionally, we’ve developed an innovative tool called Genie that helps domain experts create natural language interfaces for their products, at a greatly reduced cost, and without in-house machine learning expertise. By empowering independent developers and by collecting their contributions from different domains, LUInet is positioned to surpass even the most advanced proprietary model developed by a single company.
LUInet’s sophistication, however, is best exemplified by its ability to understand never-before-heard sentences that combine functions from different domains. While most virtual assistants are limited to narrow, transactional commands like “skip to the next song” or “open the garage door,” LUInet is built to understand the flexible logic we use in everyday conversation, like “send me a text notification whenever I get an email from work with a PDF attachment over ten megabytes.” With a single phrase, entire problems can be solved. This is made possible by having LUInet translate natural language directly into programs.
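To make the idea of "language as a program" concrete, here is a minimal sketch of what the email example might look like once translated. It is an illustration only: the `Email`, `Rule`, and `send_text` names, the `work.example.com` domain, and the byte threshold are all assumptions made for this example, and the structure is not LUInet's actual output format or Thingpedia's representation.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Email:
    # Hypothetical event payload; field names are assumptions.
    sender_domain: str
    attachment_type: str
    attachment_bytes: int


@dataclass
class Rule:
    """A 'whenever <event> matches <conditions>, do <action>' program."""
    conditions: List[Callable[[Email], bool]]
    action: Callable[[Email], None]

    def on_event(self, email: Email) -> bool:
        # Run the action only if every condition holds for this event.
        if all(condition(email) for condition in self.conditions):
            self.action(email)
            return True
        return False


TEN_MEGABYTES = 10 * 1000 * 1000  # assumes decimal megabytes

sent_texts: List[str] = []  # stands in for a real SMS service


def send_text(email: Email) -> None:
    sent_texts.append(f"Work email with a {email.attachment_bytes}-byte PDF")


# "Send me a text notification whenever I get an email from work
#  with a PDF attachment over ten megabytes."
rule = Rule(
    conditions=[
        lambda e: e.sender_domain == "work.example.com",
        lambda e: e.attachment_type == "pdf",
        lambda e: e.attachment_bytes > TEN_MEGABYTES,
    ],
    action=send_text,
)

rule.on_event(Email("work.example.com", "pdf", 12_000_000))  # matches, sends a text
rule.on_event(Email("work.example.com", "pdf", 2_000_000))   # attachment too small
```

The point of the sketch is that one sentence combines an event source (email), several filters, and an action from a different domain (text messaging), which is exactly the kind of composition a fixed list of single commands cannot express.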
Finally, the proprietary nature of today’s virtual assistants means their creators have total control over the data passing through them. That includes personal information, preferences and behavior, as well as hours upon hours of voice recordings.
OVAL is changing that with Almond, a complete virtual assistant with a unique focus on privacy and transparency. Not only can it access every function in Thingpedia and interpret complex commands thanks to LUInet, but it was also built from the ground up with privacy-preserving measures that let you explicitly control if, when and how data is shared. For example, a user can tell her Almond assistant, running on her own device, that “my father can see motion on my security device, but only if I am not home.” No third party sees any of the shared data.
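A sharing rule like the one above can be thought of as a small access-control policy that the owner's own device evaluates before releasing any data. The following is a rough sketch under that assumption; the `Request` and `Policy` types, the `security.motion` resource name, and the deny-by-default behavior are choices made for illustration, not a description of how Almond implements sharing.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Request:
    # Hypothetical description of who is asking for what, and the current context.
    requester: str
    resource: str
    owner_is_home: bool


@dataclass(frozen=True)
class Policy:
    requester: str
    resource: str
    condition: Callable[[Request], bool]

    def allows(self, request: Request) -> bool:
        # The policy applies only to its named requester and resource,
        # and only while its extra condition holds.
        return (
            request.requester == self.requester
            and request.resource == self.resource
            and self.condition(request)
        )


def is_allowed(policies: List[Policy], request: Request) -> bool:
    # Deny by default: data is shared only if some policy explicitly allows it.
    return any(policy.allows(request) for policy in policies)


# "My father can see motion on my security device, but only if I am not home."
policies = [
    Policy("father", "security.motion", lambda r: not r.owner_is_home),
]

away = Request("father", "security.motion", owner_is_home=False)
home = Request("father", "security.motion", owner_is_home=True)
print(is_allowed(policies, away))  # True
print(is_allowed(policies, home))  # False
```

Because the check runs where the data lives, the decision to share never has to pass through a third party.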
Almond provides a model for the flexibility we should expect from tomorrow’s assistants. For those willing to share their data with advertisers, cloud-based assistants available at little or no cost make perfect sense. For power users with an eye on privacy, locally run solutions can ensure personal data never leaves the device. Countless variations exist between these extremes. Globally, assistants hosted by different organizations should interoperate so that data can be shared without centralization, just as email does today. The prescription, therefore, is simple: more choice. After all, it won’t be long before our virtual assistants know us nearly as well as our human colleagues. Shouldn’t we have a say in where that knowledge goes?
The breakout success of assistants from companies like Amazon, Google and Apple is a testament to the power of natural language interfaces. But as big as these brands are, the potential of the linguistic web is even bigger. At OVAL, we envision a future in which this potential is accessible, interoperable, and above all, worthy of our trust.
Visit the open-source Almond project at https://almond.stanford.edu. We welcome all contributions, and all of our software is publicly available at https://github.com/stanford-oval. More information on the lab can be found at https://oval.cs.stanford.edu.
And stay tuned for the first Open Virtual Assistant Workshop, to be held on October 30, 2019, as part of the Stanford HAI Conference.