Will the Future of the Internet Be Voice? Proposing a World Wide Voice Web

Date: November 02, 2021
Topics: Natural Language Processing

Stanford AI researchers map out the technology for adding voice to the decentralized web that can be accessed openly by any virtual assistant.

The World Wide Web (WWW) and the WWW browser have permeated our lives and revolutionized how we get information and entertainment, how we socialize, and how we conduct business.

Using novel tools that make it easy and inexpensive to develop voice-based agents, researchers at Stanford are now proposing the creation of the World Wide Voice Web (WWvW), a new version of the World Wide Web that people will be able to navigate entirely by using voice.

About 90 million Americans already use smart speakers to stream music and news, as well as to carry out tasks like ordering groceries, scheduling appointments, and controlling their lights. But two companies essentially control these gateways to the voice web, at least in the United States: Amazon, which pioneered Alexa, and Google, which developed Google Assistant. In effect, the two services are walled gardens. This duopoly creates large imbalances that allow the technology owners to favor their own products over those of rival companies. They control which content to make available and what fees to charge for acting as intermediaries between companies and their customers. On top of all that, their proprietary smart speakers jeopardize privacy because they eavesdrop on conversations as long as they're plugged in.

The Stanford team, led by computer science Professor Monica Lam at the Stanford Open Virtual Assistant Laboratory (OVAL), has developed Genie, an open-source, privacy-preserving virtual assistant, along with cost-effective voice-agent development tools that offer an alternative to the proprietary platforms. The scholars also hosted a workshop on Nov. 10 to discuss their work and propose the design of the World Wide Voice Web.

What Is the WWvW?

Just like the World Wide Web, the new WWvW is decentralized. Organizations publish information about their voice agents on their websites, which are accessible by any virtual assistant. In WWvW, Lam says, the voice agents are like web pages, providing information about their services and applications, and the virtual assistant is the browser. These voice agents can also be made available as chatbots or call-center agents, making them accessible on the computer or over the phone as well.
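The architecture described above can be sketched in a few lines of Python. The manifest format and the `/.well-known/voice-agent.json` location below are hypothetical illustrations of the idea that any organization could publish its voice agent's capabilities on its own website for any assistant to read; the article does not specify a concrete format.

```python
import json

# Hypothetical manifest a restaurant site might publish at a well-known URL
# (e.g., https://example-restaurant.com/.well-known/voice-agent.json).
# The schema is illustrative, not a published WWvW standard.
MANIFEST = """
{
  "agent": "example-restaurant",
  "description": "Table reservations and menu questions",
  "commands": [
    {"name": "reserve_table", "params": ["date", "time", "party_size"]},
    {"name": "get_menu", "params": []}
  ]
}
"""

def load_agent(manifest_text: str) -> dict:
    """Parse a voice-agent manifest so any assistant can list its commands."""
    manifest = json.loads(manifest_text)
    return {cmd["name"]: cmd["params"] for cmd in manifest["commands"]}

commands = load_agent(MANIFEST)
print(sorted(commands))           # the commands this agent exposes
print(commands["reserve_table"])  # parameters the assistant must collect
```

Because the manifest lives on the organization's own site rather than inside a vendor's skill store, any browser-like assistant could discover and use it, which is the sense in which voice agents play the role of web pages.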

“WWvW has the potential to reach even more people than WWW, including those who are not technically savvy, those who don’t read and write well, or those who may not even speak a written language,” Lam says. For example, Stanford computer science Assistant Professor Chris Piech, with graduate students Moussa Doumbouya and Lisa Einstein, is working to develop voice technology for three African languages that could help bridge the gap between illiteracy and access to valuable resources, including agricultural information and medical care. “Unlike the commercial voice web spearheaded by Amazon and Google, which is only available in select markets and languages, the decentralized WWvW empowers society to provide voice information and services in every language and for every use, including education and other humanitarian causes which do not have big monetary returns,” Lam says.

Why have these tools not been created before? The Stanford team's answer: building voice technology is simply very hard. Amazon and Google have invested tremendous amounts of money and resources to provide the natural language processing technology behind their respective assistants, and they employ thousands of people to annotate the training data. “The technology development process has been expensive and extremely labor-intensive, creating a huge barrier to entry for anyone trying to offer commercial-grade smart voice assistants,” Lam says.

Unleashing Genie

Over the past six years, Lam has worked with Stanford PhD student Giovanni Campagna, computer science Professor James Landay, and Christopher Manning, professor of computer science and of linguistics, at OVAL to develop a new voice agent development methodology that is two orders of magnitude more sample-efficient than current solutions. The open-source Genie Pre-trained Agent Generator they created offers dramatic reductions in costs and resources in the development of voice agents in different languages.

Interoperability is key to ensuring that devices can interact with each other seamlessly, Lam notes. At the core of the Genie technology is ThingTalk, a distributed programming language the team created for virtual assistants. It enables interoperability among multiple virtual assistants, web services, and IoT devices. Stanford is offering the first course on ThingTalk, Conversational Virtual Assistants Using Deep Learning, this fall.
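The interoperability idea can be illustrated with a small sketch. This is Python, not actual ThingTalk syntax, and the capability name and vendor handlers are invented for the example: the point is that one high-level command is routed to every registered service or device that implements it, with no per-vendor code in the assistant.

```python
# Toy capability registry: vendors register handlers for a shared capability
# name, and the assistant invokes the capability without knowing the vendors.
class Registry:
    def __init__(self):
        self.handlers = {}

    def register(self, capability: str, handler):
        """A service or device declares that it implements a capability."""
        self.handlers.setdefault(capability, []).append(handler)

    def invoke(self, capability: str, **params):
        """Dispatch one command to every handler for that capability."""
        return [h(**params) for h in self.handlers.get(capability, [])]

registry = Registry()
# Two hypothetical smart-light vendors expose the same capability:
registry.register("light.on", lambda room: f"hue: {room} light on")
registry.register("light.on", lambda room: f"tplink: {room} light on")

print(registry.invoke("light.on", room="kitchen"))
```

A shared language like ThingTalk plays the role of the capability names here: devices and services from different vendors become interchangeable targets for the same command.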

As of today, Genie has pre-trained agents for the most popular voice skills, such as playing music, podcasts, and news, recommending restaurants, and setting reminders and timers, as well as support for over 700 IoT devices. These agents are openly available and can be applied to other, similar services.

World Wide Voice Web Conference

The OVAL team presented these concepts at a workshop focused on the World Wide Voice Web on Nov. 10.

The conference included speakers from academia and industry with expertise in machine learning, natural language processing, computer-human interaction, and IoT devices, and panelists discussed building a voice ecosystem, pretrained agents, and the social value of a voice web. The Stanford team also conducted a live demonstration of Genie.

“We want other people to join us in building the World Wide Voice Web,” says Lam, who is also a faculty member of the Stanford Institute for Human-Centered Artificial Intelligence. “The original World Wide Web grew slowly at the beginning, but once it caught on there was no stopping it. We hope to see the same with the World Wide Voice Web.”

Genie is an ongoing research project funded by the National Science Foundation, the Alfred P. Sloan Foundation, the Verdant Foundation, and Stanford HAI.

Contributor(s)
Edmund L. Andrews
