Will the future of the Internet be voice? Proposing a World Wide Voice Web

The World Wide Web (WWW) and the WWW browser have permeated our lives and revolutionized the way we get information and entertainment, socialize, and do business.

Using novel tools that make it easy and inexpensive to develop voice-based agents, Stanford researchers now propose the creation of the World Wide Voice Web (WWvW), a new version of the World Wide Web that people will be able to navigate entirely by voice.

Some 90 million Americans already use smart speakers to stream music and news, as well as perform tasks like ordering food, scheduling appointments, and controlling their lights. But two companies essentially control these voice web gateways, at least in the US: Amazon, which pioneered Alexa, and Google, which developed the Google Assistant. In effect, the two services are walled gardens. This duopoly creates large imbalances that allow the platform owners to favor their own products over those of rival companies. They control what content to make available and what fees to charge for acting as intermediaries between companies and their customers. On top of all that, their proprietary smart speakers endanger privacy because they can eavesdrop on conversations whenever they are plugged in.

The Stanford team, led by computer science professor Monica Lam at the Stanford Open Virtual Assistant Laboratory (OVAL), has developed an open-source, privacy-preserving virtual assistant called Genie, along with cost-effective voice agent development tools that can offer an alternative to proprietary platforms. The researchers also organized a workshop on November 10 in which they presented their work and proposed the design of the World Wide Voice Web.

What is the WWvW?

Like the World Wide Web, the new WWvW is decentralized. Organizations post information about their voice agents on their websites, where it can be accessed by any virtual assistant. In the WWvW, Lam says, voice agents are like web pages, providing information about an organization's services and applications, and the virtual assistant is the browser. These voice agents can also be made available as chatbots or call center agents, making them accessible on the computer or by phone as well.

“WWvW has the potential to reach even more people than WWW, including those who aren’t technically savvy, those who can’t read or write well, or those who don’t even speak a written language,” says Lam. For example, Stanford assistant professor of computer science Chris Piech, with graduate students Moussa Doumbouya and Lisa Einstein, is working to develop voice technology for three African languages that could help bridge the gap between illiteracy and access to valuable resources, including agricultural information and health care. “Unlike the commercial voice web spearheaded by Amazon and Google, which is only available in select markets and languages, the decentralized WWvW empowers society to deliver voice information and services in all languages and for all uses, including education and other humanitarian causes that don’t have big monetary returns,” says Lam.

Why haven’t such tools been created before? Because, the Stanford team says, voice technology is very difficult to build. Amazon and Google have invested huge amounts of money and resources to provide AI natural language processing technologies for their respective assistants, and they employ thousands of people to annotate the training data. “The technology development process has been expensive and extremely labor-intensive, creating a high barrier to entry for anyone trying to deliver commercial-grade intelligent voice assistants,” says Lam.

Unleashing the Genie

For the past six years, Lam has worked at OVAL with Stanford doctoral student Giovanni Campagna, computer science professor James Landay, and Christopher Manning, a professor of computer science and linguistics, to develop a new voice agent development methodology that is two orders of magnitude more sample-efficient than current solutions. The open-source Genie Pre-trained Agent Generator they created offers dramatic cost and resource reductions in developing voice agents in different languages.

Interoperability is a key component to ensuring devices can seamlessly interact with each other, Lam says. At the core of the Genie technology is a distributed programming language the team created for virtual assistants called ThingTalk. It enables the interoperability of multiple virtual assistants, web services, and IoT devices. Stanford is currently offering the first course on ThingTalk, Conversational Virtual Assistants Using Deep Learning, this autumn.
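To give a flavor of how ThingTalk connects services and devices, published ThingTalk examples compose an event stream with an action in a single declarative command. The sketch below is illustrative only: the `monitor ... => ...` pattern follows the style shown in the Genie and ThingTalk publications, but the device names and function parameters here are hypothetical stand-ins, not actual entries in the Thingpedia skill repository.

```thingtalk
// Hypothetical sketch: whenever a new article appears on a news
// service, read its headline aloud through a connected speaker.
// @com.example.news and @org.example.speaker are invented names.
monitor @com.example.news.article() => @org.example.speaker.say(message=title);
```

Because both the news service and the speaker are described by open skill definitions rather than locked to one vendor's platform, the same command could in principle run on any ThingTalk-capable assistant.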

Starting today, Genie has pre-trained agents for the most popular voice skills, such as playing music, podcasts, news, restaurant recommendations, reminders, and timers, as well as support for more than 700 IoT devices. These agents are openly available and can be adapted to other similar services.

World Wide Voice Web Workshop

The OVAL team presented these concepts at the workshop on the World Wide Voice Web on November 10.

The workshop featured speakers from academia and industry with expertise in machine learning, natural language processing, human-computer interaction, and IoT devices, and panelists discussed building a voice ecosystem, pre-trained agents, and the social value of a voice web. The Stanford team also gave a live demo of Genie.

“We want other people to join us in building the World Wide Voice Web,” says Lam, who is also a faculty member of the Stanford Human-Centered Artificial Intelligence Institute (HAI). “The original World Wide Web grew slowly at first, but once it became popular, there was no stopping it. We hope to see the same with the World Wide Voice Web.”

Genie is an ongoing research project funded by the National Science Foundation, the Alfred P. Sloan Foundation, the Verdant Foundation, and Stanford HAI.
