FAQ – Open Voice Interoperability

Frequently asked questions

Why Interoperability?

This is about access, opportunity, and commercial freedom in the emerging world of conversational artificial intelligence.

Today’s conversational assistants operate largely in isolation. When a user is talking to an assistant and needs information that it doesn’t have, the user is forced, in the best case, to open another assistant and start over. Sometimes even that is impossible, because the right assistant only works on a device the user doesn’t have.

This situation is in sharp contrast to the World Wide Web, where web browsers allow users to find, and freely visit, billions of web pages hosted by many different organizations. This is possible because web pages are based on standards that all web browsers understand.

The Open Voice Interoperability Initiative envisions a similar ecosystem of standards-based, collaborating conversational assistants. Assistants using these standards would allow users to easily switch between conversational assistants – and their language models – when they need different information, just as they do when navigating among web pages.

Will generative AI eliminate the need for interoperability?

No. In fact, we believe that it will increase the need for interoperability.

Generative AI is a type of artificial intelligence system that is trained on very large amounts of text and can understand, summarize, generate, and predict new content. Examples of generative AI systems include OpenAI’s ChatGPT and Google’s Gemini.

No one conversational assistant will do it all, and no one generative AI system will do it all. Generic systems do not know (and users don’t want them to know!) sensitive information such as your address, social security number, or salary. But a conversational assistant from your bank, university, or doctor WILL know your sensitive information!

Depending upon what they want to do, users will move back and forth between generic assistants that don’t know their sensitive information and domain- and brand-specific assistants that do. Users’ lives will be simplified if all assistants can interoperate.

Why will conversational assistant developers prefer a universal API rather than different APIs for each service?

Simplicity: Developers need to learn and maintain only a single universal interface rather than a separate API for each service.
Consistency: The single universal interface cannot be changed by any individual service provider.
Scalability: The single universal interface can handle a wider range of requests than any vendor-specific API.

What will we need to do to use the Open Voice Network universal API? Will we need to modify our conversational assistant in any way?

Existing conversational assistants will need to be able to interpret and generate the standard set of OVON messages whenever they need to communicate with other OVON-compliant assistants. The internal operations of existing assistants will not need to be modified.

What Open Voice Interoperability Initiative specifications have been published?

Conversation Envelope

The OVON Conversation Envelope is a universal JSON structure whose purpose is to allow human or automatic agents (assistants) to participate interoperably in a conversation. When coupled with a specific protocol, such as HTTPS, a dialog agent that can generate and send Conversation Envelopes is capable of interoperating with any other OVON-compliant agent, regardless of the technology or architecture on which that other agent is based.
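To make the idea concrete, here is a minimal Python sketch of building an envelope-like JSON structure and checking it on receipt. The key names used here (`ovon`, `conversation`, `sender`, `events`, `eventType`, `parameters`) and the example URL are illustrative assumptions, not taken from this FAQ; the published Conversation Envelope specification is the authority on the actual schema.

```python
import json
import uuid
from typing import Optional


def build_envelope(sender_url: str, events: list, conversation_id: Optional[str] = None) -> dict:
    """Wrap a list of events in an envelope-like structure (assumed key names)."""
    return {
        "ovon": {
            "conversation": {"id": conversation_id or str(uuid.uuid4())},
            "sender": {"from": sender_url},
            "events": events,
        }
    }


def parse_envelope(raw: str) -> dict:
    """Decode JSON text and check that the assumed top-level keys are present."""
    data = json.loads(raw)
    body = data.get("ovon")
    if not isinstance(body, dict):
        raise ValueError("missing top-level 'ovon' object")
    for key in ("conversation", "sender", "events"):
        if key not in body:
            raise ValueError(f"missing '{key}' in envelope")
    return body


envelope = build_envelope(
    "https://assistant.example.com",  # hypothetical sender address
    [{"eventType": "utterance", "parameters": {"text": "What is my balance?"}}],
    conversation_id="conv-001",
)
wire = json.dumps(envelope)       # the text that would travel, e.g. in an HTTPS request body
received = parse_envelope(wire)   # what a receiving agent might do first
print(received["sender"]["from"])
```

Because the envelope is plain JSON, the sending and receiving agents need to agree only on this structure and on the transport, not on each other's internal technology.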

Dialog Events

A dialog event is a generic, standardized data structure that can be used in any component of a dialog system to express a ‘language event’, that is, any features associated with a phrase, utterance, or part of an utterance. Each dialog event spans a certain time period and is associated with a single speaker.
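As a rough illustration, the Python sketch below models a dialog event with the three properties described above: one speaker, one time span, and a set of features. The class, its method names, and the serialized key names (`speakerId`, `span`, `features`, `mimeType`) are assumptions made for this example; the Dialog Events specification defines the real structure.

```python
from dataclasses import dataclass, field


@dataclass
class DialogEvent:
    """One 'language event': features of an utterance from a single speaker."""
    speaker_id: str   # exactly one speaker per event
    start_time: str   # ISO 8601 start of the time span
    end_time: str     # ISO 8601 end of the time span
    features: dict = field(default_factory=dict)

    def add_feature(self, name: str, mime_type: str, value: str) -> None:
        """Attach a named feature, such as the transcribed text, to the event."""
        self.features[name] = {"mimeType": mime_type, "value": value}

    def to_dict(self) -> dict:
        """Serialize to a JSON-friendly dictionary (assumed key names)."""
        return {
            "speakerId": self.speaker_id,
            "span": {"startTime": self.start_time, "endTime": self.end_time},
            "features": self.features,
        }


event = DialogEvent(
    speaker_id="user-1",
    start_time="2024-01-01T12:00:00+00:00",
    end_time="2024-01-01T12:00:02+00:00",
)
event.add_feature("text", "text/plain", "What is my balance?")
event_dict = event.to_dict()
print(event_dict["features"]["text"]["value"])
```

The same structure could carry other features of the same utterance, for example an audio reference alongside the text, without changing the speaker or time-span fields.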

Assistant Manifest

The Assistant Manifest is a structured description of the key characteristics and capabilities of a conversational assistant that is associated with a unique serviceEndpoint. The manifest can be thought of as the curriculum vitae of the conversational agent and a public record of the services that it offers. It can be used, for example, by other agents or users to decide whether to invite a particular agent to join a conversation. In this regard, it is particularly relevant to discovery agents, which help other agents find assistants that can carry out certain tasks for them.

The Assistant Manifest is intended to be used as a component of other specifications; it is not a stand-alone document.
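The discovery use case can be sketched in a few lines of Python. Only `serviceEndpoint` comes from the description above; the `name` and `capabilities` fields, the example URLs, and the `find_endpoints` helper are hypothetical and stand in for whatever the Assistant Manifest specification actually defines.

```python
# Hypothetical manifests; only "serviceEndpoint" is named in the text above.
MANIFESTS = [
    {
        "serviceEndpoint": "https://bank.example.com/assistant",
        "name": "Example Bank Assistant",
        "capabilities": ["account balance", "money transfer"],
    },
    {
        "serviceEndpoint": "https://clinic.example.com/assistant",
        "name": "Example Clinic Assistant",
        "capabilities": ["appointment booking", "prescription refill"],
    },
]


def find_endpoints(manifests: list, needed: str) -> list:
    """Return the endpoints of assistants whose capabilities mention `needed`."""
    wanted = needed.lower()
    return [
        m["serviceEndpoint"]
        for m in manifests
        if any(wanted in capability.lower() for capability in m["capabilities"])
    ]


matches = find_endpoints(MANIFESTS, "appointment")
print(matches)
```

A real discovery agent would likely use richer matching than a substring test, but the principle is the same: the manifest is what lets one agent decide whether another is worth inviting into the conversation.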

What about proprietary protocols?

Adopting these standard patterns and messages is not expected to prevent existing conversational assistants from continuing to support native or proprietary communication protocols in addition to the standards described above. The standard patterns and messages can coexist with proprietary or legacy protocols as needed in any given system. Systems that need to communicate with each other using proprietary protocols can still do so, and when they need to communicate with standards-based systems, they can use the standards with those systems.

What will we need to do to use the Open Voice specifications? Will we need to modify our conversational assistant in any way?

Existing conversational assistants will need to be able to interpret and generate the standard set of interoperable messages whenever they need to communicate with other interoperable assistants. The internal operations of existing assistants will not need to be modified.
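One way to picture this is a thin adapter placed in front of an existing assistant. The Python sketch below assumes a stand-in function, `legacy_assistant`, that takes and returns plain text, and an envelope layout whose key names (`ovon`, `conversation`, `sender`, `events`, `eventType`, `parameters`) are illustrative assumptions rather than the published schema. The adapter translates messages in and out and leaves the assistant itself unchanged.

```python
def legacy_assistant(text: str) -> str:
    """Stand-in for an existing assistant; its internals are not modified."""
    return f"You said: {text}"


def handle_envelope(envelope: dict, my_url: str) -> dict:
    """Interpret an incoming message, call the assistant, and wrap the reply."""
    body = envelope["ovon"]
    # Pull the plain text out of each utterance event (assumed layout).
    texts = [
        e["parameters"]["text"]
        for e in body["events"]
        if e.get("eventType") == "utterance"
    ]
    replies = [legacy_assistant(t) for t in texts]
    return {
        "ovon": {
            "conversation": body["conversation"],  # reply within the same conversation
            "sender": {"from": my_url},
            "events": [
                {"eventType": "utterance", "parameters": {"text": r}}
                for r in replies
            ],
        }
    }


incoming = {
    "ovon": {
        "conversation": {"id": "conv-001"},
        "sender": {"from": "https://other.example.com/assistant"},
        "events": [{"eventType": "utterance", "parameters": {"text": "hello"}}],
    }
}
outgoing = handle_envelope(incoming, "https://bank.example.com/assistant")
print(outgoing["ovon"]["events"][0]["parameters"]["text"])
```

All interoperability logic lives in `handle_envelope`; `legacy_assistant` never sees an envelope, which is the sense in which internal operations do not need to change.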