Advanced Voice Mode (beta)
Your guide to voice chats in BoltAI
Voice conversations allow you to have a spoken conversation with GPT-4o using OpenAI's Realtime API. You can ask questions or hold discussions through voice input and receive spoken responses from GPT-4o.
BoltAI supports both OpenAI and Azure OpenAI Service.
Advanced Voice Mode (AVM) relies on OpenAI's Realtime API and requires a valid OpenAI API key, or a valid Azure OpenAI deployment.
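Under the hood, AVM opens a WebSocket connection to the Realtime API. The sketch below (in Swift, with a hypothetical environment variable holding the key) shows roughly what that handshake looks like; BoltAI does all of this for you.

```swift
import Foundation

// Minimal sketch of opening a Realtime API session, assuming the standard
// OpenAI WebSocket endpoint and beta header. OPENAI_API_KEY is a placeholder.
let apiKey = ProcessInfo.processInfo.environment["OPENAI_API_KEY"] ?? ""
let url = URL(string: "wss://api.openai.com/v1/realtime?model=gpt-4o-realtime-preview-2024-10-01")!

var request = URLRequest(url: url)
request.setValue("Bearer \(apiKey)", forHTTPHeaderField: "Authorization")
request.setValue("realtime=v1", forHTTPHeaderField: "OpenAI-Beta")

let socket = URLSession.shared.webSocketTask(with: request)
socket.resume()

// The server speaks a JSON event protocol; the first event should be "session.created".
socket.receive { result in
    if case .success(.string(let event)) = result {
        print("Server event:", event)
    }
}
```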
To set up AVM with Azure OpenAI Service, follow the guide below.
To start a voice conversation, select the Voice icon on the bottom right of the chat window.
You will be taken to a screen with an animated orb in the center.
Click Start
Grant BoltAI the microphone permission
Start your conversation
Tweak AVM System Prompt:
BoltAI automatically uses your conversation's System Prompt for the voice conversation, so you can reuse your already-defined AI Assistants.
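For reference, this works because the Realtime API accepts the system prompt as the instructions field of a session.update event. A minimal sketch, reusing the socket from the example above:

```swift
// The system prompt travels as the "instructions" field of a session.update
// event; an existing AI Assistant's prompt can be passed through verbatim.
let sessionUpdate: [String: Any] = [
    "type": "session.update",
    "session": ["instructions": "You are a helpful assistant. Answer concisely."]
]
let payload = String(data: try JSONSerialization.data(withJSONObject: sessionUpdate), encoding: .utf8)!
socket.send(.string(payload)) { error in
    if let error { print("session.update failed:", error) }
}
```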
Change AVM assistant's voice:
To change the assistant's voice, click the gear button at the top right of the AVM screen. You can choose one of the following voices: Shimmer (default), Alloy, or Echo.
Don't forget to reconnect after updating.
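A reconnect is needed because the Realtime API only appears to honor a voice change before the assistant has produced audio in the session. Continuing the sketch above, the voice travels in the same session configuration:

```swift
// The voice is part of the same session configuration; changing it once the
// assistant has already spoken is rejected, hence the reconnect.
let voiceUpdate: [String: Any] = [
    "type": "session.update",
    "session": ["voice": "alloy"]  // "shimmer" (default), "alloy" or "echo"
]
let voicePayload = String(data: try JSONSerialization.data(withJSONObject: voiceUpdate), encoding: .utf8)!
socket.send(.string(voicePayload)) { _ in }
```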
Show realtime cost:
The Realtime API can be expensive when used for long sessions. BoltAI automatically calculates the cost and updates the conversation. In the AVM settings dialog, you can choose to show the estimated cost in real time.
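For a rough sense of how such an estimate can be derived: each response reports its token usage, and multiplying by the published rates gives a cost. A sketch, assuming the launch prices for gpt-4o-realtime-preview (check OpenAI's pricing page for current rates):

```swift
// Illustrative cost estimator. Assumed rates (USD per 1M tokens):
// text $5 in / $20 out, audio $100 in / $200 out. These may change.
struct RealtimeUsage {
    var textInputTokens = 0
    var textOutputTokens = 0
    var audioInputTokens = 0
    var audioOutputTokens = 0
}

func estimatedCostUSD(_ u: RealtimeUsage) -> Double {
    let m = 1_000_000.0
    return Double(u.textInputTokens)   / m * 5.0
         + Double(u.textOutputTokens)  / m * 20.0
         + Double(u.audioInputTokens)  / m * 100.0
         + Double(u.audioOutputTokens) / m * 200.0
}

// Example with made-up numbers: 6,000 audio tokens in and 12,000 out = $3.00.
print(estimatedCostUSD(RealtimeUsage(audioInputTokens: 6_000, audioOutputTokens: 12_000)))
```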
First, make sure you have your deployment ready. Follow this official guide from Azure to deploy yours. Once deployed, you should have your API Endpoint and API Key.
In BoltAI, go to Settings > Advanced > Advanced Voice Mode, check "Use Azure OpenAI Service" and fill in the form.
Make sure the API endpoint and key are correct; BoltAI won't verify the configuration.
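For reference, the Azure connection differs from the OpenAI one mainly in the URL shape and the auth header. A sketch, with placeholder resource and deployment names and an api-version that may differ for your deployment:

```swift
import Foundation

// Hypothetical mapping of the Azure form fields onto a Realtime connection.
// Resource, deployment and api-version below are placeholders; check the
// Azure OpenAI documentation for the api-version your deployment expects.
let resource = "my-resource"          // from your API Endpoint
let deployment = "my-gpt4o-realtime"  // your deployment name
let azureKey = "<your-api-key>"

var azureRequest = URLRequest(url: URL(string:
    "wss://\(resource).openai.azure.com/openai/realtime" +
    "?api-version=2024-10-01-preview&deployment=\(deployment)")!)
// Azure authenticates with an "api-key" header instead of a Bearer token.
azureRequest.setValue(azureKey, forHTTPHeaderField: "api-key")
```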
Known Issue: Usage data seems to be incorrect for Azure OpenAI Service.
The Realtime API currently sets a 15-minute limit on session time for WebSocket connections; after this limit, the server disconnects. The limit refers to the wall-clock time of the session connection, not the length of input or output audio.
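A client therefore has to watch for the server-side close and reconnect. A minimal sketch of that pattern (illustrative names, not BoltAI's actual implementation):

```swift
import Foundation

// Keep reading events until the socket fails (e.g. the 15-minute wall-clock
// limit expires), then hand control to a reconnect callback.
func listen(on socket: URLSessionWebSocketTask, reconnect: @escaping () -> Void) {
    socket.receive { result in
        switch result {
        case .failure:
            reconnect()                              // session expired or dropped
        case .success:
            listen(on: socket, reconnect: reconnect) // keep reading events
        }
    }
}
```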
Not yet. BoltAI currently uses gpt-4o-realtime-preview-2024-10-01. Note that the LLM model of the current chat configuration has no effect in AVM.
Yes. After the conversation finishes, you can find the full transcript of the conversation in the current active chat.
Please check your internet connection and make sure your API account has enough credits. If the issue persists, please file a bug report at https://boltai.com/ideas
Due to macOS privacy policies around embedded scripts, the OS might ask for the microphone permission every time.