Advanced Voice Mode (beta)
Your guide to voice chats in BoltAI
Voice conversations allow you to have a spoken conversation with GPT-4o using OpenAI's Realtime API. You can ask questions or hold discussions through voice input and receive spoken responses from GPT-4o.
BoltAI supports both OpenAI and Azure OpenAI Service.
Advanced Voice Mode (AVM) relies on OpenAI's Realtime API and requires a valid OpenAI API key, or a valid Azure OpenAI deployment.
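Under the hood, AVM opens a WebSocket connection to the Realtime API. The sketch below (in Swift, with a hypothetical environment variable holding the key) shows roughly what that handshake looks like; BoltAI does all of this for you.

```swift
import Foundation

// Minimal sketch of opening a Realtime API session, assuming the standard
// OpenAI WebSocket endpoint and beta header. OPENAI_API_KEY is a placeholder.
let apiKey = ProcessInfo.processInfo.environment["OPENAI_API_KEY"] ?? ""
let url = URL(string: "wss://api.openai.com/v1/realtime?model=gpt-4o-realtime-preview-2024-10-01")!

var request = URLRequest(url: url)
request.setValue("Bearer \(apiKey)", forHTTPHeaderField: "Authorization")
request.setValue("realtime=v1", forHTTPHeaderField: "OpenAI-Beta")

let socket = URLSession.shared.webSocketTask(with: request)
socket.resume()

// The server speaks a JSON event protocol; the first event should be "session.created".
socket.receive { result in
    if case .success(.string(let event)) = result {
        print("Server event:", event)
    }
}
```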
To set up AVM with Azure OpenAI Service, follow the guide below.
To start a voice conversation, select the Voice icon on the bottom right of the chat window.
You will be taken to a screen with an animated orb in the center.
Click Start
Grant BoltAI the microphone permission
Start your conversation
Tweak AVM System Prompt:
BoltAI automatically uses your conversation's System Prompt for the voice conversation, so you can reuse your already-defined AI Assistants.
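For reference, this works because the Realtime API accepts the system prompt as the instructions field of a session.update event. A minimal sketch, reusing the socket from the example above:

```swift
// The system prompt travels as the "instructions" field of a session.update
// event; an existing AI Assistant's prompt can be passed through verbatim.
let sessionUpdate: [String: Any] = [
    "type": "session.update",
    "session": ["instructions": "You are a helpful assistant. Answer concisely."]
]
let payload = String(data: try JSONSerialization.data(withJSONObject: sessionUpdate), encoding: .utf8)!
socket.send(.string(payload)) { error in
    if let error { print("session.update failed:", error) }
}
```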
Change AVM assistant's voice:
To change the assistant's voice, click the gear button at the top right of the AVM screen. You can choose one of the following voices: Shimmer (default), Alloy, or Echo.
Don't forget to reconnect after updating.
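A reconnect is needed because the Realtime API only appears to honor a voice change before the assistant has produced audio in the session. Continuing the sketch above, the voice travels in the same session configuration:

```swift
// The voice is part of the same session configuration; changing it once the
// assistant has already spoken is rejected, hence the reconnect.
let voiceUpdate: [String: Any] = [
    "type": "session.update",
    "session": ["voice": "alloy"]  // "shimmer" (default), "alloy" or "echo"
]
let voicePayload = String(data: try JSONSerialization.data(withJSONObject: voiceUpdate), encoding: .utf8)!
socket.send(.string(voicePayload)) { _ in }
```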
Show realtime cost:
The Realtime API can be expensive when used for long sessions. BoltAI automatically calculates the cost and updates the conversation. In the AVM settings dialog, you can choose to show the estimated cost in real time.
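For a rough sense of how such an estimate can be derived: each response reports its token usage, and multiplying by the published rates gives a cost. A sketch, assuming the launch prices for gpt-4o-realtime-preview (check OpenAI's pricing page for current rates):

```swift
// Illustrative cost estimator. Assumed rates (USD per 1M tokens):
// text $5 in / $20 out, audio $100 in / $200 out. These may change.
struct RealtimeUsage {
    var textInputTokens = 0
    var textOutputTokens = 0
    var audioInputTokens = 0
    var audioOutputTokens = 0
}

func estimatedCostUSD(_ u: RealtimeUsage) -> Double {
    let m = 1_000_000.0
    return Double(u.textInputTokens)   / m * 5.0
         + Double(u.textOutputTokens)  / m * 20.0
         + Double(u.audioInputTokens)  / m * 100.0
         + Double(u.audioOutputTokens) / m * 200.0
}

// Example with made-up numbers: 6,000 audio tokens in and 12,000 out = $3.00.
print(estimatedCostUSD(RealtimeUsage(audioInputTokens: 6_000, audioOutputTokens: 12_000)))
```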
First, make sure you have your deployment ready. Follow this official guide from Azure to deploy yours. Once deployed, you should have your API Endpoint and API Key.
In BoltAI, go to Settings > Advanced > Advanced Voice Mode, check "Use Azure OpenAI Service" and fill in the form.
Make sure the API endpoint and key are correct; BoltAI won't verify the configuration.
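For reference, the Azure connection differs from the OpenAI one mainly in the URL shape and the auth header. A sketch, with placeholder resource and deployment names and an api-version that may differ for your deployment:

```swift
import Foundation

// Hypothetical mapping of the Azure form fields onto a Realtime connection.
// Resource, deployment and api-version below are placeholders; check the
// Azure OpenAI documentation for the api-version your deployment expects.
let resource = "my-resource"          // from your API Endpoint
let deployment = "my-gpt4o-realtime"  // your deployment name
let azureKey = "<your-api-key>"

var azureRequest = URLRequest(url: URL(string:
    "wss://\(resource).openai.azure.com/openai/realtime" +
    "?api-version=2024-10-01-preview&deployment=\(deployment)")!)
// Azure authenticates with an "api-key" header instead of a Bearer token.
azureRequest.setValue(azureKey, forHTTPHeaderField: "api-key")
```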
Known Issue: Usage data seems to be incorrect for Azure OpenAI Service.
The Realtime API currently sets a 15-minute limit on session time for WebSocket connections; after this limit, the server disconnects. The limit refers to the wall-clock time of the session connection, not the length of input or output audio.
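A client therefore has to watch for the server-side close and reconnect. A minimal sketch of that pattern (illustrative names, not BoltAI's actual implementation):

```swift
import Foundation

// Keep reading events until the socket fails (e.g. the 15-minute wall-clock
// limit expires), then hand control to a reconnect callback.
func listen(on socket: URLSessionWebSocketTask, reconnect: @escaping () -> Void) {
    socket.receive { result in
        switch result {
        case .failure:
            reconnect()                              // session expired or dropped
        case .success:
            listen(on: socket, reconnect: reconnect) // keep reading events
        }
    }
}
```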
Not yet. BoltAI currently uses gpt-4o-realtime-preview-2024-10-01. Note that the LLM model of the current chat configuration has no effect in AVM.
Yes. After the conversation finishes, you can find the full transcript of the conversation in the current active chat.
Please check your internet connection and make sure your API account has enough credits. If the issue persists, please file a bug report at https://boltai.com/ideas
Due to macOS privacy policies around embedded scripts, the OS might ask for the microphone permission every time.