Whisper

The Whisper plugin lets you transcribe audio files using OpenAI's Whisper model.

What is Whisper?

Whisper is a general-purpose speech recognition model. It is trained on a large dataset of diverse audio and is also a multitasking model that can perform multilingual speech recognition, speech translation, and language identification.

How to set up the Whisper plugin?

Go to Settings > Plugins > Whisper. Select the Settings tab, then enter your OpenAI API key.

How to use the Whisper plugin?

To use the Whisper plugin, make sure you've set up the API key. Then:

  1. Start a new chat and choose an LLM that supports Function Calling (for example, GPT-4o)

  2. Enable the Whisper plugin

  3. Drag the audio file to the chat input field (not the chat list) and tell the LLM to transcribe it

Audio file input is limited to a maximum of 25 MB. You may want to downsample the file before sending it for transcription.

To do this, enable the ffmpeg plugin so the audio file is downsampled before it is sent to the OpenAI server.
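The ffmpeg plugin handles this step inside the chat, but for reference, downsampling amounts to re-encoding the audio at a lower sample rate and bitrate. Below is a minimal sketch in Python that shells out to ffmpeg (it assumes ffmpeg is installed; input.wav and output.mp3 are placeholder file names, not names the plugin uses):

    import subprocess

    # Re-encode to 16 kHz mono MP3 at a low bitrate. Speech recognition works
    # fine on such audio, and the result is usually well under the 25 MB limit.
    subprocess.run(
        [
            "ffmpeg",
            "-i", "input.wav",   # placeholder input file
            "-ar", "16000",      # resample to 16 kHz
            "-ac", "1",          # mix down to mono
            "-b:a", "32k",       # low audio bitrate, adequate for speech
            "output.mp3",        # placeholder output file
        ],
        check=True,
    )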

FAQ

  1. Can I use this offline? No. This plugin uses the OpenAI API and requires an Internet connection and a paid OpenAI API account.

  2. Which Whisper model does it use? According to OpenAI, "The Audio API provides two speech to text endpoints, transcriptions and translations, based on our state-of-the-art open source large-v2 Whisper model." [source] To use the better large-v3 model, use the Whisper via Groq plugin. (A sketch of calling the transcriptions endpoint directly is shown after this list.)

  3. What are the limitations? According to OpenAI, "File uploads are currently limited to 25 MB and the following input file types are supported: mp3, mp4, mpeg, mpga, m4a, wav, and webm." [source]
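The plugin makes the API call for you, but as a rough illustration, a direct call to OpenAI's transcriptions endpoint with the official openai Python SDK looks like the sketch below. The file name meeting.mp3 is a placeholder, and the plugin's actual implementation may differ.

    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    # Send the audio file to the transcriptions endpoint. "whisper-1" is
    # OpenAI's hosted Whisper model (large-v2 at the time of writing).
    with open("meeting.mp3", "rb") as audio_file:
        transcript = client.audio.transcriptions.create(
            model="whisper-1",
            file=audio_file,
        )

    print(transcript.text)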
