Whisper
OpenAI's open-source speech-to-text model that converts audio to text in 99 languages, with the highest accuracy in widely spoken ones.
What is Whisper?
Whisper is OpenAI's open-source automatic speech recognition (ASR) model that transcribes audio into text.
It was trained on 680,000 hours of multilingual audio collected from the web, which makes it robust to accents, background noise, and technical vocabulary that trip up many transcription tools.
Builders call it through the OpenAI API, run it locally in Python, or build it into apps, including near-real-time transcription features. It supports two tasks: transcription (audio to text in the same language) and translation (audio in a supported language to English text).
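A minimal sketch of the two usage paths, assuming the `openai-whisper` package (with ffmpeg installed) for local runs and the official `openai` Python SDK with an `OPENAI_API_KEY` environment variable for the hosted API. The function names, the `"base"` checkpoint, and the `"whisper-1"` model ID are illustrative choices, not taken from this page. The `translate` flag switches between transcription and translation to English.

```python
def transcribe_local(path: str, translate: bool = False, model_name: str = "base") -> str:
    """Run Whisper on your own machine with the open-source `openai-whisper` package."""
    import whisper  # assumption: installed via `pip install openai-whisper`, with ffmpeg on PATH

    model = whisper.load_model(model_name)  # downloads the checkpoint on first use
    # task="transcribe" keeps the spoken language; task="translate" outputs English text
    result = model.transcribe(path, task="translate" if translate else "transcribe")
    return result["text"]


def transcribe_api(path: str, translate: bool = False) -> str:
    """Send an audio file to OpenAI's hosted Whisper model via the official Python SDK."""
    from openai import OpenAI  # assumption: `pip install openai`; reads OPENAI_API_KEY from the environment

    client = OpenAI()
    # Transcription and translation are separate API endpoints
    endpoint = client.audio.translations if translate else client.audio.transcriptions
    with open(path, "rb") as audio_file:
        response = endpoint.create(model="whisper-1", file=audio_file)  # "whisper-1": assumed model ID
    return response.text
```

For example, `transcribe_api("meeting.mp3")` would return the spoken text in its original language, while `transcribe_local("interview.mp3", translate=True)` would return an English translation (both file names are hypothetical). The imports sit inside each function, so you only need the dependency for the path you actually use.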
The API costs $0.006 per minute of audio. The model is fully open-source on GitHub, so you can run it locally for free if you have the compute.
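Because API usage is priced per minute of audio, estimating a bill is simple arithmetic. A small helper, assuming the $0.006/minute rate quoted above and a cost that scales linearly with duration (the helper name is hypothetical):

```python
# Rate quoted on this page; OpenAI's pricing may change, so treat it as an assumption.
PRICE_PER_MINUTE_USD = 0.006


def estimate_api_cost_usd(audio_seconds: float) -> float:
    """Estimate the Whisper API cost for a clip, assuming cost scales linearly with duration."""
    minutes = audio_seconds / 60
    return minutes * PRICE_PER_MINUTE_USD


# Example: a one-hour podcast episode
print(f"${estimate_api_cost_usd(3600):.2f}")  # prints $0.36
```

At that rate, 100 hours of audio would cost about $36, which is the figure to weigh against the hardware needed to run the model locally.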