Apple is making clear gains in transcribing live audio and recordings in its upcoming operating system versions. Several tests have now compared its performance with other common speech recognition models. The results are mixed: Apple’s new API, which ships in iOS 26, iPadOS 26 and macOS 26 Tahoe, is significantly faster than, for example, the widely used Whisper model from OpenAI. In accuracy, however, there is still room for improvement.
The Apple news blog MacStories tried out the improved Speech framework with a 34-minute video file. For the transcription, the testers used a tool called Yap, which builds on Apple’s APIs and is available on GitHub. It completed the task in just 45 seconds, while the popular MacWhisper tool with its large models took between 1:41 minutes and 3:55 minutes.
How the models are compared
The news site 9to5Mac pitted Apple against Nvidia Parakeet, which is considered very fast, and against OpenAI Whisper Large V3 Turbo. The test machine was a MacBook Pro with an M2 Pro and 16 GB of unified memory. While Parakeet processed the 7:31-minute audio file in 2 seconds, Apple’s transcription needed 9 seconds. The OpenAI model finished only after 40 seconds. The longer the audio file, the wider the gap between the models.
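To put these timings on a common scale, they can be expressed as a multiple of real time, that is, audio duration divided by processing time. The following is a minimal sketch using only the figures quoted above; the helper names `to_seconds` and `speed_factor` are illustrative and do not come from the 9to5Mac test.

```python
def to_seconds(mmss: str) -> int:
    """Convert a 'minutes:seconds' string such as '7:31' to seconds."""
    minutes, seconds = mmss.split(":")
    return int(minutes) * 60 + int(seconds)


def speed_factor(audio_seconds: float, processing_seconds: float) -> float:
    """Return how many seconds of audio are processed per second of compute."""
    if processing_seconds <= 0:
        raise ValueError("processing time must be positive")
    return audio_seconds / processing_seconds


# Length of the test file as reported in the article.
AUDIO_SECONDS = to_seconds("7:31")

# Processing times in seconds as reported in the article.
timings = {
    "Nvidia Parakeet": 2,
    "Apple": 9,
    "OpenAI Whisper Large V3 Turbo": 40,
}

for name, seconds in timings.items():
    factor = speed_factor(AUDIO_SECONDS, seconds)
    print(f"{name}: about {factor:.0f}x real time")
```

On these numbers, Parakeet runs at roughly 225 times real time, Apple at about 50 times and Whisper at about 11 times.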
But Whisper’s slowness paid off in accuracy. The test distinguished between character errors (character error rate, CER) and word errors (word error rate, WER). On average, Whisper Large V3 Turbo proved to be the most precise solution, with a CER of 0.3 percent and a WER of 1 percent. Apple averaged 3 percent for characters and 8 percent for words. Parakeet trails significantly with a CER of 7 percent and a WER of 12 percent.
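The article does not say how the testers computed these rates. Conventionally, both are defined as the edit distance (substitutions, insertions and deletions) between the reference transcript and the model output, divided by the length of the reference, counted in words for WER and in characters for CER. The following sketch assumes that conventional definition and skips text normalization such as lowercasing or punctuation removal; the function names are illustrative.

```python
from typing import Sequence


def edit_distance(ref: Sequence, hyp: Sequence) -> int:
    """Levenshtein distance between two sequences (row-by-row dynamic programming)."""
    prev = list(range(len(hyp) + 1))
    for i, ref_item in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_item in enumerate(hyp, start=1):
            cost = 0 if ref_item == hyp_item else 1
            curr.append(min(
                prev[j] + 1,         # deletion from the reference
                curr[j - 1] + 1,     # insertion into the reference
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance over reference word count."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: character-level edit distance over reference length."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


reference = "the quick brown fox"
hypothesis = "the quick brown box"
print(f"WER: {wer(reference, hypothesis):.2%}")
print(f"CER: {cer(reference, hypothesis):.2%}")
```

In this toy example a single wrong letter spoils one word out of four but only one character out of nineteen, which is why a model's CER is typically lower than its WER, as in the figures above.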
What Apple’s API is recommended for
Overall, Apple’s transcription promises a clear speed advantage over Whisper and makes fewer mistakes than the Nvidia model. The testers conclude that the choice of model depends primarily on the application. Apple’s model is recommended for time-critical applications such as live captions or the rough transcription of longer content for indexing. Whisper is ahead when only minimal post-processing is desired or in applications where accuracy matters most.