When Apple Intelligence first took shape in spring 2024, research publications by Apple engineers were the earliest harbingers. In retrospect, Apple's focus on its own AI models was already evident at the time, including the effort to run some of those models locally on the iPhone and other devices. If history repeats itself this year, a significantly better model can be expected for the iPhone maker's AI features, one that can recognize and process images locally on the device. A newly published research paper on FastVLM (Fast Vision Language Models) reveals the first details.
FastVLM is characterized above all by higher speed, according to the paper; Apple also reports on it in its Machine Learning Research blog. The FastVLM-0.5B variant is 85 times faster than LLaVA-OneVision, and the 7B variant is 7.9 times as fast as Cambrian-1-8B with comparable accuracy. In addition, the model is very small and can run locally on Apple devices, which keeps users independent of the cloud and lets the model meet high data-protection standards. That fits well with the direction Apple Intelligence has taken so far.
New encoder for high-resolution images
The basis for the faster image processing is the new FastViTHD vision encoder, which processes high-resolution images more efficiently than other models, without requiring the images to be scaled down beforehand. The encoder also produces significantly fewer visual tokens, and less training data was needed for the model.
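Why fewer visual tokens speed things up can be made concrete with a back-of-envelope model: the time until the first word of a description appears is roughly the encoder latency plus the cost of prefilling the language model with the encoder's visual tokens. The Swift sketch below illustrates this relationship; all numbers in it are hypothetical and are not Apple's measurements.

```swift
// Illustrative only: assumed per-token prefill cost for the language model, in ms.
let perTokenPrefillMs = 0.5

// Time to first token ≈ encoder latency + (visual tokens × prefill cost per token).
func timeToFirstToken(encoderMs: Double, visualTokens: Int) -> Double {
    encoderMs + Double(visualTokens) * perTokenPrefillMs
}

// A conventional high-resolution encoder emits many visual tokens.
print(timeToFirstToken(encoderMs: 120, visualTokens: 2048))  // 1144.0 ms
// An encoder that emits far fewer tokens, as FastViTHD is reported to do.
print(timeToFirstToken(encoderMs: 120, visualTokens: 256))   // 248.0 ms
```

Even with an identical encoder latency, cutting the token count by a factor of eight shrinks the wait before the first output dramatically, which is the effect the paper's speed comparisons describe.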
For users, the faster processing means that text descriptions of images can be generated much more quickly and that previous waiting times are eliminated. Potential uses include document analysis (OCR), accessibility features, and visual search in photo libraries.
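What such a local document-analysis workflow looks like today can be sketched with Apple's existing Vision framework; FastVLM is a research model, so the snippet below only illustrates the kind of on-device OCR task a faster vision model could accelerate, and the file path is a placeholder.

```swift
import Vision

// Recognize printed or handwritten text in an image, entirely on the device.
let request = VNRecognizeTextRequest { request, _ in
    guard let observations = request.results as? [VNRecognizedTextObservation] else { return }
    for observation in observations {
        // Take the most likely transcription for each detected text region.
        if let best = observation.topCandidates(1).first {
            print(best.string)
        }
    }
}
request.recognitionLevel = .accurate  // favor accuracy over speed

let handler = VNImageRequestHandler(url: URL(fileURLWithPath: "/path/to/document.png"))
try? handler.perform([request])
```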
Examples of use
In three examples, the Apple researchers show what the model can do and how quickly it works. In one test case, the model counts the number of fingers a hand holds up in a video. In another, a notepad is flipped through quickly and the handwritten notes on its pages are recognized in real time. In the third example, the AI describes an emoji shown to it.
Apple already uses image recognition in various parts of its operating systems and apps, including Visual Intelligence for object recognition and visual search in the Photos app. With the new model, these functions are likely to become faster and better. Further applications are also conceivable, such as adding image descriptions in the Mail app or an assistant in the Camera app.
Developer conference in June
Whether the new model will make it into iOS 19 should become clear on June 9, when Apple opens its WWDC developer conference with a keynote presenting iOS 19 and the other new versions of its operating systems.