Microsoft computers

9/28/2023

Microsoft being the profit-driven business that it is, that is, no doubt, a plus. The second reason is, multimodal models tend to be more efficient from a computational standpoint - leading to speedups in processing and (presumably) cost reductions on the backend. For example, an AI assistant that understands images, pricing data and purchasing history is likely to offer better-personalized product suggestions than one that only understands pricing data. Why not string several “unimodal” models together to achieve the same end, like a model that understands only images and another that understands exclusively language? A few reasons, the first being that multimodal models in some cases perform better at the same task than their unimodal counterpart thanks to the contextual information from the additional modalities. Naturally, multimodal models - models that, once again, understand multiple modalities, such as language and images or videos and audio - are able to perform tasks in one shot that unimodal models simply cannot (e.g. The AI research community, which includes tech giants like Microsoft, have increasingly coalesced around the idea that multimodal models are the best path forward to more capable AI systems.

“Ask Florence to find a particular frame in a video, and it can do that ask it to tell the difference between a Cosmic Crisp apple and a Honeycrisp apple, and it can do that.” As a result, it’s incredibly versatile,” John Montgomery, CVP of Azure AI, told TechCrunch in an email interview.

“Florence is trained on billions of image-text pairs. The Florence-powered Microsoft Vision Services launches today in preview for existing Azure customers, with capabilities ranging from automatic captioning, background removal and video summarization to image retrieval. Now, as a part of Microsoft’s broader, ongoing effort to commercialize its AI research, Florence is arriving as a part of an update to the Vision APIs in Azure Cognitive Services. Unlike most vision models at the time, Florence was both “unified” and “multimodal,” meaning it could (1) understand language as well as images and (2) handle a range of tasks rather than being limited to specific applications, like generating captions. Two years ago, Microsoft announced Florence, an AI system that it pitched as a “complete rethinking” of modern computer vision models.

0 Comments

Author

Archives

Categories

Microsoft computers

Leave a Reply.