Microsoft’s Azure AI team has introduced Florence-2, a vision foundation model that handles a wide range of vision and vision-language tasks through a unified, prompt-based representation. Released under the permissive MIT license, Florence-2 comes in two sizes, 232M and 771M parameters, and covers tasks such as captioning, object detection, visual grounding, and segmentation, performing on par with or better than many large vision models. For enterprises, a single model that spans these tasks could reduce the need to invest in separate task-specific vision models.

What sets Florence-2 apart is its ability to work across both spatial scale, from broad image-level concepts to fine-grained pixel details, and semantic granularity, from high-level captions to detailed descriptions. To train it, Microsoft generated a comprehensive visual dataset called FLD-5B, which includes 5.4 billion annotations for 126 million images (about 43 annotations per image on average), ranging from high-level descriptions to specific regions and objects. Florence-2 uses a sequence-to-sequence architecture that combines an image encoder with a multi-modality encoder-decoder, so the model can handle various vision tasks without task-specific architectural modifications.
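To make the idea of a unified text representation more concrete, the sketch below shows one way region outputs such as bounding boxes can be written as plain text, so that a sequence-to-sequence model can emit detections the same way it emits a caption. It is an illustration only, not Microsoft's implementation: the `<loc_N>` token format, the 1,000-bin quantization, and the helper names `encode_detections` and `decode_detections` are all assumptions made for this example.

```python
import re
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels

NUM_BINS = 1000  # assumption: each coordinate is quantized into 1,000 bins

# A label followed by four location tokens (assumed token format).
_BOX = re.compile(r"([^<>]+)<loc_(\d+)><loc_(\d+)><loc_(\d+)><loc_(\d+)>")


def _to_bin(value: float, size: int) -> int:
    """Map a pixel coordinate to a bin index, clamped to the valid range."""
    return min(NUM_BINS - 1, max(0, int(value * NUM_BINS / size)))


def encode_detections(detections: List[Tuple[str, Box]], width: int, height: int) -> str:
    """Serialize (label, box) pairs into a single text sequence."""
    parts = []
    for label, (x1, y1, x2, y2) in detections:
        bins = (_to_bin(x1, width), _to_bin(y1, height),
                _to_bin(x2, width), _to_bin(y2, height))
        parts.append(label + "".join(f"<loc_{b}>" for b in bins))
    return "".join(parts)


def decode_detections(text: str, width: int, height: int) -> List[Tuple[str, Box]]:
    """Parse a text sequence back into (label, box) pairs in pixel coordinates."""
    results = []
    for match in _BOX.finditer(text):
        label = match.group(1).strip()
        bx1, by1, bx2, by2 = (int(g) for g in match.groups()[1:])
        # Use the centre of each bin as the recovered pixel coordinate.
        box = ((bx1 + 0.5) * width / NUM_BINS, (by1 + 0.5) * height / NUM_BINS,
               (bx2 + 0.5) * width / NUM_BINS, (by2 + 0.5) * height / NUM_BINS)
        results.append((label, box))
    return results


if __name__ == "__main__":
    sequence = encode_detections(
        [("car", (100, 200, 300, 400)), ("person", (640, 80, 720, 460))],
        width=1000, height=1000,
    )
    print(sequence)
    print(decode_detections(sequence, width=1000, height=1000))
```

Because both the task prompt and the answer are ordinary token sequences in this scheme, captioning, detection, and grounding can share one decoder. The trade-off is that coordinates are only recovered to within one quantization bin.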

The model’s reported performance is strong: it outperforms larger models on several tasks, including object detection, captioning, visual grounding, and visual question answering. Its compact size and versatility make it an attractive option for developers, who no longer need to maintain separate vision models for different tasks, which reduces compute costs and development time.

