
Apple research paper reveals AI that understands visual elements

We'll likely learn more about Apple's AI developments at WWDC on June 10th

Researchers at Apple have reportedly developed a new AI system called ReALM (Reference Resolution As Language Modeling) that can read and understand visual elements, essentially allowing it to decipher what appears on a device's screen.

The research paper suggests that the new model reconstructs the screen using "parsed on-screen entities" and their locations in a textual layout. This essentially captures the visual layout of the on-screen page, and according to the researchers, when a model is fine-tuned specifically for this approach, it can outperform even GPT-4 and lead to more natural, intuitive interactions.
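To illustrate the idea (this is a hypothetical sketch, not Apple's code), converting parsed on-screen entities and their locations into a textual layout might look something like the following; the Entity fields and the row-grouping heuristic are assumptions made for the example.

```python
# Hypothetical sketch: each parsed on-screen entity has text plus a position,
# and the screen is re-rendered as plain text so a language model can reason about it.
from dataclasses import dataclass

@dataclass
class Entity:
    text: str
    x: float  # horizontal position of the element
    y: float  # vertical position of the element

def screen_to_text(entities: list[Entity], row_tolerance: float = 10.0) -> str:
    """Group entities into rows by vertical position, then order each row
    left to right, producing a rough textual reconstruction of the screen."""
    rows: list[list[Entity]] = []
    for e in sorted(entities, key=lambda e: e.y):
        if rows and abs(rows[-1][0].y - e.y) <= row_tolerance:
            rows[-1].append(e)
        else:
            rows.append([e])
    return "\n".join(
        " ".join(e.text for e in sorted(row, key=lambda e: e.x)) for row in rows
    )

# Example: a missed-call notification the assistant could then reason about
screen = [
    Entity("Missed call", 10, 5),
    Entity("Mobile Shop", 10, 25),
    Entity("613-555-0123", 120, 25),
    Entity("Call back", 10, 50),
]
print(screen_to_text(screen))
# Missed call
# Mobile Shop 613-555-0123
# Call back
```

With a textual reconstruction like this, a query such as "call that number back" becomes a reference-resolution problem a fine-tuned language model can handle, which is the scenario the paper describes.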

“Being able to understand context, including references, is essential for a conversational assistant,” reads the research paper. “Enabling the user to issue queries about what they see on their screen is a crucial step in ensuring a true hands-free experience in voice assistants.” The development could one day make its way to Siri, helping the assistant become more conversational and truly hands-free.

While it's unlikely we'll hear more about ReALM specifically this year, we should learn more about Apple's AI-related developments, including features coming to Siri, at WWDC 2024 on June 10th.

Read more about ReALM here.

Image credit: Shutterstock

Source: Apple Via: VentureBeat

