Google has shown off AI systems able to interpret information in images viewed through a phone camera, in video and audio, and in spoken language.
In one demo, a prototype AI-powered assistant running on a phone answered the age-old question: “Where did I put my glasses?”
The announcements come a day after rival OpenAI launched its latest AI system, GPT-4o, with an eye-catching presentation in which it read human expressions via a phone camera and chatted – and flirted – fluently.
Google appears keen to stress that its tools are as capable of this kind of so-called “multimodal” understanding as its rival’s.
In a sign of this “anything you can do I can do better”-style competition, Google had teased the capabilities of its systems running on a phone just ahead of OpenAI’s announcement.
The firm showcased multimodal features in Gemini Nano, an AI model that runs “on device” on its Pixel phones, and in the Gemini app.
It also demonstrated a prototype scam-alert feature being tested for Gemini Nano that could listen in on a phone call and warn the user if it appeared to be a scam, without any information about the call leaving the phone.
The new AI-powered demos were revealed at Google I/O, the firm’s annual presentation for software developers.
A quick AI-powered transcription of the proceedings by BBC News suggested that the word “multimodal” came up at least 22 times.
– BBC