Google’s Gemini unveiling this week included several impressive-looking demo videos. However, the company later admitted that one of the most striking videos was staged.
The news isn’t totally surprising — I even wrote in my coverage of Gemini that the video was “clearly polished and dramatized,” and the demo video declares that it was edited to speed up the outputs. But the staging goes further than that; a Google spokesperson admitted to Bloomberg that the video was made “using still image frames from the footage, and prompting via text.”
In the demo, which is embedded below, a person appears to interact with Gemini in real time, showing it objects, drawings and more, which Gemini then reacts to. But the smooth back-and-forth conversation implied by the video isn’t actually how interacting with Gemini works.
Google DeepMind’s VP of research and deep learning lead, Oriol Vinyals, shared a breakdown of how the demo was made, and it’s a far cry from what the video implies. You can check out the full blog post breakdown here.
Really happy to see the interest around our “Hands-on with Gemini” video. In our developer blog yesterday, we broke down how Gemini was used to create it. https://t.co/50gjMkaVc0
We gave Gemini sequences of different modalities — image and text in this case — and had it respond… pic.twitter.com/Beba5M5dHP
— Oriol Vinyals (@OriolVinyalsML) December 7, 2023
This should serve as a reminder that, like so many things in the tech world, AI tools are as much about hype and marketing as actual capability (and sometimes, it’s more about hype than anything else).
Header image credit: Google