Google and OpenAI have announced new artificial intelligence assistants: tools that can talk to you in real time and recover when you interrupt them, analyze your surroundings through live video, and translate conversations on the fly.
OpenAI struck first on May 13, when it debuted its new flagship model, GPT-4o. The live demo showed the assistant reading bedtime stories and solving math problems, all in a voice that sounded eerily like Joaquin Phoenix’s AI girlfriend in the movie Her (a resemblance that didn’t go unnoticed by CEO Sam Altman).
The next day it was Google’s turn. The company announced its own new tools, including a conversational assistant called Gemini Live, which can do many of the same things as OpenAI’s model. Google also revealed that it is building a kind of “do-it-all” AI agent, which is currently in development and will not launch until later this year.
Soon, you’ll be able to explore for yourself and assess whether these tools will be as useful in your daily life as their creators hope, or whether they’re more like a sci-fi party trick that ends up losing its charm. See below what you need to know about accessing these new tools, what might be useful and how much they will cost.
GPT-4o from OpenAI
What it’s capable of: The new model can talk to you in real time, with a response delay of around 320 milliseconds, which, according to OpenAI, is on par with natural human conversation. You can point your smartphone’s camera at something and ask the tool to interpret what it sees, then have it help with tasks such as coding or translating text. It can also summarize information and generate images, fonts and 3D renderings.
How to get access to it: OpenAI says it will begin rolling out GPT-4o’s text and vision features in the web interface as well as the ChatGPT app, but has not yet set a date. The company says it will add voice functions in the coming weeks, again without giving an exact date. Developers can access text and vision features in the API now, but voice mode will initially only be released to a “small group” of developers.
How much it costs: GPT-4o will be free to use, but OpenAI will cap how much you can use the model before you need to upgrade to a paid plan. Those who sign up for one of OpenAI’s paid plans, which start at $20 per month, will get five times more capacity on GPT-4o.
Google’s Gemini Live
What is Gemini Live? This is the Google product that most closely resembles GPT-4o. It’s a version of the company’s AI model that you can talk to in real time. Google says it will also be possible to use the tool to communicate via live video “later this year”. The company promises it will be a useful conversational assistant for tasks like preparing for a job interview or rehearsing a speech.
How to access it: Gemini Live will be launched, according to the company, in the coming months, through Google’s premium AI plan, Gemini Advanced.
How much it costs: Gemini Advanced offers a two-month free trial, after which it costs $20 per month.
But wait, what is Project Astra? Astra is a project to create a do-it-all AI agent, which was demonstrated at Google’s I/O conference but won’t launch until later this year.
Which is better?
It’s hard to say without having the full versions of these models in hand. Google showcased Project Astra in a polished video, while OpenAI chose to debut GPT-4o with a seemingly more authentic live demo. But in both cases, the models were asked to do things that the designers had probably already practiced. The real test will come when they are released to millions of users with unique demands.
That said, if you compare the videos published by OpenAI with those from Google, the two leading tools look very similar, at least when it comes to ease of use. Overall, GPT-4o seems to be slightly ahead in audio, with realistic voices, conversational flow, and even singing, while Project Astra shows more advanced visual capabilities, like the ability to remember where you left your glasses. OpenAI’s decision to roll out new features faster could mean that its product will initially be used more than Google’s, which will only be fully available later this year. It’s too early to say which model generates false information less frequently or gives more useful responses.
Are they safe?
Both OpenAI and Google say their models have been well tested: OpenAI says GPT-4o has been evaluated by more than 70 experts in areas like disinformation and social psychology, and Google says Gemini has undergone the most comprehensive safety evaluations of any Google AI model to date, including for bias and toxicity.
But these companies are building a future where AI models search, examine, and evaluate the world’s information so that they can provide concise answers to our questions. Even more than with simpler chatbots, it is advisable to remain skeptical about what they tell us.
(Source: James O’Donnell / MIT Technology Review)