Google’s Gemini Era: Watch Out, AI Rivals!

Here are the updates from Google I/O 2024.

Last year Google introduced Gemini, its response to OpenAI. Two months later it released the Gemini 1.5 Pro model, which was initially available to a limited group of developers and enterprise customers, who could test it with a context window of up to 1 million tokens. Now Google has rolled out generative AI in Search to users in the US, with plans to expand to other countries. This Search Generative Experience (SGE), previously available in Labs, will now appear at the top of your search results.

Google's Photo of Organization (*Photo by Pawel Czerwinski on Unsplash*)

More capabilities are in the pipeline, and most of them are currently being tested in Labs.

With Google’s “Ask Photos”, you can use AI to fetch your license plate number directly from your pictures: it recognizes which car is yours and reads the plate. Ah, I can see your jaw dropping. But wait, there’s more. Want to know when your child learned to ride a bike? It will go through your photos and find the answer for you. Voila!

Google also talks about “AI agents”. Sundar Pichai, Google's CEO, describes them as intelligent systems that show reasoning, planning, and memory, can think multiple steps ahead, and work across software and systems, all under your control and on your behalf. An agent could handle a whole chore such as returning a product you purchased: scanning your invoice, filling out the return form, and scheduling a pickup. Are you planning a trip to Paris? The same multi-step planning will create an itinerary for you.

We often miss important emails and meetings. Now you can ask the AI to catch you up on your emails, go through recordings from Google Meet, highlight the key points, and finally draft replies. Good news for your kids too: NotebookLM can assist them with their education, providing real-time examples just like a teacher.

That’s not all. The 1 million token limit will be increased to 2 million, so developers and creators, let’s celebrate!

Even though everything so far is impressive, the standout projects for me are Project Astra and Project Veo.

Project Astra uses AI to continuously encode video frames, combining the video and speech input into a timeline of events and caching it for quick recall, with impressive results. Just show it a video and it will tell you what is where. Did you forget where you left your glasses? Do you need help with design interview questions, or want to solve a riddle or two? Well, my friend, AI will do all that for you.

Project Veo offers the incredible experience of creating videos or footage from text that you describe. You could get a cinematic shot or a timelapse video created by AI. Just say it.

The word “AI” was mentioned exactly 120 times, and the count went up by one when Sundar Pichai, Google's CEO, pointed this out. For Google, this is the Gemini era. OpenAI may not agree; for them, it is 4o. What about Apple? Well, let’s wait until June 10th for Apple’s Worldwide Developers Conference (WWDC). But hey! Will they be using a model powered by OpenAI, or could it be Gemini? The suspense is killing me.