The launch of ChatGPT in November 2022 shook Google to its foundations. The popular chatbot posed such a threat to the company’s business that Google declared a “code red” and began investing heavily to catch up in generative AI.
This effort has not only resulted in the release of Google Bard but also Google Gemini.
Gemini launched on Wednesday, December 6, 2023, and we will follow its progress in the months ahead; there is a serious chance that Google can seize the popular AI crown from ChatGPT.
What is Google Gemini?
Gemini is a family of large language models (LLMs) that leverages training techniques taken from AlphaGo, including reinforcement learning and tree search, and it has the potential to unseat ChatGPT as the dominant generative AI solution on the planet.
It comes months after Google combined its Brain and DeepMind AI labs to create a new research team called Google DeepMind, and after the launch of Bard and its next-generation PaLM 2 LLM.
With researchers estimating that the generative AI market will be worth $1.3 trillion by 2032, it is clear that Google is going all-in on the space to maintain its position as a leader in AI development.
Everything We Know So Far About Gemini
Back in May, Sundar Pichai, CEO of Google and Alphabet, released a blog post with a high-level look at the LLM, explaining:
“Gemini was created from the ground up to be multimodal, highly efficient at tool and API integrations and built to enable future innovations, like memory and planning.”
Pichai also noted that “While still early, we’re already seeing impressive multimodal capabilities not seen in prior models.
“Once fine-tuned and rigorously tested for safety, Gemini will be available at various sizes and capabilities, just like PaLM 2.”
In an interview with Wired, Google DeepMind CEO Demis Hassabis noted that Gemini will be “combining some of the strengths of AlphaGo type systems with the amazing language capabilities of the large models.”
Will Gemini Take the Crown from ChatGPT?
One of the biggest conversations around the release of Gemini is whether the language model has what it takes to unseat ChatGPT, which this year reached over 100 million monthly active users.
Initially, Google used Gemini’s ability to generate text and images to differentiate it from GPT-4, but on September 25, 2023, OpenAI announced that users would be able to enter voice and image queries into ChatGPT.
Now that OpenAI is experimenting with a multimodal approach and has connected ChatGPT to the Internet, perhaps the most threatening differentiator between the two is Google’s vast array of proprietary training data. Google Gemini can process data drawn from across Google’s services, including Google Search, YouTube, Google Books, and Google Scholar.
The use of this proprietary data in training the Gemini models could give them a distinct edge in the sophistication of the insights and inferences they can draw from a data set. This is particularly true if early reports that Gemini is trained on twice as many tokens as GPT-4 are correct.
In addition, this year’s merger of the Google DeepMind and Brain teams shouldn’t be underestimated, as it puts OpenAI head-to-head with a team of world-class AI researchers, including Google co-founder Sergey Brin and DeepMind senior AI scientist and machine learning expert Paul Barham.
This is an experienced team with a deep understanding of how to apply techniques like reinforcement learning and tree search to create AI programs that gather feedback and improve their problem-solving over time; it is the same approach the DeepMind team used to teach AlphaGo to defeat a Go world champion in 2016.
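To make that idea concrete, here is a minimal, illustrative Python sketch of the feedback loop at the heart of reinforcement learning: an agent repeatedly tries actions, observes rewards, and shifts towards the choices that work best. It is a toy bandit-style learner with made-up numbers, not DeepMind’s or Gemini’s actual training code.

import random

# A toy "gather feedback and improve" loop -- purely illustrative, not Gemini's
# or DeepMind's actual training method. The hidden win rates below are made up.
TRUE_WIN_RATES = [0.2, 0.5, 0.8]   # hidden quality of each action
estimates = [0.0, 0.0, 0.0]        # the agent's learned value estimates
counts = [0, 0, 0]                 # how often each action has been tried

def choose_action(epsilon=0.1):
    # Mostly exploit the best-known action, occasionally explore a random one.
    if random.random() < epsilon:
        return random.randrange(len(estimates))
    return max(range(len(estimates)), key=lambda a: estimates[a])

for _ in range(10_000):
    action = choose_action()
    reward = 1.0 if random.random() < TRUE_WIN_RATES[action] else 0.0  # feedback
    counts[action] += 1
    # Nudge the running average for the chosen action towards the observed reward.
    estimates[action] += (reward - estimates[action]) / counts[action]

print("Learned value estimates:", [round(e, 2) for e in estimates])

Over many rounds the estimates converge towards the hidden win rates, so the agent ends up preferring the strongest action. Systems like AlphaGo pair this kind of learned evaluation with tree search over possible future moves, which is the combination Hassabis describes bringing to Gemini.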
The AI Arms Race
Gemini’s multimodal abilities, its use of reinforcement learning, its text and image generation capabilities, and Google’s proprietary data are all the ingredients it needs to outperform GPT-4.
The training data is the key differentiator; after all, the LLM arms race will largely be decided by which organization trains its models on the largest, richest data set.
That being said, with OpenAI reportedly working on a new next-generation multimodal LLM called Gobi, we can’t write off the generative AI giant just yet. The question now is, who executes multimodal AI better?