Google Fixes Issues with Gemini’s People-Generating Feature
- 2024-08-28
- Syed Shahzaib
- Tech News
Back in February, Google temporarily stopped its AI-powered chatbot, Gemini, from generating images of people. The decision came after users raised concerns about historical inaccuracies. For example, when asked to depict “a Roman legion,” Gemini presented a diverse group of soldiers that didn’t fit the era. Similarly, “Zulu warriors” were inaccurately portrayed with stereotypical features.
Google CEO Sundar Pichai issued an apology, and Demis Hassabis, co-founder of Google’s AI research division DeepMind, promised a fix within a few weeks. However, the actual resolution took significantly longer, despite some employees working up to 120 hours a week to address the issue. Now, Google is finally ready to reintroduce the feature, but with some limitations.
Who Gets Access?
For now, only users subscribed to Google’s paid Gemini plans—Gemini Advanced, Business, or Enterprise—will have early access to the people-generating feature. This limited rollout is part of an English-language-only test, and Google has not yet announced when or if the feature will be available to users on the free tier or in other languages.
According to a Google spokesperson, “Gemini Advanced gives our users priority access to our latest features. This helps us gather valuable feedback while delivering a highly-anticipated feature first to our premium subscribers.”
What’s Been Fixed?
Gemini’s image generation now runs on Google’s latest image-generating model, Imagen 3. According to Google, Imagen 3 has been enhanced to produce more “fair” images. The model was trained on AI-generated captions designed to increase the variety and diversity of concepts associated with the images in its training data. Additionally, the training data was filtered for safety and reviewed with fairness in mind.
While Google didn’t provide specific details about Imagen 3’s training data, the company emphasized that the model was trained on a large dataset of images, text, and annotations. Google also conducted extensive internal and external testing to minimize the potential for undesirable results before reactivating the people-generating feature.
Imagen 3 and Gems
In some good news, all Gemini users will receive the upgraded Imagen 3 model within the week, although the people-generating feature remains exclusive to premium subscribers. Google claims that Imagen 3 is better at understanding text prompts, more creative, and produces fewer errors than its predecessor, Imagen 2.
To address concerns about deepfakes, Imagen 3 will incorporate SynthID, a technology developed by DeepMind that applies invisible, cryptographic watermarks to AI-generated media. The watermark is intended to make it possible to identify the content as AI-generated and trace it to its source.
Additionally, Google is introducing a new feature called “Gems” for users on the Advanced, Business, and Enterprise plans. Similar to OpenAI’s GPTs, Gems are custom-tailored versions of Gemini that act as experts in specific topics, like vegetarian cooking or project management.
According to Google, “With Gems, you can create a team of experts to help you with challenging projects, brainstorm ideas, or even draft the perfect social media caption. Your Gem can also remember detailed instructions, saving you time on repetitive tasks.”
Gems are available on both desktop and mobile in 150 countries and in most languages, although they are not yet supported in Gemini Live. While Google currently has no plans to allow users to publish or share Gems, the company is focused on learning how people use them for creativity and productivity.
Final Thoughts
Google’s fix for Gemini’s people-generating feature has been a long time coming, but the improvements in fairness and accuracy are welcome. While access remains limited to premium users for now, the broader rollout of Imagen 3 should benefit all Gemini users, offering more creative and accurate AI-generated images. The introduction of Gems adds another layer of customization, making Gemini a more versatile tool for a wide range of tasks.