Google Revises Gemini Usage Limits Following User Feedback
In May 2026, during the I/O conference, Google introduced a compute-based usage model for its Gemini app, replacing the previous daily prompt limits. Under this model, the cost of a prompt depends on factors such as its complexity, the tools it uses, and the length of the chat. Usage limits refresh every five hours until the weekly cap is reached. The intention was to allocate resources more effectively, recognizing that a simple text prompt consumes significantly less compute than complex video or coding tasks.
However, this shift led to user dissatisfaction, as many found themselves reaching their usage limits more quickly than anticipated. In response, Google has implemented several adjustments to address these concerns:
1. Quota Caps on Complex Prompts: For users of Gemini 3.1 Pro, Google has introduced a cap on the amount of quota a single prompt can consume. This change aims to prevent complex prompts, especially those involving large files, from rapidly depleting a user’s available quota.
2. Error Handling: Google clarified that unsuccessful requests will not count against a user’s quota. Only successful completions are charged, ensuring that system errors do not unfairly impact usage limits.
3. Enhanced Usage Transparency: Recognizing that tasks like Deep Research require more compute resources, Google plans to provide users with detailed usage breakdowns and notifications. This initiative is designed to help users better understand and manage their compute consumption.
4. Free Access to 3.1 Flash-Lite Prompts: To alleviate concerns about quota consumption, prompts using the 3.1 Flash-Lite model are now free and do not count against a user’s quota.
5. Model Selection Persistence: Once a user selects a specific model, the Gemini app will remember this choice across all future sessions. The selection will only change if the user manually adjusts it or if an automatic fallback occurs due to reaching a usage cap.
6. Bug Fixes and Increased Omni Generations: Google addressed a bug that caused a small number of Omni video generations to disproportionately drain user quotas. Additionally, AI Ultra users now have double the number of Omni generations available.
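As a rough illustration of how the rules above could combine, the following Python sketch models a five-hour window, a weekly cap, a per-prompt cap for Gemini 3.1 Pro, and free handling for failed requests and 3.1 Flash-Lite prompts. It is not Google's implementation: the class name, model identifiers, and every numeric budget are hypothetical, since Google has not published these figures, and the weekly reset is left out for brevity.

```python
from dataclasses import dataclass

# Every name and number here is hypothetical; only the five-hour window,
# the weekly cap, and the charging rules come from the article.
WINDOW_SECONDS = 5 * 60 * 60          # usage refreshes every five hours
WINDOW_BUDGET = 100.0                 # assumed compute units per window
WEEKLY_CAP = 1000.0                   # assumed compute units per week
PER_PROMPT_CAPS = {"3.1-pro": 20.0}   # assumed max charge for one prompt
FREE_MODELS = {"3.1-flash-lite"}      # prompts on this model are not charged


@dataclass
class Quota:
    window_start: float = 0.0
    window_used: float = 0.0
    week_used: float = 0.0  # weekly reset is not modeled in this sketch

    def charge(self, now: float, model: str, cost: float, succeeded: bool) -> float:
        """Return the compute units deducted for one request."""
        # Start a fresh window once five hours have elapsed.
        if now - self.window_start >= WINDOW_SECONDS:
            self.window_start = now
            self.window_used = 0.0
        # Failed requests and free models are never charged.
        if not succeeded or model in FREE_MODELS:
            return 0.0
        # A single prompt cannot consume more than its model's cap.
        charged = min(cost, PER_PROMPT_CAPS.get(model, cost))
        if (self.window_used + charged > WINDOW_BUDGET
                or self.week_used + charged > WEEKLY_CAP):
            raise RuntimeError("usage limit reached")
        self.window_used += charged
        self.week_used += charged
        return charged
```

In this sketch, a large-file prompt with a raw cost of 50 units on the hypothetical `"3.1-pro"` model would be charged only 20, while the same request would cost nothing if it failed.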
These adjustments reflect Google’s commitment to responding to user feedback and enhancing the Gemini app’s user experience. By refining usage limits and providing clearer insights into compute consumption, Google aims to offer a more balanced and transparent AI service.