Google Unveils Gemini 2.5 Computer Use Model with Enhanced Web and Android Performance

Google has introduced the Gemini 2.5 Computer Use model, now available in public preview. This specialized AI model is designed to interact with graphical user interfaces (GUIs), primarily web pages viewed in a browser. It powers Project Mariner and the agentic features in AI Mode.

Operational Workflow:

The Gemini 2.5 Computer Use model operates through a cyclical process to accomplish tasks:

1. Initiating a Request: The system receives inputs comprising the user’s request, a screenshot of the current environment, and a history of recent actions.

2. Analyzing Inputs: The model evaluates these inputs and generates a response, typically in the form of a function call representing a specific UI action, such as clicking or typing.

3. Executing Actions: Client-side code executes the action as specified by the model’s response.

4. Updating the Environment: After the action runs, the client sends a new screenshot of the GUI and the current URL back to the model, and the loop repeats until the task is complete.
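
The four steps above can be sketched as a simple loop. This is an illustrative sketch only, not Google's client code: the `Action` and `Observation` types, the `run_agent_loop` function, and the action names are all hypothetical, and the model and browser are replaced by small stand-in functions so the example runs on its own.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple


@dataclass
class Action:
    """A UI action proposed by the model (hypothetical shape)."""
    name: str
    args: dict = field(default_factory=dict)


@dataclass
class Observation:
    """What the client sends back after each step: a screenshot and the URL."""
    screenshot: bytes
    url: str


def run_agent_loop(
    request: str,
    propose_action: Callable[[str, Observation, List[Action]], Optional[Action]],
    execute_action: Callable[[Action], Observation],
    initial: Observation,
    max_steps: int = 20,
) -> Tuple[List[Action], Observation]:
    """Repeat: ask the model for an action, run it, and feed back the new state."""
    history: List[Action] = []
    obs = initial
    for _ in range(max_steps):
        # Steps 1-2: the model sees the request, current state, and past actions.
        action = propose_action(request, obs, history)
        if action is None:  # the stand-in model signals completion with None
            break
        # Step 3: client-side code performs the action.
        obs = execute_action(action)
        # Step 4: the new observation and updated history go into the next turn.
        history.append(action)
    return history, obs


# Stand-ins for the real model and browser (assumptions for this sketch).
def fake_model(request, obs, history):
    script = [
        Action("navigate", {"url": "https://example.com"}),
        Action("type_text", {"text": request}),
    ]
    return script[len(history)] if len(history) < len(script) else None


def fake_browser(action):
    return Observation(screenshot=b"", url=action.args.get("url", "https://example.com"))


history, final = run_agent_loop(
    "find cat food", fake_model, fake_browser, Observation(b"", "about:blank")
)
```

The `max_steps` cap is a design choice of this sketch: it keeps a misbehaving model from looping forever, which any real client would likely also want.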

Supported UI Actions:

In addition to clicking and typing, the model can perform a range of UI actions, including:

– Navigating backward and forward

– Conducting web searches

– Accessing specific URLs

– Hovering the cursor

– Executing keyboard shortcuts

– Scrolling through content

– Dragging and dropping elements
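
On the client side, each action the model returns has to be mapped to real browser behavior. The sketch below shows one way to do that with a dispatch table. The action names (`navigate`, `go_back`, `go_forward`, `scroll`), the argument keys, and the in-memory `FakeBrowser` are assumptions made for illustration; they are not the identifiers used by the Gemini API.

```python
class FakeBrowser:
    """A tiny in-memory stand-in for a browser tab (hypothetical)."""

    def __init__(self):
        self.pages = ["about:blank"]
        self.index = 0
        self.scroll_y = 0

    @property
    def url(self):
        return self.pages[self.index]

    def navigate(self, url):
        # Visiting a new URL discards any forward history, as real browsers do.
        self.pages = self.pages[: self.index + 1] + [url]
        self.index += 1
        self.scroll_y = 0

    def go_back(self):
        if self.index > 0:
            self.index -= 1

    def go_forward(self):
        if self.index < len(self.pages) - 1:
            self.index += 1

    def scroll(self, dy):
        # Clamp at the top of the page.
        self.scroll_y = max(0, self.scroll_y + dy)


# Map each (hypothetical) action name to the code that performs it.
HANDLERS = {
    "navigate": lambda browser, args: browser.navigate(args["url"]),
    "go_back": lambda browser, args: browser.go_back(),
    "go_forward": lambda browser, args: browser.go_forward(),
    "scroll": lambda browser, args: browser.scroll(args["dy"]),
}


def dispatch(browser, name, args=None):
    """Run one named action, rejecting anything the client does not support."""
    handler = HANDLERS.get(name)
    if handler is None:
        raise ValueError(f"unsupported action: {name}")
    handler(browser, args or {})


browser = FakeBrowser()
dispatch(browser, "navigate", {"url": "https://example.com/a"})
dispatch(browser, "navigate", {"url": "https://example.com/b"})
dispatch(browser, "go_back")
url_after_back = browser.url
dispatch(browser, "go_forward")
dispatch(browser, "scroll", {"dy": 300})
```

Rejecting unknown action names explicitly, rather than ignoring them, makes it obvious when the model requests something the client has not implemented.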

Demonstrative Examples:

Google has provided examples to illustrate the model’s capabilities:

1. Pet Care Data Management: The model retrieves details of pets residing in California from a specified URL and adds them as guests in a spa CRM system. It then schedules follow-up appointments with a specialist, aligning the visit with the requested treatment.

2. Task Organization for an Art Club: The model organizes a chaotic brainstorming board by categorizing tasks into predefined sections, ensuring each note is appropriately placed.

Performance Optimization:

While the Gemini 2.5 Computer Use model is primarily optimized for web browsers, it also demonstrates strong potential in mobile UI control tasks, as evidenced by Google’s AndroidWorld benchmark. However, it is not yet optimized for desktop operating system-level control.

Benchmark Comparisons:

According to Google's reported evaluations, the Gemini 2.5 Computer Use model outperforms competing models from Anthropic (Claude) and OpenAI on web and mobile control benchmarks. Google also reports that it delivers the highest browser control quality at the lowest latency.

Foundation and Applications:

The model builds on the visual understanding and reasoning capabilities of Gemini 2.5 Pro. Internally, Google has used it for UI testing to speed up software development. An early access program is also available for third-party developers building assistants and workflow automation tools.

Availability:

The Gemini 2.5 Computer Use model is now in public preview and can be accessed through the Gemini API in Google AI Studio and Vertex AI.

