Computer Use Toolset with the Gemini API
The Computer Use Toolset lets an agent operate a computer's user interface, such as a browser, to complete tasks. The tool uses a specific Gemini model together with the Playwright testing tool to control a Chromium browser, and it interacts with web pages by taking screenshots, clicking, typing, and navigating.
For more information about the computer use model, see Gemini API Computer use or Google Cloud Vertex AI API Computer use.
Preview release
The Computer Use model and tool are in Preview. For more information, see the launch stage descriptions.
Setup
To use the Computer Use Toolset, you must install Playwright and its dependencies, including Chromium.
Recommended: create and activate a Python virtual environment
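Create a Python virtual environment, for example with `python -m venv .venv` (the `.venv` directory name is only an illustration).
Activate the Python virtual environment, for example with `source .venv/bin/activate` on macOS or Linux.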
To set up the required software libraries for the Computer Use Toolset:
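- Install the Python dependencies, including Playwright (for example, `pip install playwright`).
- Install the Playwright dependencies, including the Chromium browser (for example, `playwright install-deps chromium` followed by `playwright install chromium`).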
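After installing, a quick preflight check can confirm that the Python packages are importable before you run the agent. The following is a minimal sketch, not part of the toolset: the helper name `missing_modules` is hypothetical, and the module names `playwright` and `google.adk` are taken from the imports used on this page. It checks only Python packages, not whether the Chromium browser binary is installed.

```python
import importlib.util


def is_importable(name: str) -> bool:
    """Return True if a module with this name can be found on the import path."""
    try:
        return importlib.util.find_spec(name) is not None
    except (ImportError, ValueError):
        # find_spec raises when the parent package of a dotted name is missing.
        return False


def missing_modules(names: list[str]) -> list[str]:
    """Return the module names that cannot be imported, in the order given."""
    return [name for name in names if not is_importable(name)]


if __name__ == "__main__":
    # Assumed import names for the packages this page relies on.
    required = ["playwright", "google.adk"]
    missing = missing_modules(required)
    if missing:
        print("Missing Python packages:", ", ".join(missing))
    else:
        print("All required Python packages are importable.")
```

Running the script in the activated virtual environment reports any package that still needs to be installed.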
Use the tool
To use the Computer Use Toolset, add it as a tool to your agent. When you configure the tool, you must provide an implementation of the BaseComputer class, which defines the interface an agent uses to operate a computer. The following example uses the PlaywrightComputer class for this purpose. You can find the code for this implementation in the playwright.py file of the computer_use agent sample project.
```python
from google.adk import Agent
from google.adk.models.google_llm import Gemini
from google.adk.tools.computer_use.computer_use_toolset import ComputerUseToolset
from typing_extensions import override

# PlaywrightComputer is defined in playwright.py of the sample project.
from .playwright import PlaywrightComputer

root_agent = Agent(
    model='gemini-2.5-computer-use-preview-10-2025',
    name='hello_world_agent',
    description=(
        'computer use agent that can operate a browser on a computer to finish'
        ' user tasks'
    ),
    instruction='you are a computer use agent',
    tools=[
        # The toolset drives the browser through the supplied computer object.
        ComputerUseToolset(computer=PlaywrightComputer(screen_size=(1280, 936)))
    ],
)
```
For a complete code example, see the computer_use agent sample project.
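To illustrate what "providing an implementation of an interface for operating a computer" means, the sketch below defines a toy abstract class and an in-memory implementation that records actions instead of driving a browser. Everything in it is an assumption made for illustration: `ToyComputer`, `RecordingComputer`, `ScreenState`, and the method names and signatures are hypothetical and are not the ADK BaseComputer API, whose actual methods you should take from the ADK source and the sample's playwright.py.

```python
import abc
from dataclasses import dataclass


@dataclass
class ScreenState:
    """What the agent observes after each action (hypothetical shape)."""
    url: str
    screenshot: bytes  # a real implementation would return image bytes


class ToyComputer(abc.ABC):
    """Hypothetical stand-in for the role BaseComputer plays; not the ADK class."""

    @abc.abstractmethod
    def navigate(self, url: str) -> ScreenState: ...

    @abc.abstractmethod
    def click_at(self, x: int, y: int) -> ScreenState: ...

    @abc.abstractmethod
    def type_text_at(self, x: int, y: int, text: str) -> ScreenState: ...


class RecordingComputer(ToyComputer):
    """In-memory implementation that logs actions instead of using a browser."""

    def __init__(self, screen_size: tuple[int, int] = (1280, 936)):
        self.screen_size = screen_size
        self.url = "about:blank"
        self.actions: list[tuple] = []

    def _check_bounds(self, x: int, y: int) -> None:
        # Reject coordinates that fall outside the configured screen.
        width, height = self.screen_size
        if not (0 <= x < width and 0 <= y < height):
            raise ValueError(f"({x}, {y}) is outside the {width}x{height} screen")

    def _state(self) -> ScreenState:
        # No real screen exists here, so the screenshot is empty.
        return ScreenState(url=self.url, screenshot=b"")

    def navigate(self, url: str) -> ScreenState:
        self.actions.append(("navigate", url))
        self.url = url
        return self._state()

    def click_at(self, x: int, y: int) -> ScreenState:
        self._check_bounds(x, y)
        self.actions.append(("click_at", x, y))
        return self._state()

    def type_text_at(self, x: int, y: int, text: str) -> ScreenState:
        self._check_bounds(x, y)
        self.actions.append(("type_text_at", x, y, text))
        return self._state()
```

The point of the design is that the agent depends only on the abstract interface, so a browser-backed class such as PlaywrightComputer and a test double like the one above can be swapped without changing the agent definition.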