使用 Gemini API 的 Computer Use 工具集¶

ADK 中支持Python v1.17.0预览

The Computer Use Toolset allows an agent to operate a user interface of a computer, such as browsers, to complete tasks. This tool uses a specific Gemini model and Playwright testing tool to control a Chromium browser and can interact with web pages by taking screenshots, clicking, typing, and navigating.

For more information about the computer use model, see Gemini API Computer use or Google Cloud Vertex AI API Computer use.

预览版本

The Computer Use model and tool is a Preview release. For more information, see launch stage descriptions.

设置¶

You must install Playwright and its dependencies, including Chromium, to be able to use the Computer Use Toolset.

推荐: 创建并激活 Python 虚拟环境

Create a Python virtual environment:

python -m venv .venv

Activate the Python virtual environment:

Windows CMDWindows PowershellMacOS / Linux

.venv\Scripts\activate.bat

.venv\Scripts\Activate.ps1

source .venv/bin/activate

To set up the required software libraries for the Computer Use Toolset:

Install Python dependencies:

pip install termcolor==3.1.0
pip install playwright==1.52.0
pip install browserbase==1.3.0
pip install rich

Install Playwright dependencies, including the Chromium browser:
```
playwright install-deps chromium
playwright install chromium
```

使用工具¶

Use the Computer Use Toolset by adding it as a tool to your agent. When you configure the tool, you must provide an implementation of the BaseComputer class which defines an interface for an agent to use a computer. In the following example, the PlaywrightComputer class is defined for this purpose. You can find the code for this implementation in the playwright.py file of the computer_use agent sample project.

from google.adk import Agent
from google.adk.models.google_llm import Gemini
from google.adk.tools.computer_use.computer_use_toolset import ComputerUseToolset
from typing_extensions import override

from .playwright import PlaywrightComputer

root_agent = Agent(
    model='gemini-2.5-computer-use-preview-10-2025',
    name='hello_world_agent',
    description=(
        'computer use agent that can operate a browser on a computer to finish'
        ' user tasks'
    ),
    instruction='you are a computer use agent',
    tools=[
        ComputerUseToolset(computer=PlaywrightComputer(screen_size=(1280, 936)))
    ],
)

For a complete code example, see the computer_use agent sample project.