Context caching with Gemini

Supported in ADK Python v1.15.0

When working with agents to complete tasks, you may want to reuse extended instructions or large sets of data across multiple agent requests to a generative AI model. Resending this data for each agent request is slow, inefficient, and can be expensive. Using context caching features in generative AI models can significantly speed up responses and lower the number of tokens sent to the model for each request.

The ADK Context Caching feature allows you to cache request data with generative AI models that support it, including Gemini 2.0 and higher models. This document explains how to configure and use this feature.

Configure context caching

You configure the context caching feature at the ADK App object level, which wraps your agent. Use the ContextCacheConfig class to configure these settings, as shown in the following code sample:

from google.adk import Agent
from google.adk.apps.app import App
from google.adk.agents.context_cache_config import ContextCacheConfig

root_agent = Agent(
    # Configure an agent using Gemini 2.0 or higher.
    # The model, name, and instruction values below are illustrative.
    model='gemini-2.0-flash',
    name='my_caching_agent',
    instruction='Answer questions using any reference material provided.',
)

# Create the app with context caching configuration
app = App(
    name='my-caching-agent-app',
    root_agent=root_agent,
    context_cache_config=ContextCacheConfig(
        min_tokens=2048,    # Minimum tokens to trigger caching
        ttl_seconds=600,    # Store for up to 10 minutes
        cache_intervals=5,  # Refresh after 5 uses
    ),
)

Configuration settings

The ContextCacheConfig class has the following settings that control how caching works for your agent. These settings apply to all agents within your app; a tuned example follows the list.

  • min_tokens (int): The minimum number of tokens required in a request to enable caching. This setting allows you to avoid the overhead of caching for very small requests where the performance benefit would be negligible. Defaults to 0.

  • ttl_seconds (int): The time-to-live (TTL) for the cache in seconds. This setting determines how long the cached content is stored before it is refreshed. Defaults to 1800 (30 minutes).

  • cache_intervals (int): The maximum number of times the same cached content can be used before it expires. This setting allows you to control how frequently the cache is updated, even if the TTL has not expired. Defaults to 10.

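The right values depend on your traffic. As an illustrative variation on the earlier example (the numbers are assumptions, not recommendations), a large, stable prompt that is reused heavily might skip caching for small requests, hold cached content longer, and allow more reuses before a refresh:

from google.adk.agents.context_cache_config import ContextCacheConfig

# Illustrative values only; tune them against your own request sizes and traffic.
long_lived_config = ContextCacheConfig(
    min_tokens=4096,     # Skip caching for requests under roughly 4K tokens
    ttl_seconds=3600,    # Keep cached content for up to an hour
    cache_intervals=20,  # Allow 20 reuses before the cache is refreshed
)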

Next steps

For a full implementation of how to use and test the context caching feature, see the following sample:

  • cache_analysis: A code sample that demonstrates how to analyze the performance of context caching.

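If you want a rough sense of what such an analysis looks like before opening the sample, the sketch below runs the cached app from earlier and prints the cached token count reported in the response usage metadata. It assumes an ADK version whose InMemoryRunner accepts the App object directly (app=app); cached_content_token_count comes from the google.genai usage metadata types. Treat this as a sketch under those assumptions, not as the sample itself.

import asyncio

from google.adk.runners import InMemoryRunner
from google.genai import types

# Sketch only: assumes this ADK version's InMemoryRunner accepts the App.
runner = InMemoryRunner(app=app)

async def ask(session_id: str, text: str) -> None:
    message = types.Content(role='user', parts=[types.Part(text=text)])
    async for event in runner.run_async(
        user_id='user', session_id=session_id, new_message=message
    ):
        if event.usage_metadata:
            # cached_content_token_count is the number of prompt tokens
            # that were served from the cache for this response.
            print('cached tokens:', event.usage_metadata.cached_content_token_count)

async def main() -> None:
    session = await runner.session_service.create_session(
        app_name=app.name, user_id='user'
    )
    await ask(session.id, 'First request: expect a cache miss.')
    await ask(session.id, 'Second request: may hit the cache.')

asyncio.run(main())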

If your use case requires instructions that are used throughout a session, consider using the static_instruction parameter for an agent, which lets you set fixed system instructions for the generative model. For more details, see the following sample code and the sketch after it:

  • static_instruction: An implementation of a digital pet agent using static instructions.

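As a rough illustration of the pattern (the linked sample is the canonical reference), the sketch below pins a fixed persona to an agent through static_instruction. It assumes the parameter accepts a google.genai types.Content payload; check the sample for the exact type your ADK version expects.

from google.adk import Agent
from google.genai import types

# Sketch only: assumes static_instruction takes a types.Content payload.
pet_agent = Agent(
    model='gemini-2.0-flash',
    name='digital_pet',
    # Static instructions stay fixed for the life of the agent, which
    # makes them good candidates for context caching.
    static_instruction=types.Content(
        role='user',
        parts=[types.Part(text='You are a cheerful digital pet named Pixel.')],
    ),
    # The regular instruction can still carry dynamic, per-session content.
    instruction='Respond to the user in character.',
)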