User Simulation

Supported in ADK Python v1.18.0

When evaluating conversational agents, it is not always practical to use a fixed set of user prompts, as the conversation can proceed in unexpected ways. For example, if an agent needs the user to supply two values to perform a task, it may ask for those values one at a time or both at once. To resolve this issue, ADK can dynamically generate user prompts using a generative AI model.

To use this feature, you must specify a ConversationScenario, which dictates the user's goals in their conversation with the agent. A sample conversation scenario for the hello_world agent is shown below:

{
  "starting_prompt": "What can you do for me?",
  "conversation_plan": "Ask the agent to roll a 20-sided die. After you get the result, ask the agent to check if it is prime."
}

The starting_prompt in a conversation scenario specifies a fixed initial prompt that the user should use to start the conversation with the agent. Specifying such fixed prompts for subsequent interactions with the agent is not practical as the agent may respond in different ways. Instead, the conversation_plan provides a guideline for how the rest of the conversation with the agent should proceed. An LLM uses this conversation plan, along with the conversation history, to dynamically generate user prompts until it judges that the conversation is complete.

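For intuition, a simulated run of the first scenario above might unfold as shown below. Only the starting prompt is fixed; the agent replies and the simulator's follow-up prompts here are purely illustrative:

user (starting_prompt): What can you do for me?
agent: I can roll dice of various sizes and check whether numbers are prime.
user (simulated): Great, please roll a 20-sided die.
agent: You rolled a 13.
user (simulated): Is 13 prime?
agent: Yes, 13 is a prime number.
(the simulator judges the conversation plan complete and ends the conversation)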

Try it in Colab

Test this entire workflow yourself in an interactive notebook on
[Simulating User Conversations to Dynamically Evaluate ADK Agents](https://github.com/google/adk-samples/blob/main/python/notebooks/evaluation/user_simulation_in_adk_evals.ipynb).
You'll define a conversation scenario, run a "dry run" to check the
dialogue, and then perform a full evaluation to score the agent's responses.

Example: Evaluating the hello_world agent with conversation scenarios

To add evaluation cases containing conversation scenarios to a new or existing EvalSet, you first need to create a list of conversation scenarios in which to test the agent.

Try saving the following to contributing/samples/hello_world/conversation_scenarios.json:

{
  "scenarios": [
    {
      "starting_prompt": "What can you do for me?",
      "conversation_plan": "Ask the agent to roll a 20-sided die. After you get the result, ask the agent to check if it is prime."
    },
    {
      "starting_prompt": "Hi, I'm running a tabletop RPG in which prime numbers are bad!",
      "conversation_plan": "Say that you don't care about the value; you just want the agent to tell you if a roll is good or bad. Once the agent agrees, ask it to roll a 6-sided die. Finally, ask the agent to do the same with 2 20-sided dice."
    }
  ]
}

You will also need a session input file containing information used during evaluation. Try saving the following to contributing/samples/hello_world/session_input.json:

{
  "app_name": "hello_world",
  "user_id": "user"
}

Then, you can add conversation scenarios to an EvalSet:

# (optional) create a new EvalSet
adk eval_set create \
  contributing/samples/hello_world \
  eval_set_with_scenarios

# add conversation scenarios to EvalSet as new eval cases
adk eval_set add_eval_case \
  contributing/samples/hello_world \
  eval_set_with_scenarios \
  --scenarios_file contributing/samples/hello_world/conversation_scenarios.json \
  --session_input_file contributing/samples/hello_world/session_input.json

By default, ADK runs evaluations with metrics that require the agent's expected response to be specified. Since dynamically generated conversations have no fixed expected responses, we will use an EvalConfig with some alternate supported metrics.

Try saving the following to contributing/samples/hello_world/eval_config.json:

{
  "criteria": {
    "hallucinations_v1": {
      "threshold": 0.5,
      "evaluate_intermediate_nl_responses": true
    },
    "safety_v1": {
      "threshold": 0.8
    }
  }
}

Finally, you can use the adk eval command to run the evaluation:

adk eval \
    contributing/samples/hello_world \
    --config_file_path contributing/samples/hello_world/eval_config.json \
    eval_set_with_scenarios \
    --print_detailed_results

User simulator configuration

You can override the default user simulator configuration to change the model, the model's internal behavior, and the maximum number of user-agent interactions. The EvalConfig below shows the default user simulator configuration:

{
  "criteria": {
    # same as before
  },
  "user_simulator_config": {
    "model": "gemini-2.5-flash",
    "model_configuration": {
      "thinking_config": {
        "include_thoughts": true,
        "thinking_budget": 10240
      }
    },
    "max_allowed_invocations": 20
  }
}
  • model: The model backing the user simulator.
  • model_configuration: A GenerateContentConfig which controls the model behavior.
  • max_allowed_invocations: The maximum number of user-agent interactions allowed before the conversation is forcibly terminated. This should be set higher than the longest reasonable user-agent conversation in your EvalSet.
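
For example, the following EvalConfig keeps the criteria defined earlier, hides the simulator's intermediate thoughts, and doubles the interaction budget. The specific values are illustrative choices for this sketch, not recommendations:

{
  "criteria": {
    # same as before
  },
  "user_simulator_config": {
    "model": "gemini-2.5-flash",
    "model_configuration": {
      "thinking_config": {
        # do not surface the simulator's thought summaries
        "include_thoughts": false,
        "thinking_budget": 10240
      }
    },
    # allow conversations up to twice the default length
    "max_allowed_invocations": 40
  }
}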