
vLLM model host for ADK agents

Supported in ADK Python v0.1.0

Tools such as vLLM allow you to host models efficiently and serve them as an OpenAI-compatible API endpoint. You can use vLLM models through the LiteLLM library for Python.

Setup

  1. Deploy Model: Deploy your chosen model using vLLM (or a similar tool). Note the API base URL (e.g., https://your-vllm-endpoint.run.app/v1).
    • Important for ADK Tools: When deploying, ensure the serving tool supports and enables OpenAI-compatible tool/function calling. For vLLM, this might involve flags like --enable-auto-tool-choice and potentially a specific --tool-call-parser, depending on the model. Refer to the vLLM documentation on Tool Use.
  2. Authentication: Determine how your endpoint handles authentication (e.g., API key, bearer token); the sketch after this list shows one way to verify it.

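Before wiring the endpoint into an agent, it helps to confirm that the base URL and credentials actually work. The sketch below is a minimal check, assuming a Cloud Run-style deployment secured with a gcloud identity token (adapt the auth to your setup); it queries the /v1/models route that vLLM's OpenAI-compatible server exposes.

import subprocess

import requests

# Hypothetical endpoint; replace with your deployment's base URL.
api_base_url = "https://your-vllm-endpoint.run.app/v1"

# Assumption: the endpoint accepts Google identity tokens (e.g., Cloud Run IAM).
token = subprocess.check_output(
    ["gcloud", "auth", "print-identity-token", "-q"]
).decode().strip()

# vLLM's OpenAI-compatible server lists its served models at /v1/models.
resp = requests.get(
    f"{api_base_url}/models",
    headers={"Authorization": f"Bearer {token}"},
    timeout=30,
)
resp.raise_for_status()
print(resp.json())  # should include the model name you will pass to LiteLLM

If this call fails with a 401 or 403, fix the authentication in step 2 before debugging anything on the ADK side.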

Integration Example

The following example shows how to use a vLLM endpoint with ADK agents.

import subprocess
from google.adk.agents import LlmAgent
from google.adk.models.lite_llm import LiteLlm

# --- Example Agent using a model hosted on a vLLM endpoint ---

# Endpoint URL provided by your vLLM deployment
api_base_url = "https://your-vllm-endpoint.run.app/v1"

# Model name as recognized by *your* vLLM endpoint configuration
model_name_at_endpoint = "hosted_vllm/google/gemma-3-4b-it"  # "hosted_vllm/" routes through LiteLLM's vLLM provider

# Authentication (Example: using gcloud identity token for a Cloud Run deployment)
# Adapt this based on your endpoint's security
try:
    gcloud_token = subprocess.check_output(
        ["gcloud", "auth", "print-identity-token", "-q"]
    ).decode().strip()
    auth_headers = {"Authorization": f"Bearer {gcloud_token}"}
except Exception as e:
    print(f"Warning: Could not get gcloud token - {e}. Endpoint might be unsecured or require different auth.")
    auth_headers = None # Or handle error appropriately

agent_vllm = LlmAgent(
    model=LiteLlm(
        model=model_name_at_endpoint,
        api_base=api_base_url,
        # Pass authentication headers if needed
        extra_headers=auth_headers
        # Alternatively, if endpoint uses an API key:
        # api_key="YOUR_ENDPOINT_API_KEY"
    ),
    name="vllm_agent",
    instruction="You are a helpful assistant running on a self-hosted vLLM endpoint.",
    # ... other agent parameters
)
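
Because LiteLlm is a thin wrapper around the LiteLLM library, you can also smoke-test the endpoint with litellm.completion directly, reusing the same model string, base URL, and headers from the example above. The tool schema below is a hypothetical stand-in, included only to confirm that the server's tool/function calling is enabled.

import litellm

# Hypothetical tool schema used purely to exercise tool calling.
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

response = litellm.completion(
    model=model_name_at_endpoint,
    api_base=api_base_url,
    extra_headers=auth_headers,
    messages=[{"role": "user", "content": "What is the weather in Paris?"}],
    tools=tools,
)
print(response.choices[0].message)  # expect a tool_calls entry if tool calling works

If the response never contains tool calls, revisit the --enable-auto-tool-choice and --tool-call-parser flags from the Setup section.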