
vLLM model host for ADK agents

Supported in ADK Python v0.1.0

Tools such as vLLM allow you to host models efficiently and serve them as an OpenAI-compatible API endpoint. You can use vLLM models through the LiteLLM library for Python.

Setup

  1. Deploy Model: Deploy your chosen model using vLLM (or a similar tool). Note the API base URL (e.g., https://your-vllm-endpoint.run.app/v1).
    • Important for ADK Tools: When deploying, ensure the serving tool supports and enables OpenAI-compatible tool/function calling. For vLLM, this might involve flags like --enable-auto-tool-choice and potentially a specific --tool-call-parser, depending on the model. Refer to the vLLM documentation on Tool Use.
  2. Authentication: Determine how your endpoint handles authentication (e.g., API key, bearer token); the sketch after this list shows one way to verify it.

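Before wiring the endpoint into an agent, it helps to confirm that the base URL and credentials actually work. The sketch below is a minimal check, assuming a Cloud Run-style deployment secured with a gcloud identity token (adapt the auth to your setup); it queries the /v1/models route that vLLM's OpenAI-compatible server exposes.

import subprocess

import requests

# Hypothetical endpoint; replace with your deployment's base URL.
api_base_url = "https://your-vllm-endpoint.run.app/v1"

# Assumption: the endpoint accepts Google identity tokens (e.g., Cloud Run IAM).
token = subprocess.check_output(
    ["gcloud", "auth", "print-identity-token", "-q"]
).decode().strip()

# vLLM's OpenAI-compatible server lists its served models at /v1/models.
resp = requests.get(
    f"{api_base_url}/models",
    headers={"Authorization": f"Bearer {token}"},
    timeout=30,
)
resp.raise_for_status()
print(resp.json())  # should include the model name you will pass to LiteLLM

If this call fails with a 401 or 403, fix the authentication in step 2 before debugging anything on the ADK side.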

Integration Example

The following example shows how to use a vLLM endpoint with ADK agents.

import subprocess
from google.adk.agents import LlmAgent
from google.adk.models.lite_llm import LiteLlm

# --- Example Agent using a model hosted on a vLLM endpoint ---

# Endpoint URL provided by your vLLM deployment
api_base_url = "https://your-vllm-endpoint.run.app/v1"

# Model name as recognized by *your* vLLM endpoint configuration
model_name_at_endpoint = "hosted_vllm/google/gemma-3-4b-it"  # "hosted_vllm/" routes through LiteLLM's vLLM provider

# Authentication (Example: using gcloud identity token for a Cloud Run deployment)
# Adapt this based on your endpoint's security
try:
    gcloud_token = subprocess.check_output(
        ["gcloud", "auth", "print-identity-token", "-q"]
    ).decode().strip()
    auth_headers = {"Authorization": f"Bearer {gcloud_token}"}
except Exception as e:
    print(f"Warning: Could not get gcloud token - {e}. Endpoint might be unsecured or require different auth.")
    auth_headers = None # Or handle error appropriately

agent_vllm = LlmAgent(
    model=LiteLlm(
        model=model_name_at_endpoint,
        api_base=api_base_url,
        # Pass authentication headers if needed
        extra_headers=auth_headers
        # Alternatively, if endpoint uses an API key:
        # api_key="YOUR_ENDPOINT_API_KEY"
    ),
    name="vllm_agent",
    instruction="You are a helpful assistant running on a self-hosted vLLM endpoint.",
    # ... other agent parameters
)
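
Because LiteLlm is a thin wrapper around the LiteLLM library, you can also smoke-test the endpoint with litellm.completion directly, reusing the same model string, base URL, and headers from the example above. The tool schema below is a hypothetical stand-in, included only to confirm that the server's tool/function calling is enabled.

import litellm

# Hypothetical tool schema used purely to exercise tool calling.
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

response = litellm.completion(
    model=model_name_at_endpoint,
    api_base=api_base_url,
    extra_headers=auth_headers,
    messages=[{"role": "user", "content": "What is the weather in Paris?"}],
    tools=tools,
)
print(response.choices[0].message)  # expect a tool_calls entry if tool calling works

If the response never contains tool calls, revisit the --enable-auto-tool-choice and --tool-call-parser flags from the Setup section.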