
Calling LLMs

Agents change how you work with LLMs. In a stateless Worker, every request starts from zero: you rebuild context, call the model, return a response, and forget everything. An Agent keeps state between calls, holds a persistent WebSocket connection to the client, and can call models at its own pace with no user present.

This page covers what becomes possible when LLM calls happen inside a stateful Agent. For provider configuration and code examples, see Using AI models.

State as context

Every Agent has a built-in SQL database and key-value state. Instead of having the client send the full conversation history with every request, the Agent stores the history itself and builds prompts from its own storage.

JavaScript


import { Agent } from "agents";

export class ResearchAgent extends Agent {
  async buildPrompt(userMessage) {
    const history = this.sql`
      SELECT role, content FROM messages
      ORDER BY timestamp DESC LIMIT 50`;

    const preferences = this.sql`
      SELECT key, value FROM user_preferences`;

    return [
      { role: "system", content: this.systemPrompt(preferences) },
      ...history.reverse(),
      { role: "user", content: userMessage },
    ];
  }
}


TypeScript


import { Agent } from "agents";

export class ResearchAgent extends Agent<Env> {
  async buildPrompt(userMessage: string) {
    const history = this.sql<{ role: string; content: string }>`
      SELECT role, content FROM messages
      ORDER BY timestamp DESC LIMIT 50`;

    const preferences = this.sql<{ key: string; value: string }>`
      SELECT key, value FROM user_preferences`;

    return [
      { role: "system", content: this.systemPrompt(preferences) },
      ...history.reverse(),
      { role: "user", content: userMessage },
    ];
  }
}


This means the client does not need to send the full conversation with each message. The Agent owns the history: it can trim it, enrich it with retrieved documents, or summarize older turns before sending them to the model.
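The trimming step can be sketched as a plain helper. The `windowHistory` function below is a hypothetical illustration, not part of the Agents SDK: it keeps the most recent turns that fit within a rough character budget.

```typescript
type Turn = { role: string; content: string };

// Hypothetical helper: walk from newest to oldest, keeping turns until the
// character budget is exhausted. A real implementation might summarize the
// dropped turns instead of discarding them outright.
function windowHistory(turns: Turn[], budget: number): Turn[] {
  const kept: Turn[] = [];
  let used = 0;
  for (let i = turns.length - 1; i >= 0; i--) {
    const cost = turns[i].content.length;
    if (used + cost > budget) break;
    kept.unshift(turns[i]); // preserve chronological order
    used += cost;
  }
  return kept;
}
```

A helper like this could replace the fixed `LIMIT 50` in `buildPrompt` when you want the window sized by content length rather than row count.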

Surviving disconnects

Reasoning models like DeepSeek R1 or GLM-4 can take anywhere from 30 seconds to several minutes to respond. In a stateless request-response architecture, the client must stay connected the whole time; if the connection drops, the response is lost.

An Agent keeps running after the client disconnects. When the response arrives, the Agent can persist it to state and deliver it when the client reconnects, even hours or days later.

JavaScript


import { Agent } from "agents";
import { streamText } from "ai";
import { createWorkersAI } from "workers-ai-provider";

export class MyAgent extends Agent {
  async onMessage(connection, message) {
    const { prompt } = JSON.parse(message);
    const workersai = createWorkersAI({ binding: this.env.AI });

    const result = streamText({
      model: workersai("@cf/zai-org/glm-4.7-flash"),
      prompt,
    });

    for await (const chunk of result.textStream) {
      connection.send(JSON.stringify({ type: "chunk", content: chunk }));
    }

    this.sql`INSERT INTO responses (prompt, response, timestamp)
      VALUES (${prompt}, ${await result.text}, ${Date.now()})`;
  }
}


TypeScript


import { Agent, type Connection, type WSMessage } from "agents";
import { streamText } from "ai";
import { createWorkersAI } from "workers-ai-provider";

export class MyAgent extends Agent<Env> {
  async onMessage(connection: Connection, message: WSMessage) {
    const { prompt } = JSON.parse(message as string);
    const workersai = createWorkersAI({ binding: this.env.AI });

    const result = streamText({
      model: workersai("@cf/zai-org/glm-4.7-flash"),
      prompt,
    });

    for await (const chunk of result.textStream) {
      connection.send(JSON.stringify({ type: "chunk", content: chunk }));
    }

    this.sql`INSERT INTO responses (prompt, response, timestamp)
      VALUES (${prompt}, ${await result.text}, ${Date.now()})`;
  }
}


With AIChatAgent, all of this is handled automatically: messages are persisted to SQLite and streams resume on reconnect.
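If you manage persistence yourself, the replay step reduces to a question of which stored rows a reconnecting client has not yet seen. A minimal sketch, assuming a hypothetical `responsesToReplay` helper (the names and the client-supplied `lastSeen` timestamp are illustrative, not part of the SDK):

```typescript
type StoredResponse = { prompt: string; response: string; timestamp: number };

// Hypothetical helper: given rows persisted while the client was away and
// the timestamp of the last message the client acknowledged, return the
// responses to replay, oldest first. A connection handler could send each
// of these over the fresh connection.
function responsesToReplay(
  rows: StoredResponse[],
  lastSeen: number,
): StoredResponse[] {
  return rows
    .filter((r) => r.timestamp > lastSeen)
    .sort((a, b) => a.timestamp - b.timestamp);
}
```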

Autonomous model calls

Agents can call models without a user request. You can schedule model calls to run in the background, for nightly summaries, periodic classification, monitoring, or any task that should happen with no one present.

JavaScript


import { Agent } from "agents";
import { generateText } from "ai";
import { createWorkersAI } from "workers-ai-provider";

export class DigestAgent extends Agent {
  async onStart() {
    this.schedule("0 8 * * *", "generateDailyDigest", {});
  }

  async generateDailyDigest() {
    const articles = this.sql`
      SELECT title, body FROM articles
      WHERE created_at > datetime('now', '-1 day')`;

    const workersai = createWorkersAI({ binding: this.env.AI });
    const { text } = await generateText({
      model: workersai("@cf/zai-org/glm-4.7-flash"),
      prompt: `Summarize these articles:\n${articles.map((a) => a.title + ": " + a.body).join("\n\n")}`,
    });

    this.sql`INSERT INTO digests (summary, created_at)
      VALUES (${text}, ${Date.now()})`;

    this.broadcast(JSON.stringify({ type: "digest", summary: text }));
  }
}


TypeScript


import { Agent } from "agents";
import { generateText } from "ai";
import { createWorkersAI } from "workers-ai-provider";

export class DigestAgent extends Agent<Env> {
  async onStart() {
    this.schedule("0 8 * * *", "generateDailyDigest", {});
  }

  async generateDailyDigest() {
    const articles = this.sql<{ title: string; body: string }>`
      SELECT title, body FROM articles
      WHERE created_at > datetime('now', '-1 day')`;

    const workersai = createWorkersAI({ binding: this.env.AI });
    const { text } = await generateText({
      model: workersai("@cf/zai-org/glm-4.7-flash"),
      prompt: `Summarize these articles:\n${articles.map((a) => a.title + ": " + a.body).join("\n\n")}`,
    });

    this.sql`INSERT INTO digests (summary, created_at)
      VALUES (${text}, ${Date.now()})`;

    this.broadcast(JSON.stringify({ type: "digest", summary: text }));
  }
}


Multi-model pipelines

Because an Agent keeps state between calls, you can chain multiple models in a single method (a fast model to classify, a reasoning model to plan, an embedding model to retrieve) without losing context between steps.

JavaScript


import { Agent } from "agents";
import { generateText, embed } from "ai";
import { createWorkersAI } from "workers-ai-provider";

export class TriageAgent extends Agent {
  async triage(ticket) {
    const workersai = createWorkersAI({ binding: this.env.AI });

    const { text: category } = await generateText({
      model: workersai("@cf/zai-org/glm-4.7-flash"),
      prompt: `Classify this support ticket into one of: billing, technical, account. Ticket: ${ticket}`,
    });

    const { embedding } = await embed({
      model: workersai("@cf/baai/bge-base-en-v1.5"),
      value: ticket,
    });
    const similar = await this.env.VECTOR_DB.query(embedding, { topK: 5 });

    const { text: response } = await generateText({
      model: workersai("@cf/zai-org/glm-4.7-flash"),
      prompt: `Draft a response for this ${category} ticket. Similar resolved tickets: ${JSON.stringify(similar)}. Ticket: ${ticket}`,
    });

    this.sql`INSERT INTO tickets (content, category, response, created_at)
      VALUES (${ticket}, ${category}, ${response}, ${Date.now()})`;

    return { category, response };
  }
}


TypeScript


import { Agent } from "agents";
import { generateText, embed } from "ai";
import { createWorkersAI } from "workers-ai-provider";

export class TriageAgent extends Agent<Env> {
  async triage(ticket: string) {
    const workersai = createWorkersAI({ binding: this.env.AI });

    const { text: category } = await generateText({
      model: workersai("@cf/zai-org/glm-4.7-flash"),
      prompt: `Classify this support ticket into one of: billing, technical, account. Ticket: ${ticket}`,
    });

    const { embedding } = await embed({
      model: workersai("@cf/baai/bge-base-en-v1.5"),
      value: ticket,
    });
    const similar = await this.env.VECTOR_DB.query(embedding, { topK: 5 });

    const { text: response } = await generateText({
      model: workersai("@cf/zai-org/glm-4.7-flash"),
      prompt: `Draft a response for this ${category} ticket. Similar resolved tickets: ${JSON.stringify(similar)}. Ticket: ${ticket}`,
    });

    this.sql`INSERT INTO tickets (content, category, response, created_at)
      VALUES (${ticket}, ${category}, ${response}, ${Date.now()})`;

    return { category, response };
  }
}


Intermediate results from each step stay in the Agent's memory until the method ends, and the final result is written to SQL for later reference.

Caching and cost control

Persistent storage lets you cache model responses and avoid repeated calls. This is especially useful for expensive operations such as embeddings or long reasoning chains.

JavaScript


import { Agent } from "agents";
import { generateText } from "ai";
import { createWorkersAI } from "workers-ai-provider";

export class CachingAgent extends Agent {
  async cachedGenerate(prompt) {
    const cached = this.sql`
      SELECT response FROM llm_cache WHERE prompt = ${prompt}`;

    if (cached.length > 0) {
      return cached[0].response;
    }

    const workersai = createWorkersAI({ binding: this.env.AI });
    const { text } = await generateText({
      model: workersai("@cf/zai-org/glm-4.7-flash"),
      prompt,
    });

    this.sql`INSERT INTO llm_cache (prompt, response, created_at)
      VALUES (${prompt}, ${text}, ${Date.now()})`;

    return text;
  }
}


TypeScript


import { Agent } from "agents";
import { generateText } from "ai";
import { createWorkersAI } from "workers-ai-provider";

export class CachingAgent extends Agent<Env> {
  async cachedGenerate(prompt: string) {
    const cached = this.sql<{ response: string }>`
      SELECT response FROM llm_cache WHERE prompt = ${prompt}`;

    if (cached.length > 0) {
      return cached[0].response;
    }

    const workersai = createWorkersAI({ binding: this.env.AI });
    const { text } = await generateText({
      model: workersai("@cf/zai-org/glm-4.7-flash"),
      prompt,
    });

    this.sql`INSERT INTO llm_cache (prompt, response, created_at)
      VALUES (${prompt}, ${text}, ${Date.now()})`;

    return text;
  }
}
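One refinement worth noting: the cache above keys on the raw prompt string, so prompts differing only in whitespace or casing always miss. A normalized key can be sketched with Node's built-in crypto module (available in Workers when the nodejs_compat flag is enabled); the `cacheKey` helper is hypothetical, not part of the Agents SDK:

```typescript
import { createHash } from "node:crypto";

// Hypothetical helper: collapse whitespace and casing before hashing, so
// near-identical prompts map to the same cache row. Store and look up rows
// by this key instead of the raw prompt.
function cacheKey(prompt: string): string {
  const normalized = prompt.trim().replace(/\s+/g, " ").toLowerCase();
  return createHash("sha256").update(normalized).digest("hex");
}
```

Whether this is appropriate depends on the workload: collapsing casing is fine for lookup-style prompts but may be too aggressive when exact wording matters.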


To manage caching and rate limiting at the provider level across multiple agents, use AI Gateway.

Next steps

Using AI models: provider setup, streaming, and code examples for Workers AI, OpenAI, Anthropic, and more.

Chat agent: AIChatAgent handles message persistence, resumable streaming, and tools automatically.

Store and sync state: the SQL database and key-value state APIs used for building context and caching.

Scheduling tasks: trigger autonomous model calls on a delay, at a specific time, or on a cron schedule.