Calling LLMs
Agents change how you work with LLMs. In a stateless Worker, every request starts from zero: you rebuild context, call the model, return a response, and forget everything. An Agent keeps state between calls, holds a WebSocket connection to the client, and can call models on its own schedule with no user present.
This page covers what becomes possible when LLM calls happen inside a stateful Agent. For provider configuration and code samples, see Using AI models.
State as context
Every Agent has a built-in SQL database and key-value state. Instead of having clients send the full conversation history with every request, the Agent stores the history itself and builds prompts from its own storage.
JavaScript
import { Agent } from "agents";

export class ResearchAgent extends Agent {
  async buildPrompt(userMessage) {
    // Pull the most recent 50 turns from the Agent's embedded SQLite database
    const history = this.sql`
      SELECT role, content FROM messages
      ORDER BY timestamp DESC LIMIT 50`;
    const preferences = this.sql`
      SELECT key, value FROM user_preferences`;
    return [
      { role: "system", content: this.systemPrompt(preferences) },
      // Rows come back newest-first; restore chronological order
      ...history.reverse(),
      { role: "user", content: userMessage },
    ];
  }
}
TypeScript
import { Agent } from "agents";

export class ResearchAgent extends Agent<Env> {
  async buildPrompt(userMessage: string) {
    // Pull the most recent 50 turns from the Agent's embedded SQLite database
    const history = this.sql<{ role: string; content: string }>`
      SELECT role, content FROM messages
      ORDER BY timestamp DESC LIMIT 50`;
    const preferences = this.sql<{ key: string; value: string }>`
      SELECT key, value FROM user_preferences`;
    return [
      { role: "system", content: this.systemPrompt(preferences) },
      // Rows come back newest-first; restore chronological order
      ...history.reverse(),
      { role: "user", content: userMessage },
    ];
  }
}
This means clients don't need to resend the full conversation with every message. The Agent owns the history: it can trim it, enrich it with retrieved documents, or summarize older turns before sending them to the model.
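The trim-or-summarize decision can be sketched without the framework. This is an illustrative helper, not an Agents SDK API: it keeps the newest turns verbatim within a token budget and returns the older ones as candidates for a single summarization call. The 4-characters-per-token estimate is a rough assumption.

```javascript
// Sketch: split history so recent turns stay verbatim and older turns
// can be summarized into one message. Function names and the
// 4-chars-per-token heuristic are illustrative assumptions.
function estimateTokens(message) {
  return Math.ceil(message.content.length / 4);
}

function partitionHistory(history, recentBudget) {
  const recent = [];
  let used = 0;
  // Walk backwards: keep the newest turns until the budget is spent
  for (let i = history.length - 1; i >= 0; i--) {
    const cost = estimateTokens(history[i]);
    if (used + cost > recentBudget) break;
    recent.unshift(history[i]);
    used += cost;
  }
  // Everything older is a candidate for one summarization call
  const older = history.slice(0, history.length - recent.length);
  return { older, recent };
}
```

The `older` slice could then be condensed by a cheap model into a single system-style summary message, so the prompt stays short without losing long-range context.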
Surviving disconnections
Reasoning models like DeepSeek R1 or GLM-4 can take anywhere from 30 seconds to several minutes to respond. In a stateless request-response architecture, the client has to stay connected the whole time; if the connection drops, the response is lost.
An Agent keeps running after the client disconnects. When the response arrives, the Agent can persist it to state and deliver it once the client reconnects, even hours or days later.
JavaScript
import { Agent } from "agents";
import { streamText } from "ai";
import { createWorkersAI } from "workers-ai-provider";

export class MyAgent extends Agent {
  async onMessage(connection, message) {
    const { prompt } = JSON.parse(message);
    const workersai = createWorkersAI({ binding: this.env.AI });
    const result = streamText({
      model: workersai("@cf/zai-org/glm-4.7-flash"),
      prompt,
    });
    // Stream chunks to the client while it is connected
    for await (const chunk of result.textStream) {
      connection.send(JSON.stringify({ type: "chunk", content: chunk }));
    }
    // Persist the full response so it survives a disconnect
    this.sql`INSERT INTO responses (prompt, response, timestamp)
      VALUES (${prompt}, ${await result.text}, ${Date.now()})`;
  }
}
TypeScript
import { Agent, type Connection, type WSMessage } from "agents";
import { streamText } from "ai";
import { createWorkersAI } from "workers-ai-provider";

export class MyAgent extends Agent<Env> {
  async onMessage(connection: Connection, message: WSMessage) {
    const { prompt } = JSON.parse(message as string);
    const workersai = createWorkersAI({ binding: this.env.AI });
    const result = streamText({
      model: workersai("@cf/zai-org/glm-4.7-flash"),
      prompt,
    });
    // Stream chunks to the client while it is connected
    for await (const chunk of result.textStream) {
      connection.send(JSON.stringify({ type: "chunk", content: chunk }));
    }
    // Persist the full response so it survives a disconnect
    this.sql`INSERT INTO responses (prompt, response, timestamp)
      VALUES (${prompt}, ${await result.text}, ${Date.now()})`;
  }
}
With AIChatAgent, all of this is handled automatically: messages are persisted to SQLite, and streams resume on reconnect.
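The deliver-on-reconnect pattern itself is easy to state framework-free. A minimal sketch, where the class and its in-memory arrays are illustrative stand-ins (in a real Agent, `pending` would live in the SQL database, not memory):

```javascript
// Sketch of deliver-on-reconnect: buffer output while the client is
// away, flush it when they return. Names here are illustrative
// stand-ins, not Agents SDK APIs.
class PendingOutbox {
  constructor() {
    this.connected = false;
    this.pending = [];   // stand-in for a SQL-backed queue
    this.delivered = []; // stand-in for connection.send(...)
  }

  send(message) {
    if (this.connected) {
      this.delivered.push(message);
    } else {
      this.pending.push(message); // hold until the client returns
    }
  }

  // On reconnect, flush everything that accumulated while away
  reconnect() {
    this.connected = true;
    this.delivered.push(...this.pending);
    this.pending = [];
  }
}
```

An Agent would run `send` from its model-call handler regardless of connection state, and run the flush from its connection handler when the client comes back.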
Autonomous model calls
An Agent can call models without a user request. You can schedule model calls to run in the background, for nightly summaries, periodic classification, monitoring, or any task that should happen with no one watching.
JavaScript
import { Agent } from "agents";
import { generateText } from "ai";
import { createWorkersAI } from "workers-ai-provider";

export class DigestAgent extends Agent {
  async onStart() {
    // Run every day at 08:00
    this.schedule("0 8 * * *", "generateDailyDigest", {});
  }

  async generateDailyDigest() {
    const articles = this.sql`
      SELECT title, body FROM articles
      WHERE created_at > datetime('now', '-1 day')`;
    const workersai = createWorkersAI({ binding: this.env.AI });
    const { text } = await generateText({
      model: workersai("@cf/zai-org/glm-4.7-flash"),
      prompt: `Summarize these articles:\n${articles.map((a) => a.title + ": " + a.body).join("\n\n")}`,
    });
    this.sql`INSERT INTO digests (summary, created_at)
      VALUES (${text}, ${Date.now()})`;
    // Push the digest to any currently connected clients
    this.broadcast(JSON.stringify({ type: "digest", summary: text }));
  }
}
TypeScript
import { Agent } from "agents";
import { generateText } from "ai";
import { createWorkersAI } from "workers-ai-provider";

export class DigestAgent extends Agent<Env> {
  async onStart() {
    // Run every day at 08:00
    this.schedule("0 8 * * *", "generateDailyDigest", {});
  }

  async generateDailyDigest() {
    const articles = this.sql<{ title: string; body: string }>`
      SELECT title, body FROM articles
      WHERE created_at > datetime('now', '-1 day')`;
    const workersai = createWorkersAI({ binding: this.env.AI });
    const { text } = await generateText({
      model: workersai("@cf/zai-org/glm-4.7-flash"),
      prompt: `Summarize these articles:\n${articles.map((a) => a.title + ": " + a.body).join("\n\n")}`,
    });
    this.sql`INSERT INTO digests (summary, created_at)
      VALUES (${text}, ${Date.now()})`;
    // Push the digest to any currently connected clients
    this.broadcast(JSON.stringify({ type: "digest", summary: text }));
  }
}
Multi-model pipelines
Because an Agent holds state between calls, you can chain multiple models in a single method: a fast model for classification, a reasoning model for planning, an embedding model for retrieval, without losing context between steps.
JavaScript
import { Agent } from "agents";
import { generateText, embed } from "ai";
import { createWorkersAI } from "workers-ai-provider";

export class TriageAgent extends Agent {
  async triage(ticket) {
    const workersai = createWorkersAI({ binding: this.env.AI });
    // Step 1: fast model classifies the ticket
    const { text: category } = await generateText({
      model: workersai("@cf/zai-org/glm-4.7-flash"),
      prompt: `Classify this support ticket into one of: billing, technical, account. Ticket: ${ticket}`,
    });
    // Step 2: embed the ticket and retrieve similar resolved tickets
    const { embedding } = await embed({
      model: workersai("@cf/baai/bge-base-en-v1.5"),
      value: ticket,
    });
    const similar = await this.env.VECTOR_DB.query(embedding, { topK: 5 });
    // Step 3: draft a response grounded in the retrieved tickets
    const { text: response } = await generateText({
      model: workersai("@cf/zai-org/glm-4.7-flash"),
      prompt: `Draft a response for this ${category} ticket. Similar resolved tickets: ${JSON.stringify(similar)}. Ticket: ${ticket}`,
    });
    this.sql`INSERT INTO tickets (content, category, response, created_at)
      VALUES (${ticket}, ${category}, ${response}, ${Date.now()})`;
    return { category, response };
  }
}
TypeScript
import { Agent } from "agents";
import { generateText, embed } from "ai";
import { createWorkersAI } from "workers-ai-provider";

export class TriageAgent extends Agent<Env> {
  async triage(ticket: string) {
    const workersai = createWorkersAI({ binding: this.env.AI });
    // Step 1: fast model classifies the ticket
    const { text: category } = await generateText({
      model: workersai("@cf/zai-org/glm-4.7-flash"),
      prompt: `Classify this support ticket into one of: billing, technical, account. Ticket: ${ticket}`,
    });
    // Step 2: embed the ticket and retrieve similar resolved tickets
    const { embedding } = await embed({
      model: workersai("@cf/baai/bge-base-en-v1.5"),
      value: ticket,
    });
    const similar = await this.env.VECTOR_DB.query(embedding, { topK: 5 });
    // Step 3: draft a response grounded in the retrieved tickets
    const { text: response } = await generateText({
      model: workersai("@cf/zai-org/glm-4.7-flash"),
      prompt: `Draft a response for this ${category} ticket. Similar resolved tickets: ${JSON.stringify(similar)}. Ticket: ${ticket}`,
    });
    this.sql`INSERT INTO tickets (content, category, response, created_at)
      VALUES (${ticket}, ${category}, ${response}, ${Date.now()})`;
    return { category, response };
  }
}
Intermediate results from each step stay in the Agent's memory until the method returns; the final result is written to SQL for later reference.
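If individual steps are expensive, you may also want intermediate results to survive a crash mid-pipeline. A framework-free sketch of step checkpointing, where `store` is an illustrative stand-in for an Agent's SQL table (none of these names are SDK APIs):

```javascript
// Sketch: checkpoint each pipeline step's output so a re-run after a
// failure skips already-completed steps. `store` stands in for a
// SQL-backed table; names are illustrative, not SDK APIs.
async function runPipeline(steps, store) {
  const results = {};
  for (const step of steps) {
    if (store.has(step.name)) {
      // Resume: reuse the checkpointed output instead of re-calling the model
      results[step.name] = store.get(step.name);
    } else {
      results[step.name] = await step.run(results);
      store.set(step.name, results[step.name]); // checkpoint
    }
  }
  return results;
}
```

Each step receives the accumulated `results` of earlier steps, mirroring how `category` feeds the drafting prompt in the triage example above.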
Caching and cost control
Persistent storage lets you cache model responses and avoid repeat calls. This is especially useful for expensive operations such as embeddings or long reasoning chains.
JavaScript
import { Agent } from "agents";
import { generateText } from "ai";
import { createWorkersAI } from "workers-ai-provider";

export class CachingAgent extends Agent {
  async cachedGenerate(prompt) {
    // Exact-match lookup against previously stored responses
    const cached = this.sql`
      SELECT response FROM llm_cache WHERE prompt = ${prompt}`;
    if (cached.length > 0) {
      return cached[0].response;
    }
    const workersai = createWorkersAI({ binding: this.env.AI });
    const { text } = await generateText({
      model: workersai("@cf/zai-org/glm-4.7-flash"),
      prompt,
    });
    this.sql`INSERT INTO llm_cache (prompt, response, created_at)
      VALUES (${prompt}, ${text}, ${Date.now()})`;
    return text;
  }
}
TypeScript
import { Agent } from "agents";
import { generateText } from "ai";
import { createWorkersAI } from "workers-ai-provider";

export class CachingAgent extends Agent<Env> {
  async cachedGenerate(prompt: string) {
    // Exact-match lookup against previously stored responses
    const cached = this.sql<{ response: string }>`
      SELECT response FROM llm_cache WHERE prompt = ${prompt}`;
    if (cached.length > 0) {
      return cached[0].response;
    }
    const workersai = createWorkersAI({ binding: this.env.AI });
    const { text } = await generateText({
      model: workersai("@cf/zai-org/glm-4.7-flash"),
      prompt,
    });
    this.sql`INSERT INTO llm_cache (prompt, response, created_at)
      VALUES (${prompt}, ${text}, ${Date.now()})`;
    return text;
  }
}
For caching and rate limiting at the provider layer, shared across multiple agents, use AI Gateway.
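Note that an exact-match cache like the one above misses prompts that differ only in whitespace or casing. One common refinement is to normalize the prompt into a cache key; the normalization rules below are illustrative assumptions, not part of the SDK:

```javascript
// Sketch: normalize prompts into a cache key so trivially different
// strings ("  Hello   World " vs "hello world") hit the same row.
// The rules here are illustrative; pick ones safe for your use case.
function cacheKey(model, prompt) {
  const normalized = prompt.trim().replace(/\s+/g, " ").toLowerCase();
  // Include the model so responses from different models never collide
  return `${model}:${normalized}`;
}
```

The key would replace the raw `prompt` in both the `SELECT ... WHERE` lookup and the `INSERT`. For case-sensitive content (code, IDs), drop the lowercasing.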
Next steps
Using AI models: provider setup, streaming, and code examples for Workers AI, OpenAI, Anthropic, and more.
Chat agent: AIChatAgent handles message persistence, resumable streaming, and tools automatically.
Store and sync state: the SQL database and key-value state APIs used for building context and caching.
Schedule tasks: trigger autonomous model calls on a delay, at a specific time, or on a cron schedule.