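Retries
Retry failed operations with exponential backoff and jitter. The Agents SDK has built-in retry support for scheduled tasks and queued tasks, plus a general-purpose this.retry() method you can use in your own code.
Overview
Transient failures are common when calling external APIs, talking to other services, or running background tasks. The retry system handles them automatically:
- Exponential backoff: each retry waits longer than the previous one
- Jitter: randomized delays prevent the "thundering herd" problem
- Configurable: tune attempts, delays, and caps per call site
- Built in: schedule, queue, and workflow operations retry automatically
Quick start
Use this.retry() to retry any async operation: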
JavaScript
import { Agent } from "agents";
export class MyAgent extends Agent {
  async fetchWithRetry(url) {
    const response = await this.retry(async () => {
      const res = await fetch(url);
      if (!res.ok) throw new Error(`HTTP ${res.status}`);
      return res.json();
    });
    return response;
  }
}
TypeScript
import { Agent } from "agents";
export class MyAgent extends Agent {
  async fetchWithRetry(url: string) {
    const response = await this.retry(async () => {
      const res = await fetch(url);
      if (!res.ok) throw new Error(`HTTP ${res.status}`);
      return res.json();
    });
    return response;
  }
}
By default, this.retry() makes up to 3 attempts (the initial call plus up to 2 retries), using exponential backoff with jitter.
this.retry()
Every Agent instance has a retry() method. By default, it retries the given function whenever it throws any error.
TypeScript
async retry<T>(
  fn: (attempt: number) => Promise<T>,
  options?: RetryOptions & {
    shouldRetry?: (err: unknown, nextAttempt: number) => boolean;
  }
): Promise<T>
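Parameters:
- fn: the async function to retry. Receives the current attempt number (1-indexed).
- options: optional retry configuration (see RetryOptions below). Options are validated eagerly, so invalid values throw immediately.
- options.shouldRetry: optional predicate that receives the thrown error and the next attempt number. Return false to stop retrying immediately. If omitted, all errors are retried.
Returns: the return value of fn on success.
Throws: the last error, if all attempts fail or shouldRetry returns false.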
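The contract above can be illustrated with a small standalone sketch. This is not the SDK's implementation: retrySketch, SketchOptions, and the injectable sleep parameter are hypothetical names introduced here. The sketch assumes that nextAttempt is the number of the attempt that would run next, and that shouldRetry is not consulted once the attempts are exhausted.

```typescript
// Standalone sketch of the retry contract described above (not the SDK source).
// All names in this block are hypothetical.
interface SketchOptions {
  maxAttempts?: number;
  baseDelayMs?: number;
  maxDelayMs?: number;
  shouldRetry?: (err: unknown, nextAttempt: number) => boolean;
  // Injectable so tests can skip real waiting; defaults to a real timer.
  sleep?: (ms: number) => Promise<void>;
}

async function retrySketch<T>(
  fn: (attempt: number) => Promise<T>,
  options: SketchOptions = {},
): Promise<T> {
  const maxAttempts = options.maxAttempts ?? 3;
  const baseDelayMs = options.baseDelayMs ?? 100;
  const maxDelayMs = options.maxDelayMs ?? 3000;
  const sleep =
    options.sleep ??
    ((ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms)));

  for (let attempt = 1; ; attempt++) {
    try {
      return await fn(attempt); // attempts are numbered from 1
    } catch (err) {
      // Out of attempts: surface the last error.
      if (attempt >= maxAttempts) throw err;
      // The predicate can veto the next attempt.
      if (options.shouldRetry && !options.shouldRetry(err, attempt + 1)) throw err;
      // Full jitter: wait a random time below the capped exponential bound.
      const upperBound = Math.min(2 ** attempt * baseDelayMs, maxDelayMs);
      await sleep(Math.random() * upperBound);
    }
  }
}
```

Passing a no-op sleep, as in `retrySketch(fn, { sleep: async () => {} })`, makes the control flow easy to exercise in a unit test without waiting for real delays.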
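Examples
Basic retry (fetch() rejects only on network-level failures, so HTTP error responses are not retried here):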
JavaScript
const data = await this.retry(() => fetch("https://api.example.com/data"));
TypeScript
const data = await this.retry(() => fetch("https://api.example.com/data"));
Custom retry options:
JavaScript
const data = await this.retry(
  async () => {
    const res = await fetch("https://slow-api.example.com/data");
    if (!res.ok) throw new Error(`HTTP ${res.status}`);
    return res.json();
  },
  {
    maxAttempts: 5,
    baseDelayMs: 500,
    maxDelayMs: 10000,
  },
);
TypeScript
const data = await this.retry(
  async () => {
    const res = await fetch("https://slow-api.example.com/data");
    if (!res.ok) throw new Error(`HTTP ${res.status}`);
    return res.json();
  },
  {
    maxAttempts: 5,
    baseDelayMs: 500,
    maxDelayMs: 10000,
  },
);
Using the attempt number:
JavaScript
const result = await this.retry(async (attempt) => {
  console.log(`Attempt ${attempt}...`);
  return await this.callExternalService();
});
TypeScript
const result = await this.retry(async (attempt) => {
  console.log(`Attempt ${attempt}...`);
  return await this.callExternalService();
});
Selective retry with shouldRetry:
Use shouldRetry to stop retrying on specific errors. The predicate receives both the error and the next attempt number:
JavaScript
const data = await this.retry(
  async () => {
    const res = await fetch("https://api.example.com/data");
    if (!res.ok) throw new HttpError(res.status, await res.text());
    return res.json();
  },
  {
    maxAttempts: 5,
    shouldRetry: (err, nextAttempt) => {
      // Do not retry 4xx client errors: our request is wrong
      if (err instanceof HttpError && err.status >= 400 && err.status < 500) {
        return false;
      }
      return true; // retry everything else (5xx, network errors, etc.)
    },
  },
);
TypeScript
const data = await this.retry(
  async () => {
    const res = await fetch("https://api.example.com/data");
    if (!res.ok) throw new HttpError(res.status, await res.text());
    return res.json();
  },
  {
    maxAttempts: 5,
    shouldRetry: (err, nextAttempt) => {
      // Do not retry 4xx client errors: our request is wrong
      if (err instanceof HttpError && err.status >= 400 && err.status < 500) {
        return false;
      }
      return true; // retry everything else (5xx, network errors, etc.)
    },
  },
);
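HttpError in the example above is not defined in this section and is not a JavaScript built-in. A minimal version, assuming it only needs to carry the status code and the response body, might look like the following. The isRetryable helper is likewise a hypothetical name; it extracts the same predicate so it can be unit-tested on its own.

```typescript
// Hypothetical error type assumed by the shouldRetry example; not part of the SDK.
class HttpError extends Error {
  readonly status: number;
  readonly body: string;

  constructor(status: number, body: string) {
    super(`HTTP ${status}`);
    this.name = "HttpError";
    this.status = status;
    this.body = body;
  }
}

// Same decision as the inline predicate above, as a reusable function.
function isRetryable(err: unknown): boolean {
  if (err instanceof HttpError && err.status >= 400 && err.status < 500) {
    return false; // client error: repeating the same request will not help
  }
  return true; // 5xx responses, network errors, and anything else
}
```

It could then be passed directly as `shouldRetry: isRetryable`, since the second argument (the next attempt number) is simply ignored.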
Retries in schedules
Pass retry options when creating a scheduled task:
JavaScript
// Run in 60 seconds; up to 5 attempts if the callback fails
await this.schedule(
  60,
  "processTask",
  { taskId: "123" },
  {
    retry: { maxAttempts: 5 },
  },
);
// Retry with custom backoff
await this.schedule(
  new Date("2026-03-01T09:00:00Z"),
  "sendReport",
  {},
  {
    retry: {
      maxAttempts: 3,
      baseDelayMs: 1000,
      maxDelayMs: 30000,
    },
  },
);
// Cron with retries
await this.schedule(
  "0 8 * * *",
  "dailyDigest",
  {},
  {
    retry: { maxAttempts: 3 },
  },
);
// Interval with retries
await this.scheduleEvery(
  30,
  "poll",
  { source: "api" },
  {
    retry: { maxAttempts: 5, baseDelayMs: 200 },
  },
);
TypeScript
// Run in 60 seconds; up to 5 attempts if the callback fails
await this.schedule(
  60,
  "processTask",
  { taskId: "123" },
  {
    retry: { maxAttempts: 5 },
  },
);
// Retry with custom backoff
await this.schedule(
  new Date("2026-03-01T09:00:00Z"),
  "sendReport",
  {},
  {
    retry: {
      maxAttempts: 3,
      baseDelayMs: 1000,
      maxDelayMs: 30000,
    },
  },
);
// Cron with retries
await this.schedule(
  "0 8 * * *",
  "dailyDigest",
  {},
  {
    retry: { maxAttempts: 3 },
  },
);
// Interval with retries
await this.scheduleEvery(
  30,
  "poll",
  { source: "api" },
  {
    retry: { maxAttempts: 5, baseDelayMs: 200 },
  },
);
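If the callback throws, it is retried according to the retry options. If all attempts fail, the error is logged and reported through onError(). Whether it succeeds or fails, the scheduled task is still removed (one-time tasks) or rescheduled (cron/interval).
Retries in queues
Pass retry options when adding a task to the queue: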
JavaScript
await this.queue(
  "sendEmail",
  { to: "[email protected]" },
  {
    retry: { maxAttempts: 5 },
  },
);
await this.queue("processWebhook", webhookData, {
  retry: {
    maxAttempts: 3,
    baseDelayMs: 500,
    maxDelayMs: 5000,
  },
});
TypeScript
await this.queue(
  "sendEmail",
  { to: "[email protected]" },
  {
    retry: { maxAttempts: 5 },
  },
);
await this.queue("processWebhook", webhookData, {
  retry: {
    maxAttempts: 3,
    baseDelayMs: 500,
    maxDelayMs: 5000,
  },
});
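If the callback throws, the task is retried before it is dequeued. Once all attempts are exhausted, the task is dequeued and the error is logged.
Validation
Retry options are validated eagerly when you call this.retry(), queue(), schedule(), or scheduleEvery(). Invalid options throw immediately instead of failing later at execution time: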
JavaScript
// Throws immediately: "retry.maxAttempts must be >= 1"
await this.queue("sendEmail", data, {
  retry: { maxAttempts: 0 },
});
// Throws immediately: "retry.baseDelayMs must be > 0"
await this.schedule(
  60,
  "process",
  {},
  {
    retry: { baseDelayMs: -100 },
  },
);
// Throws immediately: "retry.maxAttempts must be an integer"
await this.retry(() => fetch(url), { maxAttempts: 2.5 });
// Throws immediately: "retry.baseDelayMs must be <= retry.maxDelayMs"
// because baseDelayMs: 5000 exceeds the default maxDelayMs: 3000
await this.queue("sendEmail", data, {
  retry: { baseDelayMs: 5000 },
});
TypeScript
// Throws immediately: "retry.maxAttempts must be >= 1"
await this.queue("sendEmail", data, {
  retry: { maxAttempts: 0 },
});
// Throws immediately: "retry.baseDelayMs must be > 0"
await this.schedule(
  60,
  "process",
  {},
  {
    retry: { baseDelayMs: -100 },
  },
);
// Throws immediately: "retry.maxAttempts must be an integer"
await this.retry(() => fetch(url), { maxAttempts: 2.5 });
// Throws immediately: "retry.baseDelayMs must be <= retry.maxDelayMs"
// because baseDelayMs: 5000 exceeds the default maxDelayMs: 3000
await this.queue("sendEmail", data, {
  retry: { baseDelayMs: 5000 },
});
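Validation first merges partial options with the class-level or built-in defaults, and only then checks cross-field constraints. As a result, { baseDelayMs: 5000 } is rejected immediately when the resolved maxDelayMs is 3000, rather than failing at execution time.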
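The merge-then-validate order can be sketched as a standalone function. This is an illustration under stated assumptions, not the SDK's code: resolveAndValidate and BUILT_IN_DEFAULTS are hypothetical names, the order of the checks is a guess, and the error message for maxDelayMs is invented here (the other messages mirror the examples in this document).

```typescript
// Sketch of merge-then-validate using the built-in defaults listed in this document.
interface RetryOptions {
  maxAttempts?: number;
  baseDelayMs?: number;
  maxDelayMs?: number;
}

const BUILT_IN_DEFAULTS = { maxAttempts: 3, baseDelayMs: 100, maxDelayMs: 3000 };

function resolveAndValidate(
  callSite: RetryOptions = {},
  classLevel: RetryOptions = {},
): Required<RetryOptions> {
  // Merge first (built-in < class-level < call site), so the cross-field
  // check below sees the values that would actually be used.
  const resolved = { ...BUILT_IN_DEFAULTS, ...classLevel, ...callSite };

  if (!Number.isInteger(resolved.maxAttempts)) {
    throw new Error("retry.maxAttempts must be an integer");
  }
  if (resolved.maxAttempts < 1) {
    throw new Error("retry.maxAttempts must be >= 1");
  }
  if (!(resolved.baseDelayMs > 0)) {
    throw new Error("retry.baseDelayMs must be > 0");
  }
  if (!(resolved.maxDelayMs > 0)) {
    throw new Error("retry.maxDelayMs must be > 0"); // hypothetical message
  }
  // Cross-field constraint, checked against the merged values.
  if (resolved.baseDelayMs > resolved.maxDelayMs) {
    throw new Error("retry.baseDelayMs must be <= retry.maxDelayMs");
  }
  return resolved;
}
```

With this sketch, `resolveAndValidate({ baseDelayMs: 5000 })` throws because the merged maxDelayMs is still 3000, while `resolveAndValidate({ baseDelayMs: 5000, maxDelayMs: 10000 })` passes.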
Default behavior
Even when no retry options are given, scheduled and queued callbacks are retried with sensible defaults:
| Setting | Default |
|---|---|
| maxAttempts | 3 |
| baseDelayMs | 100 |
| maxDelayMs | 3000 |
These defaults apply to this.retry(), queue(), schedule(), and scheduleEvery(). Options passed at a call site override them.
Class-level defaults
Override the defaults for an entire agent with static options:
JavaScript
class MyAgent extends Agent {
  static options = {
    retry: { maxAttempts: 5, baseDelayMs: 200, maxDelayMs: 5000 },
  };
}
TypeScript
class MyAgent extends Agent {
  static options = {
    retry: { maxAttempts: 5, baseDelayMs: 200, maxDelayMs: 5000 },
  };
}
You only need to specify the fields you want to change. Unset fields fall back to the built-in defaults:
JavaScript
class MyAgent extends Agent {
  // Only override maxAttempts; baseDelayMs (100) and maxDelayMs (3000) stay default
  static options = {
    retry: { maxAttempts: 10 },
  };
}
TypeScript
class MyAgent extends Agent {
  // Only override maxAttempts; baseDelayMs (100) and maxDelayMs (3000) stay default
  static options = {
    retry: { maxAttempts: 10 },
  };
}
Class-level defaults act as the fallback when a call site does not specify retry options. Call-site options always take precedence:
JavaScript
// Uses class-level defaults (10 attempts)
await this.retry(() => fetch(url));
// Overrides to 2 attempts for this specific call
await this.retry(() => fetch(url), { maxAttempts: 2 });
TypeScript
// Uses class-level defaults (10 attempts)
await this.retry(() => fetch(url));
// Overrides to 2 attempts for this specific call
await this.retry(() => fetch(url), { maxAttempts: 2 });
To disable retries for a specific task, set maxAttempts: 1:
JavaScript
await this.schedule(
  60,
  "oneShot",
  {},
  {
    retry: { maxAttempts: 1 },
  },
);
TypeScript
await this.schedule(
  60,
  "oneShot",
  {},
  {
    retry: { maxAttempts: 1 },
  },
);
RetryOptions
TypeScript
interface RetryOptions {
  /** Maximum number of attempts (including the first). Must be an integer >= 1. Default: 3 */
  maxAttempts?: number;
  /** Base delay in milliseconds for exponential backoff. Must be > 0 and <= maxDelayMs. Default: 100 */
  baseDelayMs?: number;
  /** Maximum delay cap in milliseconds. Must be > 0. Default: 3000 */
  maxDelayMs?: number;
}
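Retry delays use full jitter exponential backoff, where attempt is the 1-indexed number of the attempt that just failed:
delay = random(0, min(2^attempt * baseDelayMs, maxDelayMs))
This means early retries are fast (under 200ms for the first retry with the default settings), while later retries slow down so that they do not overwhelm a service that is already failing. The randomization (jitter) keeps multiple agents from retrying at the same instant.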
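The formula translates directly into code. The sketch below is a standalone illustration with hypothetical function names (backoffUpperBound, fullJitterDelay); the random source is injectable only so the result can be made deterministic in tests.

```typescript
// Upper bound of the backoff window after a failed attempt (attempts start at 1).
function backoffUpperBound(
  attempt: number,
  baseDelayMs = 100,
  maxDelayMs = 3000,
): number {
  return Math.min(2 ** attempt * baseDelayMs, maxDelayMs);
}

// Full jitter: pick a delay uniformly between 0 and the upper bound.
function fullJitterDelay(
  attempt: number,
  baseDelayMs = 100,
  maxDelayMs = 3000,
  random: () => number = Math.random,
): number {
  return random() * backoffUpperBound(attempt, baseDelayMs, maxDelayMs);
}

// With the defaults, the bounds after attempts 1 and 2 are 200 and 400 ms.
console.log(backoffUpperBound(1), backoffUpperBound(2));
// With baseDelayMs = 500 the cap applies from attempt 3: min(4000, 3000) = 3000.
console.log(backoffUpperBound(3, 500));
```

Because the delay is drawn from the whole interval rather than added on top of a fixed wait, two callers that fail at the same moment are unlikely to retry at the same moment.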
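How it works
Backoff strategy
The retry system uses the "Full Jitter" strategy described on the AWS Architecture Blog. With the default settings, the delays across 3 attempts are:
| Attempt | Upper bound | Actual delay |
|---|---|---|
| 1 | min(2^1 * 100, 3000) = 200ms | random(0, 200ms) |
| 2 | min(2^2 * 100, 3000) = 400ms | random(0, 400ms) |
| 3 | (no retry, final attempt) | n/a |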
With maxAttempts: 5 and baseDelayMs: 500 (and the default maxDelayMs of 3000):
| Attempt | Upper bound | Actual delay |
|---|---|---|
| 1 | min(2^1 * 500, 3000) = 1000ms | random(0, 1000ms) |
| 2 | min(2^2 * 500, 3000) = 2000ms | random(0, 2000ms) |
| 3 | min(2^3 * 500, 3000) = 3000ms | random(0, 3000ms) |
| 4 | min(2^4 * 500, 3000) = 3000ms | random(0, 3000ms) |
| 5 | (no retry, final attempt) | n/a |
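Summing the upper bounds in tables like these gives the longest time a single call can spend waiting between attempts, which is useful when deciding whether a configuration is short enough to run inline. The helper below is a hypothetical sketch (worstCaseTotalDelayMs is not an SDK function) and counts backoff time only, not the time the attempts themselves take.

```typescript
// Worst-case total backoff for one call: the sum of every upper bound.
function worstCaseTotalDelayMs(
  maxAttempts = 3,
  baseDelayMs = 100,
  maxDelayMs = 3000,
): number {
  let total = 0;
  // A delay follows every failed attempt except the last one.
  for (let attempt = 1; attempt < maxAttempts; attempt++) {
    total += Math.min(2 ** attempt * baseDelayMs, maxDelayMs);
  }
  return total;
}

console.log(worstCaseTotalDelayMs()); // 200 + 400 = 600
console.log(worstCaseTotalDelayMs(5, 500)); // 1000 + 2000 + 3000 + 3000 = 9000
```

Since each delay is drawn uniformly from zero up to its bound, the expected total is half of the worst case: about 300ms for the defaults and about 4.5s for the second configuration.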
MCP server retries
When adding an MCP server, you can configure retry options for connection and reconnection attempts:
JavaScript
await this.addMcpServer("github", "https://mcp.github.com", {
  retry: { maxAttempts: 5, baseDelayMs: 1000, maxDelayMs: 10000 },
});
TypeScript
await this.addMcpServer("github", "https://mcp.github.com", {
  retry: { maxAttempts: 5, baseDelayMs: 1000, maxDelayMs: 10000 },
});
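These options are persisted and are used when:
- Restoring server connections after hibernation
- Establishing connections after OAuth completes
Defaults: 3 attempts, 500ms base delay, 5s max delay.
Patterns
Retry with logging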
JavaScript
class MyAgent extends Agent {
  async resilientTask(payload) {
    try {
      const result = await this.retry(
        async (attempt) => {
          if (attempt > 1) {
            console.log(`Retrying ${payload.url} (attempt ${attempt})...`);
          }
          const res = await fetch(payload.url);
          if (!res.ok) throw new Error(`HTTP ${res.status}`);
          return res.json();
        },
        { maxAttempts: 5 },
      );
      console.log("Success:", result);
    } catch (e) {
      console.error("All retries failed:", e);
    }
  }
}
TypeScript
class MyAgent extends Agent {
  async resilientTask(payload: { url: string }) {
    try {
      const result = await this.retry(
        async (attempt) => {
          if (attempt > 1) {
            console.log(`Retrying ${payload.url} (attempt ${attempt})...`);
          }
          const res = await fetch(payload.url);
          if (!res.ok) throw new Error(`HTTP ${res.status}`);
          return res.json();
        },
        { maxAttempts: 5 },
      );
      console.log("Success:", result);
    } catch (e) {
      console.error("All retries failed:", e);
    }
  }
}
Retry with a fallback
JavaScript
class MyAgent extends Agent {
  async fetchData() {
    try {
      return await this.retry(
        () => fetch("https://primary-api.example.com/data"),
        { maxAttempts: 3, baseDelayMs: 200 },
      );
    } catch {
      // Primary failed, try fallback
      return await this.retry(
        () => fetch("https://fallback-api.example.com/data"),
        { maxAttempts: 2 },
      );
    }
  }
}
TypeScript
class MyAgent extends Agent {
  async fetchData() {
    try {
      return await this.retry(
        () => fetch("https://primary-api.example.com/data"),
        { maxAttempts: 3, baseDelayMs: 200 },
      );
    } catch {
      // Primary failed, try fallback
      return await this.retry(
        () => fetch("https://fallback-api.example.com/data"),
        { maxAttempts: 2 },
      );
    }
  }
}
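Combining retries with scheduling
For operations that may take a long time (minutes or hours) to recover, use this.retry() for immediate retries and this.schedule() for delayed retries: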
JavaScript
class MyAgent extends Agent {
  async syncData(payload) {
    const attempt = payload.attempt ?? 1;
    try {
      // Immediate retries for transient failures (seconds)
      await this.retry(() => this.fetchAndProcess(payload.source), {
        maxAttempts: 3,
        baseDelayMs: 1000,
      });
    } catch (e) {
      if (attempt >= 5) {
        console.error("Giving up after 5 scheduled attempts");
        return;
      }
      // Schedule a later retry for longer outages; the delay grows by 5 minutes per attempt
      const delaySeconds = 300 * attempt;
      await this.schedule(delaySeconds, "syncData", {
        source: payload.source,
        attempt: attempt + 1,
      });
      console.log(`Scheduled retry ${attempt + 1} in ${delaySeconds}s`);
    }
  }
}
TypeScript
class MyAgent extends Agent {
  async syncData(payload: { source: string; attempt?: number }) {
    const attempt = payload.attempt ?? 1;
    try {
      // Immediate retries for transient failures (seconds)
      await this.retry(() => this.fetchAndProcess(payload.source), {
        maxAttempts: 3,
        baseDelayMs: 1000,
      });
    } catch (e) {
      if (attempt >= 5) {
        console.error("Giving up after 5 scheduled attempts");
        return;
      }
      // Schedule a later retry for longer outages; the delay grows by 5 minutes per attempt
      const delaySeconds = 300 * attempt;
      await this.schedule(delaySeconds, "syncData", {
        source: payload.source,
        attempt: attempt + 1,
      });
      console.log(`Scheduled retry ${attempt + 1} in ${delaySeconds}s`);
    }
  }
}
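Limitations
- No dead-letter queue. If a queued or scheduled task fails after exhausting all retries, it is removed. If you need to track failed tasks, implement your own persistence.
- Retry delays block the agent. During a backoff delay, the Durable Object is awake but idle. Short delays (under 3 seconds) are fine, but for longer recovery times use this.schedule() instead.
- Queue retries are head-of-line blocking. Queued tasks are processed in order. A task that retries with long delays blocks every task behind it. If you need independent retry behavior, use this.retry() inside the callback instead of setting per-task retry options on queue().
- No circuit breaker. The retry system does not track failure rates across calls. If a service stays down, each task exhausts its own retry budget independently.
- shouldRetry is only available on this.retry(). The shouldRetry predicate cannot be used with schedule() or queue() because functions cannot be serialized to the database. For scheduled and queued tasks, handle non-retryable errors inside the callback.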
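One way to handle non-retryable errors inside a scheduled or queued callback is to catch them there and return normally, so the surrounding retry logic sees a successful run. The sketch below is a standalone illustration: NonRetryableError and runRetryableOnly are hypothetical names, not SDK exports, and how you mark an error as non-retryable is up to your own code.

```typescript
// Hypothetical marker for errors that should not trigger another attempt.
class NonRetryableError extends Error {}

// Runs a callback body. Non-retryable errors are reported and swallowed, so the
// task completes and is not retried; every other error is rethrown so that the
// surrounding retry logic still applies.
async function runRetryableOnly(
  body: () => Promise<void>,
  onGiveUp: (err: NonRetryableError) => void = (err) =>
    console.error("Not retrying:", err.message),
): Promise<void> {
  try {
    await body();
  } catch (err) {
    if (err instanceof NonRetryableError) {
      onGiveUp(err);
      return;
    }
    throw err;
  }
}
```

Inside a callback such as sendEmail, the body would be wrapped as `await runRetryableOnly(async () => { ... })`, throwing NonRetryableError for cases like a malformed payload. The onGiveUp hook is also a natural place to persist the failure if you need to track it.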
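Next steps
Scheduled tasks: schedule tasks to run in the future.
Queued tasks: a background task queue for immediate processing.
Run Workflows: durable multi-step processing with automatic retries.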