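Retries

Retry failed operations with exponential backoff and jitter. The Agents SDK provides built-in retry support for scheduled tasks and queued tasks, plus a general-purpose this.retry() method that you can use in your own code.

Overview

Transient failures are common when you call external APIs, interact with other services, or run background tasks. The retry system handles them automatically:

  • Exponential backoff: each retry waits longer than the previous one
  • Jitter: randomized delays avoid the "thundering herd" problem
  • Configurable: tune the number of attempts, the delay, and the cap at each call site
  • Built in: schedule, queue, and workflow operations retry automatically

Quick start

Use this.retry() to retry any async operation: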

JavaScript


import { Agent } from "agents";


export class MyAgent extends Agent {

  async fetchWithRetry(url) {

    const response = await this.retry(async () => {

      const res = await fetch(url);

      if (!res.ok) throw new Error(`HTTP ${res.status}`);

      return res.json();

    });


    return response;

  }

}


TypeScript


import { Agent } from "agents";


export class MyAgent extends Agent {

  async fetchWithRetry(url: string) {

    const response = await this.retry(async () => {

      const res = await fetch(url);

      if (!res.ok) throw new Error(`HTTP ${res.status}`);

      return res.json();

    });


    return response;

  }

}


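By default, this.retry() makes up to 3 attempts (including the first), using exponential backoff with jitter.

this.retry()

Every Agent instance provides a retry() method. By default, it retries the given function whenever it throws an error.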

TypeScript


async retry<T>(

  fn: (attempt: number) => Promise<T>,

  options?: RetryOptions & {

    shouldRetry?: (err: unknown, nextAttempt: number) => boolean;

  }

): Promise<T>


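Parameters:

  • fn: the async function to retry. It receives the current attempt number (starting at 1).
  • options: optional retry configuration (see RetryOptions below). Options are validated eagerly, so invalid values throw immediately.
  • options.shouldRetry: optional predicate that receives the thrown error and the number of the next attempt. Return false to stop retrying immediately. If it is not provided, all errors are retried.

Returns: the return value of fn on success.

Throws: the last error, if every attempt fails or shouldRetry returns false.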
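
The contract described above can be illustrated with a standalone sketch. This is not the SDK's implementation: retrySketch, the injectable sleep parameter, and the order in which the attempt limit and shouldRetry are checked are all assumptions made for illustration. Only the option names, the defaults, and the documented behavior come from this page.

```typescript
// Hypothetical stand-in for this.retry(); the name and internals are assumptions.
interface RetryOptions {
  maxAttempts?: number;
  baseDelayMs?: number;
  maxDelayMs?: number;
}

type ShouldRetry = (err: unknown, nextAttempt: number) => boolean;

async function retrySketch<T>(
  fn: (attempt: number) => Promise<T>,
  options: RetryOptions & { shouldRetry?: ShouldRetry } = {},
  // Injectable so callers (and tests) can skip real waiting.
  sleep: (ms: number) => Promise<void> = (ms) =>
    new Promise<void>((resolve) => setTimeout(resolve, ms)),
): Promise<T> {
  const { maxAttempts = 3, baseDelayMs = 100, maxDelayMs = 3000, shouldRetry } = options;

  for (let attempt = 1; ; attempt++) {
    try {
      // fn receives the current attempt number, starting at 1.
      return await fn(attempt);
    } catch (err) {
      // Out of attempts: surface the last error.
      if (attempt >= maxAttempts) throw err;
      // The predicate sees the error and the number of the next attempt.
      if (shouldRetry && !shouldRetry(err, attempt + 1)) throw err;
      // Full jitter: wait a random time below the capped exponential ceiling.
      const ceilingMs = Math.min(2 ** attempt * baseDelayMs, maxDelayMs);
      await sleep(Math.random() * ceilingMs);
    }
  }
}
```

Making sleep a parameter is a design choice for this sketch only: it lets the loop be exercised without real delays.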

Examples

Basic retry:

JavaScript


const data = await this.retry(() => fetch("https://api.example.com/data"));


TypeScript


const data = await this.retry(() => fetch("https://api.example.com/data"));


Custom retry options:

JavaScript


const data = await this.retry(

  async () => {

    const res = await fetch("https://slow-api.example.com/data");

    if (!res.ok) throw new Error(`HTTP ${res.status}`);

    return res.json();

  },

  {

    maxAttempts: 5,

    baseDelayMs: 500,

    maxDelayMs: 10000,

  },

);


TypeScript


const data = await this.retry(

  async () => {

    const res = await fetch("https://slow-api.example.com/data");

    if (!res.ok) throw new Error(`HTTP ${res.status}`);

    return res.json();

  },

  {

    maxAttempts: 5,

    baseDelayMs: 500,

    maxDelayMs: 10000,

  },

);


Using the attempt number:

JavaScript


const result = await this.retry(async (attempt) => {

  console.log(`Attempt ${attempt}...`);

  return await this.callExternalService();

});


TypeScript


const result = await this.retry(async (attempt) => {

  console.log(`Attempt ${attempt}...`);

  return await this.callExternalService();

});


Selective retry with shouldRetry:

Use shouldRetry to stop retrying on specific errors. The predicate receives both the error and the number of the next attempt:

JavaScript


const data = await this.retry(

  async () => {

    const res = await fetch("https://api.example.com/data");

    if (!res.ok) throw new HttpError(res.status, await res.text());

    return res.json();

  },

  {

    maxAttempts: 5,

    shouldRetry: (err, nextAttempt) => {

      // Do not retry 4xx client errors — our request is wrong

      if (err instanceof HttpError && err.status >= 400 && err.status < 500) {

        return false;

      }

      return true; // retry everything else (5xx, network errors, etc.)

    },

  },

);


TypeScript


const data = await this.retry(

  async () => {

    const res = await fetch("https://api.example.com/data");

    if (!res.ok) throw new HttpError(res.status, await res.text());

    return res.json();

  },

  {

    maxAttempts: 5,

    shouldRetry: (err, nextAttempt) => {

      // Do not retry 4xx client errors — our request is wrong

      if (err instanceof HttpError && err.status >= 400 && err.status < 500) {

        return false;

      }

      return true; // retry everything else (5xx, network errors, etc.)

    },

  },

);
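
The shouldRetry example references HttpError without defining it. One possible definition is sketched below; the class shape (a status field and a body field) is an assumption inferred from how the example constructs and inspects it, and isRetryable is a hypothetical helper that restates the same 4xx rule as a named function.

```typescript
// Hypothetical error type; the example above uses HttpError but does not define it.
class HttpError extends Error {
  readonly status: number;
  readonly body: string;

  constructor(status: number, body: string) {
    super(`HTTP ${status}`);
    this.name = "HttpError";
    this.status = status;
    this.body = body;
  }
}

// Same rule as the predicate above: 4xx client errors are not retried,
// everything else (5xx, network errors, unknown errors) is.
function isRetryable(err: unknown): boolean {
  if (err instanceof HttpError && err.status >= 400 && err.status < 500) {
    return false;
  }
  return true;
}
```

With a helper like this, the option could be written as shouldRetry: (err) => isRetryable(err).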


Retries in schedules

Pass retry options when you create a scheduled task:

JavaScript


// Retry up to 5 times if the callback fails

await this.schedule(

  60,

  "processTask",

  { taskId: "123" },

  {

    retry: { maxAttempts: 5 },

  },

);


// Retry with custom backoff

await this.schedule(

  new Date("2026-03-01T09:00:00Z"),

  "sendReport",

  {},

  {

    retry: {

      maxAttempts: 3,

      baseDelayMs: 1000,

      maxDelayMs: 30000,

    },

  },

);


// Cron with retries

await this.schedule(

  "0 8 * * *",

  "dailyDigest",

  {},

  {

    retry: { maxAttempts: 3 },

  },

);


// Interval with retries

await this.scheduleEvery(

  30,

  "poll",

  { source: "api" },

  {

    retry: { maxAttempts: 5, baseDelayMs: 200 },

  },

);


TypeScript


// Retry up to 5 times if the callback fails

await this.schedule(

  60,

  "processTask",

  { taskId: "123" },

  {

    retry: { maxAttempts: 5 },

  },

);


// Retry with custom backoff

await this.schedule(

  new Date("2026-03-01T09:00:00Z"),

  "sendReport",

  {},

  {

    retry: {

      maxAttempts: 3,

      baseDelayMs: 1000,

      maxDelayMs: 30000,

    },

  },

);


// Cron with retries

await this.schedule(

  "0 8 * * *",

  "dailyDigest",

  {},

  {

    retry: { maxAttempts: 3 },

  },

);


// Interval with retries

await this.scheduleEvery(

  30,

  "poll",

  { source: "api" },

  {

    retry: { maxAttempts: 5, baseDelayMs: 200 },

  },

);


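If the callback throws, it is retried according to the retry options. If every attempt fails, the error is logged and reported through onError(). Whether the callback succeeds or fails, the scheduled task is still removed (one-time tasks) or rescheduled (cron/interval).

Retries in queues

Pass retry options when you add a task to the queue: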

JavaScript


await this.queue(

  "sendEmail",

  { to: "[email protected]" },

  {

    retry: { maxAttempts: 5 },

  },

);


await this.queue("processWebhook", webhookData, {

  retry: {

    maxAttempts: 3,

    baseDelayMs: 500,

    maxDelayMs: 5000,

  },

});


TypeScript


await this.queue(

  "sendEmail",

  { to: "[email protected]" },

  {

    retry: { maxAttempts: 5 },

  },

);


await this.queue("processWebhook", webhookData, {

  retry: {

    maxAttempts: 3,

    baseDelayMs: 500,

    maxDelayMs: 5000,

  },

});


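If the callback throws, the task is retried before it is dequeued. Once all attempts are exhausted, the task is dequeued and the error is logged.

Validation

Retry options are validated eagerly when you call this.retry(), queue(), schedule(), or scheduleEvery(). Invalid options throw an error immediately instead of failing later at execution time: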

JavaScript


// Throws immediately: "retry.maxAttempts must be >= 1"

await this.queue("sendEmail", data, {

  retry: { maxAttempts: 0 },

});


// Throws immediately: "retry.baseDelayMs must be > 0"

await this.schedule(

  60,

  "process",

  {},

  {

    retry: { baseDelayMs: -100 },

  },

);


// Throws immediately: "retry.maxAttempts must be an integer"

await this.retry(() => fetch(url), { maxAttempts: 2.5 });


// Throws immediately: "retry.baseDelayMs must be <= retry.maxDelayMs"

// because baseDelayMs: 5000 exceeds the default maxDelayMs: 3000

await this.queue("sendEmail", data, {

  retry: { baseDelayMs: 5000 },

});


TypeScript


// Throws immediately: "retry.maxAttempts must be >= 1"

await this.queue("sendEmail", data, {

  retry: { maxAttempts: 0 },

});


// Throws immediately: "retry.baseDelayMs must be > 0"

await this.schedule(

  60,

  "process",

  {},

  {

    retry: { baseDelayMs: -100 },

  },

);


// Throws immediately: "retry.maxAttempts must be an integer"

await this.retry(() => fetch(url), { maxAttempts: 2.5 });


// Throws immediately: "retry.baseDelayMs must be <= retry.maxDelayMs"

// because baseDelayMs: 5000 exceeds the default maxDelayMs: 3000

await this.queue("sendEmail", data, {

  retry: { baseDelayMs: 5000 },

});


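Validation first merges the partial options with the class-level or built-in defaults, and only then checks the cross-field constraints. This means { baseDelayMs: 5000 } is caught immediately when the resolved maxDelayMs is 3000, rather than failing later at execution time.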
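
The merge-then-validate order can be sketched as follows. This is an illustration under assumptions, not the SDK's code: resolveRetryOptions and BUILT_IN_DEFAULTS are hypothetical names, and the message for an invalid maxDelayMs is guessed by analogy. The other error messages and the default values are the ones quoted on this page.

```typescript
interface RetryOptions {
  maxAttempts?: number;
  baseDelayMs?: number;
  maxDelayMs?: number;
}

// Built-in defaults as documented on this page.
const BUILT_IN_DEFAULTS = { maxAttempts: 3, baseDelayMs: 100, maxDelayMs: 3000 };

// Hypothetical helper: merge first, then validate the resolved values.
function resolveRetryOptions(
  callSite: RetryOptions = {},
  classLevel: RetryOptions = {},
): Required<RetryOptions> {
  // Later spreads win: call site > class level > built-in.
  const resolved = { ...BUILT_IN_DEFAULTS, ...classLevel, ...callSite };

  if (!Number.isInteger(resolved.maxAttempts)) {
    throw new Error("retry.maxAttempts must be an integer");
  }
  if (resolved.maxAttempts < 1) {
    throw new Error("retry.maxAttempts must be >= 1");
  }
  if (!(resolved.baseDelayMs > 0)) {
    throw new Error("retry.baseDelayMs must be > 0");
  }
  if (!(resolved.maxDelayMs > 0)) {
    // Assumed wording; this page does not quote the message for this case.
    throw new Error("retry.maxDelayMs must be > 0");
  }
  // Cross-field check runs on the merged values, so a lone baseDelayMs
  // is compared against the default (or class-level) maxDelayMs.
  if (resolved.baseDelayMs > resolved.maxDelayMs) {
    throw new Error("retry.baseDelayMs must be <= retry.maxDelayMs");
  }
  return resolved;
}
```

Because the cross-field check sees merged values, raising maxDelayMs at the class level makes { baseDelayMs: 5000 } valid at the call site.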

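Default behavior

Even without explicit retry options, scheduled and queued callbacks are retried with sensible defaults:

Setting       Default
maxAttempts   3
baseDelayMs   100
maxDelayMs    3000

These defaults apply to this.retry(), queue(), schedule(), and scheduleEvery(). Options passed at a call site override them.

Class-level defaults

Use static options to override the defaults for an entire agent: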

JavaScript


class MyAgent extends Agent {

  static options = {

    retry: { maxAttempts: 5, baseDelayMs: 200, maxDelayMs: 5000 },

  };

}


TypeScript


class MyAgent extends Agent {

  static options = {

    retry: { maxAttempts: 5, baseDelayMs: 200, maxDelayMs: 5000 },

  };

}


You only need to specify the fields you want to change. Fields you leave unset fall back to the built-in defaults:

JavaScript


class MyAgent extends Agent {

  // Only override maxAttempts; baseDelayMs (100) and maxDelayMs (3000) stay default

  static options = {

    retry: { maxAttempts: 10 },

  };

}


TypeScript


class MyAgent extends Agent {

  // Only override maxAttempts; baseDelayMs (100) and maxDelayMs (3000) stay default

  static options = {

    retry: { maxAttempts: 10 },

  };

}


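Class-level defaults act as the fallback when a call site does not specify retry options. Options passed at a call site always take precedence: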

JavaScript


// Uses class-level defaults (10 attempts)

await this.retry(() => fetch(url));


// Overrides to 2 attempts for this specific call

await this.retry(() => fetch(url), { maxAttempts: 2 });


TypeScript


// Uses class-level defaults (10 attempts)

await this.retry(() => fetch(url));


// Overrides to 2 attempts for this specific call

await this.retry(() => fetch(url), { maxAttempts: 2 });


To turn off retries for a specific task, set maxAttempts: 1:

JavaScript


await this.schedule(

  60,

  "oneShot",

  {},

  {

    retry: { maxAttempts: 1 },

  },

);


TypeScript


await this.schedule(

  60,

  "oneShot",

  {},

  {

    retry: { maxAttempts: 1 },

  },

);


RetryOptions

TypeScript


interface RetryOptions {

  /** Maximum number of attempts (including the first). Must be an integer >= 1. Default: 3 */

  maxAttempts?: number;

  /** Base delay in milliseconds for exponential backoff. Must be > 0 and <= maxDelayMs. Default: 100 */

  baseDelayMs?: number;

  /** Maximum delay cap in milliseconds. Must be > 0. Default: 3000 */

  maxDelayMs?: number;

}


The delay between retries uses full jitter exponential backoff:


delay = random(0, min(2^attempt * baseDelayMs, maxDelayMs))


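This means early retries are fast (often under 200ms), while later retries slow down so they do not overwhelm a service that is already failing. The randomization (jitter) prevents multiple agents from retrying at the same instant.

How it works

Backoff strategy

The retry system uses the "Full Jitter" strategy described on the AWS Architecture Blog. With the default settings, the delays across 3 attempts are:

Attempt   Upper bound                       Actual delay
1         min(2^1 * 100, 3000) = 200ms      random(0, 200ms)
2         min(2^2 * 100, 3000) = 400ms      random(0, 400ms)
3         (no retry; final attempt)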

With maxAttempts: 5 and baseDelayMs: 500 (maxDelayMs left at the default of 3000):

Attempt   Upper bound                       Actual delay
1         min(2 * 500, 3000) = 1000ms       random(0, 1000ms)
2         min(4 * 500, 3000) = 2000ms       random(0, 2000ms)
3         min(8 * 500, 3000) = 3000ms       random(0, 3000ms)
4         min(16 * 500, 3000) = 3000ms      random(0, 3000ms)
5         (no retry; final attempt)
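
The upper bounds in both tables follow directly from the delay formula. The sketch below recomputes them; backoffCeilings and fullJitterDelay are hypothetical helper names, not part of the SDK, and the random source is a parameter only so the jitter step can be made deterministic.

```typescript
// Upper bound of the delay after each failed attempt. There is no delay
// after the final attempt, so the result has maxAttempts - 1 entries.
function backoffCeilings(
  maxAttempts = 3,
  baseDelayMs = 100,
  maxDelayMs = 3000,
): number[] {
  const ceilings: number[] = [];
  for (let attempt = 1; attempt < maxAttempts; attempt++) {
    ceilings.push(Math.min(2 ** attempt * baseDelayMs, maxDelayMs));
  }
  return ceilings;
}

// Full jitter: the actual delay is uniform between 0 and the ceiling.
function fullJitterDelay(
  ceilingMs: number,
  random: () => number = Math.random,
): number {
  return random() * ceilingMs;
}

console.log(backoffCeilings()); // [ 200, 400 ]
console.log(backoffCeilings(5, 500)); // [ 1000, 2000, 3000, 3000 ]
```

The second call shows the cap taking effect: from the third attempt on, the exponential term exceeds maxDelayMs, so the ceiling stays at 3000ms.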

MCP server retries

When you add an MCP server, you can configure retry options for connection and reconnection attempts:

JavaScript


await this.addMcpServer("github", "https://mcp.github.com", {

  retry: { maxAttempts: 5, baseDelayMs: 1000, maxDelayMs: 10000 },

});


TypeScript


await this.addMcpServer("github", "https://mcp.github.com", {

  retry: { maxAttempts: 5, baseDelayMs: 1000, maxDelayMs: 10000 },

});


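These options are persisted and used when:

  • Restoring server connections after hibernation
  • Establishing a connection after OAuth completes

Default: 3 attempts, 500ms base delay, 5s max delay.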

Patterns

Retry with logging

JavaScript


class MyAgent extends Agent {

  async resilientTask(payload) {

    try {

      const result = await this.retry(

        async (attempt) => {

          if (attempt > 1) {

            console.log(`Retrying ${payload.url} (attempt ${attempt})...`);

          }

          const res = await fetch(payload.url);

          if (!res.ok) throw new Error(`HTTP ${res.status}`);

          return res.json();

        },

        { maxAttempts: 5 },

      );

      console.log("Success:", result);

    } catch (e) {

      console.error("All retries failed:", e);

    }

  }

}


TypeScript


class MyAgent extends Agent {

  async resilientTask(payload: { url: string }) {

    try {

      const result = await this.retry(

        async (attempt) => {

          if (attempt > 1) {

            console.log(`Retrying ${payload.url} (attempt ${attempt})...`);

          }

          const res = await fetch(payload.url);

          if (!res.ok) throw new Error(`HTTP ${res.status}`);

          return res.json();

        },

        { maxAttempts: 5 },

      );

      console.log("Success:", result);

    } catch (e) {

      console.error("All retries failed:", e);

    }

  }

}


Retry with fallback

JavaScript


class MyAgent extends Agent {

  async fetchData() {

    try {

      return await this.retry(

        () => fetch("https://primary-api.example.com/data"),

        { maxAttempts: 3, baseDelayMs: 200 },

      );

    } catch {

      // Primary failed, try fallback

      return await this.retry(

        () => fetch("https://fallback-api.example.com/data"),

        { maxAttempts: 2 },

      );

    }

  }

}


TypeScript


class MyAgent extends Agent {

  async fetchData() {

    try {

      return await this.retry(

        () => fetch("https://primary-api.example.com/data"),

        { maxAttempts: 3, baseDelayMs: 200 },

      );

    } catch {

      // Primary failed, try fallback

      return await this.retry(

        () => fetch("https://fallback-api.example.com/data"),

        { maxAttempts: 2 },

      );

    }

  }

}


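Combining retries with scheduling

For operations that may take a long time to recover (minutes or hours), use this.retry() for immediate retries and this.schedule() for delayed retries: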

JavaScript


class MyAgent extends Agent {

  async syncData(payload) {

    const attempt = payload.attempt ?? 1;


    try {

      // Immediate retries for transient failures (seconds)

      await this.retry(() => this.fetchAndProcess(payload.source), {

        maxAttempts: 3,

        baseDelayMs: 1000,

      });

    } catch (e) {

      if (attempt >= 5) {

        console.error("Giving up after 5 scheduled attempts");

        return;

      }


      // Schedule a delayed retry for longer outages (300s x attempt number)

      const delaySeconds = 300 * attempt;

      await this.schedule(delaySeconds, "syncData", {

        source: payload.source,

        attempt: attempt + 1,

      });

      console.log(`Scheduled retry ${attempt + 1} in ${delaySeconds}s`);

    }

  }

}


TypeScript


class MyAgent extends Agent {

  async syncData(payload: { source: string; attempt?: number }) {

    const attempt = payload.attempt ?? 1;


    try {

      // Immediate retries for transient failures (seconds)

      await this.retry(() => this.fetchAndProcess(payload.source), {

        maxAttempts: 3,

        baseDelayMs: 1000,

      });

    } catch (e) {

      if (attempt >= 5) {

        console.error("Giving up after 5 scheduled attempts");

        return;

      }


      // Schedule a delayed retry for longer outages (300s x attempt number)

      const delaySeconds = 300 * attempt;

      await this.schedule(delaySeconds, "syncData", {

        source: payload.source,

        attempt: attempt + 1,

      });

      console.log(`Scheduled retry ${attempt + 1} in ${delaySeconds}s`);

    }

  }

}


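Limitations

  • No dead-letter queue. If a queued or scheduled task fails after exhausting all retries, the task is removed. If you need to track failed tasks, implement your own persistence.
  • Retry delays block the agent. During a backoff delay, the Durable Object is awake but idle. Short delays (under 3 seconds) are fine, but for longer recovery times use this.schedule() instead.
  • Queue retries are head-of-line blocking. Queued tasks are processed in order. If one task retries with long delays, it blocks every task behind it. If you need independent retry behavior, use this.retry() inside the callback rather than setting per-task retry options on queue().
  • No circuit breaker. The retry system does not track failure rates across calls. If a service stays down, every task exhausts its own retry budget independently.
  • shouldRetry is only available on this.retry(). The shouldRetry predicate cannot be used with schedule() or queue(), because functions cannot be serialized to the database. For scheduled and queued tasks, handle non-retryable errors inside the callback.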
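
The last limitation suggests handling non-retryable errors inside the callback. One way to do that is sketched below. It relies only on the behavior stated on this page (a callback that throws is retried, so one that returns normally is not); NonRetryableError and runHandlingNonRetryable are hypothetical names introduced for illustration.

```typescript
// Hypothetical marker class for errors that should never be retried.
class NonRetryableError extends Error {}

// Wrap the body of a scheduled or queued callback. Non-retryable errors
// are handed to onGiveUp and swallowed; everything else is rethrown so
// the built-in retry can run the callback again.
async function runHandlingNonRetryable(
  work: () => Promise<void>,
  onGiveUp: (err: NonRetryableError) => void,
): Promise<void> {
  try {
    await work();
  } catch (err) {
    if (err instanceof NonRetryableError) {
      onGiveUp(err);
      return; // returning normally means no further attempts
    }
    throw err;
  }
}
```

A callback could call this helper around its real work and use onGiveUp to log or persist the failure, which also partly addresses the missing dead-letter queue.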

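Next steps

Schedule tasks: schedule tasks to run in the future.

Queue tasks: a background task queue for immediate processing.

Run Workflows: durable multi-step processing with automatic retries.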