For education only. TraderBear is not a registered investment adviser. Nothing here is investment advice. Past simulated performance does not guarantee future results.

Why we put the trading agent on a Cloudflare Worker

May 2026 · Ray Lei, founder

TraderBear's architecture has two server-side pieces. A Cloudflare Worker hosts the AI agent — the layer that reads a user's plain-English rule, monitors markets, and decides what to do. A FastAPI service on Fly.io handles broker credential storage and a Tavily-backed web-search proxy. This post is the why: the trade-offs we weighed, the wrong turns we already made, and what the architecture buys us.

The shape of the decision

Every agent product makes one early call: where does the agent run? The options:

Single long-lived server (FastAPI, Express, Rails). Easy to reason about and easy to debug, but if the host scales idle instances down, the resulting cold starts hurt a conversational agent, and scaling per user means provisioning real CPU.
Serverless functions (Lambda, Cloudflare Workers). Per-request scaling, low cold start (for Workers, effectively zero), but execution-time and memory limits matter.
Edge runtime with KV / Durable Objects (Workers + DO, Deno Deploy). Same per-request model as serverless but with first-class stateful primitives.

We initially put the agent in FastAPI alongside the rest of the backend. The reasoning at the time was speed of iteration: one Python codebase, one deploy, one place to test. We had the agent tools, the broker integration, the persistence layer, all in one repo. It worked. Then it started to hurt.

What hurt

Cold-start latency after idle periods. Fly.io scales idle instances down. When a user opened the cockpit after some idle time, the first request could pay 1–3 seconds of cold start before the agent even started reading the market. For a chat-style agent, that is the difference between "feels alive" and "feels broken."

Cost shape. The agent's workload is bursty — long idle stretches, sudden activity when a user is in session, occasional cron-driven scans. FastAPI on a long-lived instance means you pay for CPU you aren't using most of the time. Workers' per-request billing matches the workload better.

Geographic distribution. The agent talks to LLM APIs (Anthropic, DeepSeek), market data APIs, and the user's browser. Three legs, three different ideal locations. A single FastAPI instance in one region adds latency on at least one of them. Workers run close to the user.

No built-in per-user state. On FastAPI, per-user stateful sessions would have meant standing up a separate Redis. Cloudflare Durable Objects give us those sessions without one. For a per-user agent with conversation state and scheduled scans, this matters.

What we kept in FastAPI

Two things, deliberately.

Broker credentials. The FastAPI service holds encrypted broker API keys and serves balance reads. We kept this off the Worker for two reasons: (a) the encryption-at-rest story on Fly.io with a fixed Postgres is simpler to audit than the equivalent on Workers KV; (b) broker credentials are the highest-value secret in the system, and isolating them on a separately deployed service with a different attack surface is defense in depth.

A web-search proxy. The Worker calls a single FastAPI endpoint that wraps Tavily for web search. Centralizing the API key here means no Worker ever sees it directly. If we ever need to rotate the Tavily key, we do it in one place. If we ever need to swap providers, we do it in one place.
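As a rough illustration of the Worker side of that proxy call, here is a minimal TypeScript sketch. The `/search` path, the bearer-token header, and the `ProxyRequest` shape are assumptions made for the example, not TraderBear's actual API. The point is only what the request leaves out: the Tavily key never appears on the Worker.

```typescript
// Hypothetical request shape: the path, header, and body fields are assumptions.
interface ProxyRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

// Builds the call a Worker could make to the search proxy.
// Note what is absent: no Tavily key. Only the proxy service holds it.
function buildSearchRequest(
  proxyBaseUrl: string,
  idToken: string,
  query: string,
): ProxyRequest {
  const trimmed = query.trim();
  if (trimmed.length === 0) {
    throw new Error("search query must not be empty");
  }
  return {
    // Strip trailing slashes so the joined URL has exactly one separator.
    url: proxyBaseUrl.replace(/\/+$/, "") + "/search",
    method: "POST",
    headers: {
      "content-type": "application/json",
      authorization: "Bearer " + idToken,
    },
    body: JSON.stringify({ query: trimmed }),
  };
}
```

In a Worker, the result would be handed to `fetch(req.url, { method: req.method, headers: req.headers, body: req.body })`. Swapping search providers then changes only the proxy service, not this function.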

Everything else — the agent itself, the trade-decision logic, the rule translation, the orchestration — runs on the Worker.

The actual topology

┌─────────────────┐
│  Browser (SPA)  │  TraderBearWeb on Cloudflare Pages
└────────┬────────┘
         │  Firebase Auth → ID token
         ▼
┌─────────────────────────────────────────┐
│  Cloudflare Worker — the agent layer    │
│  - Reads plain-English rule             │
│  - Talks to LLM (Anthropic / DeepSeek)  │
│  - Monitors prediction-market venues    │
│  - Enforces risk caps in code           │
│  - Persists decisions to Firestore      │
└────────┬────────────────────────────────┘
         │                  │
         │ Tavily proxy     │ Broker balance reads
         │ (web-search)     │ (read-only credential calls)
         ▼                  ▼
┌─────────────────────────────────────────┐
│  FastAPI on Fly.io                      │
│  - Holds encrypted broker credentials   │
│  - Wraps Tavily API key                 │
│  - Auth shim for legacy paths           │
└─────────────────────────────────────────┘

What this buys us

Risk caps live close to execution. The Worker is the only thing that issues trades. The cap check happens in the same TypeScript file that calls the market venue. Because there is no network hop between the cap check and the order call, there is no window in which one runs and the other does not.
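A minimal sketch of that pattern, under stated assumptions: the `Order` and `RiskCaps` fields, the two cap rules, and the `submitToVenue` callback are hypothetical names chosen for illustration, not the actual TraderBear code. What it tries to show is that the venue call is reachable only through the cap check, inside one function.

```typescript
// Hypothetical types: field names and cap semantics are assumptions.
interface Order {
  marketId: string;
  stakeUsd: number;
}

interface RiskCaps {
  maxStakePerOrderUsd: number;
  maxDailyStakeUsd: number;
}

type PlaceResult<T> =
  | { placed: true; result: T }
  | { placed: false; reason: string };

// Pure check with no I/O. Returns a rejection reason, or null if within caps.
function capViolation(
  order: Order,
  caps: RiskCaps,
  stakedTodayUsd: number,
): string | null {
  if (!Number.isFinite(order.stakeUsd) || order.stakeUsd <= 0) return "invalid stake";
  if (order.stakeUsd > caps.maxStakePerOrderUsd) return "per-order cap exceeded";
  if (stakedTodayUsd + order.stakeUsd > caps.maxDailyStakeUsd) return "daily cap exceeded";
  return null;
}

// The venue call sits directly behind the cap check, in the same function.
function placeOrder<T>(
  order: Order,
  caps: RiskCaps,
  stakedTodayUsd: number,
  submitToVenue: (order: Order) => T,
): PlaceResult<T> {
  const reason = capViolation(order, caps, stakedTodayUsd);
  if (reason !== null) return { placed: false, reason };
  return { placed: true, result: submitToVenue(order) };
}
```

In a real Worker, `submitToVenue` would presumably be asynchronous, so `T` would be a `Promise`. The check itself stays synchronous and free of LLM output beyond the order it is handed.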

Latency for the conversational path is low. Worker cold starts are effectively zero. When the user types, a Worker is ready at the nearest PoP, so the LLM call is the only meaningful latency.

The most sensitive secrets are on the most boring service. Broker keys live on FastAPI. The Worker has its own narrow set of secrets (LLM provider keys, Firebase service-account credentials). Compromising one does not give you the other.

We can iterate on the agent without redeploying the credential layer. Two repositories, two deploy pipelines. The Worker ships dozens of times per week; the FastAPI service ships only when broker integrations change. The blast radius of an agent-side bug doesn't touch the credential layer.

What we'd do differently

We over-invested in a sandboxed Python bot runner in early versions of TraderBear. The product question was "what if users bring their own code?" — but in practice, the audience we ended up caring about wanted to bring their own intent, not their own code. The sandbox is frozen now (not deleted; just no new work going into it). Lesson: validate the user-input layer before building the runtime to host it.

We also kept the FastAPI service larger than it needed to be for longer than we should have. The migration of the agent to Workers was a single PR that should have happened three months earlier. "It works today" is the enemy of "it'll work better tomorrow," especially for foundational architecture calls. If you have a working monolith and a hypothesis that splitting it would help, then in our experience the cost of the split is usually less than the cost of postponing it.

What we'd warn other agent builders about

Workers' execution-time limits matter for backtests. A backtest that walks through 10 years of hourly data does not fit in a Worker's CPU budget. We moved backtests to a separate, queue-driven path with a longer-running compute environment. Don't try to make a Worker do compute it wasn't built for.
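One way to keep oversized jobs off the Worker is to estimate the bar count up front and route anything large to the queue. The sketch below assumes a hypothetical `BacktestSpec` shape and an `inlineBarLimit` threshold that you would have to pick from your own measurements. It is an illustration of the routing idea, not the routing TraderBear uses.

```typescript
const HOUR_MS = 60 * 60 * 1000;

// Hypothetical job description: field names are assumptions.
interface BacktestSpec {
  ruleId: string;
  startMs: number; // inclusive start, epoch milliseconds
  endMs: number;   // exclusive end, epoch milliseconds
  barMs: number;   // bar width in milliseconds
}

type Route = "inline" | "queue";

// Number of bars the backtest would have to walk through.
function estimateBars(spec: BacktestSpec): number {
  if (spec.barMs <= 0 || spec.endMs < spec.startMs) {
    throw new Error("invalid backtest range");
  }
  return Math.ceil((spec.endMs - spec.startMs) / spec.barMs);
}

// Small jobs may run in the request; everything else goes to the queue path.
function routeBacktest(spec: BacktestSpec, inlineBarLimit: number): Route {
  return estimateBars(spec) <= inlineBarLimit ? "inline" : "queue";
}
```

With 365-day years, ten years of hourly bars comes to 10 × 365 × 24 = 87,600 bars, so any modest `inlineBarLimit` sends that job to the queue.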

Streaming responses from the Worker are great until they aren't. The Worker's streaming API works beautifully when the LLM streams cleanly. When the LLM stalls mid-stream, the failure mode is uglier than a clean request/response — half the response is on the user's screen, the other half never arrives. We had to build explicit timeout + retry logic that we wouldn't have needed in a non-streaming architecture.
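The timeout half of that logic can be sketched as an idle-timeout wrapper around any chunk stream. This is a minimal illustration, not our production code: `withIdleTimeout`, `nextOrStall`, and the `StreamStalledError` name are hypothetical, and the retry policy (whether to restart the stream or tell the user the answer was cut off) is left to the caller.

```typescript
// Hypothetical error marker: callers can match on err.name.
function stallError(idleMs: number): Error {
  const err = new Error("no chunk received for " + idleMs + " ms");
  err.name = "StreamStalledError";
  return err;
}

// Resolves with the next chunk, or rejects if none arrives within idleMs.
function nextOrStall<T>(
  it: AsyncIterator<T>,
  idleMs: number,
): Promise<IteratorResult<T>> {
  return new Promise<IteratorResult<T>>((resolve, reject) => {
    const timer = setTimeout(() => reject(stallError(idleMs)), idleMs);
    it.next().then(
      (r) => { clearTimeout(timer); resolve(r); },
      (e) => { clearTimeout(timer); reject(e); },
    );
  });
}

// Re-yields chunks from source, but fails fast when the gap between
// two chunks exceeds idleMs instead of hanging mid-response.
async function* withIdleTimeout<T>(
  source: AsyncIterable<T>,
  idleMs: number,
): AsyncGenerator<T> {
  const it = source[Symbol.asyncIterator]();
  try {
    while (true) {
      const r = await nextOrStall(it, idleMs);
      if (r.done) return;
      yield r.value;
    }
  } finally {
    // Best-effort upstream cleanup; not awaited, since a stalled source may never settle.
    if (it.return) it.return().catch(() => undefined);
  }
}
```

A caller would iterate with `for await (const chunk of withIdleTimeout(llmStream, 15000))` and catch the `StreamStalledError` to decide what to do with the half-rendered response. The 15-second figure is an arbitrary placeholder.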

Durable Objects are great for state, terrible for premature optimization. We almost shoved per-user trade history into Durable Objects "for speed." Firestore is fine for that workload, and moving it would have created unnecessary migration debt. DOs earn their place when the state truly is per-session and hot.

Bottom line

The Worker-plus-FastAPI split is not the only right answer for an agent product. It is the right answer for an agent product where the agent's workload is bursty, latency-sensitive, and globally distributed, and where the secrets you care most about can live on a separately deployed service. If your workload looks different — long-running batch jobs, sustained CPU, heavy in-process state — a single long-lived server is fine.

The architectural principle that survives across topologies: the most dangerous code path (executing trades) should be the simplest, with the fewest dependencies, the fewest network hops, and the smallest amount of LLM in it. Whatever runtime you pick, that's the invariant to design around.

Try the architecture.

The agent is on Cloudflare. The risk caps are in code. The bear is on paper money by default. Adopt one and try a rule.
