For education only. TraderBear is not a registered investment adviser. Nothing here is investment advice. Past simulated performance does not guarantee future results.


Paper trading with AI — what it proves, what it doesn't

Q: What does paper trading with AI actually prove?

Paper trading proves three things: (1) whether the agent interprets your plain-English rules the way you expected, (2) whether the rule itself has any edge against real market prices and spreads, and (3) whether you personally can tolerate the variance of the strategy. It does not prove you will be profitable live: live trading introduces slippage, real-money emotion, and tail events not present in any short paper window.

Q: How long should I paper-trade before going live?

At minimum: enough sessions to cover a range of conditions — quiet markets, volatile markets, weekend gaps, scheduled-event volatility. For most prediction-market strategies this is 4–8 weeks. The number to track is not how long you have run, but how many distinct market regimes the agent has been tested under.

Q: What's the biggest lie paper trading tells?

Fill quality. Paper systems often assume you got the displayed bid or ask. In real life, the book moves while your order is in flight; your order may partially fill or not fill at all. A good paper system simulates realistic slippage and partial fills. Most don't.

Q: Why measure decision quality, not just P&L?

P&L over a short window is mostly variance. Decision quality — did the agent enter when its rule fired? did it skip when the conditions didn't hold? did it size positions correctly? — is signal. A losing month with high decision quality is a green light; a winning month with sloppy execution is a red flag.

Q: Can I trust an AI agent more once it's profitable on paper?

Slightly, but not as much as feels natural. Profitable paper sessions clear a necessary bar, not a sufficient one. The next test is reading the agent's logs — for every winning trade, was the win due to the rule firing correctly or due to luck? Lucky wins are not evidence of edge.

Q: What's the right paper-money starting balance?

Use what you would actually risk live, not a fantasy number. A $10 million paper account teaches you nothing useful — position sizing becomes meaningless and you'll never feel a 30% drawdown the way you would on a $5k live account. Match paper to intent.

Last updated: July 5, 2026

Paper trading is simulated trading: you place orders with fictional money against real, live market prices, and the system logs the fills you would have received. Run with an AI agent doing the execution, it proves exactly three things — that the agent interprets your plain-English rules the way you meant them, that the rule holds up against real prices and spreads, and that you can sit through the strategy's variance. It does not prove you will be profitable live, because paper hides two big variables: fill quality and real-money emotion. That answer holds whether you are testing stocks, crypto, futures, or prediction markets. The rest of this guide unpacks each piece, with the numbers to track and a concrete checklist for graduating to live without lying to yourself.

What is paper trading with an AI agent?

Four terms carry the whole topic, so here are the definitions this article uses:

Paper trading is placing simulated orders with fictional money against real market prices, so a strategy is tested without financial risk.
Slippage is the difference between the price you expected when the order was sent and the price the order actually fills at.
Decision quality is the share of an agent's actions that followed its stated rule, judged independently of whether the trade made or lost money.
A market regime is a stretch of conditions with a distinct character — quiet drift, sustained volatility, or the sharp repricing around a scheduled event like a CPI release.

Paper trading with an AI agent means the agent — not you — watches the market, applies the rule, and files every simulated order, leaving a complete log of what it saw and why it acted. The account is fictional; the prices, spreads, and timestamps are real.

The three things paper trading does prove

1. Whether the agent reads your rules the way you meant them. The most common source of "this AI is broken" complaints is not a broken model — it is a rule the user thought meant one thing and the agent interpreted as another. Paper sessions surface this divergence cheaply. The first week of any new rule is a translation test, not a profit test.

2. Whether the rule has any edge against real prices and spreads. Even on paper, simulated fills are priced from real market data: real bids, real asks, real spreads. If a rule cannot turn a profit against the actual market under realistic slippage assumptions, it is very unlikely to turn one live, where fills are usually worse.

3. Whether you can tolerate the variance. Many strategies that work over a year still have weeks where they lose 5–15% of the test balance. Watching that happen on paper is the cheapest early read on whether you will pull the plug at the bottom of a drawdown live. If you would have shut the agent off in week 3 of a drawdown that recovered by week 6, you don't yet have the temperament for that strategy.

The two big things paper trading hides

Fill quality. Most paper systems assume you got the price you wanted. In reality, by the time your order reaches the venue, the book has moved. Work one example. An event contract is quoted 42¢ bid / 45¢ ask. A naive simulator fills your buy at the 42¢ you saw on screen; a live marketable buy pays the 45¢ ask, and on a thin book part of it may fill at 46–47¢. On 200 contracts, that 3¢ gap is $6 of extra cost on the entry ($0.03 × 200), and a similar gap repeats on the exit. If the rule's expected edge was 4¢ per contract ($8 on the position), the entry gap alone consumes three-quarters of it before fees, and a matching exit gap would push the trade below break-even. Insist on paper that simulates realistic slippage and partial fills, and discount any paper P&L number that doesn't.
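The arithmetic in that example can be checked with a short sketch. This is an illustration under stated assumptions, not how any particular paper system works: the `Level` type, the `walk_asks` helper, and the depth of the thin book (120 contracts at 45¢, 50 at 46¢, 30 at 47¢) are all hypothetical. Prices are kept in whole cents to avoid rounding noise.

```python
from dataclasses import dataclass


@dataclass
class Level:
    price_cents: int  # ask price at this depth level
    size: int         # contracts resting at that price


def walk_asks(asks, qty):
    """Fill a marketable buy against ask levels, cheapest first.

    Returns (filled_qty, total_cost_cents). If the book is too thin,
    filled_qty < qty, which models a partial fill.
    """
    filled = 0
    cost = 0
    for level in sorted(asks, key=lambda lv: lv.price_cents):
        take = min(level.size, qty - filled)
        filled += take
        cost += take * level.price_cents
        if filled == qty:
            break
    return filled, cost


QTY = 200
NAIVE_PRICE = 42    # naive simulator: fill at the 42-cent price seen on screen
EXPECTED_EDGE = 4   # cents per contract, from the example above

# Case 1: the whole order fills at the 45-cent ask.
filled, cost = walk_asks([Level(45, 500)], QTY)
gap_cents = cost - NAIVE_PRICE * filled                # 600 cents, i.e. $6
share_of_edge = gap_cents / (EXPECTED_EDGE * filled)   # 0.75

# Case 2: hypothetical thin book, so part of the order pays 46 and 47 cents.
thin_book = [Level(45, 120), Level(46, 50), Level(47, 30)]
thin_filled, thin_cost = walk_asks(thin_book, QTY)
thin_gap_cents = thin_cost - NAIVE_PRICE * thin_filled  # 710 cents, i.e. $7.10

print(f"entry gap: ${gap_cents / 100:.2f} ({share_of_edge:.0%} of expected edge)")
print(f"thin-book entry gap: ${thin_gap_cents / 100:.2f}")
```

On the assumed thin book the entry gap grows from $6.00 to $7.10, which is why a simulator that fills every order at one displayed price flatters the rule.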

Execution costs of this kind are the classic explanation for why very active retail traders lag the market. In Barber and Odean's study of 66,465 households with retail brokerage accounts, 1991–1996 (Journal of Finance, 2000), the most active fifth of households earned 11.4% a year net of costs while the market returned 17.9%, with trading costs — chiefly spreads and commissions — driving the gap. A paper system that ignores those costs is testing a different strategy than the one you would run live.

Real-money emotion. A 30% drawdown on paper feels uncomfortable. A 30% drawdown on $5,000 of your actual money — $1,500 gone from the account — feels different. Behavioral research puts a number on the asymmetry: in Tversky and Kahneman's cumulative prospect theory estimates (Journal of Risk and Uncertainty, 1992), losses are weighted about 2.25 times as heavily as equivalent gains. The agent behaves identically in a paper drawdown and a live one — you do not. The only honest test is a live deployment with a small position, after the paper bar is cleared.

Paper vs live: what each environment can show you

What you're testing	Paper session	Live deployment
Rule interpretation	Fully testable, at zero cost	Testable, but every misreading costs real money
Edge vs real prices	Partially — only as honest as the slippage model	Fully, but slowly
Fill quality and partial fills	Hidden or approximated	Fully visible
Fees and spreads	Often omitted	Charged on every order
Your emotional response	Muted — the losses are fictional	Full strength — losses weigh roughly 2.25× gains
Cost of a bug or a bad rule	$0	Real dollars, immediately
Iteration speed	Fast — revise the rule daily	Slow — each change needs weeks of evidence

Measure decision quality, not just P&L

P&L over any window short of a year is dominated by variance. Decision quality is the signal you should be tracking.

For every trade the agent took, ask: did the entry condition actually fire? was the position sized within the rule? did the agent exit when the rule said to? For every trade it skipped, ask: did the conditions truly not hold, or did the agent get conservative for the wrong reasons?

A useful single number: rule-conformant actions divided by total actions. If the agent acted 40 times in a month and 38 of those actions followed the rule as written, decision quality was 95% — and the two exceptions are the most valuable artifacts in the log, because they mark exactly where your intent and the agent's reading of it diverge.
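As a rough sketch, the ratio can be computed from an audited action log. The `Action` record and its field names are assumptions made for illustration; the `followed_rule` flag is something a reviewer sets while reading the log, and P&L deliberately plays no part in the score.

```python
from dataclasses import dataclass


@dataclass
class Action:
    action_id: int
    followed_rule: bool  # set during the audit, independent of outcome
    pnl: float           # recorded for reference; not used in the score


def decision_quality(actions):
    """Share of actions that followed the rule as written."""
    if not actions:
        raise ValueError("no actions to score")
    conformant = sum(1 for a in actions if a.followed_rule)
    return conformant / len(actions)


def exceptions(actions):
    """Actions that broke the rule: the ones worth reading first."""
    return [a for a in actions if not a.followed_rule]


# Hypothetical month: 40 actions, two of which (ids 7 and 23) broke the rule.
month = [Action(i, followed_rule=(i not in (7, 23)), pnl=0.0) for i in range(40)]
score = decision_quality(month)
to_review = exceptions(month)

print(f"decision quality: {score:.0%}")
print("review these action ids:", [a.action_id for a in to_review])
```

Keeping the exceptions as a separate list matches the point above: the score is a summary, and the non-conformant actions are where the rule and the agent's reading of it part ways.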

A losing month with clean decision quality (the agent fired the rule correctly every time, sized cleanly, and took every exit signal) is a green light to keep running. A winning month with sloppy execution is a red flag, because the wins are likely luck and the next month will probably reveal it.

The archetypal postmortem on r/algotrading goes: paper trading said the strategy worked, live trading said it didn't — because the paper system had been granting perfect fills. The lesson: trust paper sessions only as far as their slippage assumptions are honest.

How long should I paper trade before going live?

Six weeks is a reasonable floor for most event-market strategies — but the honest unit of measure is market regimes covered, not calendar days. A practical, conservative checklist for moving from paper to live:

At least 6 weeks of paper, covering a quiet stretch, a volatile stretch, and at least one scheduled high-volatility event (CPI release, FOMC, election milestone).
Decision-quality audit: read every trade. Can you defend each one? If not, the agent's reasoning is opaque — fix the rule or the agent before going live.
Live with 5–10% of your intended budget. Not 100%. Watch the live behavior for at least 4 weeks. Live will diverge from paper in small ways; you want to see how before exposing the rest.
Risk caps at the platform level, not at the agent's discretion. Max per-trade, max daily loss, allowed venues. If the platform doesn't enforce these in code, you don't have caps — you have suggestions.
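To make "enforced in code" concrete, here is a minimal sketch of a pre-trade check that sits outside the agent. Everything in it is hypothetical: the `RiskCaps` and `Order` types, the `check_order` function, the cap values, and the venue name are illustrations, not any platform's actual API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RiskCaps:
    max_per_trade: float        # largest order size, in dollars
    max_daily_loss: float       # stop trading once this much is lost today
    allowed_venues: frozenset   # venue names orders may be routed to


@dataclass
class Order:
    venue: str
    notional: float  # dollars at risk on this order


def check_order(order, caps, realized_loss_today):
    """Return (allowed, reason). Runs before any order leaves the platform."""
    if order.venue not in caps.allowed_venues:
        return False, "venue not allowed"
    if order.notional > caps.max_per_trade:
        return False, "exceeds max per-trade size"
    if realized_loss_today >= caps.max_daily_loss:
        return False, "daily loss cap reached"
    return True, "ok"


# Hypothetical caps for a $5,000 account.
caps = RiskCaps(max_per_trade=100.0, max_daily_loss=250.0,
                allowed_venues=frozenset({"demo-venue"}))

print(check_order(Order("demo-venue", 80.0), caps, realized_loss_today=0.0))
print(check_order(Order("demo-venue", 150.0), caps, realized_loss_today=0.0))
print(check_order(Order("demo-venue", 80.0), caps, realized_loss_today=250.0))
```

The design point is that the agent never sees a way around the check: it proposes an order, and the platform either passes it through or rejects it with a reason that lands in the log.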

What is a good paper-trading starting balance?

The one you would actually deploy live. A $10 million paper account teaches you nothing useful: a 2%-per-trade sizing rule becomes a meaningless $200,000, and no drawdown will ever register. Set the paper balance to your realistic live number — say $5,000 — and the simulation starts producing information you can use: a 2% position cap is $100 per trade, a 15% losing stretch is $750, and a 30% drawdown is $1,500. Those are the magnitudes you would actually have to sit through, so rehearse at that scale.
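A few lines of arithmetic make the contrast visible. This is only a sketch; the function name and the default percentages (2% position cap, 15% losing stretch, 30% drawdown) are taken from the example above and are not recommendations.

```python
def dollar_magnitudes(balance, position_cap_pct=2, losing_stretch_pct=15, drawdown_pct=30):
    """Translate percentage rules into the dollar amounts you would sit through."""
    return {
        "per_trade": balance * position_cap_pct / 100,
        "losing_stretch": balance * losing_stretch_pct / 100,
        "drawdown": balance * drawdown_pct / 100,
    }


realistic = dollar_magnitudes(5_000)
fantasy = dollar_magnitudes(10_000_000)

for label, amounts in (("$5,000 account", realistic), ("$10M account", fantasy)):
    print(label)
    for name, dollars in amounts.items():
        print(f"  {name}: ${dollars:,.0f}")
```

The same 2% rule produces $100 per trade on the realistic balance and $200,000 on the fantasy one, so only the first rehearses the sizes you would actually trade.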

Why an AI agent makes paper better, not worse

A human paper trader cheats. Not deliberately — but they remember the trade they "would have skipped if I'd been live." They retroactively forgive the trade that lost as "not a real signal." Their paper journal is fiction by month two.

An AI agent's paper sessions don't lie. Every trade is logged with the rule that fired, the conditions observed, the size computed. There is no "I would have skipped that one." Either the rule fired or it didn't. This is why paper trading with an agent is, paradoxically, more honest than paper trading by hand — there is no human in the loop to soften the record.

The corollary: if you find yourself wanting to override the agent's paper decisions, you don't yet trust the rule. That's useful information. Either revise the rule or revise your trust in your own judgment — but don't pretend the override was the rule.

What good paper-trading tools should show you

Every decision with the rule that fired and the inputs that were observed
Realistic slippage assumptions, ideally configurable
A clean separation between "agent acted" and "human overrode" — no silent edits to the trade log
Distributions of outcomes (not just average P&L), so you can see the worst-week and worst-month numbers your strategy actually produced
The ability to replay a single decision: what did the agent see at that timestamp, what alternatives did it weigh, why did it pick what it picked
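The "distributions, not averages" item can be sketched by grouping daily P&L into ISO weeks and reporting the worst week next to the mean. The `weekly_pnl` helper and the daily figures are made up for illustration; a real tool would read them from the trade log.

```python
from collections import defaultdict
from datetime import date
from statistics import mean


def weekly_pnl(daily):
    """Sum daily P&L into ISO (year, week) buckets."""
    weeks = defaultdict(float)
    for day, pnl in daily.items():
        iso = day.isocalendar()
        weeks[(iso[0], iso[1])] += pnl
    return dict(weeks)


# Hypothetical daily P&L in dollars for two trading weeks.
daily = {
    date(2026, 6, 1): 20.0, date(2026, 6, 2): -35.0, date(2026, 6, 3): 10.0,
    date(2026, 6, 4): -50.0, date(2026, 6, 5): 5.0,
    date(2026, 6, 8): 40.0, date(2026, 6, 9): 15.0, date(2026, 6, 10): -10.0,
    date(2026, 6, 11): 30.0, date(2026, 6, 12): 25.0,
}

weeks = weekly_pnl(daily)
worst_week = min(weeks.values())
average_week = mean(weeks.values())

print(f"average week: ${average_week:,.2f}")
print(f"worst week:   ${worst_week:,.2f}")
```

In this made-up sample the average week is +$25 while the worst week is -$50: the average alone would hide the number you actually have to sit through.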

FAQ

What does paper trading with AI actually prove?

Whether the agent reads your rules correctly, whether the rule has edge against real prices, and whether you can tolerate the strategy's variance. It does not prove you will be profitable live.

How long should I paper-trade before going live?

Long enough to cover varied conditions — quiet, volatile, scheduled events. For most prediction-market strategies, 4–8 weeks minimum. Measure regimes covered, not calendar days.

What's the biggest lie paper trading tells?

Fill quality. Many paper systems assume you got the displayed price. Reality: the book moves while your order is in flight. Discount any paper P&L from a system that doesn't simulate slippage.

Why measure decision quality, not just P&L?

P&L over short windows is variance. Decision quality is signal. A losing month with clean execution is a green light; a winning month with sloppy execution is a red flag.

Can I trust an AI agent more once it's profitable on paper?

A little, but not as much as feels natural. Profitable paper clears a necessary bar, not a sufficient one. Audit the wins — were they the rule firing correctly, or luck?

What's the right paper-money starting balance?

Use what you would actually risk live. A $10M paper account teaches you nothing; a $5k paper account behaves like the live $5k account you'll eventually deploy.

Paper-first by default.

TraderBear runs on paper money out of the box. Every decision is logged with the rule that fired and the inputs that were observed. Going live is an explicit, multi-step opt-in — not a checkbox you can flip by accident.
