For education only. TraderBear is not a registered investment adviser. Nothing here is investment advice. All numbers below are illustrative of methodology. Past simulated performance does not guarantee future results.


1,000 paper trades: the patterns that show up

May 2026 · Ray Lei, founder · methodology + observed patterns from TraderBear backtests on event-market data

Running one or two paper trades teaches you nothing. Running ten teaches you the agent's basic behavior. Running a thousand teaches you what the strategy actually does. The patterns below come up across asset classes — stocks, crypto, futures, prediction markets — when you put any single rule through that volume of trades. They are not predictions of future results; they are observations of how strategy distributions behave at scale.

The setup (a teaching example)

To make any of this concrete, you need a specific rule and a specific market. The example we use below — picked because it's a clean illustration, not because it's a recommendation — is a simple breakout rule on a major equity ETF:

"On SPY, enter when price breaks above the prior-day high with volume confirmation in the first 30 minutes of the session. Size: 1% of paper balance per position. Max 3 positions open at once. Exit: end of day, or stop at prior-day low."

One market, one rule, fixed parameters. The point of running 1,000 trades against this is not to discover whether the rule is profitable in some grand sense — it is to see the distribution of outcomes and identify the failure modes that don't show up in 10-trade samples. The patterns generalize: substitute a crypto momentum rule, a futures spread rule, or a prediction-market mispricing rule and you'll see the same shape.
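As a rough sketch, the quoted rule might be encoded as below. The rule text does not define "volume confirmation", so the `volume_multiple` threshold, the `PriorDay` type, and every function name here are illustrative assumptions rather than TraderBear's implementation. The sizing reads "1% of paper balance" as position notional, which is also an assumption. The end-of-day exit is left out for brevity.

```python
from dataclasses import dataclass


@dataclass
class PriorDay:
    high: float
    low: float


def should_enter(price, prior, opening_volume, typical_opening_volume,
                 open_positions, volume_multiple=1.5, max_positions=3):
    """Return True when the breakout rule would open a new position."""
    broke_out = price > prior.high
    # Assumption: "volume confirmation" means first-30-minute volume is at
    # least some multiple of its typical level. The rule text does not say.
    confirmed = opening_volume >= volume_multiple * typical_opening_volume
    has_room = open_positions < max_positions
    return broke_out and confirmed and has_room


def position_size(paper_balance, price, size_fraction=0.01):
    """Whole shares whose notional is at most size_fraction of the balance."""
    return int(paper_balance * size_fraction // price)


def stop_price(prior):
    """The rule places its stop at the prior-day low."""
    return prior.low
```

Keeping the parameters as explicit arguments makes it harder to change them quietly between runs, which matters once you start comparing 1,000-trade samples.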

Pattern 1: most of the P&L comes from a small fraction of trades

This is the most consistent finding across categories. The "median trade" is roughly break-even: it enters on a small signal, the signal expires, and you collect very little. The bottom majority of trades by P&L contribute almost nothing to the cumulative result.

The work the strategy actually does happens in the top tail. These are the ones where the breakout actually ran, or the mispricing actually closed wide, or the momentum carried for longer than the average case. The shape is the same in stocks, crypto, and event markets: a long-tail distribution where a small fraction of trades produces most of the result.

Implication: cutting these trades short is the most dangerous instinct across asset classes. Whatever your impulse is to "lock in gains" or "manage the position," resist it on the top-decile setups. Those are where the year happens.
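One way to check how concentrated a run is: sort the per-trade P&L and measure what share of the net result the best trades account for. This is a minimal sketch; `top_share` is a hypothetical helper and the sample numbers are invented purely to show the arithmetic.

```python
def top_share(pnls, fraction=0.10):
    """Share of net P&L contributed by the best `fraction` of trades."""
    if not pnls:
        raise ValueError("no trades")
    total = sum(pnls)
    if total == 0:
        raise ValueError("net P&L is zero; the share is undefined")
    # Always count at least one trade, even in tiny samples.
    k = max(1, round(len(pnls) * fraction))
    best = sorted(pnls, reverse=True)[:k]
    return sum(best) / total


# Invented toy data: two large winners, many near-zero trades.
sample = [120.0, 45.0] + [1.0] * 8 + [-2.0] * 10
share = top_share(sample)
```

The share can exceed 1.0 when the remaining trades lose money in aggregate, as they do in this toy sample. The number is only meaningful when the net P&L is positive.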

Pattern 2: drawdowns are bigger and longer than you expect

On any strategy with a true win rate around 55%, you should expect a streak of 6+ consecutive losers somewhere in a 1,000-trade run. If trades are independent, the chance of at least one such streak is roughly 99%, so it is close to certain rather than a freak event. Your gut, watching it happen in week three of a paper deployment, will not believe it. This is true whether the strategy is on stocks, crypto, futures, or event markets.
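The streak claim can be checked directly instead of taken on faith. The sketch below assumes independent trades with a fixed win rate, which real strategies only approximate, and tracks the probability of each current losing-run length trade by trade. `prob_losing_streak` is a hypothetical name.

```python
def prob_losing_streak(n_trades, win_rate, streak):
    """Probability of at least one run of `streak` consecutive losses.

    Assumes independent trades with a constant win rate.
    """
    loss_rate = 1.0 - win_rate
    # state[k] = probability that no full streak has occurred yet and the
    # current run of consecutive losses has length k.
    state = [0.0] * streak
    state[0] = 1.0
    for _ in range(n_trades):
        nxt = [0.0] * streak
        nxt[0] = sum(state) * win_rate      # any win resets the run
        for k in range(streak - 1):
            nxt[k + 1] = state[k] * loss_rate  # a loss extends the run
        # A loss from run length streak-1 completes the streak and drops
        # out of `state`, so the missing mass is the answer.
        state = nxt
    return 1.0 - sum(state)


p = prob_losing_streak(n_trades=1000, win_rate=0.55, streak=6)
```

Rerunning it with your own win rate and trade count gives the streak length you should plan for before the paper run starts.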

The number to look at is the maximum drawdown: the deepest peak-to-trough cumulative loss during the run. The magnitude varies by strategy and category, but the principle is universal. If you can't sit through your strategy's typical drawdown without overriding the agent, the strategy is wrong for you, not because the strategy is bad, but because it does not match your variance tolerance.

The cheapest way to find out you can't tolerate a 20% drawdown is on paper, in month two, before any real money is on the line. The expensive way is to find out live, override the agent at the bottom, miss the recovery, and then either repeat with smaller capital (now demoralized) or quit. Pay the cheap tuition.
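Maximum drawdown is simple to compute from a list of per-trade P&L. The sketch below expresses it as a fraction of the running peak balance, which is one common convention and an assumption here; `max_drawdown` is a hypothetical helper.

```python
def max_drawdown(pnls, starting_balance):
    """Deepest peak-to-trough decline, as a fraction of the peak balance."""
    equity = starting_balance
    peak = starting_balance
    worst = 0.0
    for pnl in pnls:
        equity += pnl
        peak = max(peak, equity)
        # Drawdown is measured from the highest balance seen so far.
        worst = max(worst, (peak - equity) / peak)
    return worst


dd = max_drawdown([10.0, -20.0, 5.0, -10.0, 30.0], starting_balance=100.0)
```

In this toy sequence the balance peaks at 110 and bottoms at 85 before recovering, so the drawdown is 25/110 even though the run ends at a new high.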

Pattern 3: the boring filters do most of the work

In the SPY breakout example, two filters matter more than they look. The volume confirmation in the first 30 minutes — turn it off, you get more trades but worse average outcomes. The "max 3 positions" cap — relax it, you concentrate risk on whatever day decides to trend the wrong way.

The same pattern shows up in every category. Crypto rules without explicit liquidity filters get picked off on thin alt tokens. Futures rules without contract-size discipline blow up on margin calls. Event-market rules without spread filters bleed cost on every entry. The clever part of a strategy is usually overrated; the boring part — the filters that say "don't enter when X" — is usually doing most of the work.

Generalization: if your strategy doesn't have explicit liquidity, spread, and time-of-day filters appropriate to its category, it is probably leaking edge in ways you can't see in small samples.
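A hedged sketch of what explicit filters can look like in code. Every threshold below is a placeholder to be set per category, and `rejection_reasons` is a hypothetical function. Returning the reasons, not just a yes or no, lets you count how often each filter fires across a large sample.

```python
from datetime import time


def rejection_reasons(spread, avg_daily_volume, entry_time,
                      max_spread=0.02, min_volume=1_000_000,
                      session=(time(9, 30), time(10, 0))):
    """Return the list of filters a candidate entry fails (empty = allowed).

    All default thresholds are illustrative placeholders.
    """
    reasons = []
    if avg_daily_volume < min_volume:
        reasons.append("liquidity")
    if spread > max_spread:
        reasons.append("spread")
    if not (session[0] <= entry_time <= session[1]):
        reasons.append("time_of_day")
    return reasons
```

Logging the returned reasons for every skipped entry makes it possible to rerun the backtest with one filter disabled and see what that filter was actually worth.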

Pattern 4: scheduled events distort the rule

Every category has scheduled volatility moments. Stocks: earnings, FOMC, CPI. Crypto: major upgrade dates, on-chain unlocks. Futures: contract roll. Event markets: data prints, election milestones. The 30 minutes before and after any scheduled release is the moment when fills are worst and the gap between paper and live is widest.

When you split a 1,000-trade sample into "entered during the volatility window" vs "entered outside," the outside-window trades almost always perform meaningfully better per trade. Sometimes the inside-window trades have the highest absolute outcomes, but they also have the worst execution costs, and the realized result shrinks.

Implication: a "no-entry zone" of 15 minutes before and 30 minutes after scheduled releases often improves realized outcomes without removing meaningful trades. It's worth testing on any rule, in any category.
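The no-entry zone and the inside/outside split are both a few lines of code. This sketch assumes you already have a list of scheduled release timestamps and a list of `(entry_time, pnl)` pairs. The function names are hypothetical; the 15 and 30 minute defaults mirror the window suggested above.

```python
from datetime import datetime, timedelta
from statistics import mean


def in_no_entry_zone(entry_time, releases,
                     before=timedelta(minutes=15),
                     after=timedelta(minutes=30)):
    """True if entry_time falls inside the window around any release."""
    return any(r - before <= entry_time <= r + after for r in releases)


def mean_pnl_by_zone(trades, releases):
    """trades: list of (entry_time, pnl). Returns (inside_mean, outside_mean).

    A side with no trades is reported as None.
    """
    inside = [p for t, p in trades if in_no_entry_zone(t, releases)]
    outside = [p for t, p in trades if not in_no_entry_zone(t, releases)]
    return (mean(inside) if inside else None,
            mean(outside) if outside else None)
```

Feed it P&L that already includes execution costs. The comparison only means something if the inside-window trades carry the worse fills they would really have received.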

Pattern 5: the agent's mistakes are remarkably consistent

When an agent loses big, it loses big in a small number of recurring ways. In the SPY breakout example, the worst trades cluster into two categories:

Fake-out breakouts. The agent enters on a clean break of the prior high with volume; the break reverses within an hour and never recovers. These are unavoidable on individual instances but become identifiable as a class (e.g., low-VIX days where breakouts are systematically more likely to fail).
News-driven gaps. The agent enters on a breakout that's actually being driven by news the rule had no input on (an analyst downgrade, a sector rotation). The price action looked technical but the cause wasn't.

In crypto rules, the recurring losers cluster around exchange outages and stablecoin de-pegs. In futures, around margin-call cascades. In event markets, around stale consensus data feeds. The point is universal: you only find these by running enough trades that the failure modes become obvious patterns rather than one-off events.
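Spotting those clusters mostly takes bookkeeping. The sketch below assumes each trade has been given a failure tag, by hand or by a separate labeling step that is not shown, and summarizes the worst trades by tag. The function name and the tag strings are hypothetical.

```python
from collections import defaultdict


def worst_trades_by_tag(trades, worst_n=50):
    """trades: list of (pnl, tag). Summarize the worst_n trades by tag.

    Returns {tag: (count, total_pnl)} for the worst_n lowest-P&L trades.
    """
    worst = sorted(trades, key=lambda t: t[0])[:worst_n]
    summary = defaultdict(lambda: [0, 0.0])
    for pnl, tag in worst:
        summary[tag][0] += 1
        summary[tag][1] += pnl
    return {tag: (count, total) for tag, (count, total) in summary.items()}
```

If one or two tags account for most of the worst trades, that is the class worth studying. A long flat list of tags suggests the losses are closer to noise.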

The three numbers that matter more than headline P&L

Maximum drawdown: how much the cumulative P&L fell from a peak. Tells you whether you can stomach the strategy.
P&L attributable to top 10% of trades: if this is more than 80% of the total, your strategy is "fat-tail dependent" and cutting winners short will destroy it.
Standard deviation of P&L by week: tells you how lumpy the strategy is. A high number means a lot of your year happens in a few weeks, good or bad, which has implications for both your psychology and your tax accounting.
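The weekly figure is the least standard of the three, so here is a minimal sketch of one way to compute it. It groups trades by ISO calendar week and takes the sample standard deviation of the weekly totals. Both choices are assumptions, and weeks with no trades are ignored, not counted as zero.

```python
from collections import defaultdict
from datetime import date
from statistics import stdev


def weekly_pnl_std(trades):
    """trades: iterable of (trade_date, pnl). Sample std dev of weekly totals."""
    weekly = defaultdict(float)
    for day, pnl in trades:
        iso = day.isocalendar()
        weekly[(iso[0], iso[1])] += pnl  # key on (ISO year, ISO week)
    # stdev needs at least two weeks of data and raises otherwise.
    return stdev(list(weekly.values()))


toy = [
    (date(2026, 1, 5), 10.0),
    (date(2026, 1, 6), -4.0),
    (date(2026, 1, 12), 2.0),
    (date(2026, 1, 19), 10.0),
]
lumpiness = weekly_pnl_std(toy)
```

Compare the result with the average weekly P&L. The ratio says more about lumpiness than the raw number does.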

The headline P&L number is the least useful summary. Two strategies with identical net returns can have wildly different drawdowns, fat-tail dependence, and weekly variance — and they will feel completely different to operate.

What 1,000 trades does not tell you

It does not tell you the strategy will work next year. Markets change. A threshold that produced edge over the historical sample, whether a breakout level on SPY or an 8-point consensus-disagreement threshold in an event market, may stop producing edge once enough other traders are watching for the same setup. Backtests are necessary but not sufficient.

It does not tell you whether the strategy survives a regime change. Most rules tested on 2024–2025 data have not been tested against, say, the 2008 environment. A rule that works in normal-volatility conditions can blow up in tail-event conditions.

It does not replace the discipline of the 8-week paper program before going live. The 1,000-trade backtest is a sanity check on whether the rule is plausible; the live paper program is the test of whether the agent operates the rule the way you want.

How to do this exercise yourself

Pick a rule, get historical data for the contracts that rule would have traded, simulate fills with realistic slippage assumptions, and aggregate. The arithmetic is not hard. The discipline is in being honest about the slippage assumptions and in resisting the urge to "tune" the rule until the backtest looks good — which is overfitting, and the resulting rule will not work live.

Inside TraderBear we ship a backtest path that walks bar-by-bar through the historical book for the contract, simulating fills against the real spread at each timestamp. That's the version we'd recommend over the simpler "assume you got the midpoint" approach you'll see in many quick scripts. If your tooling doesn't simulate the spread, your backtest numbers are systematically too good.
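To see why midpoint fills flatter a backtest, compare the two assumptions on a single long round trip. This is a toy illustration, not TraderBear's engine. The function names are hypothetical, the quotes are invented, and it looks only at the top-of-book bid and ask, ignoring depth and partial fills.

```python
def fill_price(side, bid, ask, mode="spread"):
    """Price a market order under a 'mid' or 'spread' fill assumption."""
    if side not in ("buy", "sell"):
        raise ValueError("side must be 'buy' or 'sell'")
    if mode == "mid":
        return (bid + ask) / 2
    # Crossing the spread: buyers pay the ask, sellers receive the bid.
    return ask if side == "buy" else bid


def round_trip_pnl(entry_quote, exit_quote, qty, mode="spread"):
    """Long round trip. Each quote is a (bid, ask) pair."""
    entry = fill_price("buy", *entry_quote, mode=mode)
    exit_ = fill_price("sell", *exit_quote, mode=mode)
    return (exit_ - entry) * qty


# Invented quotes with a 2-cent spread at entry and at exit.
mid_pnl = round_trip_pnl((100.00, 100.02), (100.50, 100.52), qty=10, mode="mid")
real_pnl = round_trip_pnl((100.00, 100.02), (100.50, 100.52), qty=10, mode="spread")
```

The gap between the two results is half the spread at entry plus half the spread at exit, times the quantity. Across 1,000 trades that small per-trade overstatement compounds into a materially different headline number.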

Run a backtest on your own rule.

TraderBear ships a backtest engine that simulates the actual book, not idealized fills. Adopt a bear, write a rule, run a thousand trades against it before you ever go to paper-live.
