Execution-cost-aware backtests for reinforcement learning strategies

Execution · April 2026

Backtests become dangerous when they score decisions at prices the live system cannot actually receive. This is especially true for reinforcement-learning policies, where small reward differences can steer the model toward behavior that looks clever in simulation but proves expensive in production.

Our preferred evaluation path prices every simulated trade through an execution-cost layer. Spread, depth, latency, and order type are modeled before the reward is recorded, so the policy is only ever rewarded for prices it could plausibly have received. This makes the backtest less flattering but more useful.
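
As a rough illustration of the idea, the sketch below prices a marketable order through a toy cost layer before computing the reward. Everything in it is an assumption made for the example rather than a description of our production layer: the Quote type, the fill_price and cost_aware_reward helpers, the linear impact penalty for quantity beyond top-of-book depth, and latency expressed as a whole number of simulation steps. Only marketable orders are covered; resting limit orders would need their own fill logic.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Quote:
    """Hypothetical top-of-book snapshot for one simulation step."""
    bid: float
    ask: float
    bid_size: float  # quantity resting at the best bid
    ask_size: float  # quantity resting at the best ask

    @property
    def mid(self) -> float:
        return (self.bid + self.ask) / 2.0


def fill_price(quote: Quote, side: int, qty: float, impact_per_unit: float) -> float:
    """Fill price for a marketable order; side is +1 (buy) or -1 (sell).

    Assumption: the order crosses the spread, and any quantity beyond the
    displayed depth moves the fill price linearly against the trader.
    """
    touch = quote.ask if side > 0 else quote.bid
    depth = quote.ask_size if side > 0 else quote.bid_size
    excess = max(qty - depth, 0.0)
    return touch + side * impact_per_unit * excess


def naive_reward(quotes: Sequence[Quote], t: int, side: int, qty: float,
                 horizon: int) -> float:
    """Reward scored at the decision-time mid price, with no frictions."""
    return side * qty * (quotes[t + horizon].mid - quotes[t].mid)


def cost_aware_reward(quotes: Sequence[Quote], t: int, side: int, qty: float,
                      horizon: int, latency_steps: int = 1,
                      impact_per_unit: float = 0.0) -> float:
    """Reward scored at the price reachable after latency, spread, and depth."""
    # The order reaches the market latency_steps after the decision at step t.
    entry = fill_price(quotes[t + latency_steps], side, qty, impact_per_unit)
    # The position is marked at the mid price at the end of the horizon.
    exit_mark = quotes[t + horizon].mid
    return side * qty * (exit_mark - entry)


# Illustrative quotes only; these are not real market data.
quotes = [
    Quote(bid=99.9, ask=100.1, bid_size=50, ask_size=50),
    Quote(bid=100.0, ask=100.2, bid_size=50, ask_size=50),
    Quote(bid=100.4, ask=100.6, bid_size=50, ask_size=50),
]
naive = naive_reward(quotes, t=0, side=+1, qty=80, horizon=2)
costed = cost_aware_reward(quotes, t=0, side=+1, qty=80, horizon=2,
                           latency_steps=1, impact_per_unit=0.01)
print(f"naive={naive:.2f} cost-aware={costed:.2f}")
```

With these made-up quotes, the naive reward is clearly positive while the cost-aware reward is roughly zero: one step of latency, half the spread, and the impact on the 30 units beyond displayed depth absorb the whole move. A policy trained on the first number would learn to chase a signal that the second number says is not tradable at this size.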

The same principle applies to exits. A stop that appears clean on candle data may be filled after a gap, and a take-profit may be skipped if queue position is poor. Strategy comparison should include these frictions consistently across candidates.
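
The exit frictions described above can be sketched in the same spirit. The snippet below is a minimal example for a long position on candle data, under stated assumptions: the Bar type and both helper functions are hypothetical, a stop that is gapped through fills at the bar's open, a take-profit that is traded through fills at the limit price with no price improvement, and a take-profit that is only touched fills only if the volume traded at that level (volume_at_target, an input that plain candle data does not supply) exceeds the quantity queued ahead of the order.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Bar:
    """Hypothetical OHLC candle."""
    open: float
    high: float
    low: float
    close: float


def stop_fill_long(bar: Bar, stop: float) -> Optional[float]:
    """Exit price for a long position's sell stop, or None if not triggered."""
    if bar.low > stop:
        return None  # price never reached the stop during this bar
    # If the bar opens below the stop, the market gapped through it and the
    # realistic fill is the open, not the stop level shown on the chart.
    return min(bar.open, stop)


def take_profit_fill_long(bar: Bar, target: float, queue_ahead: float,
                          volume_at_target: float) -> Optional[float]:
    """Exit price for a long position's resting sell limit, or None if unfilled."""
    if bar.high < target:
        return None  # price never reached the limit
    if bar.high > target:
        # Price traded through the level, so the resting order is assumed filled.
        # Conservative assumption: fill at the limit, ignoring price improvement.
        return target
    # Price only touched the level: the order fills only if enough volume
    # traded there to clear everything queued ahead of it.
    return target if volume_at_target > queue_ahead else None


# Illustrative bars only; these are not real market data.
gap_bar = Bar(open=95.0, high=96.0, low=94.0, close=95.5)
touch_bar = Bar(open=100.0, high=105.0, low=99.0, close=104.0)

print(stop_fill_long(gap_bar, stop=98.0))
print(take_profit_fill_long(touch_bar, target=105.0,
                            queue_ahead=500, volume_at_target=200))
```

Applying the same two functions to every candidate strategy keeps the comparison consistent: no candidate is credited with a stop filled at its chart level after a gap, or with a take-profit that only touched the price while the order sat deep in the queue.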

When cost-aware scoring is in place, fewer models survive research review. That is the point. The remaining models are more likely to represent durable behavior rather than accounting artifacts.