Backtest that doesn't lie
Walkforward, not cherry-picked window. Per-bar funding rate. Orderbook-based slippage. Brier score on every model in the ensemble. What you see is close to what you get live.
Why typical backtests deceive
The classic retail-backtest trap is overfitting to a single period. "Parameters from 2023 gave +400% on BTC": yes, because they were tuned to exactly that period. Out of sample, the same parameters often net zero or a loss. This is not a bug but math: sweep enough parameter combinations and some of them will look profitable in hindsight for almost any strategy.
The second trap is ignoring the funding rate. On Hyperliquid (HL) perps, funding can eat 5–12 bps per 60 days on a long. Backtests that miss this show "paper" profits that turn negative live.
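A minimal sketch of why a sweep always finds a "winner", assuming 200 hypothetical parameter sets whose returns are pure noise with zero edge. The function name, window lengths and volatility are illustrative, not taken from any real strategy.

```python
import random

def sweep_best(n_configs=200, n_in=250, n_out=60, seed=7):
    """Simulate zero-edge configs and pick the one that looks best in-sample."""
    rng = random.Random(seed)
    results = []
    for _ in range(n_configs):
        # Each config is noise: the true mean return per bar is exactly zero.
        in_sample = sum(rng.gauss(0.0, 0.02) for _ in range(n_in))
        out_sample = sum(rng.gauss(0.0, 0.02) for _ in range(n_out))
        results.append((in_sample, out_sample))
    # Selecting on in-sample PnL is the step that manufactures the "edge".
    best_in, best_out = max(results, key=lambda r: r[0])
    avg_out = sum(r[1] for r in results) / n_configs
    return best_in, best_out, avg_out

best_in, best_out, avg_out = sweep_best()
print(f"best in-sample: {best_in:+.1%}, same config out-of-sample: {best_out:+.1%}")
```

The best in-sample figure is large only because it was selected from many tries; the out-of-sample figure for the same config carries no such selection bonus.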
How we do it
Walkforward
In-sample window (60 days) → parameter selection → out-of-sample (14 days) → forward-roll. The graph shows only out-of-sample — what the strategy did on "unseen" data.
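The roll described above can be sketched as a window generator. This assumes half-open date ranges and a roll step equal to the out-of-sample length, so the out-of-sample segments tile without overlap; the step size and the function name are assumptions, not stated in the text.

```python
from datetime import date, timedelta

def walk_forward_windows(start, end, in_days=60, out_days=14):
    """Yield (in_start, in_end, out_start, out_end) as half-open [start, end) ranges."""
    in_start = start
    while True:
        in_end = in_start + timedelta(days=in_days)
        out_end = in_end + timedelta(days=out_days)
        if out_end > end:
            break  # not enough data left for a full out-of-sample window
        # Out-of-sample begins exactly where in-sample ends: no leakage.
        yield in_start, in_end, in_end, out_end
        # Assumed roll step: advance by the out-of-sample length.
        in_start += timedelta(days=out_days)

for w in walk_forward_windows(date(2024, 1, 1), date(2024, 6, 29)):
    print("fit on", w[0], "to", w[1], "| evaluate on", w[2], "to", w[3])
```

Parameters would be selected using only the first range of each tuple, and only the second range would contribute to the reported curve.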
Per-bar funding rate
Funding is recomputed for every 8h window and subtracted from open-position PnL. Not an "average over the period": literal 8h windows.
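A rough sketch of per-window funding, assuming each 8h window is described by the position notional at that time and the funding rate as a decimal, with the usual perp convention that a positive rate means longs pay shorts. The numbers are hypothetical.

```python
def funding_pnl(windows, is_long=True):
    """Sum funding cash flows over consecutive 8h windows.

    windows: iterable of (notional_usd, rate) pairs, one per 8h window,
             with rate as a decimal (0.0001 = 1 bp).
    Positive rate means longs pay shorts. Returns signed PnL in USD.
    """
    sign = -1.0 if is_long else 1.0
    return sum(sign * notional * rate for notional, rate in windows)

# Hypothetical numbers: position size changes between windows.
windows = [(10_000, 0.0001), (20_000, 0.0003), (0, -0.0002)]
per_window = funding_pnl(windows)

# The shortcut to avoid: average rate times average notional.
avg_rate = sum(r for _, r in windows) / len(windows)
avg_notional = sum(n for n, _ in windows) / len(windows)
averaged = -avg_rate * avg_notional * len(windows)

print(f"per-window: {per_window:.2f} USD, averaged shortcut: {averaged:.2f} USD")
```

The two figures diverge whenever position size and funding rate move together, which is why the averaged shortcut can understate the cost.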
Orderbook slippage
Large orders eat the book. We use historical HL L2 snapshots: slippage at 10K USD is one thing, at 100K it is another. No flat-percent assumption.
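As an illustration of size-dependent slippage, here is a sketch that walks the ask side of a single L2 snapshot for a market buy. The snapshot layout (a best-first list of price and base-size pairs) and the sample levels are assumptions, not the actual HL data format.

```python
def fill_from_book(levels, order_usd):
    """Walk ask levels [(price, size_in_base), ...] best-first for a market buy.

    Returns (avg_fill_price, slippage_bps relative to the best ask).
    Raises ValueError if the snapshot is too thin to fill the order.
    """
    remaining = order_usd
    base_filled = 0.0
    for price, size in levels:
        take_usd = min(remaining, price * size)  # cap at this level's depth
        base_filled += take_usd / price
        remaining -= take_usd
        if remaining <= 0:
            break
    if remaining > 1e-9:
        raise ValueError("order exceeds visible book depth")
    avg_price = order_usd / base_filled
    best_ask = levels[0][0]
    return avg_price, (avg_price / best_ask - 1.0) * 1e4

# Hypothetical ask side of one snapshot.
asks = [(100.0, 50), (100.5, 200), (101.0, 1000)]
for usd in (10_000, 100_000):
    avg, bps = fill_from_book(asks, usd)
    print(f"{usd:>7} USD -> avg fill {avg:.4f}, slippage {bps:.1f} bps")
```

The larger order reaches deeper levels, so its average fill sits further from the best ask than the smaller one.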
Per-model Brier
Each ensemble model gets a Brier score, a measure of calibration, on a rolling 14-day window. If the score drifts above 0.25 (what a constant 50% forecast scores on binary outcomes), the model is temporarily disabled until it recovers.
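A minimal sketch of the gate, assuming binary outcomes (0 or 1) and that the caller passes only the forecasts from the trailing 14 days. The function names are illustrative.

```python
def brier_score(probs, outcomes):
    """Mean squared error between forecast probabilities and 0/1 outcomes."""
    if len(probs) != len(outcomes) or not probs:
        raise ValueError("need equal-length, non-empty inputs")
    return sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / len(probs)

def model_enabled(probs, outcomes, threshold=0.25):
    """Keep the model live only while its windowed Brier score is at or below threshold."""
    return brier_score(probs, outcomes) <= threshold

# Hypothetical forecasts: one well-calibrated model, one confidently wrong.
good = model_enabled([0.8, 0.2, 0.7, 0.9], [1, 0, 1, 1])
bad = model_enabled([0.9, 0.1, 0.8], [0, 1, 0])
print("good model enabled:", good, "| bad model enabled:", bad)
```

Lower is better: 0 is a perfect forecast, and a model that always says 50% scores exactly 0.25.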
Equity curve and max DD
Not just Sharpe. We show equity curve, max drawdown, Calmar (return ÷ max DD), Sortino. Full risk profile.
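For reference, one common way to compute these figures from an equity curve. This sketch assumes a zero risk-free rate, a zero target return for Sortino, and annualized return as the Calmar numerator; the exact conventions used for the vaults are not stated in the text.

```python
import math

def max_drawdown(equity):
    """Largest peak-to-trough decline, as a positive fraction of the peak."""
    peak, worst = equity[0], 0.0
    for value in equity:
        peak = max(peak, value)
        worst = max(worst, (peak - value) / peak)
    return worst

def risk_metrics(equity, periods_per_year=365):
    """Sharpe, Sortino, max drawdown and Calmar from one equity value per period."""
    rets = [b / a - 1.0 for a, b in zip(equity, equity[1:])]
    mean = sum(rets) / len(rets)
    std = math.sqrt(sum((r - mean) ** 2 for r in rets) / len(rets))
    # Downside deviation: only returns below zero contribute.
    downside = math.sqrt(sum(min(r, 0.0) ** 2 for r in rets) / len(rets))
    ann = math.sqrt(periods_per_year)
    years = len(rets) / periods_per_year
    annual_return = (equity[-1] / equity[0]) ** (1.0 / years) - 1.0
    mdd = max_drawdown(equity)
    return {
        "sharpe": ann * mean / std if std else float("nan"),
        "sortino": ann * mean / downside if downside else float("nan"),
        "max_drawdown": mdd,
        "calmar": annual_return / mdd if mdd else float("nan"),
    }

# Hypothetical curve, treated as three periods spanning one year.
print(risk_metrics([100.0, 110.0, 99.0, 120.0], periods_per_year=3))
```

A curve with no losing period has zero drawdown and zero downside deviation, so Sortino and Calmar are reported as NaN rather than infinity.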
Real HL data
180+ days of L2 orderbook snapshots, funding rates, and mark prices from Hyperliquid itself. Not Binance, not an "ETH proxy": literally HL.
What we show per vault
Out-of-sample equity curve
Rolling 180-day window. What the strategy earned on unseen data — data not used to pick parameters.
Sharpe, Sortino, Calmar
Three risk-return ratios. Sharpe penalizes all volatility, Sortino penalizes only downside volatility, and Calmar divides return by max drawdown.
PnL distribution
Histogram of per-trade profit. It shows that "one best day doesn't carry everything": the result comes from hundreds of small wins.
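One simple way to put a number on that claim is the share of net profit contributed by the largest winners. A sketch, assuming a plain list of per-trade PnL values; the function name and sample data are hypothetical.

```python
def top_trade_share(trade_pnls, top_n=1):
    """Fraction of total net PnL contributed by the top_n most profitable trades."""
    total = sum(trade_pnls)
    if total <= 0:
        raise ValueError("share is only meaningful for a net-profitable trade list")
    best = sorted(trade_pnls, reverse=True)[:top_n]
    return sum(best) / total

# Hypothetical trade lists: one concentrated, one spread out.
concentrated = [10, -5, 20, 5]
spread = [1] * 100
print(f"{top_trade_share(concentrated):.0%} vs {top_trade_share(spread):.0%}")
```

A low share means the histogram mass sits in many small wins; a share near or above 100% means one trade is carrying the result.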
Per-model Brier
Each ensemble model's calibration over time. If one breaks — instantly visible, no live surprises.