OLTA Backtest Methodology
Date: 2025-09-11
Author: OLTA Research
Abstract: The OLTA backtest engine is a pure-Node.js, dependency-free simulator that replays an index's NAV under a chosen rebalance strategy across a daily kline dataset, computes the standard institutional return metrics, and stress-tests the resulting series against eight predefined crisis windows. This paper documents the engine, its inputs, the formulas, and the limitations.
1. Data sources
Two distinct data layers feed the engine:
Crypto leg. Two years of daily Binance Spot klines for 116 crypto symbols, pulled via the public Binance REST API (/api/v3/klines endpoint, 1d interval). The cutoff for the current report is 2026-05-21. The earliest usable bar varies by listing date, ranging from 2024-05 (for tokens with full 730-day coverage) to 2024-09 or later (for newly listed tokens).
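Before alignment, the raw kline rows can be reduced to one close per UTC day. The following is a minimal sketch. It assumes Binance's documented array layout for /api/v3/klines, with the open time in milliseconds at index 0 and the close price as a decimal string at index 4. The helper name klinesToDailyCloses is hypothetical and is not part of lib/backtest.js.

```javascript
// Hypothetical helper: convert raw Binance /api/v3/klines rows into one close per UTC day.
// Assumes Binance's documented kline array layout:
//   row[0] = open time (ms since epoch), row[4] = close price (decimal string).
function klinesToDailyCloses(rows) {
  const byDay = new Map();
  for (const row of rows) {
    const close = Number.parseFloat(row[4]);
    if (!Number.isFinite(close)) continue; // skip malformed rows
    // Bucket by the UTC calendar day of the bar's open time ("YYYY-MM-DD").
    const day = new Date(Number(row[0])).toISOString().slice(0, 10);
    byDay.set(day, close); // if a day appears twice, the later row wins
  }
  // ISO date strings sort chronologically under plain string comparison.
  return [...byDay.entries()]
    .sort(([a], [b]) => (a < b ? -1 : a > b ? 1 : 0))
    .map(([day, close]) => ({ day, close }));
}
```

The endpoint accepts limit=1000 at most. A single request such as /api/v3/klines?symbol=BTCUSDT&interval=1d&limit=1000 therefore covers a 730-day window.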
Equity leg. 5-year daily price history for 80 US stocks and ETFs that back the Equities and Diversified families. Single stocks route through Dinari (US broker-dealer, TRACI-regulated). ETF coverage (SPY, QQQ, IEMG, GLD, SLV, USO) routes through Backed Finance. Sourced from Yahoo Finance daily-adjusted closes with split adjustments applied at the source. The series begins 2021-05 and runs to 2026-05.
For symbols available on both venues (e.g. the BTC and ETH legs of the Diversified baskets), the Binance Spot series is canonical. The equity tokens (suffixed "on", e.g. NVDAon, GLDon) reference the underlying Yahoo series. The on-chain wrappers launched too recently to support a meaningful backtest: Dinari and Backed Finance shipped their current catalogues in 2025.
2. NAV computation: the divisor method
The engine computes index NAV exactly as the OLTA UI does: a weighted sum of constituent prices scaled by a constant divisor. In lib/backtest.js the divisor is carried in per-leg share counts:

$$\mathrm{NAV}(t) = \sum_{i \in \text{non-cash legs}} \mathrm{shares}_i \cdot \mathrm{price}_i(t) + \text{USDC sleeve balance}$$

Here the USDC sleeve, measured in dollars, holds any unallocated weight. At each rebalance, share counts are recomputed so that each leg's weight matches its target after the trade. Forward fills handle missing bars: a constituent that did not trade on a given UTC day gets its previous-day close. However, the intersection of constituent date ranges defines the timeline. The start is the latest of the constituents' first observed bars, and the end is the earliest of their last observed bars. There is no extrapolation beyond either edge.
The starting NAV equals each index's configured startingPrice in lib/prices.js. The divisor is calibrated so that, at the first observation date, NAV equals that starting price. All subsequent NAV moves are price moves only, never divisor adjustments. This matches the S&P 500 convention.
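A minimal sketch of the share-count accounting described above. It assumes target weights that sum to at most one, with the remainder held in a USDC sleeve priced at 1. The function names navOf and allocate are hypothetical. They illustrate two points: calibrating at t=0 fixes the starting NAV, and recomputing shares at a rebalance leaves NAV unchanged by the trade itself.

```javascript
// Hypothetical sketch, not the exports of lib/backtest.js.
// holdings: { shares: { [symbol]: number }, usdc: number }
// prices:   { [symbol]: number } for one observation day

// NAV = sum(shares_i * price_i) over non-cash legs + USDC sleeve balance.
function navOf(holdings, prices) {
  let nav = holdings.usdc;
  for (const [sym, shares] of Object.entries(holdings.shares)) {
    nav += shares * prices[sym];
  }
  return nav;
}

// Allocate `nav` dollars to target weights at the given prices.
// At t=0, pass nav = startingPrice; this is the divisor calibration.
// At a rebalance, pass nav = current NAV, so the trade does not move NAV.
function allocate(nav, weights, prices) {
  const shares = {};
  let invested = 0;
  for (const [sym, w] of Object.entries(weights)) {
    shares[sym] = (w * nav) / prices[sym];
    invested += w;
  }
  return { shares, usdc: (1 - invested) * nav }; // unallocated weight sits in USDC
}
```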
3. Rebalance strategies
Eight strategies are supported by the engine. Each takes the constituent set as fixed and varies only when and how weights are rebalanced.
3.1 None
Buy-and-hold. Initial allocation at t=0 according to target weights; no trades thereafter. Holdings drift with relative price moves. Used as a baseline against which the rebalance value-add can be measured.
3.2 Weekly / Monthly / Quarterly
Calendar-based rebalance. At each ISO-week, calendar-month, or calendar-quarter boundary, holdings are reset to target weights at the close of that day. The first observation in the timeline is treated as t=0 and no rebalance fires on it.
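Boundary detection for the calendar cadences can be sketched as below. The text does not say whether the engine rebalances on the first day of the new period or the last day of the old one. This sketch assumes the rebalance fires on the first timeline day of a new ISO week, calendar month or calendar quarter. The names rebalanceFlags, periodKey and isoWeekKey are hypothetical.

```javascript
// Hypothetical helpers: flag calendar-rebalance days on a daily UTC timeline.
// Assumption: rebalance on the first day of each new period; t=0 never rebalances.

// ISO-8601 week key: the week belongs to the year containing its Thursday.
function isoWeekKey(day) {
  const d = new Date(day + "T00:00:00Z");
  const isoDow = d.getUTCDay() || 7; // Monday = 1 ... Sunday = 7
  d.setUTCDate(d.getUTCDate() + 4 - isoDow); // move to the Thursday of this week
  const isoYear = d.getUTCFullYear();
  const week = Math.ceil(((d.getTime() - Date.UTC(isoYear, 0, 1)) / 86400000 + 1) / 7);
  return `${isoYear}-W${week}`;
}

function periodKey(day, cadence) {
  const [y, m] = day.split("-").map(Number);
  if (cadence === "weekly") return isoWeekKey(day);
  if (cadence === "monthly") return `${y}-${m}`;
  if (cadence === "quarterly") return `${y}-Q${Math.floor((m - 1) / 3) + 1}`;
  throw new Error(`unknown cadence: ${cadence}`);
}

// days: ascending "YYYY-MM-DD" strings; returns one boolean per day.
function rebalanceFlags(days, cadence) {
  return days.map(
    (day, t) => t > 0 && periodKey(day, cadence) !== periodKey(days[t - 1], cadence)
  );
}
```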
3.3 Drift overlay
Threshold-triggered rebalance. At every step, the engine computes the maximum absolute drift of any constituent from its target weight. When that maximum drift breaches the configured threshold, the basket resets to target. In trending regimes this fires rarely or not at all; in volatile chop it can fire more than monthly. Over the 2-year Diversified window, the drift overlay did not fire for the Diversified candidates (the threshold was never breached), so the strategy behaved like buy-and-hold over this window.
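The drift trigger reduces to a single comparison. A minimal sketch, assuming current dollar values per constituent and a strict greater-than test against a hypothetical threshold parameter. The function names are illustrative only.

```javascript
// Hypothetical sketch of the drift overlay's trigger.
// values:  { [symbol]: current dollar value of the leg }
// targets: { [symbol]: target weight }
function maxAbsDrift(values, targets) {
  const nav = Object.values(values).reduce((a, b) => a + b, 0);
  let worst = 0;
  for (const [sym, target] of Object.entries(targets)) {
    const current = (values[sym] ?? 0) / nav;
    worst = Math.max(worst, Math.abs(current - target));
  }
  return worst;
}

// Reset the basket to target when the largest drift breaches the threshold.
function driftTriggered(values, targets, threshold) {
  return maxAbsDrift(values, targets) > threshold;
}
```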
3.4 Strategic-momentum
Top-N momentum overlay. On a monthly boundary the engine ranks constituents by their trailing log return over a configured lookback. Holdings are rotated to equal-weight the winners, with the rest of the basket sitting in cash for that period. This is a defensive rotation strategy designed for high-vol crypto baskets.
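A sketch of the Top-N ranking step. The text does not specify how much of the basket the winners receive. This version assumes each of the K constituents owns an equal 1/K slot: winners keep their slot and the losers' slots go to cash. The function name and the lookback and topN parameters are hypothetical.

```javascript
// Hypothetical sketch of a Top-N momentum rotation.
// prices: { [symbol]: number[] } aligned on the daily timeline; t: rebalance index.
// Assumption: each of K constituents owns a 1/K slot; non-winner slots go to cash.
function momentumWeights(prices, t, lookback, topN) {
  const syms = Object.keys(prices);
  const scores = [];
  if (t - lookback >= 0) {
    for (const sym of syms) {
      const series = prices[sym];
      // Trailing log return over the lookback.
      scores.push([sym, Math.log(series[t] / series[t - lookback])]);
    }
  }
  scores.sort((a, b) => b[1] - a[1]); // highest momentum first
  const winners = scores.slice(0, topN).map(([sym]) => sym);
  const perSlot = 1 / syms.length;
  const weights = {};
  for (const sym of winners) weights[sym] = perSlot;
  return { weights, cash: 1 - winners.length * perSlot };
}
```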
3.5 Strategic-volTarget
Inverse-volatility weighting plus basket-level vol scaling. Constituent weights are set to inverse realised vol normalised to one hundred percent, then the entire basket is scaled to a target portfolio volatility, with any leftover sitting in USDC. The basket-vol estimator uses a zero-correlation first-order proxy documented in the engine source.
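The weighting step can be sketched as follows, using one natural reading of a zero-correlation basket-vol proxy, σ_p ≈ sqrt(Σ w_i² σ_i²). The sketch assumes annualised realised vols are supplied per constituent. It also assumes the scale factor is capped at 1 (no leverage), so any shortfall sits in USDC. The name volTargetWeights is hypothetical, and the engine's own estimator is the one documented in its source.

```javascript
// Hypothetical sketch of inverse-vol weights plus basket-level vol scaling.
// vols: { [symbol]: annualised realised vol }, targetVol: target portfolio vol.
function volTargetWeights(vols, targetVol) {
  // Inverse-vol weights normalised to sum to 1.
  const inv = Object.entries(vols).map(([s, v]) => [s, 1 / v]);
  const total = inv.reduce((a, [, x]) => a + x, 0);
  const base = Object.fromEntries(inv.map(([s, x]) => [s, x / total]));

  // Zero-correlation proxy: sigma_p ~= sqrt(sum_i (w_i * sigma_i)^2).
  const basketVol = Math.sqrt(
    Object.entries(base).reduce((a, [s, w]) => a + (w * vols[s]) ** 2, 0)
  );

  // Scale down toward the target; assume no leverage (scale <= 1).
  const scale = Math.min(1, targetVol / basketVol);
  const weights = Object.fromEntries(Object.entries(base).map(([s, w]) => [s, w * scale]));
  return { weights, usdc: 1 - scale, basketVol };
}
```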
3.6 Strategic-regime
Two-state regime model on BTC. A medium-term BTC regime signal triggers a defensive rotation: when the regime turns risk-off, a portion of the basket is rotated into a defensive sleeve. The engine acts only when the regime change causes a meaningful drift in target weights, to avoid whipsaw. Detailed thresholds and the defensive sleeve composition are documented in the methodology brief.
3.7 The default for each index
Every index specifies a configured cadence. The current report uses each index's configured cadence as the headline number, then optionally re-runs the index under alternative cadences for the Diversified family. The headline OSHARP6 Sharpe uses the calendar quarterly configuration; alternative cadences are reported in the Diversified deep-dive.
4. Metrics
All metrics are computed by the metrics(navSeries, btcNavSeries) function in the backtest engine.
Total return: (endNav - startNav) / startNav. Arithmetic, not log.
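Annualised return: (1 + totalReturn) ^ (365 / days) - 1. This is calendar-day annualisation, not trading-day. Crypto trades 365 days a year. Annualising with 252 would treat every calendar day as a trading day: a 730-day window would read as roughly 2.9 years instead of 2, which would distort crypto returns.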
Volatility: standard deviation of daily log returns, scaled by sqrt(252). The 252 convention here is deliberate. It matches the TradFi reporting convention for cross-asset comparability against an SPY or QQQ benchmark, even though crypto trades 365 days. The mismatch is documented and is identical to how Coinbase, Galaxy and most institutional crypto research desks report.
Sharpe ratio: annualisedReturn / volatility, with a zero risk-free rate. To translate to the mid-2026 USD overnight risk-free rate of approximately 3.8%, subtract 0.038 / volatility from each Sharpe figure. That is roughly 0.14 to 0.21 for the Diversified family at 18-28% volatility.
Sortino ratio: annualisedReturn / downsideDeviation. Downside deviation uses only negative log returns, scaled by sqrt(252). More forgiving than Sharpe in upside-skewed series.
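The return, volatility, Sharpe and Sortino definitions above translate roughly as follows. This is a sketch, not the engine's metrics() function. It assumes a NAV array on the daily UTC timeline, so that days = nav.length - 1. It also assumes a sample standard deviation (n - 1) and a downside deviation computed as the root mean square of negative log returns over all days. The engine may differ on the last two choices.

```javascript
// Hypothetical sketch of the Section 4 return metrics (zero risk-free rate).
function logReturns(nav) {
  const r = [];
  for (let i = 1; i < nav.length; i++) r.push(Math.log(nav[i] / nav[i - 1]));
  return r;
}

function sampleStdev(xs) {
  const mean = xs.reduce((a, b) => a + b, 0) / xs.length;
  const ss = xs.reduce((a, x) => a + (x - mean) ** 2, 0);
  return Math.sqrt(ss / (xs.length - 1));
}

function returnMetrics(nav) {
  const days = nav.length - 1; // calendar days elapsed on a daily UTC timeline
  const totalReturn = (nav[days] - nav[0]) / nav[0]; // arithmetic, not log
  const annualisedReturn = Math.pow(1 + totalReturn, 365 / days) - 1; // calendar-day
  const r = logReturns(nav);
  const volatility = sampleStdev(r) * Math.sqrt(252); // TradFi scaling convention
  // Assumed downside deviation: RMS of negative log returns over all days.
  const downsideDeviation =
    Math.sqrt(r.reduce((a, x) => a + Math.min(x, 0) ** 2, 0) / r.length) * Math.sqrt(252);
  return {
    totalReturn,
    annualisedReturn,
    volatility,
    sharpe: annualisedReturn / volatility,
    sortino: annualisedReturn / downsideDeviation,
  };
}
```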
Maximum drawdown: peak-to-trough on the NAV series. Reported as a negative percentage. Maximum drawdown duration: the number of days from peak to the deepest trough in the worst drawdown episode. Reported in calendar days.
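Maximum drawdown and its peak-to-trough duration can be computed in one pass. A minimal sketch, assuming one NAV point per calendar day so that an index difference equals a day count. The name maxDrawdownWithDuration is hypothetical.

```javascript
// Hypothetical sketch: worst peak-to-trough drawdown and its duration in days.
function maxDrawdownWithDuration(nav) {
  let peakIdx = 0;
  let worst = 0; // reported as a negative fraction
  let worstPeak = 0;
  let worstTrough = 0;
  for (let i = 1; i < nav.length; i++) {
    if (nav[i] > nav[peakIdx]) peakIdx = i; // new running peak
    const dd = nav[i] / nav[peakIdx] - 1;
    if (dd < worst) {
      worst = dd;
      worstPeak = peakIdx;
      worstTrough = i;
    }
  }
  // Days from the peak to the deepest trough of the worst episode.
  return { maxDrawdown: worst, durationDays: worstTrough - worstPeak };
}
```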
Correlation to BTC: Pearson correlation of the basket's log returns against BTC log returns over the matched tail of both series. Computed via the standard formula in the engine's pearson() helper.
Beta to BTC: covariance of basket returns with BTC returns divided by BTC variance, computed in the engine's beta() helper.
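A sketch of the correlation and beta calculations on matched tails. It assumes both inputs are daily log-return arrays that end on the same day. The names pearsonCorr and betaToBtc are hypothetical stand-ins for the engine's pearson() and beta() helpers, not their source.

```javascript
// Hypothetical sketch of correlation and beta against BTC.
// Keep only the overlapping tail: the last n returns of each series.
function matchedTail(a, b) {
  const n = Math.min(a.length, b.length);
  return [a.slice(a.length - n), b.slice(b.length - n)];
}

function mean(xs) {
  return xs.reduce((s, x) => s + x, 0) / xs.length;
}

// Sample covariance (n - 1); the normalisation cancels in both ratios below.
function covariance(x, y) {
  const mx = mean(x);
  const my = mean(y);
  let s = 0;
  for (let i = 0; i < x.length; i++) s += (x[i] - mx) * (y[i] - my);
  return s / (x.length - 1);
}

function pearsonCorr(basket, btc) {
  const [x, y] = matchedTail(basket, btc);
  return covariance(x, y) / Math.sqrt(covariance(x, x) * covariance(y, y));
}

// Beta = cov(basket, BTC) / var(BTC).
function betaToBtc(basket, btc) {
  const [x, y] = matchedTail(basket, btc);
  return covariance(x, y) / covariance(y, y);
}
```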
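vsBTCReturn / vsBTCSharpe: the basket's metric minus BTC's reference metric. This is the flagship comparison number. BTC's reference is computed over the full 730-day window even when the basket's own window is shorter (see the BTC reference window caveat in section 7).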
5. Stress test windows
Eight scenarios are defined in the engine's STRESS_SCENARIOS table. Four span the crypto-history window and four predate it.
5.1 Long-history scenarios (Equities only)
These windows predate the start of the 2-year crypto data. They apply only to baskets whose every constituent has a 5-year record (i.e. the Equities family).
| Scenario | Window |
|---|---|
| Covid March 2020 | 2020-02-15 to 2020-04-15 |
| May 2021 crypto crash | 2021-05-01 to 2021-07-31 |
| Nov 2022 FTX collapse | 2022-11-01 to 2022-12-31 |
| March 2023 SVB / banking | 2023-03-01 to 2023-04-15 |
For Diversified baskets, the observations field in the stress block is zero in these windows and the result is recorded as coverage: false. This is honest about the data limitation rather than synthesising a return.
5.2 Recent windows (everything)
These fall inside the crypto backfill and apply to every family.
| Scenario | Window |
|---|---|
| Aug 2024 yen carry unwind | 2024-08-01 to 2024-08-15 |
| Feb 2025 tariff selloff | 2025-02-01 to 2025-02-15 |
| April 2025 alt rotation | 2025-04-01 to 2025-04-30 |
| Sept 2025 mid-cap rotation | 2025-09-01 to 2025-10-15 |
For each window, the engine's stressTest() slices the NAV series to the date range, recomputes return and max drawdown over that slice, and reports the count of daily observations covered. A scenario with fewer than two observations is flagged with a note and no metrics are returned.
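The slicing logic can be sketched as below. It assumes the NAV series is an array of { day, nav } points with ISO dates, which compare correctly as strings, and that scenarios are shaped like { name, from, to }. The name stressWindow is hypothetical; it is not the engine's stressTest().

```javascript
// Hypothetical sketch of a stress-window slice.
// series:   Array<{ day: "YYYY-MM-DD", nav: number }> on the daily timeline
// scenario: { name, from: "YYYY-MM-DD", to: "YYYY-MM-DD" } (inclusive)
function stressWindow(series, scenario) {
  const slice = series.filter((p) => p.day >= scenario.from && p.day <= scenario.to);
  if (slice.length < 2) {
    // Too little data: flag it instead of synthesising a return.
    return {
      scenario: scenario.name,
      observations: slice.length,
      coverage: false,
      note: "fewer than two observations in window",
    };
  }
  let peak = slice[0].nav;
  let maxDrawdown = 0;
  for (const p of slice) {
    peak = Math.max(peak, p.nav);
    maxDrawdown = Math.min(maxDrawdown, p.nav / peak - 1);
  }
  return {
    scenario: scenario.name,
    observations: slice.length,
    coverage: true,
    totalReturn: slice[slice.length - 1].nav / slice[0].nav - 1,
    maxDrawdown,
  };
}
```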
6. Alignment and intersection logic
This is the part of the engine most likely to surprise. The aligned-series builder operates as follows:
- For each constituent, collect the sorted set of UTC-day-bucketed bars.
- Compute lo = max(start of each constituent's history) and hi = min(end of each constituent's history).
- Constrain by the optional startTime/endTime parameters.
- The timeline is every UTC day from lo to hi inclusive, regardless of whether each constituent has an observed bar on that day.
- Forward-fill: at each timeline day, every constituent gets its most recent observed close. No interpolation, no extrapolation.
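The intersection and forward-fill steps above can be sketched as follows. The sketch assumes each constituent's bars are already sorted and bucketed as { day, close } with ISO-date days. The name alignSeries and its startDay/endDay options are hypothetical; the engine's own parameters are startTime/endTime.

```javascript
// Hypothetical sketch of the aligned-series builder.
// bars: { [symbol]: Array<{ day: "YYYY-MM-DD", close: number }> }, each sorted ascending.
function alignSeries(bars, { startDay, endDay } = {}) {
  const syms = Object.keys(bars);
  // lo = latest first bar, hi = earliest last bar (ISO dates compare as strings).
  let lo = syms.map((s) => bars[s][0].day).reduce((a, b) => (a > b ? a : b));
  let hi = syms.map((s) => bars[s][bars[s].length - 1].day).reduce((a, b) => (a < b ? a : b));
  // Optional bounds can only narrow the window, never extend it.
  if (startDay && startDay > lo) lo = startDay;
  if (endDay && endDay < hi) hi = endDay;

  // Every UTC calendar day from lo to hi inclusive.
  const days = [];
  for (let t = Date.parse(lo); t <= Date.parse(hi); t += 86400000) {
    days.push(new Date(t).toISOString().slice(0, 10));
  }

  // Forward-fill: each day takes the most recent observed close at or before it.
  const aligned = {};
  for (const s of syms) {
    const out = [];
    let j = 0;
    let last;
    for (const day of days) {
      while (j < bars[s].length && bars[s][j].day <= day) {
        last = bars[s][j].close;
        j += 1;
      }
      out.push(last);
    }
    aligned[s] = out;
  }
  return { days, aligned };
}
```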
Consequence: a basket whose newest constituent listed 246 days ago (e.g. ORWY5, whose ONDO/SKY component has a short history) has a 246-day backtest. Adding a long-history constituent does not extend the window. Removing the short-history constituent would.
This is also why Sector and Ecosystem baskets that include recently-listed altcoins sometimes have backtests of 598 or 665 days rather than the full 730: the short leg sets the window.
7. Limitations and caveats
The 252 vs 365 day convention. Volatility uses sqrt(252); annualised return uses 365. For pure-crypto indices, volatility is therefore under-stated relative to a strict 365-day calendar by a factor of sqrt(252/365) ≈ 0.83, i.e. about 17% lower. This is the TradFi reporting convention and matches how most institutional crypto research desks report. Internal calculations are not affected. To compare against a calendar-convention vol number, multiply reported volatility by sqrt(365/252) ≈ 1.20.
Forward fill on missing bars. For Binance Spot data, missing bars are rare; they occur only if the API drops a day or a symbol is briefly delisted. For Yahoo equity data, weekends and US-market holidays are filled with the previous session's close (the Friday close for a weekend). This introduces zero-return non-trading days into the equity series, which compresses the daily-return distribution. Volatility is still annualised with sqrt(252) over a 365-day timeline. As a result, equity-only volatility is under-stated by roughly the same sqrt(252/365) ≈ 0.83 factor. Baskets with crypto legs that trade through the weekend are affected less.
Partial-coverage filtering. The engine drops a basket from the backtest only if a constituent has zero observed bars in the desired window. A constituent with one bar at the right edge of the window will produce a one-day backtest. This rarely happens in practice; the only baskets affected are the post-Watchlist ones (ORWY5, OAIM8, OOR6 etc.).
BTC reference window. BTC reference metrics (Sharpe 0.14, annualised 5.4%, max drawdown -49.5%) are computed over the full 730-day window 2024-05-21 to 2026-05-21. For a basket with a shorter backtest window, the basket's metrics use its own window while the BTC reference uses the full 730 days. The vsBTCSharpe field reports the difference. This is consistent with how the live UI displays the comparison.
No transaction costs. The engine does not model bid-ask spread, exchange fee, or slippage. For calendar monthly variants, the implicit transaction cost across the year is in the single-digit basis points of NAV under reasonable institutional execution. This would reduce reported Sharpe by a small amount. Calendar quarterly variants are lower again. The drift overlay for the Diversified family did not fire in the test window, so the no-cost assumption is innocuous for those numbers.
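The engine does not model costs, but the size of the omitted drag can be estimated from turnover. A minimal sketch, assuming a hypothetical flat per-side cost in basis points. The cost is applied to the traded notional of the non-cash legs, since both sells and buys pay it. The function name is illustrative only.

```javascript
// Hypothetical sketch: cost of one rebalance as a fraction of NAV.
// currentWeights / targetWeights: non-cash legs only (USDC is the settlement leg).
// costBpsPerSide: assumed flat cost per traded dollar, in basis points.
function rebalanceCostFraction(currentWeights, targetWeights, costBpsPerSide) {
  const syms = new Set([...Object.keys(currentWeights), ...Object.keys(targetWeights)]);
  let traded = 0; // total traded notional (buys + sells) as a fraction of NAV
  for (const s of syms) {
    traded += Math.abs((targetWeights[s] ?? 0) - (currentWeights[s] ?? 0));
  }
  return (traded * costBpsPerSide) / 10000;
}
```

Summing this over every rebalance in a year gives an annual cost estimate that can be subtracted from annualised return.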
No risk-free rate. Sharpe uses zero. At a 3.8% USD overnight rate, every Sharpe shifts down by 0.038 / vol. That is roughly 0.14 to 0.21 for Diversified baskets at 18-28% volatility and about 0.10 for pure BTC at its higher volatility. The relative ranking is unchanged unless two Sharpe figures differ by less than the gap between their deductions.
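Both adjustments are simple rescalings of the reported figures. A sketch, assuming reported numbers use the conventions above (sqrt(252) volatility, zero risk-free rate). The helper names are hypothetical, and the 3.8% default is the rate quoted in the text.

```javascript
// Hypothetical helpers for the two convention adjustments.

// Restate a sqrt(252)-annualised vol on a 365-day calendar convention.
function volToCalendarConvention(vol252) {
  return vol252 * Math.sqrt(365 / 252);
}

// Convert a zero-risk-free Sharpe into an excess-return Sharpe:
// (annRet - rf) / vol = annRet / vol - rf / vol.
function excessSharpe(sharpeAtZeroRf, vol, riskFree = 0.038) {
  return sharpeAtZeroRf - riskFree / vol;
}
```

For example, a basket at 19% volatility loses 0.038 / 0.19 = 0.20 of Sharpe.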
No survivorship bias control. The current basket constituents are the constituents the basket would hold today. A constituent that was delisted from Binance between the start of the window and today is no longer in the basket. This biases the reported metrics slightly upward by removing the worst losers. The Equities side is unaffected (no delistings of the 80 underlying stocks in the window). The crypto side has minimal exposure to this bias because Binance has not delisted any of the constituent universe over the test window, but a longer historical extension would need an explicit survivorship adjustment.
Window selection. The 2-year window covers a sideways BTC tape, a US equity bull market, and a strong gold move. A 2017-2018 window would put BTC up 17x and gold flat. A 2020-2021 window would put BTC up 6x and gold up 30%. The relative rankings of Diversified versus crypto-pure would invert in those windows. This is documented honestly in the executive summary and in 06-outperformance-vs-btc.md.
8. Reproducibility
To re-run the full backtest:
node scripts/run-backtest.mjs               # produces data/analysis/current-state.json
node scripts/run-diversified-backtest.mjs   # produces data/analysis/diversified-backtest.json
node scripts/export-backtest-results.mjs    # exports to lib/backtest-results.js

Inputs: the on-disk daily price cache. Re-fetching the cache takes 20-30 minutes; running the simulation against the cache takes under one minute.
9. Endpoint robustness · rolling-window dispersion
Single-anchor return calculations are sensitive to the choice of start date. A small shift in window endpoints can move displayed returns materially depending on the period's realised volatility.
To disclose this dispersion, OLTA recomputes each reported metric from a series of rolling start dates within each reported timeframe and reports the resulting percentile band alongside the median. This mirrors the methodology used by S&P Dow Jones Indices and MSCI Methodology Briefs.
For a robust strategy, the canonical Sharpe should lie within the percentile band, and the band itself should be tight. A canonical value at the extreme of the band signals start-date sensitivity that allocators should weigh.
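A rolling-window band can be sketched as follows. The window length, step and percentile levels are hypothetical parameters; the actual rolling-window parameters are in the private appendix. Percentiles use linear interpolation between order statistics, and the Sharpe definition follows Section 4 with a sample standard deviation assumed.

```javascript
// Hypothetical sketch of rolling-window Sharpe dispersion.

// Linear-interpolation percentile on an ascending array (p in [0, 100]).
function percentile(sorted, p) {
  const rank = (p / 100) * (sorted.length - 1);
  const lo = Math.floor(rank);
  const hi = Math.ceil(rank);
  return sorted[lo] + (sorted[hi] - sorted[lo]) * (rank - lo);
}

// Section 4 Sharpe: calendar-day annualised return over sqrt(252) log-return vol.
function sharpeOf(nav) {
  const days = nav.length - 1;
  const ann = Math.pow(nav[days] / nav[0], 365 / days) - 1;
  const r = [];
  for (let i = 1; i < nav.length; i++) r.push(Math.log(nav[i] / nav[i - 1]));
  const m = r.reduce((a, b) => a + b, 0) / r.length;
  const sd = Math.sqrt(r.reduce((a, x) => a + (x - m) ** 2, 0) / (r.length - 1));
  return ann / (sd * Math.sqrt(252));
}

// Shift the start anchor by stepDays and recompute Sharpe over each window.
function rollingSharpeBand(nav, windowDays, stepDays, lowerPct = 10, upperPct = 90) {
  const values = [];
  for (let start = 0; start + windowDays < nav.length; start += stepDays) {
    const s = sharpeOf(nav.slice(start, start + windowDays + 1));
    if (Number.isFinite(s)) values.push(s); // skip degenerate zero-vol windows
  }
  values.sort((a, b) => a - b);
  return {
    windows: values.length,
    lower: percentile(values, lowerPct),
    median: percentile(values, 50),
    upper: percentile(values, upperPct),
  };
}
```

An allocator-facing check is then whether the canonical single-anchor Sharpe lies between lower and upper.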
The dispersion view is scoped to the detailed backtest panel by design; catalogue tiles and the index overview keep the canonical single-anchor numbers so list surfaces stay readable.
10. Portfolio construction methods
OLTA's catalogue uses a layered construction stack rather than a single weighting method. The choice of method for a given basket reflects the basket's underlying universe and thesis.
Market-capitalization weighting (capped) · used for broad-market sleeves where the cap discipline (typically 20-40% per name) prevents single-name domination. The default for Core and Equities families. This follows the standard convention introduced by the S&P 500 and adapted by every major index provider.
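Capped market-cap weighting is commonly implemented by iterative redistribution. Names over the cap are pinned at it, and the excess is re-spread pro rata across the uncapped names until no name exceeds the cap. A minimal sketch of that approach, with a hypothetical function name and a single uniform cap:

```javascript
// Hypothetical sketch of capped market-cap weighting by iterative redistribution.
// marketCaps: { [symbol]: market cap }, cap: maximum weight per name (e.g. 0.2-0.4).
function cappedCapWeights(marketCaps, cap) {
  const syms = Object.keys(marketCaps);
  if (cap * syms.length < 1) throw new Error("cap too tight for the number of names");
  const capped = new Set();
  const weights = {};
  while (true) {
    const free = syms.filter((s) => !capped.has(s));
    const freeMcap = free.reduce((a, s) => a + marketCaps[s], 0);
    const remaining = 1 - capped.size * cap; // weight left for uncapped names
    let newlyCapped = false;
    for (const s of free) {
      weights[s] = (remaining * marketCaps[s]) / freeMcap; // pro rata by market cap
      if (weights[s] > cap) newlyCapped = true;
    }
    if (!newlyCapped) break;
    for (const s of free) if (weights[s] > cap) capped.add(s);
  }
  for (const s of capped) weights[s] = cap; // pinned names sit exactly at the cap
  return weights;
}
```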
Hierarchical Risk Parity (HRP) · used for the cross-asset Diversified family where the correlation structure across the universe carries information about the right allocation. Per López de Prado (2016), HRP combines hierarchical clustering on the correlation-derived distance matrix with recursive inverse-variance allocation. The method avoids the inversion of the covariance matrix required by classical mean-variance optimization (Markowitz 1952), which is the source of most out-of-sample underperformance in standard portfolio construction.
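The recursive-bisection step of HRP can be sketched as follows. The sketch assumes the clustering step has already produced an ordering of assets. In López de Prado's formulation, that step uses the correlation distance sqrt(0.5 × (1 - ρ)), hierarchical linkage and quasi-diagonalisation. Each split then allocates between the two halves in inverse proportion to their cluster variances, which are computed from inverse-variance weights within each half. Function names are hypothetical.

```javascript
// Hypothetical sketch of HRP recursive bisection (clustering/ordering assumed done).
// cov: p x p covariance matrix; order: quasi-diagonal ordering of asset indices.

// Variance of the inverse-variance portfolio restricted to the assets in idx.
function clusterVariance(cov, idx) {
  const ivp = idx.map((i) => 1 / cov[i][i]);
  const s = ivp.reduce((a, b) => a + b, 0);
  const w = ivp.map((x) => x / s);
  let v = 0;
  for (let a = 0; a < idx.length; a++)
    for (let b = 0; b < idx.length; b++) v += w[a] * w[b] * cov[idx[a]][idx[b]];
  return v;
}

function hrpBisection(cov, order) {
  const weights = new Array(cov.length).fill(1);
  let clusters = [order];
  while (clusters.length > 0) {
    // Split every multi-asset cluster into two halves.
    const next = [];
    for (const c of clusters) {
      if (c.length > 1) {
        const mid = Math.floor(c.length / 2);
        next.push(c.slice(0, mid), c.slice(mid));
      }
    }
    // Allocate between each pair of halves by inverse cluster variance.
    for (let k = 0; k < next.length; k += 2) {
      const left = next[k];
      const right = next[k + 1];
      const vL = clusterVariance(cov, left);
      const vR = clusterVariance(cov, right);
      const alpha = 1 - vL / (vL + vR);
      for (const i of left) weights[i] *= alpha;
      for (const i of right) weights[i] *= 1 - alpha;
    }
    clusters = next;
  }
  return weights;
}
```

With a diagonal covariance matrix, the result reduces to plain inverse-variance weights, which makes a convenient sanity check.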
Inverse-variance and risk-parity overlays · used for tactical sleeves where the goal is to equalize the volatility contribution of each position rather than the position size. Aligns with the academic literature on risk-budgeting (Roncalli 2013).
Equal weight · used for thematic baskets where the conviction is on the universe rather than on individual constituent skill.
The construction method for each basket is published in its methodology brief, available on request to qualified institutional counterparties.
11. Covariance estimation
Sample covariance with daily observations across a multi-asset universe is noisy: most eigenvalues are statistically indistinguishable from random under Marchenko-Pastur bounds (Bouchaud and Laloux 1999). OLTA applies Ledoit-Wolf single-target shrinkage (Ledoit and Wolf 2003, 2004) as the catalogue default for all covariance-dependent construction methods. Shrinkage intensity is computed in closed form per the original paper and re-estimated on a quarterly cadence.
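A sketch of closed-form Ledoit-Wolf shrinkage. The text does not name the shrinkage target. This version assumes the scaled-identity target from Ledoit and Wolf (2004), "A well-conditioned estimator for large-dimensional covariance matrices", which may differ from the catalogue's actual target; the 2003 paper uses a single-index target. The input is an n × p matrix of daily returns with one row per day, and the function name is hypothetical.

```javascript
// Hypothetical sketch: Ledoit-Wolf shrinkage toward m * I (scaled identity).
// X: n rows (days) x p columns (assets). Norms use ||A||^2 = tr(A A^T) / p.
function ledoitWolfIdentity(X) {
  const n = X.length;
  const p = X[0].length;
  const means = new Array(p).fill(0);
  for (const row of X) for (let j = 0; j < p; j++) means[j] += row[j] / n;
  const Xc = X.map((row) => row.map((x, j) => x - means[j]));

  // Sample covariance with 1/n normalisation.
  const S = Array.from({ length: p }, () => new Array(p).fill(0));
  for (const row of Xc)
    for (let i = 0; i < p; i++)
      for (let j = 0; j < p; j++) S[i][j] += (row[i] * row[j]) / n;

  // m = average variance; d2 = squared distance of S from m * I.
  let m = 0;
  for (let i = 0; i < p; i++) m += S[i][i] / p;
  let d2 = 0;
  for (let i = 0; i < p; i++)
    for (let j = 0; j < p; j++) d2 += (S[i][j] - (i === j ? m : 0)) ** 2 / p;

  // b2bar = average squared distance of each x_k x_k^T from S, divided by n.
  let b2bar = 0;
  for (const row of Xc) {
    let dist = 0;
    for (let i = 0; i < p; i++)
      for (let j = 0; j < p; j++) dist += (row[i] * row[j] - S[i][j]) ** 2 / p;
    b2bar += dist / (n * n);
  }
  const b2 = Math.min(b2bar, d2);
  const shrinkage = d2 > 0 ? b2 / d2 : 1; // closed-form intensity in [0, 1]

  const sigma = S.map((r, i) =>
    r.map((s, j) => shrinkage * (i === j ? m : 0) + (1 - shrinkage) * s)
  );
  return { sigma, sample: S, shrinkage };
}
```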
The shrinkage approach typically lifts out-of-sample portfolio Sharpe by roughly 0.2 compared to point-estimate covariance, a material improvement against the institutional bar. Tyler M-estimator and Random Matrix Theory cleaning are tracked as future hardening candidates.
12. Benchmark framework
Performance is measured against two benchmarks:
Bitcoin · the single-asset institutional standard. Sharpe ratio, beta, and excess return are all reported against BTC.
OLTA Crypto Benchmark Equal-weight 100 (OCBE100) · a passive equal-weighted composite of the top 100 crypto assets by market capitalization, monthly reconstituted. OCBE100 is the diversified baseline against which OLTA's actively-constructed baskets are measured. Any active strategy should beat the diversified passive, which is a higher bar than beating a single concentrated asset.
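A sketch of the reconstitution step, assuming a list of { symbol, marketCap } records as of the reconstitution date. The name equalWeightTopN is hypothetical, and drift handling between monthly reconstitutions is not shown.

```javascript
// Hypothetical sketch of an equal-weight top-N reconstitution.
// universe: Array<{ symbol, marketCap }> as of the reconstitution date.
function equalWeightTopN(universe, n = 100) {
  // Largest market caps first; keep at most n names.
  const top = [...universe].sort((a, b) => b.marketCap - a.marketCap).slice(0, n);
  const w = 1 / top.length;
  return Object.fromEntries(top.map((a) => [a.symbol, w]));
}
```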
OLTA also tracks performance against a 60/40 BTC/ETH composite and the Bitwise BITW for reference, though these are not the published benchmark set.
13. Status taxonomy
Every index in the catalogue carries one of four statuses:
- Live · institutional-grade, full trade access, methodology validated against the 2-year backtest with active Sharpe above the catalogue floor.
- Conviction · selective thematic allocation with deliberate concentration. Reported with caveats about higher single-thesis exposure.
- Under study · methodology under active refinement. Visible in the catalogue for research transparency but not yet open for trade execution.
- Retired · formally shelved with a public retirement note explaining the reason. Retired baskets remain visible in the catalogue so the public record reflects what was tried and what did not work. Allocators consider this transparency a sign of methodology discipline rather than weakness.
14. References
- Engine source: lib/backtest.js
- Constituent definitions: lib/prices.js
- Consolidated results: lib/backtest-results.js (auto-generated)
- Full per-index metrics: consolidated backtest exports
- Diversified per-strategy and per-scenario metrics: methodology brief on request
- Methodology brief (rolling-window parameters, defensive sleeve, full backtest configuration): contact OLTA Research
Methodology version 2026.05. Detailed parameters for the rolling-window pass, the volTarget defaults, and the regime thresholds are documented in a private appendix issued to institutional counterparties and grant reviewers under NDA on request. Contact OLTA Research.