The Multi-Source Historical Price Pipeline
Date: 2026-05-25
Author: OLTA Research
Abstract: Cycle-tested research requires daily price history that spans multiple regime turns. A single-venue history sourced from Binance Spot reaches back to 2017 for the largest crypto majors and to 2018-2020 for most other assets. That is roughly one full cycle of usable data per asset, and none at all from the pre-Binance era, when crypto first traded against fiat at meaningful liquidity. This paper documents the methodology behind OLTA's multi-source price cascade, the splice rule that arbitrates between venues at the listing boundary, and the multi-anchor convention OLTA uses to report Bitcoin's risk-adjusted profile.
1. Why a single venue is insufficient
Binance Spot is the deepest liquidity venue in crypto today. Its kline endpoint serves clean daily OHLC for 200+ symbols against USDT, going back to each symbol's USDT-pair listing date. For most institutional crypto research desks, the Binance series is the de facto canonical history.
The constraint is the listing date. BTC's first Binance USDT trading day is 2017-08-17. ETH's is the same. Most altcoin majors list later: LTC in late 2017, XRP and ADA in 2018, BCH in late 2019. A research desk anchored entirely to Binance Spot is therefore working with eight years of daily data for the two biggest names, six years for the next layer, and three to four years for the long tail.
Eight years covers the 2017 mania, the 2018 crash, the 2019 base, the 2020-2021 bull market, the 2022 crash, the 2023-2024 recovery, and the sideways market of 2025. That is sufficient for many research questions and insufficient for a few specific ones. The questions Binance-only history cannot answer are exactly the ones that matter most for institutional positioning:
- What is Bitcoin's risk-adjusted profile across multiple complete cycles, not just the most recent one?
- For altcoin majors with pre-Binance trading volume, what does their multi-cycle Sharpe look like on a window that includes the cycle they were born in?
- How does the cycle structure of the deep history affect the published headline Sharpe for Bitcoin and the cycle-tested basket that sits alongside it?
The honest answer to each of these questions requires data older than the Binance Spot record provides. OLTA addressed the gap by building a multi-source cascade that extends the deep-history window for every symbol where a pre-Binance public history exists.
2. The cascade approach
The principle is straightforward. For each symbol in the catalogue, OLTA consults a sequence of public daily-OHLC sources, beginning with the deepest available history and ending with the most liquid recent venue. The earlier sources contribute the pre-Binance era; the recent venue owns from its listing date forward.
Three sources participate in the cascade:
- A deep-history aggregator publishes daily candles spanning the full available record for each major. For Bitcoin this reaches inception in 2010. For the next-tier majors it reaches the 2013-2017 window where they first traded at meaningful liquidity.
- Binance Spot daily klines own the recent segment from each symbol's USDT-pair listing forward. Binance is the liquidity venue OLTA actually trades on; its print is the price an allocator would have realised.
- A secondary public OHLC venue serves as a tertiary fallback for symbols the deep-history feed does not cover.
The cascade order is intentional. Older data is less precise the further its source recedes from the present, while recent venue data carries a narrower spread and matches execution. The cascade therefore favours depth where depth is the only option and the execution venue wherever it is available.
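The ordering described in this section can be sketched in a few lines. This is a minimal illustration, not OLTA's pipeline: the `Coverage` type, the `plan_segments` function, the source labels, and the mid-2010 placeholder date are all hypothetical names and values chosen for the example.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Coverage:
    """First day a source has a daily candle for one symbol (None = not covered)."""
    source: str
    first_day: Optional[date]


# (source name, first day inclusive, last day exclusive; None means open-ended)
Segment = Tuple[str, date, Optional[date]]


def plan_segments(deep: Coverage, fallback: Coverage,
                  binance_listing: date) -> List[Segment]:
    """Order the sources for one symbol: pre-listing history first, Binance after."""
    segments: List[Segment] = []
    # Prefer the deep-history feed; use the fallback venue only when the feed
    # has no data for this symbol.
    early = deep if deep.first_day is not None else fallback
    if early.first_day is not None and early.first_day < binance_listing:
        segments.append((early.source, early.first_day, binance_listing))
    # Binance owns everything from its listing date forward.
    segments.append(("binance_spot", binance_listing, None))
    return segments


# Illustrative call: the 2010 date is a placeholder for "mid-2010".
btc_plan = plan_segments(
    Coverage("deep_history", date(2010, 7, 1)),
    Coverage("secondary_venue", None),
    date(2017, 8, 17),
)
```

A symbol with no pre-Binance history on either early source simply yields a single Binance segment.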
3. The splice rule
When two sources cover the same calendar window, OLTA must decide which close to keep. The splice rule arbitrates this question, and it is the single most important element of the cascade methodology because it determines the multi-cycle Sharpe an analyst will compute.
The rule is direction-aware: which source wins depends on which side of the symbol's Binance listing date a bar falls. Before that boundary the deep-history feed wins. From that boundary forward Binance wins. The handover happens on a single trading day, with no blended overlap window.
A drift tolerance protects against splice artefacts. At the listing boundary the engine compares the deep-history feed's last close to Binance's first close. If the two prices differ by more than a banded threshold (in the low single-digit percent range), the splice is logged for traceability and the segment chain notes the divergence. The principle is that within-tolerance splices proceed silently and out-of-tolerance splices are explicitly tagged.
Roughly five of the twelve majors OLTA extended through the deep-history feed showed drift above the tolerance band at the splice boundary. The pattern is consistent. A Binance USDT pair launch in 2017-2019 routinely landed at a price that differed several percent from the broader market's prior-day close. The cause is venue-specific liquidity at the launch instant: the pair opened thin, often within hours of a major price move on other venues, and the open print reflected the first few Binance trades rather than the consolidated market. Within 48 hours the spread closed.
The splice rule documents these events without using them. Binance still owns the recent segment; the deep-history feed still owns the pre-listing segment. The tag exists so a reader who wants to know about the splice gap can find it. Removing the tagged days, smoothing across the boundary, or substituting one source for the other would each introduce a different artefact; the cascade design accepts the documented divergence rather than synthesise around it.
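A minimal sketch of the splice and its drift check follows. It assumes each source is a mapping from calendar day to daily close. The `splice` function, the tag layout, and the 3% tolerance are hypothetical; the actual threshold is described only as low single-digit percent, and measuring drift relative to the deep-history close is an assumption.

```python
from datetime import date
from typing import Dict, Optional, Tuple

Closes = Dict[date, float]

# Hypothetical value: the text only says "low single-digit percent".
DRIFT_TOLERANCE = 0.03


def splice(deep: Closes, binance: Closes, listing_day: date,
           tolerance: float = DRIFT_TOLERANCE) -> Tuple[Closes, Optional[dict]]:
    """Join two daily-close series at the Binance listing date.

    Returns the spliced series and a drift tag (None when within tolerance).
    """
    # Direction-aware ownership: deep feed strictly before the boundary,
    # Binance from the boundary forward. Overlapping deep bars are discarded.
    pre = {d: c for d, c in deep.items() if d < listing_day}
    post = {d: c for d, c in binance.items() if d >= listing_day}

    tag = None
    if pre and post:
        last_deep_day, first_binance_day = max(pre), min(post)
        # Assumption: drift is measured relative to the deep feed's last close.
        drift = abs(post[first_binance_day] / pre[last_deep_day] - 1.0)
        if drift > tolerance:
            # Tag only: no bar is removed, smoothed, or substituted.
            tag = {"boundary": listing_day, "drift": drift}

    return {**pre, **post}, tag
```

The tag travels alongside the series rather than altering it, which mirrors the principle that out-of-tolerance splices are documented but not used.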
4. The Bitcoin reference and the multi-anchor convention
Bitcoin's Sharpe ratio is the single most-cited risk-adjusted return number in the asset class. Public research desks publish a figure that ranges roughly from 0.6 to 1.7 depending on the window. The dispersion is not a data-quality problem. It is a structural feature of Bitcoin's return history.
OLTA's deep-history pipeline yields a Bitcoin daily series from mid-2010 to the present. Computed straight off that series with zero risk-free rate and the institutional 252 trading-day volatility convention, Bitcoin's full-history Sharpe sits in the 1.5 to 1.8 band. Computed off a 2014-anchored window, which several public references cite, the Sharpe sits in the 0.5 to 0.7 band. Computed off a Binance-era anchor at 2017, the Sharpe sits in the 0.6 to 0.8 band. Computed off a rolling 5y window, which several other public references cite, the Sharpe sits in the 0.8 to 0.9 band. Each of these is correct. Each answers a different question.
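For reference, one common way to write the calculation is given below. The text fixes only the zero risk-free rate and the 252-day convention; the use of simple daily returns, the sample standard deviation, and square-root-of-time annualisation are assumptions of this sketch. Here P_t is the daily close and N is the number of daily returns in the chosen window.

```latex
r_t = \frac{P_t}{P_{t-1}} - 1, \qquad
\bar{r} = \frac{1}{N}\sum_{t=1}^{N} r_t, \qquad
\sigma_r = \sqrt{\frac{1}{N-1}\sum_{t=1}^{N}\left(r_t - \bar{r}\right)^2}, \qquad
S = \frac{\bar{r} - r_f}{\sigma_r}\,\sqrt{252}, \quad r_f = 0 .
```

Changing the anchor changes only which t enter the sums, which is why the same series can yield every band quoted above.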
The OLTA convention is to disclose all of them. The /research surface renders a four-anchor table:
- An inception-anchored figure that uses every daily close from Bitcoin's first traded day in 2010.
- A post-mania figure that drops the early 2010-2013 window and anchors on 2014, matching the convention used by several institutional research desks.
- A Binance-era figure anchored on Bitcoin's first Binance USDT trading day in 2017, matching the window that pairs apples-to-apples with the basket Sharpes published in the deep-dive papers.
- A rolling 5y figure anchored five years before the latest close (mid-2021 at the date of this paper), matching the rolling-5y convention used by several large institutional crypto research desks.
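The four-anchor table amounts to one Sharpe calculation repeated over different start dates. The sketch below assumes simple daily returns, a zero risk-free rate, and annualisation by the square root of 252. The function names `sharpe` and `multi_anchor_table` are hypothetical, and the anchor dates are supplied by the caller.

```python
import math
import statistics
from datetime import date
from typing import Dict, List, Optional


def sharpe(closes: List[float]) -> Optional[float]:
    """Annualised Sharpe from daily closes (zero risk-free rate, 252-day convention)."""
    rets = [b / a - 1.0 for a, b in zip(closes, closes[1:])]
    if len(rets) < 2:
        return None  # too few returns for a sample standard deviation
    sd = statistics.stdev(rets)
    if sd == 0:
        return None  # undefined when returns do not vary
    return statistics.mean(rets) / sd * math.sqrt(252)


def multi_anchor_table(series: Dict[date, float],
                       anchors: Dict[str, date]) -> Dict[str, Optional[float]]:
    """Compute one Sharpe per anchor, each on closes from the anchor date forward."""
    days = sorted(series)
    return {
        label: sharpe([series[d] for d in days if d >= start])
        for label, start in anchors.items()
    }
```

Every anchor reads from the same daily series, so differences between rows reflect only the window, not the data source.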
The published headline on every public surface (the ticker strip, the /research surface header, the methodology paper) uses the rolling 5y anchor. Basket-versus-BTC comparisons on individual /index pages use the per-basket window so the comparison is apples-to-apples. The multi-anchor table sits alongside so any reader can see how the headline depends on the convention.
This is not a methodological retreat. It is the institutional norm for any asset whose return history crosses multiple distinct regimes. Equity research desks face the same question for emerging-markets indices and for the S&P 500 itself. The convention is to disclose the anchor and publish multiple anchors so the reader can pick the comparable window for their own analysis.
5. Window alignment for basket-versus-benchmark comparisons
The cascade extends Bitcoin's history to 2010 and many altcoin majors' histories to 2014-2017. A basket whose constituents include a token that listed in 2022 still has its backtest window bounded by that token's listing date. The basket cannot have a 2018-2026 backtest if one of its components first traded in 2022.
The engine aligns each basket's backtest window to the intersection of its constituents' availability. The basket's start date is the maximum of the constituent first-trade dates; the basket's end date is the minimum of the constituent last-trade dates. Forward fill handles isolated missing bars within the window. No extrapolation extends the window beyond the intersection.
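The alignment rule can be sketched as follows, assuming each constituent is a mapping from calendar day to close and that every calendar day is a trading day (reasonable for crypto, an assumption for anything else). The names `basket_window` and `align` are hypothetical.

```python
from datetime import date, timedelta
from typing import Dict, Tuple

Closes = Dict[date, float]


def basket_window(constituents: Dict[str, Closes]) -> Tuple[date, date]:
    """Intersection of availability: latest first-trade day, earliest last-trade day."""
    start = max(min(series) for series in constituents.values())
    end = min(max(series) for series in constituents.values())
    if start > end:
        raise ValueError("constituents share no common window")
    return start, end


def align(constituents: Dict[str, Closes]) -> Dict[str, Closes]:
    """Restrict every constituent to the shared window, forward-filling gaps."""
    start, end = basket_window(constituents)
    calendar = [start + timedelta(days=i) for i in range((end - start).days + 1)]
    aligned: Dict[str, Closes] = {}
    for symbol, series in constituents.items():
        # Seed with the latest close on or before the window start, so a
        # missing bar on the first day is still forward-filled.
        last = series[max(d for d in series if d <= start)]
        filled: Closes = {}
        for day in calendar:
            last = series.get(day, last)  # carry the previous close over a gap
            filled[day] = last
        aligned[symbol] = filled
    return aligned
```

Nothing outside the interval from start to end is ever emitted, so the window cannot be extended by extrapolation.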
The basket-versus-BTC comparison must respect this alignment. If a basket's window runs from 2022-03 to 2026-05, the BTC reference used for the comparison must also run from 2022-03 to 2026-05. Using BTC's full 2010-2026 record for that comparison would be a category error: the displayed delta would set the basket's recent-window return against Bitcoin's multi-cycle 15.86-year return and produce a number with no defensible meaning.
This sounds obvious, and in principle it is. In practice it was the source of a regression caught during a verification pass on the May 2026 catalogue refresh. The fix lands in two places. First, the analyze pipeline writes a per-basket BTC reference on each backtest entry, computed on the basket's own timestamps. Second, the display logic prefers the per-basket reference and falls back to the global reference only when no per-basket data is present. As a further safeguard, the global reference itself is sliced to a rolling 5y window at the source, so any fallback path still produces a sensible comparison.
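The two halves of the fix can be illustrated with a short sketch. The entry layout, the `btc_reference` field name, and both function names are hypothetical; only the prefer-then-fall-back behaviour comes from the description above.

```python
from datetime import date
from typing import Dict, Tuple

Closes = Dict[date, float]


def btc_on_basket_timestamps(btc: Closes, basket: Closes) -> Closes:
    """Pipeline side: restrict BTC closes to the basket's own timestamps."""
    return {d: btc[d] for d in sorted(basket) if d in btc}


def pick_reference(entry: dict, global_reference: Closes) -> Tuple[str, Closes]:
    """Display side: prefer the per-basket reference, else the global one."""
    per_basket = entry.get("btc_reference")
    if per_basket:  # present and non-empty
        return "per_basket", per_basket
    # Fallback path: the global reference is assumed to be pre-sliced to a
    # rolling 5y window upstream.
    return "global", global_reference
```

Returning the label alongside the series lets a display record which path produced the comparison.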
The methodological principle is general. Any time a basket and a benchmark are compared, both must run on the same calendar. A basket-versus-benchmark comparison that is not window-aligned is not a comparison. The pipeline now enforces this on every published display.
6. What is published, what is not
The /research surface publishes the methodology framework: the cascade structure, the splice rule, the multi-anchor convention, the window alignment principle. The figures published alongside (the four anchor Sharpe values, the per-basket window comparisons, the headline rolling 5y figure) are computed off the same daily file the OLTA backtest pipeline consumes.
The methodology brief covers the exact splice tolerance threshold, the per-symbol cascade order including which symbols invoked a tertiary fallback, the historical-feed rate-limit budget, the segment-chain metadata format, the per-symbol extension report, and the cadence at which the cascade re-runs. The brief is available to institutional counterparties and grant reviewers under standard research-distribution terms.
The institutional research convention is to publish the framework and reserve the recipe. OLTA follows that convention. A desk that wants to reproduce the published numbers from the same public sources can do so; a desk that wants to lift the cascade verbatim cannot, because the operational layer is not disclosed. This is the posture the major index providers use for their own methodology, and the posture institutional allocators expect from a serious research desk.
7. Caveats
- The deep-history feed contributes daily candles only. Intraday data, order-book information, and venue-specific trade history are not part of the cascade.
- The cascade extends history for crypto symbols. The equity leg of the cross-asset baskets runs off the conventional daily-adjusted equity series.
- The splice rule preserves the deep-history segment for the pre-listing window without smoothing the boundary. Logged splice gaps are documented in the methodology brief; they are not synthesised away.
- The headline Sharpe figure depends on the anchor. The rolling 5y figure is the institutional default; the inception anchor sits structurally higher because it captures Bitcoin's 2010-2013 early-cycle phase, and the 2014 anchor sits structurally lower because it captures the back half of a complete cycle followed by an extended base. The multi-anchor table discloses this dispersion.
- The cascade re-runs on a documented cadence. Between refreshes the published file is stable; after each refresh the published Sharpe and basket comparisons regenerate against the updated file.
The framework is publishable. The operational detail that converts the framework into a running pipeline is the kind of artefact OLTA reserves for the methodology brief. Both layers are documented internally; the framework layer is documented here, and the methodology brief, which also documents the per-basket parameters, is available on request.
OLTA Research Desk · 2026-05-25