This page documents the active local-first vendor paths. Both public vendor
adapters are Polymarket book adapters: they produce OrderBookDeltas for L2
book state, and the replay adapter interleaves real TradeTick records for
execution.
PMXT
PMXT is the hourly raw archive path for Polymarket L2 order-book data.
The preferred workflow is raw-first:
- Mirror raw PMXT archive hours onto local disk.
- Point runners at those raw hours with local:/....
- Let public archive sources fill gaps when the local mirror is incomplete.
- Let the filtered PMXT cache make repeated market/token/hour slices fast.
Runner Source Modes
Public PMXT runners select sources directly in their inline MarketDataConfig:
MarketDataConfig(
    platform=Polymarket,
    data_type=Book,
    vendor=PMXT,
    sources=(
        "local:/Volumes/storage/pmxt_data",
        "archive:r2v2.pmxt.dev",
        "archive:r2.pmxt.dev",
    ),
)
Lookup order:
- Local filtered cache at ~/.cache/nautilus_trader/pmxt.
- Explicit raw sources in MarketDataConfig.sources, left to right.
- Confirmed miss (no source has the hour).
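The lookup order can be sketched as a chain of tiers tried in sequence. This is a minimal illustration, assuming each tier is a callable that returns rows for an hour or None; `resolve_hour` and the callables are hypothetical names, not the loader's API.

```python
from typing import Callable, Optional, Sequence

# Hypothetical stand-in: a lookup returns rows for one hour, or None if absent.
Lookup = Callable[[str], Optional[list]]


def resolve_hour(
    hour: str,
    cache_lookup: Lookup,
    sources: Sequence[tuple[str, Lookup]],
) -> tuple[str, Optional[list]]:
    """Return (tier_label, rows) for one market/token/hour.

    Tier 1 is the filtered cache; tier 2 is each explicit raw source in
    MarketDataConfig.sources order; if nothing has the hour, it is a miss.
    """
    rows = cache_lookup(hour)
    if rows is not None:
        return "cache", rows
    for label, lookup in sources:  # left to right, first hit wins
        rows = lookup(hour)
        if rows is not None:
            return label, rows
    return "miss", None
```

With sources=[("local:/Volumes/storage/pmxt_data", local), ("archive:r2v2.pmxt.dev", remote)], an hour absent from both the cache and the local mirror is served by the first archive that has it.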
MarketDataConfig.sources is intentionally strict: use only local: and
archive: for PMXT. Bare hosts, bare paths, and legacy aliases are rejected.
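A sketch of the strict source check, assuming only the `local:` and `archive:` prefixes with a non-empty target are valid. `parse_pmxt_source` is a hypothetical helper; the real validation may apply further rules.

```python
def parse_pmxt_source(source: str) -> tuple[str, str]:
    """Split a PMXT source into (kind, target) or raise ValueError.

    Only local:<path> and archive:<host> with a non-empty target pass;
    bare hosts, bare paths, and other prefixes are rejected.
    """
    kind, sep, target = source.partition(":")
    if not sep or kind not in ("local", "archive") or not target:
        raise ValueError(f"unsupported PMXT source: {source!r}")
    return kind, target
```

Bare hosts such as r2.pmxt.dev and bare paths such as /data/pmxt fail here because they carry no recognized prefix.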
Lower-Level Loader Env Vars
Runner files should carry their source priority inline. These lower-level env vars remain available for custom integrations:
- PMXT_LOCAL_RAWS_DIR
- PMXT_RAW_ROOT
- PMXT_REMOTE_BASE_URL
- PMXT_CACHE_DIR
- PMXT_DISABLE_CACHE
- PMXT_PREFETCH_WORKERS
- PMXT_CACHE_PREFETCH_WORKERS
- PMXT_ROW_GROUP_SCAN_WORKERS
What Works Today
The public PMXT path loads one market/token/hour from raw archives and converts
those rows into Nautilus OrderBookDeltas.
The loader decodes:
- book_snapshot as a fresh full book snapshot.
- price_change as an incremental price-level update.
If an hour is missing, the loader warns and resets local book state. It does
not apply later incremental price_change rows across a missing-hour gap until
a fresh book_snapshot rebuilds the book.
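The gap rule can be illustrated with a small replay loop over hourly batches. This is a sketch under stated assumptions: a missing hour is passed as None, buy maps to bids, and a zero change_size removes a level. `replay_hours` and the dict-based book are hypothetical, not the loader's internals.

```python
import warnings
from decimal import Decimal
from typing import Iterable, Optional

Book = dict[str, dict[str, str]]  # side -> price string -> size string


def replay_hours(hours: Iterable[tuple[str, Optional[list[dict]]]]) -> tuple[Book, int]:
    """Apply book_snapshot / price_change rows hour by hour.

    A missing hour (rows is None) warns, clears the book, and marks it
    unsynced; price_change rows are then skipped until the next
    book_snapshot rebuilds the book. Returns (book, skipped_count).
    """
    book: Book = {"bids": {}, "asks": {}}
    synced = False  # assumption: nothing is applied before the first snapshot
    skipped = 0
    for hour, rows in hours:
        if rows is None:
            warnings.warn(f"missing PMXT hour {hour}; resetting book state")
            book = {"bids": {}, "asks": {}}
            synced = False
            continue
        for row in rows:
            if row["update_type"] == "book_snapshot":
                # A snapshot replaces the whole book and re-syncs it.
                book = {
                    "bids": {p: s for p, s in row["bids"]},
                    "asks": {p: s for p, s in row["asks"]},
                }
                synced = True
            elif row["update_type"] == "price_change":
                if not synced:
                    skipped += 1  # never apply increments across a gap
                    continue
                side = "bids" if row["change_side"] == "buy" else "asks"
                if Decimal(row["change_size"]) == 0:
                    book[side].pop(row["change_price"], None)
                else:
                    book[side][row["change_price"]] = row["change_size"]
    return book, skipped
```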
To mirror raw archive hours locally:
make download-pmxt-raws DESTINATION=/path/to/pmxt_raws
The downloader walks the direct hourly filenames from 2026-02-21T16:00:00Z
through the current UTC hour (floored to the hour), newest first. For each hour
it probes r2v2.pmxt.dev and r2.pmxt.dev, chooses the larger archive object when
both exist, and writes the same archive filename under a dated local path.
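A sketch of the downloader's hour walk and mirror choice, assuming the archive filename matches the local raw naming polymarket_orderbook_YYYY-MM-DDTHH.parquet and that the dated local path is YYYY/MM/DD. The helper names and the size lookup are hypothetical; probing the mirrors over the network is left out.

```python
from datetime import datetime, timedelta, timezone
from typing import Iterator, Optional

FIRST_HOUR = datetime(2026, 2, 21, 16, tzinfo=timezone.utc)


def archive_hours(now: datetime, first: datetime = FIRST_HOUR) -> Iterator[datetime]:
    """Yield hours from the floored current UTC hour back to `first`, newest first."""
    hour = now.astimezone(timezone.utc).replace(minute=0, second=0, microsecond=0)
    while hour >= first:
        yield hour
        hour -= timedelta(hours=1)


def archive_filename(hour: datetime) -> str:
    return f"polymarket_orderbook_{hour:%Y-%m-%dT%H}.parquet"


def local_path(hour: datetime) -> str:
    # Assumed dated layout: YYYY/MM/DD/<same archive filename>.
    return f"{hour:%Y/%m/%d}/{archive_filename(hour)}"


def pick_mirror(sizes: dict[str, Optional[int]]) -> Optional[str]:
    """Given object sizes per mirror host (None = absent), choose the larger."""
    present = {host: size for host, size in sizes.items() if size is not None}
    return max(present, key=present.get) if present else None
```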
The downloader is incremental. Without --overwrite, an existing local hour is
treated as complete and skipped before any network transfer is attempted. This
keeps reruns safe for large mirrors and prevents accidental replacement of
already-downloaded raws.
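The skip rule amounts to an existence check before any transfer. A minimal sketch, assuming relative paths are already computed per hour; `plan_downloads` and its `overwrite` parameter are hypothetical, and, like the rule described above, it does not verify sizes or checksums of existing files.

```python
from pathlib import Path
from typing import Iterable


def plan_downloads(
    destination: Path, relative_paths: Iterable[str], overwrite: bool = False
) -> list[str]:
    """Return the relative paths that still need a network transfer.

    Without overwrite, any file that already exists locally is treated as
    complete and skipped.
    """
    return [
        rel
        for rel in relative_paths
        if overwrite or not (destination / rel).exists()
    ]
```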
Supported Local File Layout
The filtered cache lives at:
~/.cache/nautilus_trader/pmxt
Override it with:
PMXT_CACHE_DIR=/custom/path
Disable it with either:
PMXT_CACHE_DIR=0
PMXT_DISABLE_CACHE=1
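The cache settings above can be resolved roughly as follows. This sketch assumes PMXT_DISABLE_CACHE is honored only for the exact value 1, that PMXT_CACHE_DIR=0 disables the cache, and that any other non-empty PMXT_CACHE_DIR is a path; `pmxt_cache_dir` is a hypothetical helper.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def pmxt_cache_dir(env: Mapping[str, str] = os.environ) -> Optional[Path]:
    """Resolve the filtered-cache directory, or None when caching is disabled."""
    if env.get("PMXT_DISABLE_CACHE") == "1":
        return None
    override = env.get("PMXT_CACHE_DIR")
    if override == "0":
        return None
    if override:
        return Path(override).expanduser()
    # Default location documented above.
    return Path.home() / ".cache" / "nautilus_trader" / "pmxt"
```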
For local raw PMXT archive hours, the loader accepts:
<raw_root>/polymarket_orderbook_YYYY-MM-DDTHH.parquet
<raw_root>/YYYY/MM/DD/polymarket_orderbook_YYYY-MM-DDTHH.parquet
Pin that raw root in a runner with:
sources=("local:/data/pmxt/raw",)
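A sketch of a lookup over the two supported layouts, assuming the flat file is checked before the dated subdirectory; the loader's actual probe order is not documented here, and `find_local_hour` is hypothetical.

```python
from pathlib import Path
from typing import Optional


def find_local_hour(raw_root: Path, stamp: str) -> Optional[Path]:
    """Locate one raw hour under raw_root.

    `stamp` is 'YYYY-MM-DDTHH'. The flat layout is tried first, then the
    YYYY/MM/DD dated subdirectory.
    """
    name = f"polymarket_orderbook_{stamp}.parquet"
    year, month, day = stamp[:10].split("-")
    for candidate in (raw_root / name, raw_root / year / month / day / name):
        if candidate.is_file():
            return candidate
    return None
```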
Required Parquet Columns
Raw PMXT archive parquet may use the legacy payload schema:
market_id, update_type, data
or the fixed-column schema:
timestamp, market, event_type, asset_id, bids, asks, price, size, side
For the legacy schema, the loader filters raw hours to market_id at parquet
scan time, then filters the remaining rows to token_id inside the JSON
payload. For the fixed-column schema, it filters decode(market) and
asset_id, then sends the selected columns directly to the Rust PMXT converter.
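The two filtering paths can be sketched over plain Python rows. This assumes rows arrive as dicts, that the fixed-column market value may be UTF-8 bytes (hence the decode), and that a schema is recognized by its column set; `select_rows` is hypothetical, and the real loader pushes the market filter into the parquet scan rather than filtering in Python.

```python
import json
from typing import Iterable

LEGACY_COLUMNS = {"market_id", "update_type", "data"}
FIXED_COLUMNS = {
    "timestamp", "market", "event_type", "asset_id",
    "bids", "asks", "price", "size", "side",
}


def select_rows(columns: set[str], rows: Iterable[dict], market_id: str, token_id: str) -> list[dict]:
    """Filter one raw hour to a market/token pair, by schema."""
    if LEGACY_COLUMNS <= columns:
        out = []
        for row in rows:
            if row["market_id"] != market_id:  # column-level filter
                continue
            payload = json.loads(row["data"])
            if payload.get("token_id") == token_id:  # filter inside the JSON payload
                out.append(row)
        return out
    if FIXED_COLUMNS <= columns:
        def market_of(row: dict) -> str:
            m = row["market"]
            return m.decode("utf-8") if isinstance(m, bytes) else m
        return [r for r in rows if market_of(r) == market_id and r["asset_id"] == token_id]
    raise ValueError(f"unrecognized PMXT schema: {sorted(columns)}")
```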
PMXT_PREFETCH_WORKERS controls how many archive hours are read ahead while a
single market window is loading. The repo data-source wrapper defaults local
raw mirrors to 6 workers. Multi-replay PMXT loading also groups filtered-cache
misses by raw hour, so a basket that needs the same hourly parquet for many
market/token requests scans that raw hour once and splits the filtered Arrow
batches per replay. BACKTEST_REPLAY_MATERIALIZE_WORKERS separately caps the
memory-heavy conversion from filtered/cache data into Nautilus replay objects,
so source-stage workers can be raised without materializing too many full
replays at once.
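The per-hour grouping can be sketched as: bucket requests by raw hour, scan each hour once, then split the rows per request. `load_basket`, the request tuple shape, and `scan_hour` are hypothetical; the sketch does not model the separate materialization cap set by BACKTEST_REPLAY_MATERIALIZE_WORKERS.

```python
from collections import defaultdict
from typing import Callable, Iterable

Request = tuple[str, str, str]  # hypothetical shape: (hour, market_id, token_id)


def load_basket(
    requests: Iterable[Request], scan_hour: Callable[[str], list[dict]]
) -> dict[Request, list[dict]]:
    """Scan each distinct raw hour once, then split its rows per request."""
    by_hour: dict[str, list[Request]] = defaultdict(list)
    for req in requests:
        by_hour[req[0]].append(req)
    results: dict[Request, list[dict]] = {}
    for hour, reqs in by_hour.items():
        rows = scan_hour(hour)  # one read of this raw hour for the whole basket
        for req in reqs:
            _, market_id, token_id = req
            results[req] = [
                r for r in rows if r["market"] == market_id and r["asset_id"] == token_id
            ]
    return results
```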
Legacy JSON Payload Shape
For book_snapshot, the loader expects the data JSON to include:
{
  "update_type": "book_snapshot",
  "market_id": "0x...",
  "token_id": "123...",
  "timestamp": 1710000000.123,
  "bids": [["0.45", "100.0"]],
  "asks": [["0.47", "120.0"]]
}
For price_change, the loader expects:
{
  "update_type": "price_change",
  "market_id": "0x...",
  "token_id": "123...",
  "timestamp": 1710000001.456,
  "change_price": "0.46",
  "change_size": "25.0",
  "change_side": "buy"
}
Prices and sizes are preserved as decimal strings until the Nautilus instrument constructs typed prices and quantities.
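A sketch of decoding the two payload shapes above while keeping prices and sizes as strings. `parse_payload` and `Level` are hypothetical; the sketch assumes change_side is either buy (bid side) or sell (ask side).

```python
import json
from typing import NamedTuple

SIDE_OF = {"buy": "bid", "sell": "ask"}  # assumed change_side values


class Level(NamedTuple):
    side: str   # "bid" or "ask"
    price: str  # decimal string, never a float
    size: str   # decimal string, never a float


def parse_payload(data: str) -> tuple[str, list[Level]]:
    """Turn a legacy PMXT JSON payload into (update_type, levels)."""
    msg = json.loads(data)
    kind = msg["update_type"]
    if kind == "book_snapshot":
        levels = [Level("bid", p, s) for p, s in msg["bids"]]
        levels += [Level("ask", p, s) for p, s in msg["asks"]]
    elif kind == "price_change":
        levels = [Level(SIDE_OF[msg["change_side"]], msg["change_price"], msg["change_size"])]
    else:
        raise ValueError(f"unexpected update_type: {kind!r}")
    return kind, levels
```

Because the prices and sizes are quoted in the JSON, json.loads returns them as str, so no float rounding happens before the instrument builds typed values.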
Telonex
Telonex is a Polymarket full-book snapshot vendor path. Public Telonex runners
use data_type=Book, vendor=Telonex, and the book_snapshot_full channel.
Execution trade ticks are loaded from Telonex materialized cache first, then the
configured Telonex sources in order. Within each Telonex source, the loader
tries onchain_fills before trades; Polymarket's public trade API is only the
final fallback. Public runners list api:${TELONEX_API_KEY} first, then the
standard local mirror fallback. Empty Telonex onchain-fill days are not treated
as proof that no execution prints exist; the loader keeps falling through to
Telonex trades and then Polymarket before returning a zero-trade day.
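The fallback chain for execution trades can be sketched as nested loops. This assumes each fetcher returns a list of rows, with an empty list or None meaning no data; `load_trade_ticks` and the per-source dict of channel fetchers are hypothetical. An empty onchain_fills result falls through rather than ending the search.

```python
from typing import Callable, Optional, Sequence

# Hypothetical fetcher: returns one day's trade rows, or None if absent.
Fetch = Callable[[str], Optional[list]]


def load_trade_ticks(
    day: str,
    cache: Fetch,
    sources: Sequence[dict[str, Fetch]],
    polymarket_api: Fetch,
) -> tuple[str, list]:
    """Resolve one day's execution trades. Returns (label, rows)."""
    rows = cache(day)
    if rows:  # the materialized cache only holds non-empty tick sets
        return "cache", rows
    for i, source in enumerate(sources):  # configured Telonex sources, in order
        for channel in ("onchain_fills", "trades"):
            rows = source[channel](day)
            if rows:  # empty is not proof of zero prints: keep falling through
                return f"source[{i}] {channel}", rows
    rows = polymarket_api(day)  # final fallback
    return ("polymarket", rows) if rows else ("empty", [])
```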
Telonex source syntax:
- local:/path/to/telonex reads the local blob mirror.
- api: uses https://api.telonex.io with TELONEX_API_KEY.
- api:https://host.example points at a compatible custom base URL.
The API path reads the key from TELONEX_API_KEY unless a private runner source
provides an explicit api:<key> value. Do not commit private keys.
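A sketch of parsing the source strings above. It assumes an api: value starting with http:// or https:// is a base URL, an empty value means the default host, any other value is an explicit key, and ${VAR} templates are expanded from the environment with string.Template; `parse_telonex_source` and `TelonexSource` are hypothetical.

```python
import os
from string import Template
from typing import Mapping, NamedTuple, Optional

DEFAULT_API = "https://api.telonex.io"


class TelonexSource(NamedTuple):
    kind: str                     # "local" or "api"
    target: str                   # local path or API base URL
    api_key: Optional[str] = None


def parse_telonex_source(source: str, env: Mapping[str, str] = os.environ) -> TelonexSource:
    kind, sep, rest = source.partition(":")
    if kind == "local" and sep and rest:
        return TelonexSource("local", rest)
    if kind == "api" and sep:
        rest = Template(rest).safe_substitute(env)  # e.g. api:${TELONEX_API_KEY}
        if rest.startswith(("http://", "https://")):
            return TelonexSource("api", rest, env.get("TELONEX_API_KEY"))
        # Empty value: default host with the env key; otherwise an explicit key.
        return TelonexSource("api", DEFAULT_API, rest or env.get("TELONEX_API_KEY"))
    raise ValueError(f"unsupported Telonex source: {source!r}")
```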
Telonex caches are stored by default at:
~/.cache/nautilus_trader/telonex
Each cached API day has two forms:
- <YYYY-MM-DD>.parquet: the raw nested Telonex API payload.
- <YYYY-MM-DD>.fast.parquet: a flat list-string sidecar optimized for replay reads.
The fast sidecar preserves price and size strings while avoiding expensive pandas materialization of nested list-of-struct columns. If a raw cache file is encountered without a sidecar, the loader migrates it lazily.
Replay conversion has a separate materialized cache under:
~/.cache/nautilus_trader/telonex/book-deltas-v1
~/.cache/nautilus_trader/telonex/trade-ticks-v1
The first stores Nautilus OrderBookDeltas converted from full-book snapshots;
the second stores non-empty Nautilus TradeTicks converted from Telonex trade
rows. They are keyed by exchange, channel, market slug, outcome,
instrument id, day, and clipped replay window. Warm runs report
telonex deltas cache, telonex onchain_fills cache, or
telonex trades cache and skip local/API decoding entirely. Execution
trade-tick progress labels include the exact Telonex channel, for example
telonex local onchain_fills or telonex local trades.
Clear Telonex API and materialized replay caches with:
make clear-telonex-cache
Do not point TELONEX_CACHE_ROOT at the local mirror. The clear target refuses
configured local data stores and parents containing those stores.
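The safety check can be sketched as a path-ancestry test: refuse when the cache root equals a configured store or is one of its parent directories. `check_clear_target` is a hypothetical helper.

```python
from pathlib import Path
from typing import Iterable


def check_clear_target(cache_root: Path, data_stores: Iterable[Path]) -> None:
    """Raise if cache_root is, or contains, a configured local data store."""
    root = cache_root.resolve()
    for store in data_stores:
        store = store.resolve()
        if store == root or root in store.parents:
            raise RuntimeError(f"refusing to clear {root}: it contains data store {store}")
```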
Recommended local mirror root:
/Volumes/storage/telonex_data/
  telonex.duckdb
  data/
    channel=book_snapshot_full/
      year=2026/
        month=04/
          part-000001.parquet
The DuckDB manifest records completed and empty market/outcome/channel/day jobs. The loader uses it to select only readable parquet parts for the requested market, outcome, channel, and date range. If the manifest is missing or invalid, the loader falls back to legacy path scans.
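A sketch of manifest-driven part selection with a legacy fallback. The row shape, the inclusive date range, and modeling a missing or invalid manifest as None are assumptions; `ManifestRow` and `select_parts` are hypothetical, and the real manifest lives in DuckDB.

```python
from dataclasses import dataclass
from datetime import date
from typing import Callable, Optional, Sequence


@dataclass(frozen=True)
class ManifestRow:
    """One manifest job row (hypothetical shape)."""
    market_slug: str
    outcome: int
    channel: str
    day: date
    status: str               # "completed" or "empty"
    part_path: Optional[str]  # None for empty days


def select_parts(
    manifest: Optional[Sequence[ManifestRow]],
    market_slug: str,
    outcome: int,
    channel: str,
    start: date,
    end: date,
    legacy_scan: Callable[[], list[str]],
) -> list[str]:
    """Pick parquet parts for a market/outcome/channel/date range."""
    if manifest is None:
        return legacy_scan()  # fall back to scanning paths
    parts = {
        row.part_path
        for row in manifest
        if row.market_slug == market_slug
        and row.outcome == outcome
        and row.channel == channel
        and start <= row.day <= end
        and row.status == "completed"  # empty days contribute no parts
        and row.part_path is not None
    }
    return sorted(parts)
```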
Download Local Telonex Files
Small window:
TELONEX_API_KEY=... make download-telonex-data TELONEX_DOWNLOAD_FLAGS='\
--market-slug us-recession-by-end-of-2026 \
--outcome-id 0 \
--channels book_snapshot_full onchain_fills trades \
--start-date 2026-01-19 \
--end-date 2026-02-01'
Full Polymarket mirror:
uv run python scripts/telonex_download_data.py \
--destination /Volumes/storage/telonex_data \
--all-markets \
--channels book_snapshot_full onchain_fills trades
For a bounded smoke test of the all-market path, add --max-days 100; the cap
is applied after manifest resume pruning.
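Capping after resume pruning matters because capping first could spend the whole budget on days the manifest already marks done. A minimal sketch, with `plan_day_jobs` as a hypothetical helper.

```python
from datetime import date
from typing import Iterable, Optional


def plan_day_jobs(
    days: Iterable[date], done: set[date], max_days: Optional[int] = None
) -> list[date]:
    """Drop days already completed or empty in the manifest, then apply the cap."""
    pending = [d for d in days if d not in done]
    return pending if max_days is None else pending[:max_days]
```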
book_snapshot_full is the canonical book source. onchain_fills is the
preferred execution-tick source for Telonex book replay, and trades fills in
days where onchain-fill parquet is absent or empty. Do not download
book_snapshot_5 and book_snapshot_25 unless you intentionally want the
shallow vendor files too; they duplicate the same book-state family at lower
depth.
Downloader behavior:
- Default destination is /Volumes/storage/telonex_data.
- Default channel is book_snapshot_full.
- Default --workers is 16.
- --max-days caps post-resume day jobs for smoke tests.
- Runner API day loading uses TELONEX_API_WORKERS, default 32; the broader Telonex prefetch planner uses TELONEX_PREFETCH_WORKERS, default 128.
- --parse-workers or TELONEX_PARSE_WORKERS controls the bounded Arrow decode pool.
- --writer-queue-items or TELONEX_WRITER_QUEUE_ITEMS bounds parsed day results waiting for the writer. Default: 128.
- --pending-commit-items or TELONEX_PENDING_COMMIT_ITEMS bounds completed day results held before manifest commit. Default: 128.
- Transient 408, 425, 429, and 5xx responses retry with exponential backoff.
- Completed days and empty days are tracked in telonex.duckdb for crash-safe resume.
- The writer queue and pending-commit list are bounded, and an hourly forced writer drain closes open Parquet part writers, commits their manifest rows, releases Arrow memory, and prints RSS/open-part diagnostics. Raising the queue limits can improve throughput on high-RAM machines; the bounds still prevent unbounded growth.
- Hit Ctrl-C once to stop gracefully; in-flight work drains and the manifest is flushed before exit.
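The retry rule can be sketched as a loop that retries only the listed transient statuses, doubling the delay each time. The attempt count and base delay are illustrative defaults, not the downloader's settings, and `fetch_with_retry` is hypothetical; the sleep function is injectable so the schedule is easy to inspect.

```python
import time
from typing import Callable

RETRYABLE = {408, 425, 429}


def is_retryable(status: int) -> bool:
    """408, 425, 429, and any 5xx are treated as transient."""
    return status in RETRYABLE or 500 <= status <= 599


def fetch_with_retry(
    request: Callable[[], tuple[int, bytes]],
    attempts: int = 5,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> tuple[int, bytes]:
    """Call request() until it succeeds, fails permanently, or attempts run out."""
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    for attempt in range(attempts):
        status, body = request()
        if not is_retryable(status) or attempt == attempts - 1:
            return status, body
        sleep(base_delay * (2 ** attempt))  # 0.5s, 1s, 2s, ...
    raise AssertionError("unreachable")
```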
What Is Not Plug-And-Play Yet
- Arbitrary third-party vendor raw formats.
- Automatic normalization from another vendor into PMXT raw archive hours.
- Public Kalshi backtests. Kalshi fee-model, instrument-provider, trade,
candlestick, and research helper components exist, but the current framework
has no built-in Kalshi replay adapter or public backtests/ runner because we
do not yet have Kalshi L2 historical book data.
- Limitless.exchange and Opinion.trade adapters. They are planned exchange
expansion targets once the Polymarket PMXT/Telonex loading path has proven stable.
- True L3/MBO priority reconstruction from public Polymarket L2 data.
If you have custom global raw dumps, the safe paths are:
- If they already match the PMXT raw archive shape, point local:/... at them.
- Otherwise, normalize them outside this repo into the PMXT raw schema.
- Or add a new vendor adapter that directly emits OrderBookDeltas.