This page is intentionally strict about what is supported today.
PMXT
The repository direction is raw-first:
- mirror raw PMXT archive hours onto local disk when you want local-first replay
- point runners at those raws directly
- use the public PMXT archive URLs when local raw files are missing
Runner Source Modes
The preferred PMXT quote-tick path is runner-side source selection through
MarketDataConfig(..., sources=...). Public runners pin those source values
directly in code so the file is self-contained and directly runnable.
Example:
DATA = MarketDataConfig(
platform=Polymarket,
data_type=QuoteTick,
vendor=PMXT,
sources=(
"local:/data/pmxt/raw",
"archive:r2v2.pmxt.dev",
"archive:r2.pmxt.dev",
),
)
With PMXT, the active public contract is:
- local filtered cache
- each explicit raw source in the order you list it
DATA.sources is intentionally strict here: use only local: and archive:.
Unprefixed hosts, paths, and legacy alias prefixes are rejected.
The vendored Nautilus PMXT loader still exposes lower-level env switches for custom integrations. In this repository's v3 setup, the supported remote path is direct raw archive parquet.
Lower-Level Loader Env Vars
The public runner layer is pinned in code, but the underlying loader env vars still work for custom integrations:
PMXT_LOCAL_ARCHIVE_DIRPMXT_RAW_ROOTPMXT_REMOTE_BASE_URL(comma-separate multiple archives, e.g.https://r2v2.pmxt.dev,https://r2.pmxt.dev)PMXT_CACHE_DIRPMXT_DISABLE_CACHE
What Works Today
The public PMXT runner layer reads one market/token/hour from these places:
- local filtered cache
- each explicit raw source in the order you list it in
DATA.sources
The current "bring your own data" story is therefore:
- set
DATA.sourcesin your runner to("local:/path/to/raw-hours", "archive:r2v2.pmxt.dev", "archive:r2.pmxt.dev") - or point
PMXT_LOCAL_ARCHIVE_DIR/PMXT_RAW_ROOTat a directory of raw PMXT hour files you already mirrored locally
When the runner falls back to a remote raw source, it downloads that hour to a temporary local parquet file, filters it locally, and deletes the temp artifact afterward. Persistent raw disk growth only happens when you intentionally configure a local raw mirror.
If you want local-only PMXT replays, set PMXT_LOCAL_ARCHIVE_DIR or
PMXT_RAW_ROOT to your raw-hour directory and leave remote archive sources out
of DATA.sources.
The loader still does not expose a first-class runner flag for arbitrary vendor raw dumps or automatic normalization from other vendors.
To mirror raw archive hours locally for this repo's runners, use:
make download-pmxt-raws DESTINATION=/path/to/pmxt_raws
The downloader walks direct hourly filenames from 2026-02-21T16:00:00Z
through the current floored UTC hour newest-first, probes r2v2.pmxt.dev and
r2.pmxt.dev, and keeps the larger archive object when both exist for the same
hour. It reports the direct-hour count, hours missing from all configured
sources, and requested hours still missing locally. Existing local files are
refreshed when they are empty or when an upstream source advertises a larger
object. It also prints per-hour completion lines plus the active transfer.
Example output:
PMXT raw source: direct hour probes (archive best-of https://r2v2.pmxt.dev, https://r2.pmxt.dev)
Downloading PMXT raw hours to /path/to/pmxt_raws (requested_hours=3, window_start=2026-02-27T11, window_end=2026-02-27T13)...
2026-02-27T13 12.431s 445.9 MiB archive
2026-02-27T12 0.000s existing skip
Downloading raw hours (2/3 done, 1 active): 67%|████████████████████████████████████████████████████████████▏ | [00:41<00:20]active: archive 2026-02-27T11 392.0/445.9 MiB 14.8s
Those values vary with the direct-hour window and whatever hour is currently in flight.
Supported Local File Layout
The loader-managed filtered cache still lives at:
~/.cache/nautilus_trader/pmxt
You can override it with:
PMXT_CACHE_DIR=/custom/path
Or disable it with:
PMXT_CACHE_DIR=0
PMXT_DISABLE_CACHE=1
For local raw PMXT archive hours, the loader accepts either of these layouts:
<raw_root>/polymarket_orderbook_YYYY-MM-DDTHH.parquet
<raw_root>/YYYY/MM/DD/polymarket_orderbook_YYYY-MM-DDTHH.parquet
Enable that source with low-level env vars:
PMXT_LOCAL_ARCHIVE_DIR=/custom/raw-hours
The lower-level loader raw-local mode expects the archive-style layout:
/data/pmxt/raw/YYYY/MM/DD/polymarket_orderbook_YYYY-MM-DDTHH.parquet
Enable that mode with:
PMXT_DATA_SOURCE=raw-local
PMXT_LOCAL_RAWS_DIR=/data/pmxt/raw
Or pin it directly in a runner:
sources=("local:/data/pmxt/raw",)
Required Parquet Columns
Local raw PMXT archive parquet must contain:
market_idupdate_typedata
The loader filters raw hours to market_id at parquet scan time, then filters
the remaining rows to token_id inside the JSON payload.
Required JSON Payload Shape
For book_snapshot, the loader decodes data with these fields:
{
"update_type": "book_snapshot",
"market_id": "0x...",
"token_id": "123...",
"side": "buy",
"best_bid": "0.45",
"best_ask": "0.47",
"timestamp": 1710000000.123,
"bids": [["0.45", "100.0"]],
"asks": [["0.47", "120.0"]]
}
For price_change, the loader decodes data with these fields:
{
"update_type": "price_change",
"market_id": "0x...",
"token_id": "123...",
"side": "buy",
"best_bid": "0.45",
"best_ask": "0.47",
"timestamp": 1710000001.456,
"change_price": "0.46",
"change_size": "25.0",
"change_side": "buy"
}
The loader filters to token_id by regex-matching inside the data JSON, so
that field must be present and string-encoded exactly as expected.
Telonex
Telonex is a separate Polymarket quote-tick vendor path. It does not use PMXT hourly raw files or the PMXT filtered cache.
Telonex source syntax is also explicit:
local:/path/to/telonexreads already-downloaded Telonex Parquet files from diskapi:uses the defaulthttps://api.telonex.iodownload endpointapi:https://host.examplepoints the adapter at a custom compatible base URL
The API path reads the key from TELONEX_API_KEY. Do not put API keys in
DATA.sources, notebooks, docs, or committed files. Telonex free trials count
each daily Parquet download, so warm local files first when you are experimenting
and use api: only for intentional downloads.
Recommended local layout:
/path/to/telonex/
polymarket/
market-slug/
0/
quotes.parquet
trades.parquet
book_snapshot_5.parquet
book_snapshot_25.parquet
book_snapshot_full.parquet
onchain_fills.parquet
1/
quotes.parquet
The downloader consolidates each (market, outcome, channel) group by default
so a full mirror does not create one tiny file per market/outcome/day. With
--no-consolidate, daily files are kept under
polymarket/<market_slug>/<outcome>/<channel>/<YYYY-MM-DD>.parquet. The loader
also accepts the earlier channel-first daily layout and a few flat filename
fallbacks for test fixtures and ad hoc downloads.
Download Local Telonex Files
Use the local downloader to warm the same directory the public Telonex runner uses by default:
TELONEX_API_KEY=... make download-telonex-data TELONEX_DOWNLOAD_FLAGS='\
--market-slug us-recession-by-end-of-2026 \
--outcome-id 0 \
--start-date 2026-01-19 \
--end-date 2026-02-01'
To mirror every Telonex Polymarket market and every channel into Hive-partitioned Parquet (with a DuckDB manifest for resumability), run:
uv run python scripts/telonex_download_data.py \
--destination /Volumes/LaCie/telonex_data \
--all-markets \
--channels quotes trades book_snapshot_5 book_snapshot_25 book_snapshot_full onchain_fills
The default --workers 128 is the in-flight coroutine ceiling in the
async httpx pool. On a bandwidth-heavy VPS push this to 256–512
(each slot is a coroutine + socket, not an OS thread); transient
408/425/429/5xx responses retry with exponential backoff, and the
single-thread Parquet writer is the usual practical bottleneck. Hit
Ctrl-C once to stop gracefully; the same command resumes.
The default destination is /Volumes/LaCie/telonex_data. Override it with
TELONEX_DATA_DESTINATION=/path/to/telonex_data or call the script directly:
uv run python scripts/telonex_download_data.py \
--destination /Volumes/LaCie/telonex_data \
--market-slug us-recession-by-end-of-2026 \
--outcome-id 0 \
--start-date 2026-01-19 \
--end-date 2026-02-01
The script reads the API key only from TELONEX_API_KEY and writes everything
into <destination>/telonex.duckdb — one file per destination, with one
table per channel (quotes_data, trades_data, …) plus completed_days /
empty_days manifest tables for crash-safe resume. The
local:/Volumes/LaCie/telonex_data source reads back through the same blob.
For --all-markets, progress is visible in three phases: loading the markets
dataset, planning concrete day-file downloads from each market availability
window, and downloading straight into the blob. The process is resumable:
Ctrl-C once to stop gracefully, then re-run the same command to skip
everything already recorded and continue. Transient HTTP failures
(408/425/429/5xx) and connection errors are retried with exponential backoff.
What Is Not Plug-And-Play Yet
- arbitrary third-party vendor raw formats
- automatic normalization from another vendor into PMXT raw archive hours
If you have your own global raw dumps today, the safe path is:
- if they are already PMXT raw archive hours, point
PMXT_LOCAL_ARCHIVE_DIRat them directly - otherwise normalize them into the PMXT raw archive shape outside this repo
- or add a new vendor adapter that knows how to read them directly
That keeps the strategy and runner layer unchanged.