This page is intentionally strict about what is supported today.
PMXT
The repository direction is raw-first:
- mirror raw PMXT archive hours onto local disk when you want local-first replay
- point runners at those raws directly
- treat shared servers as raw mirrors only
Mirror deployment and archived self-hosted relay guidance live on the separate Mirror And Relay Ops page so the vendor docs can stay focused on source order, raw layout, and loader expectations.
Runner Source Modes
The preferred PMXT quote-tick path is runner-side source selection through
MarketDataConfig(..., sources=...). Public runners pin those source values
directly in code so the file is self-contained and directly runnable.
Example:
DATA = MarketDataConfig(
platform=Polymarket,
data_type=QuoteTick,
vendor=PMXT,
sources=(
"local:/data/pmxt/raw",
"archive:r2.pmxt.dev",
"relay:mirror.example.com",
),
)
With PMXT, the active public contract is this read order:
- local filtered cache
- each explicit raw source in the order you list it
DATA.sources is intentionally strict here: use only local:,
archive:, and relay:. Unprefixed hosts, paths, and legacy alias prefixes
are rejected.
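The prefix rule can be sketched as a small validator. This is an illustration only, not the loader's actual code: `parse_source` and `ALLOWED_PREFIXES` are hypothetical names, and the real loader may normalize or validate targets differently.

```python
ALLOWED_PREFIXES = ("local", "archive", "relay")  # the three prefixes named above


def parse_source(value: str) -> tuple[str, str]:
    """Split 'kind:target' and reject anything outside the allowed prefixes."""
    kind, sep, target = value.partition(":")
    # No colon, an unknown prefix, or an empty target are all rejected.
    if not sep or kind not in ALLOWED_PREFIXES or not target:
        raise ValueError(f"unsupported PMXT source: {value!r}")
    return kind, target
```

Under this sketch, a bare host such as `r2.pmxt.dev` or a bare path such as `/data/pmxt/raw` raises `ValueError`, while `local:/data/pmxt/raw` parses into its kind and target.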
The vendored Nautilus PMXT loader still exposes lower-level env switches for custom integrations. In this repository's current mirror-first setup, the supported shared-server path is raw parquet serving, not relay-hosted filtered parquet.
Lower-Level Loader Env Vars
The public runner layer is pinned in code, but the underlying loader env vars still work for custom integrations:
- PMXT_LOCAL_ARCHIVE_DIR
- PMXT_RAW_ROOT
- PMXT_REMOTE_BASE_URL
- PMXT_RELAY_BASE_URL
- PMXT_CACHE_DIR
- PMXT_DISABLE_CACHE
What Works Today
The public PMXT runner layer reads one market/token/hour from these places:
- local filtered cache
- each explicit raw source in the order you list it in
DATA.sources
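The read order above amounts to a first-hit fallback chain. The following is a minimal sketch under that assumption; `resolve_hour` and the `Fetcher` callables are hypothetical stand-ins, not the runner's real interfaces.

```python
from typing import Callable, Optional, Sequence

# A fetcher returns the hour's bytes, or None on a miss (hypothetical shape).
Fetcher = Callable[[str], Optional[bytes]]


def resolve_hour(
    hour: str,
    read_cache: Fetcher,
    sources: Sequence[tuple[str, Fetcher]],
) -> tuple[str, bytes]:
    """Try the filtered cache first, then each source in listed order."""
    cached = read_cache(hour)
    if cached is not None:
        return "cache", cached
    for name, fetch in sources:
        data = fetch(hour)
        if data is not None:
            return name, data  # first source that has the hour wins
    raise LookupError(f"no source had hour {hour}")
```

Later sources are only consulted when every earlier one misses, which is why the order of `DATA.sources` matters.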
The current "bring your own data" story is therefore:
- set DATA.sources in your runner to ("local:/path/to/raw-hours", "archive:r2.pmxt.dev", "relay:relay.example.com")
- or point PMXT_LOCAL_ARCHIVE_DIR / PMXT_RAW_ROOT at a directory of raw PMXT hour files you already mirrored locally
- or run your own raw mirror and point PMXT_RELAY_BASE_URL at it
When the runner falls back to a remote raw source, it downloads that hour to a temporary local parquet file, filters it locally, and deletes the temp artifact afterward. Persistent raw disk growth only happens when you intentionally configure a local raw mirror.
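The download, filter, delete lifecycle described above can be sketched with a temp file and a `finally` block. The helper name and the `download` / `process` callables are hypothetical; the real runner's transfer and filter code is not shown here.

```python
import os
import tempfile
from typing import Callable


def with_temp_raw_hour(
    download: Callable[[str], None],
    process: Callable[[str], object],
) -> object:
    """Download to a temp path, process it, and always delete it afterward."""
    fd, path = tempfile.mkstemp(suffix=".parquet")
    os.close(fd)  # the downloader reopens the path itself
    try:
        download(path)
        return process(path)
    finally:
        # Cleanup runs on success and on failure, so no raw hour lingers on disk.
        if os.path.exists(path):
            os.remove(path)
```

Because cleanup sits in `finally`, a failed download or a filter error does not leave a partial raw file behind.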
The important distinction is:
- local raw mirrors and remote raw mirrors are the first-class paths
- the public runner layer only uses raw mirror relay endpoints
- the shared relay path for this repo is raw-only
If you want local-only PMXT replays, set both:
PMXT_RELAY_BASE_URL=0
PMXT_LOCAL_ARCHIVE_DIR=/path/to/pmxt-hours
The loader still does not expose a first-class runner flag for arbitrary vendor raw dumps or automatic normalization from other vendors.
To mirror raw archive hours locally for this repo's runners, use:
make download-pmxt-raws DESTINATION=/path/to/pmxt_raws
The downloader prints per-hour completion lines plus the active transfer while it walks archive hours. Example output:
PMXT raw source: explicit priority (archive https://r2.pmxt.dev -> relay https://209-209-10-83.sslip.io)
Downloading PMXT raw hours to /path/to/pmxt_raws (requested_hours=3, window_start=2026-02-27T11, window_end=2026-02-27T13)...
2026-02-27T11 12.431s 445.9 MiB archive
2026-02-27T12 0.000s existing skip
Downloading raw hours (2/3 done, 1 active): 67%|████████████████████████████████████████████████████████████▏ | [00:41<00:20]
active: relay 2026-02-27T13 392.0/445.9 MiB 14.8s
Those values vary with the archive listing and whatever hour is currently in flight.
Supported Local File Layout
The loader-managed filtered cache still lives at:
~/.cache/nautilus_trader/pmxt
You can override it with:
PMXT_CACHE_DIR=/custom/path
Or disable it with:
PMXT_CACHE_DIR=0
PMXT_DISABLE_CACHE=1
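One way to read the cache settings above is sketched below. It assumes that either switch on its own disables the cache and that an empty `PMXT_CACHE_DIR` falls back to the default; both are assumptions, and `resolve_cache_dir` is a hypothetical helper rather than the loader's function.

```python
import os
from pathlib import Path
from typing import Mapping, Optional

DEFAULT_CACHE_DIR = Path.home() / ".cache" / "nautilus_trader" / "pmxt"


def resolve_cache_dir(env: Mapping[str, str] = os.environ) -> Optional[Path]:
    """Return the cache directory, or None when caching is disabled."""
    if env.get("PMXT_DISABLE_CACHE") == "1":
        return None  # assumption: this switch alone is enough to disable
    raw = env.get("PMXT_CACHE_DIR")
    if not raw:
        return DEFAULT_CACHE_DIR  # unset or empty: use the default location
    if raw == "0":
        return None  # "0" is the documented disable value
    return Path(raw).expanduser()
```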
For local raw PMXT archive hours, the loader accepts either of these layouts:
<raw_root>/polymarket_orderbook_YYYY-MM-DDTHH.parquet
<raw_root>/YYYY/MM/DD/polymarket_orderbook_YYYY-MM-DDTHH.parquet
Enable that source with the low-level env var:
PMXT_LOCAL_ARCHIVE_DIR=/custom/raw-hours
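The two accepted layouts can be expressed as a pair of candidate paths for a given hour. This is a sketch: `candidate_paths` and `find_raw_hour` are hypothetical helpers, and checking the flat layout before the nested one is an assumption, not documented loader behavior.

```python
from datetime import datetime
from pathlib import Path
from typing import Optional


def candidate_paths(raw_root: str, hour: datetime) -> list[Path]:
    """Return the flat and date-nested candidate paths for one archive hour."""
    name = f"polymarket_orderbook_{hour:%Y-%m-%dT%H}.parquet"
    root = Path(raw_root)
    nested = root / f"{hour:%Y}" / f"{hour:%m}" / f"{hour:%d}" / name
    return [root / name, nested]


def find_raw_hour(raw_root: str, hour: datetime) -> Optional[Path]:
    """Return the first candidate that exists on disk, else None."""
    return next((p for p in candidate_paths(raw_root, hour) if p.exists()), None)
```

For example, `candidate_paths("/data/pmxt/raw", datetime(2026, 2, 27, 11))` yields one path directly under the root and one under `2026/02/27/`, both ending in `polymarket_orderbook_2026-02-27T11.parquet`.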
The lower-level loader raw-local mode expects the archive-style layout:
/data/pmxt/raw/YYYY/MM/DD/polymarket_orderbook_YYYY-MM-DDTHH.parquet
Enable that mode with:
PMXT_DATA_SOURCE=raw-local
PMXT_LOCAL_RAWS_DIR=/data/pmxt/raw
Or pin it directly in a runner:
sources=("local:/data/pmxt/raw",)
Required Parquet Columns
Local raw PMXT archive parquet must contain:
- market_id
- update_type
- data
The loader filters raw hours to market_id at parquet scan time, then filters
the remaining rows to token_id inside the JSON payload.
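The two-stage filter can be illustrated on in-memory rows. This sketch uses plain dicts instead of a parquet scan, and the exact regex is an assumption about how a string-encoded `token_id` might be matched; `filter_rows` is a hypothetical name.

```python
import re
from typing import Iterable, Iterator

REQUIRED_COLUMNS = {"market_id", "update_type", "data"}


def filter_rows(rows: Iterable[dict], market_id: str, token_id: str) -> Iterator[dict]:
    """Keep rows for one market, then rows whose JSON payload names the token."""
    # Match "token_id": "<id>" as a string-encoded field inside the JSON text.
    token_re = re.compile(r'"token_id"\s*:\s*"' + re.escape(token_id) + r'"')
    for row in rows:
        missing = REQUIRED_COLUMNS - set(row)
        if missing:
            raise KeyError(f"row is missing columns: {sorted(missing)}")
        if row["market_id"] != market_id:
            continue  # stage 1: column-level filter
        if token_re.search(row["data"]):
            yield row  # stage 2: payload-level filter
```

Anchoring the pattern on the closing quote keeps token `123` from matching a payload for token `1234`.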
Required JSON Payload Shape
For book_snapshot, the loader decodes data with these fields:
{
"update_type": "book_snapshot",
"market_id": "0x...",
"token_id": "123...",
"side": "buy",
"best_bid": "0.45",
"best_ask": "0.47",
"timestamp": 1710000000.123,
"bids": [["0.45", "100.0"]],
"asks": [["0.47", "120.0"]]
}
For price_change, the loader decodes data with these fields:
{
"update_type": "price_change",
"market_id": "0x...",
"token_id": "123...",
"side": "buy",
"best_bid": "0.45",
"best_ask": "0.47",
"timestamp": 1710000001.456,
"change_price": "0.46",
"change_size": "25.0",
"change_side": "buy"
}
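A payload of this shape can be decoded into a typed record as sketched below. The `PriceChange` dataclass and `decode_price_change` are hypothetical, and parsing the string-encoded prices with `Decimal` is a design choice of this sketch, not a statement about the loader's internal types.

```python
import json
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class PriceChange:
    token_id: str
    best_bid: Decimal
    best_ask: Decimal
    timestamp: float
    change_price: Decimal
    change_size: Decimal
    change_side: str


def decode_price_change(data: str) -> PriceChange:
    """Parse one price_change JSON payload; raise on a different update type."""
    payload = json.loads(data)
    if payload["update_type"] != "price_change":
        raise ValueError(f"unexpected update_type: {payload['update_type']!r}")
    return PriceChange(
        token_id=payload["token_id"],
        best_bid=Decimal(payload["best_bid"]),
        best_ask=Decimal(payload["best_ask"]),
        timestamp=payload["timestamp"],
        change_price=Decimal(payload["change_price"]),
        change_size=Decimal(payload["change_size"]),
        change_side=payload["change_side"],
    )
```

Using `Decimal` on the string fields avoids binary floating-point rounding of prices and sizes; a missing field surfaces as a `KeyError` rather than a silent default.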
The loader filters to token_id by regex-matching inside the data JSON, so
that field must be present and string-encoded exactly as expected.
Relay Mode
The public runner layer does not assume a repo-wide default relay host. If you
want a relay fallback, set it explicitly in DATA.sources or with the loader
env var below.
Set the relay host with:
PMXT_RELAY_BASE_URL=https://your-relay-host
Disable relay usage entirely with:
PMXT_RELAY_BASE_URL=0
Mirror-only deployments are the preferred server shape now. In that mode the
server keeps mirroring raw hours and may serve /v1/raw/..., but /v1/filtered
and filtered-hour listing endpoints are disabled or expected to miss.
What Is Not Plug-And-Play Yet
- arbitrary third-party vendor raw formats
- automatic normalization from another vendor into PMXT raw archive hours
If you have your own global raw dumps today, the safe path is:
- if they are already PMXT raw archive hours, point PMXT_LOCAL_ARCHIVE_DIR at them directly
- otherwise normalize them into the PMXT raw archive shape outside this repo
- or add a new vendor adapter that knows how to read them directly
That keeps the strategy and runner layer unchanged.