April 28, 2026 · 14 min read

Polymarket Weather Strategy: How to Find and Keep an Edge

Most Polymarket weather strategies fail not because the underlying logic is wrong but because the edge was never validated in the first place. The trader had a theory — “ECMWF is usually right” or “tails are underpriced” — placed bets based on it, had a few winning weeks, and concluded they had an edge. Then the losses started.

What “Edge” Actually Means

In a prediction market context, edge is the expected profit per dollar risked on a given type of trade. It's not whether you win more often than you lose — a trade that wins 85% of the time can still be negative expected value if you're systematically overpaying for the 85% outcomes.

Edge is always relative to price:

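If you believe the probability of a bucket is 55% and it's priced at 45¢, you have +10 percentage points of edge: an expected profit of 10¢ per share.
If you believe the probability is 55% and it's priced at 58¢, you have −3 points of edge: an expected loss of 3¢ per share.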
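As a minimal sketch of that arithmetic (the function names are illustrative, and fees and spread are ignored), edge in probability points is your probability minus the price, and the expected return per dollar risked divides that difference by the price you paid:

```python
def edge_points(prob: float, price: float) -> float:
    """Edge in probability points: believed probability minus price paid (both 0-1)."""
    return prob - price


def ev_per_dollar(prob: float, price: float) -> float:
    """Expected profit per dollar risked on a YES share that pays $1 if correct.

    Win (1 - price) with probability prob, lose price otherwise:
    prob * (1 - price) - (1 - prob) * price = prob - price, then divide by the stake.
    """
    return (prob - price) / price


for price in (0.45, 0.58):
    print(f"price={price:.2f}  edge={edge_points(0.55, price):+.2f}  "
          f"EV per $1 risked={ev_per_dollar(0.55, price):+.3f}")
```

The same 10 points of edge is worth more per dollar on a cheap share than on an expensive one, which is why both numbers are worth tracking.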

The practical goal of weather strategy development is to identify systematic situations where your probability estimate exceeds the market price by a meaningful margin — not occasionally, but repeatedly and reliably enough to profit after accounting for variance.

The Four Structural Edges in Weather Markets

Before building a specific strategy, understand where the category-level edges live. There are four well-documented structural sources of edge in Polymarket daily temperature markets.

1. The Model-Update Lag

Numerical weather prediction (NWP) models update on predictable schedules. In an efficient market, each new run that materially shifts the forecast would immediately reprice the relevant Polymarket bucket. It doesn't, because Polymarket prices only change when traders act. The gap between a new model run being published and the market fully reflecting it is the most widely documented edge in weather trading.

The size of this window depends on city volume and bot saturation. In 2024, the window on major markets (NYC, London) was 30–60 minutes. As of 2026, competitive bots have compressed it to 5–15 minutes on the most active markets. On secondary markets (Buenos Aires, Cape Town, Atlanta), the window can persist for hours.

How to exploit it: Set up alerts for the publication times of the model runs most relevant to your target cities (the ECMWF 12 UTC run is the highest-impact for European markets; HRRR's hourly cycle is highest-impact for US final-day markets). Check the new run's output against the current market price immediately. If the new run moves your implied bucket probability far enough from the market price to clear your minimum edge threshold, enter.

The risk: Not every model shift is correct. A single run that aggressively moves warm or cold often partially reverses in the next run. Waiting for two consecutive runs to confirm the same direction before entering reduces false positives.
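The two-run confirmation rule can be sketched as follows. This is one possible reading of it, not a standard implementation: `baseline` (the forecast the market was last priced against), `runs`, and `min_shift` are hypothetical inputs you would supply from your own data feed.

```python
def confirmed_shift(baseline: float, runs: list[float], min_shift: float) -> int:
    """Return +1 (confirmed warmer), -1 (confirmed cooler) or 0 (no confirmed shift).

    baseline:  forecast daily high the market was last priced against.
    runs:      forecast daily highs from successive model runs, oldest first.
    min_shift: smallest move from baseline, in degrees, worth acting on.
    """
    if len(runs) < 2:
        return 0  # a single run is never treated as confirmation
    prev_delta = runs[-2] - baseline
    last_delta = runs[-1] - baseline
    if prev_delta >= min_shift and last_delta >= min_shift:
        return 1
    if prev_delta <= -min_shift and last_delta <= -min_shift:
        return -1
    return 0


print(confirmed_shift(30.0, [31.5, 31.2], min_shift=1.0))  # both runs warmer
print(confirmed_shift(30.0, [31.5, 30.2], min_shift=1.0))  # second run reversed
```

The trade-off is speed: on the most competitive markets, waiting for a second run may mean the lag window has already closed.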

2. The Airport-vs.-City Discrepancy

Every Polymarket temperature market resolves at a specific airport station. Most retail traders use city-center weather readings as their reference. The gap between city center and airport temperature is not random noise — it has a predictable sign and magnitude based on local geography.

Examples:

LaGuardia (KLGA) runs 3–5°F cooler than Midtown Manhattan in summer sea-breeze events.
Paris Le Bourget (LFPB) runs 2–3°C cooler than the Paris urban core in summer anticyclonic conditions.
Hong Kong International (VHHH), built on reclaimed land offshore, runs slightly cooler than Kowloon during stagnant heat events.
Burbank (KBUR) runs warmer than coastal LAX when offshore flow suppresses the marine layer at the coast but not inland.

These discrepancies are predictable from synoptic analysis. When the pattern is clear (e.g., confirmed sea-breeze day in NYC, confirmed heat dome with light winds in Paris), the retail-vs.-airport gap is a reliable edge component.

How to exploit it: Maintain a dataset of historical pairs: the Weather Underground (wunderground) finalized daily high at the resolution station, alongside the city-center reading from a standard weather service. Compute the average delta by season and atmospheric regime type. When trading, apply that delta whenever market prices suggest retail is anchoring to the wrong reading.
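A minimal sketch of the delta table, assuming you have already collected the paired readings. The records below are placeholder values to show the shape of the data, not measurements, and the regime labels are hypothetical names you would define yourself.

```python
from collections import defaultdict
from statistics import mean

# Placeholder records: (season, regime, station_high_f, city_center_high_f).
records = [
    ("summer", "sea_breeze", 84.0, 88.0),
    ("summer", "sea_breeze", 85.0, 90.0),
    ("summer", "offshore_flow", 91.0, 91.5),
    ("winter", "post_frontal", 38.0, 38.5),
]


def mean_delta_by_regime(rows):
    """Average (station - city center) daily high for each (season, regime) pair."""
    groups = defaultdict(list)
    for season, regime, station_high, city_high in rows:
        groups[(season, regime)].append(station_high - city_high)
    return {key: mean(deltas) for key, deltas in groups.items()}


deltas = mean_delta_by_regime(records)
for (season, regime), delta in sorted(deltas.items()):
    print(f"{season:<7} {regime:<14} {delta:+.1f}°F")
```

With real data you would also want the count per group, since an average built on a handful of days is not a reliable correction.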

3. The Calibration Edge (Better Probability Estimation)

Even with the same forecast data, two traders can arrive at very different probability estimates for a given bucket. One might just look at the point forecast and bet on the single most-likely bucket. Another might properly model the full distribution of outcomes with realistic uncertainty. The latter will consistently have better-calibrated probabilities and therefore make better trades.

This is the deepest and most durable edge — it doesn't require speed (unlike the model-update lag) and it doesn't require special data (unlike private models). It just requires doing the probability math correctly.

What correct probability estimation looks like:

Using ensemble spread, not just the deterministic forecast mean, to set the width of your bucket probability distribution.
Applying station-level bias corrections based on historical model vs. observation residuals at each resolution station.
Accounting for underdispersion in raw ensembles by inflating the spread using historical calibration data.
Using the correct distributional form — Normal works fine for most situations but fails in tail events and convective regimes.

A bot with properly calibrated probabilities will have a Brier score (mean squared probability error) noticeably lower than one using raw model output. That Brier improvement is the calibration edge, and it compounds into real money across hundreds of markets.
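To make the four ingredients concrete, here is a minimal sketch that turns ensemble members into a bucket probability. It assumes a Normal distribution, which the list above notes can fail in tail events and convective regimes; `bias` and `inflation` are hypothetical parameters you would estimate from your own historical residuals.

```python
import math
from statistics import mean, stdev


def normal_cdf(x: float, mu: float, sigma: float) -> float:
    """Cumulative distribution function of Normal(mu, sigma)."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def bucket_probability(members, low, high, bias=0.0, inflation=1.0):
    """P(low <= daily high < high) under a Normal fitted to the ensemble.

    bias:      mean (model - observed) residual at the station, subtracted from the mean.
    inflation: multiplier (>= 1) on the raw spread to offset ensemble underdispersion.
    """
    mu = mean(members) - bias
    sigma = stdev(members) * inflation
    if sigma <= 0:
        raise ValueError("ensemble spread must be positive")
    return normal_cdf(high, mu, sigma) - normal_cdf(low, mu, sigma)


members = [88.0, 90.0, 92.0]  # toy ensemble; a real one has dozens of members
raw = bucket_probability(members, 88.0, 92.0)
inflated = bucket_probability(members, 88.0, 92.0, inflation=2.0)
print(f"raw spread: {raw:.3f}   inflated spread: {inflated:.3f}")
```

Inflating the spread moves probability out of the central bucket and into its neighbors, which is exactly the adjustment a point-forecast trader never makes.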

4. The Behavioral Edge (Exploiting Retail Biases)

Retail prediction market participants show well-documented behavioral biases that systematic traders can exploit. In temperature markets specifically:

Recency bias: Yesterday's temperature anchors tomorrow's guess. If NYC was 88°F today, retail pushes up the 86–90°F bucket for tomorrow regardless of what the models say. After sharp temperature transitions (front passage, onset of cool air mass), retail lags the actual new state of the atmosphere.

Reverse favorite-longshot bias: Low-probability buckets ($0.05–$0.15) are sometimes underpriced relative to their true probability, and high-probability buckets ($0.60–$0.80) are sometimes slightly overpriced. This is the opposite of the classic favorite-longshot bias, in which longshots are overpriced, so don't assume it holds: whether the pattern appears on a given day and city requires checking against your calibrated model.

News-driven herding: A heat-wave story on CNN drives retail into the top temperature buckets even after the models have moderated the forecast. The retail buying peak often arrives 12–24 hours after the models have adjusted.

Building a Strategy: The Process

Step 1: Choose Your Edge Type and Time Horizon

Every good weather strategy starts by committing to a specific edge type and operating window. Trying to do everything (latency arb + calibration + behavioral) simultaneously without a clear priority produces a confused, hard-to-diagnose system.

For a first strategy, pick one:

Calibration-based, 24–48h lead time — Your edge is better probability estimation than the market. You trade multiple times per day when edge is high, regardless of city.
Model-update latency, final-day — Your edge is speed at model release windows. You need automation and a reliable data feed from the model run.
Behavioral, post-weather-event — Your edge is fading the retail overreaction after a notable weather event. You enter manually, selecting high-confidence setups.

Step 2: Define Your Signal

Write down, in advance, exactly what has to be true for you to enter a trade. Not “the model looks good for this bucket” — that's not specific enough. Instead:

“I buy YES on bucket B if: (a) my ensemble-implied probability for bucket B is at least 8 percentage points above the current YES ask price, (b) the lead time is ≤ 48 hours, (c) the ECMWF and GFS ensemble means agree within 1.5°C on the city's daily high, and (d) the market has ≥ $10,000 in total volume.”

Every condition serves a purpose. Condition (a) is the edge threshold. (b) restricts to a lead time where your calibration is validated. (c) filters out high-uncertainty multi-model-disagreement setups where your edge is smaller. (d) ensures enough liquidity to enter at a reasonable price.
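Writing the rule as code forces the precision that a written rule only promises. A minimal sketch, where the `Setup` fields are hypothetical names for inputs you would compute elsewhere:

```python
from dataclasses import dataclass


@dataclass
class Setup:
    model_prob: float    # ensemble-implied probability for the bucket (0-1)
    yes_ask: float       # current YES ask price in dollars (0-1)
    lead_hours: float    # hours until the market resolves
    ecmwf_mean_c: float  # ECMWF ensemble-mean daily high, °C
    gfs_mean_c: float    # GFS ensemble-mean daily high, °C
    volume_usd: float    # total market volume in dollars


def should_buy_yes(s: Setup, min_edge: float = 0.08, max_lead: float = 48.0,
                   max_gap_c: float = 1.5, min_volume: float = 10_000.0) -> bool:
    """True only when all four entry conditions (a)-(d) hold."""
    return (
        s.model_prob - s.yes_ask >= min_edge                        # (a) edge threshold
        and s.lead_hours <= max_lead                                # (b) validated lead time
        and abs(s.ecmwf_mean_c - s.gfs_mean_c) <= max_gap_c         # (c) model agreement
        and s.volume_usd >= min_volume                              # (d) liquidity
    )


print(should_buy_yes(Setup(0.55, 0.45, 36.0, 30.2, 31.0, 25_000.0)))
```

Keeping the thresholds as parameters makes it easy to backtest the same signal at different settings without rewriting the rule.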

Step 3: Backtest Against Historical Data

Before risking money, test your signal against the historical record. You need two datasets:

Historical Polymarket weather market prices at various timestamps before resolution
Historical finalized station temperatures (from wunderground or NOAA NCEI) for each resolution airport

With these, you can reconstruct what your signal would have said on past dates and compare it to the eventual outcome.

What to measure:

Win rate — What fraction of your trades resolved correctly.
Brier score — For your probability estimates vs. actual outcomes.
EV per trade — Average (outcome − entry price) across all trades.
Sharpe ratio — EV per unit of standard deviation (measures risk-adjusted edge).
Drawdown — Longest streak of consecutive losses, maximum cumulative loss.
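These metrics can be computed in a few lines once the backtest has produced a trade list. The sketch below assumes every trade is a YES buy held to resolution, one share per trade, with fees and spread ignored; the Sharpe ratio here is per trade, not annualized, and the sample trades are placeholders.

```python
from statistics import mean, stdev


def backtest_metrics(trades):
    """trades: list of (model_prob, entry_price, outcome), outcome 1 (YES) or 0 (NO)."""
    pnl = [outcome - price for _, price, outcome in trades]
    cumulative = peak = max_drawdown = 0.0
    streak = longest_streak = 0
    for x in pnl:
        cumulative += x
        peak = max(peak, cumulative)
        # Drawdown: fall from the highest cumulative P&L seen so far.
        max_drawdown = max(max_drawdown, peak - cumulative)
        # Streak: consecutive losing trades, reset by any non-loss.
        streak = streak + 1 if x < 0 else 0
        longest_streak = max(longest_streak, streak)
    return {
        "win_rate": mean(outcome for _, _, outcome in trades),
        "brier": mean((prob - outcome) ** 2 for prob, _, outcome in trades),
        "ev_per_trade": mean(pnl),
        "sharpe": mean(pnl) / stdev(pnl),  # needs >= 2 trades with varying P&L
        "max_drawdown": max_drawdown,
        "longest_losing_streak": longest_streak,
    }


sample = [(0.6, 0.5, 1), (0.6, 0.5, 0), (0.7, 0.5, 1), (0.7, 0.5, 1)]
metrics = backtest_metrics(sample)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```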

If the backtest shows no edge after transaction costs, the strategy is not viable. If it shows strong edge in the backtest, your job is to explain why that edge exists and why it should persist forward. If you can't articulate the mechanism, you may be overfitting.

Step 4: Paper-Trade to Validate

Before deploying real capital, run the strategy in simulation mode for 30–60 days. Keep a log of every signal generated, every paper-trade placed, and every outcome. Compare paper-trade performance to backtest expectations.

If paper-trade performance is significantly weaker than backtest (a common outcome), the backtest was likely overfitting to historical quirks. Investigate: are you getting similar fill prices? Are you trading similar market conditions? Is there something about live market microstructure that the backtest didn't capture?

Only proceed to live trading when paper-trade performance is consistent with a plausible backtest over at least 100 paper-trade positions.

Step 5: Start Small and Scale Slowly

Begin live trading at 10–20% of your target position sizes. Keep meticulous records of every live trade: entry price, exit price (or resolution outcome), edge estimated at entry, actual outcome, P&L.

After 200–300 live trades, you have enough data to compute your live Brier score and compare it to paper-trade and backtest. If the live Brier is meaningfully worse, something has changed: market conditions, competition level, your data feed, or your model calibration. Identify and fix it before scaling up.

The Minimum Edge Threshold: Why 8%?

The commonly cited minimum edge threshold for weather trades is 8%. Here's why that number makes sense:

Spread cost: A typical liquid weather market has a 2–4¢ spread. If you're paying the ask, you start 1–2¢ (half the spread) behind the midpoint, or 1–2% of the contract's $1 payout. That has to be overcome.
Calibration uncertainty: Even a well-designed forecast model has uncertainty in its own probability estimate. For a 51-member ensemble, the sampling error on a mid-distribution probability (say 40%) is approximately ±7 percentage points at one standard error, and roughly ±13 points at the 95% confidence level. If your edge looks like 4%, it might just be sampling noise in your 51-member ensemble.
Competition: Other bots are running similar models. If the edge is 3%, a faster bot already took it. The 8% threshold roughly filters for situations where the market is meaningfully wrong, not just marginally stale.
Transaction costs: Beyond the spread, there's Polygon gas (trivial at current prices) and any third-party service fees if using a wrapper bot.

A 5% threshold is workable if your calibration is genuinely excellent (validated Brier score ≤ 0.15). An 8% threshold is appropriate for a new strategy not yet validated at high confidence.
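The sampling-noise component is easy to check yourself. This sketch uses the binomial standard error and treats the ensemble members as independent draws, which is a simplifying assumption; real members are correlated, so the true uncertainty is, if anything, larger.

```python
import math


def ensemble_standard_error(prob: float, members: int = 51) -> float:
    """Binomial standard error of a probability estimated by counting ensemble members."""
    return math.sqrt(prob * (1.0 - prob) / members)


se = ensemble_standard_error(0.40)
print(f"1 standard error: ±{se * 100:.1f} points")
print(f"95% interval:     ±{1.96 * se * 100:.1f} points")
```

Running it for probabilities near 5% or 95% shows the error shrinking, which is one reason a fixed threshold is more conservative in the tails than in the middle of the distribution.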

How Edge Decays (And What to Do About It)

Edge in a given weather trading approach tends to decay over time as more participants discover and exploit the same signal. Monitoring edge decay is as important as finding the edge in the first place.

Signs of edge decay:

Live Brier score starts rising relative to backtest/paper-trade baseline.
Win rate drops while market structure (spreads, volume) remains unchanged.
Your model-update lag window closes faster than it used to.
The behavioral patterns you relied on (recency bias on certain cities) stop appearing in price action as other bots start fading them.

What to do:

Add model diversity. If competitors are all using GFS, add JMA or CMA. If they're all using 24h lead times, improve your 36–72h calibration.
Move to less competitive markets. Edge often persists longer on secondary cities (those with lower volume and fewer bot traders).
Improve execution. If your edge is decaying because competitors are faster, address the infrastructure: VPS proximity to Polymarket's servers, WebSocket feeds vs. REST polling, pre-staged signed orders.
Find new behavioral patterns. Retail biases evolve; new ones emerge. The marine-layer play in LA was less known in 2024 than today.

The Portfolio View: Why Single-City Strategies Don't Scale

A profitable weather trading operation needs to think in portfolio terms, not individual-trade terms. The variance on any single binary bet is enormous — even a 65% probability trade loses 35% of the time. Profitability shows up statistically only over many hundreds of independent trades.

This is why top Polymarket weather traders operate across 10–30+ city/date combinations simultaneously. The diversification doesn't eliminate variance (temperature markets have positive correlations — if Europe has a heat wave, London and Paris move together), but it reduces the per-day variance enough to make the strategy's expectation show up in actual results.

A practical rule of thumb: you need a minimum of 200 independent trades to have statistical confidence that an observed 8-percentage-point win-rate advantage is real rather than noise. At 5–10 trades per day across active markets, that's 3–6 weeks of continuous trading, which is why paper-trading for a month before going live isn't excessive caution; it's the minimum required sample size.
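A quick way to sanity-check that rule of thumb is a z-score on the win rate. The sketch assumes independent trades and a 50% baseline win rate, both simplifications; correlated city/date positions reduce the effective sample size.

```python
import math


def win_rate_z(observed: float, baseline: float, n_trades: int) -> float:
    """z-score of an observed win rate against a baseline, assuming independent trades."""
    se = math.sqrt(baseline * (1.0 - baseline) / n_trades)
    return (observed - baseline) / se


# How many standard errors is a 58% win rate above a 50% baseline?
for n in (50, 100, 200, 400):
    print(f"{n:>3} trades: z = {win_rate_z(0.58, 0.50, n):.2f}")
```

Under these assumptions the advantage only clears the conventional two-standard-error bar at around 200 trades; at 50 or 100 it is indistinguishable from luck.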

When to Stop Trading

Disciplined exits are as important as disciplined entries. Define in advance the conditions under which you'll halt the strategy:

Per-day loss limit: If the day's P&L reaches −5% of bankroll, stop for the day. Review what went wrong before resuming.
Rolling Brier threshold: If your rolling 30-day Brier exceeds 0.30 (worse than the 0.25 you would score by always forecasting 50%, so calibration has broken down), stop trading and diagnose.
Consecutive-loss circuit breaker: If you've lost 8 consecutive trades on the same city/lead-time combination, assume your signal has failed for that context and pause it specifically.
Maximum drawdown: Define a total bankroll drawdown level (e.g., 30%) at which you stop, step back, and reassess whether the fundamental edge still exists.
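Stop rules only work if they are checked mechanically. A minimal sketch of the four rules above as a single check, with the thresholds from the list as defaults; the function and argument names are illustrative, and you would feed it from your own trade log.

```python
def halt_reasons(day_pnl_pct, rolling_brier, consecutive_losses, drawdown_pct,
                 day_limit_pct=-5.0, brier_limit=0.30,
                 loss_streak_limit=8, drawdown_limit_pct=30.0):
    """Return the stop rules currently triggered (an empty list means keep trading).

    day_pnl_pct:        today's P&L as a percent of bankroll (negative = loss).
    rolling_brier:      Brier score over the trailing 30 days.
    consecutive_losses: current losing streak for one city/lead-time combination.
    drawdown_pct:       fall from peak bankroll, as a positive percent.
    """
    reasons = []
    if day_pnl_pct <= day_limit_pct:
        reasons.append("daily loss limit: stop for the day")
    if rolling_brier > brier_limit:
        reasons.append("rolling Brier: stop and diagnose")
    if consecutive_losses >= loss_streak_limit:
        reasons.append("loss streak: pause this city/lead-time")
    if drawdown_pct >= drawdown_limit_pct:
        reasons.append("max drawdown: stop and reassess")
    return reasons


print(halt_reasons(day_pnl_pct=-5.5, rolling_brier=0.22,
                   consecutive_losses=3, drawdown_pct=12.0))
```

Run the check before every order, not at the end of the day, so a triggered rule actually prevents the next trade.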

The traders on the Polymarket weather leaderboard who have sustained profits for 12+ months are not the ones who had the single best model. They're the ones who managed risk tightly enough that bad periods didn't end the operation before the good periods had time to produce results.
