Phantom Liquidity: Why Our 0% Win Rate Was a Lie

DiveEdge Research · February 13, 2026 · 10 min read
Prediction Markets · Liquidity · Backtesting · Kalshi · Weather

Three days into our weather trading experiment, we added a new metric: filtered accuracy — measuring only contracts where our model detected a 10¢+ edge. The number that came back nearly killed the project.

0 wins. 16 losses. A 0% win rate on "high edge" contracts. Hypothetical P&L: -16¢.

That looks fatal. If your model performs worst on the trades it's most confident about, something is fundamentally broken. We almost pivoted away from weather markets entirely.

Then we looked at the actual order books.

The 1¢ Contracts Nobody Could Trade

Our model was detecting "edge" on contracts like Miami T79 and Chicago T40. The market priced these at 1¢. Our model said 25-40% probability. That's a huge edge on paper — buying something for 1¢ that should be worth 25¢.
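That paper edge is a one-line calculation, which is part of why it's so seductive. A minimal sketch (the 39% model probability is illustrative, chosen to match the +38¢ Miami figure in the table below):

```python
def naive_edge_cents(model_prob: float, ask_cents: int) -> float:
    """Model's fair value in cents, minus the displayed ask price."""
    return model_prob * 100 - ask_cents

# Miami T79: the screen shows 1¢, our model says ~39%.
print(naive_edge_cents(0.39, 1))  # 38.0: a huge "edge", on paper
```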

There was one problem: nobody was actually selling these contracts at 1¢.

When we pulled the order book data, these contracts had:

| Contract | Yes Bid | Yes Ask | Spread | Model Edge |
|-------------|---------|---------|--------|------------|
| Miami T79 | 0¢ | 1¢ | 1¢ | +38¢ |
| Chicago T40 | 0¢ | 1¢ | 1¢ | +29¢ |
| Denver T65 | 0¢ | 1¢ | 1¢ | +24¢ |
| Denver T58 | 12¢ | 19¢ | 7¢ | +15¢ |

The first three contracts were phantom liquidity — prices that existed on a screen but couldn't actually be traded. A bid of 0¢ means no one is buying. An ask of 1¢ is the minimum possible price on Kalshi, essentially a placeholder. You might get filled, but you'd be the only person in the pool.

The fourth contract — Denver T58 — had a real order book. Bids at 12-16¢, asks at 15-19¢, a 7¢ spread. This was a market with actual participants. And this contract won.

Why Phantom Liquidity Exists

Prediction markets aren't stock markets. Most contracts have very few participants. Weather markets on Kalshi are particularly thin — maybe 5-20 active traders per city per day. Many strike prices have zero natural interest.

But the exchange still displays a price. On Kalshi, every contract shows a Yes price and a No price, even when no orders exist. The minimum display price is 1¢. So a contract with zero interest and zero liquidity shows up as "1¢" in the API — right next to a contract with real depth at 15¢.

If your backtest treats both prices equally, you're mixing real and imaginary markets. And imaginary markets are always where the biggest "edges" appear — because no one is pricing them efficiently, because no one cares about them at all.
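The practical implication: pull the actual book, not just the displayed price. A sketch of that check against Kalshi's REST API; the endpoint path and JSON shape are our assumption based on the v2 trade API and should be verified against the current docs (authentication handling omitted):

```python
import requests

BASE = "https://trading-api.kalshi.com/trade-api/v2"  # verify against current docs

def has_real_yes_bids(ticker: str) -> bool:
    """True only if at least one resting Yes bid exists in the order book.

    Assumes the response looks like:
    {"orderbook": {"yes": [[price_cents, size], ...], "no": [...]}}
    """
    resp = requests.get(f"{BASE}/markets/{ticker}/orderbook", timeout=10)
    resp.raise_for_status()
    yes_bids = resp.json().get("orderbook", {}).get("yes") or []
    return len(yes_bids) > 0
```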

The Selection Bias Trap

This creates a vicious feedback loop for any model:

  1. Model scans for mispriced contracts (large gap between model probability and market price)
  2. Biggest gaps appear on illiquid contracts (because they're not really priced at all)
  3. Model allocates capital to highest "edge" — which is the most illiquid
  4. Those contracts lose at close to random rates (the model's probability was reasonable, but the "market price" was meaningless)
  5. Backtester records losses and concludes the model doesn't work

Meanwhile, the model's real picks — on liquid contracts with genuine two-sided order books — are buried in the noise. Denver T58 won, but it got counted alongside 15 phantom trades it couldn't have made anyway.

The model wasn't broken. The test was measuring performance on trades that couldn't actually happen.
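This trap is easy to reproduce in a toy simulation: give the model honest probabilities, mix in a handful of 1¢ phantom quotes, rank by naive edge, and the phantoms monopolize the top of the list. A sketch, with every number invented for illustration:

```python
import random

random.seed(7)

# 50 liquid contracts whose prices roughly track true probability,
# plus 15 phantoms stuck at the 1¢ display minimum.
contracts = []
for _ in range(50):
    p = random.uniform(0.05, 0.95)                          # true probability
    price = min(98, max(2, p * 100 + random.gauss(0, 3)))   # market roughly agrees
    contracts.append(("liquid", p, price))
for _ in range(15):
    p = random.uniform(0.05, 0.30)                 # modest true probability...
    contracts.append(("phantom", p, 1.0))          # ...but the screen says 1¢

# Rank by naive edge = model fair value minus displayed price.
ranked = sorted(contracts, key=lambda c: c[1] * 100 - c[2], reverse=True)
phantom_in_top = sum(1 for kind, _, _ in ranked[:16] if kind == "phantom")
print(f"{phantom_in_top} of the top 16 'edges' are phantom quotes")
```

The liquid picks rarely crack the top of the ranking, so the "high edge" bucket ends up measuring the phantoms (steps 2 through 4 of the loop above).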

The Fix: Liquidity Filters

The solution is straightforward, but it needs to be baked into both backtesting and live execution:

  1. Require a nonzero Yes bid. If no one is bidding, there is no market.
  2. Exclude contracts sitting at the 1¢ display minimum. That price is a placeholder, not a quote.
  3. Require a sane spread. A wide bid-ask gap means your backtest entry price was never achievable.

Apply these filters and the 564 "high edge" opportunities collapse to a much smaller set — but a real one. Denver T58 passes. Miami T79 doesn't. Chicago T40 doesn't.
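In code, the whole fix really is only a few lines. A sketch, assuming each candidate carries its book's best bid and ask in cents; the 10¢ spread cap is our own starting guess, not a tested threshold:

```python
def is_tradeable(yes_bid: int, yes_ask: int, max_spread: int = 10) -> bool:
    """Liquidity filter: evaluate only contracts with a real two-sided book."""
    if yes_bid <= 0:          # no bids -> no market
        return False
    if yes_ask <= 1:          # 1¢ is Kalshi's display floor, not a price
        return False
    return (yes_ask - yes_bid) <= max_spread  # reject gap-toothed books

for name, bid, ask in [("Miami T79", 0, 1), ("Chicago T40", 0, 1), ("Denver T58", 12, 19)]:
    print(name, "passes" if is_tradeable(bid, ask) else "filtered out")
```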

The Broader Lesson for Prediction Markets

This isn't just a weather market problem. Any prediction market backtester that doesn't filter for liquidity will overstate edge and understate risk.

We've seen this pattern in:

Political markets: Long-shot candidates priced at 1¢ with zero volume. Your model might say 5%, but there's nobody to trade with.

Sports markets: Obscure prop bets with wide spreads. The "price" is a formality.

Crypto markets: Low-cap prediction pools where one large trade moves the price 20%. Your backtest entry price was never achievable.

The pattern is always the same: the most attractive "edges" cluster on the least tradeable contracts. If you don't filter for liquidity, your backtest is fiction.

What This Changed for Us

After adding liquidity filters, our picture flipped:

| Metric | Before Filter | After Filter |
|----------------------|---------------|-----------------------|
| Contracts Evaluated | 564 | ~30-50 |
| Avg. Edge Detected | 22¢ | 12-18¢ |
| Win Rate | 10.3% | TBD (collecting data) |
| Avg. Entry Cost | 1¢ (phantom) | 12-25¢ (real) |
| Confidence Level | Zero | Cautious optimism |

The "after filter" win rate is still unknown — we need more data points on liquid contracts specifically. But we know the input data is real now. Denver T58 wasn't a fluke. It was the one trade that was actually testing our model. And it won.

Bonus: Model Overconfidence on Tails

The phantom liquidity problem masked a second issue: our model was overconfident on tail events. On contracts the market correctly priced at <1% probability, our model was assigning 10-30%.

This makes sense once you think about it. We're using a normal distribution with a fixed 3°F RMSE to estimate temperature probabilities. But tail events — temperatures 5°F+ from forecast — don't follow a simple normal curve. Local terrain, inversion layers, frontal timing, and a dozen other factors make the tails fatter or thinner depending on the city and season.
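For reference, here is the tail math a fixed-sigma normal model produces; the strike and forecast values below are made up for illustration:

```python
from math import erf, sqrt

def p_above(strike_f: float, forecast_f: float, sigma_f: float = 3.0) -> float:
    """P(actual temp > strike) under a fixed-sigma normal error model."""
    z = (strike_f - forecast_f) / sigma_f
    return 0.5 * (1.0 - erf(z / sqrt(2.0)))  # 1 - Phi(z)

# A strike 5°F above forecast: the fixed-sigma model calls it a ~4.8% event,
# regardless of city, season, or weather regime. That assumption is what breaks.
print(round(p_above(79.0, 74.0), 3))  # 0.048
```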

Our Day 1 finding was that average forecast error is ~1.3°F. But average accuracy isn't what matters for tail contracts. What matters is how often the forecast misses by 5°F or more. That requires per-city, per-season calibration — which we're building now.

If your model uses a single error distribution for all cities and conditions, it will be overconfident on tails. Chicago in winter ≠ Miami in summer ≠ Denver in spring. Calibrate locally or don't trade the tails.
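In practice, local calibration starts as an empirical tail count rather than a fitted curve: for each city and season, how often does the forecast actually miss by the threshold or more? A minimal sketch; the record format is hypothetical:

```python
from collections import defaultdict

SEASON = {12: "DJF", 1: "DJF", 2: "DJF", 3: "MAM", 4: "MAM", 5: "MAM",
          6: "JJA", 7: "JJA", 8: "JJA", 9: "SON", 10: "SON", 11: "SON"}

def tail_miss_rates(records, threshold_f: float = 5.0) -> dict:
    """Empirical P(|forecast error| >= threshold) per (city, season) bucket.

    `records`: iterable of (city, month, forecast_f, actual_f) tuples.
    """
    hits, totals = defaultdict(int), defaultdict(int)
    for city, month, forecast, actual in records:
        key = (city, SEASON[month])
        totals[key] += 1
        hits[key] += abs(actual - forecast) >= threshold_f
    return {key: hits[key] / totals[key] for key in totals}

# tail_miss_rates(history)[("Chicago", "DJF")] answers the question directly:
# how often does a Chicago winter forecast miss by 5°F or more?
```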

Our Updated Rules

Coming out of this analysis, we added three hard rules to our system:

  1. Never evaluate a contract with YesBid = 0. If no one is bidding, there is no market. Don't pretend there is.
  2. Never treat min-price contracts as "cheap edge." A 1¢ contract isn't cheap — it's unpriced. There's a difference.
  3. Validate data before strategy. We almost pivoted away from weather markets based on phantom data. Always ask: "Could I actually have made this trade?" before drawing conclusions.

Conclusion

We went from "our model has a 0% win rate" to "our model hasn't been tested yet" in the span of one morning. The difference was three lines of filtering code.

If you're building any kind of prediction market strategy — or really any strategy on thin markets — check your liquidity assumptions before you check your model. The most dangerous backtest is one that looks thorough but measures fiction.

We're still collecting data. Day 1 taught us about rounding. Day 3 taught us about liquidity. Both lessons cost $0 because we hadn't risked any capital yet.

Prove the edge before you risk the capital. Every time.