How Accurate Are NOAA Weather Forecasts? We Tracked Every Degree.
Everyone checks the weather forecast. Few people check whether the forecast was right.
We built an automated system that pulls NOAA high temperature forecasts every 30 minutes across multiple US cities, then compares them against actual observations from the NWS Daily Climate Report (CLI). After a week of continuous collection — over 21,000 forecast data points — here's what we found.
How We Measured
Our system scrapes the NWS Weather API (api.weather.gov) for point forecasts, which provide high temperature predictions for each day. We collect these every 30 minutes to track how forecasts evolve as the target date approaches.
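The collection step can be sketched in a few lines. This is an illustrative sketch, not our production scraper: the endpoint structure (`/points/{lat},{lon}` resolving to a forecast URL, periods with `isDaytime` and `temperature`) follows the public api.weather.gov API, while the User-Agent string and helper names are made up for the example.

```python
import requests

# The NWS API asks callers to identify themselves via User-Agent (value here is illustrative)
HEADERS = {"User-Agent": "forecast-tracker (you@example.com)"}

def fetch_point_forecast(lat: float, lon: float) -> dict:
    """Resolve a lat/lon to its NWS gridpoint, then fetch the point forecast."""
    point = requests.get(f"https://api.weather.gov/points/{lat},{lon}",
                         headers=HEADERS, timeout=10).json()
    forecast_url = point["properties"]["forecast"]
    return requests.get(forecast_url, headers=HEADERS, timeout=10).json()

def daily_highs(forecast: dict) -> dict:
    """Map daytime period names to forecast high temperatures (°F)."""
    return {p["name"]: p["temperature"]
            for p in forecast["properties"]["periods"]
            if p["isDaytime"]}
```

Run every 30 minutes against each city's coordinates, this yields the forecast snapshots we compare against settlement.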
For "actual" temperatures, we use the NWS Daily Climate Report — the same source that Kalshi prediction markets use to settle weather contracts. This is the official quality-controlled record.
Important caveat: This is one week of data across three cities. It's enough to see patterns, not enough to draw definitive conclusions. We're sharing early because the patterns are interesting and our data collection continues.
The three cities with the most settlement data so far: Chicago, Denver, and Miami.
The Headline Number: 1.3°F Average Error
Across all cities and all settled days, NOAA's mean absolute error (MAE) for high temperature forecasts was approximately 1.3°F.
That's remarkably good. It means if NOAA says tomorrow's high will be 55°F, you can expect the actual high to land between roughly 53°F and 57°F about two-thirds of the time.
But averages hide important variation. The city-by-city story is where it gets interesting.
City-by-City Breakdown
🏙️ Chicago — The Reliable One
| Date | NOAA Forecast | Actual High | Error |
|---|---|---|---|
| Feb 10 | 48°F | 48°F | 0°F ✅ |
| Feb 11 | 40°F | 42°F | 2°F |
| Feb 12 | 43°F | 43°F | 0°F ✅ |
| Feb 13 | 55°F | 54°F | 1°F |
Average error: 0.75°F. Chicago is NOAA's best performer in our sample. Flat terrain, well-instrumented, the Great Lakes moderate surprises. Two perfect forecasts in four days.
🏔️ Denver — The Tricky One
| Date | NOAA Forecast | Actual High | Error |
|---|---|---|---|
| Feb 10 | 60°F | 59°F* | 1°F |
| Feb 11 | 58°F | 57°F | 1°F |
| Feb 12 | 55°F | 55°F | 0°F ✅ |
| Feb 13 | 63°F | 62°F | 1°F |
Average error: 0.75°F. Denver surprised us. Despite its reputation for unpredictable mountain weather, NOAA nailed it consistently. The asterisk on Feb 10: the CLI report showed 58°F due to a rounding artifact we documented earlier, but the raw station data read 59°F.
🌴 Miami — The Wild Card
| Date | NOAA Forecast | Actual High | Error |
|---|---|---|---|
| Feb 10 | 79°F | 76°F | 3°F |
| Feb 11 | 79°F | 79°F | 0°F ✅ |
| Feb 12 | 80°F | 78°F | 2°F |
| Feb 13 | 82°F | 80°F | 2°F |
Average error: 1.75°F. NOAA consistently overforecasts Miami, predicting a couple of degrees warmer than reality. This could be sea breeze timing, cloud cover variability, or station microclimate effects. The bias is consistent, which is actually useful: a predictable bias can be calibrated out.
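The per-city numbers can be reproduced directly from the tables above. A minimal sketch, using only the Feb 10-13 (forecast, actual) pairs shown:

```python
from statistics import mean

# (forecast, actual) high-temperature pairs from the Feb 10-13 tables above
sample = {
    "Chicago": [(48, 48), (40, 42), (43, 43), (55, 54)],
    "Denver":  [(60, 59), (58, 57), (55, 55), (63, 62)],
    "Miami":   [(79, 76), (79, 79), (80, 78), (82, 80)],
}

def mae(pairs):
    """Mean absolute error between forecast and actual highs (°F)."""
    return mean(abs(f - a) for f, a in pairs)

def bias(pairs):
    """Mean signed error (°F): positive means forecasts run warm."""
    return mean(f - a for f, a in pairs)

for city, pairs in sample.items():
    print(f"{city}: MAE {mae(pairs):.2f}°F, bias {bias(pairs):+.2f}°F")
```

The signed bias is the interesting column: Chicago and Denver hover near zero, while Miami's errors all point the same direction.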
Three Patterns We Noticed
1. NOAA Has a Warm Bias in Subtropical Cities
Miami's forecasts were consistently 2-3°F too warm. In four days of data, NOAA overforecast the high every single time except Feb 11. This isn't random noise — it's a systematic bias.
If this holds over a larger sample, it means contracts asking "Will Miami's high exceed X°F?" are more likely to settle NO than NOAA suggests. A model that naively trusts NOAA would overestimate YES probability for Miami contracts.
Practical implication: If NOAA says Miami's high will be 80°F, adjust your expectation to ~78°F before evaluating contracts near that strike.
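The adjustment itself is trivial to apply. A sketch, keeping in mind that the +1.75°F figure is just our four-day Miami sample, not a settled constant:

```python
def debias(forecast_f: float, warm_bias_f: float) -> float:
    """Shift a raw NOAA forecast by a city's measured warm bias (°F)."""
    return forecast_f - warm_bias_f

# Miami ran about +1.75°F warm in our sample
adjusted = debias(80, 1.75)
print(adjusted)  # 78.25
```

The hard part isn't the subtraction; it's accumulating enough settled days per city to trust the bias estimate.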
2. Flat Terrain Cities Are More Predictable
Chicago (flat, Great Lakes-moderated) had the lowest error. This makes meteorological sense — temperature forecasting is fundamentally about predicting air mass behavior, and flat terrain doesn't create the localized effects (mountain waves, downslope warming, valley cold pools) that surprise forecasters.
Denver being equally accurate in our sample was a surprise. We expected mountain proximity to introduce more variability. A longer data window might reveal that — or it might show that NOAA's Denver forecasters have calibrated well for local effects.
3. Same-Day Forecasts Are Not the Whole Story
All the numbers above are from the most recent forecast before settlement. But NOAA updates forecasts multiple times per day. A forecast issued 5 days before often differs significantly from the one issued 12 hours before.
In our data, we've observed forecast revisions of 3-5°F for Denver forecasts issued 3+ days out that later converge to within 1°F by the day before. The revision itself is a signal. When NOAA revises a forecast upward by 3°F, but the market hasn't repriced, that's potentially actionable information.
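Once the 30-minute snapshots are stored, flagging a large revision is a simple scan over consecutive snapshots. A sketch with made-up snapshot labels (the threshold and data shape are illustrative):

```python
def significant_revisions(history, threshold_f=3):
    """Given a time-ordered list of (snapshot_label, forecast_high) pairs for one
    target date, return snapshots where the forecast moved by >= threshold_f."""
    flagged = []
    for (_, prev), (label, curr) in zip(history, history[1:]):
        if abs(curr - prev) >= threshold_f:
            flagged.append((label, curr - prev))
    return flagged

# Hypothetical Denver snapshots for a single target date, 4 days out to 1 day out
history = [("D-4", 52), ("D-3", 56), ("D-2", 57), ("D-1", 57)]
print(significant_revisions(history))  # [('D-3', 4)]
```

A +4°F jump three days out, against a market that hasn't moved, is exactly the situation described above.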
We're building tools to detect and alert on these revisions — more on that in a future article.
What This Doesn't Tell You
A few important caveats:
- One week is not enough. 12 city-day observations can't tell you the true long-term error distribution. Seasonal effects, storm systems, and rare events will widen the error bands. This is a snapshot, not a verdict.
- We only measured high temperatures. Low temperature forecasts, precipitation probability, and wind speed forecasts may have very different accuracy profiles.
- February in three cities. Summer forecasting is different. Convective weather (thunderstorms) is harder to predict than winter high pressure systems. Cities with more weather volatility (Dallas, Denver in spring) might show wider errors.
- The CLI rounding problem is real. As we documented in our Day 1 report, the NWS Climate Report uses a Celsius-to-Fahrenheit conversion that can shift reported temperatures by 1°F compared to raw station data. This matters for boundary contracts.
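To illustrate the last caveat: rounding through a whole-degree Celsius step can shift a Fahrenheit high by 1°F on the round trip. This is a toy model of the effect; the actual CLI pipeline may differ in its details.

```python
def f_to_whole_c_and_back(temp_f: float) -> int:
    """Round-trip a Fahrenheit reading through nearest-whole-degree Celsius."""
    c = round((temp_f - 32) * 5 / 9)
    return round(c * 9 / 5 + 32)

print(f_to_whole_c_and_back(59))  # 59: 59°F -> 15°C -> 59°F, unchanged
print(f_to_whole_c_and_back(58))  # 57: 58°F -> 14°C -> 57.2°F -> 57°F, shifted down
```

For a contract with a strike at exactly 58°F, that one-degree shift is the difference between YES and NO.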
How This Compares to Market Expectations
If NOAA averages 1.3°F error, what does that mean for prediction markets?
Assuming a roughly normal error distribution, a 1.3°F MAE corresponds to about a 1.6°F standard deviation. That means:
| Distance from Forecast | Probability of Exceeding | Implication |
|---|---|---|
| 0°F (at forecast) | ~50% | Coin flip — not tradeable |
| 1°F above | ~27% | Market should price ~27¢ |
| 2°F above | ~11% | Market should price ~11¢ |
| 3°F above | ~3% | Very unlikely — deep tail |
| 5°F above | ~0.1% | Almost never — extreme tail |
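The table follows from a zero-mean normal error model. In code, assuming the 1.3°F MAE holds and using the identity MAE = σ·√(2/π) for a normal distribution:

```python
from math import erf, pi, sqrt

MAE = 1.3
SIGMA = MAE * sqrt(pi / 2)  # ~1.63°F, since MAE = sigma * sqrt(2/pi) for a normal

def prob_exceed(delta_f: float, sigma: float = SIGMA) -> float:
    """P(actual high > forecast + delta_f) under a zero-mean normal error model."""
    return 0.5 * (1 - erf(delta_f / (sigma * sqrt(2))))

for delta in (0, 1, 2, 3, 5):
    print(f"+{delta}°F: {prob_exceed(delta):.1%}")
```

Swap in a per-city sigma once enough settled days accumulate; the Miami bias means its distribution isn't zero-mean in the first place.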
When we see a market pricing a 3°F-above contract at 15¢ while our model says 3%, that looks like edge. But remember: our model assumes a simple normal distribution. Reality has fatter tails — sometimes the forecast misses by 5°F, and those rare events are what make tail contracts dangerous.
Don't trade the tails with a thin-tailed model. A 1.3°F average error doesn't mean 3°F misses are impossibly rare. They're just uncommon. Per-city calibration and longer data series are needed before trusting tail probabilities.
What We're Building Next
We're continuing to collect data across 8 cities (adding New York, Los Angeles, Seattle, Houston, and Atlanta). Our goals:
- Per-city error distributions — enough data to calibrate each city independently rather than using a single global RMSE
- Forecast revision tracking — detecting when NOAA updates a forecast and by how much, as a trading signal
- Seasonal baselines — February is relatively easy to forecast. How does accuracy change in spring storm season?
- Live accuracy dashboard — we're working on making our forecast-vs-actual data available as a public tool on DiveEdge
We'll publish updated accuracy numbers as our sample size grows. Follow along at diveedge.io/research.
The Takeaway
NOAA's forecasts are good — better than most people assume. A 1.3°F average error for next-day high temperatures means the free government forecast is a legitimately strong signal for anyone trading weather-related markets.
But "good on average" doesn't mean "good everywhere." Miami runs warm. Boundary contracts are a trap. And the tails are where models break. Use NOAA as a foundation, but don't trust it blindly at the edges.
The data is free. Checking whether it's right is the real work.