Methodology

Forecast accuracy, measured honestly

Most cloud tools show a forecast line and never tell you how often it was right. TurboFinOps backtests every forecast against your real billing history and publishes four metrics — MAPE, bias, 95% interval coverage and skill versus a persistence baseline — so you know exactly how much to trust the number.

MAPE — Mean Absolute Percentage Error

The headline accuracy number. Averaged over holdout days with non-zero actuals: mean of |forecast − actual| / actual. Lower is better; a MAPE of 8% means forecasts were, on average, within 8% of realized spend.
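As a minimal sketch of the calculation described above, the following averages the absolute percentage error and skips days with zero actual spend. The function name `mape` and the sample figures are illustrative and are not taken from TurboFinOps code.

```python
def mape(forecast, actual):
    """Mean absolute percentage error over days with non-zero actuals."""
    # Days with zero actual spend are dropped: the ratio is undefined there.
    pairs = [(f, a) for f, a in zip(forecast, actual) if a != 0]
    if not pairs:
        raise ValueError("no holdout days with non-zero actual spend")
    return sum(abs(f - a) / abs(a) for f, a in pairs) / len(pairs)


# Illustrative daily figures; the third day has zero actual spend and is skipped.
forecast = [108.0, 95.0, 50.0, 210.0]
actual = [100.0, 100.0, 0.0, 200.0]

print(f"MAPE: {mape(forecast, actual):.1%}")
```

The three scored days miss by 8%, 5% and 5%, so the average lands at 6%.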

Bias — directional error

Mean signed error (forecast − actual) as a percentage. Positive bias means the model systematically over-forecasts; negative means it under-forecasts. A model can have low MAPE but meaningful bias, which matters for budget planning.
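The sketch below shows one way to compute a signed percentage error. It assumes each day's error is divided by that day's actual spend and that zero-actual days are skipped, mirroring the MAPE definition; the text above only says "as a percentage", so treat that normalization as an assumption. The function name `bias` and the figures are illustrative.

```python
def bias(forecast, actual):
    """Mean signed percentage error; positive means over-forecasting."""
    # Assumption: normalize by each day's actual and skip zero-actual days.
    pairs = [(f, a) for f, a in zip(forecast, actual) if a != 0]
    if not pairs:
        raise ValueError("no holdout days with non-zero actual spend")
    return sum((f - a) / a for f, a in pairs) / len(pairs)


actual = [100.0, 100.0, 100.0, 100.0]
always_high = [105.0, 104.0, 106.0, 105.0]  # every day is over-forecast
balanced = [105.0, 95.0, 106.0, 94.0]       # over- and under-forecasts cancel

print(f"always_high bias: {bias(always_high, actual):+.1%}")
print(f"balanced bias:    {bias(balanced, actual):+.1%}")
```

Both series miss by roughly 5% per day in absolute terms, but only the first carries a consistent upward bias, which is the case that matters for budget planning.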

95% interval coverage

The share of holdout days whose actual spend fell inside the forecast 95% prediction interval. Well-calibrated intervals cover close to 95% of outcomes. Materially lower coverage means the bands are too tight to trust; coverage well above 95% means they are wider than they need to be.
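A minimal sketch of the coverage calculation follows. The function name `interval_coverage`, the sample bounds, and the choice to treat the interval bounds as inclusive are illustrative assumptions.

```python
def interval_coverage(actual, lower, upper):
    """Share of days whose actual spend lies within [lower, upper]."""
    if not actual:
        raise ValueError("no holdout days to score")
    # Assumption: a value exactly on a bound counts as covered.
    inside = sum(1 for a, lo, hi in zip(actual, lower, upper) if lo <= a <= hi)
    return inside / len(actual)


# Illustrative figures: the third day falls below its lower bound.
actual = [100.0, 120.0, 90.0, 150.0]
lower = [90.0, 100.0, 95.0, 130.0]
upper = [110.0, 125.0, 105.0, 160.0]

print(f"coverage: {interval_coverage(actual, lower, upper):.0%}")
```

With a 14-day holdout each day is worth about 7 percentage points, so the observed figure can only land near 95% in coarse steps (13 of 14 days is about 93%).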

Skill vs. persistence

We compare model MAPE against a naive persistence baseline (tomorrow = today). Skill is the relative improvement. A forecast that cannot beat persistence is not adding value, and we say so rather than hide it.
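One common way to express "relative improvement" is one minus the ratio of model MAPE to baseline MAPE. The sketch below assumes that form; the exact formula TurboFinOps uses is not spelled out above, and the function name `skill` is illustrative.

```python
def skill(model_mape, baseline_mape):
    """Relative MAPE improvement over the baseline.

    Positive: the model beats persistence.
    Zero or negative: it matches or underperforms persistence.
    """
    if baseline_mape <= 0:
        raise ValueError("baseline MAPE must be positive")
    return 1.0 - model_mape / baseline_mape


print(f"{skill(0.08, 0.10):+.0%}")  # model MAPE 8% vs persistence 10%
print(f"{skill(0.12, 0.10):+.0%}")  # model MAPE 12% vs persistence 10%
```

Under this definition a model with 8% MAPE against a 10% persistence baseline has a skill of about 20%, while a model at 12% MAPE has a skill of about -20% and is adding no value over the naive forecast.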

How the backtest works

  1. Hold out. Set aside the most recent N days of billing data (default 14).
  2. Refit. Train the forecast model only on data before the holdout window, with no leakage from the period being scored.
  3. Predict. Forecast across the holdout window, including 95% prediction intervals.
  4. Score. Compare predictions to realized spend → MAPE, bias, interval coverage, and skill vs. persistence.

The same fitted model that serves your live forecast is the one scored — there is no separate “demo” model tuned to look good in a backtest.
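The four steps can be sketched end to end as follows. This is an illustration under stated assumptions, not the production pipeline: a plain least-squares trend line stands in for the forecast model, the 95% band is a crude constant-width approximation (1.96 times the training residual standard deviation), persistence is taken as the last training day carried flat across the window, and all function names and the synthetic spend series are hypothetical.

```python
import math


def fit_linear_trend(series):
    """Least-squares line over day indices 0..n-1: (slope, intercept, sigma)."""
    n = len(series)
    x_mean = (n - 1) / 2
    y_mean = sum(series) / n
    sxx = sum((i - x_mean) ** 2 for i in range(n))
    sxy = sum((i - x_mean) * (y - y_mean) for i, y in enumerate(series))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    residuals = [y - (intercept + slope * i) for i, y in enumerate(series)]
    sigma = math.sqrt(sum(r * r for r in residuals) / (n - 2))
    return slope, intercept, sigma


def mean_abs_pct_error(forecast, actual):
    pairs = [(f, a) for f, a in zip(forecast, actual) if a != 0]
    if not pairs:
        raise ValueError("no holdout days with non-zero actual spend")
    return sum(abs(f - a) / abs(a) for f, a in pairs) / len(pairs)


def backtest(daily_spend, holdout_days=14):
    if holdout_days < 1 or len(daily_spend) < holdout_days + 3:
        raise ValueError("need a holdout plus at least 3 training days")

    # 1. Hold out the most recent N days.
    train = daily_spend[:-holdout_days]
    holdout = daily_spend[-holdout_days:]

    # 2. Refit on training data only, so the scored period cannot leak in.
    slope, intercept, sigma = fit_linear_trend(train)

    # 3. Predict across the holdout window with an approximate 95% band.
    forecast = [intercept + slope * (len(train) + h) for h in range(holdout_days)]
    lower = [f - 1.96 * sigma for f in forecast]
    upper = [f + 1.96 * sigma for f in forecast]

    # 4. Score against realized spend.
    model_mape = mean_abs_pct_error(forecast, holdout)
    nonzero = [(f, a) for f, a in zip(forecast, holdout) if a != 0]
    bias = sum((f - a) / a for f, a in nonzero) / len(nonzero)
    covered = sum(1 for a, lo, hi in zip(holdout, lower, upper) if lo <= a <= hi)
    # Assumption: persistence carries the last training day flat over the window.
    baseline_mape = mean_abs_pct_error([train[-1]] * holdout_days, holdout)
    skill = 1.0 - model_mape / baseline_mape if baseline_mape > 0 else None

    return {
        "mape": model_mape,
        "bias": bias,
        "coverage": covered / holdout_days,
        "persistence_mape": baseline_mape,
        "skill": skill,
    }


# Synthetic series: steady growth plus a repeating weekly wiggle.
wiggle = [3, -2, 1, -3, 2, -1, 0]
daily_spend = [100 + 2 * d + wiggle[d % 7] for d in range(42)]

result = backtest(daily_spend, holdout_days=14)
for name, value in result.items():
    print(f"{name}: {value:.3f}")
```

On this synthetic series spend grows steadily, so the trend model comfortably beats flat persistence. On a flat, noisy series the same code can report skill near zero or below, which is exactly the case the skill metric exists to expose.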

How is forecast accuracy measured?

By holdout backtesting: TurboFinOps holds out the most recent N days of billing data, refits the forecast model on everything before the holdout, then scores the model’s predictions against what actually happened over the held-out window. The accuracy exercise runs the exact same model that serves live forecasts.

What is a good MAPE for cloud cost forecasting?

It depends on spend volatility, but for stable workloads a MAPE under ~10% is strong, and 10–20% is typical for mixed estimated and metered spend. TurboFinOps always reports the measured value rather than a marketing figure.

Why compare against a persistence baseline?

Persistence (assuming spend stays flat) is a hard-to-beat baseline for short horizons. Reporting skill versus persistence prevents over-claiming: a model is only credited when it genuinely improves on the naive forecast.

Does the methodology change with the forecast model?

No. Whether the underlying model is linear regression, ARIMA or a seasonality-aware fit, accuracy is always measured the same way: a holdout backtest scored on MAPE, bias, interval coverage and skill versus persistence. That keeps the numbers comparable across models.

Where can I see accuracy for my own data?

The Forecasts area of the dashboard surfaces these metrics for your organization’s spend, computed on your real billing history through the GET /forecasts/accuracy endpoint.

See it on your own spend in the Forecasts dashboard, or read how forecasts feed budgets and reports.
