2012-03-11

RSS & UAH AMSU prediction

Welcome to my new blog!
Updates will be infrequent and mainly concerned with results from my hobby of (automated) time series analysis.

Let's start with the global temperature anomaly, as calculated from satellite measurements since 1979.

RSS & UAH AMSU data with zero mean, Excel's linear regression and 4th-order polynomial fit,
as well as a simple model of linear trend + single cosine. The asterisks indicate El Niño events.
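As an aside, such a fit is easy to reproduce, e.g. with SciPy's curve_fit. A minimal sketch, with a synthetic stand-in for the real anomaly series and an initial period guess near the ENSO time scale:

```python
import numpy as np
from scipy.optimize import curve_fit

def trend_plus_cosine(t, slope, offset, amp, period, phase):
    # linear trend plus a single cosine oscillation
    return slope * t + offset + amp * np.cos(2.0 * np.pi * t / period + phase)

# Synthetic stand-in for the monthly anomaly series (replace with real data).
rng = np.random.default_rng(0)
t = np.arange(0, 33, 1 / 12.0)                   # years since 1979
anomaly = (trend_plus_cosine(t, 0.014, -0.2, 0.1, 3.7, 0.5)
           + rng.normal(0.0, 0.1, t.size))

# The initial guess matters: start the period near the ENSO time scale.
p0 = [0.01, 0.0, 0.1, 3.7, 0.0]
params, _ = curve_fit(trend_plus_cosine, t, anomaly, p0=p0)
print("trend %.3f °C/yr, period %.2f yr" % (params[0], params[3]))
```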

Most of the medium-term variance of ~0.5 °C can be attributed to ENSO and the big volcanic eruptions, like El Chichón in 1982 and Mount Pinatubo in 1991. Curiously, the cooling caused by Mount Pinatubo fell into a period that should have had a La Niña episode but actually didn't, so the pattern is probably more regular than it would have been without the volcano.

The overall correlation between the RSS and UAH series is high at 0.957,
but over the last 12 months, UAH has been on average 0.05 °C warmer than RSS.
Difference between the two datasets.
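For reference, a minimal sketch of how these two numbers can be computed; `rss` and `uah` are synthetic stand-ins for the aligned monthly anomaly series:

```python
import numpy as np

# Synthetic stand-ins: two noisy copies of a common signal (replace with
# the real, month-aligned RSS and UAH anomaly arrays).
rng = np.random.default_rng(1)
common = rng.normal(0.0, 0.2, 396)                  # 33 years of months
rss = common + rng.normal(0.0, 0.05, common.size)
uah = common + rng.normal(0.0, 0.05, common.size)

corr = np.corrcoef(rss, uah)[0, 1]                  # overall Pearson correlation
recent = np.mean(uah[-12:] - rss[-12:])             # mean offset, last 12 months
print("correlation %.3f, UAH-RSS over last 12 months: %+.3f °C" % (corr, recent))
```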

The autocorrelation of the series is significant and confirms the ENSO signal with a period of ~3.7 years.

Auto- and cross-correlation of the detrended data.
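A minimal sketch of the lag search, again on a synthetic stand-in for the detrended series:

```python
import numpy as np

def autocorr(x, max_lag):
    # normalized autocorrelation for lags 1..max_lag
    x = x - x.mean()
    denom = np.dot(x, x)
    return np.array([np.dot(x[:-lag], x[lag:]) / denom
                     for lag in range(1, max_lag + 1)])

# Synthetic stand-in: detrended series with a 3.7-year cosine plus noise.
rng = np.random.default_rng(2)
t = np.arange(0, 33, 1 / 12.0)
detrended = 0.1 * np.cos(2.0 * np.pi * t / 3.7) + rng.normal(0.0, 0.05, t.size)

ac = autocorr(detrended, max_lag=120)        # lags up to 10 years
lag = np.argmax(ac[12:]) + 13                # skip lags below one year
print("strongest positive lag: %.1f years" % (lag / 12.0))
```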

Now for the interesting part: using genetic algorithms to automatically build models and use them for forecasting. The models (neural-net-like recursive algorithms) use only the time series as input and try to predict one step ahead. During optimization, the input is mostly clamped to the data but sometimes replaced by the model output for a short period. The model fitness is usually the mean squared deviation from the data, but can take other desirable properties, like the first derivative, into account.
The candidate models have a potentially high number of degrees of freedom and are thus prone to overfitting (modelling the noise), so we are not really interested in the global minimum. The genetic optimization usually converges on a local minimum with a low effective number of degrees of freedom, which models an essential property of the data and is thus useful for forecasting.
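To make this concrete, here is a heavily simplified sketch of the scheme, not the actual system: a plain genetic algorithm (truncation selection, uniform crossover, Gaussian mutation) evolving the weights of a tiny recurrent one-step-ahead predictor, with the input mostly clamped to the data and the mean squared error as fitness. All names, sizes and rates are illustrative assumptions, and the data is a synthetic stand-in.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in: trend + ENSO-like cosine + noise (replace with real data).
t = np.arange(0, 33, 1 / 12.0)
series = (0.014 * t + 0.1 * np.cos(2.0 * np.pi * t / 3.7)
          + rng.normal(0.0, 0.05, t.size))

H = 4                                # hidden state size
NPARAMS = H * H + 3 * H + 1          # recurrent, input, bias and output weights

def unpack(g):
    """Split a flat gene vector into the model's weights."""
    Wh = g[:H * H].reshape(H, H)              # hidden-to-hidden weights
    wx = g[H * H:H * H + H]                   # input weights
    b = g[H * H + H:H * H + 2 * H]            # hidden biases
    wo = g[H * H + 2 * H:H * H + 3 * H]       # output weights
    c = g[-1]                                 # output bias
    return Wh, wx, b, wo, c

def fitness(g, x, clamp_prob=0.9):
    """Mean squared one-step-ahead error. The input is mostly clamped to
    the data, but occasionally the model's own previous prediction is fed
    back in instead."""
    Wh, wx, b, wo, c = unpack(g)
    h = np.zeros(H)
    pred = x[0]
    sq_err = 0.0
    for k in range(len(x) - 1):
        inp = x[k] if rng.random() < clamp_prob else pred
        h = np.tanh(Wh @ h + wx * inp + b)
        pred = wo @ h + c
        sq_err += (pred - x[k + 1]) ** 2
    return sq_err / (len(x) - 1)

POP, GENS = 40, 60
pop = rng.normal(0.0, 0.5, (POP, NPARAMS))
for gen in range(GENS):
    scores = np.array([fitness(g, series) for g in pop])
    elite = pop[np.argsort(scores)[:POP // 4]]   # truncation selection
    children = []
    while len(elite) + len(children) < POP:
        pa, pb = elite[rng.integers(len(elite), size=2)]
        mask = rng.random(NPARAMS) < 0.5         # uniform crossover
        child = np.where(mask, pa, pb) + rng.normal(0.0, 0.05, NPARAMS)  # mutation
        children.append(child)
    pop = np.vstack([elite, children])

best = min(pop, key=lambda g: fitness(g, series))
print("best one-step MSE: %.5f" % fitness(best, series))
```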

The final model is then run many times with added noise, and the results are averaged to get a representative output and an error estimate.
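Continuing the sketch above (reusing its hypothetical `best`, `unpack`, `H`, `series` and `rng`), the noise-averaged forecast could look like this:

```python
def model_run(g, x, horizon, noise=0.05):
    # One free-running pass: clamp to (noisy) data while it lasts, then
    # feed back the model's own prediction for `horizon` further steps.
    Wh, wx, b, wo, c = unpack(g)
    h = np.zeros(H)
    inp = x[0]
    out = []
    for k in range(len(x) + horizon):
        h = np.tanh(Wh @ h + wx * inp + b)
        pred = wo @ h + c
        out.append(pred)
        base = x[k + 1] if k + 1 < len(x) else pred
        inp = base + rng.normal(0.0, noise)   # added noise on every input
    return np.array(out)

runs = np.array([model_run(best, series, horizon=60) for _ in range(200)])
forecast = runs.mean(axis=0)   # representative output
spread = runs.std(axis=0)      # error estimate per time step
print("forecast at +5 years: %.2f ± %.2f °C" % (forecast[-1], spread[-1]))
```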
Here are the results of 3 optimization runs:

The models almost always exploit the ENSO regularity that the manual analysis indicated, but model C also picks up some higher-frequency details and consequently has the lowest error bounds and the highest R² = 0.84. This measure is not too meaningful, though, as it depends on the forecasting horizon and is calculated on the same data used during fitting.
To verify a model, its performance should be evaluated on unseen data, but this time series is so short that I decided to use all existing data during optimization, so verification will have to be done on future data.
Model C is probably overfit, and models A and B show that the error bounds in the far future are most likely underestimated, as small timing differences lead to a big phase shift. Getting the timing right is often more difficult than getting the magnitude and direction right.
