SoftestPawn’s Weblog

Oh no, not another blog!

Apparent Trends and Random Variations

Posted by softestpawn on July 28, 2009

Take a look at these graphs:

[Six graphs: six consecutive runs of the random-walk spreadsheet described below]

You can see trends and features: The noisy but steady drop in the first, a similar but slower rise in the second. The fourth has a slow rise followed by a sharp and maybe accelerating rise. And so on.

The thing is, these are randomly generated, using what are known as ‘random walks’. There are no underlying trends, no feature-causing events; there is just a set of random numbers added together. And these charts have not been specially picked; they are simply six consecutive runs of the same random-walk spreadsheet. Try it yourself:

Make Your Own Random Walk

Random walks are created where a value is changed by, rather than set to, a random value at each step.

For example, get a willing volunteer to walk across a field tossing a coin; if it shows heads he takes a step to the left and one forward, and if it shows tails he takes a step to the right and one forwards. If he’s really helpful, carries a leaky paint pot, and walks across several fields you get something like those graphs above.

However, as it’s raining out and I don’t want to waste any paint, they were created in a very simple OpenOffice spreadsheet: enter the formula =A1+RAND()-0.5 into cell A2, which takes the value above it and adds a random step between -0.5 and +0.5. Then copy that formula into, say, 200 cells below. Each cell generates its own random step between -0.5 and +0.5 and adds it to the previous value, and charting the column gives graphs like those above.
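If you’d rather not open a spreadsheet, here’s a rough Python equivalent (just a sketch, assuming numpy is installed; the ±0.5 step size is the same as above):

import numpy as np

# Each value is the previous value plus a random step between -0.5 and +0.5,
# just like copying =A1+RAND()-0.5 down a column.
rng = np.random.default_rng()
steps = rng.uniform(-0.5, 0.5, size=200)   # one random step per cell
walk = np.cumsum(steps)                    # running total = the random walk

# To see the apparent 'trends', plot it (needs matplotlib):
# import matplotlib.pyplot as plt; plt.plot(walk); plt.show()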

What is perhaps remarkable, if you think of random numbers as being, well, random, is how often they produce these long runs of apparently steady increase or decrease.

Smoothing and Trends

So those are examples of randomly generated graphs that at first sight look quite similar to many graphs that we get when measuring features of the environment.

For example, if we look at a buoy bobbing about on the sea, its height above the sea floor appears to change in an almost random, unpredictable way as overlapping waves, boat wakes, splashes and wind push it about. These small changes are not random, and they are not noise: each measurement really is the buoy’s height at that moment, and so is signal; it tells us how high the buoy is. The changes are, however, chaotic, and so (for most practical purposes) not predictable.

The difficulty is in telling the difference between random systems and ones that do actually have underlying trends. Sometimes time will simply tell: long observations of those random walks above will give us fewer and fewer consistent patterns, while long observations of buoy heights give us predictable tides.

When we’re looking at ways in which complex systems work, we can sometimes find underlying causes by smoothing out the inconvenient small scale changes that confuse the larger patterns. We look to remove the ‘noise’ to reveal the underlying signal.

This requires long observations, though, where the patterns can be consistently and reliably repeated, and it requires looking at the right scale in the data. If we smoothed our buoy height data over weeks, we might see seasonal patterns but would miss the tidal ones.
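For the curious, a moving average is the simplest smoother, and the window length is the scale choice. Here’s a sketch with made-up buoy heights (numpy assumed, all numbers invented for illustration):

import numpy as np

def moving_average(values, window):
    # Average each run of 'window' consecutive points; the window length
    # decides which scales survive the smoothing.
    kernel = np.ones(window) / window
    return np.convolve(values, kernel, mode="valid")

# Made-up 'buoy heights': a 12.4-hour tidal cycle plus random chop,
# sampled every quarter of an hour for two weeks.
t = np.arange(0, 24 * 14, 0.25)
heights = np.sin(2 * np.pi * t / 12.4) + 0.3 * np.random.randn(len(t))

hourly = moving_average(heights, 4)            # chop smoothed out, tides kept
weekly = moving_average(heights, 4 * 24 * 7)   # tides averaged away as well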

Linear fits

The simplest trend analysis is to see whether the data tends to go up or down overall. We can see in the first two graphs at the top above that there is a steady fall; what about the last one? We can find out with a method called ‘linear regression’. The way it works doesn’t really concern us here: it gives us the straight line that comes as close as it can, overall, to the points in the data set (usually by minimising the sum of the squared distances). The slope of that line tells us how fast the values have been increasing or decreasing, overall.

[Graph: Run 2 clip with a fitted straight trend line]
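In practice the fit is a one-liner. Here’s a sketch using numpy’s polyfit on a fresh random walk (not the spreadsheet runs themselves):

import numpy as np

# A fresh 200-step random walk, and the least-squares straight line through it.
walk = np.cumsum(np.random.default_rng().uniform(-0.5, 0.5, 200))
x = np.arange(len(walk))
slope, intercept = np.polyfit(x, walk, 1)   # degree 1 = straight line

# The slope is usually clearly non-zero, even though there is no real trend.
print(f"overall 'trend': {slope:+.4f} per step")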

Sometimes this doesn’t tell us anything very useful. We can see in that last graph a sudden drop at the end; is this merely a disturbance to the underlying trend, or part of the trend itself? Similarly the third graph, with its large trough in the middle, doesn’t lend itself well to a straight line.

In fact none of them do. The key thing to remember here is that there is no underlying trend in these graphs; they are merely random numbers added together.

Fooling the Eye

Being human, we tend to look for patterns and trends, and the way our minds are wired we’ll spot them too – even in random data. This is probably down to something excitingly dangerous such as being able to spot predators, prey or mates in the dappled jungles of Africa. Whatever the reason, it can also lead us astray, into thinking we’ve found things we haven’t.

Take the second graph above: if we look at small-scale trends (trends over short timescales), the trend lines (yellow) are much steeper – both up and down – than the overall one (blue):

[Graph: Run 2 with short-scale trend lines (yellow) and the overall trend (blue)]

If we look at longer scales, the trend lines (red here) gradually flatten out to become closer to the blue one:

[Graph: Run 2 with longer-scale trend lines (red) and the overall trend (blue)]

Until we get to scales of the same order of magnitude as the whole graph, where the trend lines are very nearly the same as the overall blue one:

[Graph: Run 2 with trend lines at the scale of the whole clip, nearly matching the overall blue line]
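You can see the same effect numerically: fit trend lines over windows of different lengths and compare their slopes (again a sketch, on a fresh random walk rather than the plotted runs):

import numpy as np

rng = np.random.default_rng()
walk = np.cumsum(rng.uniform(-0.5, 0.5, 200))
overall = np.polyfit(np.arange(len(walk)), walk, 1)[0]   # the 'blue line' slope

for window in (10, 50, 200):
    # Slope of a fitted line in each non-overlapping window of this length.
    slopes = [np.polyfit(np.arange(window), walk[i:i + window], 1)[0]
              for i in range(0, len(walk) - window + 1, window)]
    print(f"window {window:3d}: slopes {min(slopes):+.3f} to {max(slopes):+.3f}"
          f" (overall {overall:+.3f})")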

This can lead us to think that the longer trends are ‘better’. But if we have a look at where that graph fits into the much longer run that it was clipped from, we can see that even the overall blue line trend of the clip (above the red bar) has little to do with the bigger picture:

[Graph: the full Run 2, with the clipped section marked by a red bar]

The apparent smoothed trends we see above are only features of the length of the graph. They tell us nothing about longer term trends (well, they can’t, there aren’t any…).

Scale and Granularity

The graphs above are actually clips from runs of 10,000 points. If we look at these longer runs, we can see similar effects: at no point does an overall smoothing set in, because the more steps we have, the more likely we are to get long runs in an apparently biased direction. Here are the first and third longer runs (the second is above):

[Graphs: Run 1 clip and its full 10,000-point run; Run 3 clip and its full 10,000-point run]

(bear in mind the Y axis scales are different)
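To see how little a clip’s trend says about the run it came from, here’s a sketch that cuts 200 points out of a 10,000-point walk and fits a line to each:

import numpy as np

rng = np.random.default_rng()
full_run = np.cumsum(rng.uniform(-0.5, 0.5, 10_000))
clip = full_run[4_000:4_200]        # an arbitrary 200-point clip

full_slope = np.polyfit(np.arange(len(full_run)), full_run, 1)[0]
clip_slope = np.polyfit(np.arange(len(clip)), clip, 1)[0]

# The clip's fitted trend regularly differs in size, and often in sign,
# from the full run's.
print(f"clip slope {clip_slope:+.4f}, full-run slope {full_slope:+.4f}")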

So What?

There is no ‘natural’ scale at which a noisy-looking system can be smoothed out. It is tempting to fit a trend line to the data in front of you, but without more knowledge behind that data, the fitted trend says nothing about any underlying one.

We need either observations long enough to establish a pattern, or enough knowledge about the mechanics of the thing being recorded that we can relate features and trends in the data set to known changes in those mechanics (or both).

For example, if someone’s body temperature is unusually high and increasing, we should worry. We should probably do more than just worry, but it’s not something to ignore on the grounds that it might be random; I don’t know how the body works in detail, but I do know how body temperature behaves; it has been observed so many times and for so long that the patterns have been established, even if working knowledge has not.

So… Global Temperature…

So, yes, the next examples come from my favourite subject, Global Warming, because some people seem to have forgotten that it’s not enough to draw a straight line through data and imply things about the future from it:

[Graph: IPCC 2007 chart of recent temperature trend lines]

(IPCC showing how trends were increasing in the run up to their 2007 report, Working Group 1, chapter 3)

The recent claims that temperatures ‘are’ decreasing are on similarly shaky ground:

[Graph: global temperature, 2002-2008, with a fitted trend line]

We could even take the full dataset that we have, which for Hadley runs 1850-2009, and look at the apparent trend there (ignore the green patch and line):

[Graph: Hadley temperature record, 1850-2009, with a fitted 150-year trend line]

But, again, that tells us nothing about future behaviour by itself.

Some of the more frothing deluders, er, enthusiastic GW advocates* say that small changes are ‘noise’ over an underlying trend or signal. However, there is very little noise in the records; the values in datasets like Hadley’s are pretty much all signal (ignoring for the moment systematic errors).

There’s a fundamental error in an approach that dismisses inconvenient short-term variations as ‘natural’ but does not understand the range of time scales that ‘natural’ is valid for; there is no reason to assume that longer-term variations are not also natural. The temptation is just to ‘smooth’ the data until it looks right to the eye, but that tells us interesting things about how the eye and brain interpret shapes and nothing about the data.

Summary

We really can’t say anything useful about temperature trends by just examining the recent record.

We need knowledge of how the climate works, usually captured as models. A lot of people are working hard to understand the climate from the various records of various measurables, but most work on some small aspect of it, and few but the most enthusiastic deluders claim that anyone has a complete understanding. And the good-quality data is fairly recent; it has huge systematic problems (such as surface station placements, urban heat island effects and tidal station changes), and for anything more than a few decades back we tend to have to use proxies, which add another layer of systematic problems.

When we check the models, they need to be checked against features of the dataset, not against carefully selected subsets or the overall trend. For example, we’ve seen a steady rise of CO2, so why did temperature rise from about 1910 to about 1940 at the same rate as recently, when little man-made CO2 was present? In other words, what was that natural variation, has it been included in our knowledge base, and has it been eliminated as a candidate for the recent rise? Do the models that ‘predict’ the 1980-2000 rise also ‘predict’ the earlier periods?

These models and their validation are key; it’s not sufficient to establish underlying trends by drawing a straight line through some data.

(Audit, full disclosure, etc: example spreadsheet to create random walks and zip file of the 6 runs made)

* I must remember not to use the same tone here as I would in forum arguments. Apologies to tamino who was not at all frothing below.


6 Responses to “Apparent Trends and Random Variations”

  1. tamino said

    There are at least two reasons that a random walk does not properly model global temperature.

    The first is mathematical. A random walk has a characteristic autocorrelation function — but observed temperature time series don’t show that pattern.

    The second is physical. A random walk is unbounded; I doubt anyone believes earth’s temperature is unbounded.

    As for models of climate change, they include a large number of factors besides greenhouse gases. The “1910-ish to 1940-ish” rise was due to a slight increase in solar output, a significant lull in volcanic activity, and yes, greenhouse gases too — despite protests of “little man-made CO2,” levels were significantly higher than pre-industrial.

    And computer models match the entire 20th century temperature record extremely well. As for the fact that they ‘predict’ the 1980-2000 rise, you should remove the surrounding quote marks.

    • Indeed; this was not intended to claim that global temperatures can be modelled by random walks, but that drawing apparent trends from recent records alone is insufficient.

      Understanding the mechanics of the climate is needed to distinguish between ‘random’/chaotic variation and underlying trend (thus the summary); we haven’t observed enough of it well enough yet to tell without that.

      I’ll be posting something about models and backfitting and predicting at some point.

      Cheers

  2. tamino said

    It is not true that “Understanding the mechanics of the climate is needed to distinguish between ‘random’/chaotic variation and underlying trend.”

    The kinds of random (/chaotic) processes which create false trends (the random walk is only one example) have statistical characteristics which are subject to purely statistical detection. Temperature data have been subjected to these tests, with the result that they don’t show those characteristic patterns. The temperature record for the last longer-than-a-century is sufficient to establish a genuine trend which is not an artifact of random or chaotic behavior.

  3. tamino said

    Monthly data for global temperature from GISS and from NCDC have 1554 data points, from HadCRUT3v have 1914.

    For an exploration of the likelihood that modern warming is only an artifact of long-term persistence in random fluctuations, see Zorita et al. 2008, Geophysical Research Letters, 35, L24706 doi:10.1029/GL036228.

  4. We have about thirty years of satellite nearly-global annual temperature measurements, and a further hundred years or so of increasingly less reliable older surface station measurements. I don’t think monthly adds anything to that scale, but I’m happy to include them.

    Zorita’s paper is here and their first hypothesis is based, it seems, on the likelihood of an already known feature in the dataset, the 1980-2000 warming: “what are the odds of that particular feature?” Feynman uses the example of seeing the numberplate “ARW 357” in a car park.

    Indeed in their response to Buerger’s comments on the paper (he calls it the Mexican Hat fallacy) they agree that the reasoning is a tautology, and return to comparing with a simple extract from models (Arrhenius predicted a tail end of rising temperatures). They assume a stationary climate, and it would still fail if run 20-30 years ago.

    Their second hypothesis requires much longer (proxied) temperature records and is not pursued in the paper.

    Cheers
