cc: t.osborn@uea.ac.uk
date: Wed, 19 Sep 2007 12:52:29 +0100 (BST)
from: "Tim Osborn"
subject: late review
to: mark.new@ouce.ox.ac.uk

Hi Mark,

sorry for the lateness; I hope it is useful nevertheless. Submitted via the online system, but copied here too:

----------

Review of Semenov, Latif and Jungclaus: "Is the observed NAO variability during the instrumental record unusual?"

This paper compares the observed NAO variability with that simulated during a 1500-year simulation with a coupled ocean-atmosphere general circulation model. Others have reported related comparisons in the past, finding that at least some recent periods are outside the likely range of internal variability (whether that range is estimated via empirical methods or numerical climate models) and typically concluding that there is probably a contribution from some (natural, anthropogenic or combined) external forcing in explaining recent observations. Here Semenov et al. obtain similar results, namely that the observations slightly exceed the range of internally-generated climate variability (perhaps the exceedance is smaller than in other studies/models), yet they interpret their results to mean that they "suggest that the observed NAO variability...including the recent increase can be explained solely by internal variability".

I don't think criticism of their interpretation should preclude publication, even though I would interpret the results differently. If the observations are outside the 95% range of variability, or, as here, outside the range of the full 1500-yr sample of variability, but are not far outside that range, then some would simply conclude that unusual (i.e. externally-forced) variability had been confidently detected, while others (such as Semenov et al.) might conclude that there may be very little role for external forcing, because internal variability *might* explain almost all of the observed variations. Really, however, it might be better to take into account the likelihood that strong changes occurred purely by chance during a strongly-forced period of time. We might then conclude that internal variability might explain some, or even a very large part, but not likely all (unless our models are deficient), of the observed increase in the NAO from the 1960s to the 1990s, and that there is probably some contribution from external forcings, though this contribution might be only a small fraction of the changes.

My recommendation, therefore, is that the manuscript should be published in GRL, but that it should first be modified to indicate that it isn't really in any disagreement with previous work -- it should be possible to do this while still accommodating the authors' tendency to focus on the possibility that external forcing may have only a limited role to play. Certainly, however, the statements that are inconsistent with the results should be removed, such as the final sentence of the abstract quoted above. The results are rather difficult to see in Figure 3b, but the accompanying text states that the observed trend exceeds the range of simulated trends (and anyway we never do statistical tests on the extremes of a range: they are highly sensitive to sampling variability, and if a distribution's tails extend towards infinity then the range may not even be bounded, although higher values have vanishingly small likelihood).
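To make this concrete, the kind of percentile-based comparison I have in mind could be as simple as the following sketch (Python). The series and the observed trend value below are placeholders of mine, not numbers from the manuscript:

  import numpy as np

  # Placeholders: sim_nao stands in for the model's 1500-yr NAO index,
  # obs_trend for the strongest observed 30-yr trend.
  rng = np.random.default_rng(0)
  sim_nao = rng.standard_normal(1500)
  obs_trend = 0.05

  window = 30
  years = np.arange(window)
  # Least-squares slopes of all overlapping 30-yr segments of the simulation.
  trends = np.array([np.polyfit(years, sim_nao[i:i + window], 1)[0]
                     for i in range(len(sim_nao) - window + 1)])

  for q in (95, 99):
      print(q, np.percentile(trends, q))
  print('max', trends.max())
  print('fraction of simulated trends exceeding obs:', (trends > obs_trend).mean())

Reporting where the observed trend falls relative to the 95th and 99th percentiles of this distribution would say far more about its unusualness than noting only whether it exceeds the sample maximum.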
Readers who are not familiar with all the work in this area should not go away with the false impression that the observations are "well within the model variability" when they actually exceed the model variability.

Specific comments:

(1) Abstract: "observed multi-decadal NAO variations are well within the model variability" and also the final sentence are misleading, since the observed 1960s-1990s trend is bigger than any 30-yr trend in the model.

(2) Abstract: "highly non-stationary behaviour" is wrong: the authors have not tested whether the variations are bigger than expected from sampling variability, so the series cannot be said to be non-stationary. Even if the variations are autocorrelated, that does not make them non-stationary: a stationary autoregressive process still exhibits centennial variability (see the sketch in the P.S. below). Either remove it, reword it, or do a test to prove it!

(3) Abstract: any conclusions regarding the contributions of internal variability and external forcing should carry the caveat that they depend on this single model (other studies have used multiple models).

(4) Overall: I couldn't find any statement of the season being analysed. Surely it isn't annual means? I expect winter means, but the paper must make this clear (e.g. Dec-Jan-Feb or Dec-Jan-Feb-Mar). Some studies use the latter 4-month season, and the strong contribution of March trends to the NAO might increase the unusualness of the trend if the current work used only DJF.

(5) Pages 6-7: although no significant differences in *local* SLP variability are found (via the F-test), the authors have chosen (top of page 7) to compare the observational version (from HadSLP) that has the lowest interannual variability (s.d. 5.6 hPa) with the model (s.d. 7.1 hPa), and this difference *is* statistically significant under the F-test (see the check in the P.S. below). The authors should note this significant difference and consider whether it affects the results -- or, if the observed and simulated NAO series are normalised by their own standard deviations, whether this significant over-estimation of interannual SLP-gradient variability by the model affects the overall findings.

(6) Page 7, lines 13-16: it is inconsistent to say that the observed trend slightly exceeds all trends simulated during 1500 years, yet at the same time to say it is "not unusual". Also, this would be the ideal place to give quantitative values for (a) the strongest observed trend and (b) the maximum, 99th and 95th percentiles of the simulated trend distribution (along the lines of the sketch given above). How far above the 95th percentile the observations lie will help to interpret its unusualness!

(7) Page 8, line 2: see comment (2) above regarding the use of "non-stationary" without an appropriate statistical test.

(8) Page 8, lines 10-11: the authors seem to imply that, because the "observed variability is rather similar to the simulated", the previous rejection (at the 95% confidence level) of an internally-generated trend should itself be overturned. Not so. But the authors can of course choose to focus on the fact that some, or even most, of the trend may be internally generated.

(9) Page 8, lines 21-22: the terminology seems strange here. Usually it is the null hypothesis that is rejected, yet surely the null hypothesis was not that the PDFs are different (as implied here), but that they are the same. Is there 85% confidence that the PDFs are not different, or *only* 85% confidence that the PDFs are different? I don't believe the former is correct.

----------
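P.S. Mark -- in case they are of any use, two quick illustrative sketches (Python) for the statistical points in comments (2) and (5). All inputs are placeholders or assumptions of mine, except the 5.6 and 7.1 hPa standard deviations quoted in the manuscript.

For comment (2): a stationary AR(1) process (coefficient 0.9, chosen purely for illustration) still produces substantial multi-decadal to centennial excursions, so low-frequency variability alone is not evidence of non-stationarity:

  import numpy as np

  rng = np.random.default_rng(1)
  n, phi = 1500, 0.9
  x = np.zeros(n)
  for t in range(1, n):
      x[t] = phi * x[t - 1] + rng.standard_normal()
  # 100-yr running means of a strictly stationary series still wander widely:
  runmean = np.convolve(x, np.ones(100) / 100, mode='valid')
  print(runmean.min(), runmean.max())

For comment (5): the quoted variance ratio can be checked with a one-sided F-test; I have assumed roughly 150 years of observations against the 1500-yr simulation for the degrees of freedom:

  from scipy.stats import f

  F = (7.1 / 5.6) ** 2                      # model variance over observed variance
  p = f.sf(F, dfn=1500 - 1, dfd=150 - 1)    # one-sided p-value
  print(F, p)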