Date: Mon, 16 Oct 2000 22:54:31 +0100
From: Tim Osborn
Subject: progress
To: k.briffa@uea.ac.uk, p.jones@uea.ac.uk

Hi Keith & Phil (a long one this, as I have an hour to kill!)

We're making slow-ish progress here but it's still definitely v. useful. I've brought them up-to-date with our work and given them reprints. Mike and Scott Rutherford have let me know what they're doing, and I've got a preprint by Tapio Schneider describing the new method, plus a partially completed draft paper where they test it using the GFDL long control run (and also the perturbed run, to test for the effect of trend and non-stationarities). The results seem impressive - and 'cos they're using model data with lots of values set to missing, they can do full verification. The explained verification variances are very high even when they set 95% of all grid-box values to missing (leaving about 50 values with data over the globe, I think). In fact the new method (regularized expectation maximization, if that means anything to you - it's similar to ridge regression) infills all missing values, not just those in the climate data, which is interesting: it can infill climate data from climate data, and proxy data from climate data too (see below).

As well as the GFDL data, they've also applied the method to the Jones et al. data on its own. The method fills in missing temperatures using the non-missing temperatures (i.e. similar to what Kaplan et al. do, or what the Hadley Centre do for GISST, but apparently better!). So they have a complete data set from 1856-1998, except that any boxes with less than about 90 years of data remain missing - which seems fair enough, since they would be going too far if they infilled everything.

We're now using the MXD data set with their program and the Jones et al. data, first to see if the missing data from 1856-1960 in the Jones et al. data set can be filled in better using the MXD plus the non-missing temperatures than using just the non-missing temperatures. I expect that the MXD must add useful information (esp. pre-1900), but I'm not sure how to verify it! The program provides diagnostics estimating the accuracy of infilled values, but it's always nice to test with independent data. So we're doing a separate run with all pre-1900 temperatures set to missing, relying on the MXD to infill them on its own - we can then verify, but need to watch out for the possibly artificial summer warmth early on.

We will then use the MXD to estimate temperatures back to 1600 (not sure that their method will work before 1600 - too few data, which prevents the iterative method from converging), and I will then compare with our simpler maps of summer temperature. Mike wants winter (Oct-Mar) and annual reconstructions to be tried too. Also, we've set all post-1960 values to missing in the MXD data set (due to the decline), and the method will infill these, estimating them from the real temperatures - another way of "correcting" for the decline, though maybe not defensible!

They will then try the Mann et al. multi-proxy network with the new method (which they've not done till now). They've given me the full data set, so we can do stuff with that later. I have, I think, all the programs needed for his old method, so we could still look at that on our own, but he's not keen on spending time on that while I'm here. I've swapped it for the MXD data set (the Hugershoff chronologies, and also the gridded, but uncalibrated, version of the Hugershoff chronologies).
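In case it helps to picture what the method is doing, here's my own rough sketch of the regularized EM idea - not Schneider's code or their program, and the ridge parameter, convergence test and all the names are my own guesses:

    import numpy as np

    def regem_infill(X, ridge=1e-2, n_iter=50, tol=1e-4):
        """Crude sketch of regularized-EM infilling.
        X: 2-D array (years x grid boxes/proxies), np.nan where missing.
        ridge: regularization strength - an illustrative value only."""
        miss = np.isnan(X)
        Xf = X.copy()
        # start by filling missing values with the column (grid-box) means
        col_means = np.nanmean(X, axis=0)
        Xf[miss] = np.take(col_means, np.where(miss)[1])

        for _ in range(n_iter):
            X_old = Xf.copy()
            mu = Xf.mean(axis=0)
            A = Xf - mu
            C = (A.T @ A) / (len(Xf) - 1)   # covariance of the completed data

            for t in range(Xf.shape[0]):    # each year/record
                m = miss[t]
                o = ~m
                if not m.any() or not o.any():
                    continue
                # ridge-regularized regression of the missing variables
                # on the observed ones, built from the covariance matrix
                Coo = C[np.ix_(o, o)] + ridge * np.eye(int(o.sum()))
                Com = C[np.ix_(o, m)]
                B = np.linalg.solve(Coo, Com)
                Xf[t, m] = mu[m] + (Xf[t, o] - mu[o]) @ B

            # stop once the infilled values settle down
            if np.sqrt(np.mean((Xf[miss] - X_old[miss]) ** 2)) < tol:
                break
        return Xf

The nice property, as I said above, is that the same loop infills every missing entry in the matrix, whether it's a temperature box or a proxy series.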
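And roughly how I'd score the withheld pre-1900 experiment - again only a sketch, with made-up names (years, temps, mxd), and their program's own diagnostics may well define the skill score differently:

    import numpy as np

    def explained_verification_variance(truth, infilled, withheld):
        """Fraction of the variance of the withheld values explained by
        the infilled estimates (1 - MSE/variance, a simple skill score)."""
        t = truth[withheld]
        f = infilled[withheld]
        return 1.0 - np.mean((f - t) ** 2) / np.var(t)

    # e.g. withhold all pre-1900 temperatures, infill them from the MXD
    # (plus the post-1900 temperatures) with the sketch above, then score
    # the infilled values against the real observations:
    # withheld = (years[:, None] < 1900) & ~np.isnan(temps)
    # combined = np.hstack([np.where(withheld, np.nan, temps), mxd])
    # infilled = regem_infill(combined)[:, :temps.shape[1]]
    # skill = explained_verification_variance(temps, infilled, withheld)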
The gridded stuff was needed for the reconstruction efforts, because otherwise the 387 chronologies would all have had equal weight and we wanted a simple way to account for clustered groups - the gridded version that I made seemed the easiest way, even though that is the Osborn et al. paper that is yet to be written!

What conditions do I need to place on subsequent use of the MXD chronologies/gridded data? That (i) we be informed of what they're doing with it; and (ii) Osborn & Briffa (and Jones?) be co-authors on any subsequent papers, with Schweingruber too if the MXD data set provides the core of the paper? We will all be on the paper that comes out of the reconstructions I've just described, but I'm thinking about any future stuff they use it for.

I hadn't realised (until just now) that their reconstruction program is so slow that it will take about 3-6 days to run each one! They have about 6-8 separate processors/machines, so we need to get various runs going on all of them at once. Even so, results are unlikely to be available by Friday morning (I leave Friday midday), and perhaps not until after I've got back home - this looks like being an ongoing thing!

No need to reply to all/any of this - just thought I'd bring you up-to-date while I had some time to spare.

Cheers
Tim
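P.S. In case "the gridded version that I made" sounds like hand-waving: the idea is just to average chronologies that fall in the same grid box, so a tight cluster of sites carries the weight of one box rather than of many chronologies. A rough sketch only (not the actual program - the 5-degree box size and the names are just illustrative):

    import numpy as np

    def grid_chronologies(values, lats, lons, box=5.0):
        """values: (n_years, n_chronologies); lats/lons: site coordinates.
        Chronologies in the same box are averaged into one gridded series."""
        keys = [(int(np.floor(la / box)), int(np.floor(lo / box)))
                for la, lo in zip(lats, lons)]
        boxes = {}
        for j, k in enumerate(keys):
            boxes.setdefault(k, []).append(j)
        gridded = np.column_stack(
            [np.nanmean(values[:, idx], axis=1) for idx in boxes.values()])
        return gridded, list(boxes.keys())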