date: Thu, 28 May 2009 06:09:59 -0700
from: Darrell Kaufman
subject: Re: robust regression
to: "K.Briffa@uea.ac.uk"
Keith:
Thanks for your thorough reply. I've attached the latest version of
the manuscript.
I think we're in pretty good shape with our analysis, including the
calibration of the proxy data, but I would welcome your input at any
level. It would be great to explore the stability of the long-
timescale variation in the reconstruction, if you're up for that.
The calibration (Fig. S3) is based on 10-year-binned means. I ran the
regression with temp as the independent variable, then solved for temp
algebraically (SOM Note C).
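
For reference, a minimal sketch of that kind of calibration, assuming ordinary least squares on the binned means. The function names and the decadal values below are made up for illustration and are not the manuscript's data or code:

```python
def fit_line(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def temp_from_proxy(proxy, intercept, slope):
    """Rearrange proxy = intercept + slope * temp to solve for temp."""
    return [(p - intercept) / slope for p in proxy]


# Hypothetical 10-year-binned means (illustration only).
temp = [-0.4, -0.2, -0.1, 0.0, 0.1, 0.3, 0.5]
proxy = [-0.9, -0.3, -0.4, 0.1, 0.2, 0.5, 1.2]

# Temperature is the independent variable in the fit...
intercept, slope = fit_line(temp, proxy)
# ...and the fitted equation is then solved algebraically for temperature.
reconstructed = temp_from_proxy(proxy, intercept, slope)
```

Solving the fitted equation for temp gives a reconstruction whose variance is the calibration-period temperature variance divided by r squared, so this formulation does not damp the amplitude the way a direct regression of temp on the proxy does.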
Spearman's rank correlation coefficient is 0.88, compared with
0.89 for Pearson's. There might be a single, slight outlier, but the
reviewer confused outliers in the proxy records with outliers in the
calibration series. I will include the rank correlation in the SOM
note.
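
A minimal sketch of the comparison, assuming Spearman's coefficient is computed as Pearson's coefficient on average ranks. The helper names and the sample values are hypothetical and are not the calibration series:

```python
import math


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)


def ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Hypothetical decadal means with one mild outlier in the last bin.
temp = [-0.4, -0.2, -0.1, 0.0, 0.1, 0.3, 0.5]
proxy = [-0.9, -0.3, -0.4, 0.1, 0.2, 0.5, 2.0]

r_pearson = pearson(temp, proxy)
r_spearman = spearman(temp, proxy)
```

Because the rank version depends only on the ordering of the values, a large gap between the two coefficients would point to outliers or non-normality affecting the fit, and close agreement suggests they are not.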
I'm hoping to turn this around quickly, so please let me know if you
have any other suggestions.
Darrell
Attachment Converted: "c:\eudora\attach\2k synthesis v10.doc"
On May 28, 2009, at 3:40 AM, K.Briffa@uea.ac.uk wrote:
> Darrell
> First, well done indeed for getting this far in the review process. The
> points you make, on my reading here alone, are entirely correct. I do
> not though have access to my work computer until next week (for reasons
> too complicated to waste your time with). This means that I can not
> access the precise text or reviewer's comments (assuming you have sent
> these as an attachment to a previous message).
>
> Certainly any least squares regression will leverage the effect of
> outliers - whether this is good or bad depends on the situation. What
> can be more relevant is whether the regression is "forward" or
> "inverse", in the sense of whether the regression model is formulated
> and calibrated with climate as the dependent variable or, alternatively,
> the mean proxy record, in the latter case with the equation then
> rearranged after calibration to predict climate. Certainly the forward
> approach applied at annual resolution can lead to a biased
> reconstruction in the sense that the low-frequency variance is
> suppressed. There is, therefore, scope for applying several regression
> approaches and comparing the long-term variance produced. Having
> demonstrated a genuine temperature association using the inter-annual
> data (and only after having done so) it may be worth recalibrating the
> regression using decadally smoothed data (proxy and climate) to assess
> the time-scale stability of the regression coefficient and to directly
> explore the stability of the long-timescale variance in the
> reconstruction.
>
> The point you make regarding the OandS procedure seems fine to me. It is
> useful to establish whether the composite series is highly influenced by
> only one or two constituent series.
>
> The point about the rank correlation is presumably echoing the
> reviewer's worry that our data are not normally distributed and that
> this may influence the regression unduly. It is worth comparing the rank
> correlation against the usual Pearson. If a significant difference is
> apparent it would be worth stating this in a note. Simply repeating the
> regression with the few large outliers in the predictand series set to
> the mean (and with the equivalent predictor values likewise) would test
> the leverage theory directly.
>
> I will be in work and able to look at this stuff in more detail next week.
>
> cheers
> Keith
>> Hi Keith:
>> The reviewer had a suggestion that I could use your help with:
>>
>> 7. Page 5, lines 93-96: It seems to me that the Osborn and Briffa
>> procedure does not add much in this instance. The concern is
>> apparently “leveraging” of the fitted regression equation by one or
>> several outliers. I think this is a relatively minor concern in this
>> instance given the number of decades that enter into the regression.
>> Moreover, it seems to me that the Osborn and Briffa diagnostic gives
>> weight to what is happening in the tails of the distribution of the
>> decadal values, rather than to the centre of the distribution, which
>> is where the signal of interest presumably lies. Thus I think it would
>> be preferable to use a so-called “robust” regression procedure, such
>> as median absolute deviation regression, to assess the possible
>> influence that outliers may have had. Additionally, it may also be
>> useful to consider the use of simple rank-based procedures (e.g.,
>> calculation of the Spearman rank correlation coefficient).
>>
>> It seems that the reviewer was confused about the application of
>> the O&B procedure. It doesn't concern the regression at all, but
>> instead, the influence of a few extreme proxy records on the
>> composite. So, in response, I will add a few words to clarify the
>> approach and explain to the editor where the reviewer went wrong. But
>> I wanted to double check that you agree with this. I also don't know
>> what the reviewer had in mind when s/he suggested using the Spearman
>> rank correlation coefficient.
>>
>> Any comments?
>> thanks.
>
>