## OUTLIER TESTING AND RESIDUALS

The following is mostly taken from HARVEY (1994), although any textbook on Least Squares adjustment or statistics would give similar information and advice. The residuals are given by:

v = A x̂ − l    (9.1-13)

where A is the design matrix, x̂ the vector of estimated parameters, and l the vector of observations.

Residual testing in general assumes that the errors in observations, and the residuals, are normally distributed. Hence, before statistical tests can be applied it may be necessary to check/test that the residuals are normally distributed.

The familiar bell-shape of the Normal Distribution frequency curve indicates that relatively large residuals can be expected, although these should occur much less frequently than relatively small residuals. For example, 99.7% of all residuals should be less than ±3 times the "root-mean-square" value of the residuals (= the square root of the sum of squares of the residuals divided by the number of residuals), which can be considered an estimate of the standard deviation σ of the observations. Thus the chance of a residual exceeding 3σ is very small. (This is the basis of the oft-used "rule-of-thumb" that rejects any observation with a residual exceeding 3 times the standard deviation of the observations.)
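
This rule-of-thumb is easily sketched in code (a minimal illustration; the residual values and function names below are invented for this example):

```python
import math

def rms(residuals):
    """Root-mean-square: sqrt of the sum of squared residuals over their count."""
    return math.sqrt(sum(v * v for v in residuals) / len(residuals))

def three_sigma_flags(residuals):
    """Apply the rule-of-thumb: flag any residual exceeding 3 times the RMS."""
    limit = 3.0 * rms(residuals)
    return [abs(v) > limit for v in residuals]

# Twenty small residuals plus one suspiciously large one (0.25)
v = [0.01, -0.02, 0.02, -0.01, 0.03, -0.03, 0.02, -0.02, 0.01, -0.01,
     0.02, 0.01, -0.02, 0.03, -0.01, 0.02, -0.03, 0.01, -0.02, 0.01, 0.25]
flags = three_sigma_flags(v)
print(rms(v))     # the RMS is inflated by the outlier itself
print(flags)      # only the last residual is flagged
```

Note that a single gross error inflates the RMS itself, and with very few observations the rule can never flag anything (|v|/RMS is bounded by √n); this is one reason the more rigorous tests described below are preferred.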

If one or more of the residuals are significantly larger than either the other residuals in the set, or the residuals obtained from similar adjustments in the past, then it must be decided whether:

• the anomalous observation represents an observation at the extremity of the Normal Distribution, in which case it should be retained, or
• it is indicative of an observation containing a gross error (or blunder), known as an "outlier", in which case it should be rejected.

There is no clear cut boundary between a "small" error (expected in any observation, a "normal" occurrence!), and a "large" error which can be considered "unnatural". At what cutoff point is an error assumed to belong to a Normal Distribution (ND), or to an "Alternative" (unknown) Distribution (AD)? This cutoff point is known as the critical value (CV), hence below the CV the errors belong to the ND and above the CV the errors belong to the AD.

The CV is based on the standard deviation, hence figures such as 1.96, 2.58 and 3.29 correspond to probabilities of 95%, 99% and 99.9% respectively. The figure chosen for the CV will determine what percentage of good observations will be incorrectly rejected. If 2.58 times the standard deviation (99% confidence level) is selected as the CV, it is expected that 1% of good data is rejected (together with any observations with "true" gross errors) -- this is a so-called Type I error. The CV figure defines the level of significance α of the test (α = 0.05, 0.01 and 0.001, corresponding to the confidence levels 95%, 99% and 99.9% respectively), and the probability of making a Type I error is therefore a function of the CV (5%, 1% and 0.1%, corresponding to α = 0.05, 0.01 and 0.001 respectively).
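
These critical values can be recovered from the inverse Normal CDF; a quick check using Python's standard library (the function name is mine):

```python
from statistics import NormalDist

def critical_value(alpha):
    """Two-sided critical value: the z-score leaving alpha/2 in each tail."""
    return NormalDist().inv_cdf(1.0 - alpha / 2.0)

for alpha in (0.05, 0.01, 0.001):
    print(f"alpha = {alpha}: CV = {critical_value(alpha):.2f} standard deviations")
# alpha = 0.05 gives 1.96, 0.01 gives 2.58, 0.001 gives 3.29
```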

The second type of false outcome of observation/residual testing is to accept a bad observation (that is, assume it belongs to a Normal Distribution) when it should be rejected (that is, it belongs to an Alternative Distribution) -- in the statistics literature this is referred to as a Type II error. The probability of making a Type II error is denoted β (β = 0.30, 0.20 and 0.10, corresponding to probabilities 30%, 20% and 10% respectively), and the power of the test is 1 − β. Hence, if β is set to 20%, there is a 20% chance of incorrectly accepting an observation that should have been rejected (or, an 80% chance -- the power of the test -- of correctly detecting an outlier when one occurs).

The Alternative Distribution may also be a Normal Distribution, but with a different mean and standard deviation -- see Figure below. This would be the situation if the observations are systematically biased in some way. These observations may still be considered outliers.

Figure: Residuals may belong to either a biased (RH) or unbiased (LH) Normal Distribution.

Residual testing is usually carried out not on the residual itself, but on a dimensionless quantity known as the "normalised residual" ui = vi/σvi, where σvi is the square root of the corresponding diagonal element of the cofactor matrix of the residuals Qv (eqn (7.1-13), in the case of the Least Squares parametric method):

ui = vi / σvi , with σvi = √(Qv)ii    (9.1-14)

When observations are unbiased (that is, contain no gross error), the normalised residuals are centred around the lefthand ND (see Figure above). Such an observation is accepted within the band set by the choice of the level of significance (here 1%, or 0.5% either side of the mean). However, if the observation is biased then the normalised residual will also be biased, and its distribution will be centred around another mean (righthand ND). There is still a chance that the value of the anomalous normalised residual will fall within the band between -2.58 and +2.58 standard deviations of the mean of the unbiased residuals, and would be incorrectly accepted as an unbiased residual (and therefore an unbiased observation). The probability of this happening is β, here 20%. The separation of the two means is referred to as the upper bound (UB) and its magnitude is the sum of a and b, where a is a function of the parameter α, and b is a function of the parameter β. For example, if α = 0.01 and β = 0.20, then a = 2.58 and b = 0.84, resulting in UB = 3.42.
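
The UB figure is simply the sum of two Normal quantiles, which can be verified directly (a sketch; the function name is mine):

```python
from statistics import NormalDist

def upper_bound(alpha, beta):
    """UB = a + b: a is the half-width of the acceptance band (from alpha),
    b is the extra offset needed so that only beta of the biased
    distribution falls back inside the band (from beta)."""
    nd = NormalDist()
    a = nd.inv_cdf(1.0 - alpha / 2.0)  # 2.58 for alpha = 0.01
    b = nd.inv_cdf(1.0 - beta)         # 0.84 for beta = 0.20
    return a + b

print(round(upper_bound(0.01, 0.20), 2))  # 3.42
```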

How to detect an outlier? Apart from using the above mentioned "rule-of-thumb", there are several statistical tests that can be applied to the residuals, which do not require a modification of the secondary adjustment process. The two most common outlier detection techniques are (CROSS, 1983; HARVEY, 1994):

Baarda's Data Snooping method:

• Assumes that the VCV matrix of the observations Qll is known.
• Compute the normalised residuals ui.
• Considers both the significance level (α) and power (1 − β) of the statistical test.
• Observation i is an outlier if |ui| > 3.42 (α = 0.01, β = 0.20).
• Assumes that only one outlier occurs at a time, and that all the observations are uncorrelated.
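
A minimal sketch of data snooping for the simplest parametric model -- n equally weighted, uncorrelated observations of a single unknown (their mean), for which the diagonal of Qv is (1 − 1/n). The observation values and sigma0 below are invented for illustration:

```python
import math

def data_snooping(obs, sigma0, ub=3.42):
    """One round of Baarda-style data snooping for the single-unknown model.
    Assumes the a priori standard deviation sigma0 of the observations is known.
    Returns the normalised residuals and the outlier flags."""
    n = len(obs)
    x_hat = sum(obs) / n                         # least squares estimate (the mean)
    v = [x_hat - l for l in obs]                 # residuals
    sigma_v = sigma0 * math.sqrt(1.0 - 1.0 / n)  # sqrt of the Qv diagonal, scaled
    u = [vi / sigma_v for vi in v]               # normalised residuals
    flags = [abs(ui) > ub for ui in u]
    return u, flags

# Nine consistent observations and one with a gross error (10.45)
obs = [10.02, 9.98, 10.01, 9.99, 10.00, 10.03, 9.97, 10.00, 10.45, 10.01]
u, flags = data_snooping(obs, sigma0=0.05)
print(flags)  # only the ninth observation is flagged
```

Because data snooping assumes a single outlier at a time, in practice only the observation with the largest |ui| is removed before the adjustment is re-run.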

Pope's Tau Test:

• Assumes that the VCV of the observations Qll is not reliably known.
• Compute the standardised residuals: ui = vi/si, where si is the square root of the corresponding diagonal element of the cofactor matrix of the residuals Qv multiplied by the a posteriori variance factor.
• Observation i is an outlier if |ui| > τ(α0/2, n−u), the critical value of a Tau distribution where (n−u) is the degrees of freedom (n is the number of observations, u is the number of parameters), and α0 = 1 − (1 − α)^(1/n) (α is the significance level).
• The value of τ depends on the degrees of freedom, the number of observations and the significance level. However, the main effect is the magnitude of (n−u): if it is small then τ is about 2 or 3, otherwise it is about 4.

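A sketch of the per-observation significance level used by the Tau test, with a rough critical value. The exact Tau quantile requires the Student-t distribution; since that is not in the Python standard library, the Normal quantile is used below as a large-degrees-of-freedom approximation (an assumption of this sketch, not the exact Tau value):

```python
from statistics import NormalDist

def tau_alpha0(alpha, n):
    """Per-observation significance level: alpha0 = 1 - (1 - alpha)**(1/n)."""
    return 1.0 - (1.0 - alpha) ** (1.0 / n)

def approx_tau_critical(alpha, n):
    """Rough critical value for |ui|: the two-sided Normal quantile at alpha0.
    For large (n - u) the Tau distribution is close to the standard Normal."""
    a0 = tau_alpha0(alpha, n)
    return NormalDist().inv_cdf(1.0 - a0 / 2.0)

print(tau_alpha0(0.05, 50))            # about 0.001
print(approx_tau_critical(0.05, 50))   # a little under 3.3
```
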
The following comments can be made regarding the detection, and subsequent elimination, of observation outliers in GPS Least Squares secondary adjustments:

• The impact of a certain size of error on the baseline component parameter estimates may vary significantly. For example, a very "sensitive" adjustment will not tolerate even a small observation error because its effect is amplified on the parameter estimates. On the other hand, even quite large errors may be tolerated if they have only a marginal effect on the adjustment.
• Marginally Detectable Error (MDE) is the term used for the magnitude of error that can only just be detected as an outlier; any smaller error will probably remain undetected (and still affect the results). For example, if the outlier test flags residuals greater than 3σ, then this is the MDE and anything smaller will not be detected. If β for the outlier detection test is 20%, then the MDE is the smallest size of outlier that it is possible to detect at the 80% probability level, which according to Figure 9.1-5 is 3.42σvi (where σvi is the square root of the appropriate diagonal element of the matrix Qv in eqn (9.1-14)).
• The greater the uncertainty in the estimate of the residual (the larger the value of σvi), the more difficult it is to detect small outliers, and hence the larger the value of the MDE for that observation. For example, in the case of one observation when outlier detection is carried out with a level of significance of 1%, there may be a 20% chance of an outlier of 10cm (or less) remaining undetected. Yet for another observation the value of the MDE, based on the same level of significance and power of the test, may be much greater, say 50cm.
• The sensitivity of an adjustment in detecting outliers is referred to as the "internal reliability". The internal reliability of an adjustment is therefore quantified by the MDE, and a highly reliable adjustment is one where the MDEs are very small. An overall measure may be taken as the largest of the MDEs.
• A measure of the "external reliability" of an adjustment is obtained by calculating the effect of an undetected outlier on the estimated parameters, assuming that the magnitude of the outlier is equal to the MDE of the corresponding observation. In effect, a new adjustment is carried out n times (the number of observations), each time propagating the MDE of the observation. In this way, an external reliability vector (each of dimension u, the number of parameters) is generated for each observation. An overall measure may be the worst case, that is, the largest effect caused by an MDE (which may not necessarily, though usually is, that due to the observation with the largest MDE).
• An alternative approach is to define the external reliability tolerance, the maximum bias in the estimated parameters that is acceptable, and eliminate only those observations that cause the external reliability to be greater than the specified tolerance.
• In the case of a single outlier in a GPS adjustment, representing one baseline component, it would be necessary to delete not just that component "observation" but the entire baseline (that is, all three components). A further problem is that the three baseline "observations" are correlated, making statistical testing unreliable.
• In the event that there are multiple outliers in a population of residuals, the best strategy is to iterate the outlier detection procedure, commencing with the largest residual: if it is flagged as an outlier, it is removed from the vector of observations, the adjustment is re-run, and the residual testing is carried out again.
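
The iterate-and-remove strategy can be sketched for the same single-unknown (mean) model used above; only the worst residual is tested each round, and the adjustment is re-run after each rejection (observation values and sigma0 are invented for illustration):

```python
import math

def snoop_iteratively(obs, sigma0, ub=3.42, max_rounds=10):
    """Repeatedly adjust, test the largest normalised residual, and remove
    that observation only if it exceeds the upper bound ub."""
    obs = list(obs)
    removed = []
    for _ in range(max_rounds):
        n = len(obs)
        x_hat = sum(obs) / n                         # re-run the adjustment
        sigma_v = sigma0 * math.sqrt(1.0 - 1.0 / n)  # Qv diagonal is (1 - 1/n)
        u = [(x_hat - l) / sigma_v for l in obs]     # normalised residuals
        worst = max(range(n), key=lambda i: abs(u[i]))
        if abs(u[worst]) <= ub:
            break                                    # no outlier flagged: stop
        removed.append(obs.pop(worst))               # reject, then iterate
    return sum(obs) / len(obs), removed

# Eight consistent observations and two gross errors (5.60 and 4.40)
obs = [5.01, 4.99, 5.00, 5.02, 4.98, 5.00, 5.60, 5.01, 4.40, 5.00]
x_hat, removed = snoop_iteratively(obs, sigma0=0.05)
print(x_hat)    # close to 5.00
print(removed)  # the two gross errors, worst first
```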

Not mentioned here are procedures based on modifying the Least Squares adjustment process itself, to either make it easier to detect outliers or to make the adjustment procedure less sensitive to the presence of outliers, as in the case of "robust" Least Squares.
