Gross Error Detection Using The L1 Norm Method In A Large Survey Data Set

NSW LPI GNSS Vectors - Central West Network Segments

A subset of an L1 adjustment on the LPI NSW GNSS network is displayed to the right.

Click on a baseline in the map to see residual information.

If the map does not appear, refresh the page or contact the webmaster

For a complete kml of the NSW GNSS network as provided by LPI NSW click here.

Note that all measurements in the kml are precise only to eight significant figures and thus should not be relied upon for any geodetic surveys.


This dissertation investigates computational methods based on the minimisation of the L1 norm of residuals for the identification of outlier observations in a datum network comprised of GNSS baselines. The Barrodale & Roberts[2] algorithm based on Dantzig’s[7] Simplex method was examined in depth and the computational complexity issues that arose from large data sets necessitated an investigation into a network segmentation-based compromise and the application of interior point solvers for linear programming in the minimisation of the L1 norm of residuals in very large data sets. The dual of the L1 norm minimiser primal model was finally investigated and successfully applied to yield a solution in thirty seconds using an implementation of Mehrotra’s[25][21] Primal-Dual Interior Point Solver as opposed to more than six hours for the same problem model on the same hardware using a simplex solver. Further analysis of the segmentation-based solution was not deemed necessary once a fast interior point method was successfully applied.

1  Introduction

1.1  Literature Review

Gaussian parametric weighted least squares (WLS) is still the most widely known method of obtaining both a best fit for a survey network adjustment and reliable quality parameters for that fit. In the context of a very large set of measurements such as that for a national datum, where thousands of parameters are to be estimated, WLS is still the first choice for obtaining coordinate and quality parameters. However, when the volumes of data are large and the quality of the meta-data is sometimes uncertain, the result from a WLS adjustment can be equally uncertain. The analyst will often be required to employ supplementary quality control techniques to assist in cleaning a set of data.

Of the existing methods for the detection of outliers, the L1 norm minimiser using both simplex[2] and an interior point[8] algorithms was assessed for the ability to detect real and simulated outlier observations in large data sets. This was compared against the other more conventional least squares Baarda test[1] and misclose analysis as explored by Katambi et al[17].

The mechanism of the simplex method as applied to a L1 minimiser in geodetic networks is explained in Chapter4 followed by a discussion on an adaptation of an interior-point optimisation algorithm[8][31] which in practice can be more computation-time efficient on large sets of data but has the unfortunate aspect of being less numerically stable[5].

1.1.1  Problem Scope

The ongoing effort to source historical geodetic measurement data from archives[13] for the next Geodetic Datum of Australia has resulted in a large set of measurement data containing 66,000+ GPS baseline measurements that will estimate the Earth-Centred, Earth Fixed (ECEF) positions of 20,000+ class A survey marks. Characteristic gross errors in measurements have been encountered and continue to arise as outliers in existing data as new data is added[12].

A more rigorous pre-processing step prior to the final least squares adjustment of the GPS baseline network was requested to reduce the repetitive aspect of individually testing each outlier as it appears in a large adjustment. The author trialled several methods for simplifying the problem and succeeded with a residual analysis technique using a proposed adjustment-per-measurement method that approximates redundancy in the whole network with a "redundancy neighbourhood" of fixed depth from the measurement in question. The method is detailed in Chapter 7.

1.1.2  Reliability

The focus of an argument for using L1 Norm minimisation for outlier detection hinges on reliability. As discussed by Baarda[1], Pope[26] and subsequently many others[27][15], the internal reliability of a model design is measured by how large an error must be before it is detectable. As Pope notes[26]:

the traditional objections to least-squares adjustments of triangulation; that they cannot be relied upon to localize the error and instead "Spread it around"

which identifies two critical aspects in data snooping:

  1. The initial detection of an error
  2. The localisation of the error.

While an examination of measurement cycle miscloses will reveal the existence of most gross errors, the localisation of these errors requires more than a misclose analysis. This is explored in Chapter 6.

1.1.3  Previous Research

Although the L1 norm minimiser has often been suggested as an auxiliary quality control step [15][14][18][24][22][24] there is currently no widely known commercial solution for dealing with large networks of survey data.

Harvey[15] also notes the application of L1 norm minimisation to identify non-measurement errors in a survey adjustment such as datum defect (also known as rank defect). This results from the robustness of the simplex method in optimising an under-determined system of linear equations[8].

1.2  Research Aims

This thesis covers three major areas in the problem of finding errors in input observations for a geodetic datum using L1 Norm estimators as described by Cen et al [6] and Marshall [22]. These are

  1. Provide a solid theoretical background for newcomers to the L1 norm minimisation model.
  2. Evaluate the time performance of various L1 norm minimiser algorithms on survey data.
  3. Find a well-balanced practical solution to the computational time intractability issue arising in algorithm performance on large data sets such that the desired output information on measurement residuals is preserved.

The theoretical basis for L1 norm is a large area which encompasses the mathematical model, the various computational methods and the proposed application of the output. This task will be broken down into the following subsections

  • The formulation of a linear program of the L1 norm minimiser.
  • A worked example using modified simplex method analogous to Barrodale and Roberts’ L1 algorithm[2][3].
  • The same example problem using a simplified interior point method based on Dikin’s affine scaling algorithm[8][31].

The performance in time, correctness and convergence of the below solvers was tested. The source of the data is the complete set of NSW GNSS baselines provided by LPI unless otherwise stated. See footnotes1 2 3 4 for references.

Software Solver Type Time Complexity O(n) Time Taken
Barrodale and
Roberts FORTRAN [2 ][3 ]
Primal Simplex Exponential Very slow1: > 1 second
Mathematica Simplex Exponential Slow1: > 1 second
Mathematica Primal-Dual IP Polynomial Fast1: < 0.01 seconds
Matlab CVX[11] Primal-Dual IP Polynomial Fast1: < 0.01 seconds
Dikin Affine Scaling[8][18] Primal Affine IP Not known Did not converge1
GLPK2[21] Simplex Exponential 6 hours3
GLPK[21] Primal-Dual IP4 Polynomial 30 seconds3

Table 1: Performance of all of the tested linear programming solvers using L1 models examined in this thesis.

The objective of research in this thesis was to find a L1 norm minimiser algorithm to operate on a very large data set where the number of parameters m exceeds 10, 000. Therefore the very real problem of computational time intractability was examined. Prior to finding an acceptable solution using GLPK, a proposal for a segment-based solution with the intention to preserve the validity of the output L1 norm residuals with respect to the whole of a large data set was investigated in Chapter 7. The naïve approach was to evaluate a complete subset of data from the larger set with the hope that all of the information on residuals (observed - calculated) relative to the subset is completely relevant to the original large data set. This approach was abandoned once the 30 second solver was found and applied but the investigation and the work on finding a complete method of segmenting the network was preserved for the possibility of future research.

1.2.1  Assumed Knowledge

It is assumed that the reader is familiar with the principles of weighted least squares as applied to a geodetic network adjustment, with specific knowledge of the formulation of the Jacobian matrix A and the O C (observed minus calculated) vector b. More importantly, it is assumed that the reader understands the definition of a residual with respect to a least squares solution. For more information, the recommended reference text is Harvey’s Monograph 13[14]. A brief review of this material is provided in the early sections of chapter3

2  Outliers and Gross Errors in Geodetic Survey Networks

2.1  Appearance of Outliers in Existing Data

The appearance of outlier measurements in a network are often in one or more of the following forms:

  • Wrong standpoint errors are where observations are taken from a witness mark near a trig station but this detail is missing from meta-data.
  • Height booking errors describe incorrectly booked instrument height meta-data.
  • Station name errors are measurements where one or both of the station numbers in the meta-data are incorrect, causing either
    1. Skew in the network as a function of the measurement length (similar to wrong standpoint errors).
    2. Separation of the station into two nearby points that therefore reduces redundancy and weakens the network at that point
  • Deformation of the network over consecutive epochs.
  • Incorrect ambiguity resolution where range estimates to satellites are incorrect in all coordinate components. A common cause is multi-path bias which is localised to individual stations.

The effect of gross errors on a least squares adjustment has been shown to "spread" the error across two or more measurements when there is enough redundancy to do so[14]. In the case where gross errors can be small in magnitude relative to the magnitude of the measurement, such as a height booking error that is incorrect by 100mm over a GPS baseline of 50 km, it is possible for a significant proportion of that error to be spread over the residuals of several neighbouring measurements.

2.1.1  Examples

Haasdyk et al[13][12] note examples of incorrect ambiguity resolution, where the length of the range to satellite(s) is therefore incorrect, and affects all coordinate components. These manifested in very small (sub-metre and often sub-decimetre) gross errors that are sometimes undetectable in constrained local-neighborhood adjustments that rely on a "perfect" Class 2A0 control network.

In a related issue, multiple processing of the same observations necessitated by observing outliers when fitting a local neighborhood adjustment to a larger neighborhood using a wider control network often resulted in several different "versions" of the same baseline that are significantly different[13]. When using such measurements in a very large unconstrained network adjustment, the ability to "choose" the correct measurement is compounded by the quality of the surrounding network.

Observations with station name errors were sometimes easy to spot when the measurement appeared as a significant outlier that traversed an unlikely distance. For example in Figure 1 the measurements highlighted in blue clearly shows measurements where one or more station names have been taken from the incorrect trigonometric station, often several hundred kilometres away.


Figure 1: Large station name errors are sometimes easy to isolate

In contrast, a station naming error that does not refer to an existing station but instead "creates" a new, nearby station in the adjustment can be difficult to identify in a large adjustment. For example, in Figure 2 the station TS5110 appears as two stations — one at the apex of most measurements and the lower line is drawn terminating in its nominated location at TS5110-1. When cross-referencing with existing station names it appears that it is equivalent to TS5110-2.

PIC (a) Station name errors can negate redundancy in network design PIC (b) Replaced marks require renaming to avoid propagating bias in an adjustment

Figure 2: Station naming errors

Other examples related to the above as noted by Haasdyk in correspondence and in [13] are:


TS5458 = TS12033-3 Same mark
SS36065 = TS6418-4 Same mark
TS5458 TS12033  2mm apart
(Destroyed) TS10533 TS10985 20mm apart

2.2  Review of Outlier Detection Methods

Caspary [4] notes that there is a difficulty in using the least squares vector of residuals v̂ = v QvPiΔi = v qvpiΔi (where Δi is the gross error on the ith measurement) in the detection of gross errors as outliers because the whole vector of residuals is contaminated by Δi . Roughly speaking, given a large system of parameters where the likelihood of gross error contamination is high, the number of marginally detectable errors can also be proportionally high.

The sensitivity of an adjustment in detecting outliers is referred to as the "internal reliability" [27]. As stated by Rizos [27], the internal reliability of an adjustment is quantified by the set of marginally detectable errors. A large set of MDEs indicates low internal reliability, and conversely the smaller the set of MDEs, the higher the internal reliability.

A common method of outlier detection in a least squares adjustment is Baarda’s data snooping method [27]. This method assumes the quality of the measurements is known reliably and, more importantly, it assumes that only one outlier exists in the set of observations. In everyday use on small networks this method works well because there are generally relatively few outliers present. However, when considering large and highly interconnected networks, this data-snooping method can sometimes fail due to bias introduced by outliers on neighbouring measurements.

Another commonly used method is Pope’s Tau test, which is primarily used when measurement quality is not reliably known. Instead of normalising residuals using a priori weights, the Tau test first computes the standardised residuals using weights svi derived from the variance co-factor (VCV) matrix of the residuals computed from a least squares adjustment once it has converged. Each standardised residual is

ui = vi svi

where svi is σvi scaled by the square root of the variance factor of the adjustment, which is

svi = σvi × vT Pv n u

where σvi is the square root of the ith diagonal entry in the co-factor matrix of the residuals Qv.

The Tau test then compares each normalised residual with the Tau value

τα0 2 ,nu


a0 1 (1 α) 1 nu

2.2.1  An Alternate Test for Outliers

The need for a robust method to detect more than one outlier in a large set of data, as justified above, has already been addressed in the form of an alternative adjustment. Rather than use the L2 norm, that is the square root of the sum of squares of residuals, researchers such as Branham[3] argue that a statistically more likely method of revealing outlier observations reliably is by minimising the L1 norm of the residuals, that is the sum of the absolute values of the vector of residuals

Minimise  ivi1 where  v = b Ax

The choice of the L1 norm minimiser over the least squares (that is, the L2 norm) minimiser is precisely because it does not spread the magnitude of the residuals over the neighbouring measurements as evenly as possible. This poses the question: Why does the L1 norm minimiser not spread residuals? The answer can be roughly explained by looking at the effect of each on large and small residuals.

Say in a loop of height measurements, one measurement h1 has a gross error of 2m. All measurements have an expected variance of ± 100mm and neighboring measurements have no gross errors. Mathematical method aside, a least squares adjustment would "consider" the magnitude of a single residual vh1 = (2)2 = 4 alone versus the possibility of assuming that this error magnitude was divided equally between h1 and a neighboring measurement h2 such that the error in each is assumed to be vh1 = h1 E(h1) = 1m and vh2 = h2 E(h2) = +1m. In the latter case the L2 norm is

vh12 + v h22 = (1)2 + (1)2 = 1 + 1 = 2

which is clearly smaller than the first case. A least squares minimiser will not stop there though - it would then consider spreading the error in h1 and h2 to their neighboring measurements, and so on until all error has been spread as evenly as possible such that the L2 norm is at a minimum.

2.2.2  Penalising Large Residuals

In the discipline of mathematical optimisation this phenomenon where many smaller residuals are preferred over few large residuals would be referred to as penalising large residuals. This is precisely what least squares does as a by-product of obtaining a best fit because it assumes that there are no outliers in the data. However, data containing several outliers necessitates a minimiser that does not penalise residuals by magnitude.

Using the above example, say an L1 norm minimiser "considers" the 2m error as vh1 = h1 E(h1) = 2 and all other residuals as negligible vhi <𝜖, i1. It then considers spreading the error to a neighboring measurement h2 such that vh1 + vh2 = 2. The L1 norm of the latter is

|v1| + |v2| |v1 + v2| = |2| = 2

Here we can see that "spreading" the error across two residuals will result in a L1 norm that is greater than or equal to the L1 norm of restricting the error to the residual of one measurement. Hence, any L1 norm minimiser would prefer the case where the residual error is assigned to as few measurements as possible.

The next chapter will elaborate on the analysis of the residuals of an L1 norm minimiser with some example using geodetic observations to illustrate the relevance to this problem.

3  Estimator Models

The introduction to estimator models is discussed in Chapter 3 of the thesis.

3.3  The L1 Norm Estimator

3.3.1  Sampling Distribution of L1 Norm residuals

Monte Carlo experiments conducted by Marshall[23] indicate quite clearly that the sampling distribution of L1 norm residuals is not the Gaussian (normal) distribution but instead exhibits a very large spike at the first moment. Branham[3] states that this sampling distribution corresponds to a modified exponential (or alternatively, Laplace) probability density function

P(X) = 1 2heh|X|

This is a modified form of the exponential distribution, which in general form is

P(X; μ, σ) = 1 2σexp(2|X X| σ )

For h = 1 the function P(X) is graphed in Figure 5.

L1 Sampling Distribution

Figure 5: The sampling distribution of residuals for the L1 norm minimiser where h = 1.

By maximising the log density function logP(X), the first and second moments are

μest = E(X) = median (7) σ2 = E(X2) = 2 N i=1N|Xi μ| (8)

The distortion of the first moment with respect to outliers is restricted to movement by one "data point" rather than any proportion of the magnitude of the outlier. Assuming an equal distribution of positive and negative outliers and residuals, the first moment will be zero. If the distribution of residuals of a L1 minimiser happen to be biased such that they are mostly all positive or all negative, this would imply the model assumptions are incorrect. In a minimally constrained adjustment, this should not occur.

3.3.2  Minimally Constrained Models

As noted by Harvey[14], outlier detection is often done on a minimally constrained model. Ideally, no distance scale would be enforced by the measurements such that the resultant residuals would contain minimal bias. A minimally constrained adjustment is more likely to have an average error of zero.

Likewise, the L1 minimiser that is used for quality analysis on a geodetic survey network could be applied successfully using a minimally constrained model.

3.3.3  Internal Reliability

Marshall[23] investigated the bounds of internal reliability (the relative magnitude of measurement error before it is statistically flagged as an outlier) using Monte Carlo methods. By using a consistent a priori standard deviation on all measurements Marshall notes that the primary deviation in internal reliability is caused by variations in network geometry. In his horizontal network experiment of 6 stations (12 parameters) and 15 distance measurements he observed that across 100,000 Monte Carlo simulations where random bias was introduced into the measurements, one particular measurement (observation #9) was never flagged as an outlier in an L1 adjustment. Figure 6 below shows measurement #9 in Marshall’s trilateration network which clearly shows that there is enough redundancy for the four parameters (coordinates of stations Δ1 and Δ5) which it is estimating. It was the view of this author that there was perhaps another reason for the lack of outlier information output by the adjustment, which judging by the degrees of freedom (redundancy) in the network is likely due to there being not enough redundancy to adequately locate outliers for all measurements.


Figure 6: The network used by Marshall[23] for internal reliability analysis using Monte Carlo simulation.

It is hypothesised by the author but not proven that the internal reliability of a network in the L1 norm increases with an increase in redundancy. For measurements that can have a reverse or check measurement e.g. an EDM distance from Δi to Δj and another from Δj to Δi it is trivial to see a gross error in one of these measurements (although not trivial to know which measurement contains the gross error without further measurements). This presents the analyst with a dilemma: An adjustment on the whole network would likely have better redundancy than an adjustment on part of the network (given that the whole network is connected) but there are far more computations to perform in an adjustment of the whole network than a part of it. In fact, as discussed in chapter 4 the relationship between number of measurements and number of adjustment computations is in some cases exponential, which is troublesome when the network is very large.

3.4  Weighted L1 Minimisation

Due to measurement noise there is always the need to weight the importance of a measurement based on an a priori assumption of precision. GPS vectors at the NSW LPI are typically given a standard deviation of ±(10mm + 7ppm) for least squares adjustments. This a priori assumption is carried into L1 norm models but the scheme for weighting the model differs in the method.

Minimise  i=1m𝜖i2 σi2 (9)

In least squares, the method allows a priori weighting of observations by inserting the square of each weight σi2 corresponding to measurement gi as entries in a diagonal m × m matrix Q.

Q = σ12 0 0 0 σ22 0 0 0 σn2

By taking W = Q1

x̂ = (ATWA)1ATWb

In the L1 model, the objective function is

Minimise  i=1m|𝜖i| σi (10)

Due to the non-linearity of the objective function, the formulation of the problem must instead rely on a linear program, and thus the inverse matrix of squared weights would be applied as described by Harvey[15] as follows

W1 2(Ax̂ + 𝜖) = W1 2b

3.6  L1 Norm Minimisation Formulated as a Linear Program

The first step in formulating the L1 norm minimisation problem as a linear program is to recognise that the objective function z = |vi| is not linear. To make it linear, and to also satisfy the non-negativity constraint that linear programs typically require3 the objective function can be reformulated to be

z = i=1n(u + v) = i=1nu + i=1nv (11)

Where a residual is u v such that u 0 and v 0. These constraints satisfy u + v = |u v|.

The normal (or model) equation for each parameter would be the same formulation as for least squares problems but with the residual included as a model parameter. For example, a height difference would be

ĥi = h2 h1 + ui vi

Similarly for least squares, each element in the Jacobian matrix for the L1 norm minimisation is a partial derivative of the model equation (in order to linearise the problem). Hence for the height example the Jacobian matrix is already linear, that is

ĥi h1 ĥi h2 ĥi ui ĥi vi = 1 1 1 1

For non-linear models a Taylor series of one or two terms is usually sufficient to achieve convergence given good enough a priori coordinates or parameters [14].

3.6.1  Error Estimates

Since the use of the L1 norm adjustment in the context of this thesis is to highlight outlier observations, the question on whether to weight the observations prior to the adjustment must be specifically asked with regards to the desired outcome. In a GPS baseline network, there will exist very short lines of only a few metres and very long lines of hundreds of kilometers.

The separate issue of weighting the residuals should always be done according to the expected quality of the observations. An L1 norm adjustment will typically be done prior to a least squares adjustment, and so the scaled residuals that are an output of "tuning" a least squares adjustment will often be unavailable to the analyst. A parts-per-million component is often the a crucial normaliser for residuals because of the large variation in accumulated error over long distances, for example the scale in long GNSS baseline vectors.

3.7  Example: Similarities Between Loop Mis-Close and L1 Norm Residuals

As stated already, the benefit of the L1 norm minimisation is primarily the analysis of residuals. In a survey network adjustment with equally weighted observations (all σ = 1) a direct comparison of the residuals to loop mis-closes yields strong similarities.

Using an example of just height data from a NSW RMS survey on the Sydney Anzac Bridge (see Figure 7), the results obtained from B. Harvey’s FIXIT4 of L1 raw residuals versus mis-closes is shown in Table 3. It can be seen that in this example, the loop mis-closes and the magnitude of the L1 residuals for particular observations on the loops co-incide, specifically the residuals |2.3|mm and |1.8|mm on lines 2-3 and 5-2 respectively.


Figure 7: Horizontal geometry of RMS Data from the Sydney Anzac Bridge as output by the FIXIT4 program. (Data sourced from correspondence with B. Harvey)

From To dH (m) sdv (mm) v (mm)

Hei 1 3 17.5863 1.5 0.0
5 3 25.7134 1.5 0.0
2 3 18.1929 1.5 -2.3
2 4 18.2125 1.5 0.0
5 4 25.7353 1.5 0.0
5 2 7.5246 1.5 -1.8

Table 2: Height difference observations of RMS survey(correspondence with Harvey) with L1 residuals v

Height loop

4.1mm 2 3 5 2
1.8mm 2 4 5 2
-2.3mm 2 4 5 3 2

Table 3: Loop mis-closes of RMS height network

An L1 minimiser identifies the measurements which have the propensity to introduce the largest residuals and excludes those measurements from the computation of the parameters. In this example height network, these residuals have the same magnitude as loop mis-closes because there are no correlations between input observations and input standard deviations are being ignored.

3.8  Linear Program Formulation of the L1 Norm Estimator

Continuing from Equation 11, each model equation takes the form

W1 2Ax W1 2b v = 0 (12)

Note that x and v are vectors of parameters any of which can be positive or negative. This results in the problem being unbounded in the negative direction and as such this formulation needs modification before it can be used in a linear solver. By taking two positive variables u, w 0 we can say that each of the residual parameters can represented by v = u w. This also applies to the coordinate parameters x = x+ x, x+, x 0. By generalising, with coordinate parameters represented by xP± and residual parameters represented by xR±, a linearised formulation of the L1 norm estimator is

Minimise z = eTx R+ + eTx R + 0x P+ + 0x P Subject to BTx = I I A A xR+ xR xP+ xP = b xP+, x P, x R+, x R 0 eT = [11]

This is very similar to the formulation in Equations 3.24 on page 71 in Dantzig et al [7] except that the basic and non-basic variables xB and xN have been replaced with the parameters and residuals xP± and xR± respectively.4

Note also that the objective function has the coefficients effectively switched, so that instead of

z = 0xR± + cTx P± (13) (14)

we have

z = eTx R± + 0x P± (15) where eT = [11] (16)

This is an explicit formulation of a linear program but it is still solvable by any variation of Dantzig’s simplex methods (primal, revised, primal-dual etc) and by interior point methods for linear programming problems (explored in Chapter 4).

4  Algorithms For The L1 Norm Estimator

This chapter will begin with Section 4.1 describing the Simplex algorithm followed by an example demonstrating how it works on survey height data. Subsequently the concept of interior point methods will be introduced and applied to the same example.

4.1  Search-based Simplex Algorithm

Use the standard form of the L1 problem from Chapter 3, replicated below:

Minimise z = eTx R+ + eTx R + 0x P+ + 0x P Subject to BTx = I I A A xR+ xR xP+ xP = b xP+, x P, x R+, x R 0 eT = [11]

To re-iterate, the notation xP±, xR± refers to the strictly positive nature of the parameters. This is especially pertinent in non-linear problems where at each iteration the delta x is estimated and is often negative. When a coordinate (or delta) has a negative value, the parameter xi contains the absolute value and xi+ = 0.

A search must always start somewhere, and in a linear program (LP) this is the canonical form.

4.1.1  Canonical Form and Basic Variables

As noted by Dantzig[8] the primal simplex method finds the optimal solution by a sequence of pivot steps from the original system of equations. Initialisation of the algorithm is achieved by converting the problem from standard form to canonical form. This is where the current solution is a basic feasible solution and:

  1. The objective coefficients to all the basic variables are zero, that is, the objective function contains only non-basic variables;
  2. The constraint matrix for the basic variables form an identity matrix (with some permutation if necessary). It’s easy to see whether or not the current Basic Feasible Solution (BFS) is optimal or not, if the LP is in canonical form

The objective function

z = cTx = eTx R+ + eTx R + 0x P+ + 0x P

and constraints

BTx = I I A A xR+ xR xP+ xP = b

permits the initial basic feasible solution

xP± = 0 xR± = b Ax = b z = xR+ + xR

4.1.3  Exponential Computational Time Complexity

Before continuing, it is necessary to discuss the computational time complexity of the algorithm proposed by Barrodale and Roberts[2] and the simplex method in general.

Briefly, computational time complexity of an algorithm is generally assessed with regards to the rate of growth of computational time with respect to the size of the input. This time complexity is typically described using three metrics - minimum time or best-case time complexity f(n) = Ω(g(n)), average time complexity f(n) = Θ(g(n)) and worst-case time complexity f(n) = (g(n)). f(n) = Ω(g(n)) represents the lower bound function for the run time of an algorithm f(n) where n denotes the size of the input (typically an array). f(n) = Θ(g(n)) represents the run time of an algorithm f(n) when averaged over all permutations of an input of size n. This metric is a good indication of how the algorithm will behave in the real world using real measurements, however we should always consider the worst-case time complexity f(n) = (g(n)) where g(n) is the upper bound of the rate of growth of computational time of f(n). This latter metric describes just how poorly the algorithm can perform in the worst case.

When considering the simplex method to solve for the L1 Norm estimate as proposed by Barrodale and Roberts[2] and its application in this thesis in finding outliers in a large set of GPS observations, the size of the input will always be large (over 100,000 measurements). Thus if the rate of growth of computational time in Barrodale and Robert’s L1 method [2] is non-linear, it is of great importance to consider time complexity.

An unfortunate aspect of the simplex method on which the Barrodale and Roberts L1 algorithm is based is, while it performs well in most cases, there exists a set of inputs which make the algorithm run in exponential time. Using the notation described above, the average case Θ(g(n)) is polynomial and often linear but the worst-case (g(n)) is exponential, i.e. g(n) = abn for some constants a and b. Exponential time complexity was shown by Klee and Minty [19] to exist in cases where a hyper-cube of constraints can lead to 2d 1pivots where d is the dimension of the hyper cube. See the Appendix A.

4.1.4  The Dilemma

As was raised briefly in Chapter 3, there is a desire for the analyst to preserve as much redundancy as possible in order to increase the likelihood of properly detecting all gross errors in an adjustment. Since the problem of exponential time complexity is very real in the application of Dantzig’s simplex method on large sets of data (as experienced by the author in practice) this presents a dilemma.

To balance the desire for maximum redundancy with the desire for reasonable computation time, two approaches can be taken. The first is to find a way to reduce the size of an adjustment whilst preserving maximum redundancy. The second approach is to find a way to reduce the time complexity of the algorithm used to perform the adjustment. The next section and Chapter 5 address the latter approach, and Chapter 7 attempts to address the former. The two approaches need not be exclusive.

4.2  Interior Point Algorithms

Recent developments in the field of optimisation and operations research have yielded several new methods for solving linear programs other than Dantzig’s[7] simplex method. Since the publication of Klee and Minty’s [19] examples of worst-case exponential computational time, researchers have made developments in the field of ellipsoid algorithms employed by Leonid Khachiyan in 1979 that run in worst-case polynomial time. Although in the average case the simplex method far outperforms early ellipsoid methods, the idea of iteratively moving through the interior of a problem space to converge at the optimal solution at a boundary yielded a ground-breaking result less than 30 years ago.

In 1984 Karmarkar published a paper on an interior point method[16] where feasible solutions are ’projected’ from within a convex solution space to the best feasible solution. The worst-case computational time was determined to be (n3.5), and that in practice it often performed faster than the simplex method on large sparse problems.

The method described below is is an earlier interior point algorithm known as Dikin’s Affine Scaling Interior Point Method[8], first published by Dikin in 1967 with a proof of convergence in 1974. This algorithm does not possess the convergence properties of Karmarker’s algorithm but does demonstrate the concept of the central projections.

4.3  Dikin’s Primal Affine Method

From [8] the following algorithm summarises the Primal Affine Method for finding an interior point solution to a linear program.

Assume that an initial feasible interior point solution x = x0>0 satisfying Ax0 = b is known. The steps of the algorithm are as follows:

  1. Initialize Counter: Set t 0.
  2. Create D: Set D = Diag(xt).
  3. Compute Centering Transformation: Compute  = AD, ĉ = Dc.
  4. Determine the Projection Matrix: Compute P = I AT(AAT)1A.
  5. Compute Steepest Descent Direction: Set pt = Pc.
  6. Set Θ = minjpjt.
  7. Test for Unbounded Objective. If Θ 0.0 report the objective as being unbounded and stop.
  8. Obtain xt+1: Compute
    x̂t+1 = e + (α Θ)pt

    where e = [1, 1, , 1]T and α is set to a number strictly between 0 and 1. Typically, α is set to be between 0.5 and 0.95 for primal affine methods (in most implementations it is set between 0.9 and 0.95).

  9. Transform Back to Original Space: Compute xt+1 = Dx̂t+1.
  10. Termination Check: If xt+1 is "close" to xt, report xt+1 as "close" to optimal and stop.
  11. Set t t + 1 and go to Step 2.

4.3.1  Initialisation

An interior point method such as the affine scaling method described above requires an initial solution on which to begin iterating. As the reader may notice in step 3 the centering transform uses the initial solution to project the entire solution space around the point 1, 1, , 1. Since zero values in the initial solution could not be projected in such a way (and also values on a boundary of the solution space, that is a parameter that lies on a constraint hyperplane) the initial solution must be interior to the solution space.

An interior point is not found trivially, and as such methods have been developed to find an interior point. One such method used here is the Big M method, which adds an additional variable xM to the problem Ax = b such that a new problem Ax = b is solved. A is simply A with an extra column of coefficients for xM. The objective function coefficient of xM is a very large value denoted M (the namesake of the method). This large value of M acts as a penalty such that the solver will try to reduce the value of this variable as much as possible in order to minimise the impact of MxM on the objective value (which in the L1 norm problem is the sum of absolute values of the residuals).

The coefficients of xM in A are calculated such that, given an initial point of x = 1, 1, , 1, all of the constraints are satisfied. As such, the

Ax = b [AAM] x xM = b AMxM = b Ax = b A1, , 1T

By setting xM = 1, AM is easily calculated using the above relationship to obtain the new problem

Minimise z = cTx + Mx M Subject to Ax = b

4.3.2  Heights Example Using Affine Scaling Solver

The Matlab implementation of the affine solver in Appendix B was applied to the RMS heights example used earlier in this chapter. The formulation of the A matrix and b vector was the same as for the simplex method. The value of M was set to 1200 for initialisation. The algorithm converged after 55 iterations, thus it was not practical to show the values of the parameters at each iteration in this section. However, the rate at which the parameters converged to the accepted values are visible in Figure 8 for the station heights and the residual parameters. It should be noted that this algorithm yielded a different result for the residuals as noted in Table 5.

Measurement 1 2 3 4 5 6

Residual (mm) 0 -1.2 1.2 0 0 1.8

Table 5: Residuals of Affine Scaling L1 adjustment

Convergence of height parameters in Affine IPM (a) Station heights Convergence of residual parameters in Affine IPM (b) Measurement residuals

Figure 8: Convergence of parameters in the RMS heights problem using the Affine Scaling method.

It can be seen in Figure 9 the value of the Big M variable drops from 1 to nearly zero very quickly, demonstrating a correct initialisation of the problem and convergence to the initial problem. The stopping criteria used was the primal-dual gap (see Appendix B for the definition in Matlab). The convergence of the solution is accepted when the value of the primal-dual gap drops below a specified tolerance (usually 0.0001).

Convergence of Big M parameter in Affine IPM (a) Reduction of Big M Variable Convergence of the primal-dual gap in Affine IPM (b) Primal Dual Gap

Figure 9: Convergence of the Big M coefficient to zero and the Primal Dual Gap to below the tolerance level.

4.3.3  Implementation for Larger Problems

Clearly convergence after 55 iterations is not an acceptable solution for such a small problem. As an academic exercise, this method was applied to larger problems and exhibited similarly slow performance. Convergence was not achieved for a problem of 195 GNSS measurements and 92 stations (defining a problem size of n = 595, m = 276) using this algorithm.

Dikin’s affine scaling algorithm was implemented in C++ using the Armadillo[28] linear algebra library. A warm start was used whereby initial ECEF coordinates converted from geodetic coordinates (in GDA94) using Gpstk[29] and the ellipsoid parameters of GRS80[10]. ECEF coordinates were then translated to be centred at the apriori coordinates of the first station in the adjustment. The A matrix and b vector were then normalised (scaled to approximately unity) using the following relationship

( 1 A2A)( 1 b2x) = 1 A2b2b

The author attempted to speed up the Affine Scaling method by warm starting the solver with input coordinates and residuals that would be close in value to the final solution. These were obtained from the input bath_syd_stn.xml file in GDA94 geodetic coordinates and converted to GDA94 ECEF Cartesian coordinates using the Position class in gpstk[29].

Although in small problems a warm start offers no advantage over a cold start (whereby there is a phase I problem initialisation), in very large problems using a solver with polynomial time convergence properties a warm start can assist with providing a correct initialisation and reduce the phase I run time by a constant (usually between 0.3 and 0.5). For problems that can take hours to days to solve this time saving is significant.

The next chapter will discuss an alternate formulation of the L1 problem and will demonstrate the application of a faster solver with better convergence properties. Part II APPLICATIONS OF L1 NORM MINIMISERS IN LARGE SURVEY NETWORKS

Further analysis of L1 norm minimisation algorithms and results pertaining to survey networks and reliability

5  More Formulations of the L1 Minimiser

The standard form of the L1 problem from Chapter 3, replicated below, is appropriate for some simplex-based solvers but suffers from numerical instability issues using many open-source interior point method solvers1 .

Minimise z = eTx R+ + eTx R + 0x P+ + 0x P Subject to BTx = I I A A xR+ xR xP+ xP = b xP+, x P, x R+, x R 0 eT = [11] (17)

It is necessary to remodel the problem such that it is "easier"2 for an IPM solver to find a solution. One approach is to find a solution on the dual problem.3 By first converting the primal problem from above to a form that allows negative parameters (as allowed by some solvers because of automatic remodeling), below

Minimise z = 0xP± + eTx R+ Subject to BTx = A I A I xP± xR+ b b xR+ 0 xP± is unbounded where eT = [11] (18)

Note that the row constraints are no longer bound by a strict equality =. This allows for residuals that are negative to be specified exactly in one of a pair of complementary constraints, with the other permitting the former equality with a positive surplus variable. This means

aixi + ajxj + rk bk aixi ajxj + rk bk


aixi + ajxj + rk + sk = bk aixi ajxj + rk + skc = bk (19)
such that either sk = 0  and  skc 0 or sk 0  and  skc = 0 exclusively.

Subsequently, by using the Tucker Diagram (see Appendix I Table 7) the primal problem in Equation 18 can be converted to the dual problem

Maximise v = bT bT y Subject to BTx = AT AT In In y 0m en y 0 where eT = [11] (20)

Note that, according to the rules defined by the Tucker diagram (Appendix I Table 6), the inequality in the row constraints becomes a y 0 lower bound for the dual variables. This problem is far "easier" to solve using Mehrotra’s primal-dual solver.

The above approach to remodeling the problem in the dual form, although mathematically correct, demonstrates nothing of the relevance of the dual formulation to the original problem in Equation 18. To see it another way, consider the constraints in the primal problem as being bounded from below by the measurements. The primal problem objective is to minimise the residual parameters such that this lower bound still holds. In the dual formulation we first substitute every original parameter with xi = 1 and then scale each constraint j by a parameter yj such that for each column (corresponding to the coefficient xi in the primal) in the constraint matrix the scalar product of the coefficients with y1yn is bounded above by the corresponding coefficient ci of xi in the objective function. It intuitively follows that by maximising the values bjyj subject to the constraints a1,iy1 + + an,iyn ci this effectively maximises the overall "utility" of the measurements whilst being constrained by the "utility" of each parameter.

As shown by Dantzig and Von Neumann[8], if the primal problem has a feasible solution then the dual also has a feasible solution and the value of the objective function of the dual problem v in an optimal feasible solution is the same as the value of the objective function z in the primal problem when it is optimal, i.e. z = v for optimal solutions of the primal and dual. This is stated more formally in the Strong Duality Theorem.

Strong Duality Theorem 5.0.1. If the primal system min z = cTx, Ax b, x 0 has a feasible solution and the dual system max v = bTy, ATy c, y 0 has a feasible solution, then there exist optimal feasible solutions x = x and y = y to the primal and dual systems such that bTy = max v = min z = cTx.

It remains to determine how to recover the residuals from the optimal solution to the dual problem. In summary, by the Complementary Slackness Theorem (below) the reduced costs of the slack (or surplus) variables in the optimal solution to the dual problem are equal to the values of the slack variables in the optimal solution to the primal problem.

Complementary Slackness Theorem 5.0.1. Assume a primal problem (P) has a solution x and the dual problem (D) has a solution y.

  1. If xj>0 in (P), then the jth constraint in (D) is binding (this means the slack variable in the dual is zero).
  2. If the jth constraint in (D) is not binding, then xj = 0.
  3. If yi>0, then the ith constraint in (P) is binding.
  4. If the ith constraint in (P) is not binding, then yi = 0.

By the exclusive slackness property of the primal problem as shown in Equation 19, assuming that one of the constraints has a zero-value surplus value (i.e. it is a binding residual constraint) and thus the positive value of the residual for that constraint is also binding, the complement constraint has a non-zero-valued surplus value of exactly the negative value of the binding residual in the first constraint. By item 4 in the Complementary Slackness Theorem, a non-zero positive surplus variable in the primal problem (i.e. the constraint is not binding) infers that the corresponding value of the variable in the dual problem is zero, which means it is not in the dual basis and thus has a non-zero reduced cost.

Now, when the dual solution is optimal, the value of all of these reduced costs will be positive or zero (by the Optimality Test Theorem). The cost of a basis exchange to add one of these variables with a positive reduced cost to the basis of the dual solution is precisely the magnitude of the non-zero-valued surplus variable in the corresponding row constraint in the primal problem. Put more simply, the surplus value representing the residual in the primal problem is exactly equal to the reduced cost of this surplus variable in the dual problem.

This has the side effect that all residuals will be reported as absolute values by this method since the effect of the exclusive slackness constraints 19 represents the familiar model for an absolute value in the dual problem. For the purposes of outlier detection, this is not an issue because only the magnitude of the normalised residual is considered in the outlier test.

5.1  L1 Minimiser Model in the Gnu Linear Programming Kit

The Gnu Linear Programming Kit (GLPK)[21] is an open source library for linear programming that contains two separate solvers:

  1. A simplex-method (SM) solver.
  2. An interior-point-method (IPM) solver.

The SM solver has support for primal and dual problems. This is demonstrated to be very robust and resilient to numerical instability.

The IPM solver uses a modified implementation of Mehrotra’s primal-dual method for interior point problems[25]. It is more exposed to numerical instability than the simplex method as demonstrated by the problem in this chapter.

Both solvers have pre-processing stages that remodel the problem to a standard form. This allows the user to specify inequality constraints and negative bounds on parameters which is usually disallowed by the underlying algorithms. Extra constraints are added to the model to permit the problem specification in this way, but care must be taken to appropriately specify a model such that it will have a feasible solution and is not unbounded. Using the above dual formulation of the L1 problem on the entire set of more than 66,000 GNSS observations, GLPK was able to converge to an optimal solution in 30 seconds on a standard laptop computer.

6  L1 Norm Analysis on NSW GNSS Network

The focus of an argument for using L1 Norm minimisation for outlier detection hinges on reliability. As discussed by Baarda[1], Pope[26] and subsequently many others[27][15], the internal reliability of a model design is measured by how large an error must be before it is detectable. As Pope notes[26]:

the traditional objections to least-squares adjustments of triangulation; that they cannot be relied upon to localize the error and instead "Spread it around"

which identifies two critical aspects in data snooping:

  1. The initial detection of an error
  2. The localisation of the error.

While an examination of measurement cycle mis-closes will reveal the existence of most gross errors, the localisation of these errors requires more than a mis-close analysis.

6.1  Residuals

The fit of residuals to the Laplace probability density function (described in Chapter 3) was carried out in Mathematica using Pearson’s Chi-Square fit test. The histogram of residuals for the initial set of observations is in Figure 10 below. Note that there is always a "spike" in the histogram at the zero level. This correlates exactly to the number of parameters in the basis u, with the remainder of the histogram representing the degrees of freedom n u. By removing the residuals corresponding to all basis parameters the histogram of residuals more closely fits the Laplace probability density function.

PIC (a) All standardised residuals PIC (b) Non-zero (degrees of freedom) standardised residuals

Figure 10: Histogram of residuals with and without the zero (basis) spike.

The Pearson Chi-Square fit of the Laplace probability density function was calculated for the histogram of non-zero standardised residuals. The parameters of this distribution, with an assumed mean of zero, are μ = 0, β = 0.665766 (or instead of β, the h = 1 β = 1.502 using the model defined by Branham[3]). The sigma threshold was calculated at the 99% confidence level on this distribution was solved from the cumulative density function

1 - e 1 2 ( - 1.502 v ^ σ ) = 0.995 v ^ = 3.06 σ

By performing a cursory removal of measurements demonstrating outlier behavior from the initial set of data in two passes the resultant reduced data set still contained outliers that "reappeared" after the first two passes, but the maximum magnitude the standardised residuals (|v| |s|) of outliers decreased from ±181.2 to ±8.6. The number of residuals above the sigma threshold of v̂ = ±3.06σ decreased from 4,486 to 1,698.

The histogram of "degrees of freedom only" residuals of the "cleaner" data set is visually no different to that of the initial data set due to the sheer amount of data. Additionally, the parameters for the Laplace probability density function for these standardised residuals are essentially identical, which demonstrates that the outliers removed from the "dirty" data set imparted no bias on the network model in a minimally constrained L1 adjustment. A visual comparison of the network graph between the initial data set and the "cleaner" set is in Figure 11.

PIC (a) L1 adjustment of all measurements PIC (b) After two passes and the removal of 1475 outlier measurements

Figure 11: Map of residual measurements at various thresholds ( <3σ, >3σ, >10σ)

To properly address outliers in such a large data set, it is advisable to construct an automated procedure to iteratively report on and test the measurement with the largest outlier on a network adjusted with and without the said measurement until no standardised residuals test above a specified sigma threshold. This course of action is recommended for further research and development.

6.2  Reliability Tests

In a continuation of the method proposed by Marshall [23] a Monte Carlo based simulation was executed on the "cleaner" data set obtained in the above section. The test was conducted by adding random uniformly distributed artificial gross errors in the range Δi [1000mm, 1000mm] to 100 randomly selected measurements in the data set across 1000 passes. The results from each pass were analysed in Matlab using the code in Appendix D which classified the residuals of each adjustment into four groups:

  1. True negative: No artificial bias was added and measurement was not flagged as an outlier.
  2. True positive: Artificial bias was added and measurement was flagged as an outlier.
  3. False negative: Artificial bias was added and measurement was not flagged as an outlier.
  4. False positive: No artificial bias was added and measurement was flagged as an outlier.

The true negatives are displayed in Figure 12, the false positives are displayed in Figure 13 and the false negatives are displayed in Figure 14. Note that the rate of false positives is relatively low except for standardised residuals near the sigma threshold which identifies that there is a concentration of marginally detectable errors in the simulation and thus implies the propensity for reliability errors in the NSW network.

A visual comparison of the true negatives and false negatives in Figures 12 and 14 would indicate that the distribution of the false negatives with respect to the value of the artificial gross error is reflected in observations not present in the true negatives, where the latter would ideally display a flat uniform distribution over the entire spectrum of gross errors. The shape of the distribution of false negatives demonstrates that, above a "noise floor" of an approximate failure rate of detection of between 5% and 10%, the failure rate of detection increases sharply once the magnitude of the bias drops below 0.1 metres. Between a bias magnitude of 0.1 and 0.2 metres the failure rate of detection does also demonstrably increase relative to the range outside ± 0.2m.

Chart of outliers correctly identified as errors

Figure 12: Correct detection of simulated gross error vs the scale of the error in metres

Histogram of outliers that did not have artificial errors added to the network

Figure 13: Frequency of measurements incorrectly flagged as containing an error (false positives). The x-axis indicates the absolute difference between the original standardised residual and the flagged standardised residual for each measurement.

Histogram of measurements with artificial errors that were not flagged as outliers. Frequency vs Bias.

Figure 14: Frequency of measurements not flagged when an error exists (false negatives) with respect to the magnitude of the artificial error introduced on each measurement.

6.2.1  Localisation of False Negatives

The localisation of the propensity for failure to detect gross errors as outliers has not been addressed in this thesis. However, this would be a viable avenue as a topic of further research. Perhaps the network neighborhood aggregation discussed in Chapter 7 could be of use in this area.

7  Datum Network Segmentation

As postulated by Marshall [22] there is little known about the pre-analysis of robust estimators such as the L1 norm.

One solution to reduce the overhead time complexity of an L1 adjustment is to reduce the problem size (or dimension) by dividing the problem into a series of sub-problems that have the property that the union of the outliers detected across all sub-problems is equivalent to the outliers detected in the initial problem.

The problem of dividing a network of measurements in to a set of sub-networks (this term will be interchangeable with the term segments from here-on) is made difficult by the constraint mentioned above, which in the context of this problem can be expanded into the following:

Every measurement contained in the original network that has redundancy is contained in at least one of the sub-networks with redundancy.

This thesis will investigate the efficacy of dividing the network by the detection of a basis of cycles (or loops) of measurements and to group these cycles into segments based on the overlapping observations shared between cycles.

The hypothesis of correctness of the above proposal lies in the property that we wish to preserve, that is, the set of outlier observations. The magnitude of the error of these outliers is not as important as whether or not these observations are flagged as such in all cases. In order to test this hypothesis, a series of simulations will be run in order to ascertain the statistical likelihood of correctness. These simulations will consist of simple permutations of outliers in across permutations of network geometries.

7.1  Cycle Detection

The problem of detecting cycles lies in the properties of the cycles we wish to obtain. Since a complex network can contain many more cycles than there are edges (or measurements) in the network, we wish to obtain a minimally spanning cycle basis for the network in order to effectively minimise the number of adjustments performed. This task can be complicated depending on how much we want to minimise this cycle basis.

The chosen cycle detection algorithm was achieved using the following basic process:

  • Find a spanning forest in the network.
  • For each tree in the forest
    • For each back-edge not in the spanning tree
      • Find the smallest path joining the start and end nodes of the back-edge using A* search. This path cannot contain the back-edge.

7.2  Search Algorithms

It was desired that a set of cycles be obtained from a network such that:

  • Every measurement in the network that can belong to a cycle does belong to at least one cycle
  • The cycles must be as small as possible.

The cycles are then grouped to ensure high commonality between measurements and stations in a segment.

7.3  The A search algorithm

The most effective and commonly-used search algorithm for scenarios defined by a single goal or target in a search space that can always allow a reasonable (and admissible) method of predicting the distance from a potential solution candidate to the goal state is the A (pronounced A-star) search. Initially proposed in 1968 by Nils Nilsson [citation needed] as a faster algorithm to select paths for robot navigation, the A search has persisted in popularity due to much the same reason as Dantzig’s simplex method - the good average-case computational time complexity and the real-world nature of the conditions under which it can be applied.

A is effectively an iterative state-based path-traversal search whereby a queue of possible solution candidates at various points in the search space is maintained and ordered by the cost. This cost is calculated as the sum of the cost (or distance) the candidate has traversed already plus an estimate or heuristic of the minimum cost that is expected to be incurred for this candidate to reach the goal state.

In the context of a cycle search in a geodetic network, this search algorithm can be applied using the following conditions and metrics:

  • Nodes are stations in the geodetic network, and edges are the measurements.
  • Each edge is required to belong to at least one cycle, hence every edge will incur a cycle search for that edge.
  • The start node is the first node of the edge being searched. The goal node is the second node of the same edge.
  • The edge being cycle-searched is removed from the set of edges that can be selected as candidates in the A* search.
  • The additional cost of a traversal is the Euclidean distance ||ΔX|| from the last node to the next node.
  • The admissible heuristic is the straight line distance from the last node in the candidate to the goal node.

The algorithm for this search is summarised in pseudo-code in 15. This algorithm was effectively implemented in the program LDNet. See Appendix F for a C++ implementation of the below pseudocode algorithm.

1:  procedure CYCLESEARCH(e, G)
2:   Let Q be an ascending-cost-sorted queue of edges remaining to be searched
3:   Let C contain a list of edges that have been searched already
4:   Remove e from G
5:   eprevious END
6:   Enqueue e onto Q with cost MAXCOST
7:   while Q contains edges to be searched do
8:   d dequeue(Q)
9:   if d.node2 = e.node2 then
10:   return Path traced by dprevious until END is reached.
11:   end if
12:   for All adjoining edges h G, h C do
13:   hprevious d
14:   hpathlength dpathlenth + hlength
15:   hcost hpathlenth + distance(h.node2, e.node2)
Path cost plus estimated remaining distance as the admissible heuristic

16:   Q enqueue(h) by hcost
17:   end for
18:   end while
19:  end procedure

Figure 15: Adaptation of the A* search to finding the shortest cycle using back-edge e in a network graph G using a distance heuristic

7.4  Grouping Cycles

Initial testing of the L1 adjustment was done on networks of naively grouped cycles. This yielded results that confirmed that the algorithm performed correctly but ultimately did not provide an easily identifiable way of analysing the resultant residuals of the measurements in the network as a whole.

It was hypothesised that by intelligently grouping a subset of measurements from the whole network, an L1 norm analysis of these measurements could yield reliable information on the residuals of a subset of these measurements. The author proposes that for each measurement oxyz in the whole network G, (i.e. for oxyz G, an L1 adjustment on a neighbourhood of measurements V about oxyz of appropriate depth would yield reliable residuals for oxyz relative to the whole network. See Appendix G of the thesis for an implementation of the aggregation algorithm in C++.

The algorithmic robustness of neighbourhood aggregation would be compromised if a station has a particularly high degree (number of measurements to it). This phenomenon exists in the provided set of GNSS baseline data, particularly around TS12042 which has more than 30,000 measurements to it. To allow the simplex method to solve within reasonable time, the number of measurements should not exceed 1000 per segment. Therefore in such cases it may be necessary to further split the data into overlapping neighborhood partitions by taking a comb of measurements at stations with a degree more than 500. When testing for outliers in each neighborhood partition it is required that there are no false negatives, therefore a rigorous method must be devised for this neighborhood partitioning.

7.5  Future Research

As noted earlier, testing of internal reliability on a fixed size neighborhood relative to the whole network did not commence during the term of research. Should another researcher be interested in pursuing this avenue then all available source code will be provided by the author.

8  Concluding Remarks

The application of fast linear solvers to the L1 minimisation problem in a large network has demonstrated the viability of such a device in the quality analysis stage of datum determination. It is not difficult to see the automation possibilities to achieve near-real-time datum quality analysis. As noted by Haasdyk et al [12] to achieve the measurement density required for a dynamic datum the use of crowd-sourced survey data is almost inevitable and as such a rigorous quality analysis system would be required as a first level assessment to provide immediate feedback to both the contributing and receiving parties.

8.1  Further Development

Currently the software developed for this thesis is capable of processing GNSS baselines but there is no support for terrestrial observations. The latter would require the implementation of ellipsoidal model equations as described in Harvey’s Monograph 13[14] in addition to support for a geoid-ellipsoid separation description such as AusGeoid09[9].

Further to the above, the process of iteratively down-weighting or removing outliers by fitting the Laplace distribution via the Pearson Chi-Square test is, at present, performed manually. With potentially thousands of outliers in a set of hundreds of thousands or millions of measurements it would be wise to automate this procedure by the specification of decision parameters for outlier removal.

Lastly, multi-epoch support should be addressed in the context of a dynamic datum. Given the propensity for localised deformation on the Australian plate[12] the datum definition could itself be used as a large-scale monitoring survey to identify marks that have potentially moved between consecutive epochs. The likelihood of movement of marks would be statistically tested and the basis for such tests would depend on the quality of surrounding measurements at the local level.

8.2  Further Research

The result of initial internal reliability testing revealed the potential for a significant proportion of type 1 and type 2 errors in the identification of outliers as measurement errors. As such it is recommended that further testing of internal reliability should be conducted on the network geometry of the NSW network of class 2A0 marks. Of particular relevance is the reduction of the rate of false negative flagging in areas of very low redundancy.

Although GNSS baseline vector components are correlated and this was considered during the term of this thesis, it was beyond the scope of this thesis to investigate the effect of this correlation with respect to L1 norm minimisation in detail.

In general, the network geometry and redundancy still primarily determine the ability to detect gross errors in survey networks. As demonstrated by the internal reliability testing in this thesis, even with sufficient redundancy there is still a chance that gross errors will not be localised exactly using an L1 adjustment. However, when compared with other existing methods such as the Baarda test[1] in least squares, the same techniques on an L1 adjusted data set provide significant information that when used in conjunction with existing data snooping methods can greatly assist the process of gross error identification. Part III APPENDIX

A.  Glossary

A.1  Glossary of terms

Depending on academic background, some terminology used in this document may not be immediately familiar to the reader. Since this document spans both aspects of computer science, mathematical graph theory and traditional survey concepts, there are some terms that have different meanings in different contexts and some others that can only be adequately explained using the most rigorous mathematical terminology. The author has made an attempt to describe all terms in this section with references and examples in the relevant areas of study.

A.1.1  Basis

In a linear system, the basis is the set of linearly independent vectors upon which all vectors in the system are represented componentwise.

A.1.2  Cycle

In graph theory, a cycle is specifically a path through a graph (or survey network) in which the start node (or point) is also the end node. This is often called a loop in the context of a survey network, however the term loop in graph theory specifically refers to a single edge (or measurement) that links a single node to itself. This latter property is never exhibited in a survey network.

A.1.3  Over-determined

As a basic property of linear algebra: If A is n × n, there exists exactly one feasible solution for z. However, typically A is n × u and hence the system of equations Ax b is over-determined, which means that there is no single solution for z, but there exists as many feasible solutions as there are degrees of freedom in A, that is n u, where u is the number of parameters in the system.

A.1.4  Convexity and a Convex Set

An aspect of a geometric figure that is a requirement of the solution space of the Simplex method that can be mathematically determined. In a vector space, convexity can be visualised by the following statement: A vector space is convex if there does not exist any pair of vectors between which a straight line passes outside the bounds of the set.

A.1.5  Sentinel Value

In computer programming, a sentinel value is a special value whose presence guarantees termination of a loop that processes structured (especially sequential) data. In more complex algorithms it is also a term used to describe a variable that is used to decide the flow control of the execution based on certain values.

A.1.6  Norm

A Norm is a type of measure or length. Typically it is used in the context of the measurement of the size of a vector or matrix. In the case of the L1 Norm of a geodetic network, the set being normed is the set of residuals.

A.1.7  Degeneracy

A linear program is said to be degenerate or to become degenerate when basic variables of zero value can result in pivot operations for which there is no improvement in the objective value. This would then cause the next pivot operation to use

A.1.8  Cycling

A linear program can encounter conditions whereby it will run indefinitely and never reach an optimal solution. This occurs when a particular set of basic variables is encountered more than once. Since linear programming is deterministic, the pivot operations on a particular basis will always be the same, hence encountering a basis more than once means that the program will "cycle" through the same sequence of sets of basic variables in an infinite loop.

A.1.9  Degree

In a graph, the degree of a node refers to the number of edges adjoining that node.

A.1.10  Non-sparsity

In the context of a network (or graph) of vectors, the author has chosen the term Non-Sparsity to represent the likelihood of a single gross error in a measurement near stations of very high degree being hidden or spread across the residuals of many neighbouring measurements.

A.1.11  Hyper-cube

An n-dimensional cube. The defining characteristic when considering a hyper-cube problem-space of a linear-programming problem is that each constraint does not bypass any other constraint and that the correct pivot choice is often not the first one chosen by the simplex method, hence increasing computational time complexity.

A.1.12  Undirected Graph

The mathematical representation of a collection of nodes and adjoining edges whereby edges do not have a direction of flow, i.e. paths that are traversed in this graph can flow in either direction across an edge.

A.1.13  Spanning Tree

A spanning tree is a single connected tree in a graph, rooted at any node, whereby every node in the graph is visited once and only once by a path connecting the root node to that node. It is thus a property of a spanning tree that the spanning tree contains no cycles. A graph whereby a path can be found between any pair of nodes is connected, and thus a single spanning tree can be found for this graph.

B.  Affine Scaling Solver in Matlab

H.  LDNet I/O Formats

H.1  Input Measurements

The below is an example of a single measurement input to LDNet.

I.  Duality Tables and Equivalences

Primal Dual

Minimize Primal Objective Maximize Dual Objective
Objective Coefficients RHS of Dual
RHS of Primal Objective Coefficients
Coefficient Matrix Transposed Coefficient Matrix
Primal Relation: Dual Variable:
(ith) Inequality: yi 0
(ith) Inequality: yi 0
(ith) Equation: = yi unrestricted in sign
Primal Variable: Dual Relation:
xj 0 (jth) Inequality:
xj 0 (jth) Inequality:
xj unrestricted in sign (jth) Equation: =

Table 6: Correspondence Rules Between Primal and Dual LPs from Dantzig and Thapa [8] page 131.


Variables x 1 0 xn 0 Relation Constants

y1 0 a11 a1n b1
y1 0 am1 amn bm

Max v

Constants c1 cn
Min z

Table 7: Tucker Diagram from Dantzig and Thapa [8] page 130.



[1]    W. Baarda. A Testing Procedure for Use in Geodetic Networks. Number v. 2, no. 5 in A testing procedure for use in geodetic networks. Netherlands Geodetic Commission, 1968. ISBN 9789061322092. URL

[2]    I. Barrodale and F. D. K. Roberts. Solution of an overdetermined system of equations in the l1 norm [f4]. Commun. ACM, 17(6):319–320, June 1974. ISSN 0001-0782. doi: 10.1145/355616.361024. URL

[3]    R.L. Branham. Scientific data analysis: an introduction to overdetermined systems. Springer-Verlag, 1990. ISBN 9780387972015. URL

[4]    W.F. Caspary and J.M. Rueger. Concepts of Network and Deformation Analysis. Monograph (University of New South Wales. School of Surveying). School of Surveying, The University of New South Wales, 1987. ISBN 9780858390447. URL

[5]     I. Castillo and E. Barnes. Chaotic behavior of the affine scaling algorithm for linear programming. SIAM Journal on Optimization, 11(3):781–795, 2001. doi: 10.1137/S1052623496314070. URL

[6]    M. Cen, Z. Li, X. Ding, and J. Zhuo. Gross error diagnostics before least squares adjustment of observations. Journal of Geodesy, 77(9): 503–513, 2003. ISSN 0949-7714. doi: 10.1007/s00190-003-0343-4. URL

[7]    G. Dantzig. Linear programming and extensions. A Rand Corporation research study. PRINCETON University Press, 1965. ISBN 9780691059136. URL

[8]    George B Dantzig and Mukund N Thapa. Linear programing: 1: Introduction, volume 1. Springer, 1997.

[9]    WE Featherstone, JF Kirby, Christian Hirt, MS Filmer, SJ Claessens, NJ Brown, Guorong Hu, and GM Johnston. The ausgeoid09 model of the australian height datum. Journal of Geodesy, 85:133–150, 2011.

[10] Common ellipsoid parameters, May 2013. URL

[11]    Michael Grant and Stephen Boyd. CVX: Matlab software for disciplined convex programming, version 2.0 beta., September 2013.

[12]    Joel Haasdyk and Craig Roberts. Monitoring station movement using a state-wide simultaneous adjustment of everything - implications for a next-generation australian datum. In Proceedings of International Global Navigation Satellite Systems Society IGNSS Synopsysm 2013, page 15, July 2013.

[13]    Joel Haasdyk and Tony Watson. Data-mining in nsw: Towards a new and improved australian datum. In Proceedings of Association of Public Authority Surveyors Conference, pages 85–102, March 2013.

[14]    B. R. Harvey. Practical least squares and statistics for surveyors. School of Surveying, University of New South Wales, Kensington, N.S.W. :, 1st ed. edition, 2011. ISBN 0858390590.

[15]    Bruce R Harvey. Survey network adjustments by the l1 method. Australian Journal of Geodesy, Photogrammetry and Surveying, 59:39–52, 1993.

[16]    Narendra Karmarkar. A New Polynomial-Time Algorithm for Linear Programming. Combinatorica, 4:302–311, 1984. doi: 10.1007/BF02579150.

[17]    SamwelSimon Katambi, Guo Jiming, and Kong Xiangyuan. Applications of graph theory to gross error detection for gps geodetic control networks. Geo-spatial Information Science, 5: 26–31, 2002. ISSN 1009-5020. doi: 10.1007/BF02826471. URL

[18]    A. Khodabandeh and A. Amiri-Simkooei. Recursive algorithm for l1 norm estimation in linear models. Journal of Surveying Engineering, 137(1):1–8, 2011. doi: 10.1061/(ASCE)SU.1943-5428.0000031. URL

[19]    V. Klee and G. J. Minty. How Good is the Simplex Algorithm? In O. Shisha, editor, Inequalities III, pages 159–175. Academic Press Inc., New York, 1972.

[20]    D.C. Lay. Linear Algebra and Its Applications. World student series. Addison-Wesley Longman, Incorporated, 1997. ISBN 9780201824780. URL

[21]    Andrew Makhorin. Gnu linear programming kit, version 4.53, 2013. URL

[22]    J. Marshall. L1-norm pre-analysis measures for geodetic networks. Journal of Geodesy, 76(6-7): 334–344, 2002. ISSN 0949-7714. doi: 10.1007/s00190-002-0254-9. URL

[23]    John Marshall and James Bethel. Analysis of residuals from l 1 norm estimation. ISPRS Archives, pages 38–43, 1996.

[24]    John Marshall and James Bethel. Basic concepts of l 1 norm minimization for surveying applications. Journal of surveying engineering, 122(4):168–179, 1996. doi: 10.1061/(ASCE)0733-9453(1996)122:4(168). URL

[25]    S. Mehrotra. On the implementation of a primal-dual interior point method. SIAM Journal on Optimization, 2(4):575–601, 1992. doi: 10.1137/0802028. URL

[26]    A.J. Pope, Geodetic Research, Development Laboratory, National Geodetic Survey (U.S.), and National Ocean Survey. The statistics of residuals and the detection of outliers. NOAA technical report. U.S. Dept. of Commerce, National Oceanic and Atmospheric Administration, National Ocean Survey, Geodetic Research and Development Laboratory, 1976. URL

[27]    Chris Rizos. Principles and practice of gps surveying, January 2000. URL

[28]    Conrad Sanderson. Armadillo c++ linear algebra library, June 2013. URL

[29]    Brian Tolman, R. Benjamin Harris, Tom Gaussiran, David Munton, Jon Little, Richard Mach, Scot Nelsen, and Brent Renfro. The GPS Toolkit: Open Source GPS Software. In Proceedings of the 16th International Technical Meeting of the Satellite Division of the Institute of Navigation, Long Beach, California, September 2004.

[30]    Henry Wolkowicz. C&o 350 linear programming course notes, fall semester 1995, 1995. URL

[31]     Y. Zhang. Primal-dual interior point approach for computing l-1-solutions andl l-infinity-solutions of overdetermined linear systems. Journal of Optimization Theory and Applications, 77: 323–341, 1993. ISSN 0022-3239. doi: 10.1007/BF00940715. URL

1 Tested on a subset of the NSW LPI data of 195 measurements and 92 stations

2 Gnu Linear Programming Kit

3 Tested on dual model formulation (see Chapter 5) using entire NSW LPI Data set

4 Based on Mehrotra’s[25] Primal-Dual interior point method for linear programming

1 For further information on the formulation of the Jacobian matrix A, see Harvey’s Monograph 13[14].

2 by referring to random error, this excludes systematic error or bias which can skew a least squares solution as we know from practice.

3 Algorithms such as the Simplex algorithm are made much simpler when all parameters are non-negative.

4 The notation x± is used throughout this thesis to represent the pair x+ 0 and x 0. Whenever x± is referred to in an equation, relation or assignment such as xi± = bi this is symbolically equivalent to xi+ = max(0, bi) and xi = max(0, bi).

1 Such as GLPK, discussed later in this chapter

2 As noted in repeated experiments by the author, Mehrotra’s primal-dual method does not often converge on small-medium (n>50) problems using the primal formulation in Equation 17. Large problems always resulted in numerical instability.

3 Before continuing, it should be noted that there is a large amount of theory behind this approach that will not be covered in this thesis. For reference, the very readable Linear Programming Volume 1 by Dantzig and Thapa [8] contains all of the relevant definitions and theorems and some very good examples on duality theory.