# Gross Error Detection Using The ${L}_{1}$ Norm Method In A Large Survey Data Set

## NSW LPI GNSS Vectors - Central West Network Segments

A subset of an L1 adjustment on the LPI NSW GNSS network is displayed to the right.

Click on a baseline in the map to see residual information.

If the map does not appear, refresh the page or contact the webmaster

For a complete kml of the NSW GNSS network as provided by LPI NSW click here.

Note that all measurements in the kml are precise only to eight significant figures and thus should not be relied upon for any geodetic surveys.

## Abstract

This dissertation investigates computational methods based on the minimisation of the ${L}_{1}$ norm of residuals for the identiﬁcation of outlier observations in a datum network comprised of GNSS baselines. The Barrodale & Roberts[2] algorithm based on Dantzig’s[7] Simplex method was examined in depth and the computational complexity issues that arose from large data sets necessitated an investigation into a network segmentation-based compromise and the application of interior point solvers for linear programming in the minimisation of the ${L}_{1}$ norm of residuals in very large data sets. The dual of the ${L}_{1}$ norm minimiser primal model was ﬁnally investigated and successfully applied to yield a solution in thirty seconds using an implementation of Mehrotra’s[25][21] Primal-Dual Interior Point Solver as opposed to more than six hours for the same problem model on the same hardware using a simplex solver. Further analysis of the segmentation-based solution was not deemed necessary once a fast interior point method was successfully applied.

## 1Introduction

### 1.1Literature Review

Gaussian parametric weighted least squares (WLS) is still the most widely known method of obtaining both a best ﬁt for a survey network adjustment and reliable quality parameters for that ﬁt. In the context of a very large set of measurements such as that for a national datum, where thousands of parameters are to be estimated, WLS is still the ﬁrst choice for obtaining coordinate and quality parameters. However, when the volumes of data are large and the quality of the meta-data is sometimes uncertain, the result from a WLS adjustment can be equally uncertain. The analyst will often be required to employ supplementary quality control techniques to assist in cleaning a set of data.

Of the existing methods for the detection of outliers, the ${L}_{1}$ norm minimiser using both simplex[2] and an interior point[8] algorithms was assessed for the ability to detect real and simulated outlier observations in large data sets. This was compared against the other more conventional least squares Baarda test[1] and misclose analysis as explored by Katambi et al[17].

The mechanism of the simplex method as applied to a ${L}_{1}$ minimiser in geodetic networks is explained in Chapter4 followed by a discussion on an adaptation of an interior-point optimisation algorithm[8][31] which in practice can be more computation-time eﬃcient on large sets of data but has the unfortunate aspect of being less numerically stable[5].

#### 1.1.1Problem Scope

The ongoing eﬀort to source historical geodetic measurement data from archives[13] for the next Geodetic Datum of Australia has resulted in a large set of measurement data containing 66,000+ GPS baseline measurements that will estimate the Earth-Centred, Earth Fixed (ECEF) positions of 20,000+ class A survey marks. Characteristic gross errors in measurements have been encountered and continue to arise as outliers in existing data as new data is added[12].

A more rigorous pre-processing step prior to the ﬁnal least squares adjustment of the GPS baseline network was requested to reduce the repetitive aspect of individually testing each outlier as it appears in a large adjustment. The author trialled several methods for simplifying the problem and succeeded with a residual analysis technique using a proposed adjustment-per-measurement method that approximates redundancy in the whole network with a "redundancy neighbourhood" of ﬁxed depth from the measurement in question. The method is detailed in Chapter 7.

#### 1.1.2Reliability

The focus of an argument for using ${L}_{1}$ Norm minimisation for outlier detection hinges on reliability. As discussed by Baarda[1], Pope[26] and subsequently many others[27][15], the internal reliability of a model design is measured by how large an error must be before it is detectable. As Pope notes[26]:

$\cdots \phantom{\rule{0.3em}{0ex}}$ the traditional objections to least-squares adjustments of triangulation; that they cannot be relied upon to localize the error and instead "Spread it around" $\cdots \phantom{\rule{0.3em}{0ex}}$

which identiﬁes two critical aspects in data snooping:

1. The initial detection of an error
2. The localisation of the error.

While an examination of measurement cycle miscloses will reveal the existence of most gross errors, the localisation of these errors requires more than a misclose analysis. This is explored in Chapter 6.

#### 1.1.3Previous Research

Although the ${L}_{1}$ norm minimiser has often been suggested as an auxiliary quality control step [15][14][18][24][22][24] there is currently no widely known commercial solution for dealing with large networks of survey data.

Harvey[15] also notes the application of ${L}_{1}$ norm minimisation to identify non-measurement errors in a survey adjustment such as datum defect (also known as rank defect). This results from the robustness of the simplex method in optimising an under-determined system of linear equations[8].

### 1.2Research Aims

This thesis covers three major areas in the problem of ﬁnding errors in input observations for a geodetic datum using ${L}_{1}$ Norm estimators as described by Cen et al [6] and Marshall [22]. These are

1. Provide a solid theoretical background for newcomers to the ${L}_{1}$ norm minimisation model.
2. Evaluate the time performance of various ${L}_{1}$ norm minimiser algorithms on survey data.
3. Find a well-balanced practical solution to the computational time intractability issue arising in algorithm performance on large data sets such that the desired output information on measurement residuals is preserved.

The theoretical basis for L1 norm is a large area which encompasses the mathematical model, the various computational methods and the proposed application of the output. This task will be broken down into the following subsections

• The formulation of a linear program of the ${L}_{1}$ norm minimiser.
• A worked example using modiﬁed simplex method analogous to Barrodale and Roberts’ ${L}_{1}$ algorithm[2][3].
• The same example problem using a simpliﬁed interior point method based on Dikin’s aﬃne scaling algorithm[8][31].

The performance in time, correctness and convergence of the below solvers was tested. The source of the data is the complete set of NSW GNSS baselines provided by LPI unless otherwise stated. See footnotes1 2 3 4 for references.

Software Solver Type Time Complexity $O\left(n\right)$ Time Taken
 Barrodale and Roberts FORTRAN [2 ][3 ]
Primal Simplex Exponential Very slow1: > 1 second
Mathematica Simplex Exponential Slow1: > 1 second
Mathematica Primal-Dual IP Polynomial Fast1: < 0.01 seconds
Matlab CVX[11] Primal-Dual IP Polynomial Fast1: < 0.01 seconds
Dikin Aﬃne Scaling[8][18] Primal Aﬃne IP Not known Did not converge1
GLPK2[21] Simplex Exponential 6 hours3
GLPK[21] Primal-Dual IP4 Polynomial 30 seconds3

 Table 1: Performance of all of the tested linear programming solvers using ${L}_{1}$ models examined in this thesis.

The objective of research in this thesis was to ﬁnd a ${L}_{1}$ norm minimiser algorithm to operate on a very large data set where the number of parameters $m$ exceeds $10,000$. Therefore the very real problem of computational time intractability was examined. Prior to ﬁnding an acceptable solution using GLPK, a proposal for a segment-based solution with the intention to preserve the validity of the output ${L}_{1}$ norm residuals with respect to the whole of a large data set was investigated in Chapter 7. The naïve approach was to evaluate a complete subset of data from the larger set with the hope that all of the information on residuals (observed - calculated) relative to the subset is completely relevant to the original large data set. This approach was abandoned once the 30 second solver was found and applied but the investigation and the work on ﬁnding a complete method of segmenting the network was preserved for the possibility of future research.

#### 1.2.1Assumed Knowledge

It is assumed that the reader is familiar with the principles of weighted least squares as applied to a geodetic network adjustment, with speciﬁc knowledge of the formulation of the Jacobian matrix $A$ and the $O-C$ (observed minus calculated) vector $b$. More importantly, it is assumed that the reader understands the deﬁnition of a residual with respect to a least squares solution. For more information, the recommended reference text is Harvey’s Monograph 13[14]. A brief review of this material is provided in the early sections of chapter3

## 2Outliers and Gross Errors in Geodetic Survey Networks

### 2.1Appearance of Outliers in Existing Data

The appearance of outlier measurements in a network are often in one or more of the following forms:

• Wrong standpoint errors are where observations are taken from a witness mark near a trig station but this detail is missing from meta-data.
• Height booking errors describe incorrectly booked instrument height meta-data.
• Station name errors are measurements where one or both of the station numbers in the meta-data are incorrect, causing either
1. Skew in the network as a function of the measurement length (similar to wrong standpoint errors).
2. Separation of the station into two nearby points that therefore reduces redundancy and weakens the network at that point
• Deformation of the network over consecutive epochs.
• Incorrect ambiguity resolution where range estimates to satellites are incorrect in all coordinate components. A common cause is multi-path bias which is localised to individual stations.

The eﬀect of gross errors on a least squares adjustment has been shown to "spread" the error across two or more measurements when there is enough redundancy to do so[14]. In the case where gross errors can be small in magnitude relative to the magnitude of the measurement, such as a height booking error that is incorrect by 100mm over a GPS baseline of 50 km, it is possible for a signiﬁcant proportion of that error to be spread over the residuals of several neighbouring measurements.

#### 2.1.1Examples

Haasdyk et al[13][12] note examples of incorrect ambiguity resolution, where the length of the range to satellite(s) is therefore incorrect, and aﬀects all coordinate components. These manifested in very small (sub-metre and often sub-decimetre) gross errors that are sometimes undetectable in constrained local-neighborhood adjustments that rely on a "perfect" Class 2A0 control network.

In a related issue, multiple processing of the same observations necessitated by observing outliers when ﬁtting a local neighborhood adjustment to a larger neighborhood using a wider control network often resulted in several diﬀerent "versions" of the same baseline that are signiﬁcantly diﬀerent[13]. When using such measurements in a very large unconstrained network adjustment, the ability to "choose" the correct measurement is compounded by the quality of the surrounding network.

Observations with station name errors were sometimes easy to spot when the measurement appeared as a signiﬁcant outlier that traversed an unlikely distance. For example in Figure 1 the measurements highlighted in blue clearly shows measurements where one or more station names have been taken from the incorrect trigonometric station, often several hundred kilometres away.

In contrast, a station naming error that does not refer to an existing station but instead "creates" a new, nearby station in the adjustment can be diﬃcult to identify in a large adjustment. For example, in Figure 2 the station TS5110 appears as two stations — one at the apex of most measurements and the lower line is drawn terminating in its nominated location at TS5110-1. When cross-referencing with existing station names it appears that it is equivalent to TS5110-2.

Other examples related to the above as noted by Haasdyk in correspondence and in [13] are:

 Marks Comment Renamed TS5458 = TS12033-3 Same mark SS36065 = TS6418-4 Same mark Replaced TS5458 $\to$ TS12033 2mm apart (Destroyed) TS10533 $\to$ TS10985 20mm apart

### 2.2Review of Outlier Detection Methods

Caspary [4] notes that there is a diﬃculty in using the least squares vector of residuals $\stackrel{̂}{v}=v-{Q}_{v}{P}_{i}{\Delta }_{i}=v-{q}_{v}{p}_{i}{\Delta }_{i}$ (where ${\Delta }_{i}$ is the gross error on the ${i}^{th}$ measurement) in the detection of gross errors as outliers because the whole vector of residuals is contaminated by ${\Delta }_{i}$ . Roughly speaking, given a large system of parameters where the likelihood of gross error contamination is high, the number of marginally detectable errors can also be proportionally high.

The sensitivity of an adjustment in detecting outliers is referred to as the "internal reliability" [27]. As stated by Rizos [27], the internal reliability of an adjustment is quantiﬁed by the set of marginally detectable errors. A large set of MDEs indicates low internal reliability, and conversely the smaller the set of MDEs, the higher the internal reliability.

A common method of outlier detection in a least squares adjustment is Baarda’s data snooping method [27]. This method assumes the quality of the measurements is known reliably and, more importantly, it assumes that only one outlier exists in the set of observations. In everyday use on small networks this method works well because there are generally relatively few outliers present. However, when considering large and highly interconnected networks, this data-snooping method can sometimes fail due to bias introduced by outliers on neighbouring measurements.

Another commonly used method is Pope’s Tau test, which is primarily used when measurement quality is not reliably known. Instead of normalising residuals using a priori weights, the Tau test ﬁrst computes the standardised residuals using weights ${s}_{{v}_{i}}$ derived from the variance co-factor ($VCV$) matrix of the residuals computed from a least squares adjustment once it has converged. Each standardised residual is

${u}_{i}=\frac{{v}_{i}}{{s}_{{v}_{i}}}$

where ${s}_{{v}_{i}}$ is ${\sigma }_{{v}_{i}}$ scaled by the square root of the variance factor of the adjustment, which is

${s}_{{v}_{i}}=\sqrt{{\sigma }_{{v}_{i}}}×\sqrt{\frac{{v}^{T}Pv}{n-u}}$

where ${\sigma }_{{v}_{i}}$ is the square root of the ${i}^{th}$ diagonal entry in the co-factor matrix of the residuals ${Q}_{v}$.

The Tau test then compares each normalised residual with the Tau value

${\tau }_{\frac{{\alpha }_{0}}{2},n-u}$

where

${a}_{0}\approx 1-\left(1-\alpha {}^{\right)}\frac{1}{n-u}$

#### 2.2.1An Alternate Test for Outliers

The need for a robust method to detect more than one outlier in a large set of data, as justiﬁed above, has already been addressed in the form of an alternative adjustment. Rather than use the ${L}_{2}$ norm, that is the square root of the sum of squares of residuals, researchers such as Branham[3] argue that a statistically more likely method of revealing outlier observations reliably is by minimising the ${L}_{1}$ norm of the residuals, that is the sum of the absolute values of the vector of residuals

The choice of the ${L}_{1}$ norm minimiser over the least squares (that is, the ${L}_{2}$ norm) minimiser is precisely because it does not spread the magnitude of the residuals over the neighbouring measurements as evenly as possible. This poses the question: Why does the ${L}_{1}$ norm minimiser not spread residuals? The answer can be roughly explained by looking at the eﬀect of each on large and small residuals.

Say in a loop of height measurements, one measurement ${h}_{1}$ has a gross error of $-2m$. All measurements have an expected variance of $±100mm$ and neighboring measurements have no gross errors. Mathematical method aside, a least squares adjustment would "consider" the magnitude of a single residual ${v}_{{h}_{1}}=\left(-2{}^{\right)}2=4$ alone versus the possibility of assuming that this error magnitude was divided equally between ${h}_{1}$ and a neighboring measurement ${h}_{2}$ such that the error in each is assumed to be ${v}_{{h}_{1}}={h}_{1}-E\left({h}_{1}\right)=-1m$ and ${v}_{{h}_{2}}={h}_{2}-E\left({h}_{2}\right)=+1m$. In the latter case the ${L}_{2}$ norm is

$\begin{array}{lll}\hfill {v}_{{h}_{1}}^{2}+{v}_{{h}_{2}}^{2}=\left(-1{}^{\right)}2+\left(1{}^{\right)}2=1+1=2& \phantom{\rule{2em}{0ex}}& \hfill \end{array}$

which is clearly smaller than the ﬁrst case. A least squares minimiser will not stop there though - it would then consider spreading the error in ${h}_{1}$ and ${h}_{2}$ to their neighboring measurements, and so on until all error has been spread as evenly as possible such that the ${L}_{2}$ norm is at a minimum.

#### 2.2.2Penalising Large Residuals

In the discipline of mathematical optimisation this phenomenon where many smaller residuals are preferred over few large residuals would be referred to as penalising large residuals. This is precisely what least squares does as a by-product of obtaining a best ﬁt because it assumes that there are no outliers in the data. However, data containing several outliers necessitates a minimiser that does not penalise residuals by magnitude.

Using the above example, say an ${L}_{1}$ norm minimiser "considers" the $2m$ error as ${v}_{{h}_{1}}={h}_{1}-E\left({h}_{1}\right)=2$ and all other residuals as negligible ${v}_{{h}_{i}}<𝜖$, $i\ne 1$. It then considers spreading the error to a neighboring measurement ${h}_{2}$ such that ${v}_{{h}_{1}}+{v}_{{h}_{2}}=2$. The ${L}_{1}$ norm of the latter is

$|{v}_{1}|+|{v}_{2}|\ge |{v}_{1}+{v}_{2}|=|2|=2$

Here we can see that "spreading" the error across two residuals will result in a ${L}_{1}$ norm that is greater than or equal to the ${L}_{1}$ norm of restricting the error to the residual of one measurement. Hence, any ${L}_{1}$ norm minimiser would prefer the case where the residual error is assigned to as few measurements as possible.

The next chapter will elaborate on the analysis of the residuals of an ${L}_{1}$ norm minimiser with some example using geodetic observations to illustrate the relevance to this problem.

## 3Estimator Models

The introduction to estimator models is discussed in Chapter 3 of the thesis.

### 3.3The ${L}_{1}$ Norm Estimator

#### 3.3.1Sampling Distribution of ${L}_{1}$ Norm residuals

Monte Carlo experiments conducted by Marshall[23] indicate quite clearly that the sampling distribution of ${L}_{1}$ norm residuals is not the Gaussian (normal) distribution but instead exhibits a very large spike at the ﬁrst moment. Branham[3] states that this sampling distribution corresponds to a modiﬁed exponential (or alternatively, Laplace) probability density function

$P\left(X\right)=\frac{1}{2}h{e}^{-h|X|}$

This is a modiﬁed form of the exponential distribution, which in general form is

$P\left(X;\mu ,\sigma \right)=\frac{1}{\sqrt{2}\sigma }exp\left(-\sqrt{2}\frac{|X-⟨X⟩|}{\sigma }\right)$

For $h=1$ the function $P\left(X\right)$ is graphed in Figure 5.

By maximising the log density function $logP\left(X\right)$, the ﬁrst and second moments are

$\begin{array}{lll}\hfill {\mu }_{est}=E\left(X\right)& =median\phantom{\rule{2em}{0ex}}& \hfill \text{(7)}\\ \hfill {\sigma }^{2}=E\left({X}^{2}\right)& =\frac{\sqrt{2}}{N}\sum _{i=1}N|{X}_{i}-\mu |\phantom{\rule{2em}{0ex}}& \hfill \text{(8)}\end{array}$

The distortion of the ﬁrst moment with respect to outliers is restricted to movement by one "data point" rather than any proportion of the magnitude of the outlier. Assuming an equal distribution of positive and negative outliers and residuals, the ﬁrst moment will be zero. If the distribution of residuals of a ${L}_{1}$ minimiser happen to be biased such that they are mostly all positive or all negative, this would imply the model assumptions are incorrect. In a minimally constrained adjustment, this should not occur.

#### 3.3.2Minimally Constrained Models

As noted by Harvey[14], outlier detection is often done on a minimally constrained model. Ideally, no distance scale would be enforced by the measurements such that the resultant residuals would contain minimal bias. A minimally constrained adjustment is more likely to have an average error of zero.

Likewise, the ${L}_{1}$ minimiser that is used for quality analysis on a geodetic survey network could be applied successfully using a minimally constrained model.

#### 3.3.3Internal Reliability

Marshall[23] investigated the bounds of internal reliability (the relative magnitude of measurement error before it is statistically ﬂagged as an outlier) using Monte Carlo methods. By using a consistent a priori standard deviation on all measurements Marshall notes that the primary deviation in internal reliability is caused by variations in network geometry. In his horizontal network experiment of 6 stations (12 parameters) and 15 distance measurements he observed that across 100,000 Monte Carlo simulations where random bias was introduced into the measurements, one particular measurement (observation $#9$) was never ﬂagged as an outlier in an ${L}_{1}$ adjustment. Figure 6 below shows measurement $#9$ in Marshall’s trilateration network which clearly shows that there is enough redundancy for the four parameters (coordinates of stations $\Delta 1$ and $\Delta 5$) which it is estimating. It was the view of this author that there was perhaps another reason for the lack of outlier information output by the adjustment, which judging by the degrees of freedom (redundancy) in the network is likely due to there being not enough redundancy to adequately locate outliers for all measurements.

It is hypothesised by the author but not proven that the internal reliability of a network in the ${L}_{1}$ norm increases with an increase in redundancy. For measurements that can have a reverse or check measurement e.g. an EDM distance from ${\Delta }_{i}$ to ${\Delta }_{j}$ and another from ${\Delta }_{j}$ to ${\Delta }_{i}$ it is trivial to see a gross error in one of these measurements (although not trivial to know which measurement contains the gross error without further measurements). This presents the analyst with a dilemma: An adjustment on the whole network would likely have better redundancy than an adjustment on part of the network (given that the whole network is connected) but there are far more computations to perform in an adjustment of the whole network than a part of it. In fact, as discussed in chapter 4 the relationship between number of measurements and number of adjustment computations is in some cases exponential, which is troublesome when the network is very large.

### 3.4Weighted ${L}_{1}$ Minimisation

Due to measurement noise there is always the need to weight the importance of a measurement based on an a priori assumption of precision. GPS vectors at the NSW LPI are typically given a standard deviation of $±$(10mm + 7ppm) for least squares adjustments. This a priori assumption is carried into ${L}_{1}$ norm models but the scheme for weighting the model diﬀers in the method.

In least squares, the method allows a priori weighting of observations by inserting the square of each weight ${\sigma }_{i}^{2}$ corresponding to measurement ${g}_{i}$ as entries in a diagonal $m×m$ matrix $Q$.

$Q=\left[\begin{array}{cccc}\hfill {\sigma }_{1}^{2}\hfill & \hfill 0\hfill & \hfill \cdots \hfill & \hfill 0\hfill \\ \hfill 0\hfill & \hfill {\sigma }_{2}^{2}\hfill & \hfill \cdots \hfill & \hfill 0\hfill \\ \hfill ⋮\hfill & \hfill ⋮\hfill & \hfill \ddots \hfill & \hfill ⋮\hfill \\ \hfill 0\hfill & \hfill 0\hfill & \hfill \cdots \hfill & \hfill {\sigma }_{n}^{2}\hfill \end{array}\right]$

By taking $W={Q}^{-1}$

$\stackrel{̂}{x}=\left({A}^{T}WA{}^{\right)}-1{A}^{T}W\stackrel{\to }{b}$

In the ${L}_{1}$ model, the objective function is

Due to the non-linearity of the objective function, the formulation of the problem must instead rely on a linear program, and thus the inverse matrix of squared weights would be applied as described by Harvey[15] as follows

${W}^{\frac{1}{2}}\left(A\stackrel{̂}{x}+\stackrel{\to }{𝜖}\right)={W}^{\frac{1}{2}}\stackrel{\to }{b}$

### 3.6${L}_{1}$ Norm Minimisation Formulated as a Linear Program

The ﬁrst step in formulating the ${L}_{1}$ norm minimisation problem as a linear program is to recognise that the objective function $z=\sum |{v}_{i}|$ is not linear. To make it linear, and to also satisfy the non-negativity constraint that linear programs typically require3 the objective function can be reformulated to be

$\begin{array}{lll}\hfill z=\sum _{i=1}^{n}\left(u+v\right)=\sum _{i=1}^{n}u+\sum _{i=1}^{n}v& \phantom{\rule{2em}{0ex}}& \hfill \text{(11)}\end{array}$

Where a residual is $u-v$ such that $u\ge 0$ and $v\ge 0$. These constraints satisfy $u+v=|u-v|$.

The normal (or model) equation for each parameter would be the same formulation as for least squares problems but with the residual included as a model parameter. For example, a height diﬀerence would be

$\begin{array}{lll}\hfill {ĥ}_{i}={h}_{2}-{h}_{1}+{u}_{i}-{v}_{i}& \phantom{\rule{2em}{0ex}}& \hfill \end{array}$

Similarly for least squares, each element in the Jacobian matrix for the ${L}_{1}$ norm minimisation is a partial derivative of the model equation (in order to linearise the problem). Hence for the height example the Jacobian matrix is already linear, that is

$\begin{array}{llll}\hfill \left[\begin{array}{cccccc}\hfill \frac{\partial {ĥ}_{i}}{\partial {h}_{1}}\hfill & \hfill \frac{\partial {ĥ}_{i}}{\partial {h}_{2}}\hfill & \hfill \cdots \hfill & \hfill \frac{\partial {ĥ}_{i}}{\partial {u}_{i}}\hfill & \hfill \frac{\partial {ĥ}_{i}}{\partial {v}_{i}}\hfill & \hfill \cdots \hfill \\ \hfill ⋮\hfill & \hfill \hfill & \hfill \hfill & \hfill \hfill & \hfill ⋮\hfill \end{array}\right]& =\left[\begin{array}{cccccc}\hfill -1\hfill & \hfill 1\hfill & \hfill \cdots \hfill & \hfill 1\hfill & \hfill -1\hfill & \hfill \cdots \hfill \\ \hfill ⋮\hfill & \hfill \hfill & \hfill \hfill & \hfill \hfill & \hfill ⋮\hfill \end{array}\right]\phantom{\rule{2em}{0ex}}& \hfill & \phantom{\rule{2em}{0ex}}\end{array}$

For non-linear models a Taylor series of one or two terms is usually suﬃcient to achieve convergence given good enough a priori coordinates or parameters [14].

#### 3.6.1Error Estimates

Since the use of the ${L}_{1}$ norm adjustment in the context of this thesis is to highlight outlier observations, the question on whether to weight the observations prior to the adjustment must be speciﬁcally asked with regards to the desired outcome. In a GPS baseline network, there will exist very short lines of only a few metres and very long lines of hundreds of kilometers.

The separate issue of weighting the residuals should always be done according to the expected quality of the observations. An ${L}_{1}$ norm adjustment will typically be done prior to a least squares adjustment, and so the scaled residuals that are an output of "tuning" a least squares adjustment will often be unavailable to the analyst. A parts-per-million component is often the a crucial normaliser for residuals because of the large variation in accumulated error over long distances, for example the scale in long GNSS baseline vectors.

### 3.7Example: Similarities Between Loop Mis-Close and ${L}_{1}$ Norm Residuals

As stated already, the beneﬁt of the ${L}_{1}$ norm minimisation is primarily the analysis of residuals. In a survey network adjustment with equally weighted observations (all $\sigma =1$) a direct comparison of the residuals to loop mis-closes yields strong similarities.

Using an example of just height data from a NSW RMS survey on the Sydney Anzac Bridge (see Figure 7), the results obtained from B. Harvey’s FIXIT4 of ${L}_{1}$ raw residuals versus mis-closes is shown in Table 3. It can be seen that in this example, the loop mis-closes and the magnitude of the ${L}_{1}$ residuals for particular observations on the loops co-incide, speciﬁcally the residuals $|2.3|$mm and $|1.8|$mm on lines 2-3 and 5-2 respectively.

 From To dH (m) sdv (mm) $v$ (mm) Hei 1 3 17.5863 1.5 0.0 5 3 25.7134 1.5 0.0 2 3 18.1929 1.5 -2.3 2 4 18.2125 1.5 0.0 5 4 25.7353 1.5 0.0 5 2 7.5246 1.5 -1.8

 Table 2: Height diﬀerence observations of RMS survey(correspondence with Harvey) with ${L}_{1}$ residuals $v$

 Mis-close Height loop 4.1mm 2 3 5 2 1.8mm 2 4 5 2 -2.3mm 2 4 5 3 2

 Table 3: Loop mis-closes of RMS height network

An ${L}_{1}$ minimiser identiﬁes the measurements which have the propensity to introduce the largest residuals and excludes those measurements from the computation of the parameters. In this example height network, these residuals have the same magnitude as loop mis-closes because there are no correlations between input observations and input standard deviations are being ignored.

### 3.8Linear Program Formulation of the ${L}_{1}$ Norm Estimator

Continuing from Equation 11, each model equation takes the form

 ${W}^{\frac{1}{2}}A\stackrel{\to }{x}-{W}^{\frac{1}{2}}b-\stackrel{\to }{v}=0$ (12)

Note that $\stackrel{\to }{x}$ and $\stackrel{\to }{v}$ are vectors of parameters any of which can be positive or negative. This results in the problem being unbounded in the negative direction and as such this formulation needs modiﬁcation before it can be used in a linear solver. By taking two positive variables $u,w\ge 0$ we can say that each of the residual parameters can represented by $v=u-w$. This also applies to the coordinate parameters $x={x}^{+}-{x}^{-}$, ${x}^{+},{x}^{-}\ge 0$. By generalising, with coordinate parameters represented by ${\stackrel{\to }{x}}_{P}^{±}$ and residual parameters represented by ${\stackrel{\to }{x}}_{R}^{±}$, a linearised formulation of the ${L}_{1}$ norm estimator is

This is very similar to the formulation in Equations $3.24$ on page $71$ in Dantzig et al [7] except that the basic and non-basic variables ${\stackrel{\to }{x}}_{B}$ and ${\stackrel{\to }{x}}_{N}$ have been replaced with the parameters and residuals ${\stackrel{\to }{x}}_{P}^{±}$ and ${\stackrel{\to }{x}}_{R}^{±}$ respectively.4

Note also that the objective function has the coeﬃcients eﬀectively switched, so that instead of

$\begin{array}{lll}\hfill z& =0{\stackrel{\to }{x}}_{R}^{±}+{\stackrel{\to }{c}}^{T}{\stackrel{\to }{x}}_{P}^{±}\phantom{\rule{2em}{0ex}}& \hfill \text{(13)}\\ \hfill & \phantom{\rule{2em}{0ex}}& \hfill \text{(14)}\end{array}$

we have

This is an explicit formulation of a linear program but it is still solvable by any variation of Dantzig’s simplex methods (primal, revised, primal-dual etc) and by interior point methods for linear programming problems (explored in Chapter 4).

## 4Algorithms For The ${L}_{1}$ Norm Estimator

This chapter will begin with Section 4.1 describing the Simplex algorithm followed by an example demonstrating how it works on survey height data. Subsequently the concept of interior point methods will be introduced and applied to the same example.

### 4.1Search-based Simplex Algorithm

Use the standard form of the ${L}_{1}$ problem from Chapter 3, replicated below:

To re-iterate, the notation ${\stackrel{\to }{x}}_{P}^{±},{\stackrel{\to }{x}}_{R}^{±}$ refers to the strictly positive nature of the parameters. This is especially pertinent in non-linear problems where at each iteration the delta $x$ is estimated and is often negative. When a coordinate (or delta) has a negative value, the parameter ${x}_{i}^{-}$ contains the absolute value and ${x}_{i}^{+}=0$.

A search must always start somewhere, and in a linear program (LP) this is the canonical form.

#### 4.1.1Canonical Form and Basic Variables

As noted by Dantzig[8] the primal simplex method ﬁnds the optimal solution by a sequence of pivot steps from the original system of equations. Initialisation of the algorithm is achieved by converting the problem from standard form to canonical form. This is where the current solution is a basic feasible solution and:

1. The objective coeﬃcients to all the basic variables are zero, that is, the objective function contains only non-basic variables;
2. The constraint matrix for the basic variables form an identity matrix (with some permutation if necessary). It’s easy to see whether or not the current Basic Feasible Solution (BFS) is optimal or not, if the LP is in canonical form

The objective function

$z={\stackrel{\to }{c}}^{T}\stackrel{\to }{x}={\stackrel{\to }{e}}^{T}{\stackrel{\to }{x}}_{R}^{+}+{\stackrel{\to }{e}}^{T}{\stackrel{\to }{x}}_{R}^{-}+0{\stackrel{\to }{x}}_{P}^{+}+0{\stackrel{\to }{x}}_{P}^{-}$

and constraints

$\begin{array}{llll}\hfill {B}^{T}\stackrel{\to }{x}& =\left[\begin{array}{ccccccc}\hfill I\hfill & \hfill ⋮\hfill & \hfill -I\hfill & \hfill ⋮\hfill & \hfill A\hfill & \hfill ⋮\hfill & \hfill -A\hfill \end{array}\right]\left[\begin{array}{c}\hfill {\stackrel{\to }{x}}_{R}^{+}\hfill \\ \hfill {\stackrel{\to }{x}}_{R}^{-}\hfill \\ \hfill {\stackrel{\to }{x}}_{P}^{+}\hfill \\ \hfill {\stackrel{\to }{x}}_{P}^{-}\hfill \end{array}\right]=b\phantom{\rule{2em}{0ex}}& \hfill & \phantom{\rule{2em}{0ex}}\end{array}$

permits the initial basic feasible solution

$\begin{array}{llll}\hfill {\stackrel{\to }{x}}_{P}^{±}& =\stackrel{\to }{0}\phantom{\rule{2em}{0ex}}& \hfill & \phantom{\rule{2em}{0ex}}\\ \hfill {\stackrel{\to }{x}}_{R}^{±}& =b-A\stackrel{\to }{x}=\stackrel{\to }{b}\phantom{\rule{2em}{0ex}}& \hfill & \phantom{\rule{2em}{0ex}}\\ \hfill z& =\sum \underset{R}{\overset{+}{x}}+\sum \underset{R}{\overset{-}{x}}\phantom{\rule{2em}{0ex}}& \hfill & \phantom{\rule{2em}{0ex}}\end{array}$

#### 4.1.3Exponential Computational Time Complexity

Before continuing, it is necessary to discuss the computational time complexity of the algorithm proposed by Barrodale and Roberts[2] and the simplex method in general.

Brieﬂy, computational time complexity of an algorithm is generally assessed with regards to the rate of growth of computational time with respect to the size of the input. This time complexity is typically described using three metrics - minimum time or best-case time complexity $f\left(n\right)=\Omega \left(g\left(n\right)\right)$, average time complexity $f\left(n\right)=\Theta \left(g\left(n\right)\right)$ and worst-case time complexity $f\left(n\right)=○\left(g\left(n\right)\right)$. $f\left(n\right)=\Omega \left(g\left(n\right)\right)$ represents the lower bound function for the run time of an algorithm $f\left(n\right)$ where $n$ denotes the size of the input (typically an array). $f\left(n\right)=\Theta \left(g\left(n\right)\right)$ represents the run time of an algorithm $f\left(n\right)$ when averaged over all permutations of an input of size $n$. This metric is a good indication of how the algorithm will behave in the real world using real measurements, however we should always consider the worst-case time complexity $f\left(n\right)=○\left(g\left(n\right)\right)$ where $g\left(n\right)$ is the upper bound of the rate of growth of computational time of $f\left(n\right)$. This latter metric describes just how poorly the algorithm can perform in the worst case.

When considering the simplex method to solve for the ${L}_{1}$ Norm estimate as proposed by Barrodale and Roberts[2] and its application in this thesis in ﬁnding outliers in a large set of GPS observations, the size of the input will always be large (over 100,000 measurements). Thus if the rate of growth of computational time in Barrodale and Robert’s ${L}_{1}$ method [2] is non-linear, it is of great importance to consider time complexity.

An unfortunate aspect of the simplex method on which the Barrodale and Roberts ${L}_{1}$ algorithm is based is, while it performs well in most cases, there exists a set of inputs which make the algorithm run in exponential time. Using the notation described above, the average case $\Theta \left(g\left(n\right)\right)$ is polynomial and often linear but the worst-case $○\left(g\left(n\right)\right)$ is exponential, i.e. $g\left(n\right)={a}^{bn}$ for some constants $a$ and $b$. Exponential time complexity was shown by Klee and Minty [19] to exist in cases where a hyper-cube of constraints can lead to ${2}^{d}-1$pivots where $d$ is the dimension of the hyper cube. See the Appendix A.

#### 4.1.4The Dilemma

As was raised brieﬂy in Chapter 3, there is a desire for the analyst to preserve as much redundancy as possible in order to increase the likelihood of properly detecting all gross errors in an adjustment. Since the problem of exponential time complexity is very real in the application of Dantzig’s simplex method on large sets of data (as experienced by the author in practice) this presents a dilemma.

To balance the desire for maximum redundancy with the desire for reasonable computation time, two approaches can be taken. The ﬁrst is to ﬁnd a way to reduce the size of an adjustment whilst preserving maximum redundancy. The second approach is to ﬁnd a way to reduce the time complexity of the algorithm used to perform the adjustment. The next section and Chapter 5 address the latter approach, and Chapter 7 attempts to address the former. The two approaches need not be exclusive.

### 4.2Interior Point Algorithms

Recent developments in the ﬁeld of optimisation and operations research have yielded several new methods for solving linear programs other than Dantzig’s[7] simplex method. Since the publication of Klee and Minty’s [19] examples of worst-case exponential computational time, researchers have made developments in the ﬁeld of ellipsoid algorithms employed by Leonid Khachiyan in 1979 that run in worst-case polynomial time. Although in the average case the simplex method far outperforms early ellipsoid methods, the idea of iteratively moving through the interior of a problem space to converge at the optimal solution at a boundary yielded a ground-breaking result less than 30 years ago.

In 1984 Karmarkar published a paper on an interior point method[16] where feasible solutions are ’projected’ from within a convex solution space to the best feasible solution. The worst-case computational time was determined to be $○\left({n}^{3.5}\right)$, and that in practice it often performed faster than the simplex method on large sparse problems.

The method described below is is an earlier interior point algorithm known as Dikin’s Aﬃne Scaling Interior Point Method[8], ﬁrst published by Dikin in 1967 with a proof of convergence in 1974. This algorithm does not possess the convergence properties of Karmarker’s algorithm but does demonstrate the concept of the central projections.

### 4.3Dikin’s Primal Aﬃne Method

From [8] the following algorithm summarises the Primal Aﬃne Method for ﬁnding an interior point solution to a linear program.

Assume that an initial feasible interior point solution $\stackrel{\to }{x}={\stackrel{\to }{x}}_{0}>0$ satisfying $A{\stackrel{\to }{x}}_{0}=\stackrel{\to }{b}$ is known. The steps of the algorithm are as follows:

1. Initialize Counter: Set $t←0$.
2. Create $D$: Set $D=Diag\left({\stackrel{\to }{x}}_{t}\right)$.
3. Compute Centering Transformation: Compute $Â=AD,ĉ=D\stackrel{\to }{c}$.
4. Determine the Projection Matrix: Compute $P=I-{A}^{T}\left(A{A}^{T}{}^{\right)}-1A$.
5. Compute Steepest Descent Direction: Set ${\stackrel{\to }{p}}^{t}=-P\stackrel{\to }{c}$.
6. Set $\Theta =-mi{n}_{j}{p}_{j}^{t}$.
7. Test for Unbounded Objective. If $\Theta \le 0.0$ report the objective as being unbounded and stop.
8. Obtain ${\stackrel{\to }{x}}^{t+1}$: Compute
${\stackrel{̂}{x}}^{t+1}=\stackrel{\to }{e}+\left(\frac{\alpha }{\Theta }\right){\stackrel{\to }{p}}^{t}$

where $\stackrel{\to }{e}=\left[1,1,\cdots \phantom{\rule{0.3em}{0ex}},1{}^{\right]}T$ and $\alpha$ is set to a number strictly between 0 and 1. Typically, $\alpha$ is set to be between 0.5 and 0.95 for primal aﬃne methods (in most implementations it is set between 0.9 and 0.95).

9. Transform Back to Original Space: Compute ${\stackrel{\to }{x}}^{t+1}=D{\stackrel{̂}{x}}^{t+1}$.
10. Termination Check: If ${\stackrel{\to }{x}}^{t+1}$ is "close" to ${\stackrel{\to }{x}}^{t}$, report ${\stackrel{\to }{x}}^{t+1}$ as "close" to optimal and stop.
11. Set $t←t+1$ and go to Step 2.

#### 4.3.1Initialisation

An interior point method such as the aﬃne scaling method described above requires an initial solution on which to begin iterating. As the reader may notice in step 3 the centering transform uses the initial solution to project the entire solution space around the point $⟨1,1,\cdots \phantom{\rule{0.3em}{0ex}},1⟩$. Since zero values in the initial solution could not be projected in such a way (and also values on a boundary of the solution space, that is a parameter that lies on a constraint hyperplane) the initial solution must be interior to the solution space.

An interior point is not found trivially, and as such methods have been developed to ﬁnd an interior point. One such method used here is the Big M method, which adds an additional variable ${x}_{M}$ to the problem $Ax=b$ such that a new problem ${A}^{\prime }{x}^{\prime }=b$ is solved. ${A}^{\prime }$ is simply $A$ with an extra column of coeﬃcients for ${x}_{M}$. The objective function coeﬃcient of ${x}_{M}$ is a very large value denoted $M$ (the namesake of the method). This large value of $M$ acts as a penalty such that the solver will try to reduce the value of this variable as much as possible in order to minimise the impact of $M{x}_{M}$ on the objective value (which in the ${L}_{1}$ norm problem is the sum of absolute values of the residuals).

The coeﬃcients of ${x}_{M}$ in ${A}^{\prime }$ are calculated such that, given an initial point of ${x}^{\prime }=⟨1,1,\cdots \phantom{\rule{0.3em}{0ex}},1⟩$, all of the constraints are satisﬁed. As such, the

$\begin{array}{llll}\hfill {A}^{\prime }{x}^{\prime }& =b\phantom{\rule{2em}{0ex}}& \hfill & \phantom{\rule{2em}{0ex}}\\ \hfill \left[A⋮{A}_{M}\right]\left[\begin{array}{c}\hfill x\hfill \\ \hfill \cdots \hfill \\ \hfill {x}_{M}\hfill \end{array}\right]& =b\phantom{\rule{2em}{0ex}}& \hfill & \phantom{\rule{2em}{0ex}}\\ \hfill {A}_{M}{x}_{M}& =b-Ax\phantom{\rule{2em}{0ex}}& \hfill & \phantom{\rule{2em}{0ex}}\\ \hfill & =b-A{⟨1,\cdots \phantom{\rule{0.3em}{0ex}},1⟩}^{T}\phantom{\rule{2em}{0ex}}& \hfill & \phantom{\rule{2em}{0ex}}\end{array}$

By setting ${x}_{M}=1$, ${A}_{M}$ is easily calculated using the above relationship to obtain the new problem

#### 4.3.2Heights Example Using Aﬃne Scaling Solver

The Matlab implementation of the aﬃne solver in Appendix B was applied to the RMS heights example used earlier in this chapter. The formulation of the $A$ matrix and $b$ vector was the same as for the simplex method. The value of $M$ was set to 1200 for initialisation. The algorithm converged after 55 iterations, thus it was not practical to show the values of the parameters at each iteration in this section. However, the rate at which the parameters converged to the accepted values are visible in Figure 8 for the station heights and the residual parameters. It should be noted that this algorithm yielded a diﬀerent result for the residuals as noted in Table 5.

 Measurement 1 2 3 4 5 6 Residual (mm) 0 -1.2 1.2 0 0 1.8

 Table 5: Residuals of Aﬃne Scaling ${L}_{1}$ adjustment

It can be seen in Figure 9 the value of the Big M variable drops from 1 to nearly zero very quickly, demonstrating a correct initialisation of the problem and convergence to the initial problem. The stopping criteria used was the primal-dual gap (see Appendix B for the deﬁnition in Matlab). The convergence of the solution is accepted when the value of the primal-dual gap drops below a speciﬁed tolerance (usually 0.0001).

#### 4.3.3Implementation for Larger Problems

Clearly convergence after 55 iterations is not an acceptable solution for such a small problem. As an academic exercise, this method was applied to larger problems and exhibited similarly slow performance. Convergence was not achieved for a problem of $195$ GNSS measurements and $92$ stations (deﬁning a problem size of $n=595,m=276$) using this algorithm.

Dikin’s aﬃne scaling algorithm was implemented in C++ using the Armadillo[28] linear algebra library. A warm start was used whereby initial ECEF coordinates converted from geodetic coordinates (in GDA94) using Gpstk[29] and the ellipsoid parameters of GRS80[10]. ECEF coordinates were then translated to be centred at the apriori coordinates of the ﬁrst station in the adjustment. The $A$ matrix and $b$ vector were then normalised (scaled to approximately unity) using the following relationship

$\begin{array}{lll}\hfill \left(\frac{1}{\parallel A{\parallel }_{2}}A\right)\left(\frac{1}{\parallel b{\parallel }_{2}}\stackrel{\to }{x}\right)=\frac{1}{\parallel A{\parallel }_{2}\parallel b{\parallel }_{2}}\stackrel{\to }{b}& \phantom{\rule{2em}{0ex}}& \hfill \end{array}$

The author attempted to speed up the Aﬃne Scaling method by warm starting the solver with input coordinates and residuals that would be close in value to the ﬁnal solution. These were obtained from the input bath_syd_stn.xml ﬁle in GDA94 geodetic coordinates and converted to GDA94 ECEF Cartesian coordinates using the Position class in gpstk[29].

Although in small problems a warm start oﬀers no advantage over a cold start (whereby there is a phase I problem initialisation), in very large problems using a solver with polynomial time convergence properties a warm start can assist with providing a correct initialisation and reduce the phase I run time by a constant (usually between 0.3 and 0.5). For problems that can take hours to days to solve this time saving is signiﬁcant.

The next chapter will discuss an alternate formulation of the ${L}_{1}$ problem and will demonstrate the application of a faster solver with better convergence properties. Part II APPLICATIONS OF ${L}_{1}$ NORM MINIMISERS IN LARGE SURVEY NETWORKS

Further analysis of ${L}_{1}$ norm minimisation algorithms and results pertaining to survey networks and reliability

## 5More Formulations of the ${L}_{1}$ Minimiser

The standard form of the ${L}_{1}$ problem from Chapter 3, replicated below, is appropriate for some simplex-based solvers but suﬀers from numerical instability issues using many open-source interior point method solvers1 .

It is necessary to remodel the problem such that it is "easier"2 for an IPM solver to ﬁnd a solution. One approach is to ﬁnd a solution on the dual problem.3 By ﬁrst converting the primal problem from above to a form that allows negative parameters (as allowed by some solvers because of automatic remodeling), below

Note that the row constraints are no longer bound by a strict equality $=$. This allows for residuals that are negative to be speciﬁed exactly in one of a pair of complementary constraints, with the other permitting the former equality with a positive surplus variable. This means

$\begin{array}{llll}\hfill {a}_{i}{x}_{i}+{a}_{j}{x}_{j}+{r}_{k}& \ge {b}_{k}\phantom{\rule{2em}{0ex}}& \hfill & \phantom{\rule{2em}{0ex}}\\ \hfill -{a}_{i}{x}_{i}-{a}_{j}{x}_{j}+{r}_{k}& \ge -{b}_{k}\phantom{\rule{2em}{0ex}}& \hfill & \phantom{\rule{2em}{0ex}}\end{array}$

becomes

 $\begin{array}{cccc}\hfill {a}_{i}{x}_{i}+{a}_{j}{x}_{j}+{r}_{k}& \hfill +{s}_{k}\hfill & \hfill \hfill & ={b}_{k}\hfill \\ \hfill -{a}_{i}{x}_{i}-{a}_{j}{x}_{j}+{r}_{k}& \hfill \hfill & \hfill +{s}_{k}^{c}\hfill & =-{b}_{k}\hfill \\ \hfill \end{array}$ (19)

Subsequently, by using the Tucker Diagram (see Appendix I Table 7) the primal problem in Equation 18 can be converted to the dual problem

Note that, according to the rules deﬁned by the Tucker diagram (Appendix I Table 6), the $\ge$ inequality in the row constraints becomes a $y\ge 0$ lower bound for the dual variables. This problem is far "easier" to solve using Mehrotra’s primal-dual solver.

The above approach to remodeling the problem in the dual form, although mathematically correct, demonstrates nothing of the relevance of the dual formulation to the original problem in Equation 18. To see it another way, consider the constraints in the primal problem as being bounded from below by the measurements. The primal problem objective is to minimise the residual parameters such that this lower bound still holds. In the dual formulation we ﬁrst substitute every original parameter with ${x}_{i}=1$ and then scale each constraint $j$ by a parameter ${y}_{j}$ such that for each column (corresponding to the coeﬃcient ${x}_{i}$ in the primal) in the constraint matrix the scalar product of the coeﬃcients with ${y}_{1}\cdots {y}_{n}$ is bounded above by the corresponding coeﬃcient ${c}_{i}$ of ${x}_{i}$ in the objective function. It intuitively follows that by maximising the values ${b}_{j}{y}_{j}$ subject to the constraints ${a}_{1,i}{y}_{1}+\cdots +{a}_{n,i}{y}_{n}\le {c}_{i}$ this eﬀectively maximises the overall "utility" of the measurements whilst being constrained by the "utility" of each parameter.

As shown by Dantzig and Von Neumann[8], if the primal problem has a feasible solution then the dual also has a feasible solution and the value of the objective function of the dual problem $v$ in an optimal feasible solution is the same as the value of the objective function $z$ in the primal problem when it is optimal, i.e. $z=v$ for optimal solutions of the primal and dual. This is stated more formally in the Strong Duality Theorem.

Strong Duality Theorem 5.0.1. If the primal system min $z={c}^{T}x$, $Ax\ge b$, $x\ge 0$ has a feasible solution and the dual system max $v={b}^{T}y$, ${A}^{T}y\ge c$, $y\ge 0$ has a feasible solution, then there exist optimal feasible solutions $x={x}^{\ast }$ and $y={y}^{\ast }$ to the primal and dual systems such that .

It remains to determine how to recover the residuals from the optimal solution to the dual problem. In summary, by the Complementary Slackness Theorem (below) the reduced costs of the slack (or surplus) variables in the optimal solution to the dual problem are equal to the values of the slack variables in the optimal solution to the primal problem.

Complementary Slackness Theorem 5.0.1. Assume a primal problem ($P$) has a solution ${x}^{\ast }$ and the dual problem ($D$) has a solution ${y}^{\ast }$.

1. If ${x}_{j}>0$ in ($P$), then the ${j}^{th}$ constraint in ($D$) is binding (this means the slack variable in the dual is zero).
2. If the ${j}^{th}$ constraint in ($D$) is not binding, then ${x}_{j}=0$.
3. If ${y}_{i}>0$, then the ${i}^{th}$ constraint in ($P$) is binding.
4. If the ${i}^{th}$ constraint in ($P$) is not binding, then ${y}_{i}=0$.

By the exclusive slackness property of the primal problem as shown in Equation 19, assuming that one of the constraints has a zero-value surplus value (i.e. it is a binding residual constraint) and thus the positive value of the residual for that constraint is also binding, the complement constraint has a non-zero-valued surplus value of exactly the negative value of the binding residual in the ﬁrst constraint. By item $4$ in the Complementary Slackness Theorem, a non-zero positive surplus variable in the primal problem (i.e. the constraint is not binding) infers that the corresponding value of the variable in the dual problem is zero, which means it is not in the dual basis and thus has a non-zero reduced cost.

Now, when the dual solution is optimal, the value of all of these reduced costs will be positive or zero (by the Optimality Test Theorem). The cost of a basis exchange to add one of these variables with a positive reduced cost to the basis of the dual solution is precisely the magnitude of the non-zero-valued surplus variable in the corresponding row constraint in the primal problem. Put more simply, the surplus value representing the residual in the primal problem is exactly equal to the reduced cost of this surplus variable in the dual problem.

This has the side eﬀect that all residuals will be reported as absolute values by this method since the eﬀect of the exclusive slackness constraints 19 represents the familiar model for an absolute value in the dual problem. For the purposes of outlier detection, this is not an issue because only the magnitude of the normalised residual is considered in the outlier test.

### 5.1${L}_{1}$ Minimiser Model in the Gnu Linear Programming Kit

The Gnu Linear Programming Kit (GLPK)[21] is an open source library for linear programming that contains two separate solvers:

1. A simplex-method (SM) solver.
2. An interior-point-method (IPM) solver.

The SM solver has support for primal and dual problems. This is demonstrated to be very robust and resilient to numerical instability.

The IPM solver uses a modiﬁed implementation of Mehrotra’s primal-dual method for interior point problems[25]. It is more exposed to numerical instability than the simplex method as demonstrated by the problem in this chapter.

Both solvers have pre-processing stages that remodel the problem to a standard form. This allows the user to specify inequality constraints and negative bounds on parameters which is usually disallowed by the underlying algorithms. Extra constraints are added to the model to permit the problem speciﬁcation in this way, but care must be taken to appropriately specify a model such that it will have a feasible solution and is not unbounded. Using the above dual formulation of the ${L}_{1}$ problem on the entire set of more than 66,000 GNSS observations, GLPK was able to converge to an optimal solution in 30 seconds on a standard laptop computer.

## 6${L}_{1}$ Norm Analysis on NSW GNSS Network

The focus of an argument for using ${L}_{1}$ Norm minimisation for outlier detection hinges on reliability. As discussed by Baarda[1], Pope[26] and subsequently many others[27][15], the internal reliability of a model design is measured by how large an error must be before it is detectable. As Pope notes[26]:

$\cdots \phantom{\rule{0.3em}{0ex}}$ the traditional objections to least-squares adjustments of triangulation; that they cannot be relied upon to localize the error and instead "Spread it around" $\cdots \phantom{\rule{0.3em}{0ex}}$

which identiﬁes two critical aspects in data snooping:

1. The initial detection of an error
2. The localisation of the error.

While an examination of measurement cycle mis-closes will reveal the existence of most gross errors, the localisation of these errors requires more than a mis-close analysis.

### 6.1Residuals

The ﬁt of residuals to the Laplace probability density function (described in Chapter 3) was carried out in Mathematica using Pearson’s Chi-Square ﬁt test. The histogram of residuals for the initial set of observations is in Figure 10 below. Note that there is always a "spike" in the histogram at the zero level. This correlates exactly to the number of parameters in the basis $u$, with the remainder of the histogram representing the degrees of freedom $n-u$. By removing the residuals corresponding to all basis parameters the histogram of residuals more closely ﬁts the Laplace probability density function.

The Pearson Chi-Square ﬁt of the Laplace probability density function was calculated for the histogram of non-zero standardised residuals. The parameters of this distribution, with an assumed mean of zero, are $\mu =0,\beta =0.665766$ (or instead of $\beta$, the $h=\frac{1}{\beta }=1.502$ using the model deﬁned by Branham[3]). The sigma threshold was calculated at the 99% conﬁdence level on this distribution was solved from the cumulative density function

$\begin{array}{c}\begin{array}{cc}\hfill 1-{e}^{\genfrac{}{}{0.1ex}{}{1}{2}\left(-1.502\genfrac{}{}{0.1ex}{}{\stackrel{^}{v}}{\sigma }\right)}& =0.995\hfill \\ \hfill \stackrel{^}{v}& =3.06\sigma \hfill \end{array}\hfill \end{array}$

By performing a cursory removal of measurements demonstrating outlier behavior from the initial set of data in two passes the resultant reduced data set still contained outliers that "reappeared" after the ﬁrst two passes, but the maximum magnitude the standardised residuals ($\frac{|v|}{|s|}$) of outliers decreased from $±$181.2 to $±$8.6. The number of residuals above the sigma threshold of $\stackrel{̂}{v}=±3.06\sigma$ decreased from 4,486 to 1,698.

The histogram of "degrees of freedom only" residuals of the "cleaner" data set is visually no diﬀerent to that of the initial data set due to the sheer amount of data. Additionally, the parameters for the Laplace probability density function for these standardised residuals are essentially identical, which demonstrates that the outliers removed from the "dirty" data set imparted no bias on the network model in a minimally constrained ${L}_{1}$ adjustment. A visual comparison of the network graph between the initial data set and the "cleaner" set is in Figure 11.

To properly address outliers in such a large data set, it is advisable to construct an automated procedure to iteratively report on and test the measurement with the largest outlier on a network adjusted with and without the said measurement until no standardised residuals test above a speciﬁed sigma threshold. This course of action is recommended for further research and development.

### 6.2Reliability Tests

In a continuation of the method proposed by Marshall [23] a Monte Carlo based simulation was executed on the "cleaner" data set obtained in the above section. The test was conducted by adding random uniformly distributed artiﬁcial gross errors in the range ${\Delta }_{i}\in \left[-1000mm,1000mm\right]$ to 100 randomly selected measurements in the data set across 1000 passes. The results from each pass were analysed in Matlab using the code in Appendix D which classiﬁed the residuals of each adjustment into four groups:

1. True negative: No artiﬁcial bias was added and measurement was not ﬂagged as an outlier.
2. True positive: Artiﬁcial bias was added and measurement was ﬂagged as an outlier.
3. False negative: Artiﬁcial bias was added and measurement was not ﬂagged as an outlier.
4. False positive: No artiﬁcial bias was added and measurement was ﬂagged as an outlier.

The true negatives are displayed in Figure 12, the false positives are displayed in Figure 13 and the false negatives are displayed in Figure 14. Note that the rate of false positives is relatively low except for standardised residuals near the sigma threshold which identiﬁes that there is a concentration of marginally detectable errors in the simulation and thus implies the propensity for reliability errors in the NSW network.

A visual comparison of the true negatives and false negatives in Figures 12 and 14 would indicate that the distribution of the false negatives with respect to the value of the artiﬁcial gross error is reﬂected in observations not present in the true negatives, where the latter would ideally display a ﬂat uniform distribution over the entire spectrum of gross errors. The shape of the distribution of false negatives demonstrates that, above a "noise ﬂoor" of an approximate failure rate of detection of between 5% and 10%, the failure rate of detection increases sharply once the magnitude of the bias drops below 0.1 metres. Between a bias magnitude of 0.1 and 0.2 metres the failure rate of detection does also demonstrably increase relative to the range outside $±0.2m$.

#### 6.2.1Localisation of False Negatives

The localisation of the propensity for failure to detect gross errors as outliers has not been addressed in this thesis. However, this would be a viable avenue as a topic of further research. Perhaps the network neighborhood aggregation discussed in Chapter 7 could be of use in this area.

## 7Datum Network Segmentation

As postulated by Marshall [22] there is little known about the pre-analysis of robust estimators such as the ${L}_{1}$ norm.

One solution to reduce the overhead time complexity of an ${L}_{1}$ adjustment is to reduce the problem size (or dimension) by dividing the problem into a series of sub-problems that have the property that the union of the outliers detected across all sub-problems is equivalent to the outliers detected in the initial problem.

The problem of dividing a network of measurements in to a set of sub-networks (this term will be interchangeable with the term segments from here-on) is made diﬃcult by the constraint mentioned above, which in the context of this problem can be expanded into the following:

Every measurement contained in the original network that has redundancy is contained in at least one of the sub-networks with redundancy.

This thesis will investigate the eﬃcacy of dividing the network by the detection of a basis of cycles (or loops) of measurements and to group these cycles into segments based on the overlapping observations shared between cycles.

The hypothesis of correctness of the above proposal lies in the property that we wish to preserve, that is, the set of outlier observations. The magnitude of the error of these outliers is not as important as whether or not these observations are ﬂagged as such in all cases. In order to test this hypothesis, a series of simulations will be run in order to ascertain the statistical likelihood of correctness. These simulations will consist of simple permutations of outliers in across permutations of network geometries.

### 7.1Cycle Detection

The problem of detecting cycles lies in the properties of the cycles we wish to obtain. Since a complex network can contain many more cycles than there are edges (or measurements) in the network, we wish to obtain a minimally spanning cycle basis for the network in order to eﬀectively minimise the number of adjustments performed. This task can be complicated depending on how much we want to minimise this cycle basis.

The chosen cycle detection algorithm was achieved using the following basic process:

• Find a spanning forest in the network.
• For each tree in the forest
• For each back-edge not in the spanning tree
• Find the smallest path joining the start and end nodes of the back-edge using A* search. This path cannot contain the back-edge.

### 7.2Search Algorithms

It was desired that a set of cycles be obtained from a network such that:

• Every measurement in the network that can belong to a cycle does belong to at least one cycle
• The cycles must be as small as possible.

The cycles are then grouped to ensure high commonality between measurements and stations in a segment.

### 7.3The $A\ast$ search algorithm

The most eﬀective and commonly-used search algorithm for scenarios deﬁned by a single goal or target in a search space that can always allow a reasonable (and admissible) method of predicting the distance from a potential solution candidate to the goal state is the $A\ast$ (pronounced A-star) search. Initially proposed in 1968 by Nils Nilsson [citation needed] as a faster algorithm to select paths for robot navigation, the $A\ast$ search has persisted in popularity due to much the same reason as Dantzig’s simplex method - the good average-case computational time complexity and the real-world nature of the conditions under which it can be applied.

$A\ast$ is eﬀectively an iterative state-based path-traversal search whereby a queue of possible solution candidates at various points in the search space is maintained and ordered by the cost. This cost is calculated as the sum of the cost (or distance) the candidate has traversed already plus an estimate or heuristic of the minimum cost that is expected to be incurred for this candidate to reach the goal state.

In the context of a cycle search in a geodetic network, this search algorithm can be applied using the following conditions and metrics:

• Nodes are stations in the geodetic network, and edges are the measurements.
• Each edge is required to belong to at least one cycle, hence every edge will incur a cycle search for that edge.
• The start node is the ﬁrst node of the edge being searched. The goal node is the second node of the same edge.
• The edge being cycle-searched is removed from the set of edges that can be selected as candidates in the A* search.
• The additional cost of a traversal is the Euclidean distance $||\Delta X||$ from the last node to the next node.
• The admissible heuristic is the straight line distance from the last node in the candidate to the goal node.

The algorithm for this search is summarised in pseudo-code in 15. This algorithm was eﬀectively implemented in the program LDNet. See Appendix F for a C++ implementation of the below pseudocode algorithm.

1:  procedure CYCLESEARCH($e,G$)
2:   Let $Q$ be an ascending-cost-sorted queue of edges remaining to be searched
3:   Let $C$ contain a list of edges that have been searched already
4:   Remove $e$ from $G$
5:   ${e}_{previous}←END$
6:   Enqueue $e$ onto $Q$ with cost MAXCOST
7:   while $Q$ contains edges to be searched do
8:   $d←dequeue\left(Q\right)$
9:   if $d.nod{e}_{2}=e.nod{e}_{2}$ then
10:   return Path traced by ${d}_{previous}$ until $END$ is reached.
11:   end if
12:   for All adjoining edges $h\in G$, $h\ni C$ do
13:   ${h}_{previous}←d$
14:   ${h}_{pathlength}←{d}_{pathlenth}+{h}_{length}$
15:   ${h}_{cost}←{h}_{pathlenth}+distance\left(h.nod{e}_{2},e.nod{e}_{2}\right)$
$⊳$Path cost plus estimated remaining distance as the admissible heuristic

16:   $Q←enqueue\left(h\right)$ by ${h}_{cost}$
17:   end for
18:   end while
19:  end procedure

 Figure 15: Adaptation of the A* search to ﬁnding the shortest cycle using back-edge $e$ in a network graph $G$ using a distance heuristic

### 7.4Grouping Cycles

Initial testing of the L1 adjustment was done on networks of naively grouped cycles. This yielded results that conﬁrmed that the algorithm performed correctly but ultimately did not provide an easily identiﬁable way of analysing the resultant residuals of the measurements in the network as a whole.

It was hypothesised that by intelligently grouping a subset of measurements from the whole network, an ${L}_{1}$ norm analysis of these measurements could yield reliable information on the residuals of a subset of these measurements. The author proposes that for each measurement ${o}_{xyz}$ in the whole network $G$, (i.e. for ${o}_{xyz}\in G$, an ${L}_{1}$ adjustment on a neighbourhood of measurements $V$ about ${o}_{xyz}$ of appropriate depth would yield reliable residuals for ${o}_{xyz}$ relative to the whole network. See Appendix G of the thesis for an implementation of the aggregation algorithm in C++.

The algorithmic robustness of neighbourhood aggregation would be compromised if a station has a particularly high degree (number of measurements to it). This phenomenon exists in the provided set of GNSS baseline data, particularly around TS12042 which has more than 30,000 measurements to it. To allow the simplex method to solve within reasonable time, the number of measurements should not exceed 1000 per segment. Therefore in such cases it may be necessary to further split the data into overlapping neighborhood partitions by taking a comb of measurements at stations with a degree more than 500. When testing for outliers in each neighborhood partition it is required that there are no false negatives, therefore a rigorous method must be devised for this neighborhood partitioning.

### 7.5Future Research

As noted earlier, testing of internal reliability on a ﬁxed size neighborhood relative to the whole network did not commence during the term of research. Should another researcher be interested in pursuing this avenue then all available source code will be provided by the author.

## 8Concluding Remarks

The application of fast linear solvers to the ${L}_{1}$ minimisation problem in a large network has demonstrated the viability of such a device in the quality analysis stage of datum determination. It is not diﬃcult to see the automation possibilities to achieve near-real-time datum quality analysis. As noted by Haasdyk et al [12] to achieve the measurement density required for a dynamic datum the use of crowd-sourced survey data is almost inevitable and as such a rigorous quality analysis system would be required as a ﬁrst level assessment to provide immediate feedback to both the contributing and receiving parties.

### 8.1Further Development

Currently the software developed for this thesis is capable of processing GNSS baselines but there is no support for terrestrial observations. The latter would require the implementation of ellipsoidal model equations as described in Harvey’s Monograph 13[14] in addition to support for a geoid-ellipsoid separation description such as AusGeoid09[9].

Further to the above, the process of iteratively down-weighting or removing outliers by ﬁtting the Laplace distribution via the Pearson Chi-Square test is, at present, performed manually. With potentially thousands of outliers in a set of hundreds of thousands or millions of measurements it would be wise to automate this procedure by the speciﬁcation of decision parameters for outlier removal.

Lastly, multi-epoch support should be addressed in the context of a dynamic datum. Given the propensity for localised deformation on the Australian plate[12] the datum deﬁnition could itself be used as a large-scale monitoring survey to identify marks that have potentially moved between consecutive epochs. The likelihood of movement of marks would be statistically tested and the basis for such tests would depend on the quality of surrounding measurements at the local level.

### 8.2Further Research

The result of initial internal reliability testing revealed the potential for a signiﬁcant proportion of type 1 and type 2 errors in the identiﬁcation of outliers as measurement errors. As such it is recommended that further testing of internal reliability should be conducted on the network geometry of the NSW network of class 2A0 marks. Of particular relevance is the reduction of the rate of false negative ﬂagging in areas of very low redundancy.

Although GNSS baseline vector components are correlated and this was considered during the term of this thesis, it was beyond the scope of this thesis to investigate the eﬀect of this correlation with respect to ${L}_{1}$ norm minimisation in detail.

In general, the network geometry and redundancy still primarily determine the ability to detect gross errors in survey networks. As demonstrated by the internal reliability testing in this thesis, even with suﬃcient redundancy there is still a chance that gross errors will not be localised exactly using an ${L}_{1}$ adjustment. However, when compared with other existing methods such as the Baarda test[1] in least squares, the same techniques on an ${L}_{1}$ adjusted data set provide signiﬁcant information that when used in conjunction with existing data snooping methods can greatly assist the process of gross error identiﬁcation. Part III APPENDIX

## A.  Glossary

### A.1Glossary of terms

Depending on academic background, some terminology used in this document may not be immediately familiar to the reader. Since this document spans both aspects of computer science, mathematical graph theory and traditional survey concepts, there are some terms that have diﬀerent meanings in diﬀerent contexts and some others that can only be adequately explained using the most rigorous mathematical terminology. The author has made an attempt to describe all terms in this section with references and examples in the relevant areas of study.

#### A.1.1Basis

In a linear system, the basis is the set of linearly independent vectors upon which all vectors in the system are represented componentwise.

#### A.1.2Cycle

In graph theory, a cycle is speciﬁcally a path through a graph (or survey network) in which the start node (or point) is also the end node. This is often called a loop in the context of a survey network, however the term loop in graph theory speciﬁcally refers to a single edge (or measurement) that links a single node to itself. This latter property is never exhibited in a survey network.

#### A.1.3Over-determined

As a basic property of linear algebra: If A is $n×n$, there exists exactly one feasible solution for $z$. However, typically $A$ is $n×u$ and hence the system of equations $Ax\le b$ is over-determined, which means that there is no single solution for $z$, but there exists as many feasible solutions as there are degrees of freedom in $A$, that is $n-u$, where $u$ is the number of parameters in the system.

#### A.1.4Convexity and a Convex Set

An aspect of a geometric ﬁgure that is a requirement of the solution space of the Simplex method that can be mathematically determined. In a vector space, convexity can be visualised by the following statement: A vector space is convex if there does not exist any pair of vectors between which a straight line passes outside the bounds of the set.

#### A.1.5Sentinel Value

In computer programming, a sentinel value is a special value whose presence guarantees termination of a loop that processes structured (especially sequential) data. In more complex algorithms it is also a term used to describe a variable that is used to decide the ﬂow control of the execution based on certain values.

#### A.1.6Norm

A Norm is a type of measure or length. Typically it is used in the context of the measurement of the size of a vector or matrix. In the case of the L1 Norm of a geodetic network, the set being normed is the set of residuals.

#### A.1.7Degeneracy

A linear program is said to be degenerate or to become degenerate when basic variables of zero value can result in pivot operations for which there is no improvement in the objective value. This would then cause the next pivot operation to use

#### A.1.8Cycling

A linear program can encounter conditions whereby it will run indeﬁnitely and never reach an optimal solution. This occurs when a particular set of basic variables is encountered more than once. Since linear programming is deterministic, the pivot operations on a particular basis will always be the same, hence encountering a basis more than once means that the program will "cycle" through the same sequence of sets of basic variables in an inﬁnite loop.

#### A.1.9Degree

In a graph, the degree of a node refers to the number of edges adjoining that node.

#### A.1.10Non-sparsity

In the context of a network (or graph) of vectors, the author has chosen the term Non-Sparsity to represent the likelihood of a single gross error in a measurement near stations of very high degree being hidden or spread across the residuals of many neighbouring measurements.

#### A.1.11Hyper-cube

An $n$-dimensional cube. The deﬁning characteristic when considering a hyper-cube problem-space of a linear-programming problem is that each constraint does not bypass any other constraint and that the correct pivot choice is often not the ﬁrst one chosen by the simplex method, hence increasing computational time complexity.

#### A.1.12Undirected Graph

The mathematical representation of a collection of nodes and adjoining edges whereby edges do not have a direction of ﬂow, i.e. paths that are traversed in this graph can ﬂow in either direction across an edge.

#### A.1.13Spanning Tree

A spanning tree is a single connected tree in a graph, rooted at any node, whereby every node in the graph is visited once and only once by a path connecting the root node to that node. It is thus a property of a spanning tree that the spanning tree contains no cycles. A graph whereby a path can be found between any pair of nodes is connected, and thus a single spanning tree can be found for this graph.

## H.  LDNet I/O Formats

### H.1Input Measurements

The below is an example of a single measurement input to LDNet.

## I.  Duality Tables and Equivalences

 Primal Dual Minimize Primal Objective Maximize Dual Objective Objective Coeﬃcients RHS of Dual RHS of Primal Objective Coeﬃcients Coeﬃcient Matrix Transposed Coeﬃcient Matrix Primal Relation: Dual Variable: (${i}^{th}$) Inequality: $\ge$ ${y}_{i}\ge 0$ (${i}^{th}$) Inequality: $\le$ ${y}_{i}\le 0$ (${i}^{th}$) Equation: $=$ ${y}_{i}$ unrestricted in sign Primal Variable: Dual Relation: ${x}_{j}\ge 0$ (${j}^{th}$) Inequality: $\le$ ${x}_{j}\le 0$ (${j}^{th}$) Inequality: $\ge$ ${x}_{j}$ unrestricted in sign (${j}^{th}$) Equation: $=$

 Table 6: Correspondence Rules Between Primal and Dual LPs from Dantzig and Thapa [8] page 131.

Primal
 Variables ${x}_{1}\ge 0$ $\cdots \phantom{\rule{0.3em}{0ex}}$ ${x}_{n}\ge 0$ Relation Constants ${y}_{1}\ge 0$ ${a}_{11}$ $\cdots \phantom{\rule{0.3em}{0ex}}$ ${a}_{1n}$ $\ge$ ${b}_{1}$ Dual $⋮$ $⋮$ $\ddots$ $⋮$ $⋮$ $⋮$ ${y}_{1}\ge 0$ ${a}_{m1}$ $\cdots \phantom{\rule{0.3em}{0ex}}$ ${a}_{mn}$ $\ge$ ${b}_{m}$ Relation $\le$ $\cdots \phantom{\rule{0.3em}{0ex}}$ $\le$ $\le$ Max $v$ Constants ${c}_{1}$ $\cdots \phantom{\rule{0.3em}{0ex}}$ ${c}_{n}$ $\ge$ Min $z$

 Table 7: Tucker Diagram from Dantzig and Thapa [8] page 130.

________________________________________________________________

## Bibliography

[1]    W. Baarda. A Testing Procedure for Use in Geodetic Networks. Number v. 2, no. 5 in A testing procedure for use in geodetic networks. Netherlands Geodetic Commission, 1968. ISBN 9789061322092. URL http://books.google.com.au/books?id=4Ho-SwAACAAJ.

[2]    I. Barrodale and F. D. K. Roberts. Solution of an overdetermined system of equations in the l1 norm [f4]. Commun. ACM, 17(6):319–320, June 1974. ISSN 0001-0782. doi: 10.1145/355616.361024. URL http://doi.acm.org/10.1145/355616.361024.

[3]    R.L. Branham. Scientiﬁc data analysis: an introduction to overdetermined systems. Springer-Verlag, 1990. ISBN 9780387972015. URL http://books.google.com.au/books?id=USLvAAAAMAAJ.

[4]    W.F. Caspary and J.M. Rueger. Concepts of Network and Deformation Analysis. Monograph (University of New South Wales. School of Surveying). School of Surveying, The University of New South Wales, 1987. ISBN 9780858390447. URL http://books.google.com.au/books?id=i-W5SAAACAAJ.

[5]     I. Castillo and E. Barnes. Chaotic behavior of the aﬃne scaling algorithm for linear programming. SIAM Journal on Optimization, 11(3):781–795, 2001. doi: 10.1137/S1052623496314070. URL http://epubs.siam.org/doi/abs/10.1137/S1052623496314070.

[6]    M. Cen, Z. Li, X. Ding, and J. Zhuo. Gross error diagnostics before least squares adjustment of observations. Journal of Geodesy, 77(9): 503–513, 2003. ISSN 0949-7714. doi: 10.1007/s00190-003-0343-4. URL http://dx.doi.org/10.1007/s00190-003-0343-4.

[7]    G. Dantzig. Linear programming and extensions. A Rand Corporation research study. PRINCETON University Press, 1965. ISBN 9780691059136. URL http://books.google.com.au/books?id=2j46uCX5ZAYC.

[8]    George B Dantzig and Mukund N Thapa. Linear programing: 1: Introduction, volume 1. Springer, 1997.

[9]    WE Featherstone, JF Kirby, Christian Hirt, MS Filmer, SJ Claessens, NJ Brown, Guorong Hu, and GM Johnston. The ausgeoid09 model of the australian height datum. Journal of Geodesy, 85:133–150, 2011.

[10]    geodesy@ga.gov.au. Common ellipsoid parameters, May 2013. URL http://www.ga.gov.au/earth-monitoring/geodesy/geodetic-datums/historical-datums-of-australia/common-ellipsoid-parameters.html.

[11]    Michael Grant and Stephen Boyd. CVX: Matlab software for disciplined convex programming, version 2.0 beta. http://cvxr.com/cvx, September 2013.

[12]    Joel Haasdyk and Craig Roberts. Monitoring station movement using a state-wide simultaneous adjustment of everything - implications for a next-generation australian datum. In Proceedings of International Global Navigation Satellite Systems Society IGNSS Synopsysm 2013, page 15, July 2013.

[13]    Joel Haasdyk and Tony Watson. Data-mining in nsw: Towards a new and improved australian datum. In Proceedings of Association of Public Authority Surveyors Conference, pages 85–102, March 2013.

[14]    B. R. Harvey. Practical least squares and statistics for surveyors. School of Surveying, University of New South Wales, Kensington, N.S.W. :, 1st ed. edition, 2011. ISBN 0858390590.

[15]    Bruce R Harvey. Survey network adjustments by the l1 method. Australian Journal of Geodesy, Photogrammetry and Surveying, 59:39–52, 1993.

[16]    Narendra Karmarkar. A New Polynomial-Time Algorithm for Linear Programming. Combinatorica, 4:302–311, 1984. doi: 10.1007/BF02579150.

[17]    SamwelSimon Katambi, Guo Jiming, and Kong Xiangyuan. Applications of graph theory to gross error detection for gps geodetic control networks. Geo-spatial Information Science, 5: 26–31, 2002. ISSN 1009-5020. doi: 10.1007/BF02826471. URL http://dx.doi.org/10.1007/BF02826471.

[18]    A. Khodabandeh and A. Amiri-Simkooei. Recursive algorithm for l1 norm estimation in linear models. Journal of Surveying Engineering, 137(1):1–8, 2011. doi: 10.1061/(ASCE)SU.1943-5428.0000031. URL http://ascelibrary.org/doi/abs/10.1061/%28ASCE%29SU.1943-5428.0000031.

[19]    V. Klee and G. J. Minty. How Good is the Simplex Algorithm? In O. Shisha, editor, Inequalities III, pages 159–175. Academic Press Inc., New York, 1972.

[20]    D.C. Lay. Linear Algebra and Its Applications. World student series. Addison-Wesley Longman, Incorporated, 1997. ISBN 9780201824780. URL http://books.google.com.au/books?id=4OanQgAACAAJ.

[21]    Andrew Makhorin. Gnu linear programming kit, version 4.53, 2013. URL http://www.gnu.org/software/glpk/glpk.html.

[22]    J. Marshall. L1-norm pre-analysis measures for geodetic networks. Journal of Geodesy, 76(6-7): 334–344, 2002. ISSN 0949-7714. doi: 10.1007/s00190-002-0254-9. URL http://dx.doi.org/10.1007/s00190-002-0254-9.

[23]    John Marshall and James Bethel. Analysis of residuals from l 1 norm estimation. ISPRS Archives, pages 38–43, 1996.

[24]    John Marshall and James Bethel. Basic concepts of l 1 norm minimization for surveying applications. Journal of surveying engineering, 122(4):168–179, 1996. doi: 10.1061/(ASCE)0733-9453(1996)122:4(168). URL http://ascelibrary.org/doi/abs/10.1061/%28ASCE%290733-9453%281996%29122%3A4%28168%29.

[25]    S. Mehrotra. On the implementation of a primal-dual interior point method. SIAM Journal on Optimization, 2(4):575–601, 1992. doi: 10.1137/0802028. URL http://epubs.siam.org/doi/abs/10.1137/0802028.

[26]    A.J. Pope, Geodetic Research, Development Laboratory, National Geodetic Survey (U.S.), and National Ocean Survey. The statistics of residuals and the detection of outliers. NOAA technical report. U.S. Dept. of Commerce, National Oceanic and Atmospheric Administration, National Ocean Survey, Geodetic Research and Development Laboratory, 1976. URL http://books.google.com.au/books?id=tHUeAQAAIAAJ.

[27]    Chris Rizos. Principles and practice of gps surveying, January 2000. URL http://www.gmat.unsw.edu.au/snap/gps/gps_survey/principles_gps.htm.

[28]    Conrad Sanderson. Armadillo c++ linear algebra library, June 2013. URL http://arma.sourceforge.net/.

[29]    Brian Tolman, R. Benjamin Harris, Tom Gaussiran, David Munton, Jon Little, Richard Mach, Scot Nelsen, and Brent Renfro. The GPS Toolkit: Open Source GPS Software. In Proceedings of the 16th International Technical Meeting of the Satellite Division of the Institute of Navigation, Long Beach, California, September 2004.

[30]    Henry Wolkowicz. C&o 350 linear programming course notes, fall semester 1995, 1995. URL http://orion.math.uwaterloo.ca/~hwolkowi/henry/teaching/f95/350.f95/matlabfiles/affine.m.gz.

[31]     Y. Zhang. Primal-dual interior point approach for computing l-1-solutions andl l-inﬁnity-solutions of overdetermined linear systems. Journal of Optimization Theory and Applications, 77: 323–341, 1993. ISSN 0022-3239. doi: 10.1007/BF00940715. URL http://dx.doi.org/10.1007/BF00940715.

1 Tested on a subset of the NSW LPI data of 195 measurements and 92 stations

2 Gnu Linear Programming Kit

3 Tested on dual model formulation (see Chapter 5) using entire NSW LPI Data set

4 Based on Mehrotra’s[25] Primal-Dual interior point method for linear programming

1 For further information on the formulation of the Jacobian matrix $A$, see Harvey’s Monograph 13[14].

2 by referring to random error, this excludes systematic error or bias which can skew a least squares solution as we know from practice.

3 Algorithms such as the Simplex algorithm are made much simpler when all parameters are non-negative.

4 The notation ${\stackrel{\to }{x}}^{±}$ is used throughout this thesis to represent the pair ${\stackrel{\to }{x}}^{+}\ge 0$ and ${\stackrel{\to }{x}}^{-}\ge 0$. Whenever ${\stackrel{\to }{x}}^{±}$ is referred to in an equation, relation or assignment such as ${x}_{i}^{±}={b}_{i}$ this is symbolically equivalent to ${x}_{i}^{+}=max\left(0,{b}_{i}\right)$ and ${x}_{i}^{-}=max\left(0,-{b}_{i}\right)$.

1 Such as GLPK, discussed later in this chapter

2 As noted in repeated experiments by the author, Mehrotra’s primal-dual method does not often converge on small-medium ($n>50$) problems using the primal formulation in Equation 17. Large problems always resulted in numerical instability.

3 Before continuing, it should be noted that there is a large amount of theory behind this approach that will not be covered in this thesis. For reference, the very readable Linear Programming Volume 1 by Dantzig and Thapa [8] contains all of the relevant deﬁnitions and theorems and some very good examples on duality theory.