NSW LPI GNSS Vectors - Central West Network Segments
A subset of an L_{1} adjustment on the LPI NSW GNSS network is displayed to the right.
Click on a baseline in the map to see residual information.
If the map does not appear, refresh the page or contact the webmaster
For a complete kml of the NSW GNSS network as provided by LPI NSW click here.
Note that all measurements in the kml are precise only to eight significant figures and thus should not be relied upon for any geodetic surveys.
Abstract
This dissertation investigates computational methods based on the minimisation of the ${L}_{1}$ norm of residuals for the identiﬁcation of outlier observations in a datum network comprised of GNSS baselines. The Barrodale & Roberts[2] algorithm based on Dantzig’s[7] Simplex method was examined in depth and the computational complexity issues that arose from large data sets necessitated an investigation into a network segmentation-based compromise and the application of interior point solvers for linear programming in the minimisation of the ${L}_{1}$ norm of residuals in very large data sets. The dual of the ${L}_{1}$ norm minimiser primal model was ﬁnally investigated and successfully applied to yield a solution in thirty seconds using an implementation of Mehrotra’s[25][21] Primal-Dual Interior Point Solver as opposed to more than six hours for the same problem model on the same hardware using a simplex solver. Further analysis of the segmentation-based solution was not deemed necessary once a fast interior point method was successfully applied.
1 Introduction
1.1 Literature Review
Gaussian parametric weighted least squares (WLS) is still the most widely
known method of obtaining both a best ﬁt for a survey network
adjustment and reliable quality parameters for that ﬁt. In the context of a
very large set of measurements such as that for a national datum, where
thousands of parameters are to be estimated, WLS is still the ﬁrst choice
for obtaining coordinate and quality parameters. However, when
the volumes of data are large and the quality of the meta-data is
sometimes uncertain, the result from a WLS adjustment can be
equally uncertain. The analyst will often be required to employ
supplementary quality control techniques to assist in cleaning a set of
data.
Of the existing methods for the detection of outliers, the ${L}_{1}$ norm minimiser using both simplex[2] and an interior point[8] algorithms was assessed for the ability to detect real and simulated outlier observations in large data sets. This was compared against the other more conventional least squares Baarda test[1] and misclose analysis as explored by Katambi et al[17].
The mechanism of the simplex method as applied to a
${L}_{1}$
minimiser in geodetic networks is explained in Chapter4 followed by
a discussion on an adaptation of an interior-point optimisation
algorithm[8][31] which in practice can be more computation-time eﬃcient
on large sets of data but has the unfortunate aspect of being less
numerically stable[5].
1.1.1 Problem Scope
The ongoing eﬀort to source historical geodetic measurement data from archives[13] for the next Geodetic Datum of Australia has resulted in a large set of measurement data containing 66,000+ GPS baseline measurements that will estimate the Earth-Centred, Earth Fixed (ECEF) positions of 20,000+ class A survey marks. Characteristic gross errors in measurements have been encountered and continue to arise as outliers in existing data as new data is added[12].
A more rigorous pre-processing step prior to the ﬁnal least squares
adjustment of the GPS baseline network was requested to reduce the
repetitive aspect of individually testing each outlier as it appears in a large
adjustment. The author trialled several methods for simplifying the
problem and succeeded with a residual analysis technique using a
proposed adjustment-per-measurement method that approximates
redundancy in the whole network with a "redundancy neighbourhood" of
ﬁxed depth from the measurement in question. The method is detailed in
Chapter 7.
1.1.2 Reliability
The focus of an argument for using
${L}_{1}$
Norm minimisation for outlier detection hinges on reliability. As discussed
by Baarda[1], Pope[26] and subsequently many others[27][15], the internal
reliability of a model design is measured by how large an error must be
before it is detectable. As Pope notes[26]:
$\cdots \phantom{\rule{0.3em}{0ex}}$ the traditional objections to least-squares adjustments of triangulation; that they cannot be relied upon to localize the error and instead "Spread it around" $\cdots \phantom{\rule{0.3em}{0ex}}$
which identiﬁes two critical aspects in data snooping:
- The initial detection of an error
- The localisation of the error.
While an examination of measurement cycle miscloses will reveal the existence of most gross errors, the localisation of these errors requires more than a misclose analysis. This is explored in Chapter 6.
1.1.3 Previous Research
Although the ${L}_{1}$ norm minimiser has often been suggested as an auxiliary quality control step [15][14][18][24][22][24] there is currently no widely known commercial solution for dealing with large networks of survey data.
Harvey[15] also notes the application of ${L}_{1}$ norm minimisation to identify non-measurement errors in a survey adjustment such as datum defect (also known as rank defect). This results from the robustness of the simplex method in optimising an under-determined system of linear equations[8].
1.2 Research Aims
This thesis covers three major areas in the problem of ﬁnding errors in input observations for a geodetic datum using ${L}_{1}$ Norm estimators as described by Cen et al [6] and Marshall [22]. These are
- Provide a solid theoretical background for newcomers to the ${L}_{1}$ norm minimisation model.
- Evaluate the time performance of various ${L}_{1}$ norm minimiser algorithms on survey data.
- Find a well-balanced practical solution to the computational time intractability issue arising in algorithm performance on large data sets such that the desired output information on measurement residuals is preserved.
The theoretical basis for L1 norm is a large area which encompasses the mathematical model, the various computational methods and the proposed application of the output. This task will be broken down into the following subsections
- The formulation of a linear program of the ${L}_{1}$ norm minimiser.
- A worked example using modiﬁed simplex method analogous to Barrodale and Roberts’ ${L}_{1}$ algorithm[2][3].
- The same example problem using a simpliﬁed interior point method based on Dikin’s aﬃne scaling algorithm[8][31].
The performance in time, correctness and convergence of the below
solvers was tested. The source of the data is the complete set of
NSW GNSS baselines provided by LPI unless otherwise stated. See
footnotes^{1} ^{2} ^{3} ^{4}
for references.
Software | Solver Type | Time Complexity $O\left(n\right)$ | Time Taken |
Primal Simplex | Exponential | Very slow^{1}: > 1 second | |
Mathematica | Simplex | Exponential | Slow^{1}: > 1 second |
Mathematica | Primal-Dual IP | Polynomial | Fast^{1}: < 0.01 seconds |
Matlab CVX[11] | Primal-Dual IP | Polynomial | Fast^{1}: < 0.01 seconds |
Dikin Aﬃne Scaling[8][18] | Primal Aﬃne IP | Not known | Did not converge^{1} |
GLPK^{2}[21] | Simplex | Exponential | 6 hours^{3} |
GLPK[21] | Primal-Dual IP^{4} | Polynomial | 30 seconds^{3} |
The objective of research in this thesis was to ﬁnd a ${L}_{1}$ norm minimiser algorithm to operate on a very large data set where the number of parameters $m$ exceeds $10,000$. Therefore the very real problem of computational time intractability was examined. Prior to ﬁnding an acceptable solution using GLPK, a proposal for a segment-based solution with the intention to preserve the validity of the output ${L}_{1}$ norm residuals with respect to the whole of a large data set was investigated in Chapter 7. The naïve approach was to evaluate a complete subset of data from the larger set with the hope that all of the information on residuals (observed - calculated) relative to the subset is completely relevant to the original large data set. This approach was abandoned once the 30 second solver was found and applied but the investigation and the work on ﬁnding a complete method of segmenting the network was preserved for the possibility of future research.
1.2.1 Assumed Knowledge
It is assumed that the reader is familiar with the principles of weighted least squares as applied to a geodetic network adjustment, with speciﬁc knowledge of the formulation of the Jacobian matrix $A$ and the $O-C$ (observed minus calculated) vector $b$. More importantly, it is assumed that the reader understands the deﬁnition of a residual with respect to a least squares solution. For more information, the recommended reference text is Harvey’s Monograph 13[14]. A brief review of this material is provided in the early sections of chapter3
2 Outliers and Gross Errors in Geodetic Survey Networks
2.1 Appearance of Outliers in Existing Data
The appearance of outlier measurements in a network are often in one or more of the following forms:
- Wrong standpoint errors are where observations are taken from a witness mark near a trig station but this detail is missing from meta-data.
- Height booking errors describe incorrectly booked instrument height meta-data.
- Station name errors are measurements where one or both of
the station numbers in the meta-data are incorrect, causing
either
- Skew in the network as a function of the measurement length (similar to wrong standpoint errors).
- Separation of the station into two nearby points that therefore reduces redundancy and weakens the network at that point
- Deformation of the network over consecutive epochs.
- Incorrect ambiguity resolution where range estimates to satellites are incorrect in all coordinate components. A common cause is multi-path bias which is localised to individual stations.
The eﬀect of gross errors on a least squares adjustment has been shown to
"spread" the error across two or more measurements when there is enough
redundancy to do so[14]. In the case where gross errors can be
small in magnitude relative to the magnitude of the measurement,
such as a height booking error that is incorrect by 100mm over a
GPS baseline of 50 km, it is possible for a signiﬁcant proportion of
that error to be spread over the residuals of several neighbouring
measurements.
2.1.1 Examples
Haasdyk et al[13][12] note examples of incorrect ambiguity resolution,
where the length of the range to satellite(s) is therefore incorrect, and
aﬀects all coordinate components. These manifested in very small
(sub-metre and often sub-decimetre) gross errors that are sometimes
undetectable in constrained local-neighborhood adjustments that rely on a
"perfect" Class 2A0 control network.
In a related issue, multiple processing of the same observations
necessitated by observing outliers when ﬁtting a local neighborhood
adjustment to a larger neighborhood using a wider control network often
resulted in several diﬀerent "versions" of the same baseline that are
signiﬁcantly diﬀerent[13]. When using such measurements in a very large
unconstrained network adjustment, the ability to "choose" the correct
measurement is compounded by the quality of the surrounding
network.
Observations with station name errors were sometimes easy to spot when the measurement appeared as a signiﬁcant outlier that traversed an unlikely distance. For example in Figure 1 the measurements highlighted in blue clearly shows measurements where one or more station names have been taken from the incorrect trigonometric station, often several hundred kilometres away.
In contrast, a station naming error that does not refer to an existing station but instead "creates" a new, nearby station in the adjustment can be diﬃcult to identify in a large adjustment. For example, in Figure 2 the station TS5110 appears as two stations — one at the apex of most measurements and the lower line is drawn terminating in its nominated location at TS5110-1. When cross-referencing with existing station names it appears that it is equivalent to TS5110-2.
(a) Station
name
errors
can
negate
redundancy
in
network
design
(b) Replaced
marks
require
renaming
to
avoid
propagating
bias
in
an
adjustment
Other examples related to the above as noted by Haasdyk in correspondence and in [13] are:
Marks | Comment | |||
Renamed | TS5458 | = | TS12033-3 | Same mark |
SS36065 | = | TS6418-4 | Same mark | |
Replaced | TS5458 | $\to $ | TS12033 | 2mm apart |
(Destroyed) TS10533 | $\to $ | TS10985 | 20mm apart |
2.2 Review of Outlier Detection Methods
Caspary [4] notes that there is a diﬃculty in using the least squares vector of residuals
$\widehat{v}=v-{Q}_{v}{P}_{i}{\Delta}_{i}=v-{q}_{v}{p}_{i}{\Delta}_{i}$ (where
${\Delta}_{i}$ is the gross
error on the ${i}^{th}$
measurement) in the detection of gross errors as outliers
because the whole vector of residuals is contaminated by
${\Delta}_{i}$ .
Roughly speaking, given a large system of parameters where the
likelihood of gross error contamination is high, the number of marginally
detectable errors can also be proportionally high.
The sensitivity of an adjustment in detecting outliers is referred to as the
"internal reliability" [27]. As stated by Rizos [27], the internal reliability of
an adjustment is quantiﬁed by the set of marginally detectable
errors. A large set of MDEs indicates low internal reliability, and
conversely the smaller the set of MDEs, the higher the internal
reliability.
A common method of outlier detection in a least squares adjustment is
Baarda’s data snooping method [27]. This method assumes the quality of
the measurements is known reliably and, more importantly, it assumes
that only one outlier exists in the set of observations. In everyday use on
small networks this method works well because there are generally
relatively few outliers present. However, when considering large and
highly interconnected networks, this data-snooping method can
sometimes fail due to bias introduced by outliers on neighbouring
measurements.
Another commonly used method is Pope’s Tau test, which is primarily used when measurement quality is not reliably known. Instead of normalising residuals using a priori weights, the Tau test ﬁrst computes the standardised residuals using weights ${s}_{{v}_{i}}$ derived from the variance co-factor ($VCV$) matrix of the residuals computed from a least squares adjustment once it has converged. Each standardised residual is
where ${s}_{{v}_{i}}$
is ${\sigma}_{{v}_{i}}$
scaled by the square root of the variance factor of the adjustment, which
is
where ${\sigma}_{{v}_{i}}$ is the square root of the ${i}^{th}$ diagonal entry in the co-factor matrix of the residuals ${Q}_{v}$.
The Tau test then compares each normalised residual with the Tau value
where
2.2.1 An Alternate Test for Outliers
The need for a robust method to detect more than one outlier in a
large set of data, as justiﬁed above, has already been addressed
in the form of an alternative adjustment. Rather than use the
${L}_{2}$
norm, that is the square root of the sum of squares of residuals, researchers
such as Branham[3] argue that a statistically more likely method
of revealing outlier observations reliably is by minimising the
${L}_{1}$
norm of the residuals, that is the sum of the absolute values of the vector of
residuals
$$\begin{array}{llll}\hfill \text{Minimise}& \sum _{\forall i}\parallel {v}_{i}{\parallel}_{1}\phantom{\rule{2em}{0ex}}& \hfill & \phantom{\rule{2em}{0ex}}\\ \hfill \text{where}& \overrightarrow{v}=\overrightarrow{b}-A\overrightarrow{x}\phantom{\rule{2em}{0ex}}& \hfill & \phantom{\rule{2em}{0ex}}\end{array}$$
The choice of the ${L}_{1}$ norm minimiser over the least squares (that is, the ${L}_{2}$ norm) minimiser is precisely because it does not spread the magnitude of the residuals over the neighbouring measurements as evenly as possible. This poses the question: Why does the ${L}_{1}$ norm minimiser not spread residuals? The answer can be roughly explained by looking at the eﬀect of each on large and small residuals.
Say in a loop of height measurements, one measurement ${h}_{1}$ has a gross error of $-2m$. All measurements have an expected variance of $\pm 100mm$ and neighboring measurements have no gross errors. Mathematical method aside, a least squares adjustment would "consider" the magnitude of a single residual ${v}_{{h}_{1}}=(-2{}^{)}2=4$ alone versus the possibility of assuming that this error magnitude was divided equally between ${h}_{1}$ and a neighboring measurement ${h}_{2}$ such that the error in each is assumed to be ${v}_{{h}_{1}}={h}_{1}-E\left({h}_{1}\right)=-1m$ and ${v}_{{h}_{2}}={h}_{2}-E\left({h}_{2}\right)=+1m$. In the latter case the ${L}_{2}$ norm is
$$\begin{array}{lll}\hfill {v}_{{h}_{1}}^{2}+{v}_{{h}_{2}}^{2}=(-1{}^{)}2+(1{}^{)}2=1+1=2& \phantom{\rule{2em}{0ex}}& \hfill \end{array}$$
which is clearly smaller than the ﬁrst case. A least squares minimiser will not stop there though - it would then consider spreading the error in ${h}_{1}$ and ${h}_{2}$ to their neighboring measurements, and so on until all error has been spread as evenly as possible such that the ${L}_{2}$ norm is at a minimum.
2.2.2 Penalising Large Residuals
In the discipline of mathematical optimisation this phenomenon where many smaller residuals are preferred over few large residuals would be referred to as penalising large residuals. This is precisely what least squares does as a by-product of obtaining a best ﬁt because it assumes that there are no outliers in the data. However, data containing several outliers necessitates a minimiser that does not penalise residuals by magnitude.
Using the above example, say an ${L}_{1}$ norm minimiser "considers" the $2m$ error as ${v}_{{h}_{1}}={h}_{1}-E\left({h}_{1}\right)=2$ and all other residuals as negligible ${v}_{{h}_{i}}<\mathit{\epsilon}$, $i\ne 1$. It then considers spreading the error to a neighboring measurement ${h}_{2}$ such that ${v}_{{h}_{1}}+{v}_{{h}_{2}}=2$. The ${L}_{1}$ norm of the latter is
Here we can see that "spreading" the error across two residuals will result in a ${L}_{1}$ norm that is greater than or equal to the ${L}_{1}$ norm of restricting the error to the residual of one measurement. Hence, any ${L}_{1}$ norm minimiser would prefer the case where the residual error is assigned to as few measurements as possible.
The next chapter will elaborate on the analysis of the residuals of an ${L}_{1}$ norm minimiser with some example using geodetic observations to illustrate the relevance to this problem.
3 Estimator Models
The introduction to estimator models is discussed in Chapter 3 of the thesis.
3.3 The ${L}_{1}$ Norm Estimator
3.3.1 Sampling Distribution of ${L}_{1}$ Norm residuals
Monte Carlo experiments conducted by Marshall[23] indicate quite clearly that the sampling distribution of ${L}_{1}$ norm residuals is not the Gaussian (normal) distribution but instead exhibits a very large spike at the ﬁrst moment. Branham[3] states that this sampling distribution corresponds to a modiﬁed exponential (or alternatively, Laplace) probability density function
This is a modiﬁed form of the exponential distribution, which in general form is
For $h=1$ the function $P\left(X\right)$ is graphed in Figure 5.
By maximising the log density function $logP\left(X\right)$, the ﬁrst and second moments are
$$\begin{array}{lll}\hfill {\mu}_{est}=E\left(X\right)& =median\phantom{\rule{2em}{0ex}}& \hfill \text{(7)}\\ \hfill {\sigma}^{2}=E\left({X}^{2}\right)& =\frac{\sqrt{2}}{N}\sum _{i=1}N|{X}_{i}-\mu |\phantom{\rule{2em}{0ex}}& \hfill \text{(8)}\end{array}$$
The distortion of the ﬁrst moment with respect to outliers is restricted to movement by one "data point" rather than any proportion of the magnitude of the outlier. Assuming an equal distribution of positive and negative outliers and residuals, the ﬁrst moment will be zero. If the distribution of residuals of a ${L}_{1}$ minimiser happen to be biased such that they are mostly all positive or all negative, this would imply the model assumptions are incorrect. In a minimally constrained adjustment, this should not occur.
3.3.2 Minimally Constrained Models
As noted by Harvey[14], outlier detection is often done on a minimally constrained model. Ideally, no distance scale would be enforced by the measurements such that the resultant residuals would contain minimal bias. A minimally constrained adjustment is more likely to have an average error of zero.
Likewise, the ${L}_{1}$ minimiser that is used for quality analysis on a geodetic survey network could be applied successfully using a minimally constrained model.
3.3.3 Internal Reliability
Marshall[23] investigated the bounds of internal reliability (the relative
magnitude of measurement error before it is statistically ﬂagged as an
outlier) using Monte Carlo methods. By using a consistent a priori
standard deviation on all measurements Marshall notes that the primary
deviation in internal reliability is caused by variations in network
geometry. In his horizontal network experiment of 6 stations (12
parameters) and 15 distance measurements he observed that across
100,000 Monte Carlo simulations where random bias was introduced
into the measurements, one particular measurement (observation
$\#9$) was never ﬂagged
as an outlier in an ${L}_{1}$
adjustment. Figure 6 below shows measurement
$\#9$ in
Marshall’s trilateration network which clearly shows that there is
enough redundancy for the four parameters (coordinates of stations
$\Delta 1$ and
$\Delta 5$)
which it is estimating. It was the view of this author that there was
perhaps another reason for the lack of outlier information output by the
adjustment, which judging by the degrees of freedom (redundancy) in the
network is likely due to there being not enough redundancy to adequately
locate outliers for all measurements.
It is hypothesised by the author but not proven that the internal reliability of a network in the ${L}_{1}$ norm increases with an increase in redundancy. For measurements that can have a reverse or check measurement e.g. an EDM distance from ${\Delta}_{i}$ to ${\Delta}_{j}$ and another from ${\Delta}_{j}$ to ${\Delta}_{i}$ it is trivial to see a gross error in one of these measurements (although not trivial to know which measurement contains the gross error without further measurements). This presents the analyst with a dilemma: An adjustment on the whole network would likely have better redundancy than an adjustment on part of the network (given that the whole network is connected) but there are far more computations to perform in an adjustment of the whole network than a part of it. In fact, as discussed in chapter 4 the relationship between number of measurements and number of adjustment computations is in some cases exponential, which is troublesome when the network is very large.
3.4 Weighted ${L}_{1}$ Minimisation
Due to measurement noise there is always the need to weight the importance of a measurement based on an a priori assumption of precision. GPS vectors at the NSW LPI are typically given a standard deviation of $\pm $(10mm + 7ppm) for least squares adjustments. This a priori assumption is carried into ${L}_{1}$ norm models but the scheme for weighting the model diﬀers in the method.
$$\begin{array}{lll}\hfill \text{Minimise}& \sum _{i=1}^{m}\frac{{\mathit{\epsilon}}_{i}^{2}}{{\sigma}_{i}^{2}}\phantom{\rule{2em}{0ex}}& \hfill \text{(9)}\end{array}$$
In least squares, the method allows a priori weighting of observations by inserting the square of each weight ${\sigma}_{i}^{2}$ corresponding to measurement ${g}_{i}$ as entries in a diagonal $m\times m$ matrix $Q$.
By taking $W={Q}^{-1}$
In the ${L}_{1}$ model, the objective function is
$$\begin{array}{lll}\hfill \text{Minimise}& \sum _{i=1}^{m}\frac{\left|{\mathit{\epsilon}}_{i}\right|}{{\sigma}_{i}}\phantom{\rule{2em}{0ex}}& \hfill \text{(10)}\end{array}$$
Due to the non-linearity of the objective function, the formulation of the problem must instead rely on a linear program, and thus the inverse matrix of squared weights would be applied as described by Harvey[15] as follows
3.6 ${L}_{1}$ Norm Minimisation Formulated as a Linear Program
The ﬁrst step in formulating the ${L}_{1}$ norm minimisation problem as a linear program is to recognise that the objective function $z=\sum \left|{v}_{i}\right|$ is not linear. To make it linear, and to also satisfy the non-negativity constraint that linear programs typically require^{3} the objective function can be reformulated to be
$$\begin{array}{lll}\hfill z=\sum _{i=1}^{n}\left(u+v\right)=\sum _{i=1}^{n}u+\sum _{i=1}^{n}v& \phantom{\rule{2em}{0ex}}& \hfill \text{(11)}\end{array}$$
Where a residual is $u-v$ such that $u\ge 0$ and $v\ge 0$. These constraints satisfy $u+v=|u-v|$.
The normal (or model) equation for each parameter would be the same formulation as for least squares problems but with the residual included as a model parameter. For example, a height diﬀerence would be
$$\begin{array}{lll}\hfill {\u0125}_{i}={h}_{2}-{h}_{1}+{u}_{i}-{v}_{i}& \phantom{\rule{2em}{0ex}}& \hfill \end{array}$$
Similarly for least squares, each element in the Jacobian matrix for the ${L}_{1}$ norm minimisation is a partial derivative of the model equation (in order to linearise the problem). Hence for the height example the Jacobian matrix is already linear, that is
$$\begin{array}{llll}\hfill \left[\begin{array}{cccccc}\hfill \frac{\partial {\u0125}_{i}}{\partial {h}_{1}}\hfill & \hfill \frac{\partial {\u0125}_{i}}{\partial {h}_{2}}\hfill & \hfill \cdots \hfill & \hfill \frac{\partial {\u0125}_{i}}{\partial {u}_{i}}\hfill & \hfill \frac{\partial {\u0125}_{i}}{\partial {v}_{i}}\hfill & \hfill \cdots \hfill \\ \hfill \vdots \hfill & \hfill \hfill & \hfill \hfill & \hfill \hfill & \hfill \vdots \hfill \end{array}\right]& =\left[\begin{array}{cccccc}\hfill -1\hfill & \hfill 1\hfill & \hfill \cdots \hfill & \hfill 1\hfill & \hfill -1\hfill & \hfill \cdots \hfill \\ \hfill \vdots \hfill & \hfill \hfill & \hfill \hfill & \hfill \hfill & \hfill \vdots \hfill \end{array}\right]\phantom{\rule{2em}{0ex}}& \hfill & \phantom{\rule{2em}{0ex}}\end{array}$$
For non-linear models a Taylor series of one or two terms is usually suﬃcient to achieve convergence given good enough a priori coordinates or parameters [14].
3.6.1 Error Estimates
Since the use of the ${L}_{1}$
norm adjustment in the context of this thesis is to highlight outlier
observations, the question on whether to weight the observations prior
to the adjustment must be speciﬁcally asked with regards to the
desired outcome. In a GPS baseline network, there will exist very
short lines of only a few metres and very long lines of hundreds of
kilometers.
The separate issue of weighting the residuals should always be done according to the expected quality of the observations. An ${L}_{1}$ norm adjustment will typically be done prior to a least squares adjustment, and so the scaled residuals that are an output of "tuning" a least squares adjustment will often be unavailable to the analyst. A parts-per-million component is often the a crucial normaliser for residuals because of the large variation in accumulated error over long distances, for example the scale in long GNSS baseline vectors.
3.7 Example: Similarities Between Loop Mis-Close and ${L}_{1}$ Norm Residuals
As stated already, the beneﬁt of the
${L}_{1}$
norm minimisation is primarily the analysis of residuals. In a survey
network adjustment with equally weighted observations (all
$\sigma =1$) a
direct comparison of the residuals to loop mis-closes yields strong
similarities.
Using an example of just height data from a NSW RMS survey on the Sydney
Anzac Bridge (see Figure 7), the results obtained from B. Harvey’s FIXIT4
of ${L}_{1}$
raw residuals versus mis-closes is shown in Table 3. It can be seen
that in this example, the loop mis-closes and the magnitude of the
${L}_{1}$ residuals
for particular observations on the loops co-incide, speciﬁcally the residuals
$\left|2.3\right|$mm and
$\left|1.8\right|$mm
on lines 2-3 and 5-2 respectively.
From | To | dH (m) | sdv (mm) | $v$ (mm) | |
Hei | 1 | 3 | 17.5863 | 1.5 | 0.0 |
5 | 3 | 25.7134 | 1.5 | 0.0 | |
2 | 3 | 18.1929 | 1.5 | -2.3 | |
2 | 4 | 18.2125 | 1.5 | 0.0 | |
5 | 4 | 25.7353 | 1.5 | 0.0 | |
5 | 2 | 7.5246 | 1.5 | -1.8 |
Mis-close | Height loop |
||||
4.1mm | 2 | 3 | 5 | 2 | |
1.8mm | 2 | 4 | 5 | 2 | |
-2.3mm | 2 | 4 | 5 | 3 | 2 |
An ${L}_{1}$ minimiser identiﬁes the measurements which have the propensity to introduce the largest residuals and excludes those measurements from the computation of the parameters. In this example height network, these residuals have the same magnitude as loop mis-closes because there are no correlations between input observations and input standard deviations are being ignored.
3.8 Linear Program Formulation of the ${L}_{1}$ Norm Estimator
Continuing from Equation 11, each model equation takes the form
$${W}^{\frac{1}{2}}A\overrightarrow{x}-{W}^{\frac{1}{2}}b-\overrightarrow{v}=0$$ | (12) |
Note that $\overrightarrow{x}$ and $\overrightarrow{v}$ are vectors of parameters any of which can be positive or negative. This results in the problem being unbounded in the negative direction and as such this formulation needs modiﬁcation before it can be used in a linear solver. By taking two positive variables $u,w\ge 0$ we can say that each of the residual parameters can represented by $v=u-w$. This also applies to the coordinate parameters $x={x}^{+}-{x}^{-}$, ${x}^{+},{x}^{-}\ge 0$. By generalising, with coordinate parameters represented by ${\overrightarrow{x}}_{P}^{\pm}$ and residual parameters represented by ${\overrightarrow{x}}_{R}^{\pm}$, a linearised formulation of the ${L}_{1}$ norm estimator is
$$\begin{array}{llll}\hfill \text{Minimise}z& ={\overrightarrow{e}}^{T}{\overrightarrow{x}}_{R}^{+}+{\overrightarrow{e}}^{T}{\overrightarrow{x}}_{R}^{-}+0{\overrightarrow{x}}_{P}^{+}+0{\overrightarrow{x}}_{P}^{-}\phantom{\rule{2em}{0ex}}& \hfill & \phantom{\rule{2em}{0ex}}\\ \hfill \text{Subjectto}{B}^{T}\overrightarrow{x}& =\left[\begin{array}{ccccccc}\hfill I\hfill & \hfill \vdots \hfill & \hfill -I\hfill & \hfill \vdots \hfill & \hfill A\hfill & \hfill \vdots \hfill & \hfill -A\hfill \end{array}\right]\left[\begin{array}{c}\hfill {\overrightarrow{x}}_{R}^{+}\hfill \\ \hfill {\overrightarrow{x}}_{R}^{-}\hfill \\ \hfill {\overrightarrow{x}}_{P}^{+}\hfill \\ \hfill {\overrightarrow{x}}_{P}^{-}\hfill \end{array}\right]=b\phantom{\rule{2em}{0ex}}& \hfill & \phantom{\rule{2em}{0ex}}\\ \hfill {\overrightarrow{x}}_{P}^{+},{\overrightarrow{x}}_{P}^{-},{\overrightarrow{x}}_{R}^{+},{\overrightarrow{x}}_{R}^{-}& \ge 0\phantom{\rule{2em}{0ex}}& \hfill & \phantom{\rule{2em}{0ex}}\\ \hfill {\overrightarrow{e}}^{T}& =\left[1\cdots 1\right]\phantom{\rule{2em}{0ex}}& \hfill & \phantom{\rule{2em}{0ex}}\end{array}$$
This is very similar to the formulation in Equations $3.24$ on page $71$ in Dantzig et al [7] except that the basic and non-basic variables ${\overrightarrow{x}}_{B}$ and ${\overrightarrow{x}}_{N}$ have been replaced with the parameters and residuals ${\overrightarrow{x}}_{P}^{\pm}$ and ${\overrightarrow{x}}_{R}^{\pm}$ respectively.^{4}
Note also that the objective function has the coeﬃcients eﬀectively switched, so that instead of
$$\begin{array}{lll}\hfill z& =0{\overrightarrow{x}}_{R}^{\pm}+{\overrightarrow{c}}^{T}{\overrightarrow{x}}_{P}^{\pm}\phantom{\rule{2em}{0ex}}& \hfill \text{(13)}\\ \hfill & \phantom{\rule{2em}{0ex}}& \hfill \text{(14)}\end{array}$$we have
$$\begin{array}{lll}\hfill z& ={\overrightarrow{e}}^{T}{\overrightarrow{x}}_{R}^{\pm}+0{\overrightarrow{x}}_{P}^{\pm}\phantom{\rule{2em}{0ex}}& \hfill \text{(15)}\\ \hfill \text{where}{\overrightarrow{e}}^{T}=\left[1\cdots 1\right]& \phantom{\rule{2em}{0ex}}& \hfill \text{(16)}\end{array}$$This is an explicit formulation of a linear program but it is still solvable by any variation of Dantzig’s simplex methods (primal, revised, primal-dual etc) and by interior point methods for linear programming problems (explored in Chapter 4).
4 Algorithms For The ${L}_{1}$ Norm Estimator
This chapter will begin with Section 4.1 describing the Simplex algorithm followed by an example demonstrating how it works on survey height data. Subsequently the concept of interior point methods will be introduced and applied to the same example.
4.1 Search-based Simplex Algorithm
Use the standard form of the ${L}_{1}$ problem from Chapter 3, replicated below:
$$\begin{array}{llll}\hfill \text{Minimise}z& ={\overrightarrow{e}}^{T}{\overrightarrow{x}}_{R}^{+}+{\overrightarrow{e}}^{T}{\overrightarrow{x}}_{R}^{-}+0{\overrightarrow{x}}_{P}^{+}+0{\overrightarrow{x}}_{P}^{-}\phantom{\rule{2em}{0ex}}& \hfill & \phantom{\rule{2em}{0ex}}\\ \hfill \text{Subjectto}{B}^{T}\overrightarrow{x}& =\left[\begin{array}{ccccccc}\hfill I\hfill & \hfill \vdots \hfill & \hfill -I\hfill & \hfill \vdots \hfill & \hfill A\hfill & \hfill \vdots \hfill & \hfill -A\hfill \end{array}\right]\left[\begin{array}{c}\hfill {\overrightarrow{x}}_{R}^{+}\hfill \\ \hfill {\overrightarrow{x}}_{R}^{-}\hfill \\ \hfill {\overrightarrow{x}}_{P}^{+}\hfill \\ \hfill {\overrightarrow{x}}_{P}^{-}\hfill \end{array}\right]=b\phantom{\rule{2em}{0ex}}& \hfill & \phantom{\rule{2em}{0ex}}\\ \hfill {\overrightarrow{x}}_{P}^{+},{\overrightarrow{x}}_{P}^{-},{\overrightarrow{x}}_{R}^{+},{\overrightarrow{x}}_{R}^{-}& \ge 0\phantom{\rule{2em}{0ex}}& \hfill & \phantom{\rule{2em}{0ex}}\\ \hfill {\overrightarrow{e}}^{T}& =\left[1\cdots 1\right]\phantom{\rule{2em}{0ex}}& \hfill & \phantom{\rule{2em}{0ex}}\end{array}$$
To re-iterate, the notation ${\overrightarrow{x}}_{P}^{\pm},{\overrightarrow{x}}_{R}^{\pm}$
refers to the strictly positive nature of the parameters. This is especially
pertinent in non-linear problems where at each iteration the delta
$x$ is estimated and
is often negative. When a coordinate (or delta) has a negative value, the parameter
${x}_{i}^{-}$ contains the
absolute value and ${x}_{i}^{+}=0$.
A search must always start somewhere, and in a linear program (LP) this is the canonical form.
4.1.1 Canonical Form and Basic Variables
As noted by Dantzig[8] the primal simplex method ﬁnds the optimal solution by a sequence of pivot steps from the original system of equations. Initialisation of the algorithm is achieved by converting the problem from standard form to canonical form. This is where the current solution is a basic feasible solution and:
- The objective coeﬃcients to all the basic variables are zero, that is, the objective function contains only non-basic variables;
- The constraint matrix for the basic variables form an identity matrix (with some permutation if necessary). It’s easy to see whether or not the current Basic Feasible Solution (BFS) is optimal or not, if the LP is in canonical form
The objective function
and constraints
$$\begin{array}{llll}\hfill {B}^{T}\overrightarrow{x}& =\left[\begin{array}{ccccccc}\hfill I\hfill & \hfill \vdots \hfill & \hfill -I\hfill & \hfill \vdots \hfill & \hfill A\hfill & \hfill \vdots \hfill & \hfill -A\hfill \end{array}\right]\left[\begin{array}{c}\hfill {\overrightarrow{x}}_{R}^{+}\hfill \\ \hfill {\overrightarrow{x}}_{R}^{-}\hfill \\ \hfill {\overrightarrow{x}}_{P}^{+}\hfill \\ \hfill {\overrightarrow{x}}_{P}^{-}\hfill \end{array}\right]=b\phantom{\rule{2em}{0ex}}& \hfill & \phantom{\rule{2em}{0ex}}\end{array}$$
permits the initial basic feasible solution
$$\begin{array}{llll}\hfill {\overrightarrow{x}}_{P}^{\pm}& =\overrightarrow{0}\phantom{\rule{2em}{0ex}}& \hfill & \phantom{\rule{2em}{0ex}}\\ \hfill {\overrightarrow{x}}_{R}^{\pm}& =b-A\overrightarrow{x}=\overrightarrow{b}\phantom{\rule{2em}{0ex}}& \hfill & \phantom{\rule{2em}{0ex}}\\ \hfill z& =\sum \underset{R}{\overset{+}{x}}+\sum \underset{R}{\overset{-}{x}}\phantom{\rule{2em}{0ex}}& \hfill & \phantom{\rule{2em}{0ex}}\end{array}$$
4.1.3 Exponential Computational Time Complexity
Before continuing, it is necessary to discuss the computational time
complexity of the algorithm proposed by Barrodale and Roberts[2] and the
simplex method in general.
Brieﬂy, computational time complexity of an algorithm is generally
assessed with regards to the rate of growth of computational time with
respect to the size of the input. This time complexity is typically described
using three metrics - minimum time or best-case time complexity
$f\left(n\right)=\Omega \left(g\left(n\right)\right)$, average time complexity
$f\left(n\right)=\Theta \left(g\left(n\right)\right)$ and worst-case
time complexity $f\left(n\right)=\u25cb\left(g\left(n\right)\right)$.
$f\left(n\right)=\Omega \left(g\left(n\right)\right)$
represents the lower bound function for the run time of an algorithm
$f\left(n\right)$ where
$n$
denotes the size of the input (typically an array).
$f\left(n\right)=\Theta \left(g\left(n\right)\right)$ represents the run
time of an algorithm $f\left(n\right)$
when averaged over all permutations of an input of size
$n$.
This metric is a good indication of how the algorithm will
behave in the real world using real measurements, however
we should always consider the worst-case time complexity
$f\left(n\right)=\u25cb\left(g\left(n\right)\right)$ where
$g\left(n\right)$
is the upper bound of the rate of growth of computational time of
$f\left(n\right)$. This
latter metric describes just how poorly the algorithm can perform in the
worst case.
When considering the simplex method to solve for the
${L}_{1}$ Norm
estimate as proposed by Barrodale and Roberts[2] and its application in
this thesis in ﬁnding outliers in a large set of GPS observations, the size of
the input will always be large (over 100,000 measurements). Thus if the
rate of growth of computational time in Barrodale and Robert’s
${L}_{1}$
method [2] is non-linear, it is of great importance to consider time
complexity.
An unfortunate aspect of the simplex method on which the Barrodale and
Roberts ${L}_{1}$
algorithm is based is, while it performs well in most cases, there
exists a set of inputs which make the algorithm run in exponential
time. Using the notation described above, the average case
$\Theta \left(g\left(n\right)\right)$
is polynomial and often linear but the worst-case
$\u25cb\left(g\left(n\right)\right)$ is exponential,
i.e. $g\left(n\right)={a}^{bn}$ for some
constants $a$
and $b$.
Exponential time complexity was shown by Klee and Minty [19]
to exist in cases where a hyper-cube of constraints can lead to
${2}^{d}-1$pivots
where $d$
is the dimension of the hyper cube. See the Appendix A.
4.1.4 The Dilemma
As was raised brieﬂy in Chapter 3, there is a desire for the analyst to
preserve as much redundancy as possible in order to increase the
likelihood of properly detecting all gross errors in an adjustment. Since the
problem of exponential time complexity is very real in the application of
Dantzig’s simplex method on large sets of data (as experienced by the
author in practice) this presents a dilemma.
To balance the desire for maximum redundancy with the desire for reasonable computation time, two approaches can be taken. The ﬁrst is to ﬁnd a way to reduce the size of an adjustment whilst preserving maximum redundancy. The second approach is to ﬁnd a way to reduce the time complexity of the algorithm used to perform the adjustment. The next section and Chapter 5 address the latter approach, and Chapter 7 attempts to address the former. The two approaches need not be exclusive.
4.2 Interior Point Algorithms
Recent developments in the ﬁeld of optimisation and operations research
have yielded several new methods for solving linear programs other than
Dantzig’s[7] simplex method. Since the publication of Klee and Minty’s
[19] examples of worst-case exponential computational time, researchers
have made developments in the ﬁeld of ellipsoid algorithms employed by
Leonid Khachiyan in 1979 that run in worst-case polynomial time.
Although in the average case the simplex method far outperforms
early ellipsoid methods, the idea of iteratively moving through the
interior of a problem space to converge at the optimal solution at
a boundary yielded a ground-breaking result less than 30 years
ago.
In 1984 Karmarkar published a paper on an interior point method[16] where
feasible solutions are ’projected’ from within a convex solution space to the
best feasible solution. The worst-case computational time was determined to
be $\u25cb\left({n}^{3.5}\right)$,
and that in practice it often performed faster than the simplex method on
large sparse problems.
The method described below is is an earlier interior point algorithm known as Dikin’s Aﬃne Scaling Interior Point Method[8], ﬁrst published by Dikin in 1967 with a proof of convergence in 1974. This algorithm does not possess the convergence properties of Karmarker’s algorithm but does demonstrate the concept of the central projections.
4.3 Dikin’s Primal Aﬃne Method
From [8] the following algorithm summarises the Primal Aﬃne Method
for ﬁnding an interior point solution to a linear program.
Assume that an initial feasible interior point solution $\overrightarrow{x}={\overrightarrow{x}}_{0}>0$ satisfying $A{\overrightarrow{x}}_{0}=\overrightarrow{b}$ is known. The steps of the algorithm are as follows:
- Initialize Counter: Set $t\leftarrow 0$.
- Create $D$: Set $D=Diag\left({\overrightarrow{x}}_{t}\right)$.
- Compute Centering Transformation: Compute $\xc2=AD,\u0109=D\overrightarrow{c}$.
- Determine the Projection Matrix: Compute $P=I-{A}^{T}(A{A}^{T}{}^{)}-1A$.
- Compute Steepest Descent Direction: Set ${\overrightarrow{p}}^{t}=-P\overrightarrow{c}$.
- Set $\Theta =-mi{n}_{j}{p}_{j}^{t}$.
- Test for Unbounded Objective. If $\Theta \le 0.0$ report the objective as being unbounded and stop.
- Obtain ${\overrightarrow{x}}^{t+1}$:
Compute
$${\widehat{x}}^{t+1}=\overrightarrow{e}+\left(\frac{\alpha}{\Theta}\right){\overrightarrow{p}}^{t}$$
where $\overrightarrow{e}=[1,1,\cdots \phantom{\rule{0.3em}{0ex}},1{}^{]}T$ and $\alpha $ is set to a number strictly between 0 and 1. Typically, $\alpha $ is set to be between 0.5 and 0.95 for primal aﬃne methods (in most implementations it is set between 0.9 and 0.95).
- Transform Back to Original Space: Compute ${\overrightarrow{x}}^{t+1}=D{\widehat{x}}^{t+1}$.
- Termination Check: If ${\overrightarrow{x}}^{t+1}$ is "close" to ${\overrightarrow{x}}^{t}$, report ${\overrightarrow{x}}^{t+1}$ as "close" to optimal and stop.
- Set $t\leftarrow t+1$ and go to Step 2.
4.3.1 Initialisation
An interior point method such as the aﬃne scaling method described
above requires an initial solution on which to begin iterating. As
the reader may notice in step 3 the centering transform uses the
initial solution to project the entire solution space around the point
$\u27e81,1,\cdots \phantom{\rule{0.3em}{0ex}},1\u27e9$.
Since zero values in the initial solution could not be projected in such a
way (and also values on a boundary of the solution space, that is a
parameter that lies on a constraint hyperplane) the initial solution must be
interior to the solution space.
An interior point is not found trivially, and as such methods have
been developed to ﬁnd an interior point. One such method used
here is the Big M method, which adds an additional variable
${x}_{M}$ to the problem
$Ax=b$ such that a
new problem ${A}^{\prime}{x}^{\prime}=b$
is solved. ${A}^{\prime}$
is simply $A$
with an extra column of coeﬃcients for
${x}_{M}$. The objective function
coeﬃcient of ${x}_{M}$ is a very
large value denoted $M$
(the namesake of the method). This large value of
$M$ acts
as a penalty such that the solver will try to reduce the value of this
variable as much as possible in order to minimise the impact of
$M{x}_{M}$ on the objective
value (which in the ${L}_{1}$
norm problem is the sum of absolute values of the residuals).
The coeﬃcients of ${x}_{M}$ in ${A}^{\prime}$ are calculated such that, given an initial point of ${x}^{\prime}=\u27e81,1,\cdots \phantom{\rule{0.3em}{0ex}},1\u27e9$, all of the constraints are satisﬁed. As such, the
$$\begin{array}{llll}\hfill {A}^{\prime}{x}^{\prime}& =b\phantom{\rule{2em}{0ex}}& \hfill & \phantom{\rule{2em}{0ex}}\\ \hfill \left[A\vdots {A}_{M}\right]\left[\begin{array}{c}\hfill x\hfill \\ \hfill \cdots \hfill \\ \hfill {x}_{M}\hfill \end{array}\right]& =b\phantom{\rule{2em}{0ex}}& \hfill & \phantom{\rule{2em}{0ex}}\\ \hfill {A}_{M}{x}_{M}& =b-Ax\phantom{\rule{2em}{0ex}}& \hfill & \phantom{\rule{2em}{0ex}}\\ \hfill & =b-A{\u27e81,\cdots \phantom{\rule{0.3em}{0ex}},1\u27e9}^{T}\phantom{\rule{2em}{0ex}}& \hfill & \phantom{\rule{2em}{0ex}}\end{array}$$
By setting ${x}_{M}=1$,
${A}_{M}$ is
easily calculated using the above relationship to obtain the new problem
$$\begin{array}{llll}\hfill \text{Minimise}z& ={c}^{T}x+M{x}_{M}\phantom{\rule{2em}{0ex}}& \hfill & \phantom{\rule{2em}{0ex}}\\ \hfill \text{Subjectto}{A}^{\prime}{x}^{\prime}& =b\phantom{\rule{2em}{0ex}}& \hfill & \phantom{\rule{2em}{0ex}}\end{array}$$
4.3.2 Heights Example Using Aﬃne Scaling Solver
The Matlab implementation of the aﬃne solver in Appendix B was applied to
the RMS heights example used earlier in this chapter. The formulation of the
$A$ matrix
and $b$
vector was the same as for the simplex method. The value of
$M$ was
set to 1200 for initialisation. The algorithm converged after 55 iterations,
thus it was not practical to show the values of the parameters at each
iteration in this section. However, the rate at which the parameters
converged to the accepted values are visible in Figure 8 for the station
heights and the residual parameters. It should be noted that this
algorithm yielded a diﬀerent result for the residuals as noted in Table
5.
Measurement | 1 | 2 | 3 | 4 | 5 | 6 |
Residual (mm) | 0 | -1.2 | 1.2 | 0 | 0 | 1.8 |
It can be seen in Figure 9 the value of the Big M variable drops from 1 to nearly zero very quickly, demonstrating a correct initialisation of the problem and convergence to the initial problem. The stopping criteria used was the primal-dual gap (see Appendix B for the deﬁnition in Matlab). The convergence of the solution is accepted when the value of the primal-dual gap drops below a speciﬁed tolerance (usually 0.0001).
4.3.3 Implementation for Larger Problems
Clearly convergence after 55 iterations is not an acceptable solution
for such a small problem. As an academic exercise, this method
was applied to larger problems and exhibited similarly slow
performance. Convergence was not achieved for a problem of
$195$ GNSS measurements
and $92$ stations (deﬁning
a problem size of $n=595,m=276$)
using this algorithm.
Dikin’s aﬃne scaling algorithm was implemented in C++ using the Armadillo[28] linear algebra library. A warm start was used whereby initial ECEF coordinates converted from geodetic coordinates (in GDA94) using Gpstk[29] and the ellipsoid parameters of GRS80[10]. ECEF coordinates were then translated to be centred at the apriori coordinates of the ﬁrst station in the adjustment. The $A$ matrix and $b$ vector were then normalised (scaled to approximately unity) using the following relationship
$$\begin{array}{lll}\hfill \left(\frac{1}{\parallel A{\parallel}_{2}}A\right)\left(\frac{1}{\parallel b{\parallel}_{2}}\overrightarrow{x}\right)=\frac{1}{\parallel A{\parallel}_{2}\parallel b{\parallel}_{2}}\overrightarrow{b}& \phantom{\rule{2em}{0ex}}& \hfill \end{array}$$
The author attempted to speed up the Aﬃne Scaling method by warm
starting the solver with input coordinates and residuals that would be
close in value to the ﬁnal solution. These were obtained from the input
bath_syd_stn.xml ﬁle in GDA94 geodetic coordinates and converted to
GDA94 ECEF Cartesian coordinates using the Position class in
gpstk[29].
Although in small problems a warm start oﬀers no advantage over a cold
start (whereby there is a phase I problem initialisation), in very large
problems using a solver with polynomial time convergence properties a
warm start can assist with providing a correct initialisation and reduce the
phase I run time by a constant (usually between 0.3 and 0.5). For
problems that can take hours to days to solve this time saving is
signiﬁcant.
The next chapter will discuss an alternate formulation of the ${L}_{1}$ problem and will demonstrate the application of a faster solver with better convergence properties. Part II APPLICATIONS OF ${L}_{1}$ NORM MINIMISERS IN LARGE SURVEY NETWORKS
Further analysis of ${L}_{1}$ norm minimisation algorithms and results pertaining to survey networks and reliability
5 More Formulations of the ${L}_{1}$ Minimiser
The standard form of the ${L}_{1}$ problem from Chapter 3, replicated below, is appropriate for some simplex-based solvers but suﬀers from numerical instability issues using many open-source interior point method solvers^{1} .
$$\begin{array}{lll}\hfill \begin{array}{rl}\text{Minimise}z& ={\overrightarrow{e}}^{T}{\overrightarrow{x}}_{R}^{+}+{\overrightarrow{e}}^{T}{\overrightarrow{x}}_{R}^{-}+0{\overrightarrow{x}}_{P}^{+}+0{\overrightarrow{x}}_{P}^{-}\\ \text{Subjectto}{B}^{T}\overrightarrow{x}& =\left[\begin{array}{ccccccc}\hfill I\hfill & \hfill \vdots \hfill & \hfill -I\hfill & \hfill \vdots \hfill & \hfill A\hfill & \hfill \vdots \hfill & \hfill -A\hfill \end{array}\right]\left[\begin{array}{c}\hfill {\overrightarrow{x}}_{R}^{+}\hfill \\ \hfill {\overrightarrow{x}}_{R}^{-}\hfill \\ \hfill {\overrightarrow{x}}_{P}^{+}\hfill \\ \hfill {\overrightarrow{x}}_{P}^{-}\hfill \end{array}\right]=b\\ {\overrightarrow{x}}_{P}^{+},{\overrightarrow{x}}_{P}^{-},{\overrightarrow{x}}_{R}^{+},{\overrightarrow{x}}_{R}^{-}& \ge 0\\ {\overrightarrow{e}}^{T}& =\left[1\cdots 1\right]\end{array}& \phantom{\rule{2em}{0ex}}& \hfill \text{(17)}\end{array}$$
It is necessary to remodel the problem such that it is "easier"^{2} for an IPM solver to ﬁnd a solution. One approach is to ﬁnd a solution on the dual problem.^{3} By ﬁrst converting the primal problem from above to a form that allows negative parameters (as allowed by some solvers because of automatic remodeling), below
$$\begin{array}{lll}\hfill \begin{array}{rl}\text{Minimise}z& =0{\overrightarrow{x}}_{P}^{\pm}+{\overrightarrow{e}}^{T}{\overrightarrow{x}}_{R}^{+}\\ \text{Subjectto}{B}^{T}\overrightarrow{x}& =\left[\begin{array}{ccc}\hfill A\hfill & \hfill \vdots \hfill & \hfill I\hfill \\ \hfill -A\hfill & \hfill \vdots \hfill & \hfill I\hfill \end{array}\right]\left[\begin{array}{c}\hfill {\overrightarrow{x}}_{P}^{\pm}\hfill \\ \hfill {\overrightarrow{x}}_{R}^{+}\hfill \end{array}\right]\ge \left[\begin{array}{c}\hfill \overrightarrow{b}\hfill \\ \hfill -\overrightarrow{b}\hfill \end{array}\right]\\ {\overrightarrow{x}}_{R}^{+}& \ge 0\\ {\overrightarrow{x}}_{P}^{\pm}& \text{isunbounded}\\ \text{where}{\overrightarrow{e}}^{T}& =\left[1\cdots 1\right]\end{array}& \phantom{\rule{2em}{0ex}}& \hfill \text{(18)}\end{array}$$
Note that the row constraints are no longer bound by a strict equality $=$. This allows for residuals that are negative to be speciﬁed exactly in one of a pair of complementary constraints, with the other permitting the former equality with a positive surplus variable. This means
$$\begin{array}{llll}\hfill {a}_{i}{x}_{i}+{a}_{j}{x}_{j}+{r}_{k}& \ge {b}_{k}\phantom{\rule{2em}{0ex}}& \hfill & \phantom{\rule{2em}{0ex}}\\ \hfill -{a}_{i}{x}_{i}-{a}_{j}{x}_{j}+{r}_{k}& \ge -{b}_{k}\phantom{\rule{2em}{0ex}}& \hfill & \phantom{\rule{2em}{0ex}}\end{array}$$
becomes
Subsequently, by using the Tucker Diagram (see Appendix I Table 7) the primal problem in Equation 18 can be converted to the dual problem
$$\begin{array}{lll}\hfill \begin{array}{rl}\text{Maximise}v& =\left[\begin{array}{cc}\hfill {\overrightarrow{b}}^{T}\hfill & \hfill -{\overrightarrow{b}}^{T}\hfill \end{array}\right]\overrightarrow{y}\\ \text{Subjectto}{B}^{T}\overrightarrow{x}& =\left[\begin{array}{ccc}\hfill {A}^{T}\hfill & \hfill \vdots \hfill & \hfill -{A}^{T}\hfill \\ \hfill {I}_{n}\hfill & \hfill \vdots \hfill & \hfill {I}_{n}\hfill \end{array}\right]\overrightarrow{y}\le \left[\begin{array}{c}\hfill {0}_{m}\hfill \\ \hfill {\overrightarrow{e}}_{n}\hfill \end{array}\right]\\ \overrightarrow{y}& \ge 0\\ \text{where}{\overrightarrow{e}}^{T}& =\left[1\cdots 1\right]\end{array}& \phantom{\rule{2em}{0ex}}& \hfill \text{(20)}\end{array}$$
Note that, according to the rules deﬁned by the Tucker diagram (Appendix I Table
6), the $\ge $
inequality in the row constraints becomes a
$y\ge 0$
lower bound for the dual variables. This problem is far "easier" to solve
using Mehrotra’s primal-dual solver.
The above approach to remodeling the problem in the dual form, although
mathematically correct, demonstrates nothing of the relevance of the dual
formulation to the original problem in Equation 18. To see it another way,
consider the constraints in the primal problem as being bounded from
below by the measurements. The primal problem objective is to minimise
the residual parameters such that this lower bound still holds. In the
dual formulation we ﬁrst substitute every original parameter with
${x}_{i}=1$ and then scale
each constraint $j$
by a parameter ${y}_{j}$
such that for each column (corresponding to the coeﬃcient
${x}_{i}$ in
the primal) in the constraint matrix the scalar product of the coeﬃcients
with ${y}_{1}\cdots {y}_{n}$
is bounded above by the corresponding coeﬃcient
${c}_{i}$ of
${x}_{i}$ in
the objective function. It intuitively follows that by maximising the values
${b}_{j}{y}_{j}$ subject to the
constraints ${a}_{1,i}{y}_{1}+\cdots +{a}_{n,i}{y}_{n}\le {c}_{i}$
this eﬀectively maximises the overall "utility" of the measurements whilst
being constrained by the "utility" of each parameter.
As shown by Dantzig and Von Neumann[8], if the primal problem
has a feasible solution then the dual also has a feasible solution
and the value of the objective function of the dual problem
$v$ in an
optimal feasible solution is the same as the value of the objective function
$z$
in the primal problem when it is optimal, i.e.
$z=v$ for
optimal solutions of the primal and dual. This is stated more formally in
the Strong Duality Theorem.
Strong Duality Theorem 5.0.1. If the primal system min $z={c}^{T}x$, $Ax\ge b$, $x\ge 0$ has a feasible solution and the dual system max $v={b}^{T}y$, ${A}^{T}y\ge c$, $y\ge 0$ has a feasible solution, then there exist optimal feasible solutions $x={x}^{\ast}$ and $y={y}^{\ast}$ to the primal and dual systems such that ${b}^{T}{y}^{\ast}=\text{max}v=\text{min}z={c}^{T}{x}^{\ast}$.
It remains to determine how to recover the residuals from the optimal
solution to the dual problem. In summary, by the Complementary
Slackness Theorem (below) the reduced costs of the slack (or surplus)
variables in the optimal solution to the dual problem are equal to the
values of the slack variables in the optimal solution to the primal
problem.
Complementary Slackness Theorem 5.0.1. Assume a primal problem ($P$) has a solution ${x}^{\ast}$ and the dual problem ($D$) has a solution ${y}^{\ast}$.
- If ${x}_{j}>0$ in ($P$), then the ${j}^{th}$ constraint in ($D$) is binding (this means the slack variable in the dual is zero).
- If the ${j}^{th}$ constraint in ($D$) is not binding, then ${x}_{j}=0$.
- If ${y}_{i}>0$, then the ${i}^{th}$ constraint in ($P$) is binding.
- If the ${i}^{th}$ constraint in ($P$) is not binding, then ${y}_{i}=0$.
By the exclusive slackness property of the primal problem as shown in
Equation 19, assuming that one of the constraints has a zero-value
surplus value (i.e. it is a binding residual constraint) and thus the
positive value of the residual for that constraint is also binding, the
complement constraint has a non-zero-valued surplus value of exactly the
negative value of the binding residual in the ﬁrst constraint. By item
$4$ in
the Complementary Slackness Theorem, a non-zero positive surplus
variable in the primal problem (i.e. the constraint is not binding) infers
that the corresponding value of the variable in the dual problem is zero,
which means it is not in the dual basis and thus has a non-zero reduced
cost.
Now, when the dual solution is optimal, the value of all of these reduced
costs will be positive or zero (by the Optimality Test Theorem). The cost of
a basis exchange to add one of these variables with a positive reduced cost
to the basis of the dual solution is precisely the magnitude of the
non-zero-valued surplus variable in the corresponding row constraint in
the primal problem. Put more simply, the surplus value representing the
residual in the primal problem is exactly equal to the reduced cost of this
surplus variable in the dual problem.
This has the side eﬀect that all residuals will be reported as absolute
values by this method since the eﬀect of the exclusive slackness constraints
19 represents the familiar model for an absolute value in the dual
problem. For the purposes of outlier detection, this is not an issue because
only the magnitude of the normalised residual is considered in the outlier
test.
5.1 ${L}_{1}$ Minimiser Model in the Gnu Linear Programming Kit
The Gnu Linear Programming Kit (GLPK)[21] is an open source library for linear programming that contains two separate solvers:
- A simplex-method (SM) solver.
- An interior-point-method (IPM) solver.
The SM solver has support for primal and dual problems. This is
demonstrated to be very robust and resilient to numerical instability.
The IPM solver uses a modiﬁed implementation of Mehrotra’s primal-dual
method for interior point problems[25]. It is more exposed to numerical
instability than the simplex method as demonstrated by the problem in
this chapter.
Both solvers have pre-processing stages that remodel the problem to a standard form. This allows the user to specify inequality constraints and negative bounds on parameters which is usually disallowed by the underlying algorithms. Extra constraints are added to the model to permit the problem speciﬁcation in this way, but care must be taken to appropriately specify a model such that it will have a feasible solution and is not unbounded. Using the above dual formulation of the ${L}_{1}$ problem on the entire set of more than 66,000 GNSS observations, GLPK was able to converge to an optimal solution in 30 seconds on a standard laptop computer.
6 ${L}_{1}$ Norm Analysis on NSW GNSS Network
The focus of an argument for using
${L}_{1}$
Norm minimisation for outlier detection hinges on reliability. As discussed
by Baarda[1], Pope[26] and subsequently many others[27][15], the internal
reliability of a model design is measured by how large an error must be
before it is detectable. As Pope notes[26]:
$\cdots \phantom{\rule{0.3em}{0ex}}$ the traditional objections to least-squares adjustments of triangulation; that they cannot be relied upon to localize the error and instead "Spread it around" $\cdots \phantom{\rule{0.3em}{0ex}}$
which identiﬁes two critical aspects in data snooping:
- The initial detection of an error
- The localisation of the error.
While an examination of measurement cycle mis-closes will reveal the existence of most gross errors, the localisation of these errors requires more than a mis-close analysis.
6.1 Residuals
The ﬁt of residuals to the Laplace probability density function (described
in Chapter 3) was carried out in Mathematica using Pearson’s Chi-Square
ﬁt test. The histogram of residuals for the initial set of observations is in
Figure 10 below. Note that there is always a "spike" in the histogram at the
zero level. This correlates exactly to the number of parameters in the basis
$u$, with
the remainder of the histogram representing the degrees of freedom
$n-u$. By
removing the residuals corresponding to all basis parameters the
histogram of residuals more closely ﬁts the Laplace probability density
function.
The Pearson Chi-Square ﬁt of the Laplace probability density function was calculated for the histogram of non-zero standardised residuals. The parameters of this distribution, with an assumed mean of zero, are $\mu =0,\beta =0.665766$ (or instead of $\beta $, the $h=\frac{1}{\beta}=1.502$ using the model deﬁned by Branham[3]). The sigma threshold was calculated at the 99% conﬁdence level on this distribution was solved from the cumulative density function
$\begin{array}{c}\begin{array}{cc}\hfill 1-{e}^{\genfrac{}{}{0.1ex}{}{1}{2}(-1.502\genfrac{}{}{0.1ex}{}{\widehat{v}}{\sigma})}& =0.995\hfill \\ \hfill \widehat{v}& =3.06\sigma \hfill \end{array}\hfill \end{array}$
By performing a cursory removal of measurements demonstrating outlier
behavior from the initial set of data in two passes the resultant reduced
data set still contained outliers that "reappeared" after the ﬁrst two
passes, but the maximum magnitude the standardised residuals
($\frac{\left|v\right|}{\left|s\right|}$) of outliers
decreased from $\pm $181.2
to $\pm $8.6.
The number of residuals above the sigma threshold of
$\widehat{v}=\pm 3.06\sigma $
decreased from 4,486 to 1,698.
The histogram of "degrees of freedom only" residuals of the "cleaner" data set is visually no diﬀerent to that of the initial data set due to the sheer amount of data. Additionally, the parameters for the Laplace probability density function for these standardised residuals are essentially identical, which demonstrates that the outliers removed from the "dirty" data set imparted no bias on the network model in a minimally constrained ${L}_{1}$ adjustment. A visual comparison of the network graph between the initial data set and the "cleaner" set is in Figure 11.
(a) ${L}_{1}$
adjustment
of
all
measurements
(b) After
two
passes
and
the
removal
of
1475
outlier
measurements
To properly address outliers in such a large data set, it is advisable to construct an automated procedure to iteratively report on and test the measurement with the largest outlier on a network adjusted with and without the said measurement until no standardised residuals test above a speciﬁed sigma threshold. This course of action is recommended for further research and development.
6.2 Reliability Tests
In a continuation of the method proposed by Marshall [23] a Monte Carlo based simulation was executed on the "cleaner" data set obtained in the above section. The test was conducted by adding random uniformly distributed artiﬁcial gross errors in the range ${\Delta}_{i}\in \left[-1000mm,1000mm\right]$ to 100 randomly selected measurements in the data set across 1000 passes. The results from each pass were analysed in Matlab using the code in Appendix D which classiﬁed the residuals of each adjustment into four groups:
- True negative: No artiﬁcial bias was added and measurement was not ﬂagged as an outlier.
- True positive: Artiﬁcial bias was added and measurement was ﬂagged as an outlier.
- False negative: Artiﬁcial bias was added and measurement was not ﬂagged as an outlier.
- False positive: No artiﬁcial bias was added and measurement was ﬂagged as an outlier.
The true negatives are displayed in Figure 12, the false positives are
displayed in Figure 13 and the false negatives are displayed in Figure
14. Note that the rate of false positives is relatively low except for
standardised residuals near the sigma threshold which identiﬁes that there is
a concentration of marginally detectable errors in the simulation
and thus implies the propensity for reliability errors in the NSW
network.
A visual comparison of the true negatives and false negatives in Figures 12
and 14 would indicate that the distribution of the false negatives
with respect to the value of the artiﬁcial gross error is reﬂected in
observations not present in the true negatives, where the latter would
ideally display a ﬂat uniform distribution over the entire spectrum
of gross errors. The shape of the distribution of false negatives
demonstrates that, above a "noise ﬂoor" of an approximate failure rate of
detection of between 5% and 10%, the failure rate of detection increases
sharply once the magnitude of the bias drops below 0.1 metres.
Between a bias magnitude of 0.1 and 0.2 metres the failure rate of
detection does also demonstrably increase relative to the range outside
$\pm 0.2m$.
6.2.1 Localisation of False Negatives
The localisation of the propensity for failure to detect gross errors as outliers has not been addressed in this thesis. However, this would be a viable avenue as a topic of further research. Perhaps the network neighborhood aggregation discussed in Chapter 7 could be of use in this area.
7 Datum Network Segmentation
As postulated by Marshall [22] there is little known
about the pre-analysis of robust estimators such as the
${L}_{1}$
norm.
One solution to reduce the overhead time complexity of an
${L}_{1}$
adjustment is to reduce the problem size (or dimension) by dividing the
problem into a series of sub-problems that have the property that the
union of the outliers detected across all sub-problems is equivalent to the
outliers detected in the initial problem.
The problem of dividing a network of measurements in to a set of sub-networks (this term will be interchangeable with the term segments from here-on) is made diﬃcult by the constraint mentioned above, which in the context of this problem can be expanded into the following:
Every measurement contained in the original network that has redundancy is contained in at least one of the sub-networks with redundancy.
This thesis will investigate the eﬃcacy of dividing the network by the
detection of a basis of cycles (or loops) of measurements and to group
these cycles into segments based on the overlapping observations shared
between cycles.
The hypothesis of correctness of the above proposal lies in the property that we wish to preserve, that is, the set of outlier observations. The magnitude of the error of these outliers is not as important as whether or not these observations are ﬂagged as such in all cases. In order to test this hypothesis, a series of simulations will be run in order to ascertain the statistical likelihood of correctness. These simulations will consist of simple permutations of outliers in across permutations of network geometries.
7.1 Cycle Detection
The problem of detecting cycles lies in the properties of the cycles we wish
to obtain. Since a complex network can contain many more cycles than
there are edges (or measurements) in the network, we wish to obtain a
minimally spanning cycle basis for the network in order to eﬀectively
minimise the number of adjustments performed. This task can be
complicated depending on how much we want to minimise this cycle
basis.
The chosen cycle detection algorithm was achieved using the following basic process:
- Find a spanning forest in the network.
- For each tree in the forest
- For each back-edge not in the spanning tree
- Find the smallest path joining the start and end nodes of the back-edge using A* search. This path cannot contain the back-edge.
- For each back-edge not in the spanning tree
7.2 Search Algorithms
It was desired that a set of cycles be obtained from a network such that:
- Every measurement in the network that can belong to a cycle does belong to at least one cycle
- The cycles must be as small as possible.
The cycles are then grouped to ensure high commonality between measurements and stations in a segment.
7.3 The $A\ast $ search algorithm
The most eﬀective and commonly-used search algorithm for scenarios
deﬁned by a single goal or target in a search space that can always
allow a reasonable (and admissible) method of predicting the
distance from a potential solution candidate to the goal state is the
$A\ast $
(pronounced A-star) search. Initially proposed in 1968 by Nils Nilsson
[citation needed] as a faster algorithm to select paths for robot navigation,
the $A\ast $
search has persisted in popularity due to much the same reason as
Dantzig’s simplex method - the good average-case computational time
complexity and the real-world nature of the conditions under which it can
be applied.
$A\ast $ is
eﬀectively an iterative state-based path-traversal search whereby
a queue of possible solution candidates at various points in the
search space is maintained and ordered by the cost. This cost is
calculated as the sum of the cost (or distance) the candidate has
traversed already plus an estimate or heuristic of the minimum cost
that is expected to be incurred for this candidate to reach the goal
state.
In the context of a cycle search in a geodetic network, this search algorithm can be applied using the following conditions and metrics:
- Nodes are stations in the geodetic network, and edges are the measurements.
- Each edge is required to belong to at least one cycle, hence every edge will incur a cycle search for that edge.
- The start node is the ﬁrst node of the edge being searched. The goal node is the second node of the same edge.
- The edge being cycle-searched is removed from the set of edges that can be selected as candidates in the A* search.
- The additional cost of a traversal is the Euclidean distance $\left|\right|\Delta X\left|\right|$ from the last node to the next node.
- The admissible heuristic is the straight line distance from the last node in the candidate to the goal node.
The algorithm for this search is summarised in pseudo-code in 15. This algorithm was eﬀectively implemented in the program LDNet. See Appendix F for a C++ implementation of the below pseudocode algorithm.
2: Let $Q$ be an ascending-cost-sorted queue of edges remaining to be searched
3: Let $C$ contain a list of edges that have been searched already
4: Remove $e$ from $G$
5: ${e}_{previous}\leftarrow END$
6: Enqueue $e$ onto $Q$ with cost MAXCOST
7: while $Q$ contains edges to be searched do
8: $d\leftarrow dequeue\left(Q\right)$
9: if $d.nod{e}_{2}=e.nod{e}_{2}$ then
10: return Path traced by ${d}_{previous}$ until $END$ is reached.
11: end if
12: for All adjoining edges $h\in G$, $h\ni C$ do
13: ${h}_{previous}\leftarrow d$
14: ${h}_{pathlength}\leftarrow {d}_{pathlenth}+{h}_{length}$
15: ${h}_{cost}\leftarrow {h}_{pathlenth}+distance\left(h.nod{e}_{2},e.nod{e}_{2}\right)$
$\u22b3$Path cost plus estimated remaining distance as the admissible heuristic
16: $Q\leftarrow enqueue\left(h\right)$ by ${h}_{cost}$
17: end for
18: end while
19: end procedure
7.4 Grouping Cycles
Initial testing of the L1 adjustment was done on networks of naively
grouped cycles. This yielded results that conﬁrmed that the algorithm
performed correctly but ultimately did not provide an easily identiﬁable
way of analysing the resultant residuals of the measurements in the
network as a whole.
It was hypothesised that by intelligently grouping a
subset of measurements from the whole network, an
${L}_{1}$ norm
analysis of these measurements could yield reliable information on the residuals
of a subset of these measurements. The author proposes that for each measurement
${o}_{xyz}$ in the whole
network $G$,
(i.e. for ${o}_{xyz}\in G$,
an ${L}_{1}$
adjustment on a neighbourhood of measurements
$V$ about
${o}_{xyz}$
of appropriate depth would yield reliable residuals for
${o}_{xyz}$
relative to the whole network.
See Appendix G of the thesis
for an implementation of
the aggregation algorithm in C++.
The algorithmic robustness of neighbourhood aggregation would be
compromised if a station has a particularly high degree (number of
measurements to it). This phenomenon exists in the provided set of GNSS
baseline data, particularly around TS12042 which has more than 30,000
measurements to it. To allow the simplex method to solve within
reasonable time, the number of measurements should not exceed 1000 per
segment. Therefore in such cases it may be necessary to further split the
data into overlapping neighborhood partitions by taking a comb of
measurements at stations with a degree more than 500. When testing for
outliers in each neighborhood partition it is required that there are no
false negatives, therefore a rigorous method must be devised for this
neighborhood partitioning.
7.5 Future Research
As noted earlier, testing of internal reliability on a ﬁxed size neighborhood relative to the whole network did not commence during the term of research. Should another researcher be interested in pursuing this avenue then all available source code will be provided by the author.
8 Concluding Remarks
The application of fast linear solvers to the ${L}_{1}$ minimisation problem in a large network has demonstrated the viability of such a device in the quality analysis stage of datum determination. It is not diﬃcult to see the automation possibilities to achieve near-real-time datum quality analysis. As noted by Haasdyk et al [12] to achieve the measurement density required for a dynamic datum the use of crowd-sourced survey data is almost inevitable and as such a rigorous quality analysis system would be required as a ﬁrst level assessment to provide immediate feedback to both the contributing and receiving parties.
8.1 Further Development
Currently the software developed for this thesis is capable of processing
GNSS baselines but there is no support for terrestrial observations. The
latter would require the implementation of ellipsoidal model equations as
described in Harvey’s Monograph 13[14] in addition to support for a
geoid-ellipsoid separation description such as AusGeoid09[9].
Further to the above, the process of iteratively down-weighting or
removing outliers by ﬁtting the Laplace distribution via the Pearson
Chi-Square test is, at present, performed manually. With potentially
thousands of outliers in a set of hundreds of thousands or millions of
measurements it would be wise to automate this procedure by the
speciﬁcation of decision parameters for outlier removal.
Lastly, multi-epoch support should be addressed in the context of a dynamic datum. Given the propensity for localised deformation on the Australian plate[12] the datum deﬁnition could itself be used as a large-scale monitoring survey to identify marks that have potentially moved between consecutive epochs. The likelihood of movement of marks would be statistically tested and the basis for such tests would depend on the quality of surrounding measurements at the local level.
8.2 Further Research
The result of initial internal reliability testing revealed the potential for a
signiﬁcant proportion of type 1 and type 2 errors in the identiﬁcation of
outliers as measurement errors. As such it is recommended that further
testing of internal reliability should be conducted on the network
geometry of the NSW network of class 2A0 marks. Of particular relevance
is the reduction of the rate of false negative ﬂagging in areas of very low
redundancy.
Although GNSS baseline vector components are correlated and this was
considered during the term of this thesis, it was beyond the scope of this
thesis to investigate the eﬀect of this correlation with respect to
${L}_{1}$
norm minimisation in detail.
In general, the network geometry and redundancy still primarily determine the ability to detect gross errors in survey networks. As demonstrated by the internal reliability testing in this thesis, even with suﬃcient redundancy there is still a chance that gross errors will not be localised exactly using an ${L}_{1}$ adjustment. However, when compared with other existing methods such as the Baarda test[1] in least squares, the same techniques on an ${L}_{1}$ adjusted data set provide signiﬁcant information that when used in conjunction with existing data snooping methods can greatly assist the process of gross error identiﬁcation. Part III APPENDIX
A. Glossary
A.1 Glossary of terms
Depending on academic background, some terminology used in this document may not be immediately familiar to the reader. Since this document spans both aspects of computer science, mathematical graph theory and traditional survey concepts, there are some terms that have diﬀerent meanings in diﬀerent contexts and some others that can only be adequately explained using the most rigorous mathematical terminology. The author has made an attempt to describe all terms in this section with references and examples in the relevant areas of study.
A.1.1 Basis
In a linear system, the basis is the set of linearly independent vectors upon which all vectors in the system are represented componentwise.
A.1.2 Cycle
In graph theory, a cycle is speciﬁcally a path through a graph (or survey network) in which the start node (or point) is also the end node. This is often called a loop in the context of a survey network, however the term loop in graph theory speciﬁcally refers to a single edge (or measurement) that links a single node to itself. This latter property is never exhibited in a survey network.
A.1.3 Over-determined
As a basic property of linear algebra: If A is
$n\times n$,
there exists exactly one feasible solution for
$z$. However,
typically $A$ is
$n\times u$ and hence the
system of equations $Ax\le b$
is over-determined, which means that there is no single solution for
$z$, but
there exists as many feasible solutions as there are degrees of freedom in
$A$, that
is $n-u$,
where $u$
is the number of parameters in the system.
A.1.4 Convexity and a Convex Set
An aspect of a geometric ﬁgure that is a requirement of the solution space of the Simplex method that can be mathematically determined. In a vector space, convexity can be visualised by the following statement: A vector space is convex if there does not exist any pair of vectors between which a straight line passes outside the bounds of the set.
A.1.5 Sentinel Value
In computer programming, a sentinel value is a special value whose presence guarantees termination of a loop that processes structured (especially sequential) data. In more complex algorithms it is also a term used to describe a variable that is used to decide the ﬂow control of the execution based on certain values.
A.1.6 Norm
A Norm is a type of measure or length. Typically it is used in the context of the measurement of the size of a vector or matrix. In the case of the L1 Norm of a geodetic network, the set being normed is the set of residuals.
A.1.7 Degeneracy
A linear program is said to be degenerate or to become degenerate when basic variables of zero value can result in pivot operations for which there is no improvement in the objective value. This would then cause the next pivot operation to use
A.1.8 Cycling
A linear program can encounter conditions whereby it will run indeﬁnitely and never reach an optimal solution. This occurs when a particular set of basic variables is encountered more than once. Since linear programming is deterministic, the pivot operations on a particular basis will always be the same, hence encountering a basis more than once means that the program will "cycle" through the same sequence of sets of basic variables in an inﬁnite loop.
A.1.9 Degree
In a graph, the degree of a node refers to the number of edges adjoining that node.
A.1.10 Non-sparsity
In the context of a network (or graph) of vectors, the author has chosen the term Non-Sparsity to represent the likelihood of a single gross error in a measurement near stations of very high degree being hidden or spread across the residuals of many neighbouring measurements.
A.1.11 Hyper-cube
An $n$-dimensional cube. The deﬁning characteristic when considering a hyper-cube problem-space of a linear-programming problem is that each constraint does not bypass any other constraint and that the correct pivot choice is often not the ﬁrst one chosen by the simplex method, hence increasing computational time complexity.
A.1.12 Undirected Graph
The mathematical representation of a collection of nodes and adjoining edges whereby edges do not have a direction of ﬂow, i.e. paths that are traversed in this graph can ﬂow in either direction across an edge.
A.1.13 Spanning Tree
A spanning tree is a single connected tree in a graph, rooted at any node, whereby every node in the graph is visited once and only once by a path connecting the root node to that node. It is thus a property of a spanning tree that the spanning tree contains no cycles. A graph whereby a path can be found between any pair of nodes is connected, and thus a single spanning tree can be found for this graph.
B. Aﬃne Scaling Solver in Matlab
H. LDNet I/O Formats
H.1 Input Measurements
The below is an example of a single measurement input to LDNet.
I. Duality Tables and Equivalences
Primal | Dual |
Minimize Primal Objective | Maximize Dual Objective |
Objective Coeﬃcients | RHS of Dual |
RHS of Primal | Objective Coeﬃcients |
Coeﬃcient Matrix | Transposed Coeﬃcient Matrix |
Primal Relation: | Dual Variable: |
(${i}^{th}$) Inequality: $\ge $ | ${y}_{i}\ge 0$ |
(${i}^{th}$) Inequality: $\le $ | ${y}_{i}\le 0$ |
(${i}^{th}$) Equation: $=$ | ${y}_{i}$ unrestricted in sign |
Primal Variable: | Dual Relation: |
${x}_{j}\ge 0$ | (${j}^{th}$) Inequality: $\le $ |
${x}_{j}\le 0$ | (${j}^{th}$) Inequality: $\ge $ |
${x}_{j}$ unrestricted in sign | (${j}^{th}$) Equation: $=$ |
Variables | ${x}_{1}\ge 0$ | $\cdots \phantom{\rule{0.3em}{0ex}}$ | ${x}_{n}\ge 0$ | Relation | Constants | |
${y}_{1}\ge 0$ | ${a}_{11}$ | $\cdots \phantom{\rule{0.3em}{0ex}}$ | ${a}_{1n}$ | $\ge $ | ${b}_{1}$ | |
Dual | $\vdots $ | $\vdots $ | $\ddots $ | $\vdots $ | $\vdots $ | $\vdots $ |
${y}_{1}\ge 0$ | ${a}_{m1}$ | $\cdots \phantom{\rule{0.3em}{0ex}}$ | ${a}_{mn}$ | $\ge $ | ${b}_{m}$ | |
Relation | $\le $ | $\cdots \phantom{\rule{0.3em}{0ex}}$ | $\le $ | $\le $ Max $v$
| ||
Constants | ${c}_{1}$ | $\cdots \phantom{\rule{0.3em}{0ex}}$ | ${c}_{n}$ | $\ge $ Min $z$
| ||
________________________________________________________________
Bibliography
[1] W. Baarda. A Testing Procedure for Use in Geodetic Networks. Number v. 2, no. 5 in A testing procedure for use in geodetic networks. Netherlands Geodetic Commission, 1968. ISBN 9789061322092. URL http://books.google.com.au/books?id=4Ho-SwAACAAJ.
[2] I. Barrodale and F. D. K. Roberts. Solution of an overdetermined system of equations in the l1 norm [f4]. Commun. ACM, 17(6):319–320, June 1974. ISSN 0001-0782. doi: 10.1145/355616.361024. URL http://doi.acm.org/10.1145/355616.361024.
[3] R.L. Branham. Scientiﬁc data analysis: an introduction to overdetermined systems. Springer-Verlag, 1990. ISBN 9780387972015. URL http://books.google.com.au/books?id=USLvAAAAMAAJ.
[4] W.F. Caspary and J.M. Rueger. Concepts of Network and Deformation Analysis. Monograph (University of New South Wales. School of Surveying). School of Surveying, The University of New South Wales, 1987. ISBN 9780858390447. URL http://books.google.com.au/books?id=i-W5SAAACAAJ.
[5] I. Castillo and E. Barnes. Chaotic behavior of the aﬃne scaling algorithm for linear programming. SIAM Journal on Optimization, 11(3):781–795, 2001. doi: 10.1137/S1052623496314070. URL http://epubs.siam.org/doi/abs/10.1137/S1052623496314070.
[6] M. Cen, Z. Li, X. Ding, and J. Zhuo. Gross error diagnostics before least squares adjustment of observations. Journal of Geodesy, 77(9): 503–513, 2003. ISSN 0949-7714. doi: 10.1007/s00190-003-0343-4. URL http://dx.doi.org/10.1007/s00190-003-0343-4.
[7] G. Dantzig. Linear programming and extensions. A Rand Corporation research study. PRINCETON University Press, 1965. ISBN 9780691059136. URL http://books.google.com.au/books?id=2j46uCX5ZAYC.
[8] George B Dantzig and Mukund N Thapa. Linear programing: 1: Introduction, volume 1. Springer, 1997.
[9] WE Featherstone, JF Kirby, Christian Hirt, MS Filmer, SJ Claessens, NJ Brown, Guorong Hu, and GM Johnston. The ausgeoid09 model of the australian height datum. Journal of Geodesy, 85:133–150, 2011.
[10] geodesy@ga.gov.au. Common ellipsoid parameters, May 2013. URL http://www.ga.gov.au/earth-monitoring/geodesy/geodetic-datums/historical-datums-of-australia/common-ellipsoid-parameters.html.
[11] Michael Grant and Stephen Boyd. CVX: Matlab software for disciplined convex programming, version 2.0 beta. http://cvxr.com/cvx, September 2013.
[12] Joel Haasdyk and Craig Roberts. Monitoring station movement using a state-wide simultaneous adjustment of everything - implications for a next-generation australian datum. In Proceedings of International Global Navigation Satellite Systems Society IGNSS Synopsysm 2013, page 15, July 2013.
[13] Joel Haasdyk and Tony Watson. Data-mining in nsw: Towards a new and improved australian datum. In Proceedings of Association of Public Authority Surveyors Conference, pages 85–102, March 2013.
[14] B. R. Harvey. Practical least squares and statistics for surveyors. School of Surveying, University of New South Wales, Kensington, N.S.W. :, 1st ed. edition, 2011. ISBN 0858390590.
[15] Bruce R Harvey. Survey network adjustments by the l1 method. Australian Journal of Geodesy, Photogrammetry and Surveying, 59:39–52, 1993.
[16] Narendra Karmarkar. A New Polynomial-Time Algorithm for Linear Programming. Combinatorica, 4:302–311, 1984. doi: 10.1007/BF02579150.
[17] SamwelSimon Katambi, Guo Jiming, and Kong Xiangyuan. Applications of graph theory to gross error detection for gps geodetic control networks. Geo-spatial Information Science, 5: 26–31, 2002. ISSN 1009-5020. doi: 10.1007/BF02826471. URL http://dx.doi.org/10.1007/BF02826471.
[18] A. Khodabandeh and A. Amiri-Simkooei. Recursive algorithm for l1 norm estimation in linear models. Journal of Surveying Engineering, 137(1):1–8, 2011. doi: 10.1061/(ASCE)SU.1943-5428.0000031. URL http://ascelibrary.org/doi/abs/10.1061/%28ASCE%29SU.1943-5428.0000031.
[19] V. Klee and G. J. Minty. How Good is the Simplex Algorithm? In O. Shisha, editor, Inequalities III, pages 159–175. Academic Press Inc., New York, 1972.
[20] D.C. Lay. Linear Algebra and Its Applications. World student series. Addison-Wesley Longman, Incorporated, 1997. ISBN 9780201824780. URL http://books.google.com.au/books?id=4OanQgAACAAJ.
[21] Andrew Makhorin. Gnu linear programming kit, version 4.53, 2013. URL http://www.gnu.org/software/glpk/glpk.html.
[22] J. Marshall. L1-norm pre-analysis measures for geodetic networks. Journal of Geodesy, 76(6-7): 334–344, 2002. ISSN 0949-7714. doi: 10.1007/s00190-002-0254-9. URL http://dx.doi.org/10.1007/s00190-002-0254-9.
[23] John Marshall and James Bethel. Analysis of residuals from l 1 norm estimation. ISPRS Archives, pages 38–43, 1996.
[24] John Marshall and James Bethel. Basic concepts of l 1 norm minimization for surveying applications. Journal of surveying engineering, 122(4):168–179, 1996. doi: 10.1061/(ASCE)0733-9453(1996)122:4(168). URL http://ascelibrary.org/doi/abs/10.1061/%28ASCE%290733-9453%281996%29122%3A4%28168%29.
[25] S. Mehrotra. On the implementation of a primal-dual interior point method. SIAM Journal on Optimization, 2(4):575–601, 1992. doi: 10.1137/0802028. URL http://epubs.siam.org/doi/abs/10.1137/0802028.
[26] A.J. Pope, Geodetic Research, Development Laboratory, National Geodetic Survey (U.S.), and National Ocean Survey. The statistics of residuals and the detection of outliers. NOAA technical report. U.S. Dept. of Commerce, National Oceanic and Atmospheric Administration, National Ocean Survey, Geodetic Research and Development Laboratory, 1976. URL http://books.google.com.au/books?id=tHUeAQAAIAAJ.
[27] Chris Rizos. Principles and practice of gps surveying, January 2000. URL http://www.gmat.unsw.edu.au/snap/gps/gps_survey/principles_gps.htm.
[28] Conrad Sanderson. Armadillo c++ linear algebra library, June 2013. URL http://arma.sourceforge.net/.
[29] Brian Tolman, R. Benjamin Harris, Tom Gaussiran, David Munton, Jon Little, Richard Mach, Scot Nelsen, and Brent Renfro. The GPS Toolkit: Open Source GPS Software. In Proceedings of the 16th International Technical Meeting of the Satellite Division of the Institute of Navigation, Long Beach, California, September 2004.
[30] Henry Wolkowicz. C&o 350 linear programming course notes, fall semester 1995, 1995. URL http://orion.math.uwaterloo.ca/~hwolkowi/henry/teaching/f95/350.f95/matlabfiles/affine.m.gz.
[31] Y. Zhang. Primal-dual interior point approach for computing l-1-solutions andl l-inﬁnity-solutions of overdetermined linear systems. Journal of Optimization Theory and Applications, 77: 323–341, 1993. ISSN 0022-3239. doi: 10.1007/BF00940715. URL http://dx.doi.org/10.1007/BF00940715.
1 Tested on a subset of the NSW LPI data of 195 measurements and 92 stations
2 Gnu Linear Programming Kit
1 For further information on the formulation of the Jacobian matrix $A$, see Harvey’s Monograph 13[14].
2 by referring to random error, this excludes systematic error or bias which can skew a least squares solution as we know from practice.
3 Algorithms such as the Simplex algorithm are made much simpler when all parameters are non-negative.
4 The notation ${\overrightarrow{x}}^{\pm}$ is used throughout this thesis to represent the pair ${\overrightarrow{x}}^{+}\ge 0$ and ${\overrightarrow{x}}^{-}\ge 0$. Whenever ${\overrightarrow{x}}^{\pm}$ is referred to in an equation, relation or assignment such as ${x}_{i}^{\pm}={b}_{i}$ this is symbolically equivalent to ${x}_{i}^{+}=max\left(0,{b}_{i}\right)$ and ${x}_{i}^{-}=max\left(0,-{b}_{i}\right)$.
1 Such as GLPK, discussed later in this chapter
2 As noted in repeated experiments by the author, Mehrotra’s primal-dual method does not often converge on small-medium ($n>50$) problems using the primal formulation in Equation 17. Large problems always resulted in numerical instability.
3 Before continuing, it should be noted that there is a large amount of theory behind this approach that will not be covered in this thesis. For reference, the very readable Linear Programming Volume 1 by Dantzig and Thapa [8] contains all of the relevant deﬁnitions and theorems and some very good examples on duality theory.