This dissertation investigates computational methods based on the minimisation of the L1 norm of residuals for the identification of outlier observations in a datum network comprised of GNSS baselines. The Barrodale and Roberts algorithm, based on Dantzig's Simplex method, was examined in depth; the computational complexity issues that arose from large data sets necessitated an investigation into a network segmentation-based compromise and into the application of interior point solvers for linear programming in the minimisation of the L1 norm of residuals in very large data sets. The dual of the L1 norm minimiser primal model was finally investigated and successfully applied to yield a solution in thirty seconds using an implementation of Mehrotra's primal-dual interior point solver, as opposed to more than six hours for the same problem model on the same hardware using a simplex solver. Further analysis of the segmentation-based solution was not deemed necessary once a fast interior point method was successfully applied.
Gaussian parametric weighted least squares (WLS) is still the most widely
known method of obtaining both a best ﬁt for a survey network
adjustment and reliable quality parameters for that ﬁt. In the context of a
very large set of measurements such as that for a national datum, where
thousands of parameters are to be estimated, WLS is still the ﬁrst choice
for obtaining coordinate and quality parameters. However, when
the volumes of data are large and the quality of the meta-data is
sometimes uncertain, the result from a WLS adjustment can be
equally uncertain. The analyst will often be required to employ supplementary quality control techniques to assist in cleaning a set of measurements.
Of the existing methods for the detection of outliers, the L1 norm minimiser, using both simplex and interior point algorithms, was assessed for its ability to detect real and simulated outlier observations in large data sets. This was compared against the more conventional least squares Baarda test and the misclose analysis explored by Katambi et al.
The mechanism of the simplex method as applied to an L1 minimiser in geodetic networks is explained in Chapter 4, followed by a discussion of an adaptation of an interior-point optimisation algorithm which in practice can be more computation-time efficient on large sets of data but has the unfortunate aspect of being less exact.
The ongoing eﬀort to source historical geodetic measurement data from archives for the next Geodetic Datum of Australia has resulted in a large set of measurement data containing 66,000+ GPS baseline measurements that will estimate the Earth-Centred, Earth Fixed (ECEF) positions of 20,000+ class A survey marks. Characteristic gross errors in measurements have been encountered and continue to arise as outliers in existing data as new data is added.
A more rigorous pre-processing step prior to the ﬁnal least squares
adjustment of the GPS baseline network was requested to reduce the
repetitive aspect of individually testing each outlier as it appears in a large
adjustment. The author trialled several methods for simplifying the
problem and succeeded with a residual analysis technique using a
proposed adjustment-per-measurement method that approximates
redundancy in the whole network with a "redundancy neighbourhood" of
fixed depth from the measurement in question. The method is detailed in Chapter 7.
The argument for using L1 norm minimisation for outlier detection hinges on reliability. As discussed
by Baarda, Pope and subsequently many others, the internal
reliability of a model design is measured by how large an error must be
before it is detectable. As Pope notes:
the traditional objections to least-squares adjustments of triangulation; that they cannot be relied upon to localize the error and instead "Spread it around"
which identiﬁes two critical aspects in data snooping:
- The initial detection of an error
- The localisation of the error.
While an examination of measurement cycle miscloses will reveal the existence of most gross errors, the localisation of these errors requires more than a misclose analysis. This is explored in Chapter 6.
Although the L1 norm minimiser has often been suggested as an auxiliary quality control step, there is currently no widely known commercial solution for dealing with large networks of survey data.
Harvey also notes the application of L1 norm minimisation to identify non-measurement errors in a survey adjustment, such as datum defect (also known as rank defect). This results from the robustness of the simplex method in optimising an under-determined system of linear equations.
- Provide a solid theoretical background for newcomers to the L1 norm minimisation model.
- Evaluate the time performance of various L1 norm minimiser algorithms on survey data.
- Find a well-balanced practical solution to the computational time intractability issue arising in algorithm performance on large data sets such that the desired output information on measurement residuals is preserved.
The theoretical basis for the L1 norm is a large area which encompasses the mathematical model, the various computational methods and the proposed application of the output. This task is broken down into the following subsections:
- The formulation of a linear program for the L1 norm minimiser.
- A worked example using a modified simplex method analogous to Barrodale and Roberts' algorithm.
- The same example problem using a simplified interior point method based on Dikin's affine scaling algorithm.
The performance in time, correctness and convergence of the solvers below was tested. The source of the data is the complete set of NSW GNSS baselines provided by LPI unless otherwise stated. See footnotes 1 to 4.
| Software | Solver Type | Time Complexity | Time Taken |
|---|---|---|---|
|  | Primal Simplex | Exponential | Very slow¹: > 1 second |
| Mathematica | Simplex | Exponential | Slow¹: > 1 second |
| Mathematica | Primal-Dual IP | Polynomial | Fast¹: < 0.01 seconds |
| Matlab CVX | Primal-Dual IP | Polynomial | Fast¹: < 0.01 seconds |
| Dikin Affine Scaling | Primal Affine IP | Not known | Did not converge¹ |
| GLPK | Primal-Dual IP⁴ | Polynomial | 30 seconds³ |
The objective of research in this thesis was to find an L1 norm minimiser algorithm to operate on a very large data set where the number of parameters runs into the tens of thousands. The very real problem of computational time intractability was therefore examined. Prior to finding an acceptable solution using GLPK, a proposal for a segment-based solution, intended to preserve the validity of the output L1 residuals with respect to the whole of a large data set, was investigated in Chapter 7. The naïve approach was to evaluate a complete subset of data from the larger set with the hope that all of the information on residuals (observed minus calculated) relative to the subset is completely relevant to the original large data set. This approach was abandoned once the 30 second solver was found and applied, but the investigation and the work on finding a complete method of segmenting the network was preserved for the possibility of future research.
It is assumed that the reader is familiar with the principles of weighted least squares as applied to a geodetic network adjustment, with specific knowledge of the formulation of the Jacobian matrix and the (observed minus calculated) vector. More importantly, it is assumed that the reader understands the definition of a residual with respect to a least squares solution. For more information, the recommended reference text is Harvey's Monograph 13. A brief review of this material is provided in the early sections of Chapter 3.
Outlier measurements in a network often appear in one or more of the following forms:
- Wrong standpoint errors are where observations are taken from a witness mark near a trig station but this detail is missing from meta-data.
- Height booking errors describe incorrectly booked instrument height meta-data.
- Station name errors are measurements where one or both of the station numbers in the meta-data are incorrect, causing:
    - Skew in the network as a function of the measurement length (similar to wrong standpoint errors).
    - Separation of the station into two nearby points, which reduces redundancy and weakens the network at that point.
- Deformation of the network over consecutive epochs.
- Incorrect ambiguity resolution where range estimates to satellites are incorrect in all coordinate components. A common cause is multi-path bias which is localised to individual stations.
The eﬀect of gross errors on a least squares adjustment has been shown to
"spread" the error across two or more measurements when there is enough
redundancy to do so. In the case where gross errors can be
small in magnitude relative to the magnitude of the measurement,
such as a height booking error that is incorrect by 100mm over a
GPS baseline of 50 km, it is possible for a signiﬁcant proportion of
that error to be spread over the residuals of several neighbouring measurements.
Haasdyk et al note examples of incorrect ambiguity resolution,
where the length of the range to satellite(s) is therefore incorrect, and
aﬀects all coordinate components. These manifested in very small
(sub-metre and often sub-decimetre) gross errors that are sometimes
undetectable in constrained local-neighborhood adjustments that rely on a
"perfect" Class 2A0 control network.
In a related issue, multiple processing of the same observations, necessitated by observing outliers when fitting a local neighbourhood adjustment to a larger neighbourhood using a wider control network, often resulted in several different "versions" of the same baseline that are significantly different. When using such measurements in a very large unconstrained network adjustment, the difficulty of "choosing" the correct measurement is compounded by the quality of the surrounding measurements.
Observations with station name errors were sometimes easy to spot when the measurement appeared as a significant outlier that traversed an unlikely distance. For example, in Figure 1 the measurements highlighted in blue clearly show cases where one or more station names have been taken from the incorrect trigonometric station, often several hundred kilometres away.
In contrast, a station naming error that does not refer to an existing station but instead "creates" a new, nearby station in the adjustment can be difficult to identify in a large adjustment. For example, in Figure 2 the station TS5110 appears as two stations: one at the apex of most measurements, with the lower line drawn terminating at its nominated location, TS5110-1. Cross-referencing with existing station names suggests that it is equivalent to TS5110-2.
Other related examples, as noted by Haasdyk in correspondence, are:

| Station | Duplicate | Separation |
|---|---|---|
| (Destroyed) TS10533 | TS10985 | 20 mm apart |
Caspary notes that there is a difficulty in using the least squares vector of residuals in the detection of gross errors as outliers, because the whole vector of residuals is contaminated by the gross errors in the measurements.
Roughly speaking, given a large system of parameters where the
likelihood of gross error contamination is high, the number of marginally
detectable errors can also be proportionally high.
The sensitivity of an adjustment in detecting outliers is referred to as the
"internal reliability" . As stated by Rizos , the internal reliability of
an adjustment is quantiﬁed by the set of marginally detectable
errors. A large set of MDEs indicates low internal reliability, and
conversely, the smaller the set of MDEs, the higher the internal reliability.
A common method of outlier detection in a least squares adjustment is
Baarda’s data snooping method . This method assumes the quality of
the measurements is known reliably and, more importantly, it assumes
that only one outlier exists in the set of observations. In everyday use on
small networks this method works well because there are generally
relatively few outliers present. However, when considering large and
highly interconnected networks, this data-snooping method can sometimes fail due to bias introduced by outliers on neighbouring measurements.
Another commonly used method is Pope's Tau test, which is primarily used when measurement quality is not reliably known. Instead of normalising residuals using a priori weights, the Tau test first computes the standardised residuals using weights derived from the variance co-factor matrix of the residuals, computed from a least squares adjustment once it has converged. Each standardised residual is scaled by the square root of the variance factor of the adjustment:

$$\tau_i = \frac{v_i}{\hat{\sigma}_0 \sqrt{q_{ii}}}$$

where $\sqrt{q_{ii}}$ is the square root of the $i$-th diagonal entry in the co-factor matrix of the residuals $Q_{vv}$, and $\hat{\sigma}_0$ is the a posteriori standard deviation of unit weight.

The Tau test then compares each normalised residual with a critical Tau value derived from the redundancy of the adjustment.
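As a concrete illustration, the standardised residuals described above can be computed for a small linear model. The following sketch is not from the thesis; it assumes unit a priori weights and an ordinary parametric adjustment:

```python
import numpy as np

def tau_residuals(A, l):
    """Standardised (Tau-style) residuals for the linear model l = A x + v,
    assuming unit a priori weights. Didactic sketch after Pope's Tau test."""
    m, n = A.shape
    x, *_ = np.linalg.lstsq(A, l, rcond=None)
    v = l - A @ x                                        # least squares residuals
    r = m - n                                            # redundancy (degrees of freedom)
    s0 = np.sqrt((v @ v) / r)                            # a posteriori std dev of unit weight
    Qvv = np.eye(m) - A @ np.linalg.inv(A.T @ A) @ A.T   # cofactor matrix of the residuals
    return v / (s0 * np.sqrt(np.diag(Qvv)))              # tau_i = v_i / (s0 * sqrt(q_ii))

# Four repeated observations of one height; the last contains a gross error.
A = np.ones((4, 1))
l = np.array([1.0, 1.0, 1.0, 5.0])
tau = tau_residuals(A, l)
# the largest standardised residual flags the outlying observation
```

The largest |tau| then points at the suspect observation, which would be compared against the critical Tau value for the given redundancy.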
The need for a robust method to detect more than one outlier in a large set of data, as justified above, has already been addressed in the form of an alternative adjustment. Rather than use the L2 norm, that is the square root of the sum of squares of residuals, researchers such as Branham argue that a statistically more likely method of revealing outlier observations reliably is by minimising the L1 norm of the residuals, that is the sum of the absolute values of the vector of residuals.
The choice of the L1 norm minimiser over the least squares (that is, the L2 norm) minimiser is precisely because it does not spread the magnitude of the residuals over the neighbouring measurements as evenly as possible. This poses the question: why does the L1 norm minimiser not spread residuals? The answer can be roughly explained by looking at the effect of each norm on large and small residuals.
Say in a loop of height measurements, one measurement has a gross error of $\nabla$. All measurements have an expected variance of $\sigma^2$ and neighbouring measurements have no gross errors. Mathematical method aside, a least squares adjustment would "consider" the magnitude of a single residual $v_1 = \nabla$ alone versus the possibility of assuming that this error magnitude was divided equally with a neighbouring measurement, such that the error in each is assumed to be $v_1 = \nabla/2$ and $v_2 = \nabla/2$. In the latter case the sum of squared residuals is

$$v_1^2 + v_2^2 = \frac{\nabla^2}{4} + \frac{\nabla^2}{4} = \frac{\nabla^2}{2},$$

which is clearly smaller than the $\nabla^2$ of the first case. A least squares minimiser will not stop there though: it would then consider spreading the error in those two measurements to their neighbouring measurements, and so on until all error has been spread as evenly as possible such that the L2 norm is at a minimum.
In the discipline of mathematical optimisation, this phenomenon where many smaller residuals are preferred over a few large residuals is referred to as penalising large residuals. This is precisely what least squares does as a by-product of obtaining a best fit, because it assumes that there are no outliers in the data. However, data containing several outliers necessitates a minimiser that does not penalise residuals by magnitude.
Using the above example, say an L1 norm minimiser "considers" the error as $v_1 = \nabla$ and all other residuals as negligible, $v_2 \approx 0,\, v_3 \approx 0, \ldots$. It then considers spreading the error to a neighbouring measurement such that $v_1 = v_2 = \nabla/2$. The L1 norm of the latter is

$$|v_1| + |v_2| = \left|\frac{\nabla}{2}\right| + \left|\frac{\nabla}{2}\right| = |\nabla|.$$

Here we can see that "spreading" the error across two residuals will result in a norm that is greater than or equal to the norm of restricting the error to the residual of one measurement. Hence, an L1 norm minimiser will prefer the case where the residual error is assigned to as few measurements as possible.
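The spreading behaviour can be demonstrated numerically. The sketch below uses illustrative data (not from the thesis) for a small height network containing one gross error, fitted with both least squares and an L1 linear program; least squares spreads the 0.5 error over every residual, while the L1 fit isolates it in the offending observation. scipy's `linprog` is assumed to be available.

```python
import numpy as np
from scipy.optimize import linprog

# Height network: stations H1, H2 above a fixed datum.
# Rows: H1, H1 (repeat), H2-H1, H2, H2-H1 (repeat); obs 4 has a +0.5 gross error.
A = np.array([[1., 0.],
              [1., 0.],
              [-1., 1.],
              [0., 1.],
              [-1., 1.]])
l = np.array([10.0, 10.0, 5.0, 15.5, 5.0])
m, n = A.shape

# L2 (least squares) residuals: the error is spread over all observations
x2, *_ = np.linalg.lstsq(A, l, rcond=None)
v2 = l - A @ x2

# L1 fit as a linear program: min sum(u + w)
# subject to A(x+ - x-) + u - w = l, all variables >= 0
c = np.r_[np.zeros(2 * n), np.ones(2 * m)]
A_eq = np.hstack([A, -A, np.eye(m), -np.eye(m)])
res = linprog(c, A_eq=A_eq, b_eq=l, bounds=[(0, None)] * (2 * n + 2 * m))
u, w = res.x[2 * n:2 * n + m], res.x[2 * n + m:]
v1 = u - w   # L1 residuals: the full 0.5 sits on observation 4
```

With this data the L1 optimum is unique, so the gross error cannot hide: the repeated observations pin down both heights, forcing the entire misclose onto the erroneous measurement.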
The next chapter will elaborate on the analysis of the residuals of an L1 norm minimiser with some examples using geodetic observations to illustrate the relevance to this problem.
The introduction to estimator models is discussed in Chapter 3 of the thesis.
Monte Carlo experiments conducted by Marshall indicate quite clearly that the sampling distribution of L1 norm residuals is not the Gaussian (normal) distribution but instead exhibits a very large spike at the first moment. Branham states that this sampling distribution corresponds to a modified exponential (or alternatively, Laplace) probability density function

$$f(x \mid \mu, b) = \frac{1}{2b} \exp\left(-\frac{|x - \mu|}{b}\right).$$

This is a modified form of the exponential distribution, which in general form is

$$f(x \mid \lambda) = \lambda e^{-\lambda x}, \quad x \ge 0.$$

The function is graphed in Figure 5. By maximising the log density function, the maximum likelihood estimate of the first moment is the sample median, and the variance of the distribution is $2b^2$.
The distortion of the first moment with respect to outliers is restricted to movement by one "data point" rather than any proportion of the magnitude of the outlier. Assuming an equal distribution of positive and negative outliers and residuals, the first moment will be zero. If the distribution of residuals of an L1 minimiser happens to be biased such that they are mostly all positive or all negative, this would imply the model assumptions are incorrect. In a minimally constrained adjustment, this should not occur.
As noted by Harvey, outlier detection is often done on a minimally constrained model. Ideally, no distance scale would be enforced by the measurements such that the resultant residuals would contain minimal bias. A minimally constrained adjustment is more likely to have an average error of zero.
Marshall investigated the bounds of internal reliability (the relative magnitude of measurement error before it is statistically flagged as an outlier) using Monte Carlo methods. By using a consistent a priori standard deviation on all measurements, Marshall notes that the primary deviation in internal reliability is caused by variations in network geometry. In his horizontal network experiment of 6 stations (12 parameters) and 15 distance measurements, he observed that across 100,000 Monte Carlo simulations where random bias was introduced into the measurements, one particular observation was never flagged as an outlier in an L1 adjustment. Figure 6 below shows this measurement in Marshall's trilateration network, where there appears to be enough redundancy for the four parameters (the coordinates of the two stations) which it is estimating. It was the view of this author that there was perhaps another reason for the lack of outlier information output by the adjustment: judging by the degrees of freedom (redundancy) in the network, it is likely due to there being not enough redundancy to adequately locate outliers for all measurements.
It is hypothesised by the author, but not proven, that the internal reliability of a network in the L1 norm increases with an increase in redundancy. For measurements that can have a reverse or check measurement, e.g. an EDM distance from A to B and another from B to A, it is trivial to see a gross error in one of these measurements (although not trivial to know which measurement contains the gross error without further measurements). This presents the analyst with a dilemma: an adjustment on the whole network would likely have better redundancy than an adjustment on part of the network (given that the whole network is connected), but there are far more computations to perform in an adjustment of the whole network than a part of it. In fact, as discussed in Chapter 4, the relationship between the number of measurements and the number of adjustment computations is in some cases exponential, which is troublesome when the network is very large.
Due to measurement noise there is always the need to weight the importance of a measurement based on an a priori assumption of precision. GPS vectors at the NSW LPI are typically given a standard deviation of (10 mm + 7 ppm) for least squares adjustments. This a priori assumption is carried into L1 norm models, but the weighting scheme differs in the L1 method.
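For concreteness, a constant-plus-ppm a priori standard deviation such as the quoted (10 mm + 7 ppm) can be computed as below. Whether the two terms combine additively or in quadrature varies between conventions; the additive form here is an assumption of this sketch, and the function name is this example's own:

```python
def apriori_sd_mm(baseline_length_m, constant_mm=10.0, ppm=7.0):
    """A priori standard deviation in millimetres under a constant-plus-ppm
    model such as the (10 mm + 7 ppm) figure quoted in the text.
    Assumes the two terms combine additively."""
    # ppm of the baseline length, converted from metres to millimetres
    return constant_mm + ppm * 1e-6 * baseline_length_m * 1000.0

# e.g. a 50 km baseline: 10 mm + 7 ppm of 50,000 m = 360 mm
sd = apriori_sd_mm(50_000)
```

This makes explicit why a parts-per-million component matters: over a 50 km baseline the distance-dependent term dwarfs the constant term.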
In least squares, the method allows a priori weighting of observations by inserting the square of each weight $w_i = 1/\sigma_i$ corresponding to measurement $i$ as entries in a diagonal weight matrix $W$.

In the L1 model, the objective function is

$$\min \sum_i |v_i|.$$

Due to the non-linearity of the objective function, the formulation of the problem must instead rely on a linear program, and thus the inverse matrix of squared weights would be applied as described by Harvey.
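One common weighting scheme, consistent with the diagonal least squares weight matrix and offered here as a sketch rather than as Harvey's exact formulation, scales each residual by its a priori standard deviation before minimisation:

```latex
\min_{x} \sum_{i=1}^{m} \frac{|v_i|}{\sigma_i}
  \;=\; \min_{x}\,\bigl\lVert W^{1/2}\,(b - Ax) \bigr\rVert_1 ,
\qquad
W = \operatorname{diag}\!\left(\sigma_1^{-2},\,\ldots,\,\sigma_m^{-2}\right)
```

In the linear program this simply multiplies the objective coefficient of each residual-variable pair by $1/\sigma_i$, leaving the constraints unchanged.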
The first step in formulating the L1 norm minimisation problem as a linear program is to recognise that the objective function is not linear. To make it linear, and to also satisfy the non-negativity constraint that linear programs typically require³, the objective function can be reformulated to be

$$\min \sum_i (u_i + w_i)$$

where each residual is $v_i = u_i - w_i$ such that $u_i \ge 0$ and $w_i \ge 0$. At the optimum these constraints satisfy $|v_i| = u_i + w_i$.
The normal (or model) equation for each parameter would be the same formulation as for least squares problems but with the residual included as a model parameter. For example, a height difference between stations $i$ and $j$ would be

$$\Delta h_{ij} = (h_j - h_i) + v_{ij}.$$

Similarly to least squares, each element in the Jacobian matrix for the L1 norm minimisation is a partial derivative of the model equation (in order to linearise the problem). Hence for the height example the Jacobian matrix is already linear; each row contains $\partial(h_j - h_i)/\partial h_j = 1$ and $\partial(h_j - h_i)/\partial h_i = -1$, with zeros elsewhere.

For non-linear models a Taylor series of one or two terms is usually sufficient to achieve convergence given good enough a priori coordinates or parameters.
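The ±1 pattern of the height Jacobian can be generated mechanically. A minimal sketch (function and variable names are this example's own, not from the thesis):

```python
import numpy as np

def height_jacobian(edges, n_stations):
    """Jacobian for height-difference observations dh = (h_to - h_from) + v.
    edges: list of (from_index, to_index) station pairs, one per observation."""
    A = np.zeros((len(edges), n_stations))
    for row, (i, j) in enumerate(edges):
        A[row, i] = -1.0   # partial derivative w.r.t. the 'from' height
        A[row, j] = 1.0    # partial derivative w.r.t. the 'to' height
    return A

# three stations in a simple level run: 0 -> 1 -> 2
A = height_jacobian([(0, 1), (1, 2)], 3)
```

Each observation contributes exactly one -1 and one +1 entry, so the matrix is sparse regardless of network size.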
Since the use of the L1 norm adjustment in the context of this thesis is to highlight outlier observations, the question of whether to weight the observations prior to the adjustment must be specifically asked with regard to the desired outcome. In a GPS baseline network, there will exist very short lines of only a few metres and very long lines of hundreds of kilometres.
The separate issue of weighting the residuals should always be handled according to the expected quality of the observations. An L1 norm adjustment will typically be done prior to a least squares adjustment, and so the scaled residuals that are an output of "tuning" a least squares adjustment will often be unavailable to the analyst. A parts-per-million component is often a crucial normaliser for residuals because of the large variation in accumulated error over long distances, for example the scale in long GNSS baseline vectors.
As stated already, the benefit of the L1 norm minimisation is primarily the analysis of residuals. In a survey network adjustment with equally weighted observations, a direct comparison of the residuals to loop miscloses yields strong agreement.

Using an example of just height data from a NSW RMS survey on the Sydney Anzac Bridge (see Figure 7), the comparison of raw residuals versus miscloses obtained from B. Harvey's FIXIT4 is shown in Table 3. It can be seen that in this example, the loop miscloses and the magnitude of the L1 residuals for particular observations on the loops coincide, specifically the residuals on lines 2-3 and 5-2 respectively.
| From | To | dH (m) | sdv (mm) | v (mm) |
|---|---|---|---|---|
An L1 minimiser identifies the measurements which have the propensity to introduce the largest residuals and excludes those measurements from the computation of the parameters. In this example height network, these residuals have the same magnitude as the loop miscloses because there are no correlations between input observations and the input standard deviations are being ignored.
Continuing from Equation 11, each model equation takes the form

$$a_i^\top x + v_i = b_i$$

where $a_i^\top$ is the $i$-th row of the Jacobian $A$ and $b_i$ is the (observed minus computed) value for measurement $i$.
Note that the parameters $x$ and residuals $v$ are vectors, any element of which can be positive or negative. This results in the problem being unbounded in the negative direction, and as such this formulation needs modification before it can be used in a linear solver. By taking two positive variables we can say that each of the residual parameters can be represented by $v_i = u_i - w_i$. This also applies to the coordinate parameters: $x = x^+ - x^-$ with $x^+, x^- \ge 0$. By generalising, with coordinate parameters represented by $x^+ - x^-$ and residual parameters represented by $u - w$, a linearised formulation of the L1 norm estimator is

$$\begin{aligned} \min\;& \mathbf{1}^\top (u + w) \\ \text{subject to}\;& A(x^+ - x^-) + u - w = b, \\ & x^+,\, x^-,\, u,\, w \ge 0. \end{aligned}$$
This is very similar to the formulation in Dantzig et al, except that the basic and non-basic variables have been replaced with the parameters and residuals respectively.⁴

Note also that the objective function has the coefficients effectively switched: instead of costs being attached to the structural variables, the unit costs are attached to the residual variables, while the coordinate parameters carry zero cost.

This is an explicit formulation of a linear program, but it is still solvable by any variation of Dantzig's simplex method (primal, revised, primal-dual, etc.) and by interior point methods for linear programming problems (explored in Chapter 4).
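To illustrate that the same explicit formulation is accepted by both solver families, the sketch below builds the standard-form arrays for a toy three-observation problem and solves it with scipy's dual simplex and interior point back-ends (stand-ins for the solvers used in the thesis); both reach the same minimum L1 norm. All names here are this example's own.

```python
import numpy as np
from scipy.optimize import linprog

def l1_standard_form(A, b):
    """Standard-form data (min c^T z, M z = b, z >= 0) for the L1 estimator,
    with z = [x+, x-, u, w] as in the text."""
    m, n = A.shape
    c = np.r_[np.zeros(2 * n), np.ones(2 * m)]
    M = np.hstack([A, -A, np.eye(m), -np.eye(m)])
    return c, M

A = np.array([[1., 0.], [0., 1.], [-1., 1.]])
b = np.array([1.0, 2.0, 1.5])
c, M = l1_standard_form(A, b)
bounds = [(0, None)] * M.shape[1]
simplex = linprog(c, A_eq=M, b_eq=b, bounds=bounds, method="highs-ds")
interior = linprog(c, A_eq=M, b_eq=b, bounds=bounds, method="highs-ipm")
# both solvers reach the same minimum L1 norm of the residuals
```

For this data the minimum achievable sum of absolute residuals is 0.5, whichever algorithm is used; only the path taken to the optimum differs.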
This chapter will begin with Section 4.1 describing the Simplex algorithm followed by an example demonstrating how it works on survey height data. Subsequently the concept of interior point methods will be introduced and applied to the same example.
Use the standard form of the problem from Chapter 3, replicated below:

$$\begin{aligned} \min\;& \mathbf{1}^\top (u + w) \\ \text{subject to}\;& A(x^+ - x^-) + u - w = b, \qquad x^+,\, x^-,\, u,\, w \ge 0. \end{aligned}$$
To re-iterate, the non-negativity notation refers to the strictly positive nature of the parameters. This is especially pertinent in non-linear problems where at each iteration a delta is estimated, and that delta is often negative. When a coordinate (or delta) has a negative value, the negative part $x^-$ takes its absolute value while the positive part $x^+$ remains zero.
As noted by Dantzig the primal simplex method ﬁnds the optimal solution by a sequence of pivot steps from the original system of equations. Initialisation of the algorithm is achieved by converting the problem from standard form to canonical form. This is where the current solution is a basic feasible solution and:
- The objective coeﬃcients to all the basic variables are zero, that is, the objective function contains only non-basic variables;
- The constraint matrix for the basic variables forms an identity matrix (with some permutation if necessary). It is then easy to see whether or not the current basic feasible solution (BFS) is optimal when the LP is in canonical form.

The structure of the objective function and constraints permits an initial basic feasible solution to be read off directly: with the coordinate parameters set to zero, each residual is assigned entirely to $u_i$ (where $b_i \ge 0$) or to $w_i$ (where $b_i < 0$).
Before continuing, it is necessary to discuss the computational time
complexity of the algorithm proposed by Barrodale and Roberts and the
simplex method in general.
Briefly, the computational time complexity of an algorithm is generally assessed with regard to the rate of growth of computational time with respect to the size of the input. This time complexity is typically described using three metrics: best-case time complexity $\Omega(n)$, average time complexity $\Theta(n)$, and worst-case time complexity $O(n)$.

$\Omega(n)$ represents the lower bound function for the run time of an algorithm, where $n$ denotes the size of the input (typically an array). $\Theta(n)$ represents the run time of an algorithm when averaged over all permutations of an input of size $n$. This metric is a good indication of how the algorithm will behave in the real world using real measurements; however, we should always consider the worst-case time complexity as well. $O(n)$ is the upper bound of the rate of growth of computational time of an algorithm. This latter metric describes just how poorly the algorithm can perform in the worst case.
When considering the simplex method to solve for the L1 estimate as proposed by Barrodale and Roberts, and its application in this thesis in finding outliers in a large set of GPS observations, the size of the input will always be large (over 100,000 measurements). Thus, if the rate of growth of computational time in Barrodale and Roberts' method is non-linear, it is of great importance to consider time complexity.
An unfortunate aspect of the simplex method, on which the Barrodale and Roberts algorithm is based, is that while it performs well in most cases, there exists a set of inputs which makes the algorithm run in exponential time. Using the notation described above, the average case $\Theta$ is polynomial and often linear, but the worst case is exponential, i.e. $O(2^n)$ for some inputs. Exponential time complexity was shown by Klee and Minty to exist in cases where a hyper-cube of constraints can lead the simplex method to visit all $2^n$ vertices, where $n$ is the dimension of the hyper-cube. See Appendix A.
As was raised brieﬂy in Chapter 3, there is a desire for the analyst to
preserve as much redundancy as possible in order to increase the
likelihood of properly detecting all gross errors in an adjustment. Since the
problem of exponential time complexity is very real in the application of
Dantzig’s simplex method on large sets of data (as experienced by the
author in practice) this presents a dilemma.
To balance the desire for maximum redundancy with the desire for reasonable computation time, two approaches can be taken. The ﬁrst is to ﬁnd a way to reduce the size of an adjustment whilst preserving maximum redundancy. The second approach is to ﬁnd a way to reduce the time complexity of the algorithm used to perform the adjustment. The next section and Chapter 5 address the latter approach, and Chapter 7 attempts to address the former. The two approaches need not be exclusive.
Recent developments in the ﬁeld of optimisation and operations research
have yielded several new methods for solving linear programs other than
Dantzig's simplex method. Since the publication of Klee and Minty's examples of worst-case exponential computational time, researchers have made developments in the field of ellipsoid algorithms, employed by Leonid Khachiyan in 1979, that run in worst-case polynomial time. Although in the average case the simplex method far outperforms early ellipsoid methods, the idea of iteratively moving through the interior of a problem space to converge at the optimal solution at a boundary yielded a ground-breaking result less than 30 years ago.
In 1984 Karmarkar published a paper on an interior point method where feasible solutions are 'projected' from within a convex solution space towards the best feasible solution. The worst-case computational time was determined to be polynomial, and in practice the method often performed faster than the simplex method on large sparse problems.
The method described below is an earlier interior point algorithm known as Dikin's Affine Scaling Interior Point Method, first published by Dikin in 1967 with a proof of convergence in 1974. This algorithm does not possess the convergence properties of Karmarkar's algorithm but does demonstrate the concept of central projections.
The following algorithm summarises the Primal Affine Method for finding an interior point solution to a linear program $\min c^\top x$ subject to $Ax = b$, $x \ge 0$.

Assume that an initial feasible interior point solution $x^0 > 0$ satisfying $Ax^0 = b$ is known. The steps of the algorithm are as follows:

- Initialize Counter: Set $k = 0$.
- Create $D_k$: Set $D_k = \mathrm{diag}(x^k)$.
- Compute Centering Transformation: Compute $\tilde{A} = A D_k$ and $\tilde{c} = D_k c$.
- Determine the Projection Matrix: Compute $P = I - \tilde{A}^\top (\tilde{A} \tilde{A}^\top)^{-1} \tilde{A}$.
- Compute Steepest Descent Direction: Set $d = -P \tilde{c}$.
- Set $\theta = -\min_i d_i$.
- Test for Unbounded Objective: If $\theta \le 0$ and $d \ne 0$, report the objective as being unbounded and stop.
- Obtain $y^{k+1}$: Set
  $$y^{k+1} = e + \frac{\alpha}{\theta}\, d,$$
  where $e = (1, \ldots, 1)^\top$ and $\alpha$ is set to a number strictly between 0 and 1. Typically, $\alpha$ is set to be between 0.5 and 0.95 for primal affine methods (in most implementations it is set between 0.9 and 0.95).
- Transform Back to Original Space: Compute $x^{k+1} = D_k y^{k+1}$.
- Termination Check: If $x^{k+1}$ is "close" to $x^k$, report $x^{k+1}$ as "close" to optimal and stop.
- Set $k = k + 1$ and go to Step 2.
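The steps above translate almost line-for-line into code. The following sketch is a didactic implementation (not the thesis' Matlab or C++ code) of the primal affine scaling iteration for min c'x subject to Ax = b, x ≥ 0, from a supplied interior starting point:

```python
import numpy as np

def affine_scaling(A, b, c, x0, alpha=0.9, tol=1e-9, max_iter=500):
    """Dikin's primal affine scaling method for min c'x s.t. Ax = b, x > 0.
    x0 must be strictly feasible (A @ x0 == b, x0 > 0)."""
    x = np.asarray(x0, dtype=float)
    obj = c @ x
    for _ in range(max_iter):
        D = np.diag(x)                        # centering transformation D_k
        AD = A @ D
        # dual estimate, then projected steepest-descent direction d = -P D c
        y = np.linalg.solve(AD @ AD.T, AD @ (D @ c))
        d = AD.T @ y - D @ c
        if d.min() >= -1e-12:
            break                             # d ~ 0: optimal (d > 0 would mean unbounded)
        theta = -d.min()                      # step length to the nearest boundary
        x = D @ (np.ones_like(x) + (alpha / theta) * d)
        new_obj = c @ x
        if abs(obj - new_obj) <= tol * (1.0 + abs(obj)):
            break                             # objective has stopped improving
        obj = new_obj
    return x

# toy problem: min x1 + 2*x2  s.t.  x1 + x2 + x3 = 1; optimum at x = (0, 0, 1)
A = np.array([[1.0, 1.0, 1.0]])
b = np.array([1.0])
c = np.array([1.0, 2.0, 0.0])
x = affine_scaling(A, b, c, np.array([1 / 3, 1 / 3, 1 / 3]))
```

Because the scaling matrix maps the current iterate to the all-ones vector, each step satisfies the equality constraints exactly while shrinking the components the objective penalises.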
An interior point method such as the affine scaling method described above requires an initial solution from which to begin iterating. As the reader may notice, in step 3 the centering transform uses the current solution to project the entire solution space around that point. Since zero values in the initial solution could not be projected in such a way (and likewise values on a boundary of the solution space, that is, a parameter that lies on a constraint hyperplane), the initial solution must be interior to the solution space.
An interior point is not found trivially, and as such methods have
been developed to find one. One such method used
here is the Big M method, which adds an additional artificial variable $x_a$
to the problem, giving a new constraint matrix $[A \mid a]$
with an extra column of coefficients $a$ for $x_a$. The objective function
coefficient of $x_a$ is a very large value denoted $M$
(the namesake of the method). This large value of $M$ acts
as a penalty such that the solver will try to reduce the value of this
variable as much as possible in order to minimise the impact of $M x_a$
on the objective value (which in the $L_1$ norm problem is the sum of absolute values of the residuals).
The coefficients of $x_a$ in $a$ are calculated such that, given an initial point of $x^0 = e$ (the all-ones vector), all of the constraints are satisfied. By setting $x^0_a = 1$, the column $a = b - Ae$ is
easily calculated using the above relationship to obtain the new problem.
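The Big M initialisation just described can be sketched as follows, assuming the all-ones starting point; the matrix values are illustrative only.

```python
# Sketch of Big M initialisation: append one artificial column a = b - A e
# so that the all-ones point (x, x_a) = (e, 1) satisfies [A | a](x; x_a) = b.

def big_m_column(A, b):
    m, n = len(A), len(A[0])
    # a_i = b_i - (row sum of A), i.e. a = b - A e
    return [b[i] - sum(A[i][j] for j in range(n)) for i in range(m)]

A = [[2.0, 1.0], [1.0, 3.0]]
b = [7.0, 9.0]
a = big_m_column(A, b)
# verify the initial interior point (1, 1, 1) is feasible: A e + a = b
lhs = [sum(A[i][j] for j in range(2)) + a[i] for i in range(2)]
print(a, lhs)
```

In the full problem the objective gains the term $M x_a$, so any optimal solution of the original problem drives $x_a$ back to zero, recovering feasibility of the original constraints.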
The Matlab implementation of the affine solver in Appendix B was applied to
the RMS heights example used earlier in this chapter. The formulation of the
problem vectors was the same as for the simplex method. The value of $M$ was
set to 1200 for initialisation. The algorithm converged after 55 iterations,
thus it was not practical to show the values of the parameters at each
iteration in this section. However, the rate at which the parameters
converged to the accepted values is visible in Figure 8 for the station
heights and the residual parameters. It should be noted that this
algorithm yielded a different result for the residuals, as noted in the accompanying table.
It can be seen in Figure 9 that the value of the Big M variable drops from 1 to nearly zero very quickly, demonstrating a correct initialisation of the problem and convergence to the initial problem. The stopping criterion used was the primal-dual gap (see Appendix B for the definition in Matlab). Convergence of the solution is accepted when the value of the primal-dual gap drops below a specified tolerance (usually 0.0001).
Clearly, convergence after 55 iterations is not an acceptable result
for such a small problem. As an academic exercise, this method
was applied to larger problems and exhibited similarly slow
performance. Convergence was not achieved at all using this algorithm
for a problem containing substantially more measurements and stations
(and hence a much larger problem size).
Dikin's affine scaling algorithm was implemented in C++ using the Armadillo linear algebra library. A warm start was used whereby initial ECEF coordinates were converted from geodetic coordinates (in GDA94) using GPSTk and the ellipsoid parameters of GRS80. ECEF coordinates were then translated to be centred at the a priori coordinates of the first station in the adjustment. The matrix $A$ and vector $b$ were then normalised (scaled to approximately unity) using the following relationship.
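One plausible normalisation of the kind described is row equilibration; this is an assumption for illustration, not necessarily the exact relationship used in the thesis.

```python
import math

# Illustrative row normalisation: scale each row of A and the matching
# entry of b by the row's Euclidean norm, bringing the system to roughly
# unit magnitude. (Assumed scheme; the thesis' exact relationship may differ.)

def row_normalise(A, b):
    A_s, b_s = [], []
    for row, rhs in zip(A, b):
        nrm = math.sqrt(sum(v * v for v in row)) or 1.0  # guard all-zero rows
        A_s.append([v / nrm for v in row])
        b_s.append(rhs / nrm)
    return A_s, b_s

A, b = [[3.0, 4.0], [0.0, 2.0]], [10.0, 4.0]
A_s, b_s = row_normalise(A, b)
print(A_s, b_s)
```

Scaling of this kind does not change the solution set of $Ax = b$ but can markedly improve the conditioning seen by an interior point solver.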
The author attempted to speed up the affine scaling method by warm
starting the solver with input coordinates and residuals that would be
close in value to the final solution. These were obtained from the input
bath_syd_stn.xml file in GDA94 geodetic coordinates and converted to
GDA94 ECEF Cartesian coordinates using the Position class in GPSTk.
Although in small problems a warm start offers no advantage over a cold
start (whereby there is a phase I problem initialisation), in very large
problems using a solver with polynomial time convergence properties a
warm start can assist with providing a correct initialisation and reduce the
phase I run time by a constant factor (usually between 0.3 and 0.5). For
problems that can take hours to days to solve, this time saving is considerable.
The next chapter will discuss an alternate formulation of the problem and will demonstrate the application of a faster solver with better convergence properties.

Part II

APPLICATIONS OF NORM MINIMISERS IN LARGE SURVEY NETWORKS
Further analysis of $L_1$ norm minimisation algorithms and results pertaining to survey networks and reliability
The standard form of the problem from Chapter 3, replicated below, is appropriate for some simplex-based solvers but suffers from numerical instability issues using many open-source interior point method solvers.
It is necessary to remodel the problem such that it is "easier" for an IPM solver to find a solution. One approach is to find a solution on the dual problem. We begin by converting the primal problem from above to a form that allows negative parameters (as allowed by some solvers because of automatic remodelling), shown below.
Note that the row constraints are no longer bound by a strict equality. This allows residuals that are negative to be specified exactly in one of a pair of complementary constraints, with the other permitting the former equality with a positive surplus variable.
Note that, according to the rules defined by the Tucker diagram (Appendix I),
the inequality in the row constraints becomes a non-negativity (lower)
bound for the corresponding dual variables. This problem is far "easier" to solve
using Mehrotra's primal-dual solver.
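One standard linear-programming form of the $L_1$ problem and its dual can be sketched as follows; this is an illustration of the remodelling idea, and the exact signs and bounds in the thesis' Equation 18 may differ.

```latex
% Primal: split each residual r_i into non-negative parts u_i and v_i.
\begin{aligned}
\text{(P)}\quad & \min_{x,\,u,\,v}\ \mathbf{1}^{\mathsf{T}}u + \mathbf{1}^{\mathsf{T}}v
  && \text{s.t. } Ax + u - v = b,\quad u \ge 0,\ v \ge 0,\ x\ \text{free},\\[2pt]
\text{(D)}\quad & \max_{y}\ b^{\mathsf{T}}y
  && \text{s.t. } A^{\mathsf{T}}y = 0,\quad -\mathbf{1} \le y \le \mathbf{1}.
\end{aligned}
```

At an optimum $u_i v_i = 0$, so $u_i + v_i = |r_i|$, and the box bounds on $y$ are the dual counterpart of the unit objective coefficients on the split residual parts.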
The above approach to remodelling the problem in the dual form, although
mathematically correct, demonstrates nothing of the relevance of the dual
formulation to the original problem in Equation 18. To see it another way,
consider the constraints in the primal problem as being bounded from
below by the measurements. The primal problem objective is to minimise
the residual parameters such that this lower bound still holds. In the
dual formulation we first associate each primal constraint with a dual variable
and then scale the corresponding measurement by that variable
such that, for each column (corresponding to a coefficient of
the primal) in the constraint matrix, the scalar product of the coefficients
is bounded above by the corresponding coefficient in
the objective function. It intuitively follows that maximising the values
of the dual variables subject to these bounds
effectively maximises the overall "utility" of the measurements whilst
being constrained by the "utility" of each parameter.
As shown by Dantzig and von Neumann, if the primal problem
has an optimal feasible solution then the dual also has an optimal feasible solution,
and the value of the objective function of the dual problem at its
optimal feasible solution is the same as the value of the objective function
in the primal problem when it is optimal; that is, the objective values coincide at the
optimal solutions of the primal and dual. This is stated more formally in
the Strong Duality Theorem.
Strong Duality Theorem 5.0.1. If the primal system $\min\, c^{\mathsf{T}}x$, $Ax \ge b$, $x \ge 0$ has a feasible solution and the dual system $\max\, b^{\mathsf{T}}y$, $A^{\mathsf{T}}y \le c$, $y \ge 0$ has a feasible solution, then there exist optimal feasible solutions $x^*$ and $y^*$ to the primal and dual systems such that $c^{\mathsf{T}}x^* = b^{\mathsf{T}}y^*$.
It remains to determine how to recover the residuals from the optimal
solution to the dual problem. In summary, by the Complementary
Slackness Theorem (below) the reduced costs of the slack (or surplus)
variables in the optimal solution to the dual problem are equal to the
values of the slack variables in the optimal solution to the primal problem.
- If $x^*_j > 0$ in the primal, then the $j$th constraint in the dual is binding (this means the slack variable in the dual is zero).
- If the $j$th constraint in the dual is not binding, then $x^*_j = 0$.
- If $y^*_i > 0$, then the $i$th constraint in the primal is binding.
- If the $i$th constraint in the primal is not binding, then $y^*_i = 0$.
By the exclusive slackness property of the primal problem as shown in
Equation 19, assuming that one of the constraints has a zero-valued
surplus (i.e. it is a binding residual constraint) and thus the
positive value of the residual for that constraint is also binding, the
complement constraint has a non-zero-valued surplus of exactly the
negative value of the binding residual in the first constraint. By the
Complementary Slackness Theorem, a non-zero positive surplus
variable in the primal problem (i.e. the constraint is not binding) implies
that the corresponding value of the variable in the dual problem is zero,
which means it is not in the dual basis and thus has a non-zero reduced cost.
Now, when the dual solution is optimal, the value of all of these reduced
costs will be positive or zero (by the Optimality Test Theorem). The cost of
a basis exchange to add one of these variables with a positive reduced cost
to the basis of the dual solution is precisely the magnitude of the
non-zero-valued surplus variable in the corresponding row constraint in
the primal problem. Put more simply, the surplus value representing the
residual in the primal problem is exactly equal to the reduced cost of this
surplus variable in the dual problem.
This has the side effect that all residuals will be reported as absolute
values by this method, since the effect of the exclusive slackness constraints
in Equation 19 represents the familiar model for an absolute value in the dual
problem. For the purposes of outlier detection this is not an issue, because
only the magnitude of the normalised residual is considered in the outlier test.
The Gnu Linear Programming Kit (GLPK) is an open source library for linear programming that contains two separate solvers:
- A simplex-method (SM) solver.
- An interior-point-method (IPM) solver.
The SM solver has support for primal and dual problems. It has been
demonstrated to be very robust and resilient to numerical instability.
The IPM solver uses a modified implementation of Mehrotra's primal-dual
method for interior point problems. It is more exposed to numerical
instability than the simplex method, as demonstrated by the problem described earlier.
Both solvers have pre-processing stages that remodel the problem to a standard form. This allows the user to specify inequality constraints and negative bounds on parameters which is usually disallowed by the underlying algorithms. Extra constraints are added to the model to permit the problem speciﬁcation in this way, but care must be taken to appropriately specify a model such that it will have a feasible solution and is not unbounded. Using the above dual formulation of the problem on the entire set of more than 66,000 GNSS observations, GLPK was able to converge to an optimal solution in 30 seconds on a standard laptop computer.
The focus of an argument for using
$L_1$ norm minimisation for outlier detection hinges on reliability. As discussed
by Baarda, Pope and subsequently many others, the internal
reliability of a model design is measured by how large an error must be
before it is detectable. As Pope notes:
the traditional objections to least-squares adjustments of triangulation; that they cannot be relied upon to localize the error and instead "Spread it around"
which identiﬁes two critical aspects in data snooping:
- The initial detection of an error
- The localisation of the error.
The fit of residuals to the Laplace probability density function (described
in Chapter 3) was carried out in Mathematica using Pearson's Chi-Square
fit test. The histogram of residuals for the initial set of observations is in
Figure 10 below. Note that there is always a "spike" in the histogram at the
zero level. This correlates exactly to the number of parameters in the basis,
with the remainder of the histogram representing the degrees of freedom. After
removing the residuals corresponding to all basis parameters, the
histogram of residuals more closely fits the Laplace probability density function.
The Pearson Chi-Square fit of the Laplace probability density function was calculated for the histogram of non-zero standardised residuals. The parameters of this distribution were estimated with an assumed mean of zero, using the model defined by Branham. The sigma threshold at the 99% confidence level on this distribution was solved from the cumulative density function.
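Solving the threshold from the cumulative density function can be sketched as follows, assuming a zero-mean Laplace distribution with scale $b$; the scale value used below is illustrative, not the thesis' estimate.

```python
import math

# For a zero-mean Laplace distribution with scale b, the two-sided CDF gives
# P(|r| <= t) = 1 - exp(-t / b), so the threshold at confidence p is
# t = b * ln(1 / (1 - p)). At p = 0.99 this is b * ln(100).

def laplace_two_sided_threshold(scale, confidence=0.99):
    return scale * math.log(1.0 / (1.0 - confidence))

t = laplace_two_sided_threshold(scale=1.0)
print(round(t, 4))
```

With the fitted scale substituted for the illustrative value of 1.0, the same expression yields the sigma threshold used in the outlier tests.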
By performing a cursory removal of measurements demonstrating outlier
behaviour from the initial set of data in two passes, the resultant reduced
data set still contained outliers that "reappeared" after the first two
passes, but the maximum magnitude of the standardised residuals
of outliers decreased substantially from its initial value of 181.2,
and the number of residuals above the sigma threshold
decreased from 4,486 to 1,698.
The histogram of "degrees of freedom only" residuals of the "cleaner" data set is visually no diﬀerent to that of the initial data set due to the sheer amount of data. Additionally, the parameters for the Laplace probability density function for these standardised residuals are essentially identical, which demonstrates that the outliers removed from the "dirty" data set imparted no bias on the network model in a minimally constrained adjustment. A visual comparison of the network graph between the initial data set and the "cleaner" set is in Figure 11.
To properly address outliers in such a large data set, it is advisable to construct an automated procedure to iteratively report on and test the measurement with the largest outlier on a network adjusted with and without the said measurement until no standardised residuals test above a speciﬁed sigma threshold. This course of action is recommended for further research and development.
In a continuation of the method proposed by Marshall, a Monte Carlo based simulation was executed on the "cleaner" data set obtained in the above section. The test was conducted by adding random, uniformly distributed artificial gross errors to 100 randomly selected measurements in the data set across 1,000 passes. The results from each pass were analysed in Matlab using the code in Appendix D, which classified the residuals of each adjustment into four groups:
- True negative: No artiﬁcial bias was added and measurement was not ﬂagged as an outlier.
- True positive: Artiﬁcial bias was added and measurement was ﬂagged as an outlier.
- False negative: Artiﬁcial bias was added and measurement was not ﬂagged as an outlier.
- False positive: No artiﬁcial bias was added and measurement was ﬂagged as an outlier.
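The four-way classification can be sketched as follows; the function and values are illustrative, not the Appendix D Matlab code.

```python
# Sketch: label each measurement by whether bias was injected and whether
# its standardised residual exceeded the sigma threshold.

def classify(residuals, biased, threshold):
    counts = {"TN": 0, "TP": 0, "FN": 0, "FP": 0}
    for r, was_biased in zip(residuals, biased):
        flagged = abs(r) > threshold
        if was_biased and flagged:
            counts["TP"] += 1       # bias added, correctly flagged
        elif was_biased and not flagged:
            counts["FN"] += 1       # bias added, missed
        elif not was_biased and flagged:
            counts["FP"] += 1       # no bias, wrongly flagged
        else:
            counts["TN"] += 1       # no bias, correctly passed
    return counts

res = [0.2, 6.1, 0.4, 5.3, 4.9]
bias = [False, True, False, False, True]
print(classify(res, bias, threshold=4.6))
```

Accumulating these counts across the 1,000 passes yields the rates plotted in Figures 12 to 14.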
The true negatives are displayed in Figure 12, the false positives are
displayed in Figure 13 and the false negatives are displayed in Figure
14. Note that the rate of false positives is relatively low except for
standardised residuals near the sigma threshold, which indicates a
concentration of marginally detectable errors in the simulation
and thus implies the propensity for reliability errors in the NSW network.
A visual comparison of the true negatives and false negatives in Figures 12
and 14 indicates that the distribution of the false negatives
with respect to the value of the artificial gross error is reflected in
observations not present in the true negatives; the latter would
ideally display a flat, uniform distribution over the entire spectrum
of gross errors. The shape of the distribution of false negatives
demonstrates that, above a "noise floor" of an approximate failure rate of
detection of between 5% and 10%, the failure rate of detection increases
sharply once the magnitude of the bias drops below 0.1 metres.
Between a bias magnitude of 0.1 and 0.2 metres the failure rate of
detection also demonstrably increases relative to the range outside this interval.
The localisation of the propensity for failure to detect gross errors as outliers has not been addressed in this thesis. However, this would be a viable avenue as a topic of further research. Perhaps the network neighborhood aggregation discussed in Chapter 7 could be of use in this area.
As postulated by Marshall, there is little known
about the pre-analysis of robust estimators such as the $L_1$ norm estimator.
One solution to reduce the overhead time complexity of an $L_1$
adjustment is to reduce the problem size (or dimension) by dividing the
problem into a series of sub-problems that have the property that the
union of the outliers detected across all sub-problems is equivalent to the
set of outliers detected in the initial problem.
The problem of dividing a network of measurements into a set of sub-networks (this term will be used interchangeably with the term segments from here on) is made difficult by the constraint mentioned above, which in the context of this problem can be expanded into the following:
Every measurement contained in the original network that has redundancy is contained in at least one of the sub-networks with redundancy.
This thesis will investigate the eﬃcacy of dividing the network by the
detection of a basis of cycles (or loops) of measurements and to group
these cycles into segments based on the overlapping observations shared
The hypothesis of correctness of the above proposal lies in the property that we wish to preserve, that is, the set of outlier observations. The magnitude of the error of these outliers is not as important as whether or not these observations are flagged as such in all cases. In order to test this hypothesis, a series of simulations will be run in order to ascertain the statistical likelihood of correctness. These simulations will consist of simple permutations of outliers across permutations of network geometries.
The problem of detecting cycles lies in the properties of the cycles we wish
to obtain. Since a complex network can contain many more cycles than
there are edges (or measurements) in the network, we wish to obtain a
minimally spanning cycle basis for the network in order to eﬀectively
minimise the number of adjustments performed. This task can be
complicated depending on how much we want to minimise this cycle basis.
The chosen cycle detection algorithm used the following basic process:
- Find a spanning forest in the network.
- For each tree in the forest:
  - For each back-edge not in the spanning tree:
    - Find the smallest path joining the start and end nodes of the back-edge using A* search. This path cannot contain the back-edge.
It was desired that a set of cycles be obtained from a network such that:
- Every measurement in the network that can belong to a cycle does belong to at least one cycle
- The cycles must be as small as possible.
The most effective and commonly-used search algorithm for scenarios
defined by a single goal or target in a search space that always
allows a reasonable (and admissible) method of predicting the
distance from a potential solution candidate to the goal state is the A*
(pronounced A-star) search. Initially proposed in 1968 by Hart, Nilsson and Raphael
as a faster algorithm to select paths for robot navigation, A*
search has persisted in popularity for much the same reason as
Dantzig's simplex method: good average-case computational time
complexity and the real-world nature of the conditions under which it can be
effectively applied. A* is an iterative, state-based path-traversal search whereby
a queue of possible solution candidates at various points in the
search space is maintained and ordered by cost. This cost is
calculated as the sum of the cost (or distance) the candidate has
traversed already plus an estimate, or heuristic, of the minimum cost
that is expected to be incurred for this candidate to reach the goal state.
In the context of a cycle search in a geodetic network, this search algorithm can be applied using the following conditions and metrics:
- Nodes are stations in the geodetic network, and edges are the measurements.
- Each edge is required to belong to at least one cycle, hence every edge will incur a cycle search for that edge.
- The start node is the ﬁrst node of the edge being searched. The goal node is the second node of the same edge.
- The edge being cycle-searched is removed from the set of edges that can be selected as candidates in the A* search.
- The additional cost of a traversal is the Euclidean distance from the last node to the next node.
- The admissible heuristic is the straight line distance from the last node in the candidate to the goal node.
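The conditions above can be sketched in code. The following is a minimal illustration of the cycle search for a single edge, with a hypothetical four-station network; it is not the LDNet C++ implementation in Appendix F.

```python
import heapq, math

# Sketch: for an edge (u, v), remove it from the network and run A* from u
# to v using the straight-line distance to v as the admissible heuristic.
# Any path found, closed through (u, v), is the smallest cycle for that edge.

def astar_cycle(coords, edges, target_edge):
    u, v = target_edge
    adj = {}
    for a, b in edges:
        if {a, b} == {u, v}:
            continue                      # the searched edge may not be reused
        adj.setdefault(a, []).append(b)
        adj.setdefault(b, []).append(a)
    def dist(a, b):
        (x1, y1), (x2, y2) = coords[a], coords[b]
        return math.hypot(x1 - x2, y1 - y2)
    # priority queue ordered by f = g + h (cost so far plus heuristic)
    frontier = [(dist(u, v), 0.0, u, [u])]
    best_g = {u: 0.0}
    while frontier:
        f, g, node, path = heapq.heappop(frontier)
        if node == v:
            return path + [u]             # close the cycle through (u, v)
        for nxt in adj.get(node, []):
            g2 = g + dist(node, nxt)
            if g2 < best_g.get(nxt, float("inf")):
                best_g[nxt] = g2
                heapq.heappush(frontier, (g2 + dist(nxt, v), g2, nxt, path + [nxt]))
    return None                           # the edge belongs to no cycle

coords = {"A": (0, 0), "B": (1, 0), "C": (1, 1), "D": (0, 1)}
edges = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "A"), ("A", "C")]
print(astar_cycle(coords, edges, ("A", "B")))
```

The straight-line heuristic is admissible because no sequence of baselines between two stations can be shorter than the chord between them, so A* returns the geometrically smallest cycle.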
The algorithm for this search is summarised in the pseudo-code below. This algorithm was effectively implemented in the program LDNet. See Appendix F for a C++ implementation of the below pseudocode algorithm.
procedure CycleSearch(edge e = (s, g))
    Let Q be an ascending-cost-sorted queue of candidate paths remaining to be searched
    Let S contain a list of edges that have been searched already
    Remove e from the set of traversable edges
    Enqueue the start node s onto Q with zero cost
    Enqueue a sentinel onto Q with cost MAXCOST
    while Q contains candidates to be searched do
        Dequeue the lowest-cost candidate, ending at node n
        if n = g then
            return Path traced by the candidate until g is reached.
        end if
        for all adjoining edges (n, m) not already in S do
            Enqueue the extended candidate onto Q with cost equal to the
                path cost plus estimated remaining distance as the admissible heuristic
        end for
    end while
end procedure
Initial testing of the L1 adjustment was done on networks of naively
grouped cycles. This yielded results that conﬁrmed that the algorithm
performed correctly but ultimately did not provide an easily identiﬁable
way of analysing the resultant residuals of the measurements in the
network as a whole.
It was hypothesised that by intelligently grouping a
subset of measurements from the whole network, an $L_1$
analysis of these measurements could yield reliable information on the residuals
of a subset of these measurements. The author proposes that, for each measurement
in the whole network, an adjustment on a neighbourhood of measurements
of appropriate depth would yield reliable residuals for that measurement
relative to the whole network.
See Appendix G of the thesis for an implementation of
the aggregation algorithm in C++.
The algorithmic robustness of neighbourhood aggregation would be
compromised if a station has a particularly high degree (number of
measurements to it). This phenomenon exists in the provided set of GNSS
baseline data, particularly around TS12042, which has more than 30,000
measurements to it. To allow the simplex method to solve within
reasonable time, the number of measurements should not exceed 1,000 per
segment. Therefore, in such cases it may be necessary to further split the
data into overlapping neighbourhood partitions by taking a comb of
measurements at stations with a degree of more than 500. When testing for
outliers in each neighbourhood partition it is required that there are no
false negatives, therefore a rigorous method must be devised for this purpose.
As noted earlier, testing of internal reliability on a ﬁxed size neighborhood relative to the whole network did not commence during the term of research. Should another researcher be interested in pursuing this avenue then all available source code will be provided by the author.
The application of fast linear solvers to the minimisation problem in a large network has demonstrated the viability of such a device in the quality analysis stage of datum determination. It is not diﬃcult to see the automation possibilities to achieve near-real-time datum quality analysis. As noted by Haasdyk et al  to achieve the measurement density required for a dynamic datum the use of crowd-sourced survey data is almost inevitable and as such a rigorous quality analysis system would be required as a ﬁrst level assessment to provide immediate feedback to both the contributing and receiving parties.
Currently the software developed for this thesis is capable of processing
GNSS baselines but there is no support for terrestrial observations. The
latter would require the implementation of ellipsoidal model equations as
described in Harvey’s Monograph 13 in addition to support for a
geoid-ellipsoid separation description such as AusGeoid09.
Further to the above, the process of iteratively down-weighting or
removing outliers by ﬁtting the Laplace distribution via the Pearson
Chi-Square test is, at present, performed manually. With potentially
thousands of outliers in a set of hundreds of thousands or millions of
measurements it would be wise to automate this procedure by the
speciﬁcation of decision parameters for outlier removal.
Lastly, multi-epoch support should be addressed in the context of a dynamic datum. Given the propensity for localised deformation on the Australian plate the datum deﬁnition could itself be used as a large-scale monitoring survey to identify marks that have potentially moved between consecutive epochs. The likelihood of movement of marks would be statistically tested and the basis for such tests would depend on the quality of surrounding measurements at the local level.
The result of initial internal reliability testing revealed the potential for a
significant proportion of type 1 and type 2 errors in the identification of
outliers as measurement errors. As such, it is recommended that further
testing of internal reliability be conducted on the network
geometry of the NSW network of class 2A0 marks. Of particular relevance
is the reduction of the rate of false negative flagging in areas of very low redundancy.
Although GNSS baseline vector components are correlated and this was
considered during the term of this thesis, it was beyond the scope of this
thesis to investigate the effect of this correlation with respect to
$L_1$ norm minimisation in detail.
In general, the network geometry and redundancy still primarily determine the ability to detect gross errors in survey networks. As demonstrated by the internal reliability testing in this thesis, even with sufficient redundancy there is still a chance that gross errors will not be localised exactly using an adjustment. However, when compared with other existing methods such as the Baarda test in least squares, the same techniques on an adjusted data set provide significant information that, when used in conjunction with existing data snooping methods, can greatly assist the process of gross error identification.

Part III

APPENDIX
Depending on academic background, some terminology used in this document may not be immediately familiar to the reader. Since this document spans both aspects of computer science, mathematical graph theory and traditional survey concepts, there are some terms that have diﬀerent meanings in diﬀerent contexts and some others that can only be adequately explained using the most rigorous mathematical terminology. The author has made an attempt to describe all terms in this section with references and examples in the relevant areas of study.
In graph theory, a cycle is speciﬁcally a path through a graph (or survey network) in which the start node (or point) is also the end node. This is often called a loop in the context of a survey network, however the term loop in graph theory speciﬁcally refers to a single edge (or measurement) that links a single node to itself. This latter property is never exhibited in a survey network.
As a basic property of linear algebra: if $A$ is a square matrix of full rank,
there exists exactly one solution for $x$ in $Ax = b$. If $A$ has more rows
than columns, the system of equations is over-determined, which means that
in general there is no single exact solution for $x$. If $A$ has fewer rows
than columns, there exist as many feasible solutions as there are degrees of
freedom in the system, where the number of columns is the number of
parameters in the system.
An aspect of a geometric ﬁgure that is a requirement of the solution space of the Simplex method that can be mathematically determined. In a vector space, convexity can be visualised by the following statement: A vector space is convex if there does not exist any pair of vectors between which a straight line passes outside the bounds of the set.
In computer programming, a sentinel value is a special value whose presence guarantees termination of a loop that processes structured (especially sequential) data. In more complex algorithms it is also a term used to describe a variable that is used to decide the ﬂow control of the execution based on certain values.
A Norm is a type of measure or length. Typically it is used in the context of the measurement of the size of a vector or matrix. In the case of the L1 Norm of a geodetic network, the set being normed is the set of residuals.
A linear program is said to be degenerate, or to become degenerate, when basic variables of zero value can result in pivot operations for which there is no improvement in the objective value. This can then cause subsequent pivot operations to revisit the same basis.
A linear program can encounter conditions whereby it will run indeﬁnitely and never reach an optimal solution. This occurs when a particular set of basic variables is encountered more than once. Since linear programming is deterministic, the pivot operations on a particular basis will always be the same, hence encountering a basis more than once means that the program will "cycle" through the same sequence of sets of basic variables in an inﬁnite loop.
In the context of a network (or graph) of vectors, the author has chosen the term Non-Sparsity to represent the likelihood of a single gross error in a measurement near stations of very high degree being hidden or spread across the residuals of many neighbouring measurements.
An $n$-dimensional cube. The defining characteristic when considering a hyper-cube problem-space of a linear-programming problem is that each constraint does not bypass any other constraint and that the correct pivot choice is often not the first one chosen by the simplex method, hence increasing computational time complexity.
The mathematical representation of a collection of nodes and adjoining edges whereby edges do not have a direction of ﬂow, i.e. paths that are traversed in this graph can ﬂow in either direction across an edge.
A spanning tree is a single connected tree in a graph, rooted at any node, whereby every node in the graph is visited once and only once by a path connecting the root node to that node. It is thus a property of a spanning tree that the spanning tree contains no cycles. A graph whereby a path can be found between any pair of nodes is connected, and thus a single spanning tree can be found for this graph.
The below is an example of a single measurement input to LDNet.
|Minimize Primal Objective|Maximize Dual Objective|
|Objective coefficients $c$|RHS of dual|
|RHS of primal $b$|Objective coefficients|
|Coefficient matrix $A$|Transposed coefficient matrix $A^{\mathsf{T}}$|
|Primal relation: inequality|Dual variable: non-negative|
|Primal relation: equation|Dual variable: unrestricted in sign|
|Primal variable: non-negative|Dual relation: inequality|
|Primal variable: unrestricted in sign|Dual relation: equation|
 W. Baarda. A Testing Procedure for Use in Geodetic Networks. Number v. 2, no. 5 in A testing procedure for use in geodetic networks. Netherlands Geodetic Commission, 1968. ISBN 9789061322092. URL http://books.google.com.au/books?id=4Ho-SwAACAAJ.
 I. Barrodale and F. D. K. Roberts. Solution of an overdetermined system of equations in the l1 norm [f4]. Commun. ACM, 17(6):319–320, June 1974. ISSN 0001-0782. doi: 10.1145/355616.361024. URL http://doi.acm.org/10.1145/355616.361024.
 R.L. Branham. Scientiﬁc data analysis: an introduction to overdetermined systems. Springer-Verlag, 1990. ISBN 9780387972015. URL http://books.google.com.au/books?id=USLvAAAAMAAJ.
 W.F. Caspary and J.M. Rueger. Concepts of Network and Deformation Analysis. Monograph (University of New South Wales. School of Surveying). School of Surveying, The University of New South Wales, 1987. ISBN 9780858390447. URL http://books.google.com.au/books?id=i-W5SAAACAAJ.
 I. Castillo and E. Barnes. Chaotic behavior of the aﬃne scaling algorithm for linear programming. SIAM Journal on Optimization, 11(3):781–795, 2001. doi: 10.1137/S1052623496314070. URL http://epubs.siam.org/doi/abs/10.1137/S1052623496314070.
 M. Cen, Z. Li, X. Ding, and J. Zhuo. Gross error diagnostics before least squares adjustment of observations. Journal of Geodesy, 77(9): 503–513, 2003. ISSN 0949-7714. doi: 10.1007/s00190-003-0343-4. URL http://dx.doi.org/10.1007/s00190-003-0343-4.
G. Dantzig. Linear programming and extensions. A Rand Corporation research study. Princeton University Press, 1965. ISBN 9780691059136. URL http://books.google.com.au/books?id=2j46uCX5ZAYC.
 WE Featherstone, JF Kirby, Christian Hirt, MS Filmer, SJ Claessens, NJ Brown, Guorong Hu, and GM Johnston. The ausgeoid09 model of the australian height datum. Journal of Geodesy, 85:133–150, 2011.
Geoscience Australia. Common ellipsoid parameters, May 2013. URL http://www.ga.gov.au/earth-monitoring/geodesy/geodetic-datums/historical-datums-of-australia/common-ellipsoid-parameters.html.
 Michael Grant and Stephen Boyd. CVX: Matlab software for disciplined convex programming, version 2.0 beta. http://cvxr.com/cvx, September 2013.
Joel Haasdyk and Craig Roberts. Monitoring station movement using a state-wide simultaneous adjustment of everything - implications for a next-generation australian datum. In Proceedings of the International Global Navigation Satellite Systems Society IGNSS Symposium 2013, page 15, July 2013.
Samwel Simon Katambi, Guo Jiming, and Kong Xiangyuan. Applications of graph theory to gross error detection for gps geodetic control networks. Geo-spatial Information Science, 5:26-31, 2002. ISSN 1009-5020. doi: 10.1007/BF02826471. URL http://dx.doi.org/10.1007/BF02826471.
 A. Khodabandeh and A. Amiri-Simkooei. Recursive algorithm for l1 norm estimation in linear models. Journal of Surveying Engineering, 137(1):1–8, 2011. doi: 10.1061/(ASCE)SU.1943-5428.0000031. URL http://ascelibrary.org/doi/abs/10.1061/%28ASCE%29SU.1943-5428.0000031.
 D.C. Lay. Linear Algebra and Its Applications. World student series. Addison-Wesley Longman, Incorporated, 1997. ISBN 9780201824780. URL http://books.google.com.au/books?id=4OanQgAACAAJ.
 Andrew Makhorin. Gnu linear programming kit, version 4.53, 2013. URL http://www.gnu.org/software/glpk/glpk.html.
 J. Marshall. L1-norm pre-analysis measures for geodetic networks. Journal of Geodesy, 76(6-7): 334–344, 2002. ISSN 0949-7714. doi: 10.1007/s00190-002-0254-9. URL http://dx.doi.org/10.1007/s00190-002-0254-9.
John Marshall and James Bethel. Basic concepts of L1 norm minimization for surveying applications. Journal of Surveying Engineering, 122(4):168-179, 1996. doi: 10.1061/(ASCE)0733-9453(1996)122:4(168). URL http://ascelibrary.org/doi/abs/10.1061/%28ASCE%290733-9453%281996%29122%3A4%28168%29.
 S. Mehrotra. On the implementation of a primal-dual interior point method. SIAM Journal on Optimization, 2(4):575–601, 1992. doi: 10.1137/0802028. URL http://epubs.siam.org/doi/abs/10.1137/0802028.
 A.J. Pope, Geodetic Research, Development Laboratory, National Geodetic Survey (U.S.), and National Ocean Survey. The statistics of residuals and the detection of outliers. NOAA technical report. U.S. Dept. of Commerce, National Oceanic and Atmospheric Administration, National Ocean Survey, Geodetic Research and Development Laboratory, 1976. URL http://books.google.com.au/books?id=tHUeAQAAIAAJ.
 Chris Rizos. Principles and practice of gps surveying, January 2000. URL http://www.gmat.unsw.edu.au/snap/gps/gps_survey/principles_gps.htm.
 Conrad Sanderson. Armadillo c++ linear algebra library, June 2013. URL http://arma.sourceforge.net/.
 Brian Tolman, R. Benjamin Harris, Tom Gaussiran, David Munton, Jon Little, Richard Mach, Scot Nelsen, and Brent Renfro. The GPS Toolkit: Open Source GPS Software. In Proceedings of the 16th International Technical Meeting of the Satellite Division of the Institute of Navigation, Long Beach, California, September 2004.
 Henry Wolkowicz. C&o 350 linear programming course notes, fall semester 1995, 1995. URL http://orion.math.uwaterloo.ca/~hwolkowi/henry/teaching/f95/350.f95/matlabfiles/affine.m.gz.
Y. Zhang. Primal-dual interior point approach for computing L1-solutions and L-infinity-solutions of overdetermined linear systems. Journal of Optimization Theory and Applications, 77:323-341, 1993. ISSN 0022-3239. doi: 10.1007/BF00940715. URL http://dx.doi.org/10.1007/BF00940715.