I suppose the big problem is knowing when data is bad.
Strictly, data is data. Period. The thing we are most interested in is accuracy (and its associated partner, precision - but leave that for another day).
In a scientific experiment we design the experiment to focus on the phenomenon we are interested in. The design, the execution, the construction of the apparatus all have a direct bearing on accuracy even though we have previously defined the parameters we are interested in. Statistical methods are therefore employed to be able to state (with an attached degree of confidence) that this hypothesis or this outcome seems to be a satisfactory explanation of the phenomenon under investigation.
In mapping we are at the mercy of trees, rain, snow, the GPS equipment itself and so on when obtaining raw data. There is no parallel with the design of an experiment, and so we cannot say that any particular measurement of a position is within a particular degree of accuracy. We are at the receiving end, not the defining end.
So how can we tell if this data is of any use whatsoever? We cannot say that the median of multiple tracks is probably the “correct” route, because everyone who passes that point is shadowed from a part of the sky by this forest, those trees, that building, or whatever - the result is that the traces can be consistently out in one particular direction. So the median isn’t a guide to “best fit”.
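To put a number on the median point, here is a rough sketch in Python. It is a toy simulation, not real trace data: the 8 m shared bias, the 3 m random scatter and the function name `simulate_cross_track_errors` are all made up for illustration. Every track gets the same offset (the trees shadowing the same part of the sky) plus its own random noise.

```python
import random
import statistics

def simulate_cross_track_errors(n_tracks, shared_bias_m, noise_sd_m, seed=0):
    """Return one cross-track error (in metres) per simulated track.

    Each track carries the SAME systematic offset plus its own random
    scatter. All of the numbers are assumptions, not measurements.
    """
    rng = random.Random(seed)
    return [shared_bias_m + rng.gauss(0.0, noise_sd_m) for _ in range(n_tracks)]

# Hypothetical figures: 200 tracks, all pushed 8 m the same way, 3 m scatter.
errors = simulate_cross_track_errors(n_tracks=200, shared_bias_m=8.0, noise_sd_m=3.0)
median_error = statistics.median(errors)

# The random scatter averages away; the shared offset does not.
print(f"median cross-track error: {median_error:.1f} m")
```

More tracks shrink the scatter around the median, but the median itself settles near the shared offset rather than near the true road.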
Only when we have recourse to an alternative source of verification can we even begin to attach any kind of accuracy to the raw data. Satellite imagery is the obvious one; other maps are another; sketches made during a personal survey are a third (though in the latter case, features can only ever be positioned relative to one another).
Even if I have a very clear idea of the GPS trace in relation to the track I have followed, I cannot say I am on solid ground: the zig-zag trace I obtain clearly does not follow the straight road I walked - so the data is already suspect. But even if I draw the line of best fit through the data points, how do I know if that line should overall be shifted 10m NSEW? I don’t.
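The best-fit problem can be shown with a small Python sketch. The coordinates are invented (metres in some local flat grid, road running along the x axis) and `fit_line` is just an ordinary least-squares fit written out by hand. The point is that a trace and the same trace shifted 10 m sideways fit a straight line equally well, so the quality of the fit says nothing about the offset.

```python
import math

def fit_line(points):
    """Ordinary least-squares fit of y = intercept + slope * x.

    Returns (intercept, slope, rms_residual). Points are (x, y) in metres.
    """
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    rms = math.sqrt(
        sum((y - (intercept + slope * x)) ** 2 for x, y in points) / n
    )
    return intercept, slope, rms

# Made-up zig-zag trace: points alternate 2 m either side of a straight road.
zigzag = [(float(x), 2.0 if x % 20 == 0 else -2.0) for x in range(0, 101, 10)]
# The same trace with every point moved 10 m to one side.
shifted = [(x, y + 10.0) for x, y in zigzag]

a1, b1, rms1 = fit_line(zigzag)
a2, b2, rms2 = fit_line(shifted)
print(f"original: intercept {a1:.2f} m, rms {rms1:.2f} m")
print(f"shifted:  intercept {a2:.2f} m, rms {rms2:.2f} m")
```

Both fits have the same slope and the same residual scatter; only the intercept moves, and nothing in the trace itself tells you which intercept is right.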
So we do the best we can, we clean up the worst of the most obvious exceptions and we consign to the future the job of sorting the wheat from the chaff.
Tools that would help: a pre-upload analysis of the GPS trace compared with those already uploaded. A bit like astronomical image-stacking software - we know roughly where the road is - and “roughly” will have to do for now.
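As a very rough idea of what such a comparison might look like, here is a Python sketch. Everything in it is an assumption: the function name `median_offset_to_existing`, the idea that points have already been converted to metres in a local flat grid, and the brute-force nearest-point search (a real tool would want a spatial index). It does not exist in any editor that I know of.

```python
import math
import statistics

def median_offset_to_existing(new_trace, existing_points):
    """Median distance (metres) from each new point to its nearest
    already-uploaded point.

    Both arguments are lists of (x, y) tuples in metres in the same
    local grid (an assumption; real traces are lat/lon).
    """
    nearest = []
    for nx, ny in new_trace:
        # Brute force: fine for a sketch, slow for real data volumes.
        nearest.append(
            min(math.hypot(nx - ex, ny - ey) for ex, ey in existing_points)
        )
    return statistics.median(nearest)

# Made-up data: existing traces lie along y = 0, the new one along y = 5.
existing_points = [(float(x), 0.0) for x in range(0, 101)]
new_trace = [(float(x), 5.0) for x in range(0, 101, 10)]

offset = median_offset_to_existing(new_trace, existing_points)
print(f"new trace sits about {offset:.1f} m from the existing ones")
```

A small offset only says the new trace agrees with the old ones. If they all share the same shadowing, they can agree and still all be wrong, so this is a consistency check rather than a measure of accuracy.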
The ultimate arbiter? Probably satellite imagery - but we don’t have the rights to use that data freely. At least not yet, and not to the precision we’d probably like.
Ric