Edit GPX tracks before upload

Thingomy · April 7, 2008, 9:55am

I’m new to this OSM thing, and have been asking myself the same question – I have files with 2500+ points, and need to identify the bits where I was indoors and remove them.

The long term solution I suppose is to turn the kit off when inside, but it would make the system more consumer friendly and “open” if there was a more elegant solution to this.

I spent half a day yesterday in Excel doing statistical analysis on the data, and was still unable to identify the cloud of values around the building (without pulling the coordinates out and cheating) – note that for this trace specifically I only have GPX data to go on (long story).

One idea that did strike me: when using NMEA logs, which I assume most people are, the HDOP/VDOP, number of satellites and some function of SNR should give good indicators of signal quality. It may be worth investigating a way to store some of this data in the tracks that are uploaded, then give an option in JOSM to only plot points that are over a certain quality, or to show lower-quality points in a lighter colour.
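As a rough illustration of the quality-filter idea: GPX 1.1 track points can already carry optional `<hdop>` and `<sat>` elements, so a filter could be sketched as below. The function name `split_by_quality` and the thresholds (HDOP above 2.0 or fewer than 5 satellites counts as poor) are assumptions chosen for the example, not anything JOSM or OSM actually does. Points that carry no quality fields are kept, since they cannot be judged.

```python
import xml.etree.ElementTree as ET

GPX_NS = {"gpx": "http://www.topografix.com/GPX/1/1"}


def split_by_quality(gpx_text, max_hdop=2.0, min_sats=5):
    """Split track points into (good, poor) lists of (lat, lon) tuples.

    The thresholds are illustrative assumptions. A point is 'poor' when it
    reports an HDOP above max_hdop or fewer than min_sats satellites.
    Points without quality fields are treated as good.
    """
    root = ET.fromstring(gpx_text)
    good, poor = [], []
    for pt in root.iterfind(".//gpx:trkpt", GPX_NS):
        hdop = pt.findtext("gpx:hdop", namespaces=GPX_NS)
        sat = pt.findtext("gpx:sat", namespaces=GPX_NS)
        bad_hdop = hdop is not None and float(hdop) > max_hdop
        bad_sat = sat is not None and int(sat) < min_sats
        coord = (float(pt.get("lat")), float(pt.get("lon")))
        (poor if bad_hdop or bad_sat else good).append(coord)
    return good, poor
```

An editor could then plot the `good` list normally and the `poor` list in a lighter colour, rather than discarding anything.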

I don’t know how important this is, but some tracks are going to have some trash in them somewhere.

Ideas?

Ben · April 7, 2008, 1:15pm

What can consistently be said about the pointless bits of data, that could be used to filter it?

Are the majority of people storing NMEA logs? I’m not, and never have.

I have thought the same in the past, though, about lighter data/bad data. Data recorded from a low number of satellite signals could be shown lighter or more transparent.

In the future I wonder if the number of GPX points in small areas will become so high that it’s easier to generate images from the data, update these when a new route is uploaded, and have people download them into programs like JOSM. Removing rubbish from these as a moderator’s tool would then be as simple as the rubber tool in the most basic graphics programs.

emj · April 7, 2008, 9:38pm

I upload everything I have, this has its drawbacks but I think it’s ok.

Lambertus · April 8, 2008, 9:49am

The easiest way is to find a GPX editor that can show a map on which the GPX points are plotted (like the MapSource application that comes with Garmin GPS units, with OSM background maps, but there are many other applications around). By zooming in on the GPX traces I can easily determine where I stayed in one place for a long time, where I was indoors or where the tracks had poor reception. I usually try to filter all those ‘bad spots’ out before uploading, and it doesn’t take long when done every time after tracking.
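The "stayed in one place for a long time" spots can also be flagged automatically before looking at the map. The following is only a sketch under stated assumptions: the input is a list of `(seconds, lat, lon)` tuples sorted by time, and the names `haversine_m` and `find_dwells`, the 25 m radius and the 5 minute minimum are all made up for the example. It reports index ranges to review; it does not delete anything.

```python
import math


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two lat/lon points."""
    r = 6371000.0  # mean Earth radius in metres
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def find_dwells(points, radius_m=25.0, min_seconds=300.0):
    """Return inclusive (start, end) index ranges where the receiver lingered.

    points: list of (t_seconds, lat, lon) sorted by time.
    A dwell is a run of consecutive points that all stay within radius_m of
    the first point of the run and that lasts at least min_seconds.
    """
    dwells = []
    i, n = 0, len(points)
    while i < n:
        t0, lat0, lon0 = points[i]
        j = i
        # Extend the run while the next point is still close to the anchor.
        while j + 1 < n and haversine_m(lat0, lon0, points[j + 1][1], points[j + 1][2]) <= radius_m:
            j += 1
        if points[j][0] - t0 >= min_seconds:
            dwells.append((i, j))
            i = j + 1
        else:
            i += 1
    return dwells
```

Each reported range can then be checked on the map and removed from the copy that gets uploaded, while the raw log stays untouched.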

emj · April 8, 2008, 10:45am

Am I the only one who thinks this “bad” data actually should be uploaded? It’s always a good idea to have more data.

Lambertus · April 8, 2008, 11:38am

I don’t see the ‘good idea’ in data that is clearly wrong.

E.g. for some reason my track is out (compared to at least 5 other traces) by about 20 meters, criss-crossing around while the road is perfectly straight… that track is only usable for saying ‘that track is bad, apparently’, which doesn’t add anything to the project other than space requirements on the db server.

emj · April 8, 2008, 12:40pm

Well if you have two GPX tracks on the same spot and both are bad, then you can probably guess something about the environment around it. Of course I understand your problem with it, it would be good to be able to hide this kind of data, but deleting is so final.

Lambertus · April 8, 2008, 1:12pm

That’s why I have two directories to store my tracks in: ‘raw’ and ‘sanitized’.

sparky_lad · April 8, 2008, 1:18pm

Bad data is pointless - it adds to server-space requirements, adds clutter and can mask perfectly good data. I have a track from last year I’d seriously like to delete (somehow it’s become orphaned - no longer appearing as one of my tracks in my list) and all it does is mask the better and more recent data.

Three things:

First, step 1, if you know data is poor - for whatever reason - it should be filtered out. MapSource and Google Earth can both import GPS tracks, and it’s possible to see whether the GPS track is any good or not. If it’s bad, don’t upload it.

Second (step 2), how bad is bad? More to the point, is what you have better than what is there? OK, so you can filter out the worst offenders in your data set (step 1), but the next stage might be to see what (public) tracks have been uploaded to OSM already. If there are tracks in OSM that appear nice and tight and yours are all over the place (zig-zags on a perfectly straight road, for example), then I see no point in adding noise which can only mask those perfectly good tracks. So keep them private or edit them further before making them public.

Third, it would be useful to be able to switch tracks on and off selectively and individually in the Potlatch editor, because the tracks are so thick. Multiple tracks from multiple contributors can create an area of pale blue some 20m wide in places. Here JOSM wins - neat little grey dots - much better. But the ability to turn tracks in an area on and off would be useful in ANY editor.

I use NoniGPSPlot (free) to capture data, then use GPSTrackMaker (free version) to delete data that is clearly wrong or just plain flaky. I never edit points, never run track-reduction or optimisation, I only ever delete trackpoints. The cleaned up gpx file(s) are then uploaded.

Lambertus gets my vote.

Richard · April 8, 2008, 3:38pm

Shift+click the GPS icon to show only your tracks.

Or, alternatively, use the ‘edit’ link by each trace (in the GPS traces display) to preload the whole of that trace.

Ben · April 8, 2008, 5:55pm

emj: I don’t see any need for bad data. I remember seeing some town in the Cotswolds, England, where somebody seems to have uploaded what appears to be them playing a football match with a GPS in their pocket, and it’s just a pain and has no benefit.

I use MapSource as Lambertus described, and it’s very easy to filter off the rubbish which isn’t beneficial to anyone or to OSM. It takes seconds to do, so I don’t see much issue with it. It’s also rather easy to turn a GPS on and off, so when I stand around for a while it goes off… likewise for when I start/finish a journey. Not really hard to do.

sparky_lad: In JOSM you can just turn tracks on and off.

emj · April 9, 2008, 5:56am

At least Lambertus sees why I want to save those logs, and I am slowly acknowledging why it’s a good idea to edit before upload. So I would like to ask everyone: don’t delete the raw logs, they can be useful.

Ben · April 9, 2008, 8:43am

What data people have on their own computer is completely up to them, obviously, so of course I don’t object to them keeping hold of their raw data. I do as you said: I have the files saved from the GPS, then a folder of files which are sorted ready for OSM.

emj · April 10, 2008, 7:36am

Yes, but that is my point: if you only store it on your own computer it’s completely useless. The “bad” data is only good if you have it in large quantities. One track isn’t enough.

So what would it take to get you people to upload your GPX logs unedited?

Lambertus · April 10, 2008, 7:54am

Of course there are bits of road that I frequently visit while tracking, which results in dozens of traces for that road. When displaying all those traces, a thick ‘white band’ about 10m wide appears approximately where the road is. Normally one would interpret the tracks and put the road in the middle of the white band, right?

Now, please tell me what use there is in uploading (part of) a track that has an offset of e.g. >20m (thus falling at least 10m outside the so-called white band). The track appears to go through houses, or perhaps almost becomes part of the white band that indicates another road. Why and how would the project benefit from such bad data?

Ben · April 10, 2008, 6:36pm

emj: I’m not talking about bad data in the sense of data that is generated when driving fast through a forest in a gorge in September. I’m referring to data accumulated when people sit down on a park bench and doze for 20 minutes, leaving a dense collection of dots with no advantage to OSM.

In fact, if it is woodland in a gorge (as in the first example) then I would encourage more data than usual, since most people would get bad reception there, so you need the entire range of bad data to create a good map from the mean route.

emj · April 11, 2008, 9:25pm

Well my motto is, better bad data than no data, and best of all would be data tagged as bad.

First off, bad data that might be useful: data created by bad fixes, device bugs, signal noise. This kind of data is good to have because it can help tell you what error level the GPS might give you. So I can say that even though I was on this road, the GPS told me I was on this other road. If you have these sorts of things you could theoretically improve your software to give you a better location.

Second, bad data created by human error: taking a break, going into a house etc. This kind of data would be wonderful to have in a central repository because you would be able to find places where people stopped to have a break. And if you tagged it, you should be able to recognise the same pattern in newly uploaded data and semi-automatically remove it from the DB.

Tired…

sparky_lad · April 12, 2008, 6:31pm

I suppose the big problem is knowing when data is bad.

Strictly, data is data. Period. The thing we are most interested in is accuracy (and its associated partner, precision - but leave that for another day).

In a scientific experiment we design the experiment to focus on the phenomenon we are interested in. The design, the execution, the construction of the apparatus all have a direct bearing on accuracy even though we have previously defined the parameters we are interested in. Statistical methods are therefore employed to be able to state (with an attached degree of confidence) that this hypothesis or this outcome seems to be a satisfactory explanation of the phenomenon under investigation.

In mapping we are at the mercy of trees, rain, snow, the gps equipment itself and so on for obtaining raw data. There is no way there is any parallel with the design of an experiment and so we cannot say that any particular measurement of a position is within a particular degree of accuracy. We are at the receiving end, not the defining end.

So how can we tell if this data is of any use whatsoever? We cannot say that the median of multiple tracks is probably the “correct” route, because everyone who passes that point is shadowed from a part of the sky by this forest, those trees, that building, or whatever - the result is that the trace can be consistently out in one particular direction. So the median isn’t a guide to “best fit”.

Only when we have recourse to an alternative source of verification can we even begin to attach any kind of accuracy to the raw data. Satellite imagery is the obvious one; other maps are another; sketches made during a personal survey are a third (though in the case of the latter, features can only ever be relative to one another).

Even if I have a very clear idea of the GPS trace in relation to the track I have followed, I cannot say I am on solid ground: the zig-zag trace I obtain clearly does not follow the straight road I walked - so the data is already suspect. But even if I draw the line of best fit through the data points, how do I know if that line should overall be shifted 10m NSEW? I don’t.

So we do the best we can, we clean up the worst of the most obvious exceptions and we consign to the future the job of sorting the wheat from the chaff.

Tools that would help: a preload analysis of the gps trace compared to those already uploaded. A bit like astronomical image-stacking software - we know roughly where it is - and “roughly” will have to do for now.

The ultimate arbiter? Probably satellite - but we don’t have the rights to use that data freely. At least not yet and not to the precision we’d probably like.

Ric

Ben · April 12, 2008, 10:32pm

The mean of all the tracks may not be correct, as you said, but it’s going to be correct for people using GPS units, as it’s the average GPS track. The intention, though, should be to get it ‘really correct’ rather than just suitable for GPS units. You would also have to consider the value of each track, so that tracks recorded from a slow-moving source were more influential in the averaging.
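To make the "slow-moving tracks count for more" idea concrete, here is a minimal sketch. It assumes each track has already been reduced, at one spot along the road, to a sideways offset in metres from some reference line plus the speed at which it was recorded. The function name `weighted_offset`, the weight of 1/speed and the 0.5 m/s floor are all assumptions for illustration; nothing in this thread fixes the weighting.

```python
def weighted_offset(samples, min_speed=0.5):
    """Speed-weighted mean of sideways offsets at one spot on a road.

    samples: list of (offset_m, speed_mps), one entry per track.
    Each track is weighted by 1/speed (an assumed weighting), so slower
    tracks pull the mean harder. Speeds below min_speed are clamped to
    avoid dividing by zero for a stationary receiver.
    """
    weights = [1.0 / max(speed, min_speed) for _, speed in samples]
    total = sum(weights)
    return sum(w * off for w, (off, _) in zip(weights, samples)) / total
```

For example, a walker at 1 m/s with a 2 m offset and a car at 4 m/s with a 10 m offset give a weighted offset of 3.6 m, against a plain mean of 6 m. This does nothing about a bias that all tracks share, which is the objection raised above about the median.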

But… what I tend to find as I go through an area of bad reception is that the path becomes more and more off, but retains the general shapes/curves of the route. If I did the route many times from the same point, the tracks would gradually fan out.

Then, if it is particularly bad, I will take that route from the other direction. This will give good reception at both ends. I can draw the shape from the routes in each direction, rotate them to line up with the accurate readings at both ends, and draw the mean. (I wonder if the electronic compass helps here?)

Another giveaway is the spacing of the dots. Firstly, how frequent they are all the way along, i.e. in a car or walking. Walking will give better data. But secondly, looking back to see if it has kept a fixed signal. So if you have 10 routes scattered around where the road goes, stick with the route that consistently retains its usual frequency.
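The "usual frequency" check can be approximated from the timestamps alone. This sketch assumes the logger normally records at a steady interval, takes the median interval as "usual", and flags any step more than three times longer as a likely loss of fix. The name `find_gaps` and the factor of 3 are assumptions for the example.

```python
from statistics import median


def find_gaps(times, factor=3.0):
    """Return (index, seconds) for each unusually long step between points.

    times: point timestamps in seconds, sorted ascending.
    A step counts as a gap when it exceeds factor times the median step.
    The returned index is that of the point just before the gap.
    """
    intervals = [b - a for a, b in zip(times, times[1:])]
    if not intervals:
        return []
    typical = median(intervals)
    return [(i, dt) for i, dt in enumerate(intervals) if dt > factor * typical]
```

A track logged at one point per second with a single eight second hole yields one gap. Among several scattered tracks, the one with the fewest gaps is the one that best "retains its usual frequency".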

This is all guesswork; it’s not guaranteed, nor a perfect formula, which your (Ric’s) post seems to suggest you’re looking for, but using this method I think most roads are relatively accurate.

For areas like woods though which I have found to be the worst, I think the best option is just to map them around this time of year. Avoid having to understand a splodge of dots, by getting the data before the leaves come out.

emj · April 14, 2008, 7:05am

I agree, but we don’t know that… Since we have so few tracks, you can only say that it’s correct for people passing at the times when the tracks were recorded. But in some way almost all of my tracks have been OK; I’ve been able to draw good and accurate maps even though the tracks weren’t good. That’s possible to judge now that we have satellite images.

If there were more bad data though…