Feelings around OpenStreetCam?

Street-view Armchair Mapping.

In the case of Mapillary, three quarters of the images come from sources completely unrelated to OSM. This means the OSM community gets access to a massive amount of imagery for mapping.

Edit: I probably misunderstood your post. Sorry.

So we would only have to set up a service for those people who do not like the license, or any other aspect, of Mapillary/OpenStreetCam? That’s why I host the pictures I take on SmugMug. I do not need any other service.

In practical terms, the same reason it is valuable now, and the same reason Google uses its Street View imagery for reCAPTCHA: automation of feature tagging using computer vision (CV). I looked for the CV part of the OSC website source code (the part that runs after you upload, when it goes “processing” to blur things out) but I could not find it. E.g. topographic detail such as road widths and house numbers.

If you’re asking why OSM should bother trying to retain some level of control over the dataset, I think that unless there is some legal agreement to suggest otherwise, there is a risk Telenav could start to close off & profit from the uploaded data using the same models as Mapillary. Or the development team could die in a plane crash on the way to a conference, or some other such scenario. If the project died for any reason and the data died with it, it could p**s off a lot of contributors who decided to engage with the project because of its brand association with the non-profit OpenStreetMap. Especially as they may not have backed up their data, assuming it was in safe hands.

Regardless of methodology (e.g. surveying vs armchair mapping), I think, philosophically, the reason people like to contribute to projects like OSM and Wikipedia is a desire to make information more widely and freely available; would you not agree? If I am correct in my assumption, then it seems like a no-brainer that such a project should seek to maximise those goals, because doing so would improve adoption and potentially open up unknown avenues for people with similar open-data aspirations outside the OSM project, e.g. Wikimedia Commons.

This is a fairly weird question. What is the value of recent street and path photography to OSM? We create maps by using it. It is usually not possible for a given individual to cover whole countries, so we habitually use remote sensing, including datasets acquired by others (who, in turn, do not intend to do the mapping themselves). We may use it for wholesale armchair mapping, or more often to fix up already-surveyed data: missing surfaces, missing house numbers, identifying objects on satellite imagery, etc.

The whole point of such services (like the discontinued Panoramio, Mapillary, or the OSC in question) is to have a central, georeferenced, searchable and retrievable collection of imagery.

You may upload your imagery wherever you want, but on its own it’s useless to us, since we cannot possibly download a planet’s worth of image collections and start looking through them one by one. Even using tools to geosearch can be challenging (I am using digiKam to filter a 100,000+ image collection and it requires a nice chunk of RAM to get it done; I don’t know what would happen if I tried to filter the few tens of millions of images uploaded to Mapillary in just the surrounding area).
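The geosearch being described boils down to filtering geotagged photos by a bounding box. A minimal sketch of that operation, with an illustrative record type rather than any real service’s schema:

```python
# Minimal sketch of the geosearch a street-imagery service has to do:
# filter geotagged photos by a bounding box. The Photo record and field
# names are illustrative, not any real service's data model.

from typing import NamedTuple

class Photo(NamedTuple):
    path: str
    lat: float   # WGS84 latitude
    lon: float   # WGS84 longitude

def in_bbox(p: Photo, south: float, west: float,
            north: float, east: float) -> bool:
    """True if the photo's coordinates fall inside the bounding box."""
    return south <= p.lat <= north and west <= p.lon <= east

photos = [
    Photo("img_0001.jpg", 47.4979, 19.0402),  # Budapest
    Photo("img_0002.jpg", 52.5200, 13.4050),  # Berlin
]

# Everything within roughly the Budapest area:
hits = [p for p in photos if in_bbox(p, 47.3, 18.9, 47.6, 19.2)]
print([p.path for p in hits])  # → ['img_0001.jpg']
```

A linear scan like this is exactly what does not scale to tens of millions of images, which is why services keep a spatial index rather than scanning files.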

Mapillary offers both legal and technical means to retrieve images inside a bounding box, and I would say it works pretty well. Lots of their code is open. Their issue handling is pretty good; bugs usually get fixed fast. There is a useful JOSM plugin, too.

OSC, well, I do not yet have much experience with it; the code seems quite simple so far and there is no API. The Android client seems to be mostly open source, but nothing more, really, just some small tools. It does have a JOSM plugin as well. But OSC does not seem to provide any technical means to retrieve imagery en masse.

While the idea of having a collection of street imagery available not only for mapping, but also to write your own Mapillary-“killer”, is appealing, I wonder whether that should be done under the OpenStreetMap brand or umbrella. Wikimedia also splits its data into different projects like Commons, Wikivoyage, Wikidata, etc.

When I wrote “What would be the value of such an image collection to OSM?”, I meant: why should OSM or OSMF release such a dataset? I think it is better to have a separate organisation doing this. Let OSM/OSMF focus on map data, not on street imagery.
OSM/OSMF does not deal with aerial imagery, navigation apps, etc.

P.S. Any of these image collections should protect the privacy of their submitters better, by offering the option to hide your name from the pictures you contributed.

I may be wrong, but I believe Escada meant to ask, “What is the value of replicating the database of Mapillary/OSC?”

Ah, I see what you mean now. I think there is a sort of existential difficulty in defining what counts as ‘map data’… in some senses an image could be seen as embryonic map data in the same way that a GPS trace is. Though I can understand if OSM would not want to concern itself with non-geometric sources of map data.

It’s a fair point about the scale of the data and the ability to process it. Considering the power of the new JavaScript APIs (Video, WebGL, Canvas), I wonder if processing could be shifted from servers to web clients (view this example in Firefox); metadata could be isolated from the video source, and videos could be uploaded anywhere.

E.g. infrastructure for a decentralised project could be something like this:

• OSM could host geocoded & timecoded metadata for dashcam recordings using a standard like https://w3c.github.io/webvtt/
• Dashcam recordings could be uploaded to some service like YouTube or WikiMedia Commons.
• Coders could create web tools to process the above for specific feature data (e.g. “traffic island”); these could be hosted on the OSM site, github.io or wherever.
• Users could process areas of their interest.
• Man-in-the-middle or customised app clients could potentially be used to complement rather than compete with OSC and Mapillary (e.g. an upload goes to two places instead of one).
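The first bullet could be sketched concretely: WebVTT allows “metadata” cues, so geocoded and timecoded GPS fixes could be serialised as one JSON payload per cue. WebVTT itself only defines the timing syntax; the JSON payload schema below is entirely hypothetical.

```python
# Hypothetical sketch: geocoded + timecoded dashcam metadata serialised
# as WebVTT metadata cues, one JSON payload per GPS fix. The payload
# schema ({"lat": ..., "lon": ...}) is made up for illustration.

import json

def vtt_timestamp(seconds: float) -> str:
    """Format seconds as a WebVTT HH:MM:SS.mmm timestamp."""
    h, rem = divmod(seconds, 3600)
    m, s = divmod(rem, 60)
    return f"{int(h):02d}:{int(m):02d}:{s:06.3f}"

def to_webvtt(fixes: list) -> str:
    """fixes: [{'t': seconds_into_video, 'lat': ..., 'lon': ...}, ...]"""
    lines = ["WEBVTT", ""]
    for fix in fixes:
        start, end = fix["t"], fix["t"] + 1.0  # assume one fix per second
        lines.append(f"{vtt_timestamp(start)} --> {vtt_timestamp(end)}")
        lines.append(json.dumps({"lat": fix["lat"], "lon": fix["lon"]}))
        lines.append("")
    return "\n".join(lines)

track = to_webvtt([
    {"t": 0.0, "lat": 51.5074, "lon": -0.1278},
    {"t": 1.0, "lat": 51.5075, "lon": -0.1279},
])
print(track)
```

A tool consuming the video could then read the cue active at any playback time and know where the camera was, without the video host knowing anything about the metadata.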

As a general trend in the information age, the ability to process and store large amounts of data is becoming less problematic over time, but commercial ‘gatekeeping’ is becoming more problematic. Things like API limits and IP laws can inhibit the development of FOSS projects. Also, sometimes large companies will buy out competitors for the sake of eliminating competition; it worries me that a lot of people could upload a lot of data and then one day everyone may have to start again, because it was only stored in one place and that place was taken offline or paywalled.

Oh! I sense some serious terminology problem here. You meant OSMF instead of OSM then.

So the original question was whether the OSM community wants to mirror either or both of the services. To that I’d say that, from a longevity standpoint, external mirrors are definitely a bonus, provided both the licenses and the technical possibilities allow it. That may be the case for Mapillary; so far OSC does not seem to allow such activity even at a theoretical level. By my guess the storage requirement for Mapillary is currently around 590 TB, so it’s not a trivial task to accomplish [approximately $100,000 - $150,000 to do it at home {geeky details: I’d use Supermicro NR X10 4U72 Ceph Data N 432TB times 3}, plus electricity bills, which is definitely doable for a larger community]. Basically I’d say partial local mirrors could be created, but I wouldn’t hold my breath.

As for doing it under OSMF, I’d say no. This is a fairly different project, with different goals and requirements. I would be glad if either Mapillary or Telenav earned huge amounts of money by using the imagery, provided in exchange they made it as simple and open as possible to retrieve that imagery. I am glad to help them get rich if they provide the service for it.

So, generally, I would say we should rather convince these companies to keep it open and cooperative than replicate them: they have a stake in making the service better to get more images.

As a safety net, I would really prefer some assurance that, in case of closure, bankruptcy or boredom, they would guarantee the community the possibility to retrieve the imagery before it gets annihilated. That’s not a simple legal task, though.

For images there’s not much processing required, and video indexing can be done at client side right before uploading.

The biggest problem I see is in, as always, the safety and longevity of storage. Storing on YouTube means absolutely no assurance of anything: they may delete it without even thinking about it. Commons isn’t fit for massive amounts of WP-unrelated content. So, basically, the first step is to get a reasonably safe storage to point to.

You are right that separate metadata may be manageable… to some extent. For 150 million images (Mapillary is around that), it’d need around 32 bytes of basic indexing per image, which is 4.8 GB by itself. That is not for users, though; any server can handle lookups in no time. Image retrieval may not be trivial either, considering the average image size (for my uploads at least) is around 4 MB per image; thumbnails either require massive server processing power or a definite amount of additional storage. And for videos it’s even larger.
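The arithmetic behind these figures can be sanity-checked in a few lines (using decimal units; the 150-million-image count and per-image sizes are the estimates from the discussion, not measured values):

```python
# Back-of-envelope check of the estimates above: 150 million images,
# ~32 bytes of index per image, ~4 MB per image on average.

IMAGES = 150_000_000
INDEX_BYTES_PER_IMAGE = 32
AVG_IMAGE_BYTES = 4 * 10**6  # ~4 MB, decimal units

index_gb = IMAGES * INDEX_BYTES_PER_IMAGE / 10**9
total_tb = IMAGES * AVG_IMAGE_BYTES / 10**12

print(f"index:  {index_gb:.1f} GB")  # → index:  4.8 GB
print(f"images: {total_tb:.0f} TB")  # → images: 600 TB
```

The ~600 TB result is in the same ballpark as the ~590 TB mirror estimate quoted earlier in the thread, which is reassuring for a rough calculation.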

MITM upload and download is really simple provided all the legal and technical obstacles are removed. We’re not quite there yet…

Definitely an acute problem, especially since both companies acquire rights well beyond CC BY-SA. We would need assurances that they’re not allowed to take away the content without providing the means to freely access and retrieve it.

That’s interesting to know: not as bad as one might think, but still non-trivial. I also thought about some sort of BitTorrent-based network of seedboxes, but that would bring complications of its own and probably raise barriers to participation. I think there is potential to reduce storage requirements by using the quite beautiful BPG compression algorithm, which doesn’t take up the space of PNG and doesn’t create the same sort of potentially troublesome compression artifacts that JPEG does.

Yeah, it does seem any project would be a non-starter without storage. If the goal of the project were only CV for attribute data rather than street photography, ephemeral uploads might not be so bad. At least if the metadata were centralised, any interested party would be able to rip videos from YT (albeit probably not legally) as and when they are uploaded.

Yeah, exactly. I would think a gentleman’s agreement would be fine if there exists a relationship of trust… that’s partly why I was wondering what community relations were like. There’s certainly not been a tsunami of bitterness from the forum members, but then there haven’t been overwhelming indications of support either.

If no one from Telenav chimes in, I might e-mail them to ask if we can have up-to-date code repos. If the repos were up to date, it would make it easier to hack in things like multiple upload destinations. Alternatively, I could look at what Mapillary has opened up and see if there is something to work with there.

You do realize that Mapbox is also collaborating with the OSM foundation quite closely? For example, the default iD editor was developed (mainly?) by Mapbox, and I believe a large chunk of donations to the OSMF is from Mapbox.

:roll_eyes:

On one hand, I think that this is far enough outside the prime mission of OSMF that it should probably not be investing large sums in it. On the other hand, unless there is a rock-solid contract giving some sort of escrow access, if the business gets into difficulties, its financial advisers will tell it to try to make money out of all of its assets. This has been evident with software patents, where companies in difficulty suddenly start enforcing their patents, including some quite widely used algorithms.

In terms of the discussion of automated mapping from the images, I think anything that makes the on-the-ground mapper redundant is likely to lose OSM a lot of its core supporters, leaving just those who want a map without paying for it. A purely commercial business might well do that, but a community, crowd-sourced project really should be loyal to the people that made it possible.

Sorry, I don’t understand the comparison; their repos are up to date and the ‘supply chain’ of data does not create a gatekeeper out of them.

IP law strikes again.

I don’t see it making the ground mapper redundant; I just see it as a way of automating the tedium. I could go out in my town and measure the width of every road or write down every house number, but it would take me hours. Give me a tool to automate it, though, and I would gladly start taking suburban routes around town just to gather more data. Similarly, I would be happy to code a game or something which allows people to do reCAPTCHA-like training. Richer data would allow for more beautiful cartography.

Strangely enough, walking around recording house numbers and verifying street names is often my daily exercise routine. :slight_smile:

My gripe with OpenStreetCam is that their app and upload mechanism just hasn’t worked on either my old or my new phone. After lots of tries I have basically given up trying to make it work.

Also, the reason that reCAPTCHA uses street-number images is that they are difficult for machines to read. It wouldn’t surprise me if what it is actually doing is throwing in the occasional unrecognised number and learning how people recognise it. People expect to get a proportion of these images wrong, and maybe it even trusts some low-risk users to do the initial recognition. (There may also be an element of Google having vastly better image recognition than organised crime, or other enterprises, such as OSM.)

In the UK, as well as the large variety of presentation of the numbers, in my experience only about one in three houses still has a number on display.

I’d certainly agree that some people would find cataloguing street numbers much more interesting than blindly taking geotagged photographs.

And at the same time you could map parks, playgrounds, benches, one way streets, waste bins, small paths, new developments, etc. etc.
:slight_smile:

Yes, you are quite correct there, but even fragmented data is useful because in some cases, e.g. terraced housing, the rest can be interpolated (useful for geocoding, for instance). I am just throwing house numbers out there as an example, but there are heaps of features which could be recognised using CV to improve attribute data, thus cartography, thus the map. E.g. whether a tree is coniferous or deciduous.
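The interpolation point is the same idea OSM already expresses with the `addr:interpolation` tag: survey the numbers at the two ends of a terrace and fill in the likely ones between. A minimal illustration (the function is just a sketch, not OSM tooling):

```python
# Sketch of house-number interpolation: given the surveyed numbers at
# the two ends of a terrace, fill in the likely numbers in between.
# This mirrors what OSM's addr:interpolation tag expresses ('all',
# 'even', 'odd'); the function itself is illustrative only.

def interpolate_house_numbers(first: int, last: int,
                              scheme: str = "all") -> list:
    """scheme: 'all' (step 1), or 'even'/'odd' (step 2)."""
    step = 1 if scheme == "all" else 2
    return list(range(first, last + 1, step))

# A terrace surveyed only at its ends, numbers 2 and 12, even side:
print(interpolate_house_numbers(2, 12, "even"))  # → [2, 4, 6, 8, 10, 12]
```

So even if CV only picks out a fraction of the visible numbers, the gaps can often be filled well enough for geocoding.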

I am not sure I understand?

When you are out collecting data about house numbers, why don’t you collect data about all the other items as well?

Sorry, I am still struggling to understand… I was trying to suggest that automating data collection could both increase the efficiency of volunteered labour and lower the barriers to participation.