The syntax is standard JSON and your sample appears fine. I should add a blank template to make this easier.
The script checks all matches. If something matches both a variant and a brand, variant wins. If something matches two unrelated branches or two variants, errors are thrown and my manual intervention is required.
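For illustration, the precedence rule above could be sketched roughly like this. This is not the bot's actual code: `Template`, `AmbiguousMatch` and `resolve` are made-up names, and it assumes each variant records the name of its parent brand.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Template:
    """Hypothetical template record; `parent` is set only for variants."""
    name: str
    parent: Optional[str] = None


class AmbiguousMatch(Exception):
    """Raised when the matches cannot be reduced to a single template."""


def resolve(matches: List[Template]) -> Optional[Template]:
    """Pick the single template to apply, or raise if a human must decide."""
    if not matches:
        return None
    # A brand is dropped when one of its own variants also matched:
    # the variant wins.
    matched_parents = {m.parent for m in matches if m.parent is not None}
    remaining = [m for m in matches if m.name not in matched_parents]
    if len(remaining) > 1:
        # Two unrelated brands, or two variants: manual intervention.
        names = ", ".join(m.name for m in remaining)
        raise AmbiguousMatch(f"needs manual review: {names}")
    return remaining[0]
```

With a brand "Yesh BaShkhuna" and its variant "Yesh Hesed" both matching, `resolve` returns the variant; two variants matching at once raise `AmbiguousMatch`.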
Matching is completely case- and whitespace-insensitive. bool and Bool are considered the same.
I am considering writing the “matching algorithm” explanation in a more friendly manner.
If a POI’s amenity=* or shop=* does not already equal the matched brand’s amenity or shop tag, then the POI is left alone and a warning is put in a log for me to manually inspect.
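As a rough sketch of that safety check (one possible reading of the rule, not the bot's real code; `category_agrees` and the logger name are made up): for each of amenity and shop that the template defines, the POI must already carry the same value, otherwise it is skipped and logged.

```python
import logging
from typing import Dict

log = logging.getLogger("brand_bot")  # hypothetical logger name


def category_agrees(poi_tags: Dict[str, str], brand_tags: Dict[str, str]) -> bool:
    """Return True only if the POI already has the template's amenity/shop value."""
    for key in ("amenity", "shop"):
        expected = brand_tags.get(key)
        if expected is not None and poi_tags.get(key) != expected:
            # Leave the POI untouched and record it for manual inspection.
            log.warning("skipping POI: %s=%r, template expects %r",
                        key, poi_tags.get(key), expected)
            return False
    return True
```

A POI tagged amenity=clinic that matches a pharmacy template would be skipped and logged rather than retagged.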
I put considerable thought into these edge cases, but the documentation is probably a bit tricky to read as of now.
Russian names are based on my personal transcription rules, which may not be optimal.
Shufersal Big - according to news articles, this variant was canceled and converted to “Deal”, so I added “Big” as alt_find for “Deal”.
Tiv Ta’am “in the city” - on their website, the branches in Hebrew are named with only a “סיטי” suffix, so that’s what I used.
Yesh - it appears (from their website, I have no personal knowledge) that the brand is Yesh BaShkhuna. It has a variant - Yesh Hesed. There doesn’t seem to be a plain “Yesh”.
Eden Teva Market - on their website, the branding is “Eden Teva” only, without the “market”, so that’s what I used.
When searching for “Mister Zol”, I came across multiple news articles that it was sold to and became “Coop Shop”, so I added it as alt_find.
Some of your alt_finds are redundant: e.g. “SuperPharm”, “Super Pharm”. There is a normalization for template values, which ignores upper/lower case, spaces, and dashes.
I just added all the spelling/formatting variations that I saw being used, I wasn’t aware that spaces and dashes are ignored. Redundant is better than missing, right?
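To show why such alt_finds collapse into one, here is a minimal sketch of a normalization that ignores case, whitespace and dashes. It is only an assumption about how the bot might do it: `normalize` is a made-up name, and only the ASCII hyphen is treated as a dash here.

```python
import re


def normalize(value: str) -> str:
    """Strip whitespace and ASCII hyphens, then fold case for comparison."""
    return re.sub(r"[\s\-]+", "", value).casefold()


# All three spellings compare equal after normalization.
assert normalize("SuperPharm") == normalize("Super Pharm") == normalize("super-pharm")
```

Under this scheme a redundant alt_find is harmless, it just never adds a new match.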
Excellent job, @tdctdctdc. I tried a test run and it seems to work well, and I can see no major problems. Would you like to make further changes before I run this?
I wouldn’t let the bot automatically assume those are pharmacies. Some of them are probably not. Perhaps we should drop the Makkabi and Clalit pharmacies for now.
If you think it’s best. I agree about Clalit - since they don’t have a separate brand for their pharmacies, there is a potential for errors. As for Maccabi - I think if the POI is named “Maccabi Pharm” - it’s very likely it’s really the pharmacy and not the clinic.
I think the algorithm needs several improvements and some simplification before further runs. It’s a bit bulky, and it cannot handle certain cases (e.g. AM:PM is either marketplace or convenience, and Clalit can be a clinic or a pharmacy).
You could exclude problematic cases (like Clalit and AM:PM) completely, or perhaps apply only name changes and not touch the existing value of amenity=*.
There are more issues, e.g. the bot insists on modifying names that shouldn’t be modified, and I have to intervene manually. And the code is hard to follow.
Since brand editing volume is low, a human can manually track them. I’m considering a simpler CSV scheme, where the bot generates a CSV with changes and with suggested default values according to templates, and the humans decide which ones to copy. The decisions are saved in the CSV and fed back to the bot.
This is just one of several ideas I’m brainstorming. I might just fix the code.
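A rough sketch of what the CSV round trip could look like, using Python's standard csv module. The column layout and the function names are hypothetical, nothing here is decided: the bot writes suggestions with an empty decision column, a human copies the values they accept into it, and the bot reads back only the rows that have a decision.

```python
import csv
from typing import Dict, Iterable, List, TextIO

# Hypothetical column layout; "decision" is filled in by a human.
FIELDS = ["osm_id", "key", "current", "suggested", "decision"]


def write_suggestions(rows: Iterable[Dict[str, str]], out: TextIO) -> None:
    """Write the bot's proposed changes, leaving the decision column blank."""
    writer = csv.DictWriter(out, fieldnames=FIELDS)
    writer.writeheader()
    for row in rows:
        writer.writerow({**row, "decision": ""})


def read_approved(src: TextIO) -> List[Dict[str, str]]:
    """Return only the rows where a human filled in a decision."""
    return [row for row in csv.DictReader(src) if row["decision"].strip()]
```

Rows with an empty decision are ignored on the way back, so doing nothing is the safe default.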