OSM Hack Weekend October 2023

Just two weeks after the OSM Hack Weekend in Karlsruhe, I found myself at yet another OSM Hack Weekend, this time in Berlin, hosted by Wikimedia Germany. For me the focus was more on infrastructure and integration/interoperability topics this time, rather than on features immediately visible in KDE Itinerary.

Introduction round at the OSM hack weekend.

OSM editor integration

When working on the indoor map renderer for KDE Itinerary it’s often useful to open a specific element in an OSM editor, for inspecting certain details or for fixing issues in the data.

That’s not particularly complicated, but it’s a multi-step process that requires you to manually find the right element in the editor again. Not anymore though: we can now open the currently selected element and the currently visible area directly in an OSM editor.

Edit actions for a selected map element: the details sheet in Itinerary, showing buttons for editing in iD and JOSM.

While originally meant mostly as a convenience and efficiency improvement for developers, this could also be offered in the regular app for the OSM contributors among our users. Whether that would be on by default or behind a dedicated contributor setting remains to be discussed.

So far this supports:

iD: always and on all platforms, as that just requires opening a website.
JOSM: on desktop platforms. This also requires JOSM to be installed with a .desktop file (e.g. via Flatpak, not just using the raw JAR file) and its remote control support needs to be enabled (Edit > Preferences… > Remote Control > Enable remote control).
Vespucci: on Android.
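For illustration, the iD and JOSM cases mostly boil down to building a URL. The following is a minimal sketch, not Itinerary’s actual code. It assumes that the openstreetmap.org edit page accepts editor=id plus a node/way/relation parameter and a #map=zoom/lat/lon fragment, and that JOSM’s remote control is listening on its default local port 8111 and understands the load_and_zoom command with left/right/top/bottom and select parameters. The helper names id_edit_url and josm_edit_url are hypothetical.

```python
from urllib.parse import urlencode

# Assumption: JOSM remote control listens on its default local port 8111.
JOSM_REMOTE = "http://127.0.0.1:8111"


def id_edit_url(osm_type, osm_id, zoom, lat, lon):
    """Hypothetical helper: URL opening an element in iD via openstreetmap.org.

    osm_type is "node", "way" or "relation"; the #map fragment sets the view.
    """
    return (f"https://www.openstreetmap.org/edit?editor=id&{osm_type}={osm_id}"
            f"#map={zoom}/{lat}/{lon}")


def josm_edit_url(osm_type, osm_id, bbox):
    """Hypothetical helper: URL asking a running JOSM to load an area and select an element.

    bbox is (min_lon, min_lat, max_lon, max_lat) of the currently visible map area.
    """
    min_lon, min_lat, max_lon, max_lat = bbox
    params = {
        "left": min_lon,
        "right": max_lon,
        "top": max_lat,
        "bottom": min_lat,
        "select": f"{osm_type}{osm_id}",  # e.g. "way123"
    }
    return f"{JOSM_REMOTE}/load_and_zoom?{urlencode(params)}"


if __name__ == "__main__":
    print(id_edit_url("way", 123, 19, 52.5, 13.4))
    print(josm_edit_url("way", 123, (13.39, 52.49, 13.41, 52.51)))
```

Opening iD is then just a matter of launching a browser with the first URL, while the second one has to be requested from the locally running JOSM instance, which is why its remote control support needs to be enabled.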

Standalone raw tile downloader and re-assembler

I finally managed to finish the standalone raw data download and reassembly tool I started at the hack weekend in February. Downloading the tiles is fairly straightforward; the complicated part is correctly putting the split-up geometry back together.
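The tool’s actual reassembly logic isn’t described here, but the core idea for a single line can be illustrated with a toy version. This sketch assumes that a way split at tile borders arrives as ordered fragments of node IDs, and that consecutive fragments share exactly one boundary node (the last node of one fragment is the first node of the next). Real tile data is likely messier, so treat the hypothetical stitch_way function as an illustration only.

```python
def stitch_way(fragments):
    """Join ordered node-ID fragments of one way back into a single node list.

    Assumption: adjacent fragments share exactly one boundary node, and each
    fragment keeps the original node order (fragments are never reversed).
    """
    remaining = [list(f) for f in fragments if f]
    if not remaining:
        return []
    # Start with a fragment whose first node doesn't continue any other fragment;
    # for closed ways there is no such fragment, so fall back to the first one.
    ends = {f[-1] for f in remaining}
    start = next((f for f in remaining if f[0] not in ends), remaining[0])
    remaining.remove(start)
    result = list(start)
    while remaining:
        # Find the fragment continuing where the result currently ends.
        nxt = next((f for f in remaining if f[0] == result[-1]), None)
        if nxt is None:
            raise ValueError("fragments do not form a single connected way")
        remaining.remove(nxt)
        result.extend(nxt[1:])  # skip the shared boundary node
    return result


if __name__ == "__main__":
    print(stitch_way([[3, 4, 5], [1, 2, 3], [5, 6]]))
```

Not reversing fragments is deliberate here: for directional lines the original node order carries meaning, so a mismatch is reported instead of silently "fixed".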

We can now produce a fully statically linked binary on the CI, which takes the area to retrieve as an argument (geographic bounding box or tile index) and outputs the result in OSM XML, o5m or OSM PBF format.
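Because the tool accepts either a bounding box or a tile index, a bounding box has to be mapped onto the set of tiles covering it at some point. Assuming the raw data tiles use the common Web Mercator XYZ ("slippy map") tiling scheme, which is an assumption here, with the zoom level left as a parameter, the standard conversion can be sketched like this (function names are hypothetical):

```python
import math


def lonlat_to_tile(lon, lat, zoom):
    """Convert WGS84 coordinates to XYZ (slippy map) tile indices at the given zoom."""
    n = 2 ** zoom
    x = int((lon + 180.0) / 360.0 * n)
    y = int((1.0 - math.asinh(math.tan(math.radians(lat))) / math.pi) / 2.0 * n)
    # Clamp to the valid tile range (e.g. lon == 180 or latitudes beyond ~85.05 degrees).
    return min(max(x, 0), n - 1), min(max(y, 0), n - 1)


def tiles_for_bbox(min_lon, min_lat, max_lon, max_lat, zoom):
    """List all tiles covering a bounding box; tile y indices grow southwards."""
    x0, y0 = lonlat_to_tile(min_lon, max_lat, zoom)  # north-west corner
    x1, y1 = lonlat_to_tile(max_lon, min_lat, zoom)  # south-east corner
    return [(x, y) for x in range(x0, x1 + 1) for y in range(y0, y1 + 1)]


if __name__ == "__main__":
    print(tiles_for_bbox(-1.0, -1.0, 1.0, 1.0, 1))
```

Note that the northern edge of the box determines the smallest y index, since y counts from the north.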

This can then be used to feed the OSM2World 3D viewer and glTF generator, for example.

Raw data tile generator testing

Work on the raw data tiles also uncovered an issue with our tile server, which sometimes loses the original node order of lines and polygons during clipping. While we usually don’t care about that for polygons (there is no defined winding order in the OSM data model), it is crucial for directional lines (e.g. one-way streets, escalators, cliffs/embankments, etc.).

This finally forced me to address one of the most painful aspects of working on the tile server: the lack of automated tests. So far, testing was only possible in the full setup with the continuously updating 850+ GB planet-wide OSM database.

There’s now a work branch with an approach that feeds handcrafted input to the tile creator and runs it without the Tirex environment in a “single shot” mode. This isn’t quite CI-ready yet, as it needs ~1 GB of landcover data to run, but that’s probably solvable as well. And of course this still needs many more test scenarios for various complex clipping cases.
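One invariant such handcrafted test scenarios could check is that clipping keeps the direction of a line. A minimal sketch of that check, assuming ways are represented as plain lists of node IDs; the function name and data representation are hypothetical and not taken from the tile server code:

```python
def preserves_node_order(original_way, clipped_way):
    """Check that clipping kept the direction of an open (non-closed) way.

    Node IDs in clipped_way that are not part of original_way (e.g. synthetic
    nodes a clipper might add at tile borders, an assumption) are ignored; all
    remaining nodes must appear in the same relative order as in the input.
    """
    position = {node_id: index for index, node_id in enumerate(original_way)}
    indices = [position[n] for n in clipped_way if n in position]
    return all(a < b for a, b in zip(indices, indices[1:]))


if __name__ == "__main__":
    one_way_street = [10, 11, 12, 13]
    print(preserves_node_order(one_way_street, [11, 12, 13]))  # direction kept
    print(preserves_node_order(one_way_street, [13, 12, 11]))  # direction reversed
```

Closed ways are deliberately out of scope here, matching the point above that polygon winding order doesn’t matter in the OSM data model.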

Nevertheless, this already made it much more comfortable and efficient to identify the bug at hand and to test possible solutions.

Statistics on set-like tag values

Taginfo is an invaluable tool for checking which tag values appear in the OSM data. However, it doesn’t support set-like values, i.e. multiple values in one tag separated by semicolons. For Itinerary’s indoor map this matters when creating translation tables, e.g. for the type of food served at restaurants (the cuisine tag).

A discussion about that at the previous hack weekend in Karlsruhe made me aware that Taginfo has an API, and that using it is still significantly faster than extracting tag values from a full planet dump locally.

The result is a small script that produces the needed statistics for a set-like tag. Use it with extreme care though: despite the aggressive rate limiting in it, it might still produce an unreasonable number of API calls against Taginfo. This is mostly a stop-gap solution until Taginfo gains support for set-like tags itself.
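The script itself isn’t reproduced here, but the general approach can be sketched as follows. This is not the original script. It assumes that Taginfo’s /api/4/key/values endpoint accepts key, page and rp (results per page) parameters and returns JSON with a "data" list whose entries have "value" and "count" fields. The page limit, the delay and the meaning of "tolerant" splitting (lowercasing and also splitting on commas) are assumptions as well.

```python
import json
import re
import time
import urllib.parse
import urllib.request
from collections import Counter

# Taginfo API endpoint listing the values of a key (assumed page/rp pagination).
TAGINFO_VALUES = "https://taginfo.openstreetmap.org/api/4/key/values"


def fetch_tag_values(key, max_pages=3, delay_s=2.0):
    """Yield (value, count) pairs for an OSM key, one API page at a time.

    max_pages and delay_s are hypothetical knobs to keep the load on Taginfo low.
    """
    for page in range(1, max_pages + 1):
        query = urllib.parse.urlencode({"key": key, "page": page, "rp": 100})
        with urllib.request.urlopen(f"{TAGINFO_VALUES}?{query}") as response:
            entries = json.load(response)["data"]
        if not entries:
            break
        for entry in entries:
            yield entry["value"], entry["count"]
        time.sleep(delay_s)  # crude rate limiting between requests


def split_set_values(pairs, tolerant=False):
    """Sum usage counts per individual value of a set-like tag.

    Strict mode splits on ";" only. Tolerant mode additionally lowercases and
    splits on "," (an assumption about what "more tolerant" could mean).
    """
    totals = Counter()
    for value, count in pairs:
        raw = re.split(r"[;,]", value.lower()) if tolerant else value.split(";")
        parts = {p.strip() for p in raw}  # count each value once per tag value
        parts.discard("")
        for part in parts:
            totals[part] += count
    return totals
```

Something like split_set_values(fetch_tag_values("cuisine")) would then give per-value totals for the fetched pages, and the length of the result the number of distinct values in sets. Keeping the network access and the aggregation in separate functions allows testing the splitting logic without hitting the API at all.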

For the cuisine tag this shows 15k distinct values in sets, compared to 62k distinct tag values. With more normalization and more tolerant splitting it’s even just 12k. The occurrence distribution doesn’t fundamentally change though: there is a small number of common values and a large number of rarely used ones.

Occurrences	Distinct values (strict splitting)	Distinct values (tolerant splitting)
100000+	4	4
10000+	32	33
1000+	94	94
100+	242	247
10+	876	906
2+	4090	3882
1+	14966	12666
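The rows of the table are cumulative: each one counts the distinct values used at least that many times, which is why the last row equals the total number of distinct values. Given a mapping from individual value to usage count, the rows can be derived like this (a sketch, not the original script; the function name is hypothetical):

```python
from collections import Counter

# Occurrence thresholds as used in the table above.
THRESHOLDS = (100000, 10000, 1000, 100, 10, 2, 1)


def occurrence_table(totals, thresholds=THRESHOLDS):
    """Return (threshold, number of distinct values used at least that often) rows."""
    return [(t, sum(1 for c in totals.values() if c >= t)) for t in thresholds]


if __name__ == "__main__":
    example = Counter({"pizza": 150, "burger": 12, "typo_value": 1})
    for threshold, n in occurrence_table(example, (100, 10, 1)):
        print(f"{threshold}+\t{n}")
```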

Anything occurring more than a few hundred times is interesting for translations, while most values at the other end of the scale are probably mistakes in the data (translated values, typos, syntax errors, overly specific values, values referring to something else or used in the wrong tag, etc.).

In practice this filled a few gaps in our translation table and has already resulted in a few typo fixes in the OSM data. Also, according to this data, pizza is the world’s most widespread food, by quite a margin.

Exchange with other projects

Hack weekends are very productive, as you can draw on the expertise of everyone around. Equally valuable, though, is the exchange with people working on other projects, as that often reveals overlap or room for collaboration even when working on quite different topics.

The Straßenraumkarte implements high-detail rendering of urban environments, particularly around road and cycling infrastructure. This also contains elements highly relevant for Itinerary, such as the exact placement of bus stops and how to get to them in the case of complex multi-lane roads/crossings.
The LevelOut project deals with extracting OSM indoor building geometry from BIM data, which exists for most reasonably recent buildings. That could be a great help with getting more public buildings mapped on the inside as well.
There are people doing micro-climate simulations for urban environments, and it turns out the data needed for this overlaps with what is also needed for a 3D display of buildings (surface material and color, amount of windows, etc).

You can help!

All this needs infrastructure that doesn’t come for free, be it server hosting or meeting rooms. That’s where your donations help, and all involved organizations here run on donations:

Thank you :)