ISO Codes API for KDE Frameworks
As mentioned in a previous post, I’m looking into collecting, extending and unifying the various APIs we have for dealing with countries, country subdivisions, timezones, languages, etc. in a single library in KDE Frameworks. While a lot of this is still work in progress, at least some features are ready for a closer look.
What is this about?
A number of our applications rely on knowledge about geospatial features (i.e. properties of a location). In some cases that is fairly obvious, like KDE Itinerary needing to know the timezone of your travel destination to accurately show times. More often the need is indirect though, e.g. the initial device setup suggesting the most likely language and timezone, so the user doesn’t have to search through lists with possibly hundreds of entries. And of course we want to have all this properly translated.
Qt provides some of this via QLocale, and in the KDE 4 era there were additional features in KLocale and its associated classes in kdelibs. On top of that, various libraries and applications carry their own code for this.
There’s a KF6 Phabricator task for the goals and requirements, and there’s now a GitLab work branch with the ongoing work, aiming at integration in the KI18n framework. A lot of this isn’t even new code but merely another iteration of things that already exist in other Frameworks, Plasma or applications.
I’ll try to present the features in there in a few blog posts, as they become ready for testing.
Features
The first set of features is about translating various codes used to identify countries (ISO 3166-1) or country subdivisions (ISO 3166-2), and about obtaining the translated name for a country or country subdivision. Bits of this are currently found in KContacts (ISO 3166-1 alpha-2 -> name), KItinerary (ISO 3166-1 alpha-3 -> ISO 3166-1 alpha-2) and KHolidays (translated country and subdivision names). There are also half a dozen sets of translations for country names across our translation catalogs, coming from various applications.
In the new API this is provided by the KCountry and KCountrySubdivision classes. Those types are very lightweight (2 and 4 bytes respectively, and allocation free) and introspectable by Qt’s property system for use in QML. Ultimately those types just represent ISO 3166-1/2 codes; all their functionality is implemented as lookups into internal data tables.
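To illustrate how a country type can fit into 2 bytes without allocating, here is a minimal sketch that packs an ISO 3166-1 alpha-2 code into a 16-bit integer. The `CountryCode` class and its members are hypothetical names made up for this example; this is not the actual KCountry implementation, and "valid" here only means "two ASCII letters", not "an assigned ISO code".

```cpp
#include <cstdint>
#include <string>
#include <string_view>

// Hypothetical stand-in for a 2-byte country type; not the actual KCountry class.
class CountryCode {
public:
    constexpr CountryCode() = default;

    // Pack an alpha-2 code such as "NZ" into 16 bits (case-insensitive).
    // Anything that is not exactly two ASCII letters yields the invalid value 0.
    static constexpr CountryCode fromAlpha2(std::string_view code) {
        CountryCode c;
        if (code.size() == 2 && isAsciiLetter(code[0]) && isAsciiLetter(code[1])) {
            c.m_code = static_cast<std::uint16_t>((toUpper(code[0]) << 8) | toUpper(code[1]));
        }
        return c;
    }

    constexpr bool isValid() const { return m_code != 0; }

    // Unpack the two letters again; empty string for the invalid value.
    std::string alpha2() const {
        if (!isValid()) {
            return {};
        }
        return std::string{static_cast<char>(m_code >> 8), static_cast<char>(m_code & 0xff)};
    }

    friend constexpr bool operator==(CountryCode a, CountryCode b) { return a.m_code == b.m_code; }
    friend constexpr bool operator<(CountryCode a, CountryCode b) { return a.m_code < b.m_code; }

private:
    static constexpr bool isAsciiLetter(char ch) {
        return (ch >= 'A' && ch <= 'Z') || (ch >= 'a' && ch <= 'z');
    }
    static constexpr char toUpper(char ch) {
        return (ch >= 'a' && ch <= 'z') ? static_cast<char>(ch - 'a' + 'A') : ch;
    }

    std::uint16_t m_code = 0; // 0 means "no country"
};

static_assert(sizeof(CountryCode) == 2, "the whole type is just the packed code");
```

Because the value is a plain integer, copies are free and the packed code can directly serve as the sort key for lookups into a data table.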
Besides the mapping to and from ISO codes and getting translated names, there’s also mapping to and from the QLocale::Country enum, to provide interoperability with QLocale, something that Qt has built in but unfortunately doesn’t expose in its API.
A new feature that none of the existing APIs provided is access to country and country subdivision hierarchies. That is, listing the subdivisions of a country (or of another subdivision, for countries where there is more than one level), and vice versa, finding the parent of a subdivision. This was originally needed internally for more efficient storage, but it is also useful e.g. for an address editor offering the right subdivisions for a selected country.
Another new method is KCountry::emojiFlag(), which returns the Unicode sequence for the corresponding country flag.
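For reference, Unicode encodes country flags as a pair of regional indicator symbols (U+1F1E6 for "A" through U+1F1FF for "Z"), one per letter of the ISO 3166-1 alpha-2 code. The following is a minimal sketch of that mapping using only the standard library; `emojiFlagFromAlpha2` and `appendUtf8FourBytes` are hypothetical helper names, and this is not necessarily how KCountry::emojiFlag() is implemented.

```cpp
#include <cassert>
#include <string>
#include <string_view>

// Append a code point from the supplementary planes as 4-byte UTF-8.
inline void appendUtf8FourBytes(std::string &out, char32_t cp) {
    assert(cp >= 0x10000 && cp <= 0x10FFFF); // only the 4-byte range is handled here
    out += static_cast<char>(0xF0 | (cp >> 18));
    out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
    out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
    out += static_cast<char>(0x80 | (cp & 0x3F));
}

// Hypothetical free function: map an alpha-2 code to its flag sequence in UTF-8.
// Returns an empty string if the input is not exactly two ASCII letters.
inline std::string emojiFlagFromAlpha2(std::string_view alpha2) {
    if (alpha2.size() != 2) {
        return {};
    }
    std::string out;
    for (char ch : alpha2) {
        if (ch >= 'a' && ch <= 'z') {
            ch = static_cast<char>(ch - 'a' + 'A');
        }
        if (ch < 'A' || ch > 'Z') {
            return {};
        }
        // U+1F1E6 is REGIONAL INDICATOR SYMBOL LETTER A.
        appendUtf8FourBytes(out, 0x1F1E6 + static_cast<char32_t>(ch - 'A'));
    }
    return out;
}
```

Calling `emojiFlagFromAlpha2("DE")` yields the code points U+1F1E9 U+1F1EA. Whether that renders as a flag or as two boxed letters depends on the font in use.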
Data Source
Fortunately there is an existing set of data that provides all of what is needed for the above, and that we even already depend on in a few places: iso-codes. Technically this consists of a number of JSON files containing the ISO codes, subdivision hierarchies and their corresponding English text, and Gettext catalogs with the translations of those.
So to get started, all we need to do is read those JSON files and put their content in a few maps.
Performance
A naive implementation like this comes at a price, however. For just the country data it costs about 125kB of non-shared heap memory and takes about 100ms to load. That might not seem like a lot, but it multiplies with every application needing this, and it doesn’t even cover the country subdivision data yet (which has about 20x more entries).
At the same time, this is code which is used on performance-sensitive paths, such as locale aware sorting in a (larger) model.
This is not a new problem though. KContacts, for example, solved it by compiling in an optimized representation of the iso-codes data. That is only about 5kB in size, requires no allocations and, because it ends up in the read-only section of the library, is automatically shared between all applications using it.
This approach was rightfully criticised for its higher maintenance and the risk of the library and the installed iso-codes translation catalogs going out of sync. So I tried a different solution here.
On first use we now create a compact binary lookup table out of the iso-codes JSON files, which then is mmap’ed as shared/read-only data. If the iso-codes data changes, the cache files are rebuilt. Ignoring the first run, this achieves the same performance characteristics as KContacts.
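To make the idea of a compact lookup table more concrete, here is a minimal sketch of the lookup side: fixed-size records sorted by key, searched with a binary search over a plain pointer and count, which is the shape the data would have once a file is mapped into memory. The record layout (`Entry`), the helper names and the demo data are assumptions for illustration only; the actual cache format is not described here, and a `constexpr` array stands in for the mmap’ed file.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical fixed-size record; the real cache layout may look entirely different.
struct Entry {
    std::uint16_t key;        // packed alpha-2 code, e.g. ('D' << 8) | 'E'
    std::uint16_t nameOffset; // offset of the English name in a separate string table
};

constexpr std::uint16_t packAlpha2(char a, char b) {
    return static_cast<std::uint16_t>((static_cast<unsigned char>(a) << 8)
                                      | static_cast<unsigned char>(b));
}

// Binary search over records sorted by key; no allocation, no parsing.
// Returns nullptr if the key is not in the table.
constexpr const Entry *findEntry(const Entry *begin, std::size_t count, std::uint16_t key) {
    std::size_t lo = 0;
    std::size_t hi = count;
    while (lo < hi) {
        const std::size_t mid = lo + (hi - lo) / 2;
        if (begin[mid].key < key) {
            lo = mid + 1;
        } else {
            hi = mid;
        }
    }
    return (lo < count && begin[lo].key == key) ? begin + lo : nullptr;
}

// Stand-ins for the mapped file content: a string table and a sorted record table.
constexpr char demoNames[] = "Austria\0Germany\0New Zealand";
constexpr Entry demoTable[] = {
    {packAlpha2('A', 'T'), 0},
    {packAlpha2('D', 'E'), 8},
    {packAlpha2('N', 'Z'), 16},
};
constexpr std::size_t demoCount = sizeof(demoTable) / sizeof(demoTable[0]);
```

With this layout, `demoNames + findEntry(demoTable, demoCount, packAlpha2('D', 'E'))->nameOffset` points at the untranslated name, which could then be passed on to the translation catalog lookup.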
We could also optimize this further for fully bundled applications (AppImage, APK, etc.) and allow those to ship pre-generated cache files rather than the JSON source files, as we fully control the updates of the iso-codes data there.
Outlook
Feedback for this is very welcome, on the implementation but also regarding use-cases and requirements you have in your application. Check the corresponding Phabricator task and the GitLab branch for this, or find me in the #kde-devel channel on Matrix, the weekly KF6 meetings Saturday 15:00 CEST or the kde-frameworks-devel mailing list.
The next part is probably going to be about querying the timezone for a country or country subdivision.