Countering string bloat (addendum)

As last week’s post on countering string bloat has triggered some interest (and a few misunderstandings), here are a few more details on that topic. This still isn’t going to be a comprehensive discussion of string handling in Qt (and was never meant to be); there are plenty of posts and talks on that subject already out there if you want to dig deeper.

When to use QLatin1String

The KWeatherCore example in the last post was mainly discussing the use of QLatin1String in combination with Qt’s JSON API, as that has corresponding overloads. And that is a very important point: preferring QLatin1String over QStringLiteral for constant strings is generally only advisable if such overloads exist.

Things will work either way of course (as long as we are only dealing with 7-bit ASCII strings), but at a cost. QLatin1String is implicitly convertible to a QString, which involves a runtime memory allocation of twice its size and a text codec conversion to UTF-16. QStringLiteral exists to avoid precisely that, and saving that runtime cost is generally preferable to saving a few bytes of storage.
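To make that cost a bit more tangible, here is a rough sketch in plain standard C++ of what such a conversion involves. This is not Qt’s implementation, and the names widenLatin1 and utf16ByteSize are made up for illustration; it only shows that every input byte turns into a two-byte code unit in a newly created buffer.

```cpp
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical stand-in for the Latin-1 to UTF-16 conversion that has to
// happen when a Latin-1 literal is turned into a UTF-16 string at runtime.
// Not Qt code, just the same idea expressed with the standard library.
std::u16string widenLatin1(std::string_view latin1)
{
    std::u16string utf16;
    // Room for one 2-byte code unit per input byte. For anything but very
    // short strings this typically means a heap allocation.
    utf16.reserve(latin1.size());
    for (const char c : latin1) {
        // Each Latin-1 byte maps to the Unicode code point of the same value.
        utf16.push_back(static_cast<char16_t>(static_cast<unsigned char>(c)));
    }
    return utf16;
}

// Payload size in bytes of the converted string (without terminator).
std::size_t utf16ByteSize(const std::u16string &s)
{
    return s.size() * sizeof(char16_t);
}
```

For an 11 character key like "temperature" this produces 22 bytes of UTF-16 data on every conversion, which is the work a UTF-16 literal stored in the binary avoids.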

So this is always a case-by-case decision. QLatin1String overloads only exist in places where they can actually be implemented more efficiently, and it only makes sense to add them in those cases.
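String comparison is probably the easiest case to see why a dedicated overload can be cheaper. The following is a minimal sketch in standard C++, assuming a hypothetical helper named equalsLatin1 (not a Qt function): it compares UTF-16 text against a Latin-1 literal directly, widening one byte at a time instead of first creating a temporary UTF-16 copy.

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical helper, not a Qt API: compares UTF-16 text with a Latin-1
// literal code unit by code unit, without any temporary UTF-16 string.
bool equalsLatin1(std::u16string_view utf16, std::string_view latin1)
{
    if (utf16.size() != latin1.size()) {
        return false;
    }
    for (std::size_t i = 0; i < utf16.size(); ++i) {
        // Widen a single byte on the fly; no allocation is involved.
        if (utf16[i] != static_cast<char16_t>(static_cast<unsigned char>(latin1[i]))) {
            return false;
        }
    }
    return true;
}
```

Where an API cannot do something like this and needs a full UTF-16 string anyway, passing Latin-1 data just moves the conversion to runtime.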

Note that “overload” is to be understood a bit more loosely than in the strict C++ sense here. Examples:

String comparison and searching.
JSON keys.
QLatin1String::arg as an alternative to QString::arg.
String concatenations, in particular in combination with QStringBuilder.

Yes, exceptions might exist where the reduced binary size trumps the additional runtime cost, e.g. for sparsely used large data tables. But there might be even better solutions for that, and that’s probably worth a post on its own.

Tooling

With the right choice being a case-by-case decision, there is understandably also demand for better tooling to support this, as search/replace isn’t going to cut it. While I am not aware of a tool that reliably identifies places where QLatin1String overloads should be used instead, there are tools that can at least support that work.

Clazy has the qstring-allocations static check to identify QString uses that can potentially be optimized to avoid memory allocations. This is actually the reverse of what is discussed here, so it’s a good way to catch overzealous QLatin1String uses. It has the second-lowest reliability rating regarding false positives though, so this is also not something to apply without careful review.

Clazy’s qlatin1string-non-ascii check is another useful safety net, finding QLatin1String literals that cannot actually be represented in the Latin-1 encoding.

Enabling QT_NO_CAST_FROM_ASCII also helps a bit as it forces you to think about the right type and encoding when interfacing with QString API.

The other aspect of tooling is looking at the binary size impact of code changes. A simple but effective tool is bloaty, which is what produced the size difference table in the previous post. Make sure to strip binaries beforehand, otherwise the debug information will drown out everything else.

For a more detailed look, there is also the size tree map in ELF Dissector.

[Image: ELF Dissector’s size tree map for KPublicTransport, the big block in the center being QRC data.]

Expected savings

How much savings to expect varies greatly depending on a number of circumstances. It’s also worth looking at absolute and relative savings separately. In the previously mentioned KWeatherCore example these were 16kB and 7% respectively.

This is due to:

Few if any QLatin1String overloads were used, so there was a lot of room for optimization.
The library is very small and a significant part of it is JSON handling.
Other significantly more impactful optimizations to its static data tables had been applied previously (see e.g. this MR).

Let’s look at another example to put this into perspective: this change in KPublicTransport. Just like the KWeatherCore change, it changes QStringLiteral to QLatin1String in places where corresponding overloads exist, primarily in the JSON API.

    FILE SIZE        VM SIZE
 --------------  --------------
  -0.1%     -16  -0.1%     -16    .eh_frame_hdr
  -0.1%    -144  -0.1%    -144    .eh_frame
  -0.1%    -430  -0.1%    -430    .text
  -0.9% -3.97Ki  -0.9% -3.97Ki    .rodata
  -0.3% -4.00Ki  -0.3% -4.54Ki    TOTAL

The savings here are lower though, just 4kB or 0.3%. This is due to:

The majority of this code already uses QLatin1String overloads; the change only fixes a few cases that slipped through.
Unlike with KWeatherCore, the QString overloads remain in use for generic code not using literal keys. We therefore see no reduction due to fewer used external symbols (.plt remains unchanged).
The library is much larger in total.
The data size is dominated by compiled in resources, primarily polygons in GeoJSON format (the big red box in the center of the above screenshot).

The latter would be the much more relevant optimization target here, as GeoJSON is not the most efficient format in terms of either space or runtime cost.

In general, the absolute size reduction should be roughly proportional to the number of QStringLiteral uses changed to QLatin1String. If the relative change is surprisingly low, it’s worth checking what else is taking the space.
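As a back-of-the-envelope check of that proportionality, the sketch below estimates the read-only data saved when a set of literals is stored as Latin-1 (one byte per character) rather than UTF-16 (two bytes per character). The function names are made up for illustration, and the estimate deliberately ignores alignment padding, literal merging by the toolchain and any changes in code size, so treat it as a rough guide rather than a prediction of bloaty’s output.

```cpp
#include <cstddef>
#include <initializer_list>
#include <string_view>

// Bytes needed to store a literal of the given length, including the
// terminating null character, for a given code unit size.
constexpr std::size_t literalBytes(std::size_t length, std::size_t codeUnitSize)
{
    return (length + 1) * codeUnitSize;
}

// Rough estimate of the read-only data saved by storing the given literals
// as Latin-1 instead of UTF-16.
// Assumption: ignores alignment padding, literal merging and code size.
constexpr std::size_t estimatedSavings(std::initializer_list<std::string_view> literals)
{
    std::size_t saved = 0;
    for (const std::string_view literal : literals) {
        saved += literalBytes(literal.size(), sizeof(char16_t))
               - literalBytes(literal.size(), sizeof(char));
    }
    return saved;
}

// The compiler agrees with the per-literal numbers: "name" takes 5 bytes as
// a narrow literal and 10 bytes as a UTF-16 literal.
static_assert(sizeof("name") == 5, "4 characters plus terminator");
static_assert(sizeof(u"name") == 10, "2 bytes per code unit");
```

Each converted literal therefore saves its length plus one byte, which is why a handful of short JSON keys barely registers while a few hundred of them add up to kilobytes.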

An even more extreme example that came up in discussions on this is Tokodon, where the relative reduction was just a fraction of a percent. A view in ELF Dissector reveals the reason for that: its giant compiled-in Emoji tables overshadow everything else.

[Image: ELF Dissector’s size tree map for Tokodon, the large top left and center blocks are all related to static Emoji data.]

Besides the data size (which won’t be entirely avoidable here), this also involves a significant amount of static construction code, which is the even more interesting optimization target as it also impacts application startup and application runtime memory use.
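To illustrate what static construction code means in general, here is a small sketch in standard C++ with a made-up two-entry table. None of this is taken from Tokodon, and the type and function names are hypothetical. The first table owns its strings and so typically needs constructor calls at startup plus heap memory at runtime; the second one only refers to literal data and can be placed in read-only memory as-is.

```cpp
#include <string>
#include <string_view>

// Variant 1: entries own their strings. Initializing this table typically
// requires running std::string constructors before main(), and longer
// strings end up on the heap.
struct OwningEmojiEntry {
    std::string shortName;
    char32_t codePoint;
};
static const OwningEmojiEntry owningEmojiTable[] = {
    {"grinning", U'\U0001F600'},
    {"rocket", U'\U0001F680'},
};

// Variant 2: entries only view literal data. The whole table is a compile
// time constant and needs no construction code at startup.
struct EmojiEntry {
    std::string_view shortName;
    char32_t codePoint;
};
constexpr EmojiEntry emojiTable[] = {
    {"grinning", U'\U0001F600'},
    {"rocket", U'\U0001F680'},
};

// Linear lookup, returns 0 if the name is unknown.
constexpr char32_t codePointFor(std::string_view shortName)
{
    for (const EmojiEntry &entry : emojiTable) {
        if (entry.shortName == shortName) {
            return entry.codePoint;
        }
    }
    return 0;
}

// Usable at compile time, which shows that no runtime setup is needed.
static_assert(codePointFor("rocket") == U'\U0001F680', "lookup works at compile time");
```

With thousands of entries the difference between the two variants becomes visible both in the binary and in startup time.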

Conclusion

As always with optimizations there is no silver bullet. Occasionally looking into the output of various profiling and analysis tools for your application or library usually turns up a few unexpected details that are worth improving.

Nevertheless, I stand by my recommendation from last time to keep seemingly minor details like the use of the right string API in mind. It’s an essentially free optimization that adds up given how widely applicable it is.