Finding insecure network connections

One obvious aspect of KDE’s privacy goal is eliminating all network connections that are not using transport encryption. That’s however not as straightforward to ensure as it may sound, it’s easy to have a long forgotten HTTP link in a rarely used dialog that should have been changed to HTTPS many years ago already. How do we find all these cases?

Scope

First of all, this is not about intentional or malicious attempts to bypass or weaken transport encryption at any level, but about finding our own mistakes. That is, simple typos forgetting the crucial ‘s’ after ‘http’, or legacy code from a time before the corresponding service even supported transport encryption.

This is also not about identifying servers that only offer weak or otherwise broken transport security, and communicating that to the user as we are used to from web browsers. All of that needs to be looked at too of course, but that’s a different task.

Source Code

Naively, searching the source code for http: and replacing it with https: would be a good start. However, it’s a bit more complicated than that, mainly due to http URIs being used both as resource identifiers and as addresses for resources. A common case for the former are XML namespaces, those are not actually retrieved via the network, so http URIs are actually not a problem (on the contrary, changing them might confuse the code dealing with them). In the latter case the URIs are actually URLs and are used for network operations, those we need to fix.

Still, a bit of crude grep work can turn up quite some useful results already. This can be clickable links in code (D16904, R209:33447a) or documentation (D17262), downloaded content (D7414, D17223) or access to online services (D7408, D16925, D16946). But many hits are part of code documentation, automatic tests or license information (R1007:aff001), which are less severe in their impact. Sometimes URLs also appear in translations, so those would need to be checked too.

Compiled Files

Another place to look for http: strings is in the compiled binaries. That’s less complete, but seems to have a much smaller false positive rate. A simple approach is grep-ing through the output of strings and strings -l e (the latter decodes 16bit little endian Unicode strings as used by QStringLiteral), and filtering out common URIs as needed in the source code search too.

Runtime Monitoring

An entirely different approach for identifying clear text connections is observing network activity at runtime. Tools like tcpconnect of the bcc suite seems to be a good starting point, as it allows continuous monitoring without noticeable impact, unlike e.g. capturing the entire network communication with Wireshark.

With this we can also find non-HTTP connections, as well as entirely unnecessary network activity (D7410, D7438). However, we wont notice anything from code paths that aren’t run.

Contribute

This is a perfect topic to get started with I think, fixing http: links is as easy as it gets, and yet that has relevant impact on the privacy or our users. But it doesn’t stop there, as we also need to build the tools to identify these issues more reliably. There isn’t much yet in terms of tooling, so a simple script would already be an improvement (e.g. to automatically check the KIO search provider desktop files), but if you are into elaborate runtime tracing techniques like used by the bcc tools, here’s a real-world problem to use this for :)