October/November in KDE Itinerary

It’s already two months since I last wrote a summary of recent developments
in KDE Itinerary, so here is what happened in October and November. With the 18.12 application
release coming up shortly, that’s also largely what you can expect in there.

New Features

Probably the biggest visible change is the introduction of automatic trip grouping. That is, multiple
elements (flights, train reservations, hotel bookings, etc.) belonging to the same trip are grouped
together in the timeline and can be collapsed for a better overview.

This might sound simple (and for a human it is), but finding a reliable way to automatically group
things, and to automatically name the resulting trip, is actually quite tricky. Some of the challenges
include:

  Trips that don’t return to the starting point. It’s not uncommon for larger urban areas to have multiple
airports that all are viable “home airports”. E.g. TXL -> CDG -> SXF should be detected as a trip group,
despite technically not being a loop. A distance threshold addresses this to some extent, but would need
to be fairly large to also work e.g. for all of London’s airports.
  Distinguishing between changeover stops and the actual destinations. This matters for picking good default
names for trips. We use hotel bookings as an indication for this; the length of a stay between two location
changes might also be interesting to look at. Assuming that a trip is symmetric does not work, however.
  Incomplete data. This can be just not yet imported or booked elements of a trip, or elements that simply don’t
exist. The Randa demo data set shows this for example with the unidirectional bus trip from Randa to Zermatt,
which is followed by a (missing) hike back. A strict connectivity search would detect that gap and wrongly
interrupt the trip there. By also looking for the matching reservation numbers of the enclosing flights we
can still make this work.
  Missing city names. This is again mostly a problem for naming, as cities are often the best level of detail
for describing the destination. That is, “Munich” makes sense while “Franz Josef Strauss” (fallback to airport name)
or “Germany” (fallback to country name) are sub-optimal, the latter particularly if we started in that country.

As you can imagine, this needs a lot of fine tuning to produce useful results, and possibly additional search
and naming strategies for cases not yet considered. So this is a good place to help: if you have particularly
complex trips or other cases where the current approach could produce better results, or just ideas on how to
improve this, please let us know.
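
To make the distance threshold idea from the list above more concrete, here is a minimal Python sketch. It is not the code used in KDE Itinerary: the `Leg` type, the `group_trips` function and the 50 km default are assumptions made for illustration, and it deliberately ignores the other challenges (changeover stops, incomplete data, naming).

```python
from dataclasses import dataclass
from math import asin, cos, radians, sin, sqrt


@dataclass
class Leg:
    """One location change, e.g. a flight or a train ride (hypothetical type)."""
    origin: str
    origin_pos: tuple  # (latitude, longitude) in degrees
    dest: str
    dest_pos: tuple    # (latitude, longitude) in degrees


def distance_km(a, b):
    """Great-circle distance between two (lat, lon) points, haversine formula."""
    lat1, lon1, lat2, lon2 = map(radians, (*a, *b))
    h = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(h))


def group_trips(legs, threshold_km=50.0):
    """Split a chronologically sorted list of legs into trips.

    A trip is closed as soon as a leg ends within threshold_km of the place
    where the trip started. Legs left over at the end form an open trip.
    """
    trips, current = [], []
    for leg in legs:
        current.append(leg)
        if distance_km(current[0].origin_pos, leg.dest_pos) <= threshold_km:
            trips.append(current)
            current = []
    if current:
        trips.append(current)
    return trips


# Approximate airport coordinates, for illustration only.
TXL, CDG, SXF = (52.5597, 13.2877), (49.0097, 2.5479), (52.3800, 13.5225)
legs = [Leg("TXL", TXL, "CDG", CDG), Leg("CDG", CDG, "SXF", SXF)]
for trip in group_trips(legs):
    print(" -> ".join([trip[0].origin] + [leg.dest for leg in trip]))
```

With these coordinates TXL and SXF are roughly 25 km apart, so the two legs end up in one group even though the trip is technically not a loop. A fixed threshold like this is exactly the part that would need tuning for larger metropolitan areas.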

That’s not all that is new, though. Nicolas started to work on adjusting the screen brightness when
getting ticket barcodes scanned on your phone. So far we have Linux/Solid support for Plasma Mobile;
Android support is still to come. This makes it less likely that you are the unpopular person stalling the
boarding queue ;-)

Infrastructure Work

  Post-processing of extracted data can now employ libphonenumber.
Given a partial address, e.g. of a hotel or a restaurant, this allows us to determine the country from an international
phone number, or conversely to make a local phone number internationally dialable when we know the country.
libphonenumber can do more, such as determining the city or timezone from a given number, but that still
has to be integrated.
  Custom extractors can now also specify filter expressions on proprietary barcode content. This allows us
for example to select the SNCF ticket extractor without having the email context for such a PDF document. So far this
was only possible on standard barcodes like IATA BCBP or UIC 918.3.
  A new convenience method for extractor scripts allows turning the frequently encountered Google Maps URLs into
JSON-LD geo coordinate objects.
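
As an illustration of the last item, the following Python sketch shows what such a conversion could look like. It is not the extractor script API itself: the function name `geo_from_maps_url` is hypothetical, and the two URL shapes it handles (`.../@<lat>,<lon>,<zoom>z` in the path, and `q=` or `ll=` query parameters holding `<lat>,<lon>`) are assumptions about commonly seen links rather than a complete description of Google Maps URLs.

```python
import re
from urllib.parse import parse_qs, urlparse

# A "<lat>,<lon>" pair of decimal numbers, optionally negative.
_COORD = r"(-?\d+(?:\.\d+)?),\s*(-?\d+(?:\.\d+)?)"


def geo_from_maps_url(url):
    """Return a schema.org GeoCoordinates dict, or None if no coordinates are found."""
    parsed = urlparse(url)
    candidates = []
    # Assumed shape 1: .../maps/place/Name/@<lat>,<lon>,<zoom>z
    match = re.search("@" + _COORD, parsed.path)
    if match:
        candidates.append(match.groups())
    # Assumed shape 2: ...?q=<lat>,<lon> or ...?ll=<lat>,<lon>
    query = parse_qs(parsed.query)
    for key in ("ll", "q"):
        for value in query.get(key, []):
            match = re.fullmatch(_COORD, value.strip())
            if match:
                candidates.append(match.groups())
    for lat, lon in candidates:
        lat, lon = float(lat), float(lon)
        # Reject values that cannot be geographic coordinates.
        if -90 <= lat <= 90 and -180 <= lon <= 180:
            return {"@type": "GeoCoordinates", "latitude": lat, "longitude": lon}
    return None


print(geo_from_maps_url("https://www.google.com/maps/place/Randa/@46.0986,7.7813,15z"))
```

URLs that only carry a free-text search term (for example `?q=Randa`) yield `None` here, since resolving them would need a geocoding service.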

Important Fixes

  We switched to a newer variant of the ZXing barcode decoding library.
This fixes a few cases of failed detection of PDF417 codes with small module sizes (common in boarding pass PDF files),
and as a very welcome side effect increases performance and fixes a memory leak on failed barcode detections.
  The airport identification code now considers a few more alternative transliterations of non-ASCII characters.
This allows us to properly identify more airports from their (translated) human-readable names and look up
information from the Wikidata knowledge database about them.
  Nicolas implemented generic airline name extraction from Apple Wallet boarding passes.
  Weather forecast in KDE Itinerary now also works for negative coordinates.
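
The idea of considering several transliterations when matching airport names can be sketched as follows. This is only an illustration in Python, not the actual airport identification code: the `ALTERNATIVES` table is a deliberately tiny, assumed sample, and the function names are hypothetical.

```python
import unicodedata
from itertools import product

# Assumed, incomplete sample of language-specific alternative spellings.
ALTERNATIVES = {
    "ä": ["ae"], "ö": ["oe"], "ü": ["ue"], "ß": ["ss"], "ø": ["oe", "o"], "å": ["aa"],
}


def strip_diacritics(ch):
    """Generic fallback: decompose and drop combining marks, e.g. 'ü' -> 'u'."""
    decomposed = unicodedata.normalize("NFKD", ch)
    return "".join(c for c in decomposed if not unicodedata.combining(c))


def transliterations(name):
    """Return the set of ASCII spellings of the lower-cased name."""
    options = []
    for ch in name.lower():
        if ch.isascii():
            options.append([ch])
            continue
        alts = list(ALTERNATIVES.get(ch, []))
        stripped = strip_diacritics(ch)
        if stripped.isascii() and stripped not in alts:
            alts.append(stripped)
        # Characters without any known ASCII form are dropped.
        options.append(alts or [""])
    return {"".join(combo) for combo in product(*options)}


print(sorted(transliterations("München")))
```

A lookup would then succeed if any of the generated spellings matches a known airport name. Note that the number of variants grows multiplicatively with the number of non-ASCII characters, which is acceptable for short names but would need a cap for longer input.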

Performance Improvements

  The schema.org based value classes got a shared null state optimization, and avoid detaching their
internal state when their property setters are called with unchanged values (see the separate post
on this). This saves a significant number of memory allocations during post-processing of extracted data.
  We reduced the need for expensive image scaling operations in barcode extraction from PDF files and improved the
heuristics on which graphics found in a PDF could even be a valid barcode, which combined with the better ZXing
variant mentioned above cuts the runtime of the extractor test suite by almost half. PDF extraction performance
is important as the KMail plug-in applies this much more aggressively than in the previous release.

UI Improvements

  Importing and pasting content into KDE Itinerary got streamlined by supporting remote URLs too, and by
auto-detecting content. So there’s only a single workflow for all supported content types now.
  The timeline model got a vastly improved test system, which enabled us to fix a number of merge failures when
importing data that resulted in duplicated or otherwise wrong elements in the timeline.

Contribute

Again a big thanks to everyone who donated test data,
this helps immensely, please keep it coming!

If you want to help in other ways too, see our Phabricator workboard
for what’s on the todo list, for coordinating work and for collecting ideas. For questions and suggestions, please feel free
to join us on the KDE PIM mailing list or in the #kontact channel on Freenode.