Wikidata Data Reuse Days 2022 Recap

Over the past two weeks I attended a number of sessions of the Wikidata Data Reuse Days 2022 and presented KDE’s use of Wikidata in our travel apps there. Here are some of the things I found particularly interesting from a KDE perspective.

Data

With Wikidata modeling the whole world and then some, several sessions focused on available datasets in specific domains and their use cases.

Language Models

Wikidata’s lexeme database allows creating language models that we simply did not have available for FOSS projects a couple of years ago. This can be useful for text input and text correction, but also for educational software around language learning.

Scribe is an example of using that to build a virtual keyboard specifically for second-language users, helping you pick the right plural forms, conjugations, prepositions or grammatical genders (talk details).

This could be worth looking into for Sonnet, Plasma Mobile’s virtual keyboard and/or KDE Edu.
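To give a rough idea of what consuming the lexeme data can look like, here is a minimal Python sketch that asks the Wikidata Query Service for nouns and their plural forms in a given language. The item IDs used (Q1084 for "noun", Q146786 for "plural", Q188 for German) and the function names are my own choices for illustration and are not taken from the talk; Scribe's actual queries may well look different.

```python
import json
import urllib.parse
import urllib.request

SPARQL_ENDPOINT = "https://query.wikidata.org/sparql"


def plural_forms_query(language_qid, limit=10):
    """Build a SPARQL query for noun lemmas and their plural forms.

    The prefixes used here are predefined by the Wikidata Query Service.
    """
    return f"""
SELECT ?lemma ?plural WHERE {{
  ?lexeme dct:language wd:{language_qid} ;
          wikibase:lexicalCategory wd:Q1084 ;    # noun (assumed item ID)
          wikibase:lemma ?lemma ;
          ontolex:lexicalForm ?form .
  ?form ontolex:representation ?plural ;
        wikibase:grammaticalFeature wd:Q146786 . # plural (assumed item ID)
}}
LIMIT {limit}
"""


def parse_bindings(result):
    """Group plural forms by lemma from a SPARQL JSON result document."""
    plurals = {}
    for row in result["results"]["bindings"]:
        lemma = row["lemma"]["value"]
        plurals.setdefault(lemma, set()).add(row["plural"]["value"])
    return plurals


def fetch_plurals(language_qid, limit=10):
    """Run the query against the public endpoint (needs network access)."""
    params = urllib.parse.urlencode(
        {"query": plural_forms_query(language_qid, limit), "format": "json"}
    )
    request = urllib.request.Request(
        SPARQL_ENDPOINT + "?" + params,
        headers={"User-Agent": "lexeme-sketch/0.1 (example script)"},
    )
    with urllib.request.urlopen(request) as response:
        return parse_bindings(json.load(response))
```

Calling `fetch_plurals("Q188")` would return a dictionary mapping German noun lemmas to the plural forms recorded for them. A lemma can have several plural forms, since forms are also distinguished by grammatical case.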

Consumer Advice

A very different but no less interesting dataset is Open Food Facts (talk details). It contains information about food-related products, their ingredients, healthiness and environmental impact.

This allows building apps that use a product barcode to help you check whether a specific product is compatible with allergies or other food intolerances (particularly useful when the packaging is labeled in a language you don’t understand), or to help you pick healthier or more sustainable alternatives while shopping.

Plasma Mobile’s barcode scanning app Qrca already offers to open the corresponding Open Food Facts entry when encountering an EAN code, but there is quite likely room for a deeper integration.
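As a rough illustration of what a deeper integration could build on, the following Python sketch validates an EAN-13 check digit, builds a product lookup URL and checks a product record against a list of allergens to avoid. The URL pattern and the `allergens_tags` field with values such as `en:nuts` reflect my understanding of the Open Food Facts API and should be verified against its documentation; all function names are made up for this example and have nothing to do with Qrca's code.

```python
import json
import urllib.request


def is_valid_ean13(code):
    """Check length, digits and the EAN-13 check digit."""
    if len(code) != 13 or not (code.isascii() and code.isdigit()):
        return False
    digits = [int(c) for c in code]
    # The first twelve digits are weighted 1, 3, 1, 3, ... from the left.
    weighted = sum(d * (3 if i % 2 else 1) for i, d in enumerate(digits[:12]))
    return (10 - weighted % 10) % 10 == digits[12]


def product_url(code):
    """Lookup URL for a barcode (assumed API path, see the note above)."""
    return f"https://world.openfoodfacts.org/api/v2/product/{code}.json"


def matching_allergens(product, avoid):
    """Return the allergens of a product record that the user wants to avoid."""
    # "allergens_tags" is an assumed field name holding tags like "en:milk".
    return {tag for tag in product.get("allergens_tags", []) if tag in avoid}


def fetch_product(code):
    """Fetch a product record, or None if absent (needs network access)."""
    request = urllib.request.Request(
        product_url(code),
        headers={"User-Agent": "off-sketch/0.1 (example script)"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response).get("product")
```

An app would first run `is_valid_ean13()` on the scanned code, then pass the result of `fetch_product()` to `matching_allergens()` together with the user's own list, for example `{"en:nuts", "en:gluten"}`.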

Open Food Facts also has two similar sister projects, Open Beauty Facts for cosmetics, and the even more general Open Product Facts.

Tooling

Besides datasets, tools or techniques to work with Wikidata were equally important topics.

One new tool that could be interesting for us is the Mismatch Finder, which allows reporting potential data quality issues found by automatic checks against other sources (talk details). That’s something we do, for example, when bringing Wikidata and OSM data together to generate the public transport line metadata in KPublicTransport.

Adapting the debug output of our data processing tools to produce a machine-readable format consumable by the Mismatch Finder shouldn’t be a big deal, and certainly promises more quality improvements than posting issues in blog posts.
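A minimal sketch of what such an export might look like, in Python for brevity. I am assuming here that the Mismatch Finder takes a CSV file with the columns listed in `ASSUMED_COLUMNS`; that list is from memory and needs to be checked against the Mismatch Finder documentation before relying on it. The `Mismatch` class and the example values in the usage note are purely hypothetical and not how our tools represent this today.

```python
import csv
import io
from dataclasses import dataclass

# Assumption: column names of the import file, to be verified against
# the Mismatch Finder documentation.
ASSUMED_COLUMNS = [
    "statement_guid",
    "property_id",
    "wikidata_value",
    "meta_wikidata_value",
    "external_value",
    "external_url",
]


@dataclass
class Mismatch:
    """One disagreement between a Wikidata statement and an external source."""

    statement_guid: str
    property_id: str
    wikidata_value: str
    external_value: str
    external_url: str
    meta_wikidata_value: str = ""


def to_csv(mismatches):
    """Serialize mismatches as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=ASSUMED_COLUMNS, lineterminator="\n")
    writer.writeheader()
    for mismatch in mismatches:
        writer.writerow({name: getattr(mismatch, name) for name in ASSUMED_COLUMNS})
    return buffer.getvalue()
```

For a line color that differs between Wikidata and OSM, a placeholder record could be created as `Mismatch("Q0$00000000-0000-0000-0000-000000000000", "P465", "ff0000", "e2001a", "https://www.openstreetmap.org/relation/0")` and passed to `to_csv()` in a list.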

Wikidata use in KDE Itinerary and KTrip

While there is no recording of my talk, at least the slides are available here.