Profiling & optimizing GNOME Software

Over the past few months, the Endless OS Foundation has been putting focus on improving GNOME Software’s reliability and performance. Endless OS is an OSTree-based immutable OS, and applications are entirely distributed as Flatpaks. GNOME Software is the frontend to that, and since our target userbase is definitely not going to use a terminal, we need to make sure GNOME Software delivers a good experience.

This focus has materialized in Philip’s long effort to switch GNOME Software to a different threading model, more suitable for what Software does these days; and more recently, I’ve been tasked with looking at GNOME Software’s performance from a profiling perspective.

Profile, Profile, Profile

I’ve been looking at performance-related tasks for many years now. It’s a topic of investigation that I personally enjoy, even if it’s a bit annoying and frustrating at times. My first real, big performance investigation was a good six years ago, on GNOME Music. Back then, my strategy for profiling was: guess what’s wrong, change that, see if it works. Not exactly a robust strategy. Luckily, it worked for that particular case.

Then over the next year I started to contribute to Mutter and GNOME Shell, and one of my first bigger performance-related investigations in this area involved frame timings, GJS, and some other complicated stuff. That one was a tad more complicated, and it still relied on gut feelings, but luckily some good improvements came out of it too.

Those two instances of gut-based profiling sound successful when put like this, but that’s just a trick of words. Those two efforts succeeded due to a mix of ungodly amounts of human effort, and luck. I say this because just a few months later, the GNOME Performance Hackfest took place in Cambridge, UK, and that was the beginning of a massive tsunami that swept through the GNOME community.

This tsunami is called profiling.

In that hackfest, Christian and Jonas started working on integrating Sysprof with Mutter and GNOME Shell. Sysprof was already able to collect function calls without any assistance, but this integration work enabled, for example, measuring where in its paint routines GNOME Shell was taking time to render. Of course, we could have printed timestamps to the console and parsed them, but let’s be honest, that sucks. Visualizing data makes the whole process so much easier.
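
For context on how such marks end up in a capture, here is a minimal sketch of emitting a timing mark from C with the sysprof-capture-4 library. The group and mark names are made up for illustration; this is not the actual Mutter / GNOME Shell code, just the general shape of the technique.

```c
/* Minimal sketch: emit a Sysprof timing mark around a unit of work.
 * Build roughly with `pkg-config --cflags --libs sysprof-capture-4 glib-2.0`.
 * Marks are only recorded when the app runs under Sysprof, which provides
 * the capture file descriptor; otherwise the call is effectively a no-op.
 */
#include <glib.h>
#include <sysprof-capture.h>

static void
paint_frame (void)
{
  gint64 begin = g_get_monotonic_time ();  /* microseconds, monotonic */

  /* ... do the actual painting work here ... */

  gint64 duration = g_get_monotonic_time () - begin;

  /* Sysprof expects nanoseconds on the monotonic clock, so convert. */
  sysprof_collector_mark (begin * 1000,
                          duration * 1000,
                          "example",   /* group shown in the marks view */
                          "paint",     /* mark name */
                          "");         /* optional free-form message */
}
```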

So much easier that even a fool like myself could contribute. After quickly picking up the work where Jonas left off, GNOME Shell was able to produce profiling data like the below:

GNOME Shell frame timing marks on Sysprof, as it was in 2018

That integration alone allowed us to abandon the Old Ways of profiling and jump head first into a new era of data-based optimizations. Just looking at the data above, without even looking at function calls, we can extract a lot of information. We can see that the layout phase was taking 4ms, painting was taking 5ms, picking was taking ~2ms, and the whole process was taking ~12ms. Looking at this data in depth is what allowed, for example, reducing pick times by 92%, and rendering times by a considerable factor too.

The most important lesson I could derive from that is: optimizing is borderline useless if not backed by profiling data. Always profile your app, and only optimize based on profiling data. Even when you think you know which specific part of your system is problematic, profile that first, just to confirm.

With these lessons learned, of course, I used this approach to profile GNOME Software.

Profiling an App Store

Naturally, the problem domain that GNOME Software – an app store – operates in is vastly different from that of GNOME Shell – a compositor. GNOME Software is also in the middle of a major transition to a new threading model, and that adds to the challenge of profiling and optimizing it. How do you optimize something that’s about to receive profound changes?

Fortunately for us, GNOME Software already had a basic level of integration with Sysprof, which gave some introspection on how it was spending its time, but not enough. My initial reaction was to extend this Sysprof integration and cover more events, in a more hierarchical manner. The goal was to improve whatever was making navigation through categories slow.
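
To give an idea of what “hierarchical” means here, below is a small sketch of wrapping an operation and its sub-operation in scoped marks, so the child span nests under the parent span in Sysprof’s marks timeline. The helper and the names are made up for illustration; they are not GNOME Software’s real profiling macros.

```c
/* Illustrative scoped-mark helper: record a begin timestamp on entry and
 * emit a Sysprof mark when the scope exits. Nested scopes then show up as
 * nested spans in the marks timeline.
 */
#include <glib.h>
#include <sysprof-capture.h>

typedef struct {
  gint64      begin_us;
  const char *name;
} ScopedMark;

static void
scoped_mark_end (ScopedMark *mark)
{
  gint64 duration_us = g_get_monotonic_time () - mark->begin_us;

  sysprof_collector_mark (mark->begin_us * 1000, duration_us * 1000,
                          "example-app", mark->name, "");
}

#define SCOPED_MARK(var, mark_name) \
  __attribute__((cleanup (scoped_mark_end))) \
  ScopedMark var = { g_get_monotonic_time (), mark_name }

static void
refine_apps (void)
{
  SCOPED_MARK (mark, "refine");   /* child span */
  /* ... gather metadata, load icons, etc. ... */
}

static void
list_apps_for_category (void)
{
  SCOPED_MARK (mark, "list-apps"); /* parent span */
  refine_apps ();                  /* nests inside "list-apps" */
}
```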

This proved to be a successful strategy. Here’s what profiling GNOME Software gave us:

Zoom in on the image. Take a few seconds to analyse it. Can you spot where the issue is?

If not, let’s do this together.

This slice of the profile shows that the GsPluginJobListApps operation (first line) took 4 whole seconds to execute. That’s a long time! This operation is executed to list the apps of a particular category, which means people have to wait 4 entire seconds just to see the apps in a category. That’s certainly a usability killer: exploring these apps is supposed to be quick and fun, and this issue hurts that.

You may notice that there are other sub-events beneath GsPluginJobListApps. The first line is the timing of the whole operation, and beneath that, we have the timings of each individual sub-operation it does to end up with a list of apps.

Skimming through the entire profiler window, the widest black bars beneath the first line are, in order of appearance:

  • GsPluginJobListApps:flatpak taking 600ms
  • GsPluginJobRefine taking 3.6 seconds
    • GsPluginJobRefine:flatpak taking 600ms
    • GsPluginJobRefine:icons taking about 3 seconds

What that tells us is that the GsPluginJobListApps operation runs a GsPluginJobRefine sub-operation, and the icons are what’s taking most of the time there. Icons!

Refining is GNOME Software terminology for gathering metadata about a certain app, such as the name of the author, origin of the application, description, screenshots, icons, etc.

Icons!

Contrary to all my expectations, what was clear in these profiling sessions is that loading icons was the worst offender for GNOME Software’s performance when navigating through categories. This required further investigation, since there’s a fair bit of code dedicated to making icon management fast and tight on memory.

It didn’t take long to figure out what was happening.

Some applications declare remote icons in their metadata, and GNOME Software needs to download these icons. It turns out that, on Flathub, a couple of apps declare such remote icons, but the icons don’t actually exist at the URLs they point to! Uh oh. Obviously these apps need to fix their metadata, but we can’t let that slow GNOME Software to a crawl.

Knowing exactly what the problem was, it wasn’t difficult to come up with potential solutions. We always have to download and cache app icons, but we can’t let that block loading categories, so the simplest solution is to queue these downloads and continue loading the categories. The difference is quite noticeable when compared with the current state:

Most of the time, GNOME Software already has access to a local icon, or a previously downloaded and cached icon, so in practice it’s hard to see them downloading.
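
For the curious, here is a rough sketch of the queuing idea. The names, such as icon_download_queue_push(), are hypothetical and this is not the code from the actual merge request; it assumes libsoup 3 and GLib. The point is simply that the thread listing the category only enqueues the download and returns, while a small thread pool fetches the icon into the cache in the background.

```c
/* Hypothetical sketch: queue remote icon downloads so they never block
 * category loading. Assumes libsoup 3 and GLib/GIO.
 */
#include <gio/gio.h>
#include <glib.h>
#include <libsoup/soup.h>

typedef struct {
  char *uri;        /* remote icon URI from the app's metadata */
  char *cache_path; /* where the downloaded icon should be stored */
} IconRequest;

static GThreadPool *icon_pool;    /* shared download pool */
static SoupSession *icon_session;

/* Runs in a worker thread: download the icon and write it to the cache. */
static void
icon_download_worker (gpointer data, gpointer user_data)
{
  IconRequest *req = data;
  g_autoptr(SoupMessage) msg = soup_message_new ("GET", req->uri);
  g_autoptr(GInputStream) stream = NULL;
  g_autoptr(GError) error = NULL;

  if (msg != NULL)
    stream = soup_session_send (icon_session, msg, NULL, &error);

  if (stream != NULL)
    {
      g_autoptr(GFile) file = g_file_new_for_path (req->cache_path);
      g_autoptr(GFileOutputStream) out =
        g_file_replace (file, NULL, FALSE,
                        G_FILE_CREATE_REPLACE_DESTINATION, NULL, &error);

      if (out != NULL)
        g_output_stream_splice (G_OUTPUT_STREAM (out), stream,
                                G_OUTPUT_STREAM_SPLICE_CLOSE_TARGET,
                                NULL, &error);
    }

  if (error != NULL)
    g_debug ("Failed to fetch icon %s: %s", req->uri, error->message);

  g_free (req->uri);
  g_free (req->cache_path);
  g_free (req);
}

/* Called while listing a category: queue the download and return right away,
 * letting the UI show a cached or placeholder icon in the meantime. */
static void
icon_download_queue_push (const char *uri, const char *cache_path)
{
  IconRequest *req = g_new0 (IconRequest, 1);

  req->uri = g_strdup (uri);
  req->cache_path = g_strdup (cache_path);

  /* Lazy setup; a real implementation would guard this for thread safety. */
  if (icon_pool == NULL)
    {
      icon_session = soup_session_new ();
      icon_pool = g_thread_pool_new (icon_download_worker, NULL,
                                     4 /* max concurrent downloads */,
                                     FALSE, NULL);
    }

  g_thread_pool_push (icon_pool, req, NULL);
}
```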

Final Thoughts

I hope to have convinced you, the reader, that profiling your application is an important step when working on optimization. This article focused on the timing marks, since they’re the easiest to understand at a glance, but these marks have little meaning when not accompanied by the function call stacks. There’s a lot to unpack on the subject, and sometimes this feels more like an art than a mechanical evaluation of numbers, but it sure is a fun activity.

Sadly, until very recently, even profiling was still a major pain – although much less painful than guess-based profiling – since you’d need to build at least your app and some of its dependencies with debug symbols to get a meaningful call stack. Most of the profiling I’ve mentioned above required building all dependencies up to GLib with debug symbols.

However, it seems the Fedora community has made a game-changing decision to finally enable frame pointers in their packages by default. That means the setup overhead for profiling like the above is drastically reduced, almost trivial even, and I’m hopeful about the prospects of democratizing profiling like this. At the very least, I can say that this is a massive improvement for desktop developers.


10 responses to “Profiling & optimizing GNOME Software”

  1. Have you considered also loading the icons separately from the text, leaving a blank space until the icon is loaded?

    1. How’s that any different from what was done in the merge request?

      1. I might have misunderstood the work done here.

        My impression was that, currently, downloading icons from the web was adjusted to happen asynchronously in the background, BUT that loading icons from cache was still being done synchronously.

        My question was whether this loading could perhaps also be done asynchronously, with the loading of text taking precedence over that of icons.

        In any case, thanks a bunch for the work you’re doing — this was a big pain point and it’s great to see GNOME Software improving in this regard!

        1. It could be done, but as I demonstrated with the profiling data above, synchronous icon loading is not a performance problem right now.

    2. If you look at the first screencast in the MR, that’s exactly what he did. It looks like there’s still a delay when loading the app list, but that might be less related to icons now. Unless there’s some blockage when loading the local icons as well: I remember I had to fix a GTK app that loaded a lot of icons (remote and local), and many seconds were wasted waiting for already-cached icons to load. I had to load both the remote and the local ones in a separate thread and queue them for rendering once they were ready, so the list widget loaded almost instantaneously.

