My Perspective on This Year’s GUADEC

Greetings GNOMEies

This year, I had the pleasure to attend GUADEC at Almeria, Spain. Lots of things happened, and I believe some of them are important to be shared with the greater community.

GUADEC

This year’s GUADEC happened in Almería, Spain. It turns out Almería is a lovely city! Small and safe, locals were friendly and I managed to find pretty good vegan food with my broken Spanish.

I was particularly happy whenever locals noticed my struggle with the language, and helped and taught me some handy words. This alone was worth the entire trip!

Getting there was slightly complicated: there were no direct flights, nor single-connection routes, to there. I ended up having to get a 4 connection route to there, and it was somewhat exhausting. Apparently other people also had troublesome journeys there.

The main accommodation and the main venue could have been closer, but commuting to there was not a problem whatsoever because the GUADEC team schedule a morning bus to there. A well handled situation, I must say — turns out, commuting with other GNOME folks sparked interesting discussions and we had some interesting ideas there. The downside is that, if anyone wanted the GNOME Project to die, we were basically in a single bus 😛

Talks

There were quite a few interesting talks this year. My personal highlights:

BoFs

To me, the BoFs were the best part of this year’s GUADEC. The number of things that happened, the hard talks we’ve made, they all were extremely valuable. I think I made a good selection of BoFs to attend, because the ones I attended were interesting and valuable. Decisions were made, discussions were held, and overall it was productive.

I was particularly involved in five major areas: GNOME Shell & Mutter, GJS, GTK, GNOME Settings, and GNOME To Do.

GNOME Shell & Mutter

A big cleanup was merged during GUADEC. This probably will mean small adaptations in extensions, but I don’t particularly think it’s groundbreaking.

At the second BoF day, me and Jonas Ådahl dived into the Remote Desktop on Wayland work to figure out a few bugs we were having. Fortunately, Pipewire devs were present and we figured out some deadlocks into the code. Jonas also gave a small lecture on how the KMS-based renderer of Wayland’s code path works (thanks!), and I feel I’m more educated in that somewhat complex part of the code.

As of today, Carlos Garnacho’s paint volume rework was merged too, after extensive months of testing. It was a high-impact work, and certainly reduces Mutter’s CPU usage on certain situations.

At the very last day, we talked about various ideas for further performance improvements and cleanups on Mutter and GNOME Shell.  I myself am on the last steps of working on one of these ideas, and will write about it later.

As I sidenote, I would like to add that I can only work on that because Endless is sponsoring me to do that. Because

banner-down

Exciting times for GNOME Shell ahead!

GJS

The git master GJS received a bunch of memory optimizations. In my very informal testing, I could measure a systematic 25~33% reduce in the memory usage of every GJS-based application (Maps, Polari and GNOME Shell). However, I can’t guarantee the precisions of these results. They’re just casual observations.

Unfortunately, this rework was making GNOME Shell crash immediately on startup. Philip Chimento tricked me into fixing that issue, and so this happened! I’m very happy with the result, and looks like it’ll be an exciting release for GJS too!

Thanks Philip for helping me deep dive into the code.

GTK

Matthias already wrote an excellent write-up about the GTK BoF, and I won’t duplicate it. Check his blog post if you want to learn more about what was discussed, and what was decided.

GNOME Settings

At last, a dedicate Settings BoF happened at the last day of the conference. It had a surprisingly higher number of attendees than what I was expecting! A few points on our agenda that were addressed:

  • Maintainership: GNOME Settings has a shared maintainership model with different levels of power. We’ll add all the maintainers to the DOAP file so that anyone knows who to ping when opening a merge request against GNOME Settings.
  • GitLab: we want to finish the move to GitLab, so we’ll do like other big modules and triage Bugzilla bugs before moving them to GitLab. With that, the GitLab migration will be over.
  • Offloading Services to Systemd: Iain Lane has been working on starting sessions with systemd, and that means that we’ll be able to drop a bunch of code from GNOME Settings Daemon.
  • Future Plans: we’ve spent a good portion of this cycle cleaning up code. Before the final stable release, we’ll need to do some extensive testing on GNOME Settings. A bit of help from tech enthusiasts would be fantastic!

We should all thank Robert Ancell for proposing and organizing this BoF. It was important to get together and make some decisions for once! Also, thanks Bastien for being present and elucidating our problems with historical context – it certainly wouldn’t be the same without you!

GNOME To Do

Besides these main tracks, me and Tobias could finally sit down and review GNOME To Do’s new layout. Delegating work to who knows best is a good technique:

Tobias' GNOME To Do mockups in my engineering notebook.
Tobias’ GNOME To Do mockups in my engineering notebook.

I was also excited to see GNOME To Do stickers there:

gnome-todo stickers
Sexy GNOME To Do stickers, a courtesy of Jakub

It’s fantastic to see how GNOME To Do is gaining momentum these days. I certainly did not expect it three years ago, when I bootstrapped it as a small app to help me deal with my Google Summer of Code project on Nautilus. It’s just getting out of control.

Epilogue

Even though I was reluctant to go, this GUADEC turned out to be an excellent and productive event. Thanks for all the organizers and volunteers that worked hard on making it happen – you all deserve a drink and a hug!

I was proudly sponsored by the GNOME Foundation.

Sponsored by the GNOME Foundation

Advertisements

The Infamous GNOME Shell Memory Leak

Memory graph

Greetings GNOMErs,

at this point, I think it’s safe to assume that many of you already heard of a memory leak that was plaguing GNOME Shell. Well, as of yesterday, the two GitLab’s MRs that help fixing that issue were merged, and will be available in the next GNOME version. The fixes are being considered for backporting to GNOME 3.28 – after making sure they work as expected and don’t break your computer.

First, I’d like to thank the GJS maintainer, Philip C., for all the hand-holding, the reviews, and the incredibly insightful discussions we had. Secondly, to my employer, Endless, for the support they gave me to fix this issue. And last but not least, to the Ubuntu folks, which made a public call for testing with the changes – this will give us confidence that the fix is working, and that backporting it will be a relatively safe and smooth process.

banner-down
As always, great new features and fixes are a courtesy of Endless

I’m writing this blog post with three goals in mind:

  1. Explain in greater details what is the issue (or at least, what we think it is), the journey to find it, and how it was fixed.
  2. Give more exposure to important extra work from other contributors that absolutely deserve more credits.
  3. Expose a social issue that showed up during this time, and open a discussion about it.

Memory Leak

To me, it all started when I saw GitLab’s ticket #64 passing by in the IRC channels. It was challenging enough, I was curious to dig into GNOME Shell/Mutter/GJS internals, perfect match. Of course, when you’re not familiar with a given codebase, the first step to fixing a bug is being able to reproduce it, so I started to play around with GNOME Shell to see if I could find a reliable way to reproduce it.

Well, I found a way and wrote a very simple observation: running animations (showing and hiding the Overview, switching applications using Alt+Tab, etc) was reliably increasing memory usage. Then a few people came in, and dropped bits of useful information here and there. But at this point, it was still pointing to a wide range of directions, and definitely there was not actionable task there. This is when OMG! Ubuntu first wrote about it.

Carlos Garnacho then came in and wrote a pretty relevant comment with important information. It was specially insightful because he put numbers on the guts of GNOME Shell. His comment was the first real solid step to uncover what was going on.

A week passed, and I experimented different toys tools in order to have a better understanding of memory management inside GNOME Shell. This is the kind of tedious work that nobody talks about, but I learned tons of new stuff, so in the end it was worth the hassle. I even wrote about my crazy experiments, and the results of this long week are documented in a long comment in GNOME/gnome-shell#64. I kept experimenting until I reached heapgraph, an interesting tool that allowed generating the following picture:

Memory graph
Notice the sudden drops of memory at x=42 and x=71

Well, as stated in the comment, GJS’ garbage collect was indeed collecting memory when triggered. Problem is, it wasn’t being triggered at all. That was the leading clue to one of the problems that was going on. One idea came to my mind, then, and I decided to investigate it further.

A Simple Example

Consider that we have a few objects in memory, and they have parent/child relationships:

Example 1
The root object is “1”

Lets suppose that we decided that we don’t need the root object anymore, so we drop a reference to it, and it is marked for garbage collection.

Example 2
The root object is now marked for garbage collection

If we destroy the root object, we would need to destroy the other objects related to it, and go destroying everyone that depended, directly or indirectly, on the root object. Traditionally, JavaScript objects track who they own, so the garbage collector can clean up every other dependent object. Here’s the problem: C objects don’t track who owns them; instead, they only track how many owners they have. This is the traditional reference counting mechanism, and it works fine in C land because C is not garbage collected. To the garbage collector, however, the C objects would look like this:

Example 3
The garbage collector has no means to know the relationships between C objects.

The garbage collector, then, will go there and destroy the root one. This object will be finalized, and the directly dependent objects will be marked for garbage collection.

Example 4
Only the directly dependent objects are marked for the next garbage collection.

But… when will the next GC happen? Who knows! Can be now, can be in 10 minutes, or tomorrow morning! And that was the biggest offender to the memory leak – objects were piling up to be garbage collected, and these objects had child objects that would only be collected after, and so it goes. In other words, this is not really a memory leak – the memory is not being lost. I’d label it as a “misbehavior” instead.

The Solution

While people might think this was somehow solved, the patches that were merged does not fix that in the way it should be fixed. The “solution” is basically throwing a grenade to kill ants. We now queue a garbage collection every time an object is marked for destruction. So every single time an object becomes red, as in the example, we queue a GC. This is, of course, a very aggressive solution.

But it is not all bad. Some early tests shows that this has a small impact on performance – at least, it’s much smaller than what we were expecting. A very convincing explanation is that the higher frequency of GCs is reducing the number of things that are being destroyed each GC. So now we have smaller and more frequent garbage collections.

EDIT: Looks like people need more clarification here, since the comments about it are just plain wrong. I’ll be technical, and precise – if you don’t understand, please do some research. The garbage collector is scheduled every time a GObject wrapped in GJS has its toggle reference gone from >1 to 1. And scheduled here means that a GC is injected into the mainloop as an idle callback, that will be executed when there’s nothing else to be executed in the mainloop. The absolute majority of the time, it means that only one GC will happen, even if hundreds of GObjects are disposed. I’ve spotted in the wild it happening twice. This fix is strictly specific to GObjects wrapped by GJS; all other kinds of memory management, such as strings and whatever else, aren’t affected by this fix. Together with this patch, an accompanying solution landed that reduces the number of objects with a toggle reference.

This obviously needs more testing on a wider ranger of hardwares, specially on lower ends. But, quite honestly, I’m personally sure that this apparently small performance penalty is compensated by the memory management gains.

Other Improvements

While the previous section covered my side of this history, there are a few other contributors that did a great job, and I think it would be unfair with them if their work was not properly highlighted.

Red Hat’s Carlos Garnacho published two merge requests for GJS that, in my testing, substantially improved the smoothness of GNOME Shell. The first one changes the underlying data structure of JS objects, which allows us to stop using an O(n) algorithm and starting an O(1) one. The second one is particularly interesting, and it yields the most noticeable improvements in my computer. Gross, it vastly reduces the number of temporary memory allocations. He also has a number of patches on Mutter and GNOME Shell.

Another prominent contributor regarding performance is Canonical’s Daniel van Vugt, which helped early testing the GJS patches, and is doing some deep surgeries in Mutter to make the rendering smoother.

And for every great contributor, there is a great reviewer too. It would be extremely unfair if those relevant people haven’t had their work valued by the community, so please, take a moment to appreciate their work. They deserve it.

Final Thoughts

At this point, hopefully the cautious reader will have at least a superficial knowledge on the problem, the solution, and other relevant work around the performance topic. Which is good – if I managed to communicate that well enough, by the time you finish reading this blog post, you’ll have more knowledge. And more knowledge is good.

You can stop here if you want nothing more than technical knowldedge.

Still around?

Well, I’d like to raise an interesting discussion about how people reacted to the memory leak news, and reflect upon that. By reading the repercussions of the news, I found it quite intriguing to read comments like these:

weird comment 1

Captura de tela de 2018-04-20 22-51-53

Captura de tela de 2018-04-20 22-52-17

As a regular contributor for the last few years, this kind of comment sound alien to me. These comments sound completely disconnected to the reality of the development process of GNOME. It completely misses the individuality of the people involved. Maybe because we all know each other, but it is just plain impossible to me to paint this whole community as “they”; “GNOME developers”; etc. To a deeper degree, it misses the nuances and the beauty of community-driven development, and each and every individual that make it happen.

To some degree, I think this is a symptom of users being completely disconnected to GNOME development itself.

It almost feels like there’s a wall between the community and the users of what this community produces. Which is weird. We are an open community, with open development, no barriers for new contributors – and yet, there is such a distance between the community of users and the community of developers/designers/outreachers/etc.

Is that a communication problem from our side? How can we bridge this gap? Well, do we want to bridge this gap? Is it healthy to reduce the communication bandwidth in order to increase focus, or would it be better to increase that and deal with the accompanying noise?

I would love to hear your opinions, comments and thoughts on this topic.