At this point, I think it’s safe to assume that many of you have already heard of a memory leak that was plaguing GNOME Shell. Well, as of yesterday, the two GitLab MRs that help fix that issue were merged, and will be available in the next GNOME version. The fixes are being considered for backporting to GNOME 3.28, after making sure they work as expected and don’t break your computer.
First, I’d like to thank the GJS maintainer, Philip C., for all the hand-holding, the reviews, and the incredibly insightful discussions we had. Secondly, my employer, Endless, for the support they gave me to fix this issue. And last but not least, the Ubuntu folks, who made a public call for testing with the changes. This will give us confidence that the fix is working, and that backporting it will be a relatively safe and smooth process.
I’m writing this blog post with three goals in mind:
Explain in greater detail what the issue is (or at least, what we think it is), the journey to find it, and how it was fixed.
Give more exposure to important extra work from other contributors who absolutely deserve more credit.
Expose a social issue that showed up during this time, and open a discussion about it.
To me, it all started when I saw GitLab’s ticket #64 passing by in the IRC channels. It was challenging enough, and I was curious to dig into GNOME Shell/Mutter/GJS internals: a perfect match. Of course, when you’re not familiar with a given codebase, the first step to fixing a bug is being able to reproduce it, so I started to play around with GNOME Shell to see if I could find a reliable way to reproduce it.
Well, I found a way and wrote a very simple observation: running animations (showing and hiding the Overview, switching applications using Alt+Tab, etc.) was reliably increasing memory usage. Then a few people came in, and dropped bits of useful information here and there. But at this point, it was still pointing in a wide range of directions, and there was definitely no actionable task there. This is when OMG! Ubuntu first wrote about it.
A week passed, and I experimented with different tools in order to get a better understanding of memory management inside GNOME Shell. This is the kind of tedious work that nobody talks about, but I learned tons of new stuff, so in the end it was worth the hassle. I even wrote about my crazy experiments, and the results of this long week are documented in a long comment in GNOME/gnome-shell#64. I kept experimenting until I reached heapgraph, an interesting tool that allowed generating the following picture:
Well, as stated in the comment, GJS’ garbage collector was indeed collecting memory when triggered. Problem is, it wasn’t being triggered at all. That was the leading clue to one of the problems that was going on. One idea came to my mind, then, and I decided to investigate it further.
A Simple Example
Consider that we have a few objects in memory, and they have parent/child relationships:
Let’s suppose that we decide that we don’t need the root object anymore, so we drop the reference to it, and it is marked for garbage collection.
The garbage collector, then, will go there and destroy the root one. This object will be finalized, and the directly dependent objects will be marked for garbage collection.
But… when will the next GC happen? Who knows! Can be now, can be in 10 minutes, or tomorrow morning! And that was the biggest offender to the memory leak – objects were piling up to be garbage collected, and these objects had child objects that would only be collected after, and so it goes. In other words, this is not really a memory leak – the memory is not being lost. I’d label it as a “misbehavior” instead.
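To make the cascade a bit more concrete, here is a rough sketch in Python (not GJS). The `Node` class and the `collect_once` and `passes_needed` helpers are hypothetical names made up for this illustration, and the sketch assumes the simplified behavior described above: finalizing an object only releases its direct children, which then have to wait for the next GC pass.

```python
class Node:
    """A toy object that keeps its children alive until it is finalized."""

    def __init__(self, name, children=()):
        self.name = name
        self.children = list(children)


def collect_once(garbage):
    """Run one simulated GC pass.

    Everything currently marked as garbage is finalized, and only then do
    the direct children of those objects become garbage themselves.
    Returns the list of objects waiting for the *next* pass.
    """
    next_garbage = []
    for node in garbage:
        next_garbage.extend(node.children)
    return next_garbage


def passes_needed(root):
    """Count how many GC passes it takes to free the whole tree."""
    garbage, passes = [root], 0
    while garbage:
        garbage = collect_once(garbage)
        passes += 1
    return passes


# A three-level tree: root -> 2 children -> 2 grandchildren each.
leaves = [Node(f"leaf{i}") for i in range(4)]
middle = [Node("a", leaves[:2]), Node("b", leaves[2:])]
root = Node("root", middle)

print(passes_needed(root))  # one pass per level of the tree
```

In this toy model a tree needs as many GC passes as it has levels, so if passes are rare, deep hierarchies stay in memory for a long time.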
While people might think this was somehow solved, the patches that were merged do not fix it the way it should be fixed. The “solution” is basically throwing a grenade to kill ants. We now queue a garbage collection every time an object is marked for destruction. So every single time an object becomes red, as in the example, we queue a GC. This is, of course, a very aggressive solution.
But it is not all bad. Some early tests show that this has a small impact on performance; at least, it’s much smaller than what we were expecting. A very convincing explanation is that the higher frequency of GCs reduces the number of things that are destroyed in each GC. So now we have smaller and more frequent garbage collections.
EDIT: Looks like people need more clarification here, since the comments about it are just plain wrong. I’ll be technical and precise; if you don’t understand, please do some research. The garbage collector is scheduled every time a GObject wrapped in GJS has its reference count drop from >1 to 1, that is, when the toggle reference held by GJS becomes the last one. And scheduled here means that a GC is injected into the mainloop as an idle callback, which will be executed when there’s nothing else to be executed in the mainloop. The vast majority of the time, this means that only one GC will happen, even if hundreds of GObjects are disposed. I’ve spotted it happening twice in the wild. This fix is strictly specific to GObjects wrapped by GJS; all other kinds of memory management, such as strings and whatever else, aren’t affected by this fix. Together with this patch, an accompanying solution landed that reduces the number of objects with a toggle reference.
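For those who prefer code to prose, this is a rough sketch of how an idle callback can coalesce many requests into a single GC. It is written in Python, not in the C++ of GJS, and `MainLoop`, `GcScheduler`, `on_toggle_down` and the "pending" flag are all hypothetical names and assumptions of mine for the sake of illustration; the actual patch may well be structured differently.

```python
from collections import deque


class MainLoop:
    """A toy main loop that runs idle callbacks when nothing else is pending."""

    def __init__(self):
        self._idle = deque()

    def idle_add(self, callback):
        self._idle.append(callback)

    def run_idle(self):
        while self._idle:
            self._idle.popleft()()


class GcScheduler:
    """Queue at most one GC per idle cycle, however many objects are released."""

    def __init__(self, loop):
        self._loop = loop
        self._pending = False
        self.gc_runs = 0

    def on_toggle_down(self):
        # Called when a wrapped object drops to its last reference.
        if self._pending:
            return  # a GC is already queued; do not queue another one
        self._pending = True
        self._loop.idle_add(self._run_gc)

    def _run_gc(self):
        self._pending = False
        self.gc_runs += 1


loop = MainLoop()
scheduler = GcScheduler(loop)

for _ in range(300):  # hundreds of objects disposed in one burst
    scheduler.on_toggle_down()

loop.run_idle()
print(scheduler.gc_runs)  # 1
```

The point of the sketch is only that "queue a GC on every disposed object" does not have to mean "run a GC for every disposed object": requests that arrive before the main loop goes idle collapse into one run.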
This obviously needs more testing on a wider range of hardware, especially on the lower end. But, quite honestly, I’m personally sure that this apparently small performance penalty is compensated by the memory management gains.
While the previous section covered my side of this story, there are a few other contributors who did a great job, and I think it would be unfair to them if their work was not properly highlighted.
Red Hat’s Carlos Garnacho published two merge requests for GJS that, in my testing, substantially improved the smoothness of GNOME Shell. The first one changes the underlying data structure of JS objects, which allows us to stop using an O(n) algorithm and start using an O(1) one. The second one is particularly interesting, and it yields the most noticeable improvements on my computer. Roughly speaking, it vastly reduces the number of temporary memory allocations. He also has a number of patches for Mutter and GNOME Shell.
Another prominent contributor regarding performance is Canonical’s Daniel van Vugt, who helped with early testing of the GJS patches, and is doing some deep surgery in Mutter to make the rendering smoother.
And for every great contributor, there is a great reviewer too. It would be extremely unfair if these people didn’t have their work valued by the community, so please, take a moment to appreciate their work. They deserve it.
At this point, hopefully the careful reader will have at least a superficial knowledge of the problem, the solution, and other relevant work around the performance topic. Which is good: if I managed to communicate that well enough, by the time you finish reading this blog post, you’ll have more knowledge. And more knowledge is good.
You can stop here if you want nothing more than technical knowledge.
Well, I’d like to raise an interesting discussion about how people reacted to the memory leak news, and reflect upon that. By reading the repercussions of the news, I found it quite intriguing to read comments like these:
As a regular contributor for the last few years, this kind of comment sounds alien to me. These comments sound completely disconnected from the reality of the development process of GNOME. They completely miss the individuality of the people involved. Maybe it is because we all know each other, but it is just plain impossible for me to paint this whole community as “they”; “GNOME developers”; etc. To a deeper degree, they miss the nuances and the beauty of community-driven development, and each and every individual who makes it happen.
To some degree, I think this is a symptom of users being completely disconnected from GNOME development itself.
It almost feels like there’s a wall between the community and the users of what this community produces. Which is weird. We are an open community, with open development, no barriers for new contributors – and yet, there is such a distance between the community of users and the community of developers/designers/outreachers/etc.
Is that a communication problem from our side? How can we bridge this gap? Well, do we want to bridge this gap? Is it healthy to reduce the communication bandwidth in order to increase focus, or would it be better to increase that and deal with the accompanying noise?
I would love to hear your opinions, comments and thoughts on this topic.
Last week, when I upgraded to GNOME 3.28, I was sad to notice an extremely annoying bug in Mutter/GNOME Shell: every once in a while, a micro-stutter happened. This was in addition to another bug that had been disappointing me for quite a while: the tiling/maximize/unmaximize animations were not working on Wayland.
About the former, it may not look like the end of the world, but trust me when I say that a split-second delay every ~10s is the perceived difference between a butter-smooth and a trashy experience.
Of course, this is free software, we are free people, and I have this habit of fixing up whatever is bothering me. Naturally, I decided to fix them. I also decided to document my journey, so that people who want to try Mutter/GNOME Shell development are less scared.
While testing these changes, I obviously needed to run Mutter and see what was happening. Since we’re talking about Wayland, I was specifically interested in seeing which messages were being sent by the application and which messages Mutter was receiving.
To dump the Wayland calls made by an application, we can just use the WAYLAND_DEBUG env var, like this:
$ WAYLAND_DEBUG=1 <application>
This should dump a lot of information into the terminal. This might or might not be useful to you.
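If the dump is too noisy, one option is to filter it. The following is a small sketch, in Python, of a helper that runs a command with WAYLAND_DEBUG=1 and keeps only the lines mentioning a given string. The function name `wayland_debug_lines` is made up for this example, and it assumes the debug output is written to stderr.

```python
import os
import subprocess
import sys


def wayland_debug_lines(command, needle):
    """Run `command` with WAYLAND_DEBUG=1 and return the stderr lines
    that contain `needle`.

    Note that this blocks until the command exits, so for a graphical
    application the result only arrives after the window is closed.
    """
    env = dict(os.environ, WAYLAND_DEBUG="1")
    result = subprocess.run(command, env=env, stderr=subprocess.PIPE, text=True)
    return [line for line in result.stderr.splitlines() if needle in line]


# Hypothetical usage, with <application> replaced by a real Wayland client:
#   for line in wayland_debug_lines(["<application>"], "set_window_geometry"):
#       print(line)
```

Filtering on a request name such as `set_window_geometry` (part of the xdg-shell protocol) narrows the output down to the window geometry traffic.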
One obvious way to test changes in Mutter is to build and install Mutter system-wide, then reboot. Rebooting takes almost 5 minutes for me. Clearly not a good approach. But Mutter has a nested mode where it can run inside another Mutter session.
To run a nested Mutter Wayland session:
$ mutter --nested --wayland
If your changes are making Mutter crash, you might want to run it with GDB. But Mutter is built with Autotools, which of course makes every single thing more complicated than it should be. You’ll notice that src/mutter is not an executable, but a wrapper script. To run Mutter under GDB, do this:
$ libtool --mode=execute gdb mutter
> r --nested --wayland
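This will open a window with your new raw Mutter session. Any graphical application can run against this new nested Mutter, as long as its toolkit supports Wayland and it is pointed at the nested session’s Wayland socket (through the WAYLAND_DISPLAY environment variable).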
Notice that Mutter is accepting whatever the new geometry is. It doesn’t check if the new geometry differs from the current one. When the geometry doesn’t change, we should not report anything to the compositor. If the compositor is GNOME Shell, things get even worse: we go through the JS trampoline, which is slow, when we could have avoided it.
Apparently, GTK reports geometry changes even when they don’t happen, e.g. when hovering over any area of the window. Every second, every single one of these hundreds of geometry changes that didn’t actually change anything would go through IPC to Mutter, which would mindlessly jump into the compositor’s JS trampoline just to do… nothing. Because the geometry didn’t actually change.
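The idea of the fix boils down to an early return when nothing changed. Below is a minimal sketch in Python; Mutter itself is written in C, and `Rect`, `Window` and `set_geometry` are hypothetical names for this illustration, not Mutter’s actual types or functions.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Rect:
    x: int
    y: int
    width: int
    height: int


class Window:
    """Toy window that notifies the compositor only on real geometry changes."""

    def __init__(self, on_geometry_changed: Callable[[Rect], None]):
        self._geometry: Optional[Rect] = None
        self._notify = on_geometry_changed

    def set_geometry(self, new: Rect) -> bool:
        # Early return: an identical rectangle is not a change, so the
        # (expensive) compositor callback is skipped entirely.
        if new == self._geometry:
            return False
        self._geometry = new
        self._notify(new)
        return True


calls = []
window = Window(calls.append)

window.set_geometry(Rect(0, 0, 800, 600))    # first report: a real change
for _ in range(100):                         # the client keeps re-sending it
    window.set_geometry(Rect(0, 0, 800, 600))
window.set_geometry(Rect(0, 0, 1024, 768))   # a real change again

print(len(calls))  # 2
```

Out of 102 reports, only the two that actually changed the geometry reach the callback; the other hundred stop at the comparison.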
The second point, the one that was actually freaking me out, was that Mutter was waking up my discrete GPU quite often. On a PRIME system, the discrete GPU is put to sleep after a few seconds without being used, so every wakeup would produce an incredibly annoying stutter.
This was only happening on Wayland.
After further investigation, I came up with this temporary fix until Mutter becomes smarter about how it should handle GPUs. This one is already merged, and will be available on the next GNOME 3.28 release!
It’s been a long time since I last wrote here. These past months were excruciatingly busy and intense, and lots of things happened, but I didn’t manage to keep up with the blog posts. I’ll try to condense here everything that happened, is still happening, and will happen.
Calendar & To Do
I spent a good part of January polishing and fixing bugs in Calendar and To Do. Just to name a few:
The support for weather forecast in Calendar was polished.
Calendar’s codebase was modernized and cleaned up. This has no user-visible side effects (except, of course, the bugs that are avoided because of that), but maintaining a clean and modern codebase is absolutely essential to keep the project healthy, the maintainers motivated, and the new contributors excited.
Many warnings and crashes were fixed.
GNOME To Do
The Todoist integration was reworked, and is much more stable and functional now. More improvements will land before 3.28, but this was already a remarkable rework.
The Todo.txt integration also received some attention, but is not yet where I want it to be. The support for subtasks was temporarily dropped until we figure out a way to implement it correctly. If anyone knows something about it, please comment below.
The Flatpak support matured a lot in the past few days, and now the Flatpak Nightly version will enable tracing by default. This will simplify the lives of users that want to test it and report bugs; and maintainers (read: me) that want to fix stuff before the stable release.
I’m feeling a bit pressured to put these apps in good shape for GNOME 3.28, especially To Do, since it was selected to be installed by default on Ubuntu and I don’t want the new users having a bad and unstable experience. I also don’t want to deal with hundreds of possible bugs after the release.
Settings (aka Control Center)
I’ve been working a lot recently on GNOME Settings, and reviewed (quite literally) more than a hundred patches in the past couple of weeks. Lots of interesting stuff landed:
GNOME Settings switched to Meson. The build times were cut down by a factor of 5, it is amazing!
A new Background panel is in the works, and apparently reaching a good state. Hopefully it’ll be ready before 3.28.
A new privacy option is about to be added (we’re just figuring out the wording) that blocks phoning home to detect the network status. Privacy-aware users will enjoy that new option.
Lots of smaller cleanups and code refactorings.
Now, something happened to Settings these days; it lost its maintainer. I’ve been trying to act as a maintainer during this blackout, and I’d be happy to continue doing that. Fortunately, there are many other heroes involved (shouts to Bastien Nocera, Debarshi Ray, Robert Ancell, Julian Sparber, Ondrej Holy, and many others for your contributions and being great maintainers.)
Hopefully Settings is already in good shape for 3.28, and will get even more solid in the following weeks.
A New Master
Big news: I’ve finished writing my Masters’ thesis, and it’s over now. It was a hell of a ride and, in retrospect, I think enrolling in a Masters did me more harm than good.
I’m finishing it with a bittersweet taste in my mouth; I’ve learned a lot, but, for many reasons, it was a bad experience overall that led me to a few burnouts and episodes of night terror and depression. I made the mistake of not stopping when I should have, and advancing in this shitswamp had pretty catastrophic implications, including physical ones (in one of these nights of terror, I cut my own hand with a knife, and it was painful to use the computer for a couple of weeks.)
It is now over, and I’ll need to recover from the past 2.5 years. Which leads us to…
But hey, it’s only for a month! I’ll be taking some weeks off and disconnecting from everything (including GNOME, Endless, family and everything else), a time that I’ll spend backpacking through some places around the world. I’ll refrain from telling where, since I want to avoid being recognized (it’s not like I’m famous, but who knows!). I’ll be with my wife, and only her, during this period.
I just hope my apps don’t fall apart during this time. For someone who is routinely connected and helping others on IRC channels, disconnecting will be an interesting experience; perhaps agonizing in the first few days, but only time will tell.
It’s been a long time with no news. I guess work and masters are really getting in the way… good news is that I’ll finish my masters in 2 months, and will have some free time to devote to this beloved project.
“Bad” news is that, after almost 6 years, I’ll finally take some time to have a real vacation. I’ll stay 3 weeks out of the loop in February, a time when I’ll be traveling to the other side of the world, watching the sunset at the beach with my wife. Without a computer. While it’s unfortunate for the community, I think this time is necessary for my mental health: I’ve gone through the almost-burned-out state way too many times recently.
But even with all of these things in our way, thanks to the help of awesome old and new contributors, Calendar and To Do received a lot of new features!
Let’s begin with my beloved Calendar. My focus for the past weeks was rewriting the Month view. It was a hard, painful process, but I can say for sure now that, of the very few responsive widgets in GNOME, the Month view is the best one! 😛
The most substantial changes were:
The day numbers are at the top of each cell now. This is thanks to the hard design work of Allan Day, Jean-François and Lapo.
Each cell now only shows the overflow button when absolutely necessary. When implementing this new behavior, a few longstanding issues were fixed.
The Month view now finally has fully working, sane code to deal with RTL languages.
When clicking the +N button, the cell “zooms in” and displays the list of events. This is a big design improvement over the popovers that we were using.
Code-wise, the Month view code that positions the events is an order of magnitude simpler and easier to read. It may sound like a purely technical matter, but it has user-visible effects too: easier, cleaner code means more features and fewer issues in the future.
Of course, no words can make people as excited as a sequence of pictures! Let’s check this out:
The animations were implemented using the animation framework in libdazzle, all thanks to Christian Hergert’s work on GNOME Builder. Kudos!
For the next cycle, thanks to the hard work of a new and awesome contributor, Florian Brosch, this is what’s coming next:
We’re on track to land the features that were proposed for this cycle. You can check out the plans at the Roadmap page of Calendar. You can also help us with these tasks through design, code and testing!
GNOME To Do also received a lot of attention already. We’re going through a big redesign, thanks to the leading design work of Tobias Bernard, and the results are already gratifying.
The immediately noticeable change is the tasklist view:
The rows are entirely draggable now. I’ll continue working on these features, but more importantly, I want people to take some of this work over and contribute to the project!
Talking about managing tasks, GNOME To Do was moved to GitLab! I can’t state how much of an improvement it is over the previous Bugzilla approach. We now have an updated and organized Kanban Board:
The reason for that is to have a consolidated workflow:
A designer moves the task to “Design” column and works on it.
Once design is settled, a developer moves the task to the “Development” column and fixes/implements the task.
When the task is implemented, the developer moves the task to the “Code Review” column, and a maintainer will review the code.
Once the code is reviewed and has landed, the task is moved to the “QA” column, where a tester will pick it up and test it.
When all the regressions and issues of that task are fixed, the task is closed.
So far, the experience with this workflow has been outstanding. We were able to find many more bugs thanks to QA being a first-class citizen in the process. Filing bugs is now a breeze too! There are bug templates already available, and I took on the burden of a colossal cleanup and reorganization of the bug list:
While many of these changes are super exciting, this is just the first part of the cycle. There is much more to work on, and the more people get involved, the more we will accomplish. Things are moving at a fast pace, and I’m incredibly happy with the direction of these projects.
To help push community involvement, I went ahead and wrote a page describing how you can help with testing. With Flatpak, this is ridiculously easy, and yet absolutely necessary! So, don’t hesitate to get in touch and help us shape the next GNOME version.
A late night announcement: the improved tiling patches (shown in a previous blog post) were merged in Mutter and GTK+3, and will be available in GNOME 3.26.1 / GTK 3.22.23 (not yet released; should be available this week).
I’d like to thank Florian Muellner, Matthias Clasen, Jonas Adahl and AlexGS for all their support, time, code reviews and testing.
I’m a fan of productivity. It is not a coincidence that I’m the maintainer of Calendar and To Do. And even though I’m not a power user, I’m a heavy user of productivity applications.
For some time now, I’ve been finding the overall experience of GNOME To Do clumsy and far from ideal. Recently, I received a thank you email from a fellow user, and I asked them what they think could be improved.
It was not a surprise when they said To Do’s interface is clumsy too.
That motivated me to experiment and bother our designers about ways to improve GNOME To Do. With the great help of Tobias Bernard, a super awesome contributor, we could figure out a way to improve the current situation.
Opaque Task Rows
One of the problems of GNOME To Do was the translucent task rows. Priorities would be semi-transparent colors applied on top of transparent rows.
Of course this mess could lead to things like this:
After some investigation, a lot of experimentation and feedback from multiple design team members, we could come up with this:
I personally think this is a small change, but a huge improvement over the previous state. When you have to stare at tasklists for hours, the minor annoyances are what cause the biggest frustrations.
Another big aspect of To Do was the task editor panel. This was initially made based on some old mockups, but it proved not to be the ideal experience.
The biggest problem was that there was no connection between the editor and the task. Of course there is an arrow pointing to the task row, but consider that:
The task title is edited in the task row
All other fields are edited in the side panel
The arrow might not be obvious to spot
The real representation of the task was the row, not the panel
So Tobias suggested inline editing of tasks to me. I went ahead and implemented it, and the result actually looked very good!
The necessary width was reduced, and now the window can be shrunk to small sizes. And it works nicely on Dark Themes too:
This work already landed on master, and will be part of GNOME To Do 3.28. And, of course, our traditional sequence of images:
Any comments? Thoughts? Please let me know in the comments! And don’t ever forget, you can always get involved – you just need to get in touch, and join us at #gnome-todo at irc.gnome.org.