Leak Hunting and Mutter Hacking

Greetings GNOMErs!

Last week, when I upgraded to GNOME 3.28, I was sad to notice an extremely annoying bug in Mutter/GNOME Shell: every once in a while, a micro-stutter happened. This was in addition to another bug that had been disappointing me for quite a while: the tiling/maximize/unmaximize animations were not working on Wayland.

About the former: it may not look like the end of the world, but trust me when I say that a split-second delay every ~10s is the perceived difference between a butter-smooth and a trashy experience.

Of course, this is free software, we are free people, and I have this habit of fixing up whatever is bothering me. Naturally, I decided to fix both bugs. I also decided to document my journey, so that people who want to try Mutter/GNOME Shell development are less scared.

Animations

I decided to start with the animations, since a comment written by Jonas Ådahl on bug 780292 was a leading clue to whatever the issue was. Time to open GNOME Builder, and clone Mutter.

Mutter + Wayland + (Mutter + Wayland + (App))

While testing these changes, I obviously needed to run Mutter and see what was happening. Since we’re talking about Wayland, I was specifically interested in seeing which messages were being sent by the application and which messages Mutter was receiving.

To dump the Wayland calls made by an application, we can just use the WAYLAND_DEBUG env var, like this:

$ WAYLAND_DEBUG=1 <application>

This should dump a lot of information into the terminal. This might or might not be useful to you.

One obvious way to test changes in Mutter is to build and install Mutter system-wide, then reboot. Rebooting takes me almost 5 minutes. Clearly not a good approach. Fortunately, Mutter has a nested mode where it can run inside another Mutter session.

To run a nested Mutter Wayland session:

$ mutter --nested --wayland

If your changes are making Mutter crash, you might want to run it under GDB. But Mutter is built with Autotools, which of course makes every single thing more complicated than it should be. You’ll notice that src/mutter is not an executable, but a wrapper script. To run Mutter under GDB, do this:

$ libtool --mode=execute gdb mutter

(gdb opens)

> r --nested --wayland

This will open a window with your new raw Mutter session. To run any graphical application against this new nested Mutter, as long as the toolkit supports Wayland, run:

$ WAYLAND_DISPLAY=wayland-1 <application>
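Why does that variable do the trick? As far as I understand libwayland-client, a client that connects without naming a display reads WAYLAND_DISPLAY (falling back to wayland-0) and looks for a socket with that name under XDG_RUNTIME_DIR. The sketch below mimics that lookup in a simplified form. Note that resolve_wayland_socket is a made-up helper for illustration, not a libwayland function, and the real code handles more cases than this.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper (not a libwayland API): builds the socket path a
 * Wayland client would try when no display name is given explicitly.
 * Returns the path length on success, or -1 on failure. */
static int
resolve_wayland_socket (char *buf, size_t size)
{
  const char *name = getenv ("WAYLAND_DISPLAY");
  const char *runtime_dir = getenv ("XDG_RUNTIME_DIR");
  int n;

  /* Default display name when the variable is unset. */
  if (name == NULL)
    name = "wayland-0";

  /* Without a runtime directory there is nowhere to look. */
  if (runtime_dir == NULL)
    return -1;

  n = snprintf (buf, size, "%s/%s", runtime_dir, name);
  if (n < 0 || (size_t) n >= size)
    return -1; /* formatting error or truncated path */

  return n;
}
```

With XDG_RUNTIME_DIR=/run/user/1000 and WAYLAND_DISPLAY=wayland-1, the helper produces /run/user/1000/wayland-1, so the application ends up talking to whichever compositor owns that socket.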

Inspecting Mutter and Wayland

This was the trickiest part for me. Mutter has some env vars to control debugging, but I could not use them properly. Either they would dump too much info, or nothing useful at all.

I then decided to go the dumb way and just add dozens of prints around the code.

If you’re aware of any better way to do that, please leave a comment!
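If you do go the print route, a tiny helper that tags each message with its call site makes the output much easier to grep. Here is a minimal sketch in plain C; TRACE and trace_at are hypothetical names of my own, not something Mutter provides, and inside Mutter you would more likely reach for GLib's logging functions instead.

```c
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical helper: prints "[file:line function] message" to stderr
 * and returns the number of characters written. */
static int
trace_at (const char *file, int line, const char *func, const char *fmt, ...)
{
  va_list args;
  int n;

  n = fprintf (stderr, "[%s:%d %s] ", file, line, func);

  va_start (args, fmt);
  n += vfprintf (stderr, fmt, args);
  va_end (args);

  n += fprintf (stderr, "\n");
  return n;
}

/* Captures the call site automatically; use it like printf. */
#define TRACE(...) trace_at (__FILE__, __LINE__, __func__, __VA_ARGS__)
```

Dropping something like TRACE ("geometry %dx%d", width, height); into a request handler then shows exactly which function fired and with which values.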

The Issue

The root of the issue was in this function:

static void
zxdg_surface_v6_set_window_geometry (struct wl_client   *client,
                                     struct wl_resource *resource,
                                     int32_t             x,
                                     int32_t             y,
                                     int32_t             width,
                                     int32_t             height)
{
  MetaWaylandSurface *surface = surface_from_xdg_surface_resource (resource);

  surface->pending->has_new_geometry = TRUE;
  surface->pending->new_geometry.x = x;
  surface->pending->new_geometry.y = y;
  surface->pending->new_geometry.width = width;
  surface->pending->new_geometry.height = height;
}

Can you spot the issue here?

Look again.

Notice that Mutter is accepting whatever the new geometry is. It doesn’t check whether the new geometry differs from the current one. When the geometry doesn’t change, we should not report anything to the compositor. If the compositor is GNOME Shell, things get even worse: we go through the JS trampoline, which is slow, when we could have avoided it.

Apparently, GTK reports geometry changes even when they don’t happen, e.g. when hovering over any area of the window. Every single one of these hundreds of no-op geometry updates per second would go through IPC to Mutter, which would mindlessly jump into the compositor’s JS trampoline just to do… nothing. Because the geometry didn’t actually change.

This was fixed by this commit.
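To illustrate the idea (and only the idea, this is not the actual commit), here is a self-contained sketch of a "compare before flagging" guard. Rect, SurfaceGeometry, rect_equal and surface_set_geometry are stand-ins invented for this example; Mutter's real structures and the place where it performs the comparison are different.

```c
#include <stdbool.h>
#include <stdint.h>

/* Stand-in types for illustration only; Mutter's real structures differ. */
typedef struct
{
  int32_t x, y, width, height;
} Rect;

typedef struct
{
  Rect current;          /* geometry applied by the last commit */
  Rect pending;          /* geometry requested for the next commit */
  bool has_new_geometry; /* true only when pending differs from current */
} SurfaceGeometry;

static bool
rect_equal (const Rect *a, const Rect *b)
{
  return a->x == b->x && a->y == b->y &&
         a->width == b->width && a->height == b->height;
}

/* Record a geometry request, flagging it only if it changes something. */
static void
surface_set_geometry (SurfaceGeometry *s, Rect requested)
{
  s->pending = requested;
  s->has_new_geometry = !rect_equal (&requested, &s->current);
}
```

With a guard like this, a client that keeps resending the same geometry never sets the flag, so nothing downstream has to wake up for it.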

Stuttering

The second thing that was actually freaking me out was that Mutter was waking up my discrete GPU quite often. On a PRIME system, the discrete GPU is put to sleep after a few seconds without being used, so every wakeup would produce an incredibly annoying stutter.

This was only happening on Wayland.

After further investigation, I came up with this temporary fix until Mutter becomes smarter about how it should handle GPUs. This one is already merged, and will be available in the next GNOME 3.28 release!

Memory Leak

Oh, dear, the infamous memory leak… I’ll just leave this link to the GitLab comment. Go figure.

6 thoughts on “Leak Hunting and Mutter Hacking”

  1. Aside from your fix to check if any geometry changes have occurred, shouldn’t surface->pending->has_new_geometry = TRUE; be called after the assignments to new_geometry? Just assuming that its conditional check somewhere else in the code may not be a sequential operation and cause a race condition.

    1. Since I was a little more curious tonight than usual, and yes I know I shouldn’t use a blog comment system for this but I wanted to see where has_new_geometry was being used… That brought me into meta_wayland_zxdg_surface_v6_commit() and I see another possible bug near its end.

      if (!meta_rectangle_equal (&new_geometry, &priv->geometry))
        {
          pending->has_new_geometry = TRUE;
          priv->geometry = new_geometry;
        }

      it should be either…

      pending->geometry = new_geometry;
      pending->has_new_geometry = TRUE;

      and be used on the next commit, or….

      priv->geometry = new_geometry;
      priv->has_set_geometry = TRUE;

      because as it stands now, when the next commit occurs the pending status will have has_new_geometry set to true, but its new_geometry will not contain the calculated geometry.

  2. Thanks for your work! Could you also have a look at post-resume performance? That’s my current most annoying issue with gnome-shell/mutter’s performance. Very often after resume everything is choppy (moving windows, switching workspaces, scrolling in firefox). It feels like 20 FPS instead of 60 FPS (but that’s just my feeling, I don’t know how to measure that). If you use X11, restarting gnome-shell using “r” makes everything smooth again. If you use Wayland, you’re screwed and must log out. This happens on all my computers (intel and amd graphics) and also to my colleagues, so it shouldn’t be hard to reproduce; it really seems to be a universal problem.

  3. Please pat yourself on the back, man. With this blog post you really show how you have ventured into places that many have never dared to. I can only imagine how painful these memory/stutter issues must be to debug, but you should know that for every one of those fixes you landed, 1000 users might experience a performance improvement. For every insight you share here and in bug reports, you might inspire 2 other developers to start looking into the same issue. Your heroic effort really deserves a lot of respect. I look forward to hearing about your next adventure already!! 🙂

  4. I really want to thank you for the work you’re doing on Gnome! It takes a lot of effort and dedication to go to such lengths to find these bugs and improve the desktop, you’re doing incredible work for the whole community.

    I came across this project on Github that helps find memory leaks in Gjs code. I’m not sure how helpful it would be to you here, but I thought it would be a good idea to share it in case it makes your work that bit easier: https://github.com/andyholmes/gjs-heapgraph

    Godspeed 😀
