This week (November 8th – 12th) is the Endless Orange Week, a program where the entire Endless team engages in projects designed to grow our collective learning related to our skills, work and mission.
My project for this program was improving XDG portals. I set out to work on the following problems:
- Improve the ScreenCast portal by introducing a new feature to restore previous screencasts.
- Add a portal-based audio access mechanism
- Modernize libportal
Let’s have a look at what these features are, and how they are progressing.
Portal-based Screen Casting
One of the portals introduced for applications to capture the contents of a monitor or a window was the ScreenCast portal. This portal has gained some relevance, since it is the primary – and in most cases, the only – way to capture windows and monitors on Wayland. This is the mechanism used by OBS Studio, Chromium, Firefox, etc.
This portal has a well defined set of steps centered around “sessions”:
- Applications initiate the process by asking the ScreenCast portal to create a screencast session. The portal replies with a handle for this session.
- Applications then configure this session. They specify whether they want monitors, windows, or both; how they’d like to receive the cursor (as metadata, embedded in each frame, or hidden); and whether they allow multiple sources. The portal replies acknowledging this configuration.
- Finally, applications start the screencast session. This is when a dialog shows up asking you to select a window or a monitor. The portal replies with a list of streams. Each stream has a corresponding PipeWire node, width and height, and position.
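The three steps above can be modeled with a small in-memory sketch. This is not the portal's D-Bus API: `ToyScreenCastPortal`, its method names, and the `fake_dialog` callback are all hypothetical, chosen only to make the order of the steps and the shape of the replies concrete.

```python
from dataclasses import dataclass
from enum import Enum, Flag, auto
import itertools


class SourceType(Flag):
    MONITOR = auto()
    WINDOW = auto()


class CursorMode(Enum):
    HIDDEN = "hidden"
    EMBEDDED = "embedded"
    METADATA = "metadata"


@dataclass
class Stream:
    node_id: int     # stands in for the PipeWire node
    size: tuple      # (width, height)
    position: tuple  # (x, y)


@dataclass
class Session:
    handle: str
    types: SourceType = SourceType.MONITOR
    cursor_mode: CursorMode = CursorMode.HIDDEN
    multiple: bool = False
    configured: bool = False


class ToyScreenCastPortal:
    """Hypothetical stand-in for the portal: no D-Bus, no PipeWire."""

    def __init__(self, pick_sources):
        self._pick_sources = pick_sources  # plays the role of the dialog
        self._ids = itertools.count(1)

    def create_session(self):
        # Step 1: the portal replies with a handle for the new session.
        return Session(handle=f"session{next(self._ids)}")

    def select_sources(self, session, types, cursor_mode, multiple):
        # Step 2: record the configuration and acknowledge it.
        session.types = types
        session.cursor_mode = cursor_mode
        session.multiple = multiple
        session.configured = True
        return True

    def start(self, session):
        # Step 3: show the "dialog" and reply with a list of streams.
        if not session.configured:
            raise RuntimeError("session must be configured before starting")
        streams = self._pick_sources(session)
        return streams if session.multiple else streams[:1]


def fake_dialog(session):
    # Pretend the user selected two monitors.
    return [Stream(42, (1920, 1080), (0, 0)),
            Stream(43, (1280, 720), (1920, 0))]


portal = ToyScreenCastPortal(fake_dialog)
session = portal.create_session()
portal.select_sources(session, SourceType.MONITOR | SourceType.WINDOW,
                      CursorMode.EMBEDDED, multiple=False)
streams = portal.start(session)
```

Because the session was configured with `multiple=False`, the toy portal hands back a single stream even though the fake dialog offered two.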
This process is repeated every time an application wants to screencast. It’s a robust series of steps, and has served us well so far, but having to select a monitor or window every time can be a frustrating experience.
For some use cases, this process is problematic. Take Steam’s recent introduction of PipeWire-based Remote Play: the whole purpose of this feature is to allow playing remotely, potentially without physical access to your computer. Showing a dialog to select a monitor is not going to work when the person is likely not in front of the machine.
This is where my new proposal to the ScreenCast portal comes in.
The mechanism proposed there is composed of two new properties: (i) a persist mode, where applications can tell the portal that they want to restore this screencast session later; and (ii) a restore token to restore a previous screencast session.
In summary, when configuring (step 2) a screencast session, applications can tell the portal “hey, I’d like to restore this session later”; in this case, after you select a monitor or window and start the stream (step 3), the portal will give the app what I called a restore token. Applications should store this token however they want (ideally using the platform’s preferred settings system, such as GSettings for GNOME).
Applications that have a restore token should pass it when configuring the screencast session (step 2). The portal will receive this token and try to restore the previous session’s windows and monitors. If that fails, e.g. because you changed monitors or the window is no longer open, the selection dialog is presented again. The application doesn’t know (nor does it need to) whether the previous session was restored or not, as it receives a list of streams and PipeWire nodes regardless of what happens.
If you look closely, it is possible to endlessly restore the same session by both passing the token to restore a previous session and asking the portal to persist it again for later.
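To make the restore flow concrete, here is a minimal sketch of both sides. Everything in it is an assumption for illustration: `ToyPortal`, `StartResult`, the `screencast` helper, and the plain dict standing in for the application's settings store are hypothetical. The sketch also discards a token once it is used and issues a fresh one, which is a choice made here rather than something the text above specifies.

```python
import uuid
from dataclasses import dataclass
from typing import Optional


@dataclass
class StartResult:
    sources: list                 # what ended up being shared
    restore_token: Optional[str]  # only set when persistence was requested
    dialog_shown: bool            # a real application never learns this


class ToyPortal:
    """Hypothetical portal side of the restore mechanism."""

    def __init__(self, available, ask_user):
        self.available = set(available)  # sources that currently exist
        self._ask_user = ask_user        # stands in for the selection dialog
        self._saved = {}                 # token -> previously shared sources

    def start(self, persist=False, restore_token=None):
        # Sketch assumption: a token is consumed when it is used.
        previous = self._saved.pop(restore_token, None)
        if previous and set(previous) <= self.available:
            sources, dialog_shown = previous, False
        else:
            # No token, unknown token, or the sources are gone: ask again.
            sources, dialog_shown = self._ask_user(sorted(self.available)), True
        token = None
        if persist:
            token = uuid.uuid4().hex
            self._saved[token] = list(sources)
        return StartResult(list(sources), token, dialog_shown)


def screencast(portal, settings):
    """Application side: pass the stored token, then store the new one."""
    result = portal.start(persist=True,
                          restore_token=settings.get("restore-token"))
    settings["restore-token"] = result.restore_token
    return result


settings = {}  # stands in for GSettings or any other settings store
portal = ToyPortal({"monitor-1", "window-editor"},
                   ask_user=lambda options: [options[0]])

first = screencast(portal, settings)    # nothing stored yet: dialog is shown
second = screencast(portal, settings)   # token matches: restored silently
portal.available.discard("monitor-1")   # the monitor was unplugged
third = screencast(portal, settings)    # restore fails: dialog is shown again
```

From the application's point of view all three calls look the same: it passes whatever token it has and gets sources back. Only the toy portal knows whether the dialog appeared.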
I was not confident in this mechanism when proposing it, but after implementing it on the GNOME portals, I realized it is actually really robust. It allows for very sophisticated features to be implemented by portal implementations without any additional property.
My initial implementation is merely a “Remember this decision” checkbox, purely to test this feature, so this will change soon. After discussing with peers both in the design and development fronts, a few ideas on how to manage these permissions came up:
- Revoking these permissions after a reasonably long period of time (6 months, or more, or less). That can help prevent applications from having access to a certain window or monitor forever.
- Improve the indication of ongoing screencasts. GNOME currently shows an orange icon in the titlebar, but we’re considering other alternatives such as notifications when starting and stopping streams, adding a minimum visibility time to the indicator, etc.
- Add a way to manage these permissions in GNOME Settings, either with a new panel under the Privacy group, or a new section inside the Applications panel.
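As a rough illustration of the first idea, time-based revocation could be as simple as comparing a stored grant date against a maximum age. This is only a sketch under assumptions: `StoredPermission`, `is_still_valid`, `prune`, and the 180-day value are hypothetical, and the actual period is still undecided.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Assumption: "about 6 months"; the real period has not been decided.
MAX_AGE = timedelta(days=180)


@dataclass
class StoredPermission:
    app_id: str
    sources: list
    granted_at: datetime


def is_still_valid(permission, now, max_age=MAX_AGE):
    # A permission expires once it is max_age old or older.
    return now - permission.granted_at < max_age


def prune(permissions, now, max_age=MAX_AGE):
    # Keep only the permissions that have not expired yet.
    return [p for p in permissions if is_still_valid(p, now, max_age)]


now = datetime(2021, 11, 12)
permissions = [
    StoredPermission("org.example.Old", ["monitor-1"], datetime(2021, 1, 1)),
    StoredPermission("org.example.New", ["window-1"], datetime(2021, 10, 1)),
]
kept = prune(permissions, now)
```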
The goal behind these ideas is to allow fine-grained control over what is being shared with whom, and when, without being annoying. There is a lot to do on this front, but I’m hopeful that most use cases will be covered.
And here it is, a proof-of-concept that this idea works:
Another area that is a bit lacking in the portals front is device access. We currently have the Device portal, with which applications can ask for permission to access specific devices such as cameras, speakers, and microphones.
PipeWire has some support for this type of device access in place, but currently only for cameras. We need the same kind of access control for microphones and speakers, especially microphones, and that’s what I’ve been working on during the week.
I initially approached this by proposing a new Audio portal, which was basically a copy-pasted version of the Camera portal. The basics of these portals are:
- Applications request audio access, which queries permissions for using audio devices; applications should say whether they want to use speakers, or microphones, or both.
- Portals potentially ask you if you would allow that application to access these devices.
- The application gets a PipeWire connection with only the allowed devices exposed.
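The three steps above can be sketched as a toy model. Nothing here is the proposed portal API: `ToyAudioPortal`, `access_audio`, and the `ask_user` callback are hypothetical, and the returned list of device names merely stands in for a PipeWire connection that exposes only the allowed devices.

```python
from enum import Flag, auto


class AudioDevice(Flag):
    SPEAKERS = auto()
    MICROPHONES = auto()


class ToyAudioPortal:
    """Hypothetical model of the request / ask / filter steps."""

    def __init__(self, devices, ask_user):
        self._devices = devices    # device name -> AudioDevice kind
        self._ask_user = ask_user  # stands in for the permission dialog
        self._decisions = {}       # (app_id, requested bits) -> granted kinds

    def access_audio(self, app_id, requested):
        key = (app_id, requested.value)
        if key not in self._decisions:
            # Never grant more than the application asked for.
            self._decisions[key] = self._ask_user(app_id, requested) & requested
        granted = self._decisions[key]
        # Expose only the devices whose kind was allowed.
        return sorted(name for name, kind in self._devices.items()
                      if kind & granted)


devices = {
    "Built-in microphone": AudioDevice.MICROPHONES,
    "Headset microphone": AudioDevice.MICROPHONES,
    "Laptop speakers": AudioDevice.SPEAKERS,
}

# Pretend the user allows speakers only, whatever the application asks for.
portal = ToyAudioPortal(devices, ask_user=lambda app, req: AudioDevice.SPEAKERS)
visible = portal.access_audio("org.example.Recorder",
                              AudioDevice.SPEAKERS | AudioDevice.MICROPHONES)
mics_only = portal.access_audio("org.example.Recorder",
                                AudioDevice.MICROPHONES)
```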
This is exactly how the Camera portal operates. Peer reviews, however, raised one very important point: there are situations where we can’t have separate PipeWire connections for cameras and microphones. So we had to come up with another solution that could satisfy these requirements.
In the end, after some rounds of discussions, we agreed on making this live in the Device portal. This is still under review, and things may change, but it seems to me that we found the best course of action with this last round of discussions.
What does that mean in terms of UX? Not much, I’m afraid, except that people would be able to have greater control over what sandboxed applications can access. You could, for example, prevent a proprietary app you don’t trust from accessing your cameras and microphones.
libportal is a small library written to help application developers interact with portals without having to manually call the D-Bus APIs. Some of these D-Bus APIs can be quite verbose and complicated to deal with, and libportal abstracts that away in a nice and simple way.
Lastly, libportal’s CI now produces Flatpak bundles for the GTK3, GTK4, and Qt 5 test apps (#55). This should make it easier to ask people to test new changes, since they can just download a Flatpak bundle and test it.
I think this was a successful week: there was significant progress on these fronts, and half of these pull requests have already been merged. I was hoping that the other ones could land before the week ends; sadly they didn’t, but we’re very close. In particular, the screencast session restore work is just a stone’s throw from landing!
Thanks to the Endless OS Foundation for allowing me to work on portals as part of the Endless Orange Week program!